Hello,
I am trying to use Spark for complex event processing (CEP) on log files as a batch job, not on streams in real time.
Is that possible? If yes, do you know of any example Scala code for that?
Or should I convert the log files (with timestamps) into streams?
And how should the timestamps be handled in Spark?
If I can n
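For what it's worth, timestamped log files can be processed as an ordinary batch DataFrame, and event-time windowing works on batch input as well as on streams. A minimal sketch (the file path, log format, and column names are all assumptions, not part of any real deployment):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LogBatchCep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-batch-cep").getOrCreate()
    import spark.implicits._

    // Hypothetical log line format: "2018-03-01 12:00:00,INFO,login"
    val logs = spark.read.text("hdfs:///logs/*.log")
      .select(split($"value", ",").as("parts"))
      .select(
        to_timestamp($"parts".getItem(0), "yyyy-MM-dd HH:mm:ss").as("ts"),
        $"parts".getItem(1).as("level"),
        $"parts".getItem(2).as("event"))

    // Event-time windowing on the parsed timestamp column:
    // count events per 10-minute window, keyed by event type.
    val counts = logs.groupBy(window($"ts", "10 minutes"), $"event").count()
    counts.show(truncate = false)

    spark.stop()
  }
}
```

The same `groupBy(window(...))` expression would also work in Structured Streaming, so converting the files into a stream is not required just to get time-based grouping.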
Hi all,
In Spark 2.2.1, when I load Parquet files, the result comes back in a different order than the original dataset.
It seems that the FileSourceScanExec.createNonBucketedReadRDD method sorts the Parquet file splits by their lengths:

val splitFiles = selectedPartitions.flatMap { partition =>
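Since a Parquet scan makes no ordering guarantee, the usual remedy is to impose the order explicitly after loading. A sketch, assuming the data carries a column (here called "id") that encodes the original row order; the path and column name are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-order").getOrCreate()

// Splits may be scheduled and read in any order, so rely on the data,
// not the file layout, for ordering.
val df = spark.read.parquet("/data/events.parquet")
val ordered = df.orderBy("id")
```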
The other way might be to launch a single SparkContext and then run jobs inside it.
You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org
Problems with this approach:
Hello,
We have a Spark cluster with 3 worker nodes running as EC2 instances on AWS. The Spark
application runs in cluster mode and the checkpoints are stored in EFS.
The Spark version used is 2.2.0.
We noticed the error below coming up; our understanding was that this
intermittent checkpoint issue will
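For context, pointing checkpoints at a shared filesystem like EFS only requires that the mount path be identical on every node. A sketch (the mount point is an assumption about the deployment):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-demo").getOrCreate()

// EFS is typically mounted at the same path on every worker, e.g. /mnt/efs;
// all executors and the driver must be able to read and write this directory.
spark.sparkContext.setCheckpointDir("/mnt/efs/checkpoints")
```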
Hi All,
DataWorks Summit, San Jose, 2018 is a good place to share your experience of
advanced analytics, data science, machine learning and deep learning.
We have an Artificial Intelligence and Data Science session covering technologies
such as:
Apache Spark, Scikit-learn, TensorFlow, Keras, Apache
Hi,
Spark 2.0 doesn't support `STORED BY`. Is there any alternative to achieve
the same?
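For reference, Hive's `STORED BY` clause declares a custom storage handler, and Spark SQL has no direct equivalent. Depending on the use case, declaring the table against a Spark data source with `USING` is sometimes a workable substitute. A sketch; the table name, source, and path are assumptions:

```scala
// Instead of Hive's STORED BY 'some.StorageHandler', declare the table
// against a built-in or third-party Spark data source:
spark.sql("""
  CREATE TABLE events (id BIGINT, payload STRING)
  USING parquet
  OPTIONS (path '/data/events')
""")
```

If the storage handler targets an external system (e.g. HBase), a data source connector for that system would be the analogous mechanism in Spark.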