Hello,
I am trying to use Spark for complex event processing (CEP) on log files as a batch job, not on streams in real time.
Is that possible? If yes, do you know of any example Scala code for that?
Or should I convert the log files (with timestamps) into streams?
And how should the timestamps be handled in Spark?
If I can n
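For what it's worth, timestamped log files can be processed as an ordinary batch DataFrame, and event-time windowing works on batch input as well as on streams. A minimal sketch (the file path, log format, and column names are all assumptions, not part of any real deployment):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LogBatchCep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-batch-cep").getOrCreate()
    import spark.implicits._

    // Hypothetical log line format: "2018-03-01 12:00:00,INFO,login"
    val logs = spark.read.text("hdfs:///logs/*.log")
      .select(split($"value", ",").as("parts"))
      .select(
        to_timestamp($"parts".getItem(0), "yyyy-MM-dd HH:mm:ss").as("ts"),
        $"parts".getItem(1).as("level"),
        $"parts".getItem(2).as("event"))

    // Event-time windowing on the parsed timestamp column:
    // count events per 10-minute window, keyed by event type.
    val counts = logs.groupBy(window($"ts", "10 minutes"), $"event").count()
    counts.show(truncate = false)

    spark.stop()
  }
}
```

The same `groupBy(window(...))` expression would also work in Structured Streaming, so converting the files into a stream is not required just to get time-based grouping.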
Hi all,
In Spark 2.2.1, when I load Parquet files, the result comes back in a different order than the original dataset.
It seems that the FileSourceScanExec.createNonBucketedReadRDD method sorts the Parquet file splits by their lengths:

val splitFiles = selectedPartitions.flatMap { partition =>
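Since a Parquet scan makes no ordering guarantee, the usual remedy is to impose the order explicitly after loading. A sketch, assuming the data carries a column (here called "id") that encodes the original row order; the path and column name are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-order").getOrCreate()

// Splits may be scheduled and read in any order, so rely on the data,
// not the file layout, for ordering.
val df = spark.read.parquet("/data/events.parquet")
val ordered = df.orderBy("id")
```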
The other way might be to launch a single SparkContext and then run jobs inside it.
You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org
Problems with this approach:
Hello,
We have a Spark cluster with 3 worker nodes running as EC2 instances on AWS. The Spark
application runs in cluster mode and the checkpoints are stored in EFS.
The Spark version used is 2.2.0.
We noticed the error below coming up; our understanding was that this
intermittent checkpoint issue will
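For context, pointing checkpoints at a shared filesystem like EFS only requires that the mount path be identical on every node. A sketch (the mount point is an assumption about the deployment):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-demo").getOrCreate()

// EFS is typically mounted at the same path on every worker, e.g. /mnt/efs;
// all executors and the driver must be able to read and write this directory.
spark.sparkContext.setCheckpointDir("/mnt/efs/checkpoints")
```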
Hi All,
DataWorks Summit, San Jose, 2018 is a good place to share your experience of
advanced analytics, data science, machine learning and deep learning.
We have an Artificial Intelligence and Data Science session covering technologies
such as:
Apache Spark, Scikit-learn, TensorFlow, Keras, Apache
Hi,
Spark 2.0 doesn't support `STORED BY`. Is there any alternative to achieve
the same?
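For reference, Hive's `STORED BY` clause declares a custom storage handler, and Spark SQL has no direct equivalent. Depending on the use case, declaring the table against a Spark data source with `USING` is sometimes a workable substitute. A sketch; the table name, source, and path are assumptions:

```scala
// Instead of Hive's STORED BY 'some.StorageHandler', declare the table
// against a built-in or third-party Spark data source:
spark.sql("""
  CREATE TABLE events (id BIGINT, payload STRING)
  USING parquet
  OPTIONS (path '/data/events')
""")
```

If the storage handler targets an external system (e.g. HBase), a data source connector for that system would be the analogous mechanism in Spark.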