Re: Spark streaming takes longer time to read json into dataframes

2016-07-19 Thread Diwakar Dhanuskodi
Original message From: Cody Koeninger Date: 19/07/2016 20:49 (GMT+05:30) To: Diwakar Dhanuskodi Cc: Martin Eden, user Subject: Re: Spark streaming takes longer time to read json into dataframes Yes, if you need more parallelism, you need to either add more

Re: Spark streaming takes longer time to read json into dataframes

2016-07-19 Thread Cody Koeninger
> as you set in Kafka. > > Have you seen this? > http://spark.apache.org/docs/latest/streaming-kafka-integration.html > > M > > On Sat, Jul 16, 2016 at 5:26 AM, Diwakar Dhanuskodi > wrote: >> >> >> -- Forwarded message ------ >> From: Diwa

Re: Spark streaming takes longer time to read json into dataframes

2016-07-17 Thread Diwakar Dhanuskodi
Original message From: Martin Eden Date: 16/07/2016 14:01 (GMT+05:30) To: Diwakar Dhanuskodi Cc: user Subject: Re: Spark streaming takes longer time to read json into dataframes Hi, I would just do a repartition on the initial direct DStream since otherwise each RDD in the stream
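
The repartition suggestion in the message above could look roughly like the sketch below. It assumes Spark 1.6 with the Kafka 0.8 direct-stream integration (spark-streaming-kafka). The broker address, topic name, batch interval, and partition count are placeholders, not values from this thread. With the direct approach, each batch RDD gets one partition per Kafka partition, so a single-partition topic yields single-partition RDDs unless they are repartitioned before the JSON parsing step.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object RepartitionBeforeJson {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-repartition-sketch")
    // Batch interval is an assumption; tune it to the workload.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Hypothetical broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("json-topic")

    // Direct stream of (key, value) pairs; one RDD partition per Kafka partition.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Assumed target parallelism, e.g. a small multiple of the total executor cores.
    val numPartitions = 12

    stream
      .map(_._2)                  // keep only the message value (the JSON string)
      .repartition(numPartitions) // spread the records across the executors
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
          // Parse the JSON strings into a DataFrame; the schema is inferred from the data.
          val df = sqlContext.read.json(rdd)
          df.printSchema()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Repartitioning adds a shuffle, so it only pays off when the parsing work outweighs the cost of moving the data. Adding partitions to the Kafka topic itself avoids that shuffle.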

Re: Spark streaming takes longer time to read json into dataframes

2016-07-16 Thread Martin Eden
at 5:26 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > > -- Forwarded message -- > From: Diwakar Dhanuskodi > Date: Sat, Jul 16, 2016 at 9:30 AM > Subject: Re: Spark streaming takes longer time to read json into dataframes > To: Jean Geo

Fwd: Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
-- Forwarded message -- From: Diwakar Dhanuskodi Date: Sat, Jul 16, 2016 at 9:30 AM Subject: Re: Spark streaming takes longer time to read json into dataframes To: Jean Georges Perrin Hello, I need it on memory. Increased executor memory to 25G and executor cores to 3. Got

Re: Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Jean Georges Perrin
Do you need it on disk or just push it to memory? Can you try to increase memory or # of cores (I know it sounds basic) > On Jul 15, 2016, at 11:43 PM, Diwakar Dhanuskodi > wrote: > > Hello, > > I have 400K json messages pulled from Kafka into spark streaming using > DirectStream approach.

Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
Hello, I have 400K JSON messages pulled from Kafka into Spark Streaming using the DirectStream approach. The total size of the 400K messages is around 5G. The Kafka topic has a single partition. I am using rdd.read.json(_._2) inside foreachRDD to convert the RDD into a DataFrame. It takes almost 2.3 minutes to convert into