Original message
From: Cody Koeninger
Date: 19/07/2016 20:49 (GMT+05:30)
To: Diwakar Dhanuskodi
Cc: Martin Eden, user
Subject: Re: Spark streaming takes longer time to read json into dataframes
Yes, if you need more parallelism, you need to either add more
> as you set in Kafka.
>
> Have you seen this?
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
>
> M
>
> On Sat, Jul 16, 2016 at 5:26 AM, Diwakar Dhanuskodi
> wrote:
>>
>>
>> -- Forwarded message ------
>> From: Diwa
Original message
From: Martin Eden
Date: 16/07/2016 14:01 (GMT+05:30)
To: Diwakar Dhanuskodi
Cc: user
Subject: Re: Spark streaming takes longer time to read json into dataframes
Hi,
I would just do a repartition on the initial direct DStream since otherwise
each RDD in the stream
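
The repartition suggestion above could look roughly like the following. This is a minimal sketch, not the poster's actual job: it assumes Spark 1.6 with the spark-streaming-kafka (Kafka 0.8) direct API, and the broker address, topic name, batch interval, and partition count are all placeholders chosen for illustration. It also reads the `rdd.read.json(_._2)` mentioned in the thread as shorthand for taking the message value and passing it to `sqlContext.read.json`.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object JsonFromKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JsonFromKafka")
    // Hypothetical 60-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Hypothetical broker and topic names.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("events")

    // With the direct approach each RDD has one partition per Kafka
    // partition, so a single-partition topic yields single-partition RDDs.
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    // Keep only the message value, then spread the records over more
    // partitions (12 is an arbitrary example) before the JSON conversion.
    // Note that repartition itself introduces a shuffle.
    val values = stream.map(_._2).repartition(12)

    values.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        val df = sqlContext.read.json(rdd) // RDD[String] of JSON documents
        println(s"rows in batch: ${df.count()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether the shuffle cost of the repartition pays off depends on the message sizes and the cores available. Adding partitions to the Kafka topic itself avoids the shuffle.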
at 5:26 AM, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:
>
> -- Forwarded message --
> From: Diwakar Dhanuskodi
> Date: Sat, Jul 16, 2016 at 9:30 AM
> Subject: Re: Spark streaming takes longer time to read json into dataframes
> To: Jean Geo
-- Forwarded message --
From: Diwakar Dhanuskodi
Date: Sat, Jul 16, 2016 at 9:30 AM
Subject: Re: Spark streaming takes longer time to read json into dataframes
To: Jean Georges Perrin
Hello,
I need it in memory. I increased executor memory to 25G and executor cores
to 3. Got
Do you need it on disk or just push it to memory? Can you try to increase
memory or # of cores (I know it sounds basic)
> On Jul 15, 2016, at 11:43 PM, Diwakar Dhanuskodi
> wrote:
>
> Hello,
>
> I have 400K json messages pulled from Kafka into spark streaming using
> DirectStream approach.
Hello,
I have 400K json messages pulled from Kafka into spark streaming using
DirectStream approach. The 400K messages total around 5G. The Kafka topic
has a single partition. I am using rdd.read.json(_._2) inside foreachRDD to
convert the rdd into a dataframe. It takes almost 2.3 minutes to convert into