Cody,
Yes, I am printing each message. It is processing all messages under
each DStream block.

Source systems are publishing 1 million messages per 4 seconds, which is less than
the batch interval. The issue is that the direct stream processes only 10 messages per
event. When the topic's partitions were increased to 20, the direct stream picks up
only 200 messages at a time for processing (I guess 10 for each partition).
I have 16 executors running for streaming (in both YARN client and cluster mode).
I am expecting the direct stream to process the 1 million messages published
to the topic within the batch interval.

Using createStream, it could batch 150K messages and process them. createStream
is better than the direct stream in this case. Again, why only 150K?

Any clarification on the direct stream processing millions of messages
per batch is much appreciated.
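
In case it helps narrow this down, here is a minimal sketch (assuming the
spark-streaming-kafka 1.5.x direct API and the stream variable k from the
createDirectStream snippet quoted below) that logs exactly how many offsets
each batch pulls from each partition:

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

k.foreachRDD { rdd =>
  // RDDs produced by the direct stream carry the exact Kafka offset
  // range consumed for each topic partition in this batch.
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"topic=${r.topic} partition=${r.partition} " +
      s"from=${r.fromOffset} until=${r.untilOffset} count=${r.untilOffset - r.fromOffset}")
  }
}

This would show whether the 200-message batches really are 10 offsets per
partition or something else.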




Sent from Samsung Mobile.

-------- Original message --------
From: Cody Koeninger <c...@koeninger.org>
Date:06/02/2016 01:30 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Kafka directsream receiving rate

Have you tried just printing each message, to see which ones are being
processed?

On Fri, Feb 5, 2016 at 1:41 PM, Diwakar Dhanuskodi 
<diwakar.dhanusk...@gmail.com> wrote:
I am able to see the number of messages processed per event in the Spark Streaming
web UI. I am also counting the messages inside foreachRDD.
I removed the settings for backpressure, but the result is still the same.
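
For reference, one way to count per batch over the whole RDD rather than per
block, assuming the direct stream variable k from the snippet quoted below:

k.foreachRDD { (rdd, time) =>
  // count() is an action over the entire batch RDD, so it reflects every
  // record the direct stream pulled for this interval.
  println(s"batch at $time contains ${rdd.count()} records")
}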





Sent from Samsung Mobile.


-------- Original message --------
From: Cody Koeninger <c...@koeninger.org>
Date:06/02/2016 00:33 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Kafka directsream receiving rate

How are you counting the number of messages?

I'd go ahead and remove the settings for backpressure and maxrateperpartition, 
just to eliminate that as a variable.
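
A quick, hypothetical way to confirm which rate-related settings are actually
in effect at runtime (assuming ssc is the StreamingContext from the snippet
below):

ssc.sparkContext.getConf.getAll
  .filter { case (key, _) =>
    key.contains("backpressure") || key.contains("maxRatePerPartition") }
  .foreach(println)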

On Fri, Feb 5, 2016 at 12:22 PM, Diwakar Dhanuskodi 
<diwakar.dhanusk...@gmail.com> wrote:
I am using one direct stream. Below is the call to createDirectStream:

val topicSet = topics.split(",").toSet
val kafkaParams = Map[String, String]("bootstrap.servers" -> "datanode4.isdp.com:9092")
val k = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicSet)
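
As a side check, the direct approach creates one Spark partition per Kafka
partition, so something like the following (using the k defined above) would
show whether all 20 topic partitions are represented in each batch:

k.foreachRDD { rdd =>
  // With createDirectStream, RDD partitions map 1:1 to Kafka topic partitions.
  println(s"Spark partitions in this batch: ${rdd.partitions.length}")
}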

When I replace the createDirectStream call with createStream, all messages are
read by one DStream block:
val k = KafkaUtils.createStream(ssc, "datanode4.isdp.com:2181", "resp", topicMap, StorageLevel.MEMORY_ONLY)

I am using the below spark-submit command to execute:
./spark-submit --master yarn-client \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.shuffle.service.enabled=true" \
  --conf "spark.sql.tungsten.enabled=false" \
  --conf "spark.sql.codegen=false" \
  --conf "spark.sql.unsafe.enabled=false" \
  --conf "spark.streaming.backpressure.enabled=true" \
  --conf "spark.locality.wait=1s" \
  --conf "spark.shuffle.consolidateFiles=true" \
  --conf "spark.streaming.kafka.maxRatePerPartition=1000000" \
  --driver-memory 2g --executor-memory 1g \
  --class com.tcs.dime.spark.SparkReceiver \
  --files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/etc/hadoop/conf/mapred-site.xml,/etc/hadoop/conf/yarn-site.xml,/etc/hive/conf/hive-site.xml \
  --jars /root/dime/jars/spark-streaming-kafka-assembly_2.10-1.5.1.jar,/root/Jars/sparkreceiver.jar \
  /root/Jars/sparkreceiver.jar
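
For what it's worth, a rough back-of-the-envelope check of the cap those
settings imply, assuming maxRatePerPartition is records per second per Kafka
partition (as documented for 1.5.x) and the 2 s batch interval mentioned
elsewhere in the thread:

val maxRatePerPartition = 1000000L // from spark.streaming.kafka.maxRatePerPartition
val partitions = 20                // Kafka topic partitions
val batchIntervalSec = 2           // 2000 ms batch interval

// Upper bound on records per batch imposed by maxRatePerPartition alone
val maxRecordsPerBatch = maxRatePerPartition * partitions * batchIntervalSec
println(s"cap per batch: $maxRecordsPerBatch") // 40,000,000, far above the publish rate

so that setting alone should not explain batches of only 200 messages.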




Sent from Samsung Mobile.


-------- Original message --------
From: Cody Koeninger <c...@koeninger.org>
Date:05/02/2016 22:07 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Kafka directsream receiving rate

If you're using the direct stream, you have 0 receivers.  Do you mean you have 
1 executor?

Can you post the relevant call to createDirectStream from your code, as well as 
any relevant spark configuration?

On Thu, Feb 4, 2016 at 8:13 PM, Diwakar Dhanuskodi 
<diwakar.dhanusk...@gmail.com> wrote:
Adding more info

Batch interval is 2000 ms.
I expect all 100 messages to go through one DStream from the direct stream, but it
receives them at a rate of 10 messages at a time. Am I missing some configuration
here? Any help is appreciated.
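
For context, the settings that can throttle the direct stream's batch size in
1.5.x are, as far as I know, the per-partition rate cap and (when enabled)
backpressure; a minimal sketch of where they would be set, assuming a standard
SparkConf:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-direct-rate-check")
  // Caps each Kafka partition at N records per second; with a 2 s batch and
  // 20 partitions the batch can hold at most N * 2 * 20 records.
  .set("spark.streaming.kafka.maxRatePerPartition", "50000")
  // When enabled, Spark adapts the ingest rate from recent batch timings, so
  // early batches can be smaller while the rate estimator warms up.
  .set("spark.streaming.backpressure.enabled", "false")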

Regards 
Diwakar.


Sent from Samsung Mobile.


-------- Original message --------
From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Date:05/02/2016 07:33 (GMT+05:30)
To: user@spark.apache.org
Cc:
Subject: Kafka directsream receiving rate

Hi,
Using Spark 1.5.1.
I have a topic with 20 partitions. When I publish 100 messages, the Spark direct
stream receives 10 messages per DStream. I have only one receiver.
When I used createStream, the receiver received the entire 100 messages at once.
 

I appreciate any help.

Regards 
Diwakar


Sent from Samsung Mobile.


