Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
ays read data from the beginning as you set to “smallest”, otherwise if you set to “largest”, you will always get data from the end immediately. There’s a JIRA and PR to follow this, but still not merged to the master; you can check it (https://issues.

RE: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Shao, Saisai
/issues.apache.org/jira/browse/SPARK-2492). Thanks Jerry From: Abraham Jacob [mailto:abe.jac...@gmail.com] Sent: Saturday, October 11, 2014 6:57 AM To: Sean McNamara Cc: user@spark.apache.org Subject: Re: Spark Streaming Kafk

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Sean McNamara
ache.org/jira/browse/SPARK-2492). Thanks Jerry From: Abraham Jacob [mailto:abe.jac...@gmail.com] Sent: Saturday, October 11, 2014 6:57 AM To: Sean McNamara Cc: user@spark.apache.org Subject: Re: Spark Streaming KafkaUtils I

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
sues.apache.org/jira/browse/SPARK-2492). Thanks Jerry From: Abraham Jacob [mailto:abe.jac...@gmail.com] Sent: Saturday, October 11, 2014 6:57 AM To: Sean McNamara Cc: user@spark.apache.org Subject: Re: Spark Streamin

RE: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Shao, Saisai
://issues.apache.org/jira/browse/SPARK-2492). Thanks Jerry From: Abraham Jacob [mailto:abe.jac...@gmail.com] Sent: Saturday, October 11, 2014 6:57 AM To: Sean McNamara Cc: user@spark.apache.org Subject: Re: Spark Streaming KafkaUtils Issue Probably this is the issue - http://www.michael-noll.com/blog/2014/10/01

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Probably this is the issue - http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/ - Spark’s usage of the Kafka consumer parameter auto.offset.reset is different from Kafka’s semantic
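The distinction above can be sketched with a small, stdlib-only example (no Spark or Kafka on the classpath) that builds the kind of kafkaParams map passed to KafkaUtils.createStream. The ZooKeeper address and group name here are hypothetical; “smallest”/“largest” are the Kafka 0.8-era values (later clients renamed them “earliest”/“latest”):

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetResetDemo {
    // Builds consumer properties of the shape handed to KafkaUtils.createStream.
    static Map<String, String> kafkaParams(String groupId, String offsetReset) {
        Map<String, String> params = new HashMap<>();
        params.put("zookeeper.connect", "localhost:2181"); // hypothetical address
        params.put("group.id", groupId);
        // Kafka's own semantics: auto.offset.reset only applies when the group has
        // NO committed offset yet. Per this thread, Spark's receiver behaves
        // differently and effectively re-reads from the beginning whenever this
        // is set to "smallest".
        params.put("auto.offset.reset", offsetReset);
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = kafkaParams("wordcount-group", "smallest");
        System.out.println(params.get("auto.offset.reset")); // prints "smallest"
    }
}
```

This is only a sketch of the parameter map; the actual offset behavior is decided inside the consumer, not by the map itself.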

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Sean McNamara
How long do you let the consumers run for? Is it less than 60 seconds by chance? auto.commit.interval.ms defaults to 60000 (60 seconds). If so, that may explain why you are seeing that behavior. Cheers, Sean On Oct 10, 2014, at 4:47 PM, Abraham Jacob <abe.jac...@gmail.com> wrote: S
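Sean's point can be made concrete with a bit of arithmetic: with the 0.8 high-level consumer, offsets are auto-committed once per auto.commit.interval.ms, so a consumer that exits before the first interval elapses never persists any offset. A minimal sketch (the interval value is the documented default; the runtimes are illustrative):

```java
public class CommitIntervalCheck {
    // Number of auto-commit ticks that complete within a given runtime.
    // Zero completed ticks means no offset was ever committed, so a restart
    // with auto.offset.reset=smallest re-reads the topic from the beginning.
    static long commitsCompleted(long runtimeMs, long intervalMs) {
        return runtimeMs / intervalMs;
    }

    public static void main(String[] args) {
        long defaultIntervalMs = 60000L; // auto.commit.interval.ms default
        System.out.println(commitsCompleted(45000L, defaultIntervalMs));  // prints 0
        System.out.println(commitsCompleted(150000L, defaultIntervalMs)); // prints 2
    }
}
```

A 45-second test run therefore looks like a consumer that "forgets" its position, even though nothing is wrong with the application code.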

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Sure... I do set the group.id for all the consumers to be the same. Here is the code --- SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount"); sparkConf.set("spark.shuffle.manager", "SORT"); sparkConf.set("spark.streaming.unpersist", "tr
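Since all eight receivers above share one group.id, the high-level consumer divides the topic's partitions among them. A simplified stdlib-only sketch of that split (real Kafka 0.8 uses range assignment; round-robin is used here only to show the even distribution):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignment {
    // Distributes partition ids across consumers in the same group.
    // Simplified round-robin stand-in for Kafka's group rebalancing.
    static Map<Integer, List<Integer>> assign(int partitions, int consumers) {
        Map<Integer, List<Integer>> out = new HashMap<>();
        for (int c = 0; c < consumers; c++) out.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) out.get(p % consumers).add(p);
        return out;
    }

    public static void main(String[] args) {
        // 8 partitions across 8 receivers -> exactly one partition per receiver,
        // matching the setup described in this thread.
        System.out.println(assign(8, 8).get(0).size()); // prints 1
    }
}
```

With fewer receivers than partitions, each receiver would pick up more than one partition; with more receivers than partitions, the extras would sit idle.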

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Sean McNamara
Would you mind sharing the code leading to your createStream? Are you also setting group.id? Thanks, Sean On Oct 10, 2014, at 4:31 PM, Abraham Jacob wrote: > Hi Folks, > > I am seeing some strange behavior when using the Spark Kafka connector in > Spark streaming. > > I have a Kafka top

Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Hi Folks, I am seeing some strange behavior when using the Spark Kafka connector in Spark streaming. I have a Kafka topic which has 8 partitions. I have a Kafka producer that pumps some messages into this topic. On the consumer side I have a Spark streaming application that has 8 executors