Re: Java Kafka Word Count Issue

2015-02-02 Thread Jadhav Shweta

Hi Sean,

Kafka Producer is working fine.
This is related to Spark.

How can I configure Spark so that it remembers the count from the 
beginning?

If my log.text file has

spark
apache
kafka
spark

My Spark program gives correct output as 

spark 2
apache 1
kafka 1

but when I append spark to my log.text file

the Spark program gives the output

spark 1

which should be spark 3.

How should I handle this in the Spark code?

Thanks and regards 
Shweta Jadhav



-Sean Owen so...@cloudera.com wrote: -
To: Jadhav Shweta jadhav.shw...@tcs.com
From: Sean Owen so...@cloudera.com
Date: 02/02/2015 04:13PM
Subject: Re: Java Kafka Word Count Issue

This is a question about the Kafka producer, right? Not Spark.

On Feb 2, 2015 10:34 AM, Jadhav Shweta jadhav.shw...@tcs.com wrote:

Hi All,

I am trying to run the Kafka word count program; please find the link below:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java

I have set the Spark master via setMaster("local[*]"),

and I have started a Kafka producer which reads the file.

If my file already has a few words,
then after running the Spark Java program I get the proper output.

But when I append new words to the same file, the word count starts again from 1.

What changes do I need to make in the code so that the count covers both the 
words already present and the newly appended ones?

P.S. I am using Spark spark-1.2.0-bin-hadoop2.3

Thanks and regards 
Shweta Jadhav
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you



Re: Java Kafka Word Count Issue

2015-02-02 Thread Sean Owen
First I would check your code to see how you are pushing records into the
topic. Is it reading the whole file each time and resending all of it?

Then see if you are using the same consumer.id on the Spark side. Otherwise
you are not reading from the same offset when restarting Spark but instead
reading from the default defined in Kafka by auto.offset.reset, which you
may be setting to 'smallest'.

This is why I think this is likely an issue with how you use Kafka.
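To make the offset behavior above concrete, here is a minimal sketch of the consumer settings involved, using plain Java collections only. The property names are the 0.8-era Kafka high-level consumer settings that KafkaUtils.createStream accepts; the class name, method name, and group name are illustrative, not from the thread. A stable group.id is what ties a restarted consumer back to its committed offsets; auto.offset.reset only applies when no committed offset exists.

import java.util.HashMap;
import java.util.Map;

public class KafkaParamsSketch {
    // Build the kafkaParams map that would be passed to KafkaUtils.createStream.
    public static Map<String, String> buildParams(String zkQuorum, String groupId) {
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("zookeeper.connect", zkQuorum);
        // Reusing the same group.id across restarts resumes from the
        // committed offset instead of falling back to auto.offset.reset.
        kafkaParams.put("group.id", groupId);
        // "largest" = only new messages; "smallest" = replay the topic
        // from the beginning when no committed offset exists.
        kafkaParams.put("auto.offset.reset", "largest");
        return kafkaParams;
    }

    public static void main(String[] args) {
        Map<String, String> p = buildParams("localhost:2181", "wordcount-group");
        System.out.println(p);
    }
}

If the producer resends the whole file each time and auto.offset.reset is "smallest" with a fresh group, every restart will recount the entire file, which matches the symptom described.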




Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation.





Re: Java Kafka Word Count Issue

2015-02-02 Thread Jadhav Shweta

Hi,

I added a checkpoint directory and am now using updateStateByKey():

import com.google.common.base.Optional;

Function2<List<Integer>, Optional<Integer>, Optional<Integer>> updateFunction =
  new Function2<List<Integer>, Optional<Integer>, Optional<Integer>>() {
    @Override public Optional<Integer> call(List<Integer> values, Optional<Integer> state) {
      Integer newSum = ...  // add the new values with the previous running count to get the new count
      return Optional.of(newSum);
    }
  };

JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFunction);

But I didn't understand what exactly I should assign in Integer newSum = ... to 
add the new values to the previous running count.
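For reference, here is a sketch of the update logic that typically fills in the `...` above, written in plain Java without the Spark and Guava types so it is self-contained; the class name RunningCount and the method name updateState are illustrative only. The idea: sum the counts that arrived in this batch and add the previous running total, treating missing state as 0.

import java.util.Arrays;
import java.util.List;

public class RunningCount {
    // Mirrors the body of updateFunction.call(values, state):
    // newSum = previous running total + sum of this batch's counts.
    public static Integer updateState(List<Integer> newValues, Integer previousSum) {
        int sum = (previousSum == null) ? 0 : previousSum;
        for (Integer v : newValues) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // First batch: "spark" seen twice, no prior state.
        Integer s = updateState(Arrays.asList(1, 1), null);
        // Next batch: "spark" appended once more.
        s = updateState(Arrays.asList(1), s);
        System.out.println("spark " + s); // prints "spark 3"
    }
}

Inside the actual updateFunction, state.or(0) plays the role of previousSum, so the body would be roughly: int sum = state.or(0); for (Integer v : values) sum += v; return Optional.of(sum); — and note that updateStateByKey requires the checkpoint directory already mentioned above.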


Thanks and regards
Shweta Jadhav



-VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: -
To: Jadhav Shweta jadhav.shw...@tcs.com
From: VISHNU SUBRAMANIAN johnfedrickena...@gmail.com
Date: 02/02/2015 04:39PM
Cc: user@spark.apache.org user@spark.apache.org
Subject: Re: Java Kafka Word Count Issue

You can use updateStateByKey() to perform the above operation.
