RE: Regarding The Kafka Offset Management Issue In Direct Stream Approach.

2015-11-25 Thread Dave Ariens
Charan,

You may find this Gist useful for storing/retrieving offsets for Kafka topics:

https://gist.github.com/ariens/e6a39bc3dbeb11467e53
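The gist itself isn't reproduced here, but the general store/retrieve pattern it illustrates looks roughly like the sketch below. The helper names (loadOffsets, saveOffsets) are placeholders, and the ssc/kafkaParams setup is assumed rather than taken from the gist:

    // Sketch only (assumed helpers, not necessarily the gist's code): resume a
    // direct stream from externally saved offsets, then persist the processed
    // offset ranges after each batch.
    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // Hypothetical helper: load the last saved offsets from your store;
    // returns an empty map on the very first run.
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
      StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch ...
      saveOffsets(ranges) // hypothetical helper: persist untilOffset per partition
    }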


From: Cody Koeninger [c...@koeninger.org]
Sent: Friday, November 06, 2015 10:10 AM
To: users@kafka.apache.org
Subject: Re: Regarding The Kafka Offset Management Issue In Direct Stream Approach.

Questions about Spark-Kafka integration are better directed to the Spark
user mailing list.

I'm not 100% sure what you're asking. The Spark createDirectStream API
will not store any offsets internally unless you enable checkpointing.



On Sun, Nov 1, 2015 at 10:26 PM, Charan Ganga Phani Adabala <char...@eiqnetworks.com> wrote:

> Hi All,
>
> We are working with Apache Spark and Kafka integration; in this use case
> we are using the DirectStream approach. To avoid data loss, we capture
> the offsets ourselves and save them to MongoDB.
>
> We would like some clarification on whether Spark stores any offsets
> internally. Let us explain with an example:
>
> For the first RDD batch we get offsets 0 to 5 of events to be processed,
> but the application crashes unexpectedly. When we restart the
> application, does the job fetch events 0 to 5 again, or does it resume
> from where the previous job stopped?
>
> We are not committing any offsets in the above process, because offsets
> have to be committed manually in the DirectStream approach. Does the new
> job fetch events from the 0th position?
>
> Thanks & Regards,
> Ganga Phani Charan Adabala | Software Engineer
> EiQ Networks®, Inc. | www.eiqnetworks.com


Re: Regarding The Kafka Offset Management Issue In Direct Stream Approach.

2015-11-06 Thread Cody Koeninger
Questions about Spark-Kafka integration are better directed to the Spark
user mailing list.

I'm not 100% sure what you're asking. The Spark createDirectStream API
will not store any offsets internally unless you enable checkpointing.
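For completeness, enabling checkpointing in Spark Streaming of this era looks roughly like the sketch below; the checkpoint directory, app name, and batch interval are assumptions:

    // Minimal sketch: with checkpointing enabled, Spark Streaming persists
    // offsets (and other DStream state) under the checkpoint directory and
    // resumes from it on restart. Path and interval are assumed values.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/spark-checkpoint" // assumed location

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("direct-stream-example")
      val ssc = new StreamingContext(conf, Seconds(5))
      ssc.checkpoint(checkpointDir)
      // ... define the direct stream and its processing here ...
      ssc
    }

    // Recover from an existing checkpoint, or build a fresh context if none.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()

One caveat worth noting: checkpoint recovery is tied to the application's compiled code, which is one reason many deployments store offsets externally instead.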



On Sun, Nov 1, 2015 at 10:26 PM, Charan Ganga Phani Adabala <char...@eiqnetworks.com> wrote:

> Hi All,
>
> We are working with Apache Spark and Kafka integration; in this use case
> we are using the DirectStream approach. To avoid data loss, we capture
> the offsets ourselves and save them to MongoDB.
>
> We would like some clarification on whether Spark stores any offsets
> internally. Let us explain with an example:
>
> For the first RDD batch we get offsets 0 to 5 of events to be processed,
> but the application crashes unexpectedly. When we restart the
> application, does the job fetch events 0 to 5 again, or does it resume
> from where the previous job stopped?
>
> We are not committing any offsets in the above process, because offsets
> have to be committed manually in the DirectStream approach. Does the new
> job fetch events from the 0th position?
>
> Thanks & Regards,
> Ganga Phani Charan Adabala | Software Engineer
> EiQ Networks®, Inc. | www.eiqnetworks.com


Regarding The Kafka Offset Management Issue In Direct Stream Approach.

2015-11-01 Thread Charan Ganga Phani Adabala
Hi All,

We are working with Apache Spark and Kafka integration; in this use case we
are using the DirectStream approach. To avoid data loss, we capture the
offsets ourselves and save them to MongoDB.

We would like some clarification on whether Spark stores any offsets
internally. Let us explain with an example:

For the first RDD batch we get offsets 0 to 5 of events to be processed, but
the application crashes unexpectedly. When we restart the application, does
the job fetch events 0 to 5 again, or does it resume from where the previous
job stopped?

We are not committing any offsets in the above process, because offsets have
to be committed manually in the DirectStream approach. Does the new job
fetch events from the 0th position?


Thanks & Regards,
Ganga Phani Charan Adabala | Software Engineer
EiQ Networks®, Inc. | www.eiqnetworks.com
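A minimal sketch of the MongoDB-backed offset store the question describes might look like the following; the collection name, document shape, and connection details are assumptions, not the original application's code:

    // Hypothetical sketch: after each successfully processed batch, upsert one
    // document per topic/partition recording the highest offset handled.
    import com.mongodb.MongoClient
    import com.mongodb.client.model.{Filters, UpdateOptions}
    import org.apache.spark.streaming.kafka.OffsetRange
    import org.bson.Document

    val mongo = new MongoClient("localhost", 27017) // assumed connection details
    val coll = mongo.getDatabase("streaming").getCollection("kafka_offsets")

    def saveOffsets(ranges: Array[OffsetRange]): Unit = ranges.foreach { r =>
      coll.replaceOne(
        Filters.and(Filters.eq("topic", r.topic), Filters.eq("partition", r.partition)),
        new Document("topic", r.topic)
          .append("partition", r.partition)
          .append("untilOffset", r.untilOffset),
        new UpdateOptions().upsert(true)) // insert on first run, update afterwards
    }

Saving offsets only after processing completes gives at-least-once semantics: a crash between processing and the save means that batch is replayed on restart.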




Re: Regarding the Kafka offset management issue in Direct Stream Approach.

2015-10-26 Thread Cody Koeninger
Questions about Spark's Kafka integration should probably be directed to
the Spark user mailing list, not this one; I don't monitor the Kafka
mailing lists as closely, for instance.

For the direct stream, Spark doesn't keep any state regarding offsets
unless you enable checkpointing. Have you read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md ?
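The core idea in that post is to commit a batch's results and its offsets in a single database transaction, so they succeed or fail together. A rough sketch of that pattern follows; it is not the post's exact code, and the JDBC URL, table names, and aggregation are assumptions (the stream is presumed to come from createDirectStream and yield (key, message) pairs):

    // Rough sketch: commit each batch's results and Kafka offsets atomically.
    import java.sql.DriverManager
    import org.apache.spark.streaming.kafka.HasOffsetRanges

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      val counts = rdd.map(_._2).countByValue() // illustrative driver-side aggregation
      val conn = DriverManager.getConnection("jdbc:postgresql://localhost/streaming") // assumed
      conn.setAutoCommit(false)
      try {
        val ins = conn.prepareStatement("INSERT INTO word_counts(word, n) VALUES (?, ?)")
        counts.foreach { case (word, n) =>
          ins.setString(1, word); ins.setLong(2, n); ins.executeUpdate()
        }
        val upd = conn.prepareStatement(
          "UPDATE kafka_offsets SET until_offset = ? WHERE topic = ? AND partition = ?")
        ranges.foreach { r =>
          upd.setLong(1, r.untilOffset); upd.setString(2, r.topic); upd.setInt(3, r.partition)
          upd.executeUpdate()
        }
        conn.commit() // offsets and results land together, or not at all
      } finally {
        conn.close() // an uncommitted transaction rolls back on close
      }
    }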





On Mon, Oct 26, 2015 at 3:43 AM, Charan Ganga Phani Adabala <char...@eiqnetworks.com> wrote:

> Hi All,
>
> We are working with Apache Spark and Kafka integration; in this use case
> we are using the DirectStream approach. To avoid data loss, we capture
> the offsets ourselves and save them to MongoDB.
>
> We would like some clarification on whether Spark stores any offsets
> internally. Let us explain with an example:
>
> For the first RDD batch we get offsets 0 to 5 of events to be processed,
> but the application crashes unexpectedly. When we restart the
> application, does the job fetch events 0 to 5 again, or does it resume
> from where the previous job stopped?
>
> We are not committing any offsets in the above process, because offsets
> have to be committed manually in the DirectStream approach. Does the new
> job fetch events from the 0th position?
>
> Thanks & Regards,
> Ganga Phani Charan Adabala | Software Engineer
> EiQ Networks®, Inc. | www.eiqnetworks.com


Regarding the Kafka offset management issue in Direct Stream Approach.

2015-10-26 Thread Charan Ganga Phani Adabala
Hi All,

We are working with Apache Spark and Kafka integration; in this use case we
are using the DirectStream approach. To avoid data loss, we capture the
offsets ourselves and save them to MongoDB.

We would like some clarification on whether Spark stores any offsets
internally. Let us explain with an example:

For the first RDD batch we get offsets 0 to 5 of events to be processed, but
the application crashes unexpectedly. When we restart the application, does
the job fetch events 0 to 5 again, or does it resume from where the previous
job stopped?

We are not committing any offsets in the above process, because offsets have
to be committed manually in the DirectStream approach. Does the new job
fetch events from the 0th position?


Thanks & Regards,
Ganga Phani Charan Adabala | Software Engineer
EiQ Networks®, Inc. | www.eiqnetworks.com