Re: Spark streaming to kafka exactly once
Ok, thanks for your answers.

On 3/22/17, 1:34 PM, "Cody Koeninger" wrote:
> [...]
Re: Spark streaming to kafka exactly once
If you're talking about reading the same message multiple times in a failure situation, see https://github.com/koeninger/kafka-exactly-once

If you're talking about producing the same message multiple times in a failure situation, keep an eye on https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

If you're talking about producers just misbehaving and sending different copies of what is essentially the same message from a domain perspective, you have to dedupe that with your own logic.

On Wed, Mar 22, 2017 at 2:52 PM, Matt Deaver wrote:
> [...]
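For the third case, domain-level duplicates, the "dedupe with your own logic" advice boils down to picking a key that identifies "the same message" in your domain and filtering on it. A minimal sketch, assuming an in-memory bounded set of seen keys; the key function and window size here are illustrative, and in a real job this state would have to live somewhere fault-tolerant (a state store or an external cache), not in process memory:

```python
from collections import OrderedDict

def dedupe(messages, key_fn, seen, max_seen=100_000):
    """Yield only messages whose key has not been seen recently.

    `seen` is an OrderedDict used as a bounded LRU set of keys:
    insertion order lets us evict the oldest key once the set is full.
    """
    for msg in messages:
        k = key_fn(msg)
        if k in seen:
            continue  # domain-level duplicate; drop it
        seen[k] = True
        if len(seen) > max_seen:
            seen.popitem(last=False)  # evict the oldest key
        yield msg

# Example: two events share id 1, so only one of them survives.
seen = OrderedDict()
events = [{"id": 1, "v": "a"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
unique = list(dedupe(events, lambda m: m["id"], seen))
```

The bounded window is the usual trade-off: a duplicate arriving after its key has been evicted will slip through, so size the window to the duplication pattern you actually observe.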
Re: Spark streaming to kafka exactly once
You have to handle de-duplication upstream or downstream. It might technically be possible to handle this in Spark, but you'll probably have a better time handling duplicates in the service that reads from Kafka.

On Wed, Mar 22, 2017 at 1:49 PM, Maurin Lenglart wrote:
> [...]

--
Regards,

Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/
Spark streaming to kafka exactly once
Hi,

We are trying to build a Spark Streaming solution that subscribes to Kafka and pushes back to Kafka.

But we are running into the problem of duplicate events.

Right now, I am doing a "foreachRDD", looping over the messages of each partition, and sending those messages to Kafka.

Is there any good way of solving that issue?

Thanks
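The loop described above is the usual foreachRDD/foreachPartition pattern: create (or reuse) one producer per partition, not one per message. A minimal sketch of that pattern; `make_producer` is a hypothetical factory standing in for however you construct your Kafka producer (kafka-python's `KafkaProducer` would fit), injected so the send logic can be exercised without a broker. Note that Spark retries failed tasks, so this gives at-least-once delivery: a retried task re-sends messages that already went out, which is exactly where the duplicate events come from.

```python
def send_partition(messages, make_producer, topic):
    """Send one partition's messages to Kafka with a single producer.

    `make_producer` is a hypothetical factory returning an object with
    send(topic, value) and flush() methods.
    """
    producer = make_producer()
    count = 0
    for msg in messages:
        producer.send(topic, msg)
        count += 1
    producer.flush()  # don't let the task finish with unsent records
    return count

def write_stream_to_kafka(dstream, make_producer, topic):
    # One producer per partition, not per record. Because failed tasks
    # are retried, downstream consumers must tolerate duplicates.
    dstream.foreachRDD(
        lambda rdd: rdd.foreachPartition(
            lambda part: send_partition(part, make_producer, topic)
        )
    )
```

Without transactional/idempotent producing on the Kafka side, the practical options are keying messages so downstream consumers can dedupe, or making the downstream effect of each message idempotent.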
Re: Spark Streaming to Kafka
Thanks Saisai.

On Wed, May 20, 2015 at 11:23 AM, Saisai Shao wrote:
> [...]
Re: Spark Streaming to Kafka
I think this is the PR you could refer to: https://github.com/apache/spark/pull/2994

2015-05-20 13:41 GMT+08:00 twinkle sachdeva:
> [...]
Spark Streaming to Kafka
Hi,

As Spark Streaming is being nicely integrated with consuming messages from Kafka, I thought of asking the forum: is there any implementation available for pushing data to Kafka from Spark Streaming too?

Any link(s) will be helpful.

Thanks and Regards,
Twinkle