The offset of a message in Kafka never changes.

Thanks,

Jun
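For illustration, a downstream writer can turn this guarantee into a stable, replay-safe key built from a message's coordinates. A minimal sketch in Java, assuming the newer org.apache.kafka.clients consumer API (the 0.8-era high-level consumer exposes the same topic/partition/offset fields on MessageAndMetadata):

import org.apache.kafka.clients.consumer.ConsumerRecord;

public final class MessageIds {
    // A message's (topic, partition, offset) coordinates never change
    // once it is written, so together they form a durable idempotency
    // key for writes into a downstream store.
    public static String idFor(ConsumerRecord<?, ?> record) {
        return record.topic() + "-" + record.partition() + "-" + record.offset();
    }
}

Because the coordinates never change, reprocessing the same message always produces the same key, which is what makes idempotent writes to the downstream store possible.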
On Thu, Jun 5, 2014 at 8:27 AM, Nagesh <nageswara.r...@gmail.com> wrote:

> As Jun Rao said, it is quite possible for multiple publishers to publish
> to a topic, and different groups of consumers can consume the messages
> and apply group-specific logic, for example raw data processing,
> aggregation, etc. Each distinct group will receive a copy.
>
> But the offset cannot be used as a UUID, as the counter may reset in
> case you restart Kafka for some reason. I am not sure; can someone
> throw some light?
>
> Regards,
> Nageswara Rao
>
>
> On Thu, Jun 5, 2014 at 8:18 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > It sounds like you want to write to a data store and a data pipe
> > atomically. Since both the data store and the data pipe that you want
> > to use are highly available, the only case you need to protect
> > against is the client failing between the two writes. One way to do
> > that is to let the client publish to Kafka first with the strongest
> > ack. Then run a few consumers to read data from Kafka and write it to
> > the data store. Any one of those consumers can die, and its work will
> > be automatically picked up by the remaining ones. You can use the
> > partition id and the offset of each message as its UUID if needed.
> >
> > Thanks,
> >
> > Jun
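To make "publish to Kafka first with the strongest ack" concrete, here is a minimal sketch using the newer Java producer client; the broker, topic, and payload names are made up, and with the 0.8-era Scala producer the equivalent setting is request.required.acks=-1:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class PublishFirst {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
        // Strongest ack: the leader waits for all in-sync replicas.
        props.put("acks", "all");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Block on the future so we proceed only once the write is acked.
            RecordMetadata meta = producer
                .send(new ProducerRecord<>("learning-events", "event-key", "event-payload"))
                .get();
            // The returned metadata carries the coordinates that can serve
            // as the message's UUID for downstream consumers.
            System.out.println("published "
                + meta.topic() + "-" + meta.partition() + "-" + meta.offset());
        }
    }
}

The consumers Jun describes can then use these same coordinates to make their writes to the data store idempotent, so a consumer crash and restart does not produce duplicates.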
> > On Wed, Jun 4, 2014 at 10:56 AM, Jonathan Hodges <hodg...@gmail.com>
> > wrote:
> >
> > > Sorry, didn't realize the mailing list wasn't copied...
> > >
> > >
> > > ---------- Forwarded message ----------
> > > From: Jonathan Hodges <hodg...@gmail.com>
> > > Date: Wed, Jun 4, 2014 at 10:56 AM
> > > Subject: Re: Hadoop Summit Meetups
> > > To: Neha Narkhede <neha.narkh...@gmail.com>
> > >
> > >
> > > We have a number of customer-facing online learning applications.
> > > These applications use heterogeneous technologies with different
> > > data models in their underlying data stores, such as RDBMS,
> > > Cassandra, MongoDB, etc. We would like to run offline analysis on
> > > the data contained in these learning applications with tools like
> > > Hadoop and Spark.
> > >
> > > One thought is to use Kafka as a way for these learning
> > > applications to emit data in near real time for analytics. We
> > > developed a common model, represented as Avro records in HDFS, that
> > > spans these learning applications so that we can accept the same
> > > structured message from all of them. This allows for comparing
> > > apples to apples across these apps, as opposed to messy
> > > per-application transformations.
> > >
> > > So this all sounds good until you dig into the details. One pattern
> > > is for these applications to update state locally in their data
> > > stores first and then publish to Kafka. The problem with this is
> > > that the two operations aren't atomic, so the local persist can
> > > succeed and the publish to Kafka fail, leaving the application and
> > > HDFS out of sync. You can try to add some retry logic to the
> > > clients, but this quickly becomes very complicated and still
> > > doesn't solve the underlying problem.
> > >
> > > Another pattern is to publish to Kafka first with acks=-1 and wait
> > > for the acknowledgement from the leader and replicas before
> > > persisting locally. This is probably better than the first pattern
> > > but does add some complexity to the client. The clients must now
> > > generate unique entity IDs/UUIDs for persistence, when they
> > > typically rely on the data store to create these. Also, the publish
> > > to Kafka can succeed and the local persist fail, leaving the stores
> > > out of sync. In this case the learning application needs to
> > > determine how to get itself back in sync. It can rely on reading
> > > the data back from Kafka, but it is possible the local store
> > > failure can't be fixed in a timely manner, e.g. a hardware failure,
> > > a constraint violation, etc. In that case the application needs to
> > > show an error to the user and likely needs to do something like
> > > send a delete message to Kafka to remove the earlier published
> > > message.
> > >
> > > A third, last-resort pattern might be to go the CDC route with
> > > something like Databus. This would require implementing additional
> > > fetchers and relays to support Cassandra and MongoDB. Also, the
> > > data would need to be transformed on the Hadoop/Spark side for
> > > virtually every learning application, since they have different
> > > data models.
> > >
> > > I hope this gives enough detail to start discussing transactional
> > > messaging in Kafka. We are willing to help in this effort if it
> > > makes sense for our use cases.
> > >
> > > Thanks,
> > > Jonathan
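As a sketch of Jonathan's second (publish-first) pattern, including the compensating delete he mentions: LocalStore is a hypothetical stand-in for the application's data store, the topic names are invented, and the delete is modeled as a message that downstream consumers must know how to interpret.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class PublishThenPersist {

    /** Hypothetical wrapper around the application's local data store. */
    public interface LocalStore {
        void save(String uuid, String payload) throws Exception;
    }

    private final KafkaProducer<String, String> producer;
    private final LocalStore store;

    public PublishThenPersist(KafkaProducer<String, String> producer, LocalStore store) {
        this.producer = producer;
        this.store = store;
    }

    public void write(String key, String payload) throws Exception {
        // 1. Publish to Kafka first and block until the -1/all ack arrives.
        RecordMetadata meta = producer
            .send(new ProducerRecord<String, String>("learning-events", key, payload))
            .get();

        // 2. The Kafka-assigned coordinates become the entity's UUID, so
        //    the client no longer depends on the data store to mint an ID.
        String uuid = meta.topic() + "-" + meta.partition() + "-" + meta.offset();

        try {
            // 3. Persist locally, keyed by the Kafka-derived UUID.
            store.save(uuid, payload);
        } catch (Exception e) {
            // 4. Kafka and the local store are now out of sync: emit a
            //    compensating delete so consumers can drop the earlier
            //    message, then surface the error to the user.
            producer.send(new ProducerRecord<String, String>(
                "learning-events-deletes", uuid, null)).get();
            throw e;
        }
    }
}

Note this narrows but does not close the window: if the client dies between the publish and the local persist, the stores still diverge until the application reconciles by replaying from Kafka.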
> > > On Wed, Jun 4, 2014 at 9:44 AM, Neha Narkhede
> > > <neha.narkh...@gmail.com> wrote:
> > >
> > > > If you are comfortable, share it on the mailing list. If not, I'm
> > > > happy to have this discussion privately.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Jun 4, 2014 9:42 AM, "Neha Narkhede" <neha.narkh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Glad it was useful. It would be great if you could share your
> > > > > requirements on atomicity. A couple of us are very interested
> > > > > in thinking about transactional messaging in Kafka.
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > > On Jun 4, 2014 6:57 AM, "Jonathan Hodges" <hodg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Neha,
> > > > > >
> > > > > > Thanks so much to you and the Kafka team for putting together
> > > > > > the meetup. It was very nice and gave people from out of town
> > > > > > like us the ability to join in person.
> > > > > >
> > > > > > We are the guys from Pearson Education, and we talked a
> > > > > > little about supplying some details on some of our use cases
> > > > > > with respect to the atomicity of source systems emitting data
> > > > > > and persisting it locally. Should we just post to the list,
> > > > > > or is there somewhere else we should send these details?
> > > > > >
> > > > > > Thanks again!
> > > > > > Jonathan
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 11, 2014 at 9:31 AM, Neha Narkhede
> > > > > > <neha.narkh...@gmail.com> wrote:
> > > > > >
> > > > > > > Yes, that's a great idea. I can help organize the meetup
> > > > > > > at LinkedIn.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Neha
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 11, 2014 at 8:44 AM, Saurabh Agarwal
> > > > > > > (BLOOMBERG/ 731 LEXIN) <sagarwal...@bloomberg.net> wrote:
> > > > > > >
> > > > > > > > Great idea. I am interested in attending as well....
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > From: users@kafka.apache.org
> > > > > > > > To: users@kafka.apache.org
> > > > > > > > At: Apr 11 2014 11:40:56
> > > > > > > >
> > > > > > > > With the Hadoop Summit in San Jose 6/3 - 6/5, I wondered
> > > > > > > > if any of the LinkedIn geniuses were thinking of putting
> > > > > > > > together a meetup on any of the associated technologies
> > > > > > > > like Kafka, Samza, Databus, etc. For us poor souls who
> > > > > > > > don't live on the West Coast, it was a great experience
> > > > > > > > attending the Kafka meetup last year.
> > > > > > > >
> > > > > > > > Jonathan
>
> --
> Thanks & Regards,
> Nageswara Rao.V