If it were my architecture, I'd consider publishing a "control message", either to the same topic or to a separate one, saying "done with topic A". The "thing" that needs to continue copying A also needs to know where to continue, so presumably, if the "done with A" message is missing, it will read from sorted-topic and find out where it left off. So even if we accidentally dropped the "done" message, it would waste a bit of time, but wouldn't corrupt anything.
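The resume logic above could be sketched roughly like this (the function and field names are illustrative, not a real Kafka API; reading the control message and scanning sorted-topic are stubbed out):

```python
# Sketch of the crash-safe resume logic: if the "done with topic A"
# control message is present, go straight to the merge phase; if it
# was dropped, fall back to scanning sorted-topic for the last record
# copied from A -- wasting a little time but corrupting nothing.

def resume_position(control_message, scan_sorted_topic):
    """Return (phase, resume_offset) to continue from after a restart.

    control_message: dict like {"done_with": "A"}, or None if the
        message is missing/dropped.
    scan_sorted_topic: fallback callable that scans sorted-topic and
        returns the offset in topic A to resume copying from.
    """
    if control_message is not None and control_message.get("done_with") == "A":
        return ("merge", None)  # initial copy finished; start the merge sort
    # Control message missing: rescan the output to find where the copy
    # left off. Slower, but safe.
    return ("initial_copy", scan_sorted_topic())
```

So a dropped "done" message degrades to `resume_position(None, scan)`, which re-derives the position from sorted-topic instead of trusting the control message.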
This is one of the use cases where transactional producers would be so cool :) I can see how adding metadata to the commit message can emulate some lightweight transactions, but I'd be concerned that this capability could get abused...

P.S. Thanks. I like my new address :)

On Mon, Aug 3, 2015 at 6:46 PM, James Cheng <jch...@tivo.com> wrote:

> Nice new email address, Gwen. :)
>
> On Aug 3, 2015, at 3:17 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > You are correct. You can see that ZookeeperConsumerConnector is hardcoded
> > with null metadata.
> > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L310
> >
> > More interesting, it looks like the metadata is not exposed in the new
> > KafkaConsumer either.
> >
> > Mind sharing what you plan to use it for? This will help us figure out
> > how to expose it :)
>
> I'm juggling a couple of designs for what I'm trying to do, so it may
> turn out that I actually don't need it.
>
> My generic answer is: I have some application state related to the
> processing of a topic, and I wanted to store it somewhere persistent that
> will survive process death and disk failure. Storing it with the offset
> seemed like a nice solution. I could alternatively store it in a
> standalone Kafka topic instead.
>
> The more detailed answer: I'm doing a time-based merge sort of 3 topics
> (A, B, and C) and outputting the results into a new output topic (let's
> call it "sorted-topic"). During the initial creation of "sorted-topic",
> I want all of topic A to be output to sorted-topic first, followed by an
> ongoing merge sort of topics B, C, and any updates to A that come along.
>
> I was trying to handle the situation where I crash while initially
> copying topic A into sorted-topic. I was thinking that I could save some
> metadata in my topic A checkpoint that says "still doing initial copy of
> A".
> That way, when I start up next time, I would know to continue copying
> topic A to the output. Once I have finished copying all of A to
> sorted-topic, I would store "finished doing initial copy of A" in my
> checkpoint, and upon restart I would check that and know to start doing
> the merge sort of A, B, and C.
>
> I have a couple of other designs that seem cleaner, though, so I might
> not actually need it.
>
> -James
>
> > Gwen
> >
> > On Mon, Aug 3, 2015 at 1:52 PM, James Cheng <jch...@tivo.com> wrote:
> >
> >> According to
> >> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest,
> >> we can store custom metadata with our checkpoints. It looks like the
> >> high level consumer does not support committing offsets with metadata,
> >> and that in order to checkpoint with custom metadata, we have to issue
> >> the OffsetCommitRequest ourselves. Is that correct?
> >>
> >> Thanks,
> >> -James
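For reference, the time-based merge sort of topics A, B, and C described in the thread can be sketched with a standard k-way merge. This is a simplified model, not Kafka client code: each topic is just an iterable of (timestamp, value) pairs, assumed to already be in timestamp order, as records within a single partition would be.

```python
import heapq

def merge_topics_by_time(*topics):
    """Yield (timestamp, value) records from all topics in time order.

    Each topic must itself be sorted by timestamp; heapq.merge then
    interleaves them lazily, which is how a streaming merge of several
    Kafka topics would behave once the initial copy of A is done.
    """
    yield from heapq.merge(*topics, key=lambda record: record[0])
```

For example, merging `[(1, "a1"), (4, "a2")]`, `[(2, "b1")]`, and `[(3, "c1")]` yields the records in timestamp order 1, 2, 3, 4, which is the ordering "sorted-topic" is meant to contain.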