You can micro-batch the Kafka contents into a file that's replicated (e.g. on HDFS) and then ack all of the input tuples only after the file has been closed.
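
A rough sketch of what I mean, in plain Java with no Storm dependency so that it runs on its own. The names here (`MicroBatcher`, `acker`, `failer`, the `batch-N.txt` file naming) are made up for illustration, and a local directory stands in for the replicated file system. In a real bolt the `acker` and `failer` callbacks would presumably be `collector::ack` and `collector::fail`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers tuples into a file and acks them only once the file is closed. */
class MicroBatcher<T> {
    private final Path dir;             // stand-in for a replicated location such as HDFS
    private final int batchSize;        // number of tuples per file
    private final Consumer<T> acker;    // assumption: wraps collector.ack(tuple) in a real bolt
    private final Consumer<T> failer;   // assumption: wraps collector.fail(tuple) in a real bolt
    private final List<T> pending = new ArrayList<>();
    private BufferedWriter writer;
    private int fileIndex = 0;

    MicroBatcher(Path dir, int batchSize, Consumer<T> acker, Consumer<T> failer) {
        this.dir = dir;
        this.batchSize = batchSize;
        this.acker = acker;
        this.failer = failer;
    }

    /** Appends one payload to the current file; the tuple stays un-acked for now. */
    void add(T tuple, String payload) {
        pending.add(tuple);
        try {
            if (writer == null) {
                Path file = dir.resolve("batch-" + fileIndex++ + ".txt");
                writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            }
            writer.write(payload);
            writer.newLine();
        } catch (IOException e) {
            failAll();                  // the spout will replay these tuples
            return;
        }
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Closes the current file and only then acks every tuple written to it. */
    void flush() {
        if (writer == null) {
            return;
        }
        try {
            writer.close();
            writer = null;
            pending.forEach(acker);
            pending.clear();
        } catch (IOException e) {
            failAll();
        }
    }

    /** Fails every pending tuple so that the whole batch gets replayed. */
    private void failAll() {
        try {
            if (writer != null) {
                writer.close();
            }
        } catch (IOException ignored) {
            // nothing more to do; the tuples are failed below either way
        }
        writer = null;
        pending.forEach(failer);
        pending.clear();
    }
}
```

Since nothing is acked until the file is closed, a crash in the middle of a batch just means the spout replays that batch, and you don't have to track the offsets yourself. You would also want a time-based flush (e.g. on a tick tuple) so a slow stream doesn't run into the tuple timeout.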

On Wed, May 11, 2016 at 3:43 PM, Milind Vaidya <[email protected]> wrote:

> In case of a failure to upload a file, or disk corruption leading to loss
> of the file, we have only the current offset in the Kafka spout and no
> record of which offsets were lost with the file and need to be replayed.
> So these could be stored externally in ZooKeeper and used to account for
> the lost data. To save them in ZK, they need to be available in a bolt.
>
> On Wed, May 11, 2016 at 11:10 AM, Nathan Leung <[email protected]> wrote:
>
>> Why not just ack the tuple once it's been written to a file? If your
>> topology fails then the data will be re-read from Kafka. The Kafka spout
>> already does this for you. Then uploading files to S3 is the
>> responsibility of another job. For example, a Storm topology that
>> monitors the output folder.
>>
>> Monitoring the data from Kafka all the way out to S3 seems unnecessary.
>>
>> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya <[email protected]> wrote:
>>
>>> It does not matter, in the sense that I am ready to upgrade if this
>>> is on the roadmap.
>>>
>>> Nonetheless:
>>>
>>> kafka_2.9.2-0.8.1.1
>>> apache-storm-0.9.4
>>>
>>> On Wed, May 11, 2016 at 5:53 AM, Abhishek Agarwal <[email protected]>
>>> wrote:
>>>
>>>> Which version of storm-kafka are you using?
>>>>
>>>> On Wed, May 11, 2016 at 12:29 AM, Milind Vaidya <[email protected]>
>>>> wrote:
>>>>
>>>>> Anybody? Anything about this?
>>>>>
>>>>> On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Is there any way I can know which Kafka offset corresponds to the
>>>>>> tuple I am currently processing in a bolt?
>>>>>>
>>>>>> Use case: I need to batch events from Kafka, persist them to a local
>>>>>> file and eventually upload it to S3. To manage failure cases, I need
>>>>>> to know the Kafka offset for a message, so that it can be persisted
>>>>>> to ZooKeeper and used to write / upload the file.
>>>>
>>>> --
>>>> Regards,
>>>> Abhishek Agarwal
