Have you read this part of the documentation?
http://kafka.apache.org/documentation.html#semantics

Just wondering if that solves your use case.
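The short version of the idempotent-consumer approach described there: have the producer attach a unique ID to every message, and have the consumer drop any ID it has already processed. A minimal sketch in Python (the message shape, the `dedupe` helper, and the in-memory `seen` set are all my own invention for illustration, not a Kafka API - in practice you'd persist the seen IDs alongside your offsets):

```python
# Consumer-side de-dupe sketch (hypothetical names, not a real
# Kafka client API): each message carries a unique ID assigned by
# the producer; the consumer skips IDs it has already processed.

def dedupe(messages, seen=None):
    """Yield each message at most once, keyed on its 'id' field."""
    if seen is None:
        seen = set()
    for msg in messages:
        if msg["id"] in seen:
            continue  # duplicate delivery from a retry -> drop it
        seen.add(msg["id"])
        yield msg

# At-least-once delivery can replay messages after a producer retry:
batch = [{"id": 1, "body": "a"},
         {"id": 2, "body": "b"},
         {"id": 2, "body": "b"}]  # id 2 delivered twice
print([m["body"] for m in dedupe(batch)])  # -> ['a', 'b']
```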


On Mon, Feb 10, 2014 at 9:11 AM, Garry Turkington <
g.turking...@improvedigital.com> wrote:

> Hi,
>
> I've been doing some prototyping on Kafka for a few months now and like
> what I see. It's a good fit for some of my use cases in the areas of data
> distribution but also for processing - liking a lot of what I see in Samza.
> I'm now working through some of the operational issues and have a question
> for the community.
>
> I have several data sources that I want to push into Kafka, but some of the
> most important arrive as a stream of files being dropped into either an
> SFTP location or S3. Conceptually the data is really a stream, but it's
> being chunked and made more batch-like by the deployment model of the
> operational servers. So pulling the data into Kafka and treating it as a
> stream again is a big plus.
>
> But I really don't want duplicate messages. I know Kafka provides
> at-least-once semantics and that's fine; I'm happy to have the de-dupe
> logic external to Kafka. And if I look at my producer, I can build up a
> protocol around adding record metadata and using ZooKeeper to give me
> pretty high confidence that my clients will know whether they are reading
> from a file that was fully published into Kafka or not.
>
> I had assumed that this wouldn't be a unique use case but on doing a bunch
> of searches I really don't find much in terms of either tools that help or
> even just best practice patterns for handling this type of need to support
> exactly-once message processing.
>
> So now I'm thinking that either I just need better web search skills, or
> this isn't something many others are doing - and if so, there's likely a
> reason for that.
>
> Any thoughts?
>
> Thanks
> Garry
>
>
