I hadn't noticed the UDP requirement before. That does complicate things:
unless you're in absolute control of the network path, some data loss
is virtually guaranteed. Are you allowed to have more than one
"collector/producer" machine, so that if one fails you won't be stuck?
If you can have multiple collector/producer machines, is UDP multicasting
(with later deduplication) an option?
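
To make the "later deduplication" part concrete, here is a rough sketch of
what I have in mind. It assumes every datagram carries a unique record ID
(for example, source host plus sequence number), which your senders may or
may not provide. The names DedupWindow and deduplicate are made up for
illustration and don't come from any library.

```python
from collections import OrderedDict


class DedupWindow:
    """Remember the most recent `capacity` record IDs and flag repeats."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # record_id -> True, oldest first

    def is_new(self, record_id):
        if record_id in self._seen:
            # Already seen: refresh its position and report a duplicate.
            self._seen.move_to_end(record_id)
            return False
        self._seen[record_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


def deduplicate(records, window):
    """Yield each (record_id, payload) pair only the first time its ID is seen."""
    for record_id, payload in records:
        if window.is_new(record_id):
            yield record_id, payload
```

The window is bounded, so a duplicate that shows up after its ID has been
evicted will slip through. The capacity has to cover the longest delay you
expect between copies of the same record.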

Absolutely no data loss and no duplication is a pretty high standard. It's
doable, but are there some aspects of your high-level design that can
be changed to more easily accommodate it?

Here, we use the stock Java producer client, but we are transitioning to a
custom one that offers better guarantees under asynchronous operation. Our
use case is for logging data, so losing some or having it delivered late
won't stop anyone's progress, and we have retry logic built into each step
in the chain. We probably lose some records here or there, but not enough
to drastically alter any outcomes for a user.
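
For what it's worth, the retry logic at each step amounts to something like
the sketch below. This is not our actual code: send_with_retry, its
parameters, and the backoff values are placeholders, and `send` stands for
whatever callable hands a record to the next step in the chain.

```python
import time


def send_with_retry(send, record, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call send(record), retrying with exponential backoff on failure.

    Re-raises the last exception once `attempts` tries have failed.
    """
    for attempt in range(attempts):
        try:
            return send(record)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

Note that retrying like this gives you at-least-once behavior: if a send
succeeded but the acknowledgement was lost, the retry produces a duplicate.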

--Tom


On Thu, Jan 30, 2014 at 7:35 AM, Thibaud Chardonnens
<thibaud...@gmail.com> wrote:

> Thanks for your reply, but I am missing something, how do you push the
> data to a specific topic in your example? Through which client?
>
> On 30 Jan 2014, at 15:16, Tom Brown <tombrow...@gmail.com> wrote:
>
> > Why go with a fancy multithreaded producer architecture? Why not rely on a
> > simple python/perl/whatever implementation and let a scalable web server
> > handle the threading issues?
>
>
