@Gwen

 - A failure in delivery of one or more events in the batch (the typical
Flume case) is considered a failure of the entire batch, and the client
redelivers the entire batch.
 - If clients want more fine-grained control, an alternative option is to
indicate which events failed in the return value of producer.send(list<>);
a hypothetical sketch follows.
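
For illustration, such a return value might look like this (a hypothetical
signature, not part of the current producer API):

    // Hypothetical: send the batch synchronously and return the subset of
    // records that failed, so the caller can retry those (or the whole batch).
    public List<ProducerRecord<byte[], byte[]>>
        send(List<ProducerRecord<byte[], byte[]>> batch);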


@Joel
  Fine-grained tracking of the status of individual events is quite painful
in contrast to simply blocking on every batch. The old-style batched-sync
mode has great advantages in terms of simplicity and performance.
  Imagine a simple use case of a client reading a directory of log files,
splitting them into log messages/events, and pushing them through Kafka. It
becomes more complex when all this tracking data needs to be persisted to
accommodate client restarts/crashes. A simple current line number and file
name is easy to program with and to persist, as opposed to the start/end
position of each log message in the file.
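
As a sketch of that loop (given a BufferedReader reader over the current
file, and assuming a hypothetical batched-sync producer.send(List<ProducerRecord>)
that blocks and throws on any failure, plus hypothetical
loadCheckpoint/saveCheckpoint helpers):

    // Read a batch of lines, send it synchronously, then checkpoint only
    // (file name, line number). On failure, resend the same batch.
    long lineNo = loadCheckpoint(fileName);            // hypothetical helper
    List<ProducerRecord<byte[], byte[]>> batch = new ArrayList<>();
    String line;
    while ((line = reader.readLine()) != null) {
        batch.add(new ProducerRecord<>("logs", line.getBytes()));
        if (batch.size() == 1000) {
            producer.send(batch);                      // hypothetical batched-sync call
            lineNo += batch.size();
            saveCheckpoint(fileName, lineNo);          // hypothetical helper
            batch.clear();
        }
    }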


-roshan


On 4/27/15 2:07 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote:

>As long as you retain the returned futures somewhere, you can always
>iterate over the futures after the flush completes and check for
>success/failure. Would that work for you?
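>
>Something like this, for example (a sketch; 'batch' here is your list of
>records to deliver):
>
>    List<Future<RecordMetadata>> futures = new ArrayList<>();
>    for (ProducerRecord<byte[], byte[]> record : batch)
>        futures.add(producer.send(record));   // returns immediately
>    producer.flush();                         // blocks until all sends complete
>    for (Future<RecordMetadata> f : futures) {
>        try {
>            f.get();                          // already done; throws if the send failed
>        } catch (ExecutionException e) {
>            // this record failed; redeliver as needed
>        }
>    }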
>
>On Mon, Apr 27, 2015 at 08:53:36PM +0000, Roshan Naik wrote:
>> The important guarantee that is needed for a client producer thread is
>> that it requires an indication of success/failure of the batch of events
>> it pushed. Essentially it needs to retry producer.send() on that same
>> batch in case of failure. My understanding is that flush will simply
>> flush data from all threads (correct me if I am wrong).
>> 
>> -roshan
>> 
>> 
>> 
>> On 4/27/15 1:36 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
>> 
>> >This sounds like flush:
>> 
>> >https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
>> >
>> >which was recently implemented in trunk.
>> >
>> >Joel
>> >
>> >On Mon, Apr 27, 2015 at 08:19:40PM +0000, Roshan Naik wrote:
>> >> Been evaluating the perf of the old and new Producer APIs for reliable
>> >> high-volume streaming data movement. I do see one area of improvement
>> >> that the new API could use for synchronous clients.
>> >> 
>> >> AFAICT, the new API does not support batched synchronous transfers. To
>> >> do a synchronous send, one needs to do a future.get() after every
>> >> Producer.send(). I changed the new
>> >> o.a.k.clients.tools.ProducerPerformance tool to assess the perf of this
>> >> mode of operation. It is perhaps not surprising that it is much slower
>> >> than the async mode... hard to push it beyond 4MB/s.
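>> >> i.e., per record (sketch):
>> >> 
>> >>     producer.send(record).get();   // blocks until this record is acked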
>> >> 
>> >> The 0.8.1 Scala-based producer API supported a batched sync mode via
>> >> Producer.send( List<KeyedMessage> ). My measurements show that it was
>> >> able to approach (and sometimes exceed) the old async speeds...
>> >> 266MB/s.
>> >> 
>> >> 
>> >> Supporting this batched sync mode is very critical for streaming
>> >> clients (such as Flume, for example) that need delivery guarantees.
>> >> Although it can be done with the async mode, that requires additional
>> >> bookkeeping as to which events are delivered and which are not. The
>> >> programming model becomes much simpler with the batched sync mode.
>> >> Having the client deal with one single future.get() also helps
>> >> performance greatly, as I noted.
>> >> 
>> >> I wanted to propose adding this as an enhancement to the new Producer
>> >> API.
>> >
>> 
>
