Hi Marc,
That describes the behavior of the Kafka producer library, which batches
writes to Kafka. The KafkaProducer javadoc explains it pretty well:
http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html

But the general idea is that the producer will group together a bunch of
writes to kafka for a specific topic and partition, and then send them as
a single request.
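For example, here's a minimal sketch of a producer with the two settings
that control that grouping (the broker address, topic, and values are made
up for illustration):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            // group up to 64 KB of records per topic-partition before sending
            props.put("batch.size", "65536");
            // wait up to 5 ms for more records to fill out a batch
            props.put("linger.ms", "5");

            Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("logs", "a log line".getBytes()));
            producer.close();
        }
    }

With the defaults (linger.ms=0) the producer only batches whatever has
piled up while a previous request to the broker is still in flight.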

Durability guarantees in Kafka depend on your configuration, and can be
very weak or very strong. Reading the producer sections of the Kafka
documentation should make it clear which settings improve durability at
the cost of latency and throughput. But there would still be a risk of
losing the messages that are sitting inside the proxy application during
a failure, unless the source is able to replay them.
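For example, the producer settings that matter most for durability look
roughly like this (extending the sketch above; the values are
illustrative, not a recommendation):

    // wait for the leader and all in-sync replicas to acknowledge each write
    props.put("acks", "all");
    // retry transient send failures instead of silently dropping messages
    props.put("retries", "3");

Going the other way, acks=0 (fire and forget) or acks=1 (leader only)
lowers latency but weakens the guarantee.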
-Erik




On 8/27/15, 12:34 AM, "Marc Bollinger" <m...@lumoslabs.com> wrote:

>Apologies if this is somewhat redundant; I'm quite new to both Kafka and
>the Confluent Platform. Ewen, when you say "Under the hood, the new
>producer will automatically batch requests":
>
>Do you mean that this is a current or planned behavior of the REST proxy?
>Are there any durability guarantees, or are batches just held in memory
>before being sent to Kafka (or some other option)?
>
>Thanks!
>
>> On Aug 26, 2015, at 9:50 PM, Ewen Cheslack-Postava <e...@confluent.io>
>>wrote:
>> 
>> Hemanth,
>> 
>> The Confluent Platform 1.0 version doesn't have JSON embedded format
>> support (i.e. direct embedding of JSON messages), but you can serialize,
>> base64 encode, and use the binary mode, paying a bit of overhead.
>> However, since then we merged a patch to add JSON support:
>> https://github.com/confluentinc/kafka-rest/pull/89 The JSON support does
>> not interact with the schema registry at all. If you're ok building your
>> own version from trunk you could use that, or this will be released with
>> our next platform version.
>> 
>> In the REST proxy, each HTTP request will result in one call to
>> producer.send(). Under the hood, the new producer will automatically
>> batch requests. The default settings will only batch when it's necessary
>> (because there are already too many outstanding requests, so messages
>> pile up in the local buffer), so you get the advantages of batching, but
>> with a lower request rate the messages will still be sent to the broker
>> immediately.
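>> To make that concrete, the per-message hand-off looks roughly like this
>> (just a sketch, not the actual kafka-rest code; assume a configured
>> KafkaProducer<byte[], byte[]> called producer and the message bytes in a
>> byte[] payload):
>> 
>>     ProducerRecord<byte[], byte[]> record =
>>         new ProducerRecord<>("logs", payload);   // topic taken from the URL
>>     producer.send(record, (metadata, exception) -> {
>>         // runs once the broker acks the write (or the send fails);
>>         // the proxy builds its HTTP response from these results
>>     });
>>     // send() returns immediately; records that pile up while a request
>>     // is in flight get grouped into the next batch per topic-partition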
>> 
>> -Ewen
>> 
>> On Wed, Aug 26, 2015 at 9:31 PM, Hemanth Abbina
>><heman...@eiqnetworks.com>
>> wrote:
>> 
>>> Ewen,
>>> 
>>> Thanks for the explanation.
>>> 
>>> We have control over the log format coming to HAProxy. Right now, these
>>> are plain JSON logs (just like syslog messages, with a little additional
>>> meta information) sent to HAProxy from remote clients over HTTPS. No
>>> serialization is used.
>>> 
>>> Currently, we have one log per HTTP request. I understand that every
>>> request is produced individually, without batching.
>>> 
>>> Will this work with the REST proxy, without using the schema registry?
>>> 
>>> --regards
>>> Hemanth
>>> 
>>> -----Original Message-----
>>> From: Ewen Cheslack-Postava [mailto:e...@confluent.io]
>>> Sent: Thursday, August 27, 2015 9:14 AM
>>> To: users@kafka.apache.org
>>> Subject: Re: Http Kafka producer
>>> 
>>> Hemanth,
>>> 
>>> Can you be a bit more specific about your setup? Do you have control
>>> over the format of the request bodies that reach HAProxy or not? If you
>>> do, Confluent's REST proxy should work fine and does not require the
>>> Schema Registry. It supports both binary (encoded as base64 so it can be
>>> passed via the JSON request body) and Avro. With Avro it uses the schema
>>> registry, but the binary mode doesn't require it.
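>>> To make the binary mode concrete, here's a rough sketch of a produce
>>> request from the client side (the topic name, host, and log payload are
>>> made up, and check the docs for the exact content type on your version):
>>> 
>>>     import java.io.OutputStream;
>>>     import java.net.HttpURLConnection;
>>>     import java.net.URL;
>>>     import java.nio.charset.StandardCharsets;
>>>     import java.util.Base64;
>>> 
>>>     public class RestProxyBinaryProduce {
>>>         public static void main(String[] args) throws Exception {
>>>             // base64-encode the raw JSON log line so it can ride inside
>>>             // the JSON request body
>>>             String log = "{\"level\":\"info\",\"msg\":\"hello\"}";
>>>             String encoded = Base64.getEncoder()
>>>                 .encodeToString(log.getBytes(StandardCharsets.UTF_8));
>>>             String body = "{\"records\":[{\"value\":\"" + encoded + "\"}]}";
>>> 
>>>             URL url = new URL("http://rest-proxy:8082/topics/logs");
>>>             HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>>>             conn.setRequestMethod("POST");
>>>             conn.setRequestProperty("Content-Type",
>>>                 "application/vnd.kafka.binary.v1+json");
>>>             conn.setDoOutput(true);
>>>             try (OutputStream out = conn.getOutputStream()) {
>>>                 out.write(body.getBytes(StandardCharsets.UTF_8));
>>>             }
>>>             System.out.println("HTTP " + conn.getResponseCode());
>>>         }
>>>     }
>>> 
>>> On success the response body should include the partition and offset
>>> assigned to each record.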
>>> 
>>> If you don't have control over the format, then the REST proxy is not
>>> currently designed to support that use case. I don't think HAProxy can
>>> rewrite request bodies (beyond per-line regexes, which would be hard to
>>> make work), so that's not an option either. It would certainly be
>>> possible to make a small addition to the REST proxy to allow binary
>>> request bodies to be produced directly to a topic specified in the URL,
>>> though you'd be paying pretty high overhead per message -- without the
>>> ability to batch, you're doing one HTTP request per message. This might
>>> not be bad if your messages are large enough? (Then again, the same
>>> issue applies regardless of what solution you end up with if each of
>>> the requests to HAProxy only contains one message.)
>>> 
>>> -Ewen
>>> 
>>> 
>>> 
>>> On Wed, Aug 26, 2015 at 5:05 PM, Hemanth Abbina
>>><heman...@eiqnetworks.com>
>>> wrote:
>>> 
>>>> Marc,
>>>> 
>>>> Thanks for your response. Let me give some more details on the problem.
>>>> 
>>>> As I already mentioned in the previous post, here is our expected data
>>>> flow: logs -> HAProxy -> {new layer} -> Kafka cluster
>>>> 
>>>> The 'new layer' should receive logs as HTTP requests from HAProxy and
>>>> produce the same logs to Kafka without loss.
>>>> 
>>>> The options that seem to be available are:
>>>> 1. Flume: It has an HTTP source and a Kafka sink, but the documentation
>>>> says the HTTP source is not for production use.
>>>> 2. Kafka REST Proxy: Though this seems fine, it adds another dependency
>>>> on Schema Registry servers to validate the schema, which the consumers
>>>> would then also have to use.
>>>> 3. A custom plugin for this functionality: Though the functionality
>>>> seems simple, the scalability, reliability, and maintenance burden
>>>> would be considerable.
>>>> 
>>>> Thanks
>>>> Hemanth
>>>> 
>>>> -----Original Message-----
>>>> From: Marc Bollinger [mailto:m...@lumoslabs.com]
>>>> Sent: Thursday, August 27, 2015 4:39 AM
>>>> To: users@kafka.apache.org
>>>> Cc: dev-subscr...@kafka.apache.org
>>>> Subject: Re: Http Kafka producer
>>>> 
>>>> I'm actually also really interested in this...I had a chat about this
>>>> on the distributed systems Slack's <http://dist-sys.slack.com> Kafka
>>>> channel a few days ago, but we're not much further than griping about
>>>> the problem.
>>>> We're basically migrating an existing event system, one which packed
>>>> messages into files, waited for a time-or-space threshold to be
>>>> crossed, then dealt with distribution in terms of files. Basically,
>>>> we'd like to keep a lot of those semantics: we can acknowledge success
>>>> on the app server as soon as we've flushed to disk, and rely on the
>>>> filesystem for durability, and total order across the system doesn't
>>>> matter, as the HTTP PUTs sending the messages are load balanced across
>>>> many app servers. We also can tolerate [very] long downstream event
>>>> system outages, because...we're ultimately just writing sequentially
>>>> to disk, per process (I should mention that this part is in Rails,
>>>> which means we're dealing largely in terms of processes, not threads).
>>>> 
>>>> RocksDB was mentioned in the discussion, but after spending exactly 5
>>>> minutes researching it, it seems like the dead simplest setup on an app
>>>> server in terms of moving parts (multiple processes writing, one
>>>> process reading/forwarding to Kafka) wouldn't work well with RocksDB.
>>>> Although now that I'm looking at it more, it looks like they're
>>>> working on a MySQL storage engine?
>>>> 
>>>> Anyway yeah, I'd love some discussion on this, or war stories of
>>>> migration to Kafka from other event systems (F/OSS or...bespoke).
>>>> 
>>>> On Wed, Aug 26, 2015 at 3:45 PM, Hemanth Abbina
>>>> <heman...@eiqnetworks.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Our application receives events through an HAProxy server over HTTPS;
>>>>> these should be forwarded to and stored in a Kafka cluster.
>>>>> 
>>>>> What would be the best option for this?
>>>>> This layer should receive events from HAProxy and produce them to the
>>>>> Kafka cluster in a reliable and efficient way (and should scale
>>>>> horizontally).
>>>>> 
>>>>> Please suggest.
>>>>> 
>>>>> --regards
>>>>> Hemanth
>>> 
>>> 
>>> 
>>> --
>>> Thanks,
>>> Ewen
>> 
>> 
>> 
>> -- 
>> Thanks,
>> Ewen
