Re: New Java Producer: Single Producer vs multiple Producers

2015-04-27 Thread Roshan Naik
On 4/27/15 11:15 AM, Jay Kreps jay.kr...@gmail.com wrote: The new producer is pretty new still so I suspect there is a fair amount of low-hanging performance work for anyone who wanted to take a shot at it. I do see something. Will discuss it on a different email thread.

Re: New Producer API - batched sync mode support

2015-04-27 Thread Joel Koshy
This sounds like flush: https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API which was recently implemented in trunk. Joel On Mon, Apr 27, 2015 at 08:19:40PM +, Roshan Naik wrote: Been evaluating the perf of old and new Produce APIs for reliable

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
@Roshan - if the data was already written to Kafka, your approach will generate LOTS of duplicates. I'm not convinced it's ideal. What's wrong with callbacks? On Mon, Apr 27, 2015 at 2:53 PM, Roshan Naik ros...@hortonworks.com wrote: @Gwen - A failure in delivery of one or more events in the

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
I should have been clearer - I used Roshan's terminology in my reply. Basically, the old producer batch Send() just took a sequence of messages. I assumed Roshan is looking for something similar - which allows for mixing messages for multiple partitions and therefore can fail for some messages

New Producer API - batched sync mode support

2015-04-27 Thread Roshan Naik
Been evaluating the perf of old and new Producer APIs for reliable high volume streaming data movement. I do see one area of improvement that the new API could use for synchronous clients. AFAICT, the new API does not support batched synchronous transfers. To do synchronous send, one needs to

Re: New Producer API - batched sync mode support

2015-04-27 Thread Magnus Edenhill
Hi Gwen, can you clarify: by batch do you mean the protocol MessageSet, or some java client internal construct? If the former I was under the impression that a produced MessageSet either succeeds delivery or errors in its entirety on the broker. Thanks, Magnus 2015-04-27 23:05 GMT+02:00 Gwen

Re: New Producer API - batched sync mode support

2015-04-27 Thread Roshan Naik
The important guarantee that is needed for a client producer thread is that it requires an indication of success/failure of the batch of events it pushed. Essentially it needs to retry producer.send() on that same batch in case of failure. My understanding is that flush will simply flush data from

Re: Why fetching meta-data for topic is done three times?

2015-04-27 Thread Zakee
What values do you have for the properties below? Or are these set to defaults? message.send.max.retries, retry.backoff.ms, topic.metadata.refresh.interval.ms Thanks Zakee On Apr 23, 2015, at 11:48 PM, Madhukar Bharti bhartimadhu...@gmail.com wrote: Hi All, Once gone through code found

Re: New Producer API - batched sync mode support

2015-04-27 Thread Joel Koshy
As long as you retain the returned futures somewhere, you can always iterate over the futures after the flush completes and check for success/failure. Would that work for you? On Mon, Apr 27, 2015 at 08:53:36PM +, Roshan Naik wrote: The important guarantee that is needed for a client
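A rough sketch of the check Joel describes: keep the futures returned by send(), call flush(), then walk the list and note which ones failed. To stay self-contained this uses plain java.util.concurrent futures as stand-ins for the producer's return values; the class BatchCheck and the method failedIndexes are hypothetical names for illustration, not part of the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the Kafka client API.
class BatchCheck {
    /**
     * Blocks on every future and returns the positions of those that failed.
     * An empty list means every record in the batch was acknowledged.
     */
    static <T> List<Integer> failedIndexes(List<? extends Future<T>> futures) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).get(); // returns at once if already complete
            } catch (ExecutionException e) {
                failed.add(i); // the send for record i failed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                failed.add(i); // outcome unknown, so count it as failed
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-ins for the futures a producer's send() would return.
        List<CompletableFuture<String>> futures = new ArrayList<>();
        futures.add(CompletableFuture.completedFuture("ok-0"));
        CompletableFuture<String> bad = new CompletableFuture<>();
        bad.completeExceptionally(new RuntimeException("simulated send failure"));
        futures.add(bad);
        futures.add(CompletableFuture.completedFuture("ok-2"));

        System.out.println(failedIndexes(futures)); // prints [1]
    }
}
```

With a real producer the list would be filled from send() and checked after flush() returns. A caller that wants the old batch semantics could treat any non-empty result as a failed batch and resend the whole batch, at the cost of the duplicates discussed elsewhere in this thread.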

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
Batch failure is a bit meaningless, since in the same batch, some records can succeed and others may fail. To implement an error handling logic (usually different than retry, since the producer has a configuration controlling retries), we recommend using the callback option of Send(). Gwen P.S

Re: New Producer API - batched sync mode support

2015-04-27 Thread Roshan Naik
@Gwen - A failure in delivery of one or more events in the batch (typical Flume case) is considered a failure of the entire batch and the client redelivers the entire batch. - If clients want more fine grained control, alternative option is to indicate which events failed in the return value of

Re: New producer: metadata update problem on 2 Node cluster.

2015-04-27 Thread Manikumar Reddy
Any comments on this issue? On Apr 24, 2015 8:05 PM, Manikumar Reddy ku...@nmsworks.co.in wrote: We are testing new producer on a 2 node cluster. Under some node failure scenarios, producer is not able to update metadata. Steps to reproduce 1. form a 2 node cluster (K1, K2) 2. create a

Topic missing Leader and Isr

2015-04-27 Thread Buntu Dev
I checked out the Kafka 0.8.2 branch and used the ./bin/kafka-topics.sh script to create a topic: ./bin/kafka-topics.sh -create -zookeeper zk-node:2181 -partitions 12 -replication-factor 1 -topic my-topic But when I use the --describe to look at the topic both the 'Leader' and 'Isr' are

Re: New Producer API - batched sync mode support

2015-04-27 Thread Roshan Naik
On 4/27/15 2:59 PM, Gwen Shapira gshap...@cloudera.com wrote: @Roshan - if the data was already written to Kafka, your approach will generate LOTS of duplicates. I'm not convinced it's ideal. Only if the delivery failure rate is very high (i.e. short lived but very frequent). This batch

Kafka commit offset

2015-04-27 Thread Gomathivinayagam Muthuvinayagam
I am trying to commit offset request in a background thread. I am able to commit it so far. I am using high level consumer api. So if I just use high level consumer api, and if I have disabled auto commit, with kafka as the storage for offsets, will the high level consumer api use automatically

Re: New producer: metadata update problem on 2 Node cluster.

2015-04-27 Thread Ewen Cheslack-Postava
Maybe add this to the description of https://issues.apache.org/jira/browse/KAFKA-1843 ? I can't find it now, but I think there was another bug where I described a similar problem -- in some cases it makes sense to fall back to the list of bootstrap nodes because you've gotten into a bad state and

Re: Getting java.lang.IllegalMonitorStateException in mirror maker when building fetch request

2015-04-27 Thread Jiangjie Qin
Hi Tao, KAFKA-2150 has been filed. Jiangjie On 4/24/15, 12:38 PM, tao xiao xiaotao...@gmail.com wrote: Hi team, I observed java.lang.IllegalMonitorStateException thrown from AbstractFetcherThread in mirror maker when it is trying to build the fetch request. Below is the error [2015-04-23

Re: New Producer API - batched sync mode support

2015-04-27 Thread Joel Koshy
Fine grained tracking of status of individual events is quite painful in contrast to simply blocking on every batch. Old style Batched-sync mode has great advantages in terms of simplicity and performance. I may be missing something, but I'm not so convinced that it is that painful/very

Re: New Producer API - batched sync mode support

2015-04-27 Thread Ewen Cheslack-Postava
A couple of thoughts: 1. @Joel I agree it's not hard to use the new API but it definitely is more verbose. If that snippet of code is being written across hundreds of projects, that probably means we're missing an important API. Right now I've only seen the one complaint, but it's worth finding

Re: New and old producers partition messages differently

2015-04-27 Thread James Cheng
On Apr 26, 2015, at 9:03 PM, Gwen Shapira gshap...@cloudera.com wrote: Definitely +1 for advertising this in the docs. What I can't figure out is the upgrade path... if my application assumes that all data for a single user is in one partition (so it subscribes to a single partition and

Re: New Java Producer: Single Producer vs multiple Producers

2015-04-27 Thread Jay Kreps
Hey Jiangjie, Yeah, not sure what the bottleneck is. It may be the sender or lock contention on the writer threads. You could use top or one of the java tools to check out the per-thread cpu usage. The benchmarking I had done previously showed an ability to max out the 1G network cards we had with a

Re: New and old producers partition messages differently

2015-04-27 Thread Jay Kreps
Yeah I agree we could have handled this better. I think the story we have now is that you can override it using the partition argument in the producer (and when we get the patch for pluggable producer we can bundle a LegacyPartitioner or something like that). The reason for murmur2 over 3 was

Re: New Java Producer: Single Producer vs multiple Producers

2015-04-27 Thread Jiangjie Qin
Hi Jay, Does o.a.k.clients.tools.ProducerPerformance provide a multi-threaded test? I did not find one. I tweaked the test a little bit to make it multi-threaded and what I found is that in the single-thread case, with each message of 10 bytes, a single caller thread has ~2M messages/second throughput.

question about producer and committed state

2015-04-27 Thread Madhusudan Ramanna
Hi, We're using Kafka 0.8.1.1. Here is the snippet from the documentation about request.required.acks - 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server