Re: Batch producer latencies and flush()

2015-06-28 Thread Ewen Cheslack-Postava
The logic you're requesting is basically what the new producer implements.
The first condition is the batch size limit and the second is linger.ms.
The actual logic is a bit more complicated and has some caveats dealing
with, for example, backing off after failures, but you can see in this code

https://github.com/apache/kafka/blob/0.8.2.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L222

that the two normal conditions that will trigger a send are full and
expired.
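
As a rough illustration of those two conditions, here is a simplified model
of the readiness check. It is a sketch written for this thread, not the
actual RecordAccumulator code: the names SendTrigger, isSendable, waitedMs
and retrying are hypothetical, and it leaves out the other caveats (for
example buffer exhaustion).

```java
/** Simplified, hypothetical model of when a producer batch becomes sendable. */
final class SendTrigger {

    /**
     * @param batchBytes     bytes currently accumulated in the batch
     * @param batchSizeBytes configured batch.size
     * @param waitedMs       time since the batch was created or last attempted
     * @param lingerMs       configured linger.ms
     * @param retrying       true if a previous send of this batch failed
     * @param retryBackoffMs configured retry.backoff.ms
     */
    static boolean isSendable(int batchBytes, int batchSizeBytes,
                              long waitedMs, long lingerMs,
                              boolean retrying, long retryBackoffMs) {
        // After a failure the batch has to sit out the backoff period first.
        boolean backingOff = retrying && waitedMs < retryBackoffMs;
        boolean full = batchBytes >= batchSizeBytes; // condition 1: batch size limit
        boolean expired = waitedMs >= lingerMs;      // condition 2: linger.ms elapsed
        return (full || expired) && !backingOff;
    }

    public static void main(String[] args) {
        // Full batch: sendable even though linger.ms has not elapsed.
        System.out.println(isSendable(16384, 16384, 1, 5, false, 100));  // true
        // Partly filled batch, linger.ms elapsed.
        System.out.println(isSendable(200, 16384, 5, 5, false, 100));    // true
        // Partly filled batch, still lingering.
        System.out.println(isSendable(200, 16384, 2, 5, false, 100));    // false
        // Full batch that failed 10 ms ago: still backing off.
        System.out.println(isSendable(16384, 16384, 10, 5, true, 100));  // false
    }
}
```

The point of the model is that a batch leaves the client as soon as either
condition holds, so linger.ms acts as the time bound and batch.size as the
size bound.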

Note that increasing batch.size and linger.ms will generally *increase*
your latency. In most cases their effect is to make messages wait longer
on the client before being sent, in exchange for higher throughput.
There may be edge cases where this doesn't hold (e.g. with high latencies
to the broker, a low linger.ms can have a negative effect in combination
with max.in.flight.requests.per.connection), but usually it will.
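
For a latency-sensitive producer, that reasoning points toward a small
linger.ms rather than a large one. The sketch below builds such a
configuration with plain java.util.Properties. The config keys are the new
producer's, but the values and the LowLatencyConfig name are illustrative
assumptions, not recommendations for this cluster.

```java
import java.util.Properties;

/** Hypothetical helper that assembles latency-oriented producer settings. */
final class LowLatencyConfig {

    static Properties build() {
        Properties props = new Properties();
        // Upper bound on how long a partly filled batch waits on the client.
        props.put("linger.ms", "5");
        // Size limit per partition batch; a full batch is sent without lingering.
        props.put("batch.size", "65536");
        // Requests allowed in flight per broker connection before the client waits.
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

These properties would be merged with the usual bootstrap and serializer
settings before being passed to the producer.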

For the specific case you gave with increasing batch size, I would guess it
stopped having any effect because beyond 64KB you were never getting full
batches; with few enough in-flight requests, they were probably being sent
because linger.ms expired before the batch filled up.
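
A back-of-the-envelope calculation shows why. The sketch below estimates how
long one partition's batch takes to fill at a steady rate and compares that
with linger.ms. The message size, per-partition rate and linger value are
made-up numbers (the thread does not give them), the names are hypothetical,
and compression and per-record overhead are ignored.

```java
/** Hypothetical estimate of whether a batch fills before linger.ms expires. */
final class BatchFillEstimate {

    /** Milliseconds for one partition's batch to fill at a steady rate. */
    static double fillTimeMs(int batchSizeBytes, int messageBytes, double messagesPerSec) {
        double bytesPerMs = messageBytes * messagesPerSec / 1000.0;
        return batchSizeBytes / bytesPerMs;
    }

    /** True if linger.ms expires before the batch can fill. */
    static boolean lingerExpiresFirst(int batchSizeBytes, int messageBytes,
                                      double messagesPerSec, long lingerMs) {
        return fillTimeMs(batchSizeBytes, messageBytes, messagesPerSec) > lingerMs;
    }

    public static void main(String[] args) {
        int messageBytes = 200;        // assumed average message size
        double messagesPerSec = 100.0; // assumed rate for a single partition
        long lingerMs = 100;           // assumed linger.ms

        // 64KB batch: 65536 / (200 * 100 / 1000) = 3276.8 ms to fill.
        System.out.println(fillTimeMs(64 * 1024, messageBytes, messagesPerSec));
        // 1MB batch: 1048576 / 20 = 52428.8 ms to fill.
        System.out.println(fillTimeMs(1024 * 1024, messageBytes, messagesPerSec));
        // In both cases linger.ms expires first, so raising batch.size changes nothing.
        System.out.println(lingerExpiresFirst(64 * 1024, messageBytes, messagesPerSec, lingerMs));
        System.out.println(lingerExpiresFirst(1024 * 1024, messageBytes, messagesPerSec, lingerMs));
    }
}
```

Under these assumed numbers the batch never gets close to full within the
linger window, which is the situation described above.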

Could you share more concrete numbers for the settings, plus some idea of
message size and message rate in specific instances? That would let people
suggest tweaks that might help.

-Ewen

-- 
Thanks,
Ewen


Re: Batch producer latencies and flush()

2015-06-28 Thread Achanta Vamsi Subhash
*bump*

-- 
Regards
Vamsi Subhash


Batch producer latencies and flush()

2015-06-23 Thread Achanta Vamsi Subhash
Hi,

We are using the batch producer of 0.8.2.1 and we are getting very bad
latencies for the topics. We have ~40K partitions now in a 20-node cluster.

- We have many topics, and the publish rate varies widely between them. Ex:
some topics take 10k messages/sec and others 2000/minute.
- We are seeing latencies of 2 sec at the 99th percentile and 1 sec at the
95th percentile.
- The current parameters that are tunable are batch size, buffer size and
linger. We monitored the metrics for the new producer and tuned the above
accordingly. Still, we are not able to get any improvement. Batch size
effectively stopped mattering once we went past 64KB (we increased it up
to 1MB).
- We also noticed that the record queue time is high (2-3 sec). The
documentation describes this as the time records wait in the accumulator
before being sent.

Later, looking at the code in trunk, I see that the configured batch size is
the same for all TopicPartitions and that each has its own RecordBatch. Also,
a flush() method has been added in the latest code.

We want an upper bound on the latency of every message push, irrespective
of the incoming rate. Can we achieve it with the following logic:

- Wait until X KB of batch size per TopicPartition is reached
(or)
- Wait for Y ms

If either of them is reached, flush the producer records. Can this be part
of the producer code itself? This will avoid the case of records getting
accumulated for 2-3 sec.
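
One way to approximate the Y-ms bound from the application side is to call
flush() on a timer. The sketch below is only an illustration under
assumptions: PeriodicFlusher is a hypothetical helper, it takes any Runnable
as the flush action, and it relies on the producer having a no-argument
flush() method as in the latest code mentioned above. Inside the producer,
linger.ms already provides a similar time bound.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that runs a flush action with a fixed delay between runs. */
final class PeriodicFlusher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicFlusher(Runnable flushAction, long intervalMs) {
        Runnable safe = () -> {
            try {
                flushAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and go on.
                System.err.println("flush failed: " + e);
            }
        };
        // Fixed delay (not fixed rate): a slow, blocking flush never piles up runs.
        scheduler.scheduleWithFixedDelay(safe, intervalMs, intervalMs,
                                         TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a producer: count how many times "flush" was invoked.
        AtomicInteger flushes = new AtomicInteger();
        try (PeriodicFlusher flusher = new PeriodicFlusher(flushes::incrementAndGet, 10)) {
            Thread.sleep(300);
        }
        System.out.println("flush invoked " + flushes.get() + " times");
    }
}
```

With a real producer the action would be something like producer::flush.
Since flush() blocks until the buffered records have been sent, the interval
is a lower bound on the time between flushes, not an exact period.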

Please correct me if the analysis is wrong, and suggest how we can improve
the latencies of the new producer. Thanks.

-- 
Regards
Vamsi Subhash

--

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify the system manager. 
This message contains confidential information and is intended only for the 
individual named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. If you are not the intended recipient 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited. Although Flipkart has taken reasonable precautions to ensure no 
viruses are present in this email, the company cannot accept responsibility 
for any loss or damage arising from the use of this email or attachments