Re: [DISCUSS] KIP-782: Expandable batch size in producer

2022-05-02 Thread Jack Vanlightly
The current configs are hard for Kafka users to use and a little inflexible, so I am pleased to see the discussion. Ultimately we want flexibility: we don't want to force users to understand the underlying implementation/protocol, and we want the producer to handle high or low throughput

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-13 Thread Jun Rao
Hi, Lucas, Thanks for the reply. It would be useful to summarize the benefits of a separate batch.max.size. To me, it's not clear why a user would want two different batch sizes. In your example, I can understand why a user would want to form a batch with a 5ms linger. But why would a user prefer

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-10 Thread Lucas Bradstreet
Hi Jun, One difference compared to increasing the default batch size is that users may actually prefer smaller batches, but it makes much less sense to accumulate many small batches if a batch is already sending. For example, imagine a user that prefers 16K batches with a 5ms linger. Everything is
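A minimal sketch of the configuration this example describes, assuming the batch.max.size setting discussed in the KIP (illustrative name, not a config in released Kafka clients; an unknown config would only produce a warning):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SmallBatchPreference {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // Existing configs: prefer small 16KB batches with a 5ms linger.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);
            props.put(ProducerConfig.LINGER_MS_CONFIG, 5);

            // Proposed in the KIP discussion (hypothetical): allow a batch to keep
            // growing up to this cap while earlier batches are still in flight.
            props.put("batch.max.size", 256 * 1024);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // send records as usual; batching is governed by the configs above
            }
        }
    }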

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-10 Thread Jun Rao
Hi, Artem, Luke, Thanks for the reply. 11. If we get rid of batch.max.size and increase the default batch.size, it's true the behavior is slightly different than before. However, does that difference matter to most of our users? In your example, if a user sets linger.ms to 100ms and thinks 256KB

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-09 Thread Luke Chen
Hi Jun, 11. In addition to Artem's comment, I think the reason to have an additional "batch.max.size" is to give users more flexibility. For example: with linger.ms=100ms and batch.size=16KB, suppose 20KB of data arrives for a partition within 50ms. Now the sender is ready to pick up the batch to

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-08 Thread Artem Livshits
Hi Jun, 11. That was my initial thinking as well, but in a discussion some people pointed out a change of behavior in some scenarios. E.g. if someone for some reason really wants batches to be at least 16KB and sets a large linger.ms, and most of the time the batches are filled quickly enough

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-08 Thread Jun Rao
Hi, Artem, Thanks for the reply. 11. Got it. To me, batch.size is really used for throughput and not for latency guarantees. There is no guarantee when 16KB will be accumulated. So, if users want any latency guarantee, they will need to specify linger.ms accordingly. Then, batch.size can just be

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-08 Thread Artem Livshits
Hi Jun, 10. My understanding is that MemoryRecords would, under the covers, be allocated in chunks, so logically it would still be one MemoryRecords object; instead of allocating one large chunk upfront, smaller chunks are allocated as needed to grow the batch and linked into a list. 11.
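A rough sketch of the chunked-growth idea above (illustrative only, not the actual MemoryRecords internals): the logical batch keeps a list of smaller buffers and adds another chunk when the current one fills up, instead of allocating one large buffer upfront or copying into a bigger one.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative: a logical batch that grows by linking fixed-size chunks.
    class ChunkedBatchBuffer {
        private final int chunkSize;
        private final List<ByteBuffer> chunks = new ArrayList<>();

        ChunkedBatchBuffer(int initialChunkSize) {
            this.chunkSize = initialChunkSize;
            chunks.add(ByteBuffer.allocate(initialChunkSize));
        }

        void append(byte[] record) {
            ByteBuffer current = chunks.get(chunks.size() - 1);
            if (current.remaining() < record.length) {
                // Grow the batch with another chunk instead of reallocating and copying.
                current = ByteBuffer.allocate(Math.max(chunkSize, record.length));
                chunks.add(current);
            }
            current.put(record);
        }

        int sizeInBytes() {
            int total = 0;
            for (ByteBuffer chunk : chunks)
                total += chunk.position();
            return total;
        }
    }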

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-07 Thread Jun Rao
Hi, Luke, Thanks for the KIP. A few comments below. 10. Accumulating small batches could improve memory usage. Will that introduce extra copying when generating a produce request? Currently, a produce request takes a single MemoryRecords per partition. 11. Do we need to introduce a new config
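On the copying question, one way a chunk-list batch could still reach the network without being coalesced into a single contiguous buffer is a gathering write; a minimal sketch under that assumption (not necessarily how the producer's network layer works):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.GatheringByteChannel;
    import java.util.List;

    // Illustrative: hand all chunks of a batch to the channel in one call,
    // avoiding an extra copy into one large buffer.
    class ChunkedSend {
        static long writeTo(GatheringByteChannel channel, List<ByteBuffer> chunks) throws IOException {
            return channel.write(chunks.toArray(new ByteBuffer[0]));
        }
    }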

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-29 Thread Artem Livshits
Hi Luke, I don't mind increasing max.request.size to a higher number, e.g. 2MB could be good. I think we should also run some benchmarks to see the effects of different sizes. I agree that changing round robin to random solves an independent existing issue; however, the logic in this KIP

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-24 Thread Luke Chen
Hi Artem, Yes, I agree that if we go with random selection instead of round-robin selection, latency will be fairer across partitions. That is, with 10 partitions, the 10th partition will always be the last choice in each round in the current design, but with random selection, the chance to be selected
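A small sketch of that fairness point (hypothetical, not the actual RecordAccumulator code): starting the drain from a random index means no single partition is always visited last.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Illustrative: visit partitions starting from a random offset instead of a
    // fixed round-robin start, so the last partition is not always drained last.
    class DrainOrder {
        static <T> List<T> randomStartOrder(List<T> partitions) {
            if (partitions.isEmpty())
                return Collections.emptyList();
            int start = ThreadLocalRandom.current().nextInt(partitions.size());
            List<T> order = new ArrayList<>(partitions.size());
            for (int i = 0; i < partitions.size(); i++)
                order.add(partitions.get((start + i) % partitions.size()));
            return order;
        }
    }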

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-23 Thread Artem Livshits
> maybe I can firstly decrease the "batch.max.size" to 32KB I think 32KB is too small. With 5 in-flight and 100ms latency we can produce 1.6MB/s per partition. With 256KB we can produce 12.8MB/s per partition. We should probably set up some testing and see if 256KB has problems. To
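The arithmetic behind those numbers, assuming per-partition throughput is roughly the number of in-flight requests times the batch size divided by the round-trip latency:

    // Rough per-partition throughput estimate (KB and MB used loosely, base 1000).
    public class BatchThroughput {
        static double mbPerSecond(int inFlightRequests, int batchSizeKb, int latencyMs) {
            double kbPerSecond = inFlightRequests * (double) batchSizeKb * 1000.0 / latencyMs;
            return kbPerSecond / 1000.0;
        }

        public static void main(String[] args) {
            System.out.println(mbPerSecond(5, 32, 100));   // 1.6 MB/s with 32KB batches
            System.out.println(mbPerSecond(5, 256, 100));  // 12.8 MB/s with 256KB batches
        }
    }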

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-23 Thread Luke Chen
Hi Tom, Thanks for your comments. And thanks for Artem's explanation. Below is my response: > Currently, because buffers are allocated using batch.size, we can handle records that are that large (e.g. one big record per batch). Doesn't the introduction of smaller buffer sizes

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-22 Thread Artem Livshits
> I think this KIP would change the behaviour of producers when there are multiple partitions ready to be sent This is correct; the pattern changes and becomes more coarse-grained, but I don't think it changes fairness over the long run. I think it's a good idea to change drainIndex to be

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-22 Thread Tom Bentley
Hi Luke, Thanks for the KIP! Currently, because buffers are allocated using batch.size, we can handle records that are that large (e.g. one big record per batch). Doesn't the introduction of smaller buffer sizes (batch.initial.size) mean a corresponding decrease in the maximum record size
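One way to keep large records workable even with a smaller initial buffer is to size the first allocation to the larger of the initial batch size and the record itself; a sketch under that assumption (not necessarily what the KIP or the current producer does):

    import java.nio.ByteBuffer;

    // Illustrative: a single record bigger than the initial batch size still gets
    // a buffer large enough to hold it, forming a one-record batch.
    class BatchAllocation {
        static ByteBuffer allocateForFirstRecord(int batchInitialSizeBytes, int estimatedRecordBytes) {
            return ByteBuffer.allocate(Math.max(batchInitialSizeBytes, estimatedRecordBytes));
        }
    }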

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-10-20 Thread Luke Chen
Hi Ismael and all devs, Are there any comments/suggestions on this KIP? If not, I'm going to update the KIP based on my previous mail and start a vote tomorrow or next week. Thank you. Luke On Mon, Oct 18, 2021 at 2:40 PM Luke Chen wrote: > Hi Ismael, > Thanks for your comments. > > 1. Why do

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-10-18 Thread Luke Chen
Hi Ismael, Thanks for your comments. 1. Why do we have to reallocate the buffer? We can keep a list of buffers instead and avoid reallocation. -> Do you mean we allocate multiple buffers with "buffer.initial.size", and link them together (with linked list)? ex: a. We allocate 4KB initial buffer |

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-10-17 Thread Ismael Juma
I think we should also consider tweaking the semantics of batch.size so that the sent batches can be larger if the batch is not ready to be sent (while still respecting max.request.size and perhaps a new max.batch.size). Ismael On Sun, Oct 17, 2021, 12:08 PM Ismael Juma wrote: > Hi Luke, > >
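A sketch of that semantic tweak as an append-time check, assuming a hypothetical max.batch.size cap on top of the existing batch.size and max.request.size (names from this discussion, not a released config):

    // Illustrative append policy: a batch may grow past batch.size only while it
    // cannot be sent yet, bounded by a hypothetical max.batch.size and always by
    // max.request.size.
    class BatchGrowthPolicy {
        static boolean canAppend(int currentBatchBytes, int recordBytes, boolean readyToSend,
                                 int batchSize, int maxBatchSize, int maxRequestSize) {
            int newSize = currentBatchBytes + recordBytes;
            if (newSize > maxRequestSize)
                return false;                         // never exceed a single request
            if (newSize <= batchSize)
                return true;                          // within the preferred batch size
            return !readyToSend && newSize <= maxBatchSize;
        }
    }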

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-10-17 Thread Ismael Juma
Hi Luke, Thanks for the KIP. Why do we have to reallocate the buffer? We can keep a list of buffers instead and avoid reallocation. Ismael On Sun, Oct 17, 2021, 2:02 AM Luke Chen wrote: > Hi Kafka dev, > I'd like to start the discussion for the proposal: KIP-782: Expandable > batch size in

[DISCUSS] KIP-782: Expandable batch size in producer

2021-10-17 Thread Luke Chen
Hi Kafka dev, I'd like to start the discussion for the proposal KIP-782: Expandable batch size in producer. The main purpose of this KIP is to achieve better memory usage in the producer and to save users from the dilemma of setting the batch size configuration. After this KIP, users can set a
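For reference, a hedged sketch of how the proposal's configs might sit next to the existing ones, using the batch.initial.size name and the 4KB initial buffer mentioned in the thread (illustrative, not a config in released Kafka clients):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    class Kip782ConfigSketch {
        static Properties sketch() {
            Properties props = new Properties();
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024); // existing: send threshold per partition
            props.put(ProducerConfig.LINGER_MS_CONFIG, 100);        // existing: latency bound
            props.put("batch.initial.size", 4 * 1024);              // proposed: small first allocation that expands
            return props;
        }
    }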