The current configs are hard for the Kafka user to work with and a little
inflexible, so I am pleased to see the discussion.
Ultimately we want flexibility. We don't want to force users to understand the
underlying implementation/protocol, and we want the producer to handle both
high and low throughput
Hi, Lucas,
Thanks for the reply. It would be useful to summarize the benefits of a
separate batch.max.size. To me, it's not clear why a user would want two
different batch sizes. In your example, I can understand why a user would
want to form a batch with a 5ms linger. But why would a user prefer
Hi Jun,
One difference compared to increasing the default batch size is that users
may actually prefer smaller batches, but it makes much less sense to
accumulate many small batches if a batch is already sending.
For example, imagine a user that prefers 16K batches with a 5ms linger.
Everything is
Hi, Artem, Luke,
Thanks for the reply.
11. If we get rid of batch.max.size and increase the default batch.size,
it's true the behavior is slightly different than before. However, does
that difference matter to most of our users? In your example, if a user
sets linger.ms to 100ms and thinks 256KB
Hi Jun,
11. In addition to Artem's comment, I think the reason to have an additional
"batch.max.size" is to give users more flexibility.
For example:
With linger.ms=100ms and batch.size=16KB, suppose 20KB of data arrives at
a partition within 50ms. Now the sender is ready to pick up the batch to
Hi Jun,
11. That was my initial thinking as well, but in a discussion some people
pointed out the change of behavior in some scenarios. E.g. if someone for
some reason really wants batches to be at least 16KB and sets a large
linger.ms, and most of the time the batches are filled quickly enough
Hi, Artem,
Thanks for the reply.
11. Got it. To me, batch.size is really for throughput, not for
latency guarantees. There is no guarantee of when 16KB will be accumulated.
So, if users want any latency guarantee, they will need to specify
linger.ms accordingly.
Then, batch.size can just be
Hi Jun,
10. My understanding is that MemoryRecords would, under the covers, be
allocated in chunks, so logically it would still be one MemoryRecords
object; it's just that instead of allocating one large chunk upfront, smaller
chunks are allocated as needed to grow the batch and linked into a list.
11.
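Artem's chunked-allocation idea can be sketched roughly as follows. This is
an illustrative toy class (ChunkedRecords is a hypothetical name, not
Kafka's actual MemoryRecords implementation): one logical batch backed by a
list of fixed-size chunks that are allocated lazily as the batch grows.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of a batch that grows by linking small chunks instead of
// allocating one large buffer upfront (hypothetical, not Kafka code).
public class ChunkedRecords {
    private final int chunkSize;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    public ChunkedRecords(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Append record bytes, allocating a new chunk only when the last is full.
    public void append(byte[] record) {
        int offset = 0;
        while (offset < record.length) {
            ByteBuffer last = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
            if (last == null || !last.hasRemaining()) {
                last = ByteBuffer.allocate(chunkSize);
                chunks.add(last);
            }
            int n = Math.min(last.remaining(), record.length - offset);
            last.put(record, offset, n);
            offset += n;
        }
    }

    // Total bytes accumulated across all chunks.
    public int sizeInBytes() {
        int size = 0;
        for (ByteBuffer b : chunks) size += b.position();
        return size;
    }

    public int chunkCount() {
        return chunks.size();
    }
}
```

A batch that stays small never pays for a large allocation, which is the
memory-usage benefit discussed in point 10 above.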
Hi, Luke,
Thanks for the KIP. A few comments below.
10. Accumulating small batches could improve memory usage. Will that
introduce extra copying when generating a produce request? Currently, a
produce request takes a single MemoryRecords per partition.
11. Do we need to introduce a new config
Hi Luke,
I don't mind increasing the max.request.size to a higher number, e.g. 2MB
could be good. I think we should also run some benchmarks to see the
effects of different sizes.
I agree that changing round-robin to random solves an independent existing
issue; however, the logic in this KIP
Hi Artem,
Yes, I agree that if we go with random selection instead of round-robin
selection, latency will be fairer. That is, if there are 10
partitions, the 10th partition is always the last choice in each round
in the current design, but with random selection, the chance to be selected
> maybe I can firstly decrease the "batch.max.size" to 32KB
I think 32KB is too small. With 5 in-flight requests and 100ms latency we can
produce 1.6MB/s per partition. With 256KB we can produce 12.8MB/s per
partition. We should probably set up some testing and see if 256KB has
problems.
To
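The per-partition ceiling quoted above follows from in-flight requests
times batch size divided by round-trip latency. A quick back-of-envelope
check of those numbers (decimal MB, as in the mail; a sketch, not producer code):

```java
// Back-of-envelope check of the per-partition throughput ceiling: with N
// in-flight requests carrying one batch each over a round-trip latency L,
// a partition can absorb at most N * batchSize / L.
public class ThroughputCeiling {
    // batchKB in decimal kilobytes, latencyMs in milliseconds; result in MB/s.
    static double mbPerSec(int inFlight, int batchKB, int latencyMs) {
        return inFlight * batchKB * (1000.0 / latencyMs) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(mbPerSec(5, 32, 100));  // 1.6 MB/s with 32KB batches
        System.out.println(mbPerSec(5, 256, 100)); // 12.8 MB/s with 256KB batches
    }
}
```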
Hi Tom,
Thanks for your comments. And thanks for Artem's explanation.
Below is my response:
> Currently because buffers are allocated using batch.size it means we can
handle records that are that large (e.g. one big record per batch). Doesn't
the introduction of smaller buffer sizes
> I think this KIP would change the behaviour of producers when there are
multiple partitions ready to be sent
This is correct; the pattern changes and becomes more coarse-grained. But
I don't think it changes fairness over the long run. I think it's a good
idea to change drainIndex to be
Hi Luke,
Thanks for the KIP!
Currently because buffers are allocated using batch.size it means we can
handle records that are that large (e.g. one big record per batch). Doesn't
the introduction of smaller buffer sizes (batch.initial.size) mean a
corresponding decrease in the maximum record size
Hi Ismael and all devs,
Are there any comments/suggestions on this KIP?
If not, I'm going to update the KIP based on my previous mail and start a
vote tomorrow or next week.
Thank you.
Luke
On Mon, Oct 18, 2021 at 2:40 PM Luke Chen wrote:
> Hi Ismael,
> Thanks for your comments.
>
> 1. Why do
Hi Ismael,
Thanks for your comments.
1. Why do we have to reallocate the buffer? We can keep a list of buffers
instead and avoid reallocation.
-> Do you mean we allocate multiple buffers of "buffer.initial.size" and
link them together (in a linked list)?
ex:
a. We allocate 4KB initial buffer
|
I think we should also consider tweaking the semantics of batch.size so
that the sent batches can be larger if the batch is not ready to be sent
(while still respecting max.request.size and perhaps a new max.batch.size).
Ismael
On Sun, Oct 17, 2021, 12:08 PM Ismael Juma wrote:
> Hi Luke,
>
>
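The tweaked batch.size semantics Ismael suggests could be sketched as a
simple append rule. This is a hypothetical decision function (the names
maxBatchSize/maxRequestSize mirror the configs mentioned above; max.batch.size
is only a possible new config, and none of this is actual producer code):

```java
// Sketch of the suggested semantics: while a batch is not yet ready to
// drain, let it keep growing past batch.size, but never past the
// (hypothetical) max.batch.size or the existing max.request.size.
public class BatchGrowthPolicy {
    static boolean canAppend(int currentBatchBytes, int recordBytes,
                             boolean readyToSend, int batchSize,
                             int maxBatchSize, int maxRequestSize) {
        int newSize = currentBatchBytes + recordBytes;
        if (newSize > maxRequestSize || newSize > maxBatchSize) {
            return false; // hard ceilings always apply
        }
        // Once the batch is ready to send, stop growing at batch.size;
        // otherwise allow it to expand up to the hard ceilings above.
        return !readyToSend || newSize <= batchSize;
    }
}
```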
Hi Luke,
Thanks for the KIP. Why do we have to reallocate the buffer? We can keep a
list of buffers instead and avoid reallocation.
Ismael
On Sun, Oct 17, 2021, 2:02 AM Luke Chen wrote:
> Hi Kafka dev,
> I'd like to start the discussion for the proposal: KIP-782: Expandable
> batch size in
Hi Kafka dev,
I'd like to start the discussion for the proposal: KIP-782: Expandable
batch size in producer.
The main purpose for this KIP is to have better memory usage in producer,
and also save users from the dilemma while setting the batch size
configuration. After this KIP, users can set a
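Pulling together the configs discussed in this thread, a producer setup
under the proposal might look like the sketch below. batch.initial.size and
batch.max.size are the names proposed in this discussion, not released
configs, and the values are purely illustrative:

```java
import java.util.Properties;

// Illustrative config sketch for the KIP-782 proposal; the two "proposed"
// keys are from the mailing-list discussion, not a shipped Kafka release.
public class ProposedProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("batch.size", "16384");        // existing: target batch size
        props.setProperty("linger.ms", "100");           // existing: max wait before send
        props.setProperty("batch.initial.size", "4096"); // proposed: first chunk allocated
        props.setProperty("batch.max.size", "262144");   // proposed: ceiling a batch may grow to
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```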