Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-08-03 Thread Erik van Oosten

Hi Philip, Colin, Chris, Matthias, Kirk, David, Xiangyuan LI,

KIP-944 was extended a bit more to explain why effect systems like Zio 
and Cats-effects make it impossible to run code on a specific thread.


I understand that using an effect system is pretty far removed from 
writing Java in transaction script style, the style that is probably 
used by most Kafka committers. I took me quite some time to get 
comfortable with effects. It is not the academic fringe tool as 
perceived by many. For me it is a way to quickly and correctly write 
serious data processing applications.


Even so, we both use the same Kafka eco-system and supporting different 
styles only makes it more rich. IMHO it would be a shame if we can not 
live together using the same code base.


Philip, thanks for your support. I hope I have convinced the others as 
well by now. If not, I am giving up and I will spend my energy elsewhere.


Kind regards,
    Erik.


Op 24-07-2023 om 18:12 schreef Erik van Oosten:

Hello Xiangyuan LI,

I am not familiar with coroutines, nor with Kotlin. You will have to 
work with the documentation: 
https://kotlinlang.org/docs/coroutines-overview.html


However, I am familiar with Zio and Cats-effects (both Scala 
libraries). In both Zio and Cats-effects one creates effects (aka 
workflows) which are descriptions of a computation. For example, when 
executing the Scala code `val effect = ZIO.attempt(println("Hello 
world!"))` one creates only a description; it does not print anything 
yet. The language to describe these effects is rich enough to describe 
entire applications including things like concurrency. In fact, the 
language is so rich, that it is the most convenient way that I know to 
safely write highly concurrent and async applications. For many 
developer teams the performance penalty (which is real but not big) is 
worth it.


To execute a Zio or Cats effect one gives it to the runtime. The 
runtime then schedules the work on one of the threads in its 
thread-pool. Zio, nor Cats-effects supports running an effect on the 
thread that manages the thread-pool.


I hope this clear enough.

Kind regards,
    Erik.


Op 24-07-2023 om 05:21 schreef Xiangyuan LI:

Hi Erik:
 I read KIP-944 and email list roughly, it seems most Java 
developer not
familiar with the conception of "coroutine" so cannot imagine why 
code of

one function without Thread.start() may run in separate threads and even
developer couldn't control it. Maybe you need a more elaborate 
description

to demonstrate how coroutine code run.

Erik van Oosten  于2023年7月23日周日 
17:47写道:



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-24 Thread Erik van Oosten
nor the
consumer is

fully

nonblocking. There are some corner cases where we do in

fact

block.

From

memory, the producer blocks in some "buffer full" cases,
and the
consumer blocks sometimes when fetching metadata.

I am aware of that. This is not an issue; all async
runtimes can
cooperate with blocking code.


Hmm, not sure what you mean by "cooperate with blocking

code."

If you

have 10 green threads you're multiplexing on to one CPU

thread,

and that
CPU thread gets blocked because of what one green thread is
doing, the
other 9 green threads are blocked too, right? I guess it's
"just" a
performance problem, but it still seems like it could be a
serious one.

  > I suspect it would be more appropriate for Kotlin
coroutines, Zio

coroutines and so on to adopt this "pass messages to and
from a
background worker thread" model than to try to re-engineer
the Kafka
client ot work from random threads.

In both zio-kafka and fs2-kafka this is already the approach
we are

taking.

Unfortunately, the Kafka consumer forces us to perform some
work in
callbacks:

   * commit completed callback: register that the
callback is
complete,
   * partition revoked callback: in this callback we

need to

submit
 commits from everything consumed and processed so

far,

using
 timeouts if processing takes to long. In an async
runtime, this is
 an inherently multi-threaded process. Especially, we
cannot do
 timeouts without involving multiple threads.


I don't see why this has to be "inherently multi-threaded."

Why

can't

we have the other threads report back what messages they've
processed to
the worker thread. Then it will be able to handle these
callbacks without
involving the other threads.

regards,
Colin


I have extended the KIP's motivation to explain the major

use

case.

Please read KIP-944 again. Even though the description is
extensive
(this callback from callback stuff is tricky), you will find
that my
goals are modest.

Also the implementation is just a few lines. With
understanding of the
idea it should not be a lot of work to follow it.

Kind regards,
  Erik.


Op 07-07-2023 om 19:57 schreef Colin McCabe:

Hi Erik,

It's not clear ot me that it's safe to access the Kafka
consumer or

producer concurrently from different threads. There are data
structures
that aren't protected by locks, so I wouldn't necessarily

expect

accessing
and mutating them in a concurrent way to work. This is true

even

if the
accesses happen at different times, because modern CPUs

require

memory
barriers to guarantee inter-thread visibilty of loads and
stores.

I am writing this is without doing a detailed dive into the
code (I

haven't been into the consumer / producer code in a bit.)
Someone who has
worked more on the consumer recently might be able to give
specific
examples of things that wouldn't work.

I know that there are at least a few locks in the consumer
code now,

due to our need to send heartbeats from a worker thread. I

don't

think
those would be sufficient to protect a client that is making
calls from
random threads.

There has been some discussion of moving to a more
traditional model

where people make calls to the client and the clients passes

the

given data
to a single background worker thread. This would avoid a lot

lof

the
footguns of the current model and probably better reflect how
people
actually use the client.

Another issue is that neither the producer nor the consumer
is fully

nonblocking. There are some corner cases where we do in fact
block. From
memory, the producer blocks in some "buffer full" cases, and

the

consumer
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin

coroutines,

Zio

coroutines and so on to adopt this "pass messages to and from

a

background
worker thread" model  than to try to re-engineer the Kafka
client ot work
from random threads.

There is actually somed good  advice about how to handle
multiple

threads in the KafkaConsumer.java header file itself. Check

the

sections
"One Consumer Per Thread" and "Decouple Consumption and
Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick
response.

However, overall I think this one needs a lot more discussion
before
getting anywhere near a vote. I will leave a -1 just as a
procedural step.
Maybe some of the people working in the client area can also
chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I

understand

this may

not

be something that keeps you busy, but this KIP is

important

to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
   Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hell

Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-23 Thread Xiangyuan LI
gt;>>>>> another thread.*
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> To make sure there are no more misunderstandings about
> this, I
> >>>>>>>>>>>>>> have
> >>>>>>>>>>>>>> added this goal to the KIP.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Erik,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry, I spoke imprecisely. My concern is not concurrent
> >>>>>>>>>>>>> access, but
> >>>>>>>>>>>> multithreaded access in general. Basically cache line
> visibility
> >>>>>>>>>>>> issues.
> >>>>>>>>>>>>>>  > This is true even if the accesses happen at different
> >>>>>>>>>>>>>> times,
> >>>>>>>>>>>> because
> >>>>>>>>>>>>>>> modern CPUs require memory barriers to guarantee
> inter-thread
> >>>>>>>>>>>> visibilty
> >>>>>>>>>>>>>>> of loads and stores.
> >>>>>>>>>>>>>> In KIP-944, the callback thread can only delegate to another
> >>>>>>>>>>>>>> thread
> >>>>>>>>>>>>>> after reading from and writing to a threadlocal variable,
> >>>>>>>>>>>>>> providing the
> >>>>>>>>>>>>>> barriers right there.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't see any documentation that accessing thread local
> >>>>>>>>>>>>> variables
> >>>>>>>>>>>> provides a total store or load barrier. Do you have such
> >>>>>>>>>>>> documentation? It
> >>>>>>>>>>>> seems like if this were the case, we could eliminate volatile
> >>>>>>>>>>>> variables
> >>>>>>>>>>>> from most of the code base.
> >>>>>>>>>>>>>>  > I know that there are at least a few locks in the
> >>>>>>>>>>>>>> consumer code
> >>>>>>>>>>>> now,
> >>>>>>>>>>>>>>> due to our need to send heartbeats from a worker thread. I
> >>>>>>>>>>>>>>> don't think
> >>>>>>>>>>>>>>> those would be sufficient to protect a client that is
> making
> >>>>>>>>>>>>>>> calls
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>>> random threads.
> >>>>>>>>>>>>>> In the current implementation the consumer is also invoked
> >>>>>>>>>>>>>> from random
> >>>>>>>>>>>>>> threads. If it works now, it should continue to work.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not sure what you're referring to. Can you expand on
> this?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>  > There has been some discussion of moving to a more
> >>>>>>>>>>>>>> traditional
> >>>>>>>>>>>> model
> >>>>>>>>>>>>>>> where people make calls to the client and the clients
> passes
> >>>>>>>>>>>>>>> the given
> >>>>>>>>>>>>>>> data to a single background worker thread. This would
> avoid a
> >>>>>>>>>>>>>>> lot lof
> >>>>>>>>>>>>>>> the footguns of the current model and probably better
> reflect
> >>>>>>>>>>>>>>> how
> >>>>>>>>>>>> people
> >>>>>>>>>>>>>>> actually use the client.
> >>>>>>>>>>>>>> That is awesome. However, I'd rather not wait for that.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  > Another issue is that neither the producer nor the
> >>>>>>>>>>>>>> consumer is
> >>>>>>>>>>>> fully
> >>>>>>>>>>>>>>> nonblocking. There are some corner cases where we do in
> fact
> >>>>>>>>>>>>>>> block.
> >>>>>>>>>>>> From
> >>>>>>>>>>>>>>> memory, the producer blocks in some "buffer full" cases,
> >>>>>>>>>>>>>>> and the
> >>>>>>>>>>>>>>> consumer blocks sometimes when fetching metadata.
> >>>>>>>>>>>>>> I am aware of that. This is not an issue; all async
> >>>>>>>>>>>>>> runtimes can
> >>>>>>>>>>>>>> cooperate with blocking code.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Hmm, not sure what you mean by "cooperate with blocking
> code."
> >>>>>>>>>>>>> If you
> >>>>>>>>>>>> have 10 green threads you're multiplexing on to one CPU
> thread,
> >>>>>>>>>>>> and that
> >>>>>>>>>>>> CPU thread gets blocked because of what one green thread is
> >>>>>>>>>>>> doing, the
> >>>>>>>>>>>> other 9 green threads are blocked too, right? I guess it's
> >>>>>>>>>>>> "just" a
> >>>>>>>>>>>> performance problem, but it still seems like it could be a
> >>>>>>>>>>>> serious one.
> >>>>>>>>>>>>>>  > I suspect it would be more appropriate for Kotlin
> >>>>>>>>>>>>>> coroutines, Zio
> >>>>>>>>>>>>>>> coroutines and so on to adopt this "pass messages to and
> >>>>>>>>>>>>>>> from a
> >>>>>>>>>>>>>>> background worker thread" model than to try to re-engineer
> >>>>>>>>>>>>>>> the Kafka
> >>>>>>>>>>>>>>> client ot work from random threads.
> >>>>>>>>>>>>>> In both zio-kafka and fs2-kafka this is already the approach
> >>>>>>>>>>>>>> we are
> >>>>>>>>>>>> taking.
> >>>>>>>>>>>>>> Unfortunately, the Kafka consumer forces us to perform some
> >>>>>>>>>>>>>> work in
> >>>>>>>>>>>>>> callbacks:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   * commit completed callback: register that the
> >>>>>>>>>>>>>> callback is
> >>>>>>>>>>>>>> complete,
> >>>>>>>>>>>>>>   * partition revoked callback: in this callback we
> need to
> >>>>>>>>>>>>>> submit
> >>>>>>>>>>>>>> commits from everything consumed and processed so
> far,
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> timeouts if processing takes to long. In an async
> >>>>>>>>>>>>>> runtime, this is
> >>>>>>>>>>>>>> an inherently multi-threaded process. Especially, we
> >>>>>>>>>>>>>> cannot do
> >>>>>>>>>>>>>> timeouts without involving multiple threads.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't see why this has to be "inherently multi-threaded."
> Why
> >>>>>>>>>>>>> can't
> >>>>>>>>>>>> we have the other threads report back what messages they've
> >>>>>>>>>>>> processed to
> >>>>>>>>>>>> the worker thread. Then it will be able to handle these
> >>>>>>>>>>>> callbacks without
> >>>>>>>>>>>> involving the other threads.
> >>>>>>>>>>>>> regards,
> >>>>>>>>>>>>> Colin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have extended the KIP's motivation to explain the major
> use
> >>>>>>>>>>>>>> case.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please read KIP-944 again. Even though the description is
> >>>>>>>>>>>>>> extensive
> >>>>>>>>>>>>>> (this callback from callback stuff is tricky), you will find
> >>>>>>>>>>>>>> that my
> >>>>>>>>>>>>>> goals are modest.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Also the implementation is just a few lines. With
> >>>>>>>>>>>>>> understanding of the
> >>>>>>>>>>>>>> idea it should not be a lot of work to follow it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>  Erik.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Op 07-07-2023 om 19:57 schreef Colin McCabe:
> >>>>>>>>>>>>>>> Hi Erik,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It's not clear ot me that it's safe to access the Kafka
> >>>>>>>>>>>>>>> consumer or
> >>>>>>>>>>>> producer concurrently from different threads. There are data
> >>>>>>>>>>>> structures
> >>>>>>>>>>>> that aren't protected by locks, so I wouldn't necessarily
> expect
> >>>>>>>>>>>> accessing
> >>>>>>>>>>>> and mutating them in a concurrent way to work. This is true
> even
> >>>>>>>>>>>> if the
> >>>>>>>>>>>> accesses happen at different times, because modern CPUs
> require
> >>>>>>>>>>>> memory
> >>>>>>>>>>>> barriers to guarantee inter-thread visibilty of loads and
> >>>>>>>>>>>> stores.
> >>>>>>>>>>>>>>> I am writing this is without doing a detailed dive into the
> >>>>>>>>>>>>>>> code (I
> >>>>>>>>>>>> haven't been into the consumer / producer code in a bit.)
> >>>>>>>>>>>> Someone who has
> >>>>>>>>>>>> worked more on the consumer recently might be able to give
> >>>>>>>>>>>> specific
> >>>>>>>>>>>> examples of things that wouldn't work.
> >>>>>>>>>>>>>>> I know that there are at least a few locks in the consumer
> >>>>>>>>>>>>>>> code now,
> >>>>>>>>>>>> due to our need to send heartbeats from a worker thread. I
> don't
> >>>>>>>>>>>> think
> >>>>>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>>>>> calls from
> >>>>>>>>>>>> random threads.
> >>>>>>>>>>>>>>> There has been some discussion of moving to a more
> >>>>>>>>>>>>>>> traditional model
> >>>>>>>>>>>> where people make calls to the client and the clients passes
> the
> >>>>>>>>>>>> given data
> >>>>>>>>>>>> to a single background worker thread. This would avoid a lot
> lof
> >>>>>>>>>>>> the
> >>>>>>>>>>>> footguns of the current model and probably better reflect how
> >>>>>>>>>>>> people
> >>>>>>>>>>>> actually use the client.
> >>>>>>>>>>>>>>> Another issue is that neither the producer nor the consumer
> >>>>>>>>>>>>>>> is fully
> >>>>>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>>>>> block. From
> >>>>>>>>>>>> memory, the producer blocks in some "buffer full" cases, and
> the
> >>>>>>>>>>>> consumer
> >>>>>>>>>>>> blocks sometimes when fetching metadata.
> >>>>>>>>>>>>>>> I suspect it would be more appropriate for Kotlin
> coroutines,
> >>>>>>>>>>>>>>> Zio
> >>>>>>>>>>>> coroutines and so on to adopt this "pass messages to and from
> a
> >>>>>>>>>>>> background
> >>>>>>>>>>>> worker thread" model  than to try to re-engineer the Kafka
> >>>>>>>>>>>> client ot work
> >>>>>>>>>>>> from random threads.
> >>>>>>>>>>>>>>> There is actually somed good  advice about how to handle
> >>>>>>>>>>>>>>> multiple
> >>>>>>>>>>>> threads in the KafkaConsumer.java header file itself. Check
> the
> >>>>>>>>>>>> sections
> >>>>>>>>>>>> "One Consumer Per Thread" and "Decouple Consumption and
> >>>>>>>>>>>> Processing." What
> >>>>>>>>>>>> I'm recommending here is essentially the latter.
> >>>>>>>>>>>>>>> I do understand that it's frustrating to not get a quick
> >>>>>>>>>>>>>>> response.
> >>>>>>>>>>>> However, overall I think this one needs a lot more discussion
> >>>>>>>>>>>> before
> >>>>>>>>>>>> getting anywhere near a vote. I will leave a -1 just as a
> >>>>>>>>>>>> procedural step.
> >>>>>>>>>>>> Maybe some of the people working in the client area can also
> >>>>>>>>>>>> chime in.
> >>>>>>>>>>>>>>> best,
> >>>>>>>>>>>>>>> Colin
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> >>>>>>>>>>>>>>>> Dear PMCs,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So far there have been 0 responses to KIP-944. I
> understand
> >>>>>>>>>>>>>>>> this may
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>>>> be something that keeps you busy, but this KIP is
> important
> >>>>>>>>>>>>>>>> to people
> >>>>>>>>>>>>>>>> that use async runtimes like Zio, Cats and Kotlin.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Is there anything you need to come to a decision?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>   Erik.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
> >>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'd like to call a vote on KIP-944 Support async
> >>>>>>>>>>>>>>>>> runtimes in
> >>>>>>>>>>>> consumer.
> >>>>>>>>>>>>>>>>> It has has been 'under discussion' for 7 days now. 'Under
> >>>>>>>>>>>> discussion'
> >>>>>>>>>>>>>>>>> between quotes, because there were 0 comments so far. I
> >>>>>>>>>>>>>>>>> hope the KIP
> >>>>>>>>>>>>>>>>> is clear!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>> description:https://cwiki.apache.org/confluence/x/chw0Dw
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>  Erik.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>
>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-23 Thread Erik van Oosten
ill be able to handle these
callbacks without
involving the other threads.

regards,
Colin


I have extended the KIP's motivation to explain the major use
case.

Please read KIP-944 again. Even though the description is
extensive
(this callback from callback stuff is tricky), you will find
that my
goals are modest.

Also the implementation is just a few lines. With
understanding of the
idea it should not be a lot of work to follow it.

Kind regards,
 Erik.


Op 07-07-2023 om 19:57 schreef Colin McCabe:

Hi Erik,

It's not clear ot me that it's safe to access the Kafka
consumer or

producer concurrently from different threads. There are data
structures
that aren't protected by locks, so I wouldn't necessarily expect
accessing
and mutating them in a concurrent way to work. This is true even
if the
accesses happen at different times, because modern CPUs require
memory
barriers to guarantee inter-thread visibilty of loads and
stores.

I am writing this is without doing a detailed dive into the
code (I

haven't been into the consumer / producer code in a bit.)
Someone who has
worked more on the consumer recently might be able to give
specific
examples of things that wouldn't work.

I know that there are at least a few locks in the consumer
code now,

due to our need to send heartbeats from a worker thread. I don't
think
those would be sufficient to protect a client that is making
calls from
random threads.

There has been some discussion of moving to a more
traditional model

where people make calls to the client and the clients passes the
given data
to a single background worker thread. This would avoid a lot lof
the
footguns of the current model and probably better reflect how
people
actually use the client.

Another issue is that neither the producer nor the consumer
is fully

nonblocking. There are some corner cases where we do in fact
block. From
memory, the producer blocks in some "buffer full" cases, and the
consumer
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines,
Zio

coroutines and so on to adopt this "pass messages to and from a
background
worker thread" model  than to try to re-engineer the Kafka
client ot work
from random threads.

There is actually somed good  advice about how to handle
multiple

threads in the KafkaConsumer.java header file itself. Check the
sections
"One Consumer Per Thread" and "Decouple Consumption and
Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick
response.

However, overall I think this one needs a lot more discussion
before
getting anywhere near a vote. I will leave a -1 just as a
procedural step.
Maybe some of the people working in the client area can also
chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand
this may

not

be something that keeps you busy, but this KIP is important
to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
  Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async
runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I
hope the KIP
is clear!

KIP
description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
 Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread David Jacot
gt;>>>>>>> > It's not clear ot me that it's safe to access the Kafka
> >>>>>>>>>>>> consumer or
> >>>>>>>>>>>>> producer concurrently from different threads.
> >>>>>>>>>>>> Concurrent access is /not/ a design goal of KIP-944. In fact,
> >>>>>>>>>>>> it goes
> >>>>>>>>>>>> through great lengths to make sure that this cannot happen.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *The only design goal is to allow callbacks to call the
> >>>>>>>>>>>> consumer from
> >>>>>>>>>>>> another thread.*
> >>>>>>>>>>>>
> >>>>>>>>>>>> To make sure there are no more misunderstandings about this, I
> >>>>>>>>>>>> have
> >>>>>>>>>>>> added this goal to the KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>> Hi Erik,
> >>>>>>>>>>>
> >>>>>>>>>>> Sorry, I spoke imprecisely. My concern is not concurrent
> >>>>>>>>>>> access, but
> >>>>>>>>>> multithreaded access in general. Basically cache line visibility
> >>>>>>>>>> issues.
> >>>>>>>>>>>> > This is true even if the accesses happen at different
> >>>>>>>>>>>> times,
> >>>>>>>>>> because
> >>>>>>>>>>>>> modern CPUs require memory barriers to guarantee inter-thread
> >>>>>>>>>> visibilty
> >>>>>>>>>>>>> of loads and stores.
> >>>>>>>>>>>> In KIP-944, the callback thread can only delegate to another
> >>>>>>>>>>>> thread
> >>>>>>>>>>>> after reading from and writing to a threadlocal variable,
> >>>>>>>>>>>> providing the
> >>>>>>>>>>>> barriers right there.
> >>>>>>>>>>>>
> >>>>>>>>>>> I don't see any documentation that accessing thread local
> >>>>>>>>>>> variables
> >>>>>>>>>> provides a total store or load barrier. Do you have such
> >>>>>>>>>> documentation? It
> >>>>>>>>>> seems like if this were the case, we could eliminate volatile
> >>>>>>>>>> variables
> >>>>>>>>>> from most of the code base.
> >>>>>>>>>>>> > I know that there are at least a few locks in the
> >>>>>>>>>>>> consumer code
> >>>>>>>>>> now,
> >>>>>>>>>>>>> due to our need to send heartbeats from a worker thread. I
> >>>>>>>>>>>>> don't think
> >>>>>>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>>>>>> calls
> >>>>>>>>>> from
> >>>>>>>>>>>>> random threads.
> >>>>>>>>>>>> In the current implementation the consumer is also invoked
> >>>>>>>>>>>> from random
> >>>>>>>>>>>> threads. If it works now, it should continue to work.
> >>>>>>>>>>>>
> >>>>>>>>>>> I'm not sure what you're referring to. Can you expand on this?
> >>>>>>>>>>>
> >>>>>>>>>>>> > There has been some discussion of moving to a more
> >>>>>>>>>>>> traditional
> >>>>>>>>>> model
> >>>>>>>>>>>>> where people make calls to the client and the clients passes
> >>>>>>>>>>>>> the given
> >>>>>>>>>>>>> data to a single background worker thread. This would avoid a
> >>>>>>>>>>>>> lot lof
> >>>>>>>>>>>>> the footguns of the current model and probably better reflect
> >>>>>>>>>>>>> how
> >>>>>>>>>> people
> >>>>>>>>>>>>> actually use the client.
> >>>>>>>>>>>> That is awesome. However, I'd rather not wait for that.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > Another issue is that neither the producer nor the
> >>>>>>>>>>>> consumer is
> >>>>>>>>>> fully
> >>>>>>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>>>>>> block.
> >>>>>>>>>> From
> >>>>>>>>>>>>> memory, the producer blocks in some "buffer full" cases,
> >>>>>>>>>>>>> and the
> >>>>>>>>>>>>> consumer blocks sometimes when fetching metadata.
> >>>>>>>>>>>> I am aware of that. This is not an issue; all async
> >>>>>>>>>>>> runtimes can
> >>>>>>>>>>>> cooperate with blocking code.
> >>>>>>>>>>>>
> >>>>>>>>>>> Hmm, not sure what you mean by "cooperate with blocking code."
> >>>>>>>>>>> If you
> >>>>>>>>>> have 10 green threads you're multiplexing on to one CPU thread,
> >>>>>>>>>> and that
> >>>>>>>>>> CPU thread gets blocked because of what one green thread is
> >>>>>>>>>> doing, the
> >>>>>>>>>> other 9 green threads are blocked too, right? I guess it's
> >>>>>>>>>> "just" a
> >>>>>>>>>> performance problem, but it still seems like it could be a
> >>>>>>>>>> serious one.
> >>>>>>>>>>>> > I suspect it would be more appropriate for Kotlin
> >>>>>>>>>>>> coroutines, Zio
> >>>>>>>>>>>>> coroutines and so on to adopt this "pass messages to and
> >>>>>>>>>>>>> from a
> >>>>>>>>>>>>> background worker thread" model than to try to re-engineer
> >>>>>>>>>>>>> the Kafka
> >>>>>>>>>>>>> client ot work from random threads.
> >>>>>>>>>>>> In both zio-kafka and fs2-kafka this is already the approach
> >>>>>>>>>>>> we are
> >>>>>>>>>> taking.
> >>>>>>>>>>>> Unfortunately, the Kafka consumer forces us to perform some
> >>>>>>>>>>>> work in
> >>>>>>>>>>>> callbacks:
> >>>>>>>>>>>>
> >>>>>>>>>>>>  * commit completed callback: register that the
> >>>>>>>>>>>> callback is
> >>>>>>>>>>>> complete,
> >>>>>>>>>>>>  * partition revoked callback: in this callback we need to
> >>>>>>>>>>>> submit
> >>>>>>>>>>>>commits from everything consumed and processed so far,
> >>>>>>>>>>>> using
> >>>>>>>>>>>>timeouts if processing takes to long. In an async
> >>>>>>>>>>>> runtime, this is
> >>>>>>>>>>>>an inherently multi-threaded process. Especially, we
> >>>>>>>>>>>> cannot do
> >>>>>>>>>>>>timeouts without involving multiple threads.
> >>>>>>>>>>>>
> >>>>>>>>>>> I don't see why this has to be "inherently multi-threaded." Why
> >>>>>>>>>>> can't
> >>>>>>>>>> we have the other threads report back what messages they've
> >>>>>>>>>> processed to
> >>>>>>>>>> the worker thread. Then it will be able to handle these
> >>>>>>>>>> callbacks without
> >>>>>>>>>> involving the other threads.
> >>>>>>>>>>> regards,
> >>>>>>>>>>> Colin
> >>>>>>>>>>>
> >>>>>>>>>>>> I have extended the KIP's motivation to explain the major use
> >>>>>>>>>>>> case.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please read KIP-944 again. Even though the description is
> >>>>>>>>>>>> extensive
> >>>>>>>>>>>> (this callback from callback stuff is tricky), you will find
> >>>>>>>>>>>> that my
> >>>>>>>>>>>> goals are modest.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also the implementation is just a few lines. With
> >>>>>>>>>>>> understanding of the
> >>>>>>>>>>>> idea it should not be a lot of work to follow it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>> Erik.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Op 07-07-2023 om 19:57 schreef Colin McCabe:
> >>>>>>>>>>>>> Hi Erik,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It's not clear ot me that it's safe to access the Kafka
> >>>>>>>>>>>>> consumer or
> >>>>>>>>>> producer concurrently from different threads. There are data
> >>>>>>>>>> structures
> >>>>>>>>>> that aren't protected by locks, so I wouldn't necessarily expect
> >>>>>>>>>> accessing
> >>>>>>>>>> and mutating them in a concurrent way to work. This is true even
> >>>>>>>>>> if the
> >>>>>>>>>> accesses happen at different times, because modern CPUs require
> >>>>>>>>>> memory
> >>>>>>>>>> barriers to guarantee inter-thread visibilty of loads and
> >>>>>>>>>> stores.
> >>>>>>>>>>>>> I am writing this is without doing a detailed dive into the
> >>>>>>>>>>>>> code (I
> >>>>>>>>>> haven't been into the consumer / producer code in a bit.)
> >>>>>>>>>> Someone who has
> >>>>>>>>>> worked more on the consumer recently might be able to give
> >>>>>>>>>> specific
> >>>>>>>>>> examples of things that wouldn't work.
> >>>>>>>>>>>>> I know that there are at least a few locks in the consumer
> >>>>>>>>>>>>> code now,
> >>>>>>>>>> due to our need to send heartbeats from a worker thread. I don't
> >>>>>>>>>> think
> >>>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>>> calls from
> >>>>>>>>>> random threads.
> >>>>>>>>>>>>> There has been some discussion of moving to a more
> >>>>>>>>>>>>> traditional model
> >>>>>>>>>> where people make calls to the client and the clients passes the
> >>>>>>>>>> given data
> >>>>>>>>>> to a single background worker thread. This would avoid a lot lof
> >>>>>>>>>> the
> >>>>>>>>>> footguns of the current model and probably better reflect how
> >>>>>>>>>> people
> >>>>>>>>>> actually use the client.
> >>>>>>>>>>>>> Another issue is that neither the producer nor the consumer
> >>>>>>>>>>>>> is fully
> >>>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>>> block. From
> >>>>>>>>>> memory, the producer blocks in some "buffer full" cases, and the
> >>>>>>>>>> consumer
> >>>>>>>>>> blocks sometimes when fetching metadata.
> >>>>>>>>>>>>> I suspect it would be more appropriate for Kotlin coroutines,
> >>>>>>>>>>>>> Zio
> >>>>>>>>>> coroutines and so on to adopt this "pass messages to and from a
> >>>>>>>>>> background
> >>>>>>>>>> worker thread" model  than to try to re-engineer the Kafka
> >>>>>>>>>> client ot work
> >>>>>>>>>> from random threads.
> >>>>>>>>>>>>> There is actually somed good  advice about how to handle
> >>>>>>>>>>>>> multiple
> >>>>>>>>>> threads in the KafkaConsumer.java header file itself. Check the
> >>>>>>>>>> sections
> >>>>>>>>>> "One Consumer Per Thread" and "Decouple Consumption and
> >>>>>>>>>> Processing." What
> >>>>>>>>>> I'm recommending here is essentially the latter.
> >>>>>>>>>>>>> I do understand that it's frustrating to not get a quick
> >>>>>>>>>>>>> response.
> >>>>>>>>>> However, overall I think this one needs a lot more discussion
> >>>>>>>>>> before
> >>>>>>>>>> getting anywhere near a vote. I will leave a -1 just as a
> >>>>>>>>>> procedural step.
> >>>>>>>>>> Maybe some of the people working in the client area can also
> >>>>>>>>>> chime in.
> >>>>>>>>>>>>> best,
> >>>>>>>>>>>>> Colin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> >>>>>>>>>>>>>> Dear PMCs,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So far there have been 0 responses to KIP-944. I understand
> >>>>>>>>>>>>>> this may
> >>>>>>>>>> not
> >>>>>>>>>>>>>> be something that keeps you busy, but this KIP is important
> >>>>>>>>>>>>>> to people
> >>>>>>>>>>>>>> that use async runtimes like Zio, Cats and Kotlin.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is there anything you need to come to a decision?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>  Erik.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
> >>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'd like to call a vote on KIP-944 Support async
> >>>>>>>>>>>>>>> runtimes in
> >>>>>>>>>> consumer.
> >>>>>>>>>>>>>>> It has has been 'under discussion' for 7 days now. 'Under
> >>>>>>>>>> discussion'
> >>>>>>>>>>>>>>> between quotes, because there were 0 comments so far. I
> >>>>>>>>>>>>>>> hope the KIP
> >>>>>>>>>>>>>>> is clear!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>> description:https://cwiki.apache.org/confluence/x/chw0Dw
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>> Erik.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>
>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Erik van Oosten
s where we do in fact
block. From
memory, the producer blocks in some "buffer full" cases, and the
consumer
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines,
Zio

coroutines and so on to adopt this "pass messages to and from a
background
worker thread" model  than to try to re-engineer the Kafka
client ot work
from random threads.

There is actually somed good  advice about how to handle
multiple

threads in the KafkaConsumer.java header file itself. Check the
sections
"One Consumer Per Thread" and "Decouple Consumption and
Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick
response.

However, overall I think this one needs a lot more discussion
before
getting anywhere near a vote. I will leave a -1 just as a
procedural step.
Maybe some of the people working in the client area can also
chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand
this may

not

be something that keeps you busy, but this KIP is important
to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
 Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async 
runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I
hope the KIP
is clear!

KIP 
description:https://cwiki.apache.org/confluence/x/chw0Dw


Kind regards,
    Erik.





--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Chris Egerton
gt;>>>>>>>> don't think
> >>>>>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>>>>> calls
> >>>>>>>>> from
> >>>>>>>>>>>> random threads.
> >>>>>>>>>>> In the current implementation the consumer is also invoked
> >>>>>>>>>>> from random
> >>>>>>>>>>> threads. If it works now, it should continue to work.
> >>>>>>>>>>>
> >>>>>>>>>> I'm not sure what you're referring to. Can you expand on this?
> >>>>>>>>>>
> >>>>>>>>>>> > There has been some discussion of moving to a more
> >>>>>>>>>>> traditional
> >>>>>>>>> model
> >>>>>>>>>>>> where people make calls to the client and the clients passes
> >>>>>>>>>>>> the given
> >>>>>>>>>>>> data to a single background worker thread. This would avoid a
> >>>>>>>>>>>> lot lof
> >>>>>>>>>>>> the footguns of the current model and probably better reflect
> >>>>>>>>>>>> how
> >>>>>>>>> people
> >>>>>>>>>>>> actually use the client.
> >>>>>>>>>>> That is awesome. However, I'd rather not wait for that.
> >>>>>>>>>>>
> >>>>>>>>>>> > Another issue is that neither the producer nor the
> >>>>>>>>>>> consumer is
> >>>>>>>>> fully
> >>>>>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>>>>> block.
> >>>>>>>>> From
> >>>>>>>>>>>> memory, the producer blocks in some "buffer full" cases, and
> the
> >>>>>>>>>>>> consumer blocks sometimes when fetching metadata.
> >>>>>>>>>>> I am aware of that. This is not an issue; all async runtimes
> can
> >>>>>>>>>>> cooperate with blocking code.
> >>>>>>>>>>>
> >>>>>>>>>> Hmm, not sure what you mean by "cooperate with blocking code."
> >>>>>>>>>> If you
> >>>>>>>>> have 10 green threads you're multiplexing on to one CPU thread,
> >>>>>>>>> and that
> >>>>>>>>> CPU thread gets blocked because of what one green thread is
> >>>>>>>>> doing, the
> >>>>>>>>> other 9 green threads are blocked too, right? I guess it's
> "just" a
> >>>>>>>>> performance problem, but it still seems like it could be a
> >>>>>>>>> serious one.
> >>>>>>>>>>> > I suspect it would be more appropriate for Kotlin
> >>>>>>>>>>> coroutines, Zio
> >>>>>>>>>>>> coroutines and so on to adopt this "pass messages to and from
> a
> >>>>>>>>>>>> background worker thread" model than to try to re-engineer
> >>>>>>>>>>>> the Kafka
> >>>>>>>>>>>> client ot work from random threads.
> >>>>>>>>>>> In both zio-kafka and fs2-kafka this is already the approach
> >>>>>>>>>>> we are
> >>>>>>>>> taking.
> >>>>>>>>>>> Unfortunately, the Kafka consumer forces us to perform some
> >>>>>>>>>>> work in
> >>>>>>>>>>> callbacks:
> >>>>>>>>>>>
> >>>>>>>>>>>  * commit completed callback: register that the callback is
> >>>>>>>>>>> complete,
> >>>>>>>>>>>  * partition revoked callback: in this callback we need to
> >>>>>>>>>>> submit
> >>>>>>>>>>>commits from everything consumed and processed so far,
> >>>>>>>>>>> using
> >>>>>>>>>>>timeouts if processing takes to long. In an async
> >>>>>>>>>>> runtime, this is
> >>>>>>>>>>>an inherently multi-threaded process. Especially, we
> >>>>>>>>>>> cannot do
> >>>>>>>>>>>timeouts without involving multiple threads.
> >>>>>>>>>>>
> >>>>>>>>>> I don't see why this has to be "inherently multi-threaded." Why
> >>>>>>>>>> can't
> >>>>>>>>> we have the other threads report back what messages they've
> >>>>>>>>> processed to
> >>>>>>>>> the worker thread. Then it will be able to handle these
> >>>>>>>>> callbacks without
> >>>>>>>>> involving the other threads.
> >>>>>>>>>> regards,
> >>>>>>>>>> Colin
> >>>>>>>>>>
> >>>>>>>>>>> I have extended the KIP's motivation to explain the major use
> >>>>>>>>>>> case.
> >>>>>>>>>>>
> >>>>>>>>>>> Please read KIP-944 again. Even though the description is
> >>>>>>>>>>> extensive
> >>>>>>>>>>> (this callback from callback stuff is tricky), you will find
> >>>>>>>>>>> that my
> >>>>>>>>>>> goals are modest.
> >>>>>>>>>>>
> >>>>>>>>>>> Also the implementation is just a few lines. With
> >>>>>>>>>>> understanding of the
> >>>>>>>>>>> idea it should not be a lot of work to follow it.
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Erik.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Op 07-07-2023 om 19:57 schreef Colin McCabe:
> >>>>>>>>>>>> Hi Erik,
> >>>>>>>>>>>>
> >>>>>>>>>>>> It's not clear ot me that it's safe to access the Kafka
> >>>>>>>>>>>> consumer or
> >>>>>>>>> producer concurrently from different threads. There are data
> >>>>>>>>> structures
> >>>>>>>>> that aren't protected by locks, so I wouldn't necessarily expect
> >>>>>>>>> accessing
> >>>>>>>>> and mutating them in a concurrent way to work. This is true even
> >>>>>>>>> if the
> >>>>>>>>> accesses happen at different times, because modern CPUs require
> >>>>>>>>> memory
> >>>>>>>>> barriers to guarantee inter-thread visibilty of loads and stores.
> >>>>>>>>>>>> I am writing this is without doing a detailed dive into the
> >>>>>>>>>>>> code (I
> >>>>>>>>> haven't been into the consumer / producer code in a bit.)
> >>>>>>>>> Someone who has
> >>>>>>>>> worked more on the consumer recently might be able to give
> specific
> >>>>>>>>> examples of things that wouldn't work.
> >>>>>>>>>>>> I know that there are at least a few locks in the consumer
> >>>>>>>>>>>> code now,
> >>>>>>>>> due to our need to send heartbeats from a worker thread. I don't
> >>>>>>>>> think
> >>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>> calls from
> >>>>>>>>> random threads.
> >>>>>>>>>>>> There has been some discussion of moving to a more
> >>>>>>>>>>>> traditional model
> >>>>>>>>> where people make calls to the client and the clients passes the
> >>>>>>>>> given data
> >>>>>>>>> to a single background worker thread. This would avoid a lot lof
> >>>>>>>>> the
> >>>>>>>>> footguns of the current model and probably better reflect how
> >>>>>>>>> people
> >>>>>>>>> actually use the client.
> >>>>>>>>>>>> Another issue is that neither the producer nor the consumer
> >>>>>>>>>>>> is fully
> >>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>> block. From
> >>>>>>>>> memory, the producer blocks in some "buffer full" cases, and the
> >>>>>>>>> consumer
> >>>>>>>>> blocks sometimes when fetching metadata.
> >>>>>>>>>>>> I suspect it would be more appropriate for Kotlin coroutines,
> >>>>>>>>>>>> Zio
> >>>>>>>>> coroutines and so on to adopt this "pass messages to and from a
> >>>>>>>>> background
> >>>>>>>>> worker thread" model  than to try to re-engineer the Kafka
> >>>>>>>>> client ot work
> >>>>>>>>> from random threads.
> >>>>>>>>>>>> There is actually somed good  advice about how to handle
> >>>>>>>>>>>> multiple
> >>>>>>>>> threads in the KafkaConsumer.java header file itself. Check the
> >>>>>>>>> sections
> >>>>>>>>> "One Consumer Per Thread" and "Decouple Consumption and
> >>>>>>>>> Processing." What
> >>>>>>>>> I'm recommending here is essentially the latter.
> >>>>>>>>>>>> I do understand that it's frustrating to not get a quick
> >>>>>>>>>>>> response.
> >>>>>>>>> However, overall I think this one needs a lot more discussion
> >>>>>>>>> before
> >>>>>>>>> getting anywhere near a vote. I will leave a -1 just as a
> >>>>>>>>> procedural step.
> >>>>>>>>> Maybe some of the people working in the client area can also
> >>>>>>>>> chime in.
> >>>>>>>>>>>> best,
> >>>>>>>>>>>> Colin
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> >>>>>>>>>>>>> Dear PMCs,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So far there have been 0 responses to KIP-944. I understand
> >>>>>>>>>>>>> this may
> >>>>>>>>> not
> >>>>>>>>>>>>> be something that keeps you busy, but this KIP is important
> >>>>>>>>>>>>> to people
> >>>>>>>>>>>>> that use async runtimes like Zio, Cats and Kotlin.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Is there anything you need to come to a decision?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>  Erik.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
> >>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'd like to call a vote on KIP-944 Support async runtimes in
> >>>>>>>>> consumer.
> >>>>>>>>>>>>>> It has has been 'under discussion' for 7 days now. 'Under
> >>>>>>>>> discussion'
> >>>>>>>>>>>>>> between quotes, because there were 0 comments so far. I
> >>>>>>>>>>>>>> hope the KIP
> >>>>>>>>>>>>>> is clear!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> KIP description:
> https://cwiki.apache.org/confluence/x/chw0Dw
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>> Erik.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >> --
> >> Erik van Oosten
> >> e.vanoos...@grons.nl
> >> https://day-to-day-stuff.blogspot.com
> >>
> >>
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>
>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Erik van Oosten
ponses to KIP-944. I understand
this may

not

be something that keeps you busy, but this KIP is important
to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
 Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I
hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Erik van Oosten
background
worker thread" model  than to try to re-engineer the Kafka
client ot work
from random threads.

There is actually somed good  advice about how to handle
multiple

threads in the KafkaConsumer.java header file itself. Check the
sections
"One Consumer Per Thread" and "Decouple Consumption and
Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick
response.

However, overall I think this one needs a lot more discussion
before
getting anywhere near a vote. I will leave a -1 just as a
procedural step.
Maybe some of the people working in the client area can also
chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand
this may

not

be something that keeps you busy, but this KIP is important
to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
 Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I
hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
Erik.




--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Chris Egerton
;>>> Sorry, I spoke imprecisely. My concern is not concurrent
> >>>>>>>> access, but
> >>>>>>> multithreaded access in general. Basically cache line visibility
> >>>>>>> issues.
> >>>>>>>>>> This is true even if the accesses happen at different times,
> >>>>>>> because
> >>>>>>>>>> modern CPUs require memory barriers to guarantee inter-thread
> >>>>>>> visibilty
> >>>>>>>>>> of loads and stores.
> >>>>>>>>> In KIP-944, the callback thread can only delegate to another
> >>>>>>>>> thread
> >>>>>>>>> after reading from and writing to a threadlocal variable,
> >>>>>>>>> providing the
> >>>>>>>>> barriers right there.
> >>>>>>>>>
> >>>>>>>> I don't see any documentation that accessing thread local
> >>>>>>>> variables
> >>>>>>> provides a total store or load barrier. Do you have such
> >>>>>>> documentation? It
> >>>>>>> seems like if this were the case, we could eliminate volatile
> >>>>>>> variables
> >>>>>>> from most of the code base.
> >>>>>>>>>> I know that there are at least a few locks in the
> >>>>>>>>> consumer code
> >>>>>>> now,
> >>>>>>>>>> due to our need to send heartbeats from a worker thread. I
> >>>>>>>>>> don't think
> >>>>>>>>>> those would be sufficient to protect a client that is making
> >>>>>>>>>> calls
> >>>>>>> from
> >>>>>>>>>> random threads.
> >>>>>>>>> In the current implementation the consumer is also invoked
> >>>>>>>>> from random
> >>>>>>>>> threads. If it works now, it should continue to work.
> >>>>>>>>>
> >>>>>>>> I'm not sure what you're referring to. Can you expand on this?
> >>>>>>>>
> >>>>>>>>>> There has been some discussion of moving to a more
> >>>>>>>>> traditional
> >>>>>>> model
> >>>>>>>>>> where people make calls to the client and the clients passes
> >>>>>>>>>> the given
> >>>>>>>>>> data to a single background worker thread. This would avoid a
> >>>>>>>>>> lot lof
> >>>>>>>>>> the footguns of the current model and probably better reflect
> >>>>>>>>>> how
> >>>>>>> people
> >>>>>>>>>> actually use the client.
> >>>>>>>>> That is awesome. However, I'd rather not wait for that.
> >>>>>>>>>
> >>>>>>>>>> Another issue is that neither the producer nor the
> >>>>>>>>> consumer is
> >>>>>>> fully
> >>>>>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>>>>> block.
> >>>>>>> From
> >>>>>>>>>> memory, the producer blocks in some "buffer full" cases, and the
> >>>>>>>>>> consumer blocks sometimes when fetching metadata.
> >>>>>>>>> I am aware of that. This is not an issue; all async runtimes can
> >>>>>>>>> cooperate with blocking code.
> >>>>>>>>>
> >>>>>>>> Hmm, not sure what you mean by "cooperate with blocking code."
> >>>>>>>> If you
> >>>>>>> have 10 green threads you're multiplexing on to one CPU thread,
> >>>>>>> and that
> >>>>>>> CPU thread gets blocked because of what one green thread is
> >>>>>>> doing, the
> >>>>>>> other 9 green threads are blocked too, right? I guess it's "just" a
> >>>>>>> performance problem, but it still seems like it could be a
> >>>>>>> serious one.
> >>>>>>>>>> I suspect it would be more appropriate for Kotlin
> >>>>>>>>> coroutines, Zio
> >>>>>>>>>> coroutines and so on to adopt this "pass messages to and from a
> >>>>>>>>>> background worker thread" model than to try to re-engineer
> >>>>>>>>>> the Kafka
> >>>>>>>>>> client ot work from random threads.
> >>>>>>>>> In both zio-kafka and fs2-kafka this is already the approach
> >>>>>>>>> we are
> >>>>>>> taking.
> >>>>>>>>> Unfortunately, the Kafka consumer forces us to perform some
> >>>>>>>>> work in
> >>>>>>>>> callbacks:
> >>>>>>>>>
> >>>>>>>>> * commit completed callback: register that the callback is
> >>>>>>>>> complete,
> >>>>>>>>> * partition revoked callback: in this callback we need to
> >>>>>>>>> submit
> >>>>>>>>>   commits from everything consumed and processed so far,
> >>>>>>>>> using
> >>>>>>>>>   timeouts if processing takes to long. In an async
> >>>>>>>>> runtime, this is
> >>>>>>>>>   an inherently multi-threaded process. Especially, we
> >>>>>>>>> cannot do
> >>>>>>>>>   timeouts without involving multiple threads.
> >>>>>>>>>
> >>>>>>>> I don't see why this has to be "inherently multi-threaded." Why
> >>>>>>>> can't
> >>>>>>> we have the other threads report back what messages they've
> >>>>>>> processed to
> >>>>>>> the worker thread. Then it will be able to handle these
> >>>>>>> callbacks without
> >>>>>>> involving the other threads.
> >>>>>>>> regards,
> >>>>>>>> Colin
> >>>>>>>>
> >>>>>>>>> I have extended the KIP's motivation to explain the major use
> >>>>>>>>> case.
> >>>>>>>>>
> >>>>>>>>> Please read KIP-944 again. Even though the description is
> >>>>>>>>> extensive
> >>>>>>>>> (this callback from callback stuff is tricky), you will find
> >>>>>>>>> that my
> >>>>>>>>> goals are modest.
> >>>>>>>>>
> >>>>>>>>> Also the implementation is just a few lines. With
> >>>>>>>>> understanding of the
> >>>>>>>>> idea it should not be a lot of work to follow it.
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>>Erik.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Op 07-07-2023 om 19:57 schreef Colin McCabe:
> >>>>>>>>>> Hi Erik,
> >>>>>>>>>>
> >>>>>>>>>> It's not clear ot me that it's safe to access the Kafka
> >>>>>>>>>> consumer or
> >>>>>>> producer concurrently from different threads. There are data
> >>>>>>> structures
> >>>>>>> that aren't protected by locks, so I wouldn't necessarily expect
> >>>>>>> accessing
> >>>>>>> and mutating them in a concurrent way to work. This is true even
> >>>>>>> if the
> >>>>>>> accesses happen at different times, because modern CPUs require
> >>>>>>> memory
> >>>>>>> barriers to guarantee inter-thread visibilty of loads and stores.
> >>>>>>>>>> I am writing this is without doing a detailed dive into the
> >>>>>>>>>> code (I
> >>>>>>> haven't been into the consumer / producer code in a bit.)
> >>>>>>> Someone who has
> >>>>>>> worked more on the consumer recently might be able to give specific
> >>>>>>> examples of things that wouldn't work.
> >>>>>>>>>> I know that there are at least a few locks in the consumer
> >>>>>>>>>> code now,
> >>>>>>> due to our need to send heartbeats from a worker thread. I don't
> >>>>>>> think
> >>>>>>> those would be sufficient to protect a client that is making
> >>>>>>> calls from
> >>>>>>> random threads.
> >>>>>>>>>> There has been some discussion of moving to a more
> >>>>>>>>>> traditional model
> >>>>>>> where people make calls to the client and the clients passes the
> >>>>>>> given data
> >>>>>>> to a single background worker thread. This would avoid a lot lof
> >>>>>>> the
> >>>>>>> footguns of the current model and probably better reflect how
> >>>>>>> people
> >>>>>>> actually use the client.
> >>>>>>>>>> Another issue is that neither the producer nor the consumer
> >>>>>>>>>> is fully
> >>>>>>> nonblocking. There are some corner cases where we do in fact
> >>>>>>> block. From
> >>>>>>> memory, the producer blocks in some "buffer full" cases, and the
> >>>>>>> consumer
> >>>>>>> blocks sometimes when fetching metadata.
> >>>>>>>>>> I suspect it would be more appropriate for Kotlin coroutines,
> >>>>>>>>>> Zio
> >>>>>>> coroutines and so on to adopt this "pass messages to and from a
> >>>>>>> background
> >>>>>>> worker thread" model  than to try to re-engineer the Kafka
> >>>>>>> client ot work
> >>>>>>> from random threads.
> >>>>>>>>>> There is actually somed good  advice about how to handle
> >>>>>>>>>> multiple
> >>>>>>> threads in the KafkaConsumer.java header file itself. Check the
> >>>>>>> sections
> >>>>>>> "One Consumer Per Thread" and "Decouple Consumption and
> >>>>>>> Processing." What
> >>>>>>> I'm recommending here is essentially the latter.
> >>>>>>>>>> I do understand that it's frustrating to not get a quick
> >>>>>>>>>> response.
> >>>>>>> However, overall I think this one needs a lot more discussion
> >>>>>>> before
> >>>>>>> getting anywhere near a vote. I will leave a -1 just as a
> >>>>>>> procedural step.
> >>>>>>> Maybe some of the people working in the client area can also
> >>>>>>> chime in.
> >>>>>>>>>> best,
> >>>>>>>>>> Colin
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> >>>>>>>>>>> Dear PMCs,
> >>>>>>>>>>>
> >>>>>>>>>>> So far there have been 0 responses to KIP-944. I understand
> >>>>>>>>>>> this may
> >>>>>>> not
> >>>>>>>>>>> be something that keeps you busy, but this KIP is important
> >>>>>>>>>>> to people
> >>>>>>>>>>> that use async runtimes like Zio, Cats and Kotlin.
> >>>>>>>>>>>
> >>>>>>>>>>> Is there anything you need to come to a decision?
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Erik.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
> >>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd like to call a vote on KIP-944 Support async runtimes in
> >>>>>>> consumer.
> >>>>>>>>>>>> It has has been 'under discussion' for 7 days now. 'Under
> >>>>>>> discussion'
> >>>>>>>>>>>> between quotes, because there were 0 comments so far. I
> >>>>>>>>>>>> hope the KIP
> >>>>>>>>>>>> is clear!
> >>>>>>>>>>>>
> >>>>>>>>>>>> KIP description:https://cwiki.apache.org/confluence/x/chw0Dw
> >>>>>>>>>>>>
> >>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>Erik.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>
>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-22 Thread Erik van Oosten
oach 
we are

taking.
Unfortunately, the Kafka consumer forces us to perform some 
work in

callbacks:

    * commit completed callback: register that the callback is 
complete,
    * partition revoked callback: in this callback we need to 
submit
  commits from everything consumed and processed so far, 
using
  timeouts if processing takes to long. In an async 
runtime, this is
  an inherently multi-threaded process. Especially, we 
cannot do

  timeouts without involving multiple threads.

I don't see why this has to be "inherently multi-threaded." Why 
can't
we have the other threads report back what messages they've 
processed to
the worker thread. Then it will be able to handle these 
callbacks without

involving the other threads.

regards,
Colin

I have extended the KIP's motivation to explain the major use 
case.


Please read KIP-944 again. Even though the description is 
extensive
(this callback from callback stuff is tricky), you will find 
that my

goals are modest.

Also the implementation is just a few lines. With 
understanding of the

idea it should not be a lot of work to follow it.

Kind regards,
   Erik.


Op 07-07-2023 om 19:57 schreef Colin McCabe:

Hi Erik,

It's not clear ot me that it's safe to access the Kafka 
consumer or
producer concurrently from different threads. There are data 
structures
that aren't protected by locks, so I wouldn't necessarily expect 
accessing
and mutating them in a concurrent way to work. This is true even 
if the
accesses happen at different times, because modern CPUs require 
memory

barriers to guarantee inter-thread visibilty of loads and stores.
I am writing this is without doing a detailed dive into the 
code (I
haven't been into the consumer / producer code in a bit.) 
Someone who has

worked more on the consumer recently might be able to give specific
examples of things that wouldn't work.
I know that there are at least a few locks in the consumer 
code now,
due to our need to send heartbeats from a worker thread. I don't 
think
those would be sufficient to protect a client that is making 
calls from

random threads.
There has been some discussion of moving to a more 
traditional model
where people make calls to the client and the clients passes the 
given data
to a single background worker thread. This would avoid a lot lof 
the
footguns of the current model and probably better reflect how 
people

actually use the client.
Another issue is that neither the producer nor the consumer 
is fully
nonblocking. There are some corner cases where we do in fact 
block. From
memory, the producer blocks in some "buffer full" cases, and the 
consumer

blocks sometimes when fetching metadata.
I suspect it would be more appropriate for Kotlin coroutines, 
Zio
coroutines and so on to adopt this "pass messages to and from a 
background
worker thread" model  than to try to re-engineer the Kafka 
client ot work

from random threads.
There is actually somed good  advice about how to handle 
multiple
threads in the KafkaConsumer.java header file itself. Check the 
sections
"One Consumer Per Thread" and "Decouple Consumption and 
Processing." What

I'm recommending here is essentially the latter.
I do understand that it's frustrating to not get a quick 
response.
However, overall I think this one needs a lot more discussion 
before
getting anywhere near a vote. I will leave a -1 just as a 
procedural step.
Maybe some of the people working in the client area can also 
chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand 
this may

not
be something that keeps you busy, but this KIP is important 
to people

that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
    Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'
between quotes, because there were 0 comments so far. I 
hope the KIP

is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
   Erik.





--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-21 Thread Matthias J. Sax
ntime, this is
  an inherently multi-threaded process. Especially, we cannot do
  timeouts without involving multiple threads.


I don't see why this has to be "inherently multi-threaded." Why can't

we have the other threads report back what messages they've processed to
the worker thread. Then it will be able to handle these callbacks without
involving the other threads.

regards,
Colin


I have extended the KIP's motivation to explain the major use case.

Please read KIP-944 again. Even though the description is extensive
(this callback from callback stuff is tricky), you will find that my
goals are modest.

Also the implementation is just a few lines. With understanding of the
idea it should not be a lot of work to follow it.

Kind regards,
   Erik.


Op 07-07-2023 om 19:57 schreef Colin McCabe:

Hi Erik,

It's not clear ot me that it's safe to access the Kafka consumer or

producer concurrently from different threads. There are data structures
that aren't protected by locks, so I wouldn't necessarily expect accessing
and mutating them in a concurrent way to work. This is true even if the
accesses happen at different times, because modern CPUs require memory
barriers to guarantee inter-thread visibilty of loads and stores.

I am writing this is without doing a detailed dive into the code (I

haven't been into the consumer / producer code in a bit.) Someone who has
worked more on the consumer recently might be able to give specific
examples of things that wouldn't work.

I know that there are at least a few locks in the consumer code now,

due to our need to send heartbeats from a worker thread. I don't think
those would be sufficient to protect a client that is making calls from
random threads.

There has been some discussion of moving to a more traditional model

where people make calls to the client and the clients passes the given data
to a single background worker thread. This would avoid a lot lof the
footguns of the current model and probably better reflect how people
actually use the client.

Another issue is that neither the producer nor the consumer is fully

nonblocking. There are some corner cases where we do in fact block. From
memory, the producer blocks in some "buffer full" cases, and the consumer
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines, Zio

coroutines and so on to adopt this "pass messages to and from a background
worker thread" model  than to try to re-engineer the Kafka client ot work
from random threads.

There is actually somed good  advice about how to handle multiple

threads in the KafkaConsumer.java header file itself. Check the sections
"One Consumer Per Thread" and "Decouple Consumption and Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick response.

However, overall I think this one needs a lot more discussion before
getting anywhere near a vote. I will leave a -1 just as a procedural step.
Maybe some of the people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this may

not

be something that keeps you busy, but this KIP is important to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
   Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com





Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-14 Thread Kirk True
r load barrier. Do you have such documentation? It
>>>>> seems like if this were the case, we could eliminate volatile variables
>>>>> from most of the code base.
>>>>>>>   > I know that there are at least a few locks in the consumer code
>>>>> now,
>>>>>>>> due to our need to send heartbeats from a worker thread. I don't think
>>>>>>>> those would be sufficient to protect a client that is making calls
>>>>> from
>>>>>>>> random threads.
>>>>>>> In the current implementation the consumer is also invoked from random
>>>>>>> threads. If it works now, it should continue to work.
>>>>>>> 
>>>>>> I'm not sure what you're referring to. Can you expand on this?
>>>>>> 
>>>>>>>   > There has been some discussion of moving to a more traditional
>>>>> model
>>>>>>>> where people make calls to the client and the clients passes the given
>>>>>>>> data to a single background worker thread. This would avoid a lot lof
>>>>>>>> the footguns of the current model and probably better reflect how
>>>>> people
>>>>>>>> actually use the client.
>>>>>>> That is awesome. However, I'd rather not wait for that.
>>>>>>> 
>>>>>>>   > Another issue is that neither the producer nor the consumer is
>>>>> fully
>>>>>>>> nonblocking. There are some corner cases where we do in fact block.
>>>>> From
>>>>>>>> memory, the producer blocks in some "buffer full" cases, and the
>>>>>>>> consumer blocks sometimes when fetching metadata.
>>>>>>> I am aware of that. This is not an issue; all async runtimes can
>>>>>>> cooperate with blocking code.
>>>>>>> 
>>>>>> Hmm, not sure what you mean by "cooperate with blocking code." If you
>>>>> have 10 green threads you're multiplexing on to one CPU thread, and that
>>>>> CPU thread gets blocked because of what one green thread is doing, the
>>>>> other 9 green threads are blocked too, right? I guess it's "just" a
>>>>> performance problem, but it still seems like it could be a serious one.
>>>>>>>   > I suspect it would be more appropriate for Kotlin coroutines, Zio
>>>>>>>> coroutines and so on to adopt this "pass messages to and from a
>>>>>>>> background worker thread" model than to try to re-engineer the Kafka
>>>>>>>> client ot work from random threads.
>>>>>>> In both zio-kafka and fs2-kafka this is already the approach we are
>>>>> taking.
>>>>>>> Unfortunately, the Kafka consumer forces us to perform some work in
>>>>>>> callbacks:
>>>>>>> 
>>>>>>>* commit completed callback: register that the callback is complete,
>>>>>>>* partition revoked callback: in this callback we need to submit
>>>>>>>  commits from everything consumed and processed so far, using
>>>>>>>  timeouts if processing takes to long. In an async runtime, this is
>>>>>>>  an inherently multi-threaded process. Especially, we cannot do
>>>>>>>  timeouts without involving multiple threads.
>>>>>>> 
>>>>>> I don't see why this has to be "inherently multi-threaded." Why can't
>>>>> we have the other threads report back what messages they've processed to
>>>>> the worker thread. Then it will be able to handle these callbacks without
>>>>> involving the other threads.
>>>>>> regards,
>>>>>> Colin
>>>>>> 
>>>>>>> I have extended the KIP's motivation to explain the major use case.
>>>>>>> 
>>>>>>> Please read KIP-944 again. Even though the description is extensive
>>>>>>> (this callback from callback stuff is tricky), you will find that my
>>>>>>> goals are modest.
>>>>>>> 
>>>>>>> Also the implementation is just a few lines. With understanding of the
>>>>>>> idea it should not be a lot of work to follow it.
>>>>>>> 
>>>>>&

Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-14 Thread Erik van Oosten
 that it's safe to access the Kafka consumer or

producer concurrently from different threads. There are data structures
that aren't protected by locks, so I wouldn't necessarily expect accessing
and mutating them in a concurrent way to work. This is true even if the
accesses happen at different times, because modern CPUs require memory
barriers to guarantee inter-thread visibilty of loads and stores.

I am writing this is without doing a detailed dive into the code (I

haven't been into the consumer / producer code in a bit.) Someone who has
worked more on the consumer recently might be able to give specific
examples of things that wouldn't work.

I know that there are at least a few locks in the consumer code now,

due to our need to send heartbeats from a worker thread. I don't think
those would be sufficient to protect a client that is making calls from
random threads.

There has been some discussion of moving to a more traditional model

where people make calls to the client and the clients passes the given data
to a single background worker thread. This would avoid a lot lof the
footguns of the current model and probably better reflect how people
actually use the client.

Another issue is that neither the producer nor the consumer is fully

nonblocking. There are some corner cases where we do in fact block. From
memory, the producer blocks in some "buffer full" cases, and the consumer
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines, Zio

coroutines and so on to adopt this "pass messages to and from a background
worker thread" model  than to try to re-engineer the Kafka client ot work
from random threads.

There is actually somed good  advice about how to handle multiple

threads in the KafkaConsumer.java header file itself. Check the sections
"One Consumer Per Thread" and "Decouple Consumption and Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick response.

However, overall I think this one needs a lot more discussion before
getting anywhere near a vote. I will leave a -1 just as a procedural step.
Maybe some of the people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this may

not

be something that keeps you busy, but this KIP is important to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under

discussion'

between quotes, because there were 0 comments so far. I hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
   Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-13 Thread Colin McCabe
>>> >>> coroutines and so on to adopt this "pass messages to and from a
>>> >>> background worker thread" model than to try to re-engineer the Kafka
>>> >>> client ot work from random threads.
>>> >> In both zio-kafka and fs2-kafka this is already the approach we are
>>> taking.
>>> >>
>>> >> Unfortunately, the Kafka consumer forces us to perform some work in
>>> >> callbacks:
>>> >>
>>> >>* commit completed callback: register that the callback is complete,
>>> >>* partition revoked callback: in this callback we need to submit
>>> >>  commits from everything consumed and processed so far, using
>>> >>  timeouts if processing takes to long. In an async runtime, this is
>>> >>  an inherently multi-threaded process. Especially, we cannot do
>>> >>  timeouts without involving multiple threads.
>>> >>
>>> > I don't see why this has to be "inherently multi-threaded." Why can't
>>> we have the other threads report back what messages they've processed to
>>> the worker thread. Then it will be able to handle these callbacks without
>>> involving the other threads.
>>> >
>>> > regards,
>>> > Colin
>>> >
>>> >> I have extended the KIP's motivation to explain the major use case.
>>> >>
>>> >> Please read KIP-944 again. Even though the description is extensive
>>> >> (this callback from callback stuff is tricky), you will find that my
>>> >> goals are modest.
>>> >>
>>> >> Also the implementation is just a few lines. With understanding of the
>>> >> idea it should not be a lot of work to follow it.
>>> >>
>>> >> Kind regards,
>>> >>   Erik.
>>> >>
>>> >>
>>> >> Op 07-07-2023 om 19:57 schreef Colin McCabe:
>>> >>> Hi Erik,
>>> >>>
>>> >>> It's not clear ot me that it's safe to access the Kafka consumer or
>>> producer concurrently from different threads. There are data structures
>>> that aren't protected by locks, so I wouldn't necessarily expect accessing
>>> and mutating them in a concurrent way to work. This is true even if the
>>> accesses happen at different times, because modern CPUs require memory
>>> barriers to guarantee inter-thread visibilty of loads and stores.
>>> >>>
>>> >>> I am writing this is without doing a detailed dive into the code (I
>>> haven't been into the consumer / producer code in a bit.) Someone who has
>>> worked more on the consumer recently might be able to give specific
>>> examples of things that wouldn't work.
>>> >>>
>>> >>> I know that there are at least a few locks in the consumer code now,
>>> due to our need to send heartbeats from a worker thread. I don't think
>>> those would be sufficient to protect a client that is making calls from
>>> random threads.
>>> >>>
>>> >>> There has been some discussion of moving to a more traditional model
>>> where people make calls to the client and the clients passes the given data
>>> to a single background worker thread. This would avoid a lot lof the
>>> footguns of the current model and probably better reflect how people
>>> actually use the client.
>>> >>>
>>> >>> Another issue is that neither the producer nor the consumer is fully
>>> nonblocking. There are some corner cases where we do in fact block. From
>>> memory, the producer blocks in some "buffer full" cases, and the consumer
>>> blocks sometimes when fetching metadata.
>>> >>>
>>> >>> I suspect it would be more appropriate for Kotlin coroutines, Zio
>>> coroutines and so on to adopt this "pass messages to and from a background
>>> worker thread" model  than to try to re-engineer the Kafka client ot work
>>> from random threads.
>>> >>>
>>> >>> There is actually somed good  advice about how to handle multiple
>>> threads in the KafkaConsumer.java header file itself. Check the sections
>>> "One Consumer Per Thread" and "Decouple Consumption and Processing." What
>>> I'm recommending here is essentially the latter.
>>> >>>
>>> >>> I do understand that it's frustrating to not get a quick response.
>>> However, overall I think this one needs a lot more discussion before
>>> getting anywhere near a vote. I will leave a -1 just as a procedural step.
>>> Maybe some of the people working in the client area can also chime in.
>>> >>>
>>> >>> best,
>>> >>> Colin
>>> >>>
>>> >>>
>>> >>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
>>> >>>> Dear PMCs,
>>> >>>>
>>> >>>> So far there have been 0 responses to KIP-944. I understand this may
>>> not
>>> >>>> be something that keeps you busy, but this KIP is important to people
>>> >>>> that use async runtimes like Zio, Cats and Kotlin.
>>> >>>>
>>> >>>> Is there anything you need to come to a decision?
>>> >>>>
>>> >>>> Kind regards,
>>> >>>>Erik.
>>> >>>>
>>> >>>>
>>> >>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
>>> >>>>> Hello all,
>>> >>>>>
>>> >>>>> I'd like to call a vote on KIP-944 Support async runtimes in
>>> consumer.
>>> >>>>> It has has been 'under discussion' for 7 days now. 'Under
>>> discussion'
>>> >>>>> between quotes, because there were 0 comments so far. I hope the KIP
>>> >>>>> is clear!
>>> >>>>>
>>> >>>>> KIP description:https://cwiki.apache.org/confluence/x/chw0Dw
>>> >>>>>
>>> >>>>> Kind regards,
>>> >>>>>   Erik.
>>> >>>>>
>>> >>>>>
>>> >> --
>>> >> Erik van Oosten
>>> >> e.vanoos...@grons.nl
>>> >> https://day-to-day-stuff.blogspot.com
>>>
>>> --
>>> Erik van Oosten
>>> e.vanoos...@grons.nl
>>> https://day-to-day-stuff.blogspot.com
>>>
>>>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-12 Thread Erik van Oosten
is that neither the producer nor the consumer is fully
nonblocking. There are some corner cases where we do in fact block. 
From
memory, the producer blocks in some "buffer full" cases, and the 
consumer

blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines, Zio
coroutines and so on to adopt this "pass messages to and from a 
background
worker thread" model  than to try to re-engineer the Kafka client ot 
work

from random threads.

There is actually somed good advice about how to handle multiple
threads in the KafkaConsumer.java header file itself. Check the 
sections
"One Consumer Per Thread" and "Decouple Consumption and Processing." 
What

I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick response.

However, overall I think this one needs a lot more discussion before
getting anywhere near a vote. I will leave a -1 just as a procedural 
step.

Maybe some of the people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this 
may

not
be something that keeps you busy, but this KIP is important to 
people

that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
    Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.
It has has been 'under discussion' for 7 days now. 'Under 
discussion'
between quotes, because there were 0 comments so far. I hope 
the KIP

is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
   Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-12 Thread Erik van Oosten
than to try to re-engineer the Kafka client ot work
from random threads.

There is actually somed good  advice about how to handle multiple

threads in the KafkaConsumer.java header file itself. Check the sections
"One Consumer Per Thread" and "Decouple Consumption and Processing." What
I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick response.

However, overall I think this one needs a lot more discussion before
getting anywhere near a vote. I will leave a -1 just as a procedural step.
Maybe some of the people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this may

not

be something that keeps you busy, but this KIP is important to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in

consumer.

It has has been 'under discussion' for 7 days now. 'Under discussion'
between quotes, because there were 0 comments so far. I hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
   Erik.


--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-11 Thread Philip Nee
's motivation to explain the major use case.
>> >>
>> >> Please read KIP-944 again. Even though the description is extensive
>> >> (this callback from callback stuff is tricky), you will find that my
>> >> goals are modest.
>> >>
>> >> Also the implementation is just a few lines. With understanding of the
>> >> idea it should not be a lot of work to follow it.
>> >>
>> >> Kind regards,
>> >>   Erik.
>> >>
>> >>
>> >> Op 07-07-2023 om 19:57 schreef Colin McCabe:
>> >>> Hi Erik,
>> >>>
>> >>> It's not clear ot me that it's safe to access the Kafka consumer or
>> producer concurrently from different threads. There are data structures
>> that aren't protected by locks, so I wouldn't necessarily expect accessing
>> and mutating them in a concurrent way to work. This is true even if the
>> accesses happen at different times, because modern CPUs require memory
>> barriers to guarantee inter-thread visibilty of loads and stores.
>> >>>
>> >>> I am writing this is without doing a detailed dive into the code (I
>> haven't been into the consumer / producer code in a bit.) Someone who has
>> worked more on the consumer recently might be able to give specific
>> examples of things that wouldn't work.
>> >>>
>> >>> I know that there are at least a few locks in the consumer code now,
>> due to our need to send heartbeats from a worker thread. I don't think
>> those would be sufficient to protect a client that is making calls from
>> random threads.
>> >>>
>> >>> There has been some discussion of moving to a more traditional model
>> where people make calls to the client and the clients passes the given data
>> to a single background worker thread. This would avoid a lot lof the
>> footguns of the current model and probably better reflect how people
>> actually use the client.
>> >>>
>> >>> Another issue is that neither the producer nor the consumer is fully
>> nonblocking. There are some corner cases where we do in fact block. From
>> memory, the producer blocks in some "buffer full" cases, and the consumer
>> blocks sometimes when fetching metadata.
>> >>>
>> >>> I suspect it would be more appropriate for Kotlin coroutines, Zio
>> coroutines and so on to adopt this "pass messages to and from a background
>> worker thread" model  than to try to re-engineer the Kafka client ot work
>> from random threads.
>> >>>
>> >>> There is actually somed good  advice about how to handle multiple
>> threads in the KafkaConsumer.java header file itself. Check the sections
>> "One Consumer Per Thread" and "Decouple Consumption and Processing." What
>> I'm recommending here is essentially the latter.
>> >>>
>> >>> I do understand that it's frustrating to not get a quick response.
>> However, overall I think this one needs a lot more discussion before
>> getting anywhere near a vote. I will leave a -1 just as a procedural step.
>> Maybe some of the people working in the client area can also chime in.
>> >>>
>> >>> best,
>> >>> Colin
>> >>>
>> >>>
>> >>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
>> >>>> Dear PMCs,
>> >>>>
>> >>>> So far there have been 0 responses to KIP-944. I understand this may
>> not
>> >>>> be something that keeps you busy, but this KIP is important to people
>> >>>> that use async runtimes like Zio, Cats and Kotlin.
>> >>>>
>> >>>> Is there anything you need to come to a decision?
>> >>>>
>> >>>> Kind regards,
>> >>>>Erik.
>> >>>>
>> >>>>
>> >>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
>> >>>>> Hello all,
>> >>>>>
>> >>>>> I'd like to call a vote on KIP-944 Support async runtimes in
>> consumer.
>> >>>>> It has has been 'under discussion' for 7 days now. 'Under
>> discussion'
>> >>>>> between quotes, because there were 0 comments so far. I hope the KIP
>> >>>>> is clear!
>> >>>>>
>> >>>>> KIP description:https://cwiki.apache.org/confluence/x/chw0Dw
>> >>>>>
>> >>>>> Kind regards,
>> >>>>>   Erik.
>> >>>>>
>> >>>>>
>> >> --
>> >> Erik van Oosten
>> >> e.vanoos...@grons.nl
>> >> https://day-to-day-stuff.blogspot.com
>>
>> --
>> Erik van Oosten
>> e.vanoos...@grons.nl
>> https://day-to-day-stuff.blogspot.com
>>
>>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-11 Thread Philip Nee
; I am writing this is without doing a detailed dive into the code (I
> haven't been into the consumer / producer code in a bit.) Someone who has
> worked more on the consumer recently might be able to give specific
> examples of things that wouldn't work.
> >>>
> >>> I know that there are at least a few locks in the consumer code now,
> due to our need to send heartbeats from a worker thread. I don't think
> those would be sufficient to protect a client that is making calls from
> random threads.
> >>>
> >>> There has been some discussion of moving to a more traditional model
> where people make calls to the client and the clients passes the given data
> to a single background worker thread. This would avoid a lot lof the
> footguns of the current model and probably better reflect how people
> actually use the client.
> >>>
> >>> Another issue is that neither the producer nor the consumer is fully
> nonblocking. There are some corner cases where we do in fact block. From
> memory, the producer blocks in some "buffer full" cases, and the consumer
> blocks sometimes when fetching metadata.
> >>>
> >>> I suspect it would be more appropriate for Kotlin coroutines, Zio
> coroutines and so on to adopt this "pass messages to and from a background
> worker thread" model  than to try to re-engineer the Kafka client ot work
> from random threads.
> >>>
> >>> There is actually somed good  advice about how to handle multiple
> threads in the KafkaConsumer.java header file itself. Check the sections
> "One Consumer Per Thread" and "Decouple Consumption and Processing." What
> I'm recommending here is essentially the latter.
> >>>
> >>> I do understand that it's frustrating to not get a quick response.
> However, overall I think this one needs a lot more discussion before
> getting anywhere near a vote. I will leave a -1 just as a procedural step.
> Maybe some of the people working in the client area can also chime in.
> >>>
> >>> best,
> >>> Colin
> >>>
> >>>
> >>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> >>>> Dear PMCs,
> >>>>
> >>>> So far there have been 0 responses to KIP-944. I understand this may
> not
> >>>> be something that keeps you busy, but this KIP is important to people
> >>>> that use async runtimes like Zio, Cats and Kotlin.
> >>>>
> >>>> Is there anything you need to come to a decision?
> >>>>
> >>>> Kind regards,
> >>>>Erik.
> >>>>
> >>>>
> >>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
> >>>>> Hello all,
> >>>>>
> >>>>> I'd like to call a vote on KIP-944 Support async runtimes in
> consumer.
> >>>>> It has has been 'under discussion' for 7 days now. 'Under discussion'
> >>>>> between quotes, because there were 0 comments so far. I hope the KIP
> >>>>> is clear!
> >>>>>
> >>>>> KIP description:https://cwiki.apache.org/confluence/x/chw0Dw
> >>>>>
> >>>>> Kind regards,
> >>>>>   Erik.
> >>>>>
> >>>>>
> >> --
> >> Erik van Oosten
> >> e.vanoos...@grons.nl
> >> https://day-to-day-stuff.blogspot.com
>
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>
>


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-08 Thread Erik van Oosten
like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
       Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in consumer.
It has has been 'under discussion' for 7 days now. 'Under discussion'
between quotes, because there were 0 comments so far. I hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
      Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com


--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-08 Thread Colin McCabe
's not clear ot me that it's safe to access the Kafka consumer or producer 
>> concurrently from different threads. There are data structures that aren't 
>> protected by locks, so I wouldn't necessarily expect accessing and mutating 
>> them in a concurrent way to work. This is true even if the accesses happen 
>> at different times, because modern CPUs require memory barriers to guarantee 
>> inter-thread visibilty of loads and stores.
>>
>> I am writing this is without doing a detailed dive into the code (I haven't 
>> been into the consumer / producer code in a bit.) Someone who has worked 
>> more on the consumer recently might be able to give specific examples of 
>> things that wouldn't work.
>>
>> I know that there are at least a few locks in the consumer code now, due to 
>> our need to send heartbeats from a worker thread. I don't think those would 
>> be sufficient to protect a client that is making calls from random threads.
>>
>> There has been some discussion of moving to a more traditional model where 
>> people make calls to the client and the clients passes the given data to a 
>> single background worker thread. This would avoid a lot lof the footguns of 
>> the current model and probably better reflect how people actually use the 
>> client.
>>
>> Another issue is that neither the producer nor the consumer is fully 
>> nonblocking. There are some corner cases where we do in fact block. From 
>> memory, the producer blocks in some "buffer full" cases, and the consumer 
>> blocks sometimes when fetching metadata.
>>
>> I suspect it would be more appropriate for Kotlin coroutines, Zio coroutines 
>> and so on to adopt this "pass messages to and from a background worker 
>> thread" model  than to try to re-engineer the Kafka client ot work from 
>> random threads.
>>
>> There is actually somed good  advice about how to handle multiple threads in 
>> the KafkaConsumer.java header file itself. Check the sections  "One Consumer 
>> Per Thread" and "Decouple Consumption and Processing." What I'm recommending 
>> here is essentially the latter.
>>
>> I do understand that it's frustrating to not get a quick response. However, 
>> overall I think this one needs a lot more discussion before getting anywhere 
>> near a vote. I will leave a -1 just as a procedural step. Maybe some of the 
>> people working in the client area can also chime in.
>>
>> best,
>> Colin
>>
>>
>> On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
>>> Dear PMCs,
>>>
>>> So far there have been 0 responses to KIP-944. I understand this may not
>>> be something that keeps you busy, but this KIP is important to people
>>> that use async runtimes like Zio, Cats and Kotlin.
>>>
>>> Is there anything you need to come to a decision?
>>>
>>> Kind regards,
>>>       Erik.
>>>
>>>
>>> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
>>>> Hello all,
>>>>
>>>> I'd like to call a vote on KIP-944 Support async runtimes in consumer.
>>>> It has has been 'under discussion' for 7 days now. 'Under discussion'
>>>> between quotes, because there were 0 comments so far. I hope the KIP
>>>> is clear!
>>>>
>>>> KIP description:https://cwiki.apache.org/confluence/x/chw0Dw
>>>>
>>>> Kind regards,
>>>>      Erik.
>>>>
>>>>
> -- 
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-08 Thread Erik van Oosten
 than 
to try to re-engineer the Kafka client ot work from random threads.

There is actually somed good  advice about how to handle multiple threads in the KafkaConsumer.java 
header file itself. Check the sections  "One Consumer Per Thread" and "Decouple 
Consumption and Processing." What I'm recommending here is essentially the latter.

I do understand that it's frustrating to not get a quick response. However, 
overall I think this one needs a lot more discussion before getting anywhere 
near a vote. I will leave a -1 just as a procedural step. Maybe some of the 
people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this may not
be something that keeps you busy, but this KIP is important to people
that use async runtimes like Zio, Cats and Kotlin.

Is there anything you need to come to a decision?

Kind regards,
      Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in consumer.
It has has been 'under discussion' for 7 days now. 'Under discussion'
between quotes, because there were 0 comments so far. I hope the KIP
is clear!

KIP description:https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
     Erik.



--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-07 Thread Colin McCabe
Hi Erik,

It's not clear ot me that it's safe to access the Kafka consumer or producer 
concurrently from different threads. There are data structures that aren't 
protected by locks, so I wouldn't necessarily expect accessing and mutating 
them in a concurrent way to work. This is true even if the accesses happen at 
different times, because modern CPUs require memory barriers to guarantee 
inter-thread visibilty of loads and stores.

I am writing this is without doing a detailed dive into the code (I haven't 
been into the consumer / producer code in a bit.) Someone who has worked more 
on the consumer recently might be able to give specific examples of things that 
wouldn't work.

I know that there are at least a few locks in the consumer code now, due to our 
need to send heartbeats from a worker thread. I don't think those would be 
sufficient to protect a client that is making calls from random threads.

There has been some discussion of moving to a more traditional model where 
people make calls to the client and the clients passes the given data to a 
single background worker thread. This would avoid a lot lof the footguns of the 
current model and probably better reflect how people actually use the client.

Another issue is that neither the producer nor the consumer is fully 
nonblocking. There are some corner cases where we do in fact block. From 
memory, the producer blocks in some "buffer full" cases, and the consumer 
blocks sometimes when fetching metadata.

I suspect it would be more appropriate for Kotlin coroutines, Zio coroutines 
and so on to adopt this "pass messages to and from a background worker thread" 
model  than to try to re-engineer the Kafka client ot work from random threads.

There is actually somed good  advice about how to handle multiple threads in 
the KafkaConsumer.java header file itself. Check the sections  "One Consumer 
Per Thread" and "Decouple Consumption and Processing." What I'm recommending 
here is essentially the latter.

I do understand that it's frustrating to not get a quick response. However, 
overall I think this one needs a lot more discussion before getting anywhere 
near a vote. I will leave a -1 just as a procedural step. Maybe some of the 
people working in the client area can also chime in.

best,
Colin


On Thu, Jul 6, 2023, at 12:02, Erik van Oosten wrote:
> Dear PMCs,
>
> So far there have been 0 responses to KIP-944. I understand this may not 
> be something that keeps you busy, but this KIP is important to people 
> that use async runtimes like Zio, Cats and Kotlin.
>
> Is there anything you need to come to a decision?
>
> Kind regards,
>      Erik.
>
>
> Op 05-07-2023 om 11:38 schreef Erik van Oosten:
>> Hello all,
>>
>> I'd like to call a vote on KIP-944 Support async runtimes in consumer. 
>> It has has been 'under discussion' for 7 days now. 'Under discussion' 
>> between quotes, because there were 0 comments so far. I hope the KIP 
>> is clear!
>>
>> KIP description: https://cwiki.apache.org/confluence/x/chw0Dw
>>
>> Kind regards,
>>     Erik.
>>
> -- 
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-06 Thread Erik van Oosten

Dear PMCs,

So far there have been 0 responses to KIP-944. I understand this may not 
be something that keeps you busy, but this KIP is important to people 
that use async runtimes like Zio, Cats and Kotlin.


Is there anything you need to come to a decision?

Kind regards,
    Erik.


Op 05-07-2023 om 11:38 schreef Erik van Oosten:

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in consumer. 
It has has been 'under discussion' for 7 days now. 'Under discussion' 
between quotes, because there were 0 comments so far. I hope the KIP 
is clear!


KIP description: https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
    Erik.


--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



[VOTE] KIP-944 Support async runtimes in consumer

2023-07-05 Thread Erik van Oosten

Hello all,

I'd like to call a vote on KIP-944 Support async runtimes in consumer. 
It has has been 'under discussion' for 7 days now. 'Under discussion' 
between quotes, because there were 0 comments so far. I hope the KIP is 
clear!


KIP description: https://cwiki.apache.org/confluence/x/chw0Dw

Kind regards,
    Erik.