Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread John Roesler
Thanks, all,

This proposal sounds good to me.

We could try to just disable the current flaky tests as a one-time step and 
configure GitHub only to merge green builds as a bulwark against the future.

By definition, flaky tests may not fail during the buggy PR build itself, but 
they should make themselves known soon afterwards, in the nightly and 
subsequent PRs. As long as committers are empowered to identify and revert 
flakiness-inducing PRs, they should be able to unblock their subsequent PRs. 

In other words, I’m biased to think that new flakiness indicates 
non-deterministic bugs more often than it indicates a bad test.

But whatever we do, it’s better than merging red builds. 

Thanks,
John

On Sat, Nov 11, 2023, at 10:48, Ismael Juma wrote:
> One more thing:
>
> 3. If test flakiness is introduced by a recent PR, it's appropriate to
> revert said PR vs disabling the flaky tests.
>
> Ismael
>
> On Sat, Nov 11, 2023, 8:45 AM Ismael Juma  wrote:
>
>> Hi David,
>>
>> I would be fine with:
>> 1. Only allowing merges if the build is green
>> 2. Disabling all flaky tests that aren't fixed within a week. That is, if
>> a test is flaky for more than a week, it should be automatically disabled
>> (it doesn't add any value since it gets ignored).
>>
>> We need both to make this work, if you just do step 1, then we will be
>> stuck with no ability to merge anything.
>>
>> Ismael
>>
>>
>>
>> On Sat, Nov 11, 2023, 2:02 AM David Jacot 
>> wrote:
>>
>>> Hi all,
>>>
>>> The state of our CI worries me a lot. Just this week, we merged two PRs
>>> with compilation errors and one PR introducing persistent failures. This
>>> really hurts the quality and the velocity of the project and it basically
>>> defeats the purpose of having a CI because we tend to ignore it nowadays.
>>>
>>> Should we continue to merge without a green build? No! We should not so I
>>> propose to prevent merging a pull request without a green build. This is a
>>> really simple and bold move that will prevent us from introducing
>>> regressions and will improve the overall health of the project. At the
>>> same
>>> time, I think that we should disable all the known flaky tests, raise
>>> jiras
>>> for them, find an owner for each of them, and fix them.
>>>
>>> What do you think?
>>>
>>> Best,
>>> David
>>>
>>


Re: [ANNOUNCE] New Kafka PMC Member: Satish Duggana

2023-10-29 Thread John Roesler
Congratulations, Satish!
-John

On Sun, Oct 29, 2023, at 08:09, Randall Hauch wrote:
> Congratulations, Satish!
>
> On Sun, Oct 29, 2023 at 1:47 AM Tom Bentley  wrote:
>
>> Congratulations!
>>
>> On Sun, 29 Oct 2023 at 5:41 PM, Guozhang Wang 
>> wrote:
>>
>> > Congratulations Satish!
>> >
>> > On Sat, Oct 28, 2023 at 12:59 AM Luke Chen  wrote:
>> > >
>> > > Congrats Satish!
>> > >
>> > > Luke
>> > >
>> > > On Sat, Oct 28, 2023 at 11:16 AM ziming deng > >
>> > > wrote:
>> > >
>> > > > Congratulations Satish!
>> > > >
>> > > > > On Oct 27, 2023, at 23:03, Jun Rao 
>> wrote:
>> > > > >
>> > > > > Hi, Everyone,
>> > > > >
>> > > > > Satish Duggana has been a Kafka committer since 2022. He has been
>> > very
>> > > > > instrumental to the community since becoming a committer. It's my
>> > > > pleasure
>> > > > > to announce that Satish is now a member of Kafka PMC.
>> > > > >
>> > > > > Congratulations Satish!
>> > > > >
>> > > > > Jun
>> > > > > on behalf of Apache Kafka PMC
>> > > >
>> > > >
>> >
>> >
>>


Re: [ANNOUNCE] New Kafka PMC Member: Justine Olshan

2023-09-24 Thread John Roesler
Congratulations, Justine!
-John

On Sun, Sep 24, 2023, at 05:05, Mickael Maison wrote:
> Congratulations Justine!
>
> On Sun, Sep 24, 2023 at 5:04 AM Sophie Blee-Goldman
>  wrote:
>>
>> Congrats Justine!
>>
>> On Sat, Sep 23, 2023, 4:36 PM Tom Bentley  wrote:
>>
>> > Congratulations!
>> >
>> > On Sun, 24 Sept 2023 at 12:32, Satish Duggana 
>> > wrote:
>> >
>> > > Congratulations Justine!!
>> > >
>> > > On Sat, 23 Sept 2023 at 15:46, Bill Bejeck  wrote:
>> > > >
>> > > > Congrats Justine!
>> > > >
>> > > > -Bill
>> > > >
>> > > > On Sat, Sep 23, 2023 at 6:23 PM Greg Harris
>> > > > > >
>> > > > wrote:
>> > > >
>> > > > > Congratulations Justine!
>> > > > >
>> > > > > On Sat, Sep 23, 2023 at 5:49 AM Boudjelda Mohamed Said
>> > > > >  wrote:
>> > > > > >
>> > > > > > Congrats Justin !
>> > > > > >
>> > > > > > On Sat 23 Sep 2023 at 14:44, Randall Hauch 
>> > wrote:
>> > > > > >
>> > > > > > > Congratulations, Justine!
>> > > > > > >
>> > > > > > > On Sat, Sep 23, 2023 at 4:25 AM Kamal Chandraprakash <
>> > > > > > > kamal.chandraprak...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Congrats Justine!
>> > > > > > > >
>> > > > > > > > On Sat, Sep 23, 2023, 13:28 Divij Vaidya <
>> > > divijvaidy...@gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Congratulations Justine!
>> > > > > > > > >
>> > > > > > > > > On Sat 23. Sep 2023 at 07:06, Chris Egerton <
>> > > > > fearthecel...@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Congrats Justine!
>> > > > > > > > > > On Fri, Sep 22, 2023, 20:47 Guozhang Wang <
>> > > > > > > guozhang.wang...@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Congratulations!
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Sep 22, 2023 at 8:44 PM Tzu-Li (Gordon) Tai <
>> > > > > > > > > tzuli...@apache.org
>> > > > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > Congratulations Justine!
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Fri, Sep 22, 2023, 19:25 Philip Nee <
>> > > philip...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Congrats Justine!
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Fri, Sep 22, 2023 at 7:07 PM Luke Chen <
>> > > > > show...@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi, Everyone,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Justine Olshan has been a Kafka committer since
>> > Dec.
>> > > > > 2022.
>> > > > > > > She
>> > > > > > > > > has
>> > > > > > > > > > > been
>> > > > > > > > > > > > > > very active and instrumental to the community since
>> > > > > becoming
>> > > > > > > a
>> > > > > > > > > > > committer.
>> > > > > > > > > > > > > > It's my pleasure to announce that Justine is now a
>> > > > > member of
>> > > > > > > > > Kafka
>> > > > > > > > > > > PMC.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Congratulations Justine!
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Luke
>> > > > > > > > > > > > > > on behalf of Apache Kafka PMC
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> > >
>> >


Re: [VOTE] KIP-954: expand default DSL store configuration to custom types

2023-07-29 Thread John Roesler
Thanks for the KIP, Almog!

I'm +1 (binding) 

I've reviewed the KIP and skimmed the discussion thread. I think this is going 
to be a very nice improvement.

Thanks,
-John

On Sat, Jul 29, 2023, at 13:26, Guozhang Wang wrote:
> Thanks Almog! I made a pass over the updated wiki and have no more questions. 
> +1
>
> Guozhang
>
> On Wed, Jul 26, 2023 at 8:14 AM Almog Gavra  wrote:
>>
>> Hello Everyone,
>>
>> Opening the voting for KIP-954. The discussion is converging, but please
>> feel free to chime in on the last few conversation points if you aren't
>> happy with where it settled.
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-954%3A+expand+default+DSL+store+configuration+to+custom+types
>>
>> Cheers,
>> Almog


Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-26 Thread John Roesler
Hello Shay,

Thanks for the KIP!

I just took a look in preparation to vote, and there are two small-ish things 
that I'd like to fix first. Apologies if this stuff has already come up in the 
discussion thread; I only skimmed it.

1. The KIP only mentions the name of the method instead of providing a code 
snippet showing exactly what the method signature will be in the interface. 
Normally, KIPs do the latter because it removes all ambiguity from the 
proposal. It also gives you an opportunity to write down the Javadoc you would 
add to the method instead of just mentioning the points that you plan to 
document.

2. The KIP lists some concerns, but not what you will do to mitigate them. For 
example, the concern about IQ not behaving correctly. Will you disable the use 
of the implicit partitioner downstream of one of these cancellations? Or 
provide a new interface to supply the "reverse mapping" you mentioned? Or 
include documentation in the Javadoc for how to deal with the situation? I 
think there are a range of options for each of those concerns, and we should 
state up front what we plan to do.

Thanks again!
-John

On 2023/07/24 20:33:05 Sophie Blee-Goldman wrote:
> Thanks Shay! You and Matthias have convinced me, I'm happy with the current
> proposal. I think once you make the minor
> updates to the KIP document this will be ready for voting again.
> 
> Cheers,
> Sophie
> 
> On Mon, Jul 24, 2023 at 8:26 AM Shay Lin  wrote:
> 
> > Hi Sophie and Matthias, thanks for your comments and replies.
> >
> > 1. Scope of change: KStreams only or KStreams/KTable
> > I took some time to digest your points, looking through how KStreams
> > triggers repartitions today. I noticed that `repartitionRequired`is a flag
> > in KStreamImpl etc and not in KTableImpl etc. When I look further, in the
> > case of KTable, instead of passing in a boolean flag, a repartition node `
> > TableRepartitionMapNode` is directly created. I went back and referenced
> > the two issue tickets KAFKA-10844 and KAFKA-4835, both requests were
> > focused on KStreams, i.e. not to change the partition why the input streams
> > are already correctly keyed. Is it possible that in the case of KTable,
> > users always intend to repartition (change key) when they call on
> > aggregate? -- (this was written before I saw Matthias's comment)
> >
> > Overall, based on the tickets, I see the benefit of doing a contained
> > change focusing on KStreams, i.e. repartitionRequired, which would solve
> > the pain points nicely. If we ran into similar complaints/optimization
> > requests for KTable down the line, we can address them on top of this(let
> > me know if we have these requests already, I might just be negligent).
> >
> > 2. API: markAsPartitioned() vs config
> > If we go with the KStreams only scope, markAsPartition() is more
> > adequate, i.e. maps nicely to repartitionRequired. There is a list of
> > NamedOperations that may or may not trigger repartition based on its
> > context(KStreams or KTable) which would make the implementation more
> > confusing.
> >
> > 3. KIP documentation: Thanks for providing the links to previous KIPs. I
> > will be adding the three use cases and javadoc. I will also document the
> > risks when it relates to IQ and Join.
> >
> > Best,
> > Shay
> >
> > On Fri, Jul 21, 2023 at 5:55 PM Matthias J. Sax  wrote:
> >
> > > I agree that it could easily be misused. There is a few Jira tickets for
> > > cases when people want to "cancel" a repartition step. I would hope
> > > those tickets are linked to the KIP (if not, we should do this, and
> > > maybe even c&p those cases as motivation into the KIP itself)?
> > >
> > > It's always a tricky question to what extend we want to guide users, and
> > > to what extend we need to give levers for advances case (and how to
> > > design those levers...) It's for sure a good idea to call out "use with
> > > case" in the JavaDocs for the new method.
> > >
> > >
> > > -Matthias
> > >
> > > On 7/21/23 3:34 PM, Sophie Blee-Goldman wrote:
> > > > I guess I felt a bit uneasy about how this could be used/abused while
> > > > reading the KIP, but if we truly believe this is an advanced feature,
> > I'm
> > > > fine with the way things currently are. It doesn't feel like the best
> > > API,
> > > > but it does seem to be the best *possible* API given the way things
> > are.
> > > >
> > > > W.r.t the KTable notes, that all makes sense to me. I just wanted to
> > lay
> > > > out all the potential cases to make sure we had our bases covered.
> > > >
> > > > I still think an example or two would help, but the only thing I will
> > > > actually wait on before feeling comfortable enough to vote on this
> > would
> > > be
> > > > a clear method signature (and maybe sample javadocs) in the "Public
> > > > Interfaces" section.
> > > >
> > > > Thanks again for the KIP Shay! Hope I haven't dragged it out too much
> > > >
> > > > On Fri, Jul 21, 2023 at 3:19 PM Matthias J. Sax 
> > > wrote:
> > > >
> 

Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-21 Thread John Roesler
No worries, I should have included a ";)" to let you know it was mostly 
tongue-in-cheek.


Thanks,
-John

On 6/21/23 12:34, Nick Telford wrote:

Sorry John, I didn't mean to mis-characterize it like that. I was mostly
referring to disabling memtables. AFAIK the SstFileWriter API is primarily
designed for bulk ingest, e.g. for bootstrapping a database from a backup,
rather than during normal operation of an online database. That said, I was
overly alarmist in my phrasing.

My concern is only that, while the concept seems quite reasonable, there
are no doubt hidden issues lurking.

On Wed, 21 Jun 2023 at 18:25, John Roesler  wrote:


Thanks Nick,

That sounds good to me.

I can't let (2) slide, though.. Writing and ingesting SST files is not a
RocksDB internal, but rather a supported usage pattern on public APIs.
Regardless, I think your overall preference is fine with me, especially
if we can internalize this change within the store implementation itself.

Thanks,
-John

On 6/21/23 11:50, Nick Telford wrote:

Hi Bruno,

1.
Isn't this exactly the same issue that WriteBatchWithIndex transactions
have, whereby exceeding (or likely to exceed) configured memory needs to
trigger an early commit?

2.
This is one of my big concerns. Ultimately, any approach based on

cracking

open RocksDB internals and using it in ways it's not really designed for

is

likely to have some unforseen performance or consistency issues.

3.
What's your motivation for removing these early commits? While not

ideal, I

think they're a decent compromise to ensure consistency whilst

maintaining

good and predictable performance.
All 3 of your suggested ideas seem *very* complicated, and might actually
make behaviour less predictable for users as a consequence.

I'm a bit concerned that the scope of this KIP is growing a bit out of
control. While it's good to discuss ideas for future improvements, I

think

it's important to narrow the scope down to a design that achieves the

most

pressing objectives (constant sized restorations during dirty
close/unexpected errors). Any design that this KIP produces can

ultimately

be changed in the future, especially if the bulk of it is internal
behaviour.

I'm going to spend some time next week trying to re-work the original
WriteBatchWithIndex design to remove the newTransaction() method, such

that

it's just an implementation detail of RocksDBStore. That way, if we want

to

replace WBWI with something in the future, like the SST file management
outlined by John, then we can do so with little/no API changes.

Regards,

Nick







Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-21 Thread John Roesler

Thanks Nick,

That sounds good to me.

I can't let (2) slide, though.. Writing and ingesting SST files is not a 
RocksDB internal, but rather a supported usage pattern on public APIs. 
Regardless, I think your overall preference is fine with me, especially 
if we can internalize this change within the store implementation itself.


Thanks,
-John

On 6/21/23 11:50, Nick Telford wrote:

Hi Bruno,

1.
Isn't this exactly the same issue that WriteBatchWithIndex transactions
have, whereby exceeding (or likely to exceed) configured memory needs to
trigger an early commit?

2.
This is one of my big concerns. Ultimately, any approach based on cracking
open RocksDB internals and using it in ways it's not really designed for is
likely to have some unforseen performance or consistency issues.

3.
What's your motivation for removing these early commits? While not ideal, I
think they're a decent compromise to ensure consistency whilst maintaining
good and predictable performance.
All 3 of your suggested ideas seem *very* complicated, and might actually
make behaviour less predictable for users as a consequence.

I'm a bit concerned that the scope of this KIP is growing a bit out of
control. While it's good to discuss ideas for future improvements, I think
it's important to narrow the scope down to a design that achieves the most
pressing objectives (constant sized restorations during dirty
close/unexpected errors). Any design that this KIP produces can ultimately
be changed in the future, especially if the bulk of it is internal
behaviour.

I'm going to spend some time next week trying to re-work the original
WriteBatchWithIndex design to remove the newTransaction() method, such that
it's just an implementation detail of RocksDBStore. That way, if we want to
replace WBWI with something in the future, like the SST file management
outlined by John, then we can do so with little/no API changes.

Regards,

Nick



Re: [DISCUSS] KIP-941: Range queries to accept null lower and upper bounds

2023-06-21 Thread John Roesler

Hi all,

Thanks for the KIP, Lucia! This is a nice change.

To Kirk's question (1), the example is a bit misleading. The typical 
case that would ease user pain is specifically using "null" to indicate 
an open-ended range, especially since null is not a valid key.


I could additionally see an empty string as being nice, but the actual 
API is generic, not String, so there's no meaningful concept of 
empty/blank/whitespace that we could check for, just null or not.


Regarding (2), there's no public factory that takes Optional parameters. 
I think you're looking at the private constructor. An alternative Lucia 
could consider is to instead propose adding a new factory like 
`withRange(Optional lower, Optional upper)`.


FWIW, I'd be in favor of this KIP as proposed.

A couple of smaller notes:

3. In the compatibility notes, I wasn't sure what "web request" was 
referring to. I think you just mean that all existing valid API calls 
will continue to work the same, and we're only making the withRange 
method more permissive with its arguments.


4. For the Test Plan, I wrote some tests that validate these queries 
against every kind and configuration of store possible. Please add your 
new test cases to that one to make absolutely sure it'll work for every 
store. Obviously, you may also want to add some specific unit tests in 
addition.


See 
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/IQv2StoreIntegrationTest.java


Thanks again!
-John

On 6/21/23 12:00, Kirk True wrote:

Hi Lucia,

One question:

1. Since the proposed implementation change for withRange() method uses 
Optional.ofNullable() (which only catches nulls and not blank/whitespace 
strings), wouldn’t users still need to have code like that in the example?

2. Why don't users create RangeQuery objects that use Optional directly? What’s 
the benefit of introducing what appears to be a very thin utility facade?

Thanks,
Kirk


On Jun 21, 2023, at 9:51 AM, Kirk True  wrote:

Hi Lucia,

Thanks for the KIP!

The KIP wasn’t in the email and I didn’t see it on the main KIP directory. Here 
it is:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds

Can the KIP be added to the main KIP page 
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)?
 That will help with discoverability and encourage discussion.

Thanks,
Kirk


On Jun 15, 2023, at 2:13 PM, Lucia Cerchie  
wrote:

Hi everyone,

I'd like to discuss KIP-941, which will change the behavior of range
queries to make it easier for users to execute full range scans when using
interactive queries with upper and lower bounds from query parameters in
web client requests.

I much appreciate your input!

Lucia Cerchie
--

[image: Confluent] 
Lucia Cerchie
Developer Advocate
Follow us: [image: Blog]
[image:
Twitter] [image: Slack]
[image: YouTube]


[image: Try Confluent Cloud for Free]








Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-20 Thread John Roesler

Oh, that's a good point.

On the topic of a behavioral switch for disabled caches, the typical use 
case for disabling the cache is to cause each individual update to 
propagate down the topology, so another thought might be to just go 
ahead and add the memory we would have used for the memtables to the 
cache size, but if people did disable the cache entirely, then we could 
still go ahead and forward the records on each write?


I know that Guozhang was also proposing for a while to actually decouple 
caching and forwarding, which might provide a way to side-step this 
dilemma (i.e., we just always forward and only apply the cache to state 
and changelog writes).


By the way, I'm basing my statement on why you'd disable caches on 
memory, but also on the guidance here: 
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html 
. That doc also contains a section on how to bound the total memory 
usage across RocksDB memtables, which points to another benefit of 
disabling memtables and managing the write buffer ourselves (simplified 
memory configuration).


Thanks,
-John

On 6/20/23 16:05, Nick Telford wrote:

Potentially we could just go the memorable with Rocks WriteBatches route if
the cache is disabled?

On Tue, 20 Jun 2023, 22:00 John Roesler,  wrote:


Touché!

Ok, I agree that figuring out the case of a disabled cache would be
non-trivial. Ingesting single-record SST files will probably not be
performant, but benchmarking may prove different. Or maybe we can have
some reserved cache space on top of the user-configured cache, which we
would have reclaimed from the memtable space. Or some other, more
creative solution.

Thanks,
-John

On 6/20/23 15:30, Nick Telford wrote:

Note that users can disable the cache, which would still be

ok, I think. We wouldn't ingest the SST files on every record, but just
append to them and only ingest them on commit, when we're already
waiting for acks and a RocksDB commit.

In this case, how would uncommitted records be read by joins?

On Tue, 20 Jun 2023, 20:51 John Roesler,  wrote:


Ah, sorry Nick,

I just meant the regular heap based cache that we maintain in Streams. I
see that it's not called "RecordCache" (my mistake).

The actual cache is ThreadCache:



https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/ThreadCache.java


Here's the example of how we use the cache in KeyValueStore:



https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java


It's basically just an on-heap Map of records that have not yet been
written to the changelog or flushed into the underlying store. It gets
flushed when the total cache size exceeds `cache.max.bytes.buffering` or
the `commit.interval.ms` elapses.

Speaking of those configs, another benefit to this idea is that we would
no longer need to trigger extra commits based on the size of the ongoing
transaction. Instead, we'd just preserve the existing cache-flush
behavior. Note that users can disable the cache, which would still be
ok, I think. We wouldn't ingest the SST files on every record, but just
append to them and only ingest them on commit, when we're already
waiting for acks and a RocksDB commit.

Thanks,
-John

On 6/20/23 14:09, Nick Telford wrote:

Hi John,

By "RecordCache", do you mean the RocksDB "WriteBatch"? I can't find

any

class called "RecordCache"...

Cheers,

Nick

On Tue, 20 Jun 2023 at 19:42, John Roesler 

wrote:



Hi Nick,

Thanks for picking this up again!

I did have one new thought over the intervening months, which I'd like
your take on.

What if, instead of using the RocksDB atomic write primitive at all,

we

instead just:
1. disable memtables entirely
2. directly write the RecordCache into SST files when we flush
3. atomically ingest the SST file(s) into RocksDB when we get the ACK
from the changelog (see





https://github.com/EighteenZi/rocksdb_wiki/blob/master/Creating-and-Ingesting-SST-files.md

and





https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java

and





https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L1413-L1429

)
4. track the changelog offsets either in another CF or the same CF

with

a reserved key, either of which will make the changelog offset update
atomic with the file ingestions

I suspect this'll have a number of benefits:
* writes to RocksDB will always be atomic
* we don't fragment memory between the RecordCache and the memtables
* RecordCache gives far higher performance than memtable for reads and
writes
* we don't need any new "transaction" concepts or memory bound configs

What do you think?

Thanks,
-John

On 6/20/23 10:51, Nick Telford wrote:

Hi Bruno,

Thanks for reviewing the KIP. It's

Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-20 Thread John Roesler

Touché!

Ok, I agree that figuring out the case of a disabled cache would be 
non-trivial. Ingesting single-record SST files will probably not be 
performant, but benchmarking may prove different. Or maybe we can have 
some reserved cache space on top of the user-configured cache, which we 
would have reclaimed from the memtable space. Or some other, more 
creative solution.


Thanks,
-John

On 6/20/23 15:30, Nick Telford wrote:

Note that users can disable the cache, which would still be

ok, I think. We wouldn't ingest the SST files on every record, but just
append to them and only ingest them on commit, when we're already
waiting for acks and a RocksDB commit.

In this case, how would uncommitted records be read by joins?

On Tue, 20 Jun 2023, 20:51 John Roesler,  wrote:


Ah, sorry Nick,

I just meant the regular heap based cache that we maintain in Streams. I
see that it's not called "RecordCache" (my mistake).

The actual cache is ThreadCache:

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/ThreadCache.java

Here's the example of how we use the cache in KeyValueStore:

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java

It's basically just an on-heap Map of records that have not yet been
written to the changelog or flushed into the underlying store. It gets
flushed when the total cache size exceeds `cache.max.bytes.buffering` or
the `commit.interval.ms` elapses.

Speaking of those configs, another benefit to this idea is that we would
no longer need to trigger extra commits based on the size of the ongoing
transaction. Instead, we'd just preserve the existing cache-flush
behavior. Note that users can disable the cache, which would still be
ok, I think. We wouldn't ingest the SST files on every record, but just
append to them and only ingest them on commit, when we're already
waiting for acks and a RocksDB commit.

Thanks,
-John

On 6/20/23 14:09, Nick Telford wrote:

Hi John,

By "RecordCache", do you mean the RocksDB "WriteBatch"? I can't find any
class called "RecordCache"...

Cheers,

Nick

On Tue, 20 Jun 2023 at 19:42, John Roesler  wrote:


Hi Nick,

Thanks for picking this up again!

I did have one new thought over the intervening months, which I'd like
your take on.

What if, instead of using the RocksDB atomic write primitive at all, we
instead just:
1. disable memtables entirely
2. directly write the RecordCache into SST files when we flush
3. atomically ingest the SST file(s) into RocksDB when we get the ACK
from the changelog (see



https://github.com/EighteenZi/rocksdb_wiki/blob/master/Creating-and-Ingesting-SST-files.md

and



https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java

and



https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L1413-L1429

)
4. track the changelog offsets either in another CF or the same CF with
a reserved key, either of which will make the changelog offset update
atomic with the file ingestions

I suspect this'll have a number of benefits:
* writes to RocksDB will always be atomic
* we don't fragment memory between the RecordCache and the memtables
* RecordCache gives far higher performance than memtable for reads and
writes
* we don't need any new "transaction" concepts or memory bound configs

What do you think?

Thanks,
-John

On 6/20/23 10:51, Nick Telford wrote:

Hi Bruno,

Thanks for reviewing the KIP. It's been a long road, I started working

on

this more than a year ago, and most of the time in the last 6 months

has

been spent on the "Atomic Checkpointing" stuff that's been benched, so

some

of the reasoning behind some of my decisions have been lost, but I'll

do

my

best to reconstruct them.

1.
IIRC, this was the initial approach I tried. I don't remember the exact
reasons I changed it to use a separate "view" of the StateStore that
encapsulates the transaction, but I believe it had something to do with
concurrent access to the StateStore from Interactive Query threads.

Reads

from interactive queries need to be isolated from the currently ongoing
transaction, both for consistency (so interactive queries don't observe
changes that are subsequently rolled-back), but also to prevent

Iterators

opened by an interactive query from being closed and invalidated by the
StreamThread when it commits the transaction, which causes your

interactive

queries to crash.

Another reason I believe I implemented it this way was a separation of
concerns. Recall that newTransaction() originally created an object of

type

Transaction, not StateStore. My intent was to improve the type-safety

of

the API, in an effort to ensure Transactions weren't used incorrectly.
Unfortunately, t

Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-20 Thread John Roesler

Ah, sorry Nick,

I just meant the regular heap based cache that we maintain in Streams. I 
see that it's not called "RecordCache" (my mistake).


The actual cache is ThreadCache: 
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/ThreadCache.java


Here's the example of how we use the cache in KeyValueStore:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java

It's basically just an on-heap Map of records that have not yet been 
written to the changelog or flushed into the underlying store. It gets 
flushed when the total cache size exceeds `cache.max.bytes.buffering` or 
the `commit.interval.ms` elapses.


Speaking of those configs, another benefit to this idea is that we would 
no longer need to trigger extra commits based on the size of the ongoing 
transaction. Instead, we'd just preserve the existing cache-flush 
behavior. Note that users can disable the cache, which would still be 
ok, I think. We wouldn't ingest the SST files on every record, but just 
append to them and only ingest them on commit, when we're already 
waiting for acks and a RocksDB commit.


Thanks,
-John

On 6/20/23 14:09, Nick Telford wrote:

Hi John,

By "RecordCache", do you mean the RocksDB "WriteBatch"? I can't find any
class called "RecordCache"...

Cheers,

Nick

On Tue, 20 Jun 2023 at 19:42, John Roesler  wrote:


Hi Nick,

Thanks for picking this up again!

I did have one new thought over the intervening months, which I'd like
your take on.

What if, instead of using the RocksDB atomic write primitive at all, we
instead just:
1. disable memtables entirely
2. directly write the RecordCache into SST files when we flush
3. atomically ingest the SST file(s) into RocksDB when we get the ACK
from the changelog (see

https://github.com/EighteenZi/rocksdb_wiki/blob/master/Creating-and-Ingesting-SST-files.md
and

https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java
and

https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L1413-L1429
)
4. track the changelog offsets either in another CF or the same CF with
a reserved key, either of which will make the changelog offset update
atomic with the file ingestions

I suspect this'll have a number of benefits:
* writes to RocksDB will always be atomic
* we don't fragment memory between the RecordCache and the memtables
* RecordCache gives far higher performance than memtable for reads and
writes
* we don't need any new "transaction" concepts or memory bound configs

What do you think?

Thanks,
-John

On 6/20/23 10:51, Nick Telford wrote:

Hi Bruno,

Thanks for reviewing the KIP. It's been a long road, I started working on
this more than a year ago, and most of the time in the last 6 months has
been spent on the "Atomic Checkpointing" stuff that's been benched, so

some

of the reasoning behind some of my decisions have been lost, but I'll do

my

best to reconstruct them.

1.
IIRC, this was the initial approach I tried. I don't remember the exact
reasons I changed it to use a separate "view" of the StateStore that
encapsulates the transaction, but I believe it had something to do with
concurrent access to the StateStore from Interactive Query threads. Reads
from interactive queries need to be isolated from the currently ongoing
transaction, both for consistency (so interactive queries don't observe
changes that are subsequently rolled-back), but also to prevent Iterators
opened by an interactive query from being closed and invalidated by the
StreamThread when it commits the transaction, which causes your

interactive

queries to crash.

Another reason I believe I implemented it this way was a separation of
concerns. Recall that newTransaction() originally created an object of

type

Transaction, not StateStore. My intent was to improve the type-safety of
the API, in an effort to ensure Transactions weren't used incorrectly.
Unfortunately, this didn't pan out, but newTransaction() remained.

Finally, this had the added benefit that implementations could easily add
support for transactions *without* re-writing their existing,
non-transactional implementation. I think this can be a benefit both for
implementers of custom StateStores, but also for anyone extending
RocksDbStore, as they can rely on the existing access methods working how
they expect them to.

I'm not too happy with the way the current design has panned out, so I'm
open to ideas on how to improve it. Key to this is finding some way to
ensure that reads from Interactive Query threads are properly isolated

from

the transaction, *without* the performance overhead of checking which
thread the method is being called from on every access.

As for replacing fl

Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-06-20 Thread John Roesler
We should

have a

configuration

to

fall

back to the current behavior (and/or disable

txn

stores

for

ALOS)

unless

the benchmarks show negligible overhead for

longer

commits /

large-enough

batch sizes.

If you prefer to keep the KIP smaller, I would

rather

cut out

state-store-managed checkpointing rather than

proper

OOMe

handling

and

being able to switch to non-txn behavior. The

checkpointing

is

not

necessary to solve the recovery-under-EOS

problem.

On

the

other

hand,

once

WriteBatchWithIndex is in, it will be much

easier

to

add

state-store-managed checkpointing.

If you share the current implementation, I am

happy

to

help

you

address

the
OOMe and configuration parts as well as review

and

test

the

patch.


Best,
Alex


1.

https://github.com/facebook/rocksdb/issues/608


On Tue, Nov 22, 2022 at 6:31 PM Nick Telford <

nick.telf...@gmail.com



wrote:


Hi John,

Thanks for the review and feedback!

1. Custom Stores: I've been mulling over this

problem

myself.

As

it

stands,

custom stores would essentially lose

checkpointing

with no

indication

that

they're expected to make changes, besides a

line

in

the

release

notes. I

agree that the best solution would be to

provide a

default

that

checkpoints

to a file. The one thing I would change is

that

the

checkpointing

is

to

a

store-local file, instead of a per-Task file.

This

way the

StateStore

still

technically owns its own checkpointing (via a

default

implementation),

and

the StateManager/Task execution engine

doesn't

need

to know

anything

about

checkpointing, which greatly simplifies some

of

the

logic.


2. OOME errors: The main reasons why I didn't

explore

a

solution

to

this is

a) to keep this KIP as simple as possible,

and

b)

because

I'm

not

exactly

how to signal that a Task should commit

prematurely.

I'm

confident

it's

possible, and I think it's worth adding a

section

on

handling

this.

Besides

my proposal to force an early commit once

memory

usage

reaches

a

threshold,

is there any other approach that you might

suggest

for

tackling

this

problem?

3. ALOS: I can add in an explicit paragraph,

but

my

assumption

is

that

since transactional behaviour comes at

little/no

cost, that

it

should

be

available by default on all stores,

irrespective

of

the

processing

mode.

While ALOS doesn't use transactions, the Task

itself

still

"commits",

so

the behaviour should be correct under ALOS

too.

I'm

not

convinced

that

it's

worth having both

transactional/non-transactional

stores

available, as

it

would considerably increase the complexity of

the

codebase,

for

very

little

benefit.

4. Method deprecation: Are you referring to

StateStore#getPosition()?

As I

understand it, Position contains the

position of

the

*source*

topics,

whereas the commit offsets would be the

*changelog*

offsets.

So

it's

still

necessary to retain the Position data, as

well

as

the

changelog

offsets.

What I meant in the KIP is that Position

offsets

are

currently

stored

in a

file, and since we can atomically store

metadata

along with

the

record

batch we commit to RocksDB, we can move our

Position

offsets

in

to

this

metadata too, and gain the same transactional

guarantees

that

we

will

for

changelog offsets, ensuring that the Position

offsets

are

consistent

with

the records that are read from the database.

Regards,
Nick

On Tue, 22 Nov 2022 at 16:25, John Roesler <

vvcep...@apache.org>

wrote:



Thanks for publishing this alternative,

Nick!


The benchmark you mentioned in the KIP-844

discussion

seems

like

a

compelling reason to revisit the built-in

transactionality

mechanism.

I

also appreciate you analysis, showing that

for

most

use

cases,

the

write

batch approach should be just fine.

There are a couple of points that would

hold

me

back from

approving

this

KIP right now:

1. Loss of coverage for custom stores.
The fact that you can plug in a

(relatively)

simple

implementation

of

the

XStateStore interfaces and automagically

get a

distributed

database

out

of

it is a significant benefit of Kafka

Streams.

I'd

hate to

lose

it,

so

it

would be better to spend some time and

come up

with

a way

to

preserve

that

property. For example, can we provide a

default

implementation

of

`commit(..)` that re-implements the

existing

checkpoint-file

approach? Or

perhaps add an `isTransactional()` flag to

the

state

store

interface

so

that the runtime can decide whether to

continue

to

manage

checkpoint

files

vs delegating transactionality to the

stores?


2. Guarding against OOME
I appreciate your analysis, but I don't

think

it's

sufficient

to

say

that

we will solve the memory problem later if

it

becomes

necessary.

The

experience leading to that si

Re: [VOTE] KIP-872: Add Serializer#serializeToByteBuffer() to reduce memory copying

2023-06-20 Thread John Roesler
Hi Divij and ShunKang,

I pulled open this thread to see if you needed my vote, but FYI, Divij is a 
committer now, so he can re-cast his vote as binding.

Thanks,
-John

On 2023/06/20 13:37:04 ShunKang Lin wrote:
> Hi all,
> 
> Bump this thread again and see if we could get a few more votes.
> Currently we have +3 non-binding (from Divij Vaidya, Kirk True and Kamal
> Chandraprakash)  and +2 binding (from Luke Chen and ziming deng).
> Hoping we can get this approved, reviewed, and merged in time for 3.6.0.
> 
> Best,
> ShunKang
> 
> ShunKang Lin  于2023年5月7日周日 15:24写道:
> 
> > Hi everyone,
> >
> > I'd like to open the vote for KIP-872, which proposes to add
> > Serializer#serializeToByteBuffer() to reduce memory copying.
> >
> > The proposal is here:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
> >
> > The pull request is here:
> > https://github.com/apache/kafka/pull/12685
> >
> > Thanks to all who reviewed the proposal, and thanks in advance for taking
> > the time to vote!
> >
> > Best,
> > ShunKang
> >
> 


Re: [ANNOUNCE] New committer: Divij Vaidya

2023-06-13 Thread John Roesler
Congratulations, Divij!
-John

On Tue, Jun 13, 2023, at 09:44, David Arthur wrote:
> Congrats Divij!
>
> On Tue, Jun 13, 2023 at 12:34 PM Igor Soarez  wrote:
>
>> Congratulations Divij!
>>
>> --
>> Igor
>>
>
>
> -- 
> -David


Re: [VOTE] 3.5.0 RC1

2023-06-07 Thread John Roesler
Thanks for running this release, Mickael!

I've verified:
* the signature
* that I can compile the project
* that I can run the tests. I saw one flaky test failure, but I don't think it 
should block us. Reported as 
https://issues.apache.org/jira/browse/KAFKA-13531?focusedCommentId=17730190
* the Kafka, Consumer, and Streams quickstarts with ZK and KRaft

I'm +1 (binding)

Thanks,
-John

On Wed, Jun 7, 2023, at 06:16, Josep Prat wrote:
> Hi MIckael,
>
> Apparently you did it in this PR already :) :
> https://github.com/apache/kafka/pull/13749 (this PR among other things
> removes classgraph.
>
> Without being a lawyer, I think I agree with you as stating we depend on
> something we don't would be less problematic than the other way around.
>
> Best,
>
> On Wed, Jun 7, 2023 at 12:14 PM Mickael Maison 
> wrote:
>
>> Hi Josep,
>>
>> Thanks for spotting this. If not already done, can you open a
>> ticket/PR to fix this on trunk? It looks like the last couple of
>> releases already had that issue. Since we're including a license for a
>> dependency we don't ship, I think we can consider this non blocking.
>> The other way around (shipping a dependency without its license) would
>> be blocking.
>>
>> Thanks,
>> Mickael
>>
>> On Tue, Jun 6, 2023 at 10:10 PM Jakub Scholz  wrote:
>> >
>> > +1 (non-binding) ... I used the staged binaries with Scala 2.13 and
>> staged
>> > artifacts to run my tests. All seems to work fine.
>> >
>> > Thanks for running the release Mickael!
>> >
>> > Jakub
>> >
>> > On Mon, Jun 5, 2023 at 3:39 PM Mickael Maison 
>> wrote:
>> >
>> > > Hello Kafka users, developers and client-developers,
>> > >
>> > > This is the second candidate for release of Apache Kafka 3.5.0. Some
>> > > of the major features include:
>> > > - KIP-710: Full support for distributed mode in dedicated MirrorMaker
>> > > 2.0 clusters
>> > > - KIP-881: Rack-aware Partition Assignment for Kafka Consumers
>> > > - KIP-887: Add ConfigProvider to make use of environment variables
>> > > - KIP-889: Versioned State Stores
>> > > - KIP-894: Use incrementalAlterConfig for syncing topic configurations
>> > > - KIP-900: KRaft kafka-storage.sh API additions to support SCRAM for
>> > > Kafka Brokers
>> > >
>> > > Release notes for the 3.5.0 release:
>> > > https://home.apache.org/~mimaison/kafka-3.5.0-rc1/RELEASE_NOTES.html
>> > >
>> > > *** Please download, test and vote by Friday June 9, 5pm PT
>> > >
>> > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > https://kafka.apache.org/KEYS
>> > >
>> > > * Release artifacts to be voted upon (source and binary):
>> > > https://home.apache.org/~mimaison/kafka-3.5.0-rc1/
>> > >
>> > > * Maven artifacts to be voted upon:
>> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> > >
>> > > * Javadoc:
>> > > https://home.apache.org/~mimaison/kafka-3.5.0-rc1/javadoc/
>> > >
>> > > * Tag to be voted upon (off 3.5 branch) is the 3.5.0 tag:
>> > > https://github.com/apache/kafka/releases/tag/3.5.0-rc1
>> > >
>> > > * Documentation:
>> > > https://kafka.apache.org/35/documentation.html
>> > >
>> > > * Protocol:
>> > > https://kafka.apache.org/35/protocol.html
>> > >
>> > > * Successful Jenkins builds for the 3.5 branch:
>> > > Unit/integration tests: I'm struggling to get all tests to pass in the
>> > > same build. I'll run a few more builds to ensure each test pass at
>> > > least once in the CI. All tests passed locally.
>> > > System tests: The build is still running, I'll send an update once I
>> > > have the results.
>> > >
>> > > Thanks,
>> > > Mickael
>> > >
>>
>
>
> -- 
> [image: Aiven] 
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io    |   
>      
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-06-07 Thread John Roesler
Hello again, all,

FYI, I've just opened a request for clarification on the ability to trigger 
builds: https://issues.apache.org/jira/browse/INFRA-24673

Thanks,
-John

On Tue, Jun 6, 2023, at 19:11, Hao Li wrote:
> Thanks John for looking into this!
>
> Hao
>
> On Tue, Jun 6, 2023 at 8:32 AM John Roesler  wrote:
>
>> Hello all,
>>
>> I put in a ticket with Apache Infra to re-send these invites, and they
>> told me I should just remove the usernames in one commit and then re-add
>> them in a subsequent commit to trigger the invites again.
>>
>> I will go ahead and do this for the users who requested it on this thread
>> (Greg and Andrew, as well as for Victoria who asked me about it
>> separately). If there is anyone else who needs a re-send, please let us
>> know.
>>
>> I'm sorry for the confusion, Hao. The docs claimed we could add 20 users,
>> but when I actually checked in the file, I got an automated notification
>> that we could actually only have 10.
>>
>> As for triggering the build, I don't believe you'll be able to log in to
>> Jenkins, but you should be able to say "retest this please" in a PR comment
>> to trigger it. Apparently, that doesn't work anymore, though. I'll file an
>> Infra ticket for it.
>>
>> Thanks all,
>> John
>>
>> On Fri, Jun 2, 2023, at 18:46, Hao Li wrote:
>> > Hi Luke,
>> >
>> > Sorry for the late reply. Can you also add me to the whitelist? I believe
>> > I'm supposed to be there as well. Matthias and John can vouch for me :)
>> My
>> > github ID is "lihaosky".
>> >
>> > Thanks,
>> > Hao
>> >
>> > On Fri, Jun 2, 2023 at 4:43 PM Greg Harris > >
>> > wrote:
>> >
>> >> Luke,
>> >>
>> >> I see that the PR has been merged, but I don't believe it re-sent the
>> >> invitation.
>> >>
>> >> Thanks
>> >> Greg
>> >>
>> >>
>> >> On Wed, May 31, 2023 at 6:46 PM Luke Chen  wrote:
>> >> >
>> >> > Hi Greg and Andrew,
>> >> >
>> >> > Sorry, I don't know how to re-sent the invitation.
>> >> > It looks like it is auto sent after the .asf.yaml updated.
>> >> > Since updating collaborator list is part of release process based on
>> the
>> >> doc
>> >> > <https://kafka.apache.org/contributing>, I just created a new list
>> and
>> >> > opened a PR:
>> >> > https://github.com/apache/kafka/pull/13790
>> >> >
>> >> > Hope that after this PR merged, you'll get a new invitation.
>> >> >
>> >> > Thanks.
>> >> > Luke
>> >> >
>> >> > On Thu, Jun 1, 2023 at 5:27 AM Andrew Grant
>> > >> >
>> >> > wrote:
>> >> >
>> >> > > Hi all,
>> >> > > Like Greg I also received an invitation to collaborate but was too
>> >> slow to
>> >> > > accept the invite :( I'm also wondering if there's a way to resend
>> the
>> >> > > invite? I'm andymg3 on GitHub.
>> >> > > Thanks,
>> >> > > Andrew
>> >> > >
>> >> > > On Tue, May 30, 2023 at 12:12 PM Greg Harris
>> >> > >> > > >
>> >> > > wrote:
>> >> > >
>> >> > > > Hey all,
>> >> > > >
>> >> > > > I received an invitation to collaborate on apache/kafka, but let
>> the
>> >> > > > invitation expire after 7 days.
>> >> > > > Is there a workflow for refreshing the invite, or is an admin
>> able to
>> >> > > > manually re-invite me?
>> >> > > > I'm gharris1727 on github.
>> >> > > >
>> >> > > > Thanks!
>> >> > > > Greg
>> >> > > >
>> >> > > > On Wed, May 24, 2023 at 9:32 AM Justine Olshan
>> >> > > >  wrote:
>> >> > > > >
>> >> > > > > Hey Yash,
>> >> > > > > I'm not sure how it used to be for sure, but I do remember we
>> used
>> >> to
>> >> > > > have
>> >> > > > > a different build system. I wonder if 

Re: [DISCUSS] Adding non-committers as Github collaborators

2023-06-06 Thread John Roesler
Hello Greg, Andrew, and Victoria,

I have just re-added you to the file, so you should receive new invites. Please 
let us know if you have any trouble!

Thanks,
-John

On Tue, Jun 6, 2023, at 10:31, John Roesler wrote:
> Hello all,
>
> I put in a ticket with Apache Infra to re-send these invites, and they 
> told me I should just remove the usernames in one commit and then 
> re-add them in a subsequent commit to trigger the invites again.
>
> I will go ahead and do this for the users who requested it on this 
> thread (Greg and Andrew, as well as for Victoria who asked me about it 
> separately). If there is anyone else who needs a re-send, please let us 
> know.
>
> I'm sorry for the confusion, Hao. The docs claimed we could add 20 
> users, but when I actually checked in the file, I got an automated 
> notification that we could actually only have 10.
>
> As for triggering the build, I don't believe you'll be able to log in 
> to Jenkins, but you should be able to say "retest this please" in a PR 
> comment to trigger it. Apparently, that doesn't work anymore, though. 
> I'll file an Infra ticket for it.
>
> Thanks all,
> John
>
> On Fri, Jun 2, 2023, at 18:46, Hao Li wrote:
>> Hi Luke,
>>
>> Sorry for the late reply. Can you also add me to the whitelist? I believe
>> I'm supposed to be there as well. Matthias and John can vouch for me :) My
>> github ID is "lihaosky".
>>
>> Thanks,
>> Hao
>>
>> On Fri, Jun 2, 2023 at 4:43 PM Greg Harris 
>> wrote:
>>
>>> Luke,
>>>
>>> I see that the PR has been merged, but I don't believe it re-sent the
>>> invitation.
>>>
>>> Thanks
>>> Greg
>>>
>>>
>>> On Wed, May 31, 2023 at 6:46 PM Luke Chen  wrote:
>>> >
>>> > Hi Greg and Andrew,
>>> >
>>> > Sorry, I don't know how to re-sent the invitation.
>>> > It looks like it is auto sent after the .asf.yaml updated.
>>> > Since updating collaborator list is part of release process based on the
>>> doc
>>> > <https://kafka.apache.org/contributing>, I just created a new list and
>>> > opened a PR:
>>> > https://github.com/apache/kafka/pull/13790
>>> >
>>> > Hope that after this PR merged, you'll get a new invitation.
>>> >
>>> > Thanks.
>>> > Luke
>>> >
>>> > On Thu, Jun 1, 2023 at 5:27 AM Andrew Grant >> >
>>> > wrote:
>>> >
>>> > > Hi all,
>>> > > Like Greg I also received an invitation to collaborate but was too
>>> slow to
>>> > > accept the invite :( I'm also wondering if there's a way to resend the
>>> > > invite? I'm andymg3 on GitHub.
>>> > > Thanks,
>>> > > Andrew
>>> > >
>>> > > On Tue, May 30, 2023 at 12:12 PM Greg Harris
>>> >> > > >
>>> > > wrote:
>>> > >
>>> > > > Hey all,
>>> > > >
>>> > > > I received an invitation to collaborate on apache/kafka, but let the
>>> > > > invitation expire after 7 days.
>>> > > > Is there a workflow for refreshing the invite, or is an admin able to
>>> > > > manually re-invite me?
>>> > > > I'm gharris1727 on github.
>>> > > >
>>> > > > Thanks!
>>> > > > Greg
>>> > > >
>>> > > > On Wed, May 24, 2023 at 9:32 AM Justine Olshan
>>> > > >  wrote:
>>> > > > >
>>> > > > > Hey Yash,
>>> > > > > I'm not sure how it used to be for sure, but I do remember we used
>>> to
>>> > > > have
>>> > > > > a different build system. I wonder if this used to work with the
>>> old
>>> > > > build
>>> > > > > system and not any more.
>>> > > > > I'd be curious if other projects have something similar and how it
>>> > > works.
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Justine
>>> > > > >
>>> > > > > On Wed, May 24, 2023 at 9:22 AM Yash Mayya 
>>> > > wrote:
>>> > > > >
>>> > > > > > Hi Justine,
>>> > > > > >
>>> > > > > > Thanks for th

Re: [DISCUSS] Adding non-committers as Github collaborators

2023-06-06 Thread John Roesler
Hi Divij,

Interesting; perhaps they didn't need to send you an invite because your github 
account is already associated with another Apache project? The actions you 
listed are the things that the collaborator role gives you, so I think you're 
good to go.

Thanks,
-John

On Tue, Jun 6, 2023, at 11:14, Divij Vaidya wrote:
> Hey John
>
> What does the invite do? I did not receive any invite either but I have
> been able to add labels, close PRs etc. since I was added to the
> contributors list (perhaps because I already have an apache LDAP account?).
> I am working with the assumption that things are working for me even if I
> didn't have to accept an invite. Is that a correct assumption?
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jun 6, 2023 at 5:32 PM John Roesler  wrote:
>
>> Hello all,
>>
>> I put in a ticket with Apache Infra to re-send these invites, and they
>> told me I should just remove the usernames in one commit and then re-add
>> them in a subsequent commit to trigger the invites again.
>>
>> I will go ahead and do this for the users who requested it on this thread
>> (Greg and Andrew, as well as for Victoria who asked me about it
>> separately). If there is anyone else who needs a re-send, please let us
>> know.
>>
>> I'm sorry for the confusion, Hao. The docs claimed we could add 20 users,
>> but when I actually checked in the file, I got an automated notification
>> that we could actually only have 10.
>>
>> As for triggering the build, I don't believe you'll be able to log in to
>> Jenkins, but you should be able to say "retest this please" in a PR comment
>> to trigger it. Apparently, that doesn't work anymore, though. I'll file an
>> Infra ticket for it.
>>
>> Thanks all,
>> John
>>
>> On Fri, Jun 2, 2023, at 18:46, Hao Li wrote:
>> > Hi Luke,
>> >
>> > Sorry for the late reply. Can you also add me to the whitelist? I believe
>> > I'm supposed to be there as well. Matthias and John can vouch for me :)
>> My
>> > github ID is "lihaosky".
>> >
>> > Thanks,
>> > Hao
>> >
>> > On Fri, Jun 2, 2023 at 4:43 PM Greg Harris > >
>> > wrote:
>> >
>> >> Luke,
>> >>
>> >> I see that the PR has been merged, but I don't believe it re-sent the
>> >> invitation.
>> >>
>> >> Thanks
>> >> Greg
>> >>
>> >>
>> >> On Wed, May 31, 2023 at 6:46 PM Luke Chen  wrote:
>> >> >
>> >> > Hi Greg and Andrew,
>> >> >
>> >> > Sorry, I don't know how to re-sent the invitation.
>> >> > It looks like it is auto sent after the .asf.yaml updated.
>> >> > Since updating collaborator list is part of release process based on
>> the
>> >> doc
>> >> > <https://kafka.apache.org/contributing>, I just created a new list
>> and
>> >> > opened a PR:
>> >> > https://github.com/apache/kafka/pull/13790
>> >> >
>> >> > Hope that after this PR merged, you'll get a new invitation.
>> >> >
>> >> > Thanks.
>> >> > Luke
>> >> >
>> >> > On Thu, Jun 1, 2023 at 5:27 AM Andrew Grant
>> > >> >
>> >> > wrote:
>> >> >
>> >> > > Hi all,
>> >> > > Like Greg I also received an invitation to collaborate but was too
>> >> slow to
>> >> > > accept the invite :( I'm also wondering if there's a way to resend
>> the
>> >> > > invite? I'm andymg3 on GitHub.
>> >> > > Thanks,
>> >> > > Andrew
>> >> > >
>> >> > > On Tue, May 30, 2023 at 12:12 PM Greg Harris
>> >> > >> > > >
>> >> > > wrote:
>> >> > >
>> >> > > > Hey all,
>> >> > > >
>> >> > > > I received an invitation to collaborate on apache/kafka, but let
>> the
>> >> > > > invitation expire after 7 days.
>> >> > > > Is there a workflow for refreshing the invite, or is an admin
>> able to
>> >> > > > manually re-invite me?
>> >> > > > I'm gharris1727 on github.
>> >> > > >
>> >> > > > Thanks!
>> >> > > > Greg
>> >> > 

Re: [DISCUSS] Adding non-committers as Github collaborators

2023-06-06 Thread John Roesler
;> > > > > >
>> > > > > > > Yash,
>> > > > > > >
>> > > > > > > When I rebuild, I go to the CloudBees CI page and I have to
>> log in
>> > > > with
>> > > > > > my
>> > > > > > > apache account.
>> > > > > > > Not sure if the change in the build system or the need to sign
>> in
>> > > is
>> > > > part
>> > > > > > > of the problem.
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, May 24, 2023 at 4:54 AM Federico Valeri <
>> > > > fedeval...@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > +1 on Divij suggestions
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, May 24, 2023 at 12:04 PM Divij Vaidya <
>> > > > divijvaidy...@gmail.com
>> > > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Hey folks
>> > > > > > > > >
>> > > > > > > > > A week into this experiment, I am finding the ability to
>> add
>> > > > labels,
>> > > > > > > > > request for reviewers and ability to close PRs very useful.
>> > > > > > > > >
>> > > > > > > > > 1. May I suggest an improvement to the process by
>> requesting
>> > > for
>> > > > some
>> > > > > > > > > guidance on the interest areas for various committers. This
>> > > would
>> > > > > > help
>> > > > > > > us
>> > > > > > > > > request for reviews from the right set of individuals.
>> > > > > > > > > As a reference, we have tried something similar with Apache
>> > > > TinkerPop
>> > > > > > > > (see
>> > > > > > > > > TinkerPop Contributors section at the end) [1], where the
>> > > > committers
>> > > > > > > self
>> > > > > > > > > identify their preferred area of interest.
>> > > > > > > > >
>> > > > > > > > > 2. I would also request creation of the following new
>> labels:
>> > > > > > > > > tiered-storage, transactions, security, refactor,
>> zk-migration,
>> > > > > > > > > first-contribution (so that we can prioritize reviews for
>> first
>> > > > time
>> > > > > > > > > contributors as an encouragement), build, metrics
>> > > > > > > > >
>> > > > > > > > > [1] https://tinkerpop.apache.org/
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Divij Vaidya
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Mon, May 15, 2023 at 11:07 PM John Roesler <
>> > > > vvcep...@apache.org>
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hello again, all,
>> > > > > > > > > >
>> > > > > > > > > > Just a quick update: after merging the changes to
>> asf.yaml, I
>> > > > > > > received
>> > > > > > > > a
>> > > > > > > > > > notification that the list is limited to only 10 people,
>> not
>> > > > 20 as
>> > > > > > > the
>> > > > > > > > > > documentation states.
>> > > > > > > > > >
>> > > > > > > > > > Here is the list of folks who will now be able to triage
>> PRs
>> > > > and
>> > > > > > > > trigger
>> > > > > > > > > > builds: Victoria Xia, Greg Harris, Divij Vaidya, Lucas
>> > > > Brutschy,
>> > > > > > Yash
>> > &g

Re: [VOTE] KIP-923: Add A Grace Period to Stream Table Join

2023-06-06 Thread John Roesler
Thanks for the KIP, Walker!

I’m +1 (binding)

Thanks,
John

On Mon, Jun 5, 2023, at 13:39, Victoria Xia wrote:
> Hi Walker,
>
> Thanks for the KIP! Left a clarification question on the discussion thread
> just now but it's about an implementation detail, so I don't think it
> changes anything in this vote thread.
>
> +1 (non-binding)
>
> Cheers,
> Victoria
>
> On Mon, Jun 5, 2023 at 10:23 AM Bill Bejeck  wrote:
>
>> Hi Walker,
>>
>> Thanks for the KIP.
>>
>> I've caught up on the discussion thread and I'm satisfied with all
>> responses.
>>
>> +1(binding)
>>
>> -Bill
>>
>> On Mon, Jun 5, 2023 at 10:20 AM Bruno Cadonna  wrote:
>>
>> > Hi Walker,
>> >
>> > Thank you for the KIP!
>> >
>> > +1 (binding)
>> >
>> > Best,
>> > Bruno
>> >
>> > On 24.05.23 23:00, Walker Carlson wrote:
>> > > Hello everybody,
>> > >
>> > > I'm opening the vote on KIP-923 here
>> > > .
>> > >
>> > > If we have more to discus please continue the discussion on the
>> existing
>> > > thread
>> https://www.mail-archive.com/dev@kafka.apache.org/msg130657.html
>> > >
>> > > best,
>> > > Walker
>> > >
>> >
>>


Re: [VOTE] KIP-925: rack aware task assignment in Kafka Streams

2023-06-05 Thread John Roesler
Thanks, Hao!

I'm +1 (binding)
-John

On 2023/06/05 18:07:56 Walker Carlson wrote:
> +1 (binding)
> 
> On Mon, Jun 5, 2023 at 3:14 AM Bruno Cadonna  wrote:
> 
> > Hi Hao,
> >
> > +1 (binding)
> >
> > Thanks!
> > Bruno
> >
> > On 30.05.23 21:16, Colt McNealy wrote:
> > > +1 (non-binding)
> > >
> > > Thank you Hao!
> > >
> > > Colt McNealy
> > >
> > > *Founder, LittleHorse.dev*
> > >
> > >
> > > On Tue, May 30, 2023 at 9:50 AM Hao Li  wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'd like to open the vote for KIP-925: rack aware task assignment in
> > Kafka
> > >> Streams. The link for the KIP is
> > >>
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
> > >> .
> > >>
> > >> --
> > >> Thanks,
> > >> Hao
> > >>
> > >
> >
> 


Re: [DISCUSS] KIP-925: rack aware task assignment in Kafka Streams

2023-05-22 Thread John Roesler
Hi Hao,

Thanks for the KIP!

Overall, I think this is a great idea. I always wanted to circle back after the 
Smooth Scaling KIP to put a proper optimization algorithm into place. I think 
this has the promise to really improve the quality of the balanced assignments 
we produce.

Thanks for providing the details about the MaxCut/MinFlow algorithm. It seems 
like a good choice for me, assuming we choose the right scaling factors for the 
weights we add to the graph. Unfortunately, I don't think that there's a good 
way to see how easy or hard this is going to be until we actually implement it 
and test it.

That leads to the only real piece of feedback I had on the KIP, which is the 
testing portion. You mentioned system/integration/unit tests, but there's not 
too much information about what those tests will do. I'd like to suggest that 
we invest in more simulation testing specifically, similar to what we did in 
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/TaskAssignorConvergenceTest.java
 .

In fact, it seems like we _could_ write the simulation up front, and then 
implement the algorithm in a dummy way and just see whether it passes the 
simulations or not, before actually integrating it with Kafka Streams.

Basically, I'd be +1 on this KIP today, but I'd feel confident about it if we 
had a little more detail regarding how we are going to verify that the new 
optimizer is actually going to produce more optimal plans than the existing 
assigner we have today.

Thanks again!
-John

On 2023/05/22 16:49:22 Hao Li wrote:
> Hi Colt,
> 
> Thanks for the comments.
> 
> > and I struggle to see how the algorithm isn't at least O(N) where N is
> the number of Tasks...?
> 
> For O(E^2 * (CU)) complexity, C and U can be viewed as constant. Number of
> edges E is T * N where T is the number of clients and N is the number of
> Tasks. This is because a task can be assigned to any client so there will
> be an edge between every task and every client. The total complexity would
> be O(T * N) if we want to be more specific.
> 
> > But if the leaders for each partition are spread across multiple zones,
> how will you handle that?
> 
> This is what the min-cost flow solution is trying to solve? i.e. Find an
> assignment of tasks to clients where across AZ traffic can be minimized.
> But there are some constraints to the solution and one of them is we need
> to balance task assignment first (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Designforrackawareassignment).
> So in your example of three tasks' partitions being in the same AZ of a
> client, if there are other clients, we still want to balance the tasks to
> other clients even if putting all tasks to a single client can result in 0
> cross AZ traffic. In
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Algorithm
> section, the algorithm will try to find a min-cost solution based on
> balanced assignment instead of pure min-cost.
> 
> Thanks,
> Hao
> 
> On Tue, May 9, 2023 at 5:55 PM Colt McNealy  wrote:
> 
> > Hello Hao,
> >
> > First of all, THANK YOU for putting this together. I had been hoping
> > someone might bring something like this forward. A few comments:
> >
> > **1: Runtime Complexity
> > > Klein’s cycle canceling algorithm can solve the min-cost flow problem in
> > O(E^2CU) time where C is max cost and U is max capacity. In our particular
> > case, C is 1 and U is at most 3 (A task can have at most 3 topics including
> > changelog topic?). So the algorithm runs in O(E^2) time for our case.
> >
> > A Task can have multiple input topics, and also multiple state stores, and
> > multiple output topics. The most common case is three topics as you
> > described, but this is not necessarily guaranteed. Also, math is one of my
> > weak points, but to me O(E^2) is equivalent to O(1), and I struggle to see
> > how the algorithm isn't at least O(N) where N is the number of Tasks...?
> >
> > **2: Broker-Side Partition Assignments
> > Consider the case with just three topics in a Task (one input, one output,
> > one changelog). If all three partition leaders are in the same Rack (or
> > better yet, the same broker), then we could get massive savings by
> > assigning the Task to that Rack/availability zone. But if the leaders for
> > each partition are spread across multiple zones, how will you handle that?
> > Is that outside the scope of this KIP, or is it worth introducing a
> > kafka-streams-generate-rebalance-proposal.sh tool?
> >
> > Colt McNealy
> > *Founder, LittleHorse.io*
> >
> >
> > On Tue, May 9, 2023 at 4:03 PM Hao Li  wrote:
> >
> > > Hi all,
> > >
> > > I have submitted KIP-925 to add rack awareness logic in task assignment
> > in
> > > Kafka Streams and would like to s

Re: [VOTE] KIP-927: Improve the kafka-metadata-quorum output

2023-05-20 Thread John Roesler
I’m +1 (binding)

Thanks, Federico!

It looks like a nice improvement to me. 

-John


On Sat, May 20, 2023, at 09:16, Kamal Chandraprakash wrote:
> +1 (non binding)
>
> On Wed, May 17, 2023 at 3:22 AM Divij Vaidya 
> wrote:
>
>> +1 (non binding)
>>
>> Divij Vaidya
>>
>>
>>
>> On Tue, May 16, 2023 at 4:35 AM ziming deng 
>> wrote:
>>
>> > Thanks for this improvement, +1 from me(binging)
>> >
>> > —
>> > Best,
>> > Ziming
>> >
>> > > On May 16, 2023, at 00:43, Federico Valeri 
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > I'd like to start a vote on KIP-927: Improve the kafka-metadata-quorum
>> > output.
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-927%3A+Improve+the+kafka-metadata-quorum+output
>> > >
>> > > Discussion thread:
>> > > https://lists.apache.org/thread/pph59hxvz5jkk709x53p44xrpdqwv8qc
>> > >
>> > > Thanks
>> > > Fede
>> >
>> >
>>


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-15 Thread John Roesler
Hello again, all,

Just a quick update: after merging the changes to asf.yaml, I received a 
notification that the list is limited to only 10 people, not 20 as the 
documentation states. 

Here is the list of folks who will now be able to triage PRs and trigger 
builds: Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya, 
Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and andymg3

Thanks all,
-John

On 2023/05/12 15:53:40 John Roesler wrote:
> Thanks again for bringing this up, David!
> 
> As an update to the community, the PMC has approved a process to make use of 
> this feature.
> 
> Here are the relevant updates:
> 
> PR to add the policy: https://github.com/apache/kafka-site/pull/510
> 
> PR to update the list: https://github.com/apache/kafka/pull/13713
> 
> Ticket to automate this process.. Contributions welcome :) 
> https://issues.apache.org/jira/browse/KAFKA-14995
> 
> And to make sure it doesn't fall through the cracks in the mean time, here's 
> the release process step: 
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
> 
> Unfortunately, the "collaborator" feature only allows 20 usernames, so we 
> have decided to simply take the top 20 non-committer authors from the past 
> year (according to git shortlog). Congratulations to our new collaborators!
> 
> Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya, Philip 
> Nee, vamossagar12, Christo Lolov, Federico Valeri, and andymg3
> 
> Thanks,
> -John
> 
> On 2023/04/27 18:45:09 David Arthur wrote:
> > Hey folks,
> > 
> > I stumbled across this wiki page from the infra team that describes the
> > various features supported in the ".asf.yaml" file:
> > https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> > 
> > One section that looked particularly interesting was
> > https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> > 
> > github:
> >   collaborators:
> > - userA
> > - userB
> > 
> > This would allow us to define non-committers as collaborators on the Github
> > project. Concretely, this means they would receive the "triage" Github role
> > (defined here
> > https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role).
> > Practically, this means we could let non-committers do things like assign
> > labels and reviewers on Pull Requests.
> > 
> > I wanted to see what the committer group thought about this feature. I
> > think it could be useful.
> > 
> > Cheers,
> > David
> > 
> 


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-12 Thread John Roesler
Thanks again for bringing this up, David!

As an update to the community, the PMC has approved a process to make use of 
this feature.

Here are the relevant updates:

PR to add the policy: https://github.com/apache/kafka-site/pull/510

PR to update the list: https://github.com/apache/kafka/pull/13713

Ticket to automate this process.. Contributions welcome :) 
https://issues.apache.org/jira/browse/KAFKA-14995

And to make sure it doesn't fall through the cracks in the mean time, here's 
the release process step: 
https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList

Unfortunately, the "collaborator" feature only allows 20 usernames, so we have 
decided to simply take the top 20 non-committer authors from the past year 
(according to git shortlog). Congratulations to our new collaborators!

Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya, Philip 
Nee, vamossagar12,, Christo Lolov, Federico Valeri, andymg3, RivenSun, Kirk 
True, Matthew de Detrich, Akhilesh C, Alyssa Huang, Artem Livshits, Gantigmaa 
Selenge, Hao Li, Niket, and hudeqi

Thanks,
-John

On 2023/04/27 18:45:09 David Arthur wrote:
> Hey folks,
> 
> I stumbled across this wiki page from the infra team that describes the
> various features supported in the ".asf.yaml" file:
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> 
> One section that looked particularly interesting was
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> 
> github:
>   collaborators:
> - userA
> - userB
> 
> This would allow us to define non-committers as collaborators on the Github
> project. Concretely, this means they would receive the "triage" Github role
> (defined here
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role).
> Practically, this means we could let non-committers do things like assign
> labels and reviewers on Pull Requests.
> 
> I wanted to see what the committer group thought about this feature. I
> think it could be useful.
> 
> Cheers,
> David
> 


[jira] [Created] (KAFKA-14995) Automate asf.yaml collaborators refresh

2023-05-12 Thread John Roesler (Jira)
John Roesler created KAFKA-14995:


 Summary: Automate asf.yaml collaborators refresh
 Key: KAFKA-14995
 URL: https://issues.apache.org/jira/browse/KAFKA-14995
 Project: Kafka
  Issue Type: Improvement
Reporter: John Roesler


We have added a policy to use the asf.yaml Github Collaborators: 
[https://github.com/apache/kafka-site/pull/510]

The policy states that we set this list to be the top 20 commit authors who are 
not Kafka committers. Unfortunately, it's not trivial to compute this list.

Here is the process I followed to generate the list the first time (note that I 
generated this list on 2023-04-28, so the lookback is one year:

1. List authors by commit volume in the last year:
{code:java}
$ git shortlog --email --numbered --summary --since=2022-04-28 | vim {code}
2. manually filter out the authors who are committers, based on 
[https://kafka.apache.org/committers]

3. truncate the list to 20 authors

4. for each author

4a. Find a commit in the `git log` that they were the author on:
{code:java}
commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
Author: hudeqi <1217150...@qq.com>
Date:   Fri May 12 14:03:17 2023 +0800
...{code}
4b. Look up that commit in Github: 
[https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8]

4c. Copy their Github username into .asf.yaml under both the PR whitelist and 
the Collaborators lists.

5. Send a PR to update .asf.yaml: [https://github.com/apache/kafka/pull/13713]

 

This is pretty time consuming and is very scriptable. Two complications:
 * To do the filtering, we need to map from Git log "Author" to documented 
Kafka "Committer" that we can use to perform the filter. Suggestion: just 
update the structure of the "Committers" page to include their Git "Author" 
name and email 
([https://github.com/apache/kafka-site/blob/asf-site/committers.html)]
 * To generate the YAML lists, we need to map from Git log "Author" to Github 
username. There's presumably some way to do this in the Github REST API (the 
mapping is based on the email, IIUC), or we could also just update the 
Committers page to also document each committer's Github username.

 

Ideally, we would write this script (to be stored in the Apache Kafka repo) and 
create a Github Action to run it every three months.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-04-28 Thread John Roesler
Hi all,

This is a great suggestion! It seems like a really good way to make the Apache 
Kafka project more efficient in general and also smooth the path to 
committership.

I've brought the topic up with the Apache Kafka PMC to consider adopting a 
policy around the "collaborator" rule.

In the mean time, it would be great to hear from the broader community what 
your thoughts are around this capability.

Thanks,
-John

On Fri, Apr 28, 2023, at 10:50, Justine Olshan wrote:
> I'm also a bit concerned by the 20 active collaborators rule. How do we
> pick the 20 people?
>
> Justine
>
> On Fri, Apr 28, 2023 at 8:36 AM Matthias J. Sax  wrote:
>
>> In general I am +1
>>
>> The only question I have is about
>>
>> > You may only have 20 active collaborators at any given time per
>> repository.
>>
>> Not sure if this is a concern or not? I would assume not, but wanted to
>> bring it to everyone's attention.
>>
>> There is actually also a way to allow people to re-trigger Jenkins jobs:
>> https://github.com/apache/kafka/pull/13578
>>
>> Retriggering test is a little bit more sensitive as our resources are
>> limited, and we should avoid overwhelming Jenkins even more.
>>
>>
>> -Matthias
>>
>>
>> On 4/27/23 11:45 AM, David Arthur wrote:
>> > Hey folks,
>> >
>> > I stumbled across this wiki page from the infra team that describes the
>> > various features supported in the ".asf.yaml" file:
>> >
>> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
>> >
>> > One section that looked particularly interesting was
>> >
>> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
>> >
>> > github:
>> >collaborators:
>> >  - userA
>> >  - userB
>> >
>> > This would allow us to define non-committers as collaborators on the
>> Github
>> > project. Concretely, this means they would receive the "triage" Github
>> role
>> > (defined here
>> >
>> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role
>> ).
>> > Practically, this means we could let non-committers do things like assign
>> > labels and reviewers on Pull Requests.
>> >
>> > I wanted to see what the committer group thought about this feature. I
>> > think it could be useful.
>> >
>> > Cheers,
>> > David
>> >
>>


Re: [VOTE] KIP-906 Tools migration guidelines

2023-03-24 Thread John Roesler
Thanks Federico,

I'm +1 (binding)

-John

On Fri, Mar 24, 2023, at 01:11, Federico Valeri wrote:
> Bumping this thread for more votes.
>
> Thanks.
>
> On Wed, Mar 15, 2023, 9:57 AM Alexandre Dupriez 
> wrote:
>
>> Hi, Frederico,
>>
>> Thanks for the KIP.
>>
>> Non-binding +1.
>>
>> Thanks,
>> Alexandre
>>
>> Le mer. 15 mars 2023 à 08:28, Luke Chen  a écrit :
>> >
>> > +1 from me.
>> >
>> > Luke
>> >
>> > On Wed, Mar 15, 2023 at 4:11 PM Federico Valeri 
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I'd like to start the vote on KIP-906 Tools migration guidelines.
>> > >
>> > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-906%3A+Tools+migration+guidelines
>> > >
>> > > Discussion thread:
>> > > https://lists.apache.org/thread/o2ytmjj2tyc2xcy6xh5tco31yyjzvz8p
>> > >
>> > > Thanks
>> > > Fede
>> > >
>>


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-03-24 Thread John Roesler
Thanks for the KIP, Sagar!

At first glance, this seems like a very useful feature.

A common pain point in Streams is when upstream producers don't send regular 
updates and stream time cannot advance. This causes stream-time-driven 
operations to appear to hang, like time windows not closing, suppressions not 
firing, etc.

>From your KIP, I have a good idea of how the feature would be integrated into 
>connect, and it sounds good to me. I don't quite see how downstream clients, 
>such as a downstream Streams or Flink application, or users of the Consumer 
>would make use of this feature. Could you add some examples of that nature?

Thank you,
-John

On Fri, Mar 24, 2023, at 05:23, Sagar wrote:
> Hi All,
>
> Bumping the thread again.
>
> Sagar.
>
>
> On Fri, Mar 10, 2023 at 4:42 PM Sagar  wrote:
>
>> Hi All,
>>
>> Bumping this discussion thread again.
>>
>> Thanks!
>> Sagar.
>>
>> On Thu, Mar 2, 2023 at 3:44 PM Sagar  wrote:
>>
>>> Hi All,
>>>
>>> I wanted to create a discussion thread for KIP-910:
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-910%3A+Update+Source+offsets+for+Source+Connectors+without+producing+records
>>>
>>> Thanks!
>>> Sagar.
>>>
>>


Re: [DISCUSS] KIP-914 Join Processor Semantics for Versioned Stores

2023-03-09 Thread John Roesler
Thanks for the KIP, Victoria!

I had some questions/concerns, but you addressed them in the Rejected 
Alternatives section. Thanks for the thorough proposal!

-John

On Thu, Mar 9, 2023, at 18:59, Victoria Xia wrote:
> Hi everyone,
>
> I have a proposal for updating Kafka Streams's stream-table join and
> table-table join semantics for the new versioned key-value state stores
> introduced in KIP-889
> .
> Would love to hear your thoughts and suggestions.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+Join+Processor+Semantics+for+Versioned+Stores
>
> Thanks,
> Victoria


Re: [ANNOUNCE] New Kafka PMC Member: Chris Egerton

2023-03-09 Thread John Roesler
Congratulations, Chris!
-John

On Thu, Mar 9, 2023, at 20:02, Luke Chen wrote:
> Congratulations, Chris!
>
> On Fri, Mar 10, 2023 at 9:57 AM Yash Mayya  wrote:
>
>> Congratulations Chris!
>>
>> On Thu, Mar 9, 2023, 23:42 Jun Rao  wrote:
>>
>> > Hi, Everyone,
>> >
>> > Chris Egerton has been a Kafka committer since July 2022. He has been
>> very
>> > instrumental to the community since becoming a committer. It's my
>> pleasure
>> > to announce that Chris is now a member of Kafka PMC.
>> >
>> > Congratulations Chris!
>> >
>> > Jun
>> > on behalf of Apache Kafka PMC
>> >
>>


Re: [ANNOUNCE] New Kafka PMC Member: David Arthur

2023-03-09 Thread John Roesler
Congratulations, David!
-John

On Thu, Mar 9, 2023, at 20:18, ziming deng wrote:
> Congrats David!
>
> Ziming
>
>> On Mar 10, 2023, at 10:02, Luke Chen  wrote:
>> 
>> Congratulations, David!
>> 
>> On Fri, Mar 10, 2023 at 9:56 AM Yash Mayya  wrote:
>> 
>>> Congrats David!
>>> 
>>> On Thu, Mar 9, 2023, 23:42 Jun Rao  wrote:
>>> 
 Hi, Everyone,
 
 David Arthur has been a Kafka committer since 2013. He has been very
 instrumental to the community since becoming a committer. It's my
>>> pleasure
 to announce that David is now a member of Kafka PMC.
 
 Congratulations David!
 
 Jun
 on behalf of Apache Kafka PMC
 
>>>


Re: [VOTE] KIP-898: Modernize Connect plugin discovery

2023-02-28 Thread John Roesler
Thanks for the KIP, Greg!

I’m +1 (binding)

I really appreciate all the care you took in the migration and test design. 

Thanks,
John

On Tue, Feb 28, 2023, at 04:33, Federico Valeri wrote:
> +1 (non binding)
>
> Thanks
> Fede
>
> On Tue, Feb 28, 2023 at 10:10 AM Mickael Maison
>  wrote:
>>
>> +1 (binding)
>>
>> Thanks,
>> Mickael
>>
>> On Mon, Feb 27, 2023 at 7:42 PM Chris Egerton  
>> wrote:
>> >
>> > +1 (binding). Thanks for the KIP!
>> >
>> > On Mon, Feb 27, 2023 at 12:51 PM Greg Harris 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I'd like to call a vote for KIP-898 which aims to improve the performance
>> > > of Connect startup by allowing discovery of plugins via the 
>> > > ServiceLoader.
>> > >
>> > > KIP:
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-898
>> > > %3A+Modernize+Connect+plugin+discovery
>> > >
>> > > Discussion thread:
>> > > https://lists.apache.org/thread/wxh0r343w86s91py0876njzbyn5qxd8s
>> > >
>> > > Thanks!
>> > >


Re: [DISCUSS] KIP-909: Allow clients to rebootstrap DNS lookup failure

2023-02-23 Thread John Roesler
Thanks for the KIP, Philip!

I think your proposal makes sense. I suspect the reason that we previously did 
the DNS resolution in the constructor is to fail fast if the name is wrong. On 
the other hand, it's generally a hassle to do failure-prone or slow operations 
in a constructor, so I'm in favor of moving it to poll.

I'm also in favor of throwing NetworkException (or some other 
RetriableException), since failing to resolve the DNS entry for the brokers 
shouldn't poison the state of the client, and it should be fine for users to 
retry if they want to.

I actually do think there might be some overlap with KIP-899. If we go ahead 
and move DNS resolution to poll, then KIP-899 becomes just a question of 
whether we should call poll at other points after the first resolution. It 
seems like these could potentially be merged into one proposal, or you and Ivan 
could coordinate on symbiotic KIPs.

Thanks again,
-John

On 2023/02/23 17:29:23 Philip Nee wrote:
> Hi all!
> 
> I want to start a discussion thread about how we can handle client
> bootstrap failure due DNS lookup.  This requires a bit of behavioral
> change, so a KIP is proposed and attached to this email. Let me know what
> you think!
> 
> 
> *A small remark here*: *As the title of this KIP might sound
> familiar/similar to KIP-899, it is not the same.*
> 
> *In Summary:* I want to propose a KIP to change the existing bootstrap
> (upon instantiation) strategy because it is reasonable to allow clients to
> retry
> 
> *KIP: *
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-909%3A+Allow+Clients+to+Rebootstrap+Upon+Failed+DNS+Resolution
> 
> Thanks!
> Philip
> 


Re: [ANNOUNCE] New committer: Lucas Bradstreet

2023-02-16 Thread John Roesler
Congratulations, Lucas!
-John

On Thu, Feb 16, 2023, at 19:52, Luke Chen wrote:
> Congratulations, Lucas!
>
> Luke
>
> On Fri, Feb 17, 2023 at 8:57 AM Guozhang Wang 
> wrote:
>
>> Congratulations, Lucas!
>>
>> On Thu, Feb 16, 2023 at 3:18 PM Kowshik Prakasam 
>> wrote:
>> >
>> > Congratulations, Lucas!
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> > On Thu, Feb 16, 2023, 2:07 PM Justine Olshan
>> 
>> > wrote:
>> >
>> > > Congratulations Lucas!
>> > >
>> > > Thanks for your mentorship on some of my KIPs as well :)
>> > >
>> > > On Thu, Feb 16, 2023 at 1:56 PM Jun Rao 
>> wrote:
>> > >
>> > > > Hi, Everyone,
>> > > >
>> > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer
>> > > Lucas
>> > > > Bradstreet.
>> > > >
>> > > > Lucas has been a long time Kafka contributor since Oct. 2018. He has
>> been
>> > > > extremely valuable for Kafka on both performance and correctness
>> > > > improvements.
>> > > >
>> > > > The following are his performance related contributions.
>> > > >
>> > > > KAFKA-9820: validateMessagesAndAssignOffsetsCompressed allocates
>> batch
>> > > > iterator which is not used
>> > > > KAFKA-9685: Solve Set concatenation perf issue in AclAuthorizer
>> > > > KAFKA-9729: avoid readLock in authorizer ACL lookups
>> > > > KAFKA-9039: Optimize ReplicaFetcher fetch path
>> > > > KAFKA-8841: Reduce overhead of
>> ReplicaManager.updateFollowerFetchState
>> > > >
>> > > > The following are his correctness related contributions.
>> > > >
>> > > > KAFKA-13194: LogCleaner may clean past highwatermark
>> > > > KAFKA-10432: LeaderEpochCache is incorrectly recovered on segment
>> > > recovery
>> > > > for epoch 0
>> > > > KAFKA-9137: Fix incorrect FetchSessionCache eviction logic
>> > > >
>> > > > Congratulations, Lucas!
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun (on behalf of the Apache Kafka PMC)
>> > > >
>> > >
>>


Re: [ANNOUNCE] New committer: Walker Carlson

2023-01-17 Thread John Roesler
Congratulations, Walker!
-John

On Tue, Jan 17, 2023, at 18:50, Guozhang Wang wrote:
> Congrats, Walker!
>
> On Tue, Jan 17, 2023 at 2:20 PM Chris Egerton 
> wrote:
>
>> Congrats, Walker!
>>
>> On Tue, Jan 17, 2023, 17:07 Bill Bejeck  wrote:
>>
>> > Congratulations, Walker!
>> >
>> > -Bill
>> >
>> > On Tue, Jan 17, 2023 at 4:57 PM Matthias J. Sax 
>> wrote:
>> >
>> > > Dear community,
>> > >
>> > > I am pleased to announce Walker Carlson as a new Kafka committer.
>> > >
>> > > Walker has been contributing to Apache Kafka since November 2019. He
>> > > made various contributions including the following KIPs.
>> > >
>> > > KIP-671: Introduce Kafka Streams Specific Uncaught Exception Handler
>> > > KIP-696: Update Streams FSM to clarify ERROR state meaning
>> > > KIP-715: Expose Committed offset in streams
>> > >
>> > >
>> > > Congratulations Walker and welcome on board!
>> > >
>> > >
>> > > Thanks,
>> > >-Matthias (on behalf of the Apache Kafka PMC)
>> > >
>> >
>>


Re: [ANNOUNCE] New committer: Satish Duggana

2023-01-17 Thread John Roesler
Ay, sorry about my autocorrect, Satish. 

On Tue, Jan 17, 2023, at 19:13, John Roesler wrote:
> Congratulations, Salish! I missed the announcement before. 
>
> -John
>
> On Tue, Jan 17, 2023, at 18:53, Guozhang Wang wrote:
>> Congratulations, Satish!
>>
>> On Tue, Jan 10, 2023 at 11:31 AM Rajini Sivaram 
>> wrote:
>>
>>> Congratulations, Satish!
>>>
>>> Regards,
>>>
>>> Rajini
>>>
>>> On Tue, Jan 10, 2023 at 5:12 PM Bruno Cadonna  wrote:
>>>
>>> > Congrats!
>>> >
>>> > Best,
>>> > Bruno
>>> >
>>> > On 24.12.22 12:44, Manikumar wrote:
>>> > > Congrats, Satish!  Well deserved.
>>> > >
>>> > > On Sat, Dec 24, 2022, 5:10 PM Tom Bentley  wrote:
>>> > >
>>> > >> Congratulations!
>>> > >>
>>> > >> On Sat, 24 Dec 2022 at 05:05, Luke Chen  wrote:
>>> > >>
>>> > >>> Congratulations, Satish!
>>> > >>>
>>> > >>> On Sat, Dec 24, 2022 at 4:12 AM Federico Valeri <
>>> fedeval...@gmail.com>
>>> > >>> wrote:
>>> > >>>
>>> > >>>> Hi Satish, congrats!
>>> > >>>>
>>> > >>>> On Fri, Dec 23, 2022, 8:46 PM Viktor Somogyi-Vass
>>> > >>>>  wrote:
>>> > >>>>
>>> > >>>>> Congrats Satish!
>>> > >>>>>
>>> > >>>>> On Fri, Dec 23, 2022, 19:38 Mickael Maison <
>>> mickael.mai...@gmail.com
>>> > >>>
>>> > >>>>> wrote:
>>> > >>>>>
>>> > >>>>>> Congratulations Satish!
>>> > >>>>>>
>>> > >>>>>> On Fri, Dec 23, 2022 at 7:36 PM Divij Vaidya <
>>> > >>> divijvaidy...@gmail.com>
>>> > >>>>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>> Congratulations Satish! 🎉
>>> > >>>>>>>
>>> > >>>>>>> On Fri 23. Dec 2022 at 19:32, Josep Prat
>>> > >>> >> > >>>>>
>>> > >>>>>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> Congrats Satish!
>>> > >>>>>>>>
>>> > >>>>>>>> ———
>>> > >>>>>>>> Josep Prat
>>> > >>>>>>>>
>>> > >>>>>>>> Aiven Deutschland GmbH
>>> > >>>>>>>>
>>> > >>>>>>>> Immanuelkirchstraße 26, 10405 Berlin
>>> > >>>>>>>> <
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>
>>> > >>>
>>> > >>
>>> >
>>> https://www.google.com/maps/search/Immanuelkirchstra%C3%9Fe+26,+10405+Berlin?entry=gmail&source=g
>>> > >>>>>>>
>>> > >>>>>>>>
>>> > >>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
>>> > >>>>>>>>
>>> > >>>>>>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>>> > >>>>>>>>
>>> > >>>>>>>> m: +491715557497
>>> > >>>>>>>>
>>> > >>>>>>>> w: aiven.io
>>> > >>>>>>>>
>>> > >>>>>>>> e: josep.p...@aiven.io
>>> > >>>>>>>>
>>> > >>>>>>>> On Fri, Dec 23, 2022, 19:23 Chris Egerton <
>>> > >>> fearthecel...@gmail.com
>>> > >>>>>
>>> > >>>>>> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>>> Congrats, Satish!
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Fri, Dec 23, 2022, 13:19 Arun Raju 
>>> > >>>>> wrote:
>>> > >>>>>>>>>
>>> > >>>>>>>>>> Congratulations 👏
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> On Fri, Dec 23, 2022, 1:08 PM Jun Rao
>>> > >>> >> > >>>>>
>>> > >>>>>>>> wrote:
>>> > >>>>>>>>>>
>>> > >>>>>>>>>>> Hi, Everyone,
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> The PMC of Apache Kafka is pleased to announce a new
>>> > >> Kafka
>>> > >>>>>> committer
>>> > >>>>>>>>>> Satish
>>> > >>>>>>>>>>> Duggana.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Satish has been a long time Kafka contributor since 2017.
>>> > >>> He
>>> > >>>> is
>>> > >>>>>> the
>>> > >>>>>>>>> main
>>> > >>>>>>>>>>> driver behind KIP-405 that integrates Kafka with remote
>>> > >>>>> storage,
>>> > >>>>>> a
>>> > >>>>>>>>>>> significant and much anticipated feature in Kafka.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Congratulations, Satish!
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Thanks,
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Jun (on behalf of the Apache Kafka PMC)
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>
>>> > >>>>>>>>>
>>> > >>>>>>>>
>>> > >>>>>>> --
>>> > >>>>>>> Divij Vaidya
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>
>>> > >>>
>>> > >>
>>> > >
>>> >
>>>


Re: [ANNOUNCE] New committer: Satish Duggana

2023-01-17 Thread John Roesler
Congratulations, Salish! I missed the announcement before. 

-John

On Tue, Jan 17, 2023, at 18:53, Guozhang Wang wrote:
> Congratulations, Satish!
>
> On Tue, Jan 10, 2023 at 11:31 AM Rajini Sivaram 
> wrote:
>
>> Congratulations, Satish!
>>
>> Regards,
>>
>> Rajini
>>
>> On Tue, Jan 10, 2023 at 5:12 PM Bruno Cadonna  wrote:
>>
>> > Congrats!
>> >
>> > Best,
>> > Bruno
>> >
>> > On 24.12.22 12:44, Manikumar wrote:
>> > > Congrats, Satish!  Well deserved.
>> > >
>> > > On Sat, Dec 24, 2022, 5:10 PM Tom Bentley  wrote:
>> > >
>> > >> Congratulations!
>> > >>
>> > >> On Sat, 24 Dec 2022 at 05:05, Luke Chen  wrote:
>> > >>
>> > >>> Congratulations, Satish!
>> > >>>
>> > >>> On Sat, Dec 24, 2022 at 4:12 AM Federico Valeri <
>> fedeval...@gmail.com>
>> > >>> wrote:
>> > >>>
>> >  Hi Satish, congrats!
>> > 
>> >  On Fri, Dec 23, 2022, 8:46 PM Viktor Somogyi-Vass
>> >   wrote:
>> > 
>> > > Congrats Satish!
>> > >
>> > > On Fri, Dec 23, 2022, 19:38 Mickael Maison <
>> mickael.mai...@gmail.com
>> > >>>
>> > > wrote:
>> > >
>> > >> Congratulations Satish!
>> > >>
>> > >> On Fri, Dec 23, 2022 at 7:36 PM Divij Vaidya <
>> > >>> divijvaidy...@gmail.com>
>> > >> wrote:
>> > >>>
>> > >>> Congratulations Satish! 🎉
>> > >>>
>> > >>> On Fri 23. Dec 2022 at 19:32, Josep Prat
>> > >>> > > >
>> > >>> wrote:
>> > >>>
>> >  Congrats Satish!
>> > 
>> >  ———
>> >  Josep Prat
>> > 
>> >  Aiven Deutschland GmbH
>> > 
>> >  Immanuelkirchstraße 26, 10405 Berlin
>> >  <
>> > >>
>> > >
>> > 
>> > >>>
>> > >>
>> >
>> https://www.google.com/maps/search/Immanuelkirchstra%C3%9Fe+26,+10405+Berlin?entry=gmail&source=g
>> > >>>
>> > 
>> >  Amtsgericht Charlottenburg, HRB 209739 B
>> > 
>> >  Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>> > 
>> >  m: +491715557497
>> > 
>> >  w: aiven.io
>> > 
>> >  e: josep.p...@aiven.io
>> > 
>> >  On Fri, Dec 23, 2022, 19:23 Chris Egerton <
>> > >>> fearthecel...@gmail.com
>> > >
>> > >> wrote:
>> > 
>> > > Congrats, Satish!
>> > >
>> > > On Fri, Dec 23, 2022, 13:19 Arun Raju 
>> > > wrote:
>> > >
>> > >> Congratulations 👏
>> > >>
>> > >> On Fri, Dec 23, 2022, 1:08 PM Jun Rao
>> > >>> > > >
>> >  wrote:
>> > >>
>> > >>> Hi, Everyone,
>> > >>>
>> > >>> The PMC of Apache Kafka is pleased to announce a new
>> > >> Kafka
>> > >> committer
>> > >> Satish
>> > >>> Duggana.
>> > >>>
>> > >>> Satish has been a long time Kafka contributor since 2017.
>> > >>> He
>> >  is
>> > >> the
>> > > main
>> > >>> driver behind KIP-405 that integrates Kafka with remote
>> > > storage,
>> > >> a
>> > >>> significant and much anticipated feature in Kafka.
>> > >>>
>> > >>> Congratulations, Satish!
>> > >>>
>> > >>> Thanks,
>> > >>>
>> > >>> Jun (on behalf of the Apache Kafka PMC)
>> > >>>
>> > >>
>> > >
>> > 
>> > >>> --
>> > >>> Divij Vaidya
>> > >>
>> > >
>> > 
>> > >>>
>> > >>
>> > >
>> >
>>


Re: [ANNOUNCE] New committer: Stanislav Kozlovski

2023-01-17 Thread John Roesler
Congrats, Stanislav!
-John

On Tue, Jan 17, 2023, at 18:56, Ismael Juma wrote:
> Congratulations Stanislav!
>
> Ismael
>
> On Tue, Jan 17, 2023 at 7:51 AM Jun Rao  wrote:
>
>> Hi, Everyone,
>>
>> The PMC of Apache Kafka is pleased to announce a new Kafka committer
>> Stanislav Kozlovski.
>>
>> Stan has been contributing to Apache Kafka since June 2018. He made various
>> contributions including the following KIPs.
>>
>> KIP-455: Create an Administrative API for Replica Reassignment
>> KIP-412: Extend Admin API to support dynamic application log levels
>>
>> Congratulations, Stan!
>>
>> Thanks,
>>
>> Jun (on behalf of the Apache Kafka PMC)
>>


Re: [VOTE] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2023-01-09 Thread John Roesler
Yes, you are!

Congrats again :)
-John

On Mon, Jan 9, 2023, at 08:25, Viktor Somogyi-Vass wrote:
> Hey all,
>
> Now that I'm a committer am I allowed to change my non-binding vote to
> binding to pass the KIP? :)
>
> On Thu, Nov 10, 2022 at 6:13 PM Greg Harris 
> wrote:
>
>> +1 (non-binding)
>>
>> Thanks for the KIP, this is an important improvement.
>>
>> Greg
>>
>> On Thu, Nov 10, 2022 at 7:21 AM John Roesler  wrote:
>>
>> > Thanks for the KIP, Daniel!
>> >
>> > I'm no MM expert, but I've read over the KIP and discussion, and it seems
>> > reasonable to me.
>> >
>> > I'm +1 (binding).
>> >
>> > Thanks,
>> > -John
>> >
>> > On 2022/10/22 07:38:38 Urbán Dániel wrote:
>> > > Hi everyone,
>> > >
>> > > I would like to start a vote on KIP-710 which aims to support running a
>> > > dedicated MM2 cluster in distributed mode:
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
>> > >
>> > > Regards,
>> > > Daniel
>> > >
>> > >
>> > > --
>> > > Ezt az e-mailt átvizsgálta az Avast AntiVirus szoftver.
>> > > www.avast.com
>> > >
>> >
>>


Re: [DISCUSS] KIP-894: Use incrementalAlterConfigs API for syncing topic configurations

2023-01-07 Thread John Roesler
Hi Tina,

Thanks for the KIP!

I hope someone with prior MM or Kafka config experience is able to chime in 
here; I have neither.

I took a look at your KIP, and it makes sense to me. I also think your 
migration plan is a good one.

One suggestion: I'm not sure how concerned you are about people's ability to 
migrate, but if you want to make it as smooth as possible, you could add one 
more step. In the 4.0 release, while removing `use.incremental.alter.configs`, 
you can also add `use.legacy.alter.configs` to give users a path to continue 
using the old behavior even after the default changes.

Clearly, this will prolong the deprecation period, with implications on code 
maintenance, so there is some downside. But generally, I've found going above 
and beyond to support smooth upgrades for users to be well worth it in the long 
run.

Thanks again,
-John


On Fri, Jan 6, 2023, at 05:49, Gantigmaa Selenge wrote:
> Hi everyone,
>
> I would like to start a discussion on the MirrorMaker update that 
> proposes
> replacing the deprecated alterConfigs API with the 
> incrementalAlterConfigs
> API for syncing topic configurations. Please take a look at the proposal
> here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-894%3A+Use+incrementalAlterConfigs+API+for+syncing+topic+configurations
>
>
> Regards,
> Tina


Re: [ANNOUNCE] New committer: Edoardo Comar

2023-01-07 Thread John Roesler
Congrats, Edoardo!
-John

On Fri, Jan 6, 2023, at 20:47, Matthias J. Sax wrote:
> Congrats!
>
> On 1/6/23 5:15 PM, Luke Chen wrote:
>> Congratulations, Edoardo!
>> 
>> Luke
>> 
>> On Sat, Jan 7, 2023 at 7:58 AM Mickael Maison 
>> wrote:
>> 
>>> Congratulations Edo!
>>>
>>>
>>> On Sat, Jan 7, 2023 at 12:05 AM Jun Rao  wrote:

 Hi, Everyone,

 The PMC of Apache Kafka is pleased to announce a new Kafka committer
>>> Edoardo
 Comar.

 Edoardo has been a long time Kafka contributor since 2016. His major
 contributions are the following.

 KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
 KIP-277: Fine Grained ACL for CreateTopics API
 KIP-136: Add Listener name to SelectorMetrics tags

 Congratulations, Edoardo!

 Thanks,

 Jun (on behalf of the Apache Kafka PMC)
>>>
>>


Re: [ANNOUNCE] New committer: Justine Olshan

2023-01-03 Thread John Roesler
Congrats, Justine!
-John

On Tue, Jan 3, 2023, at 13:03, Matthias J. Sax wrote:
> Congrats!
>
> On 12/29/22 6:47 PM, ziming deng wrote:
>> Congratulations Justine!
>> —
>> Best,
>> Ziming
>> 
>>> On Dec 30, 2022, at 10:06, Luke Chen  wrote:
>>>
>>> Congratulations, Justine!
>>> Well deserved!
>>>
>>> Luke
>>>
>>> On Fri, Dec 30, 2022 at 9:15 AM Ron Dagostino  wrote:
>>>
 Congratulations, Justine!Well-deserved., and I’m very happy for you.

 Ron

> On Dec 29, 2022, at 6:13 PM, Israel Ekpo  wrote:
>
> Congratulations Justine!
>
>
>> On Thu, Dec 29, 2022 at 5:05 PM Greg Harris
 
>> wrote:
>>
>> Congratulations Justine!
>>
>>> On Thu, Dec 29, 2022 at 1:37 PM Bill Bejeck  wrote:
>>>
>>> Congratulations Justine!
>>>
>>>
>>> -Bill
>>>
 On Thu, Dec 29, 2022 at 4:36 PM Philip Nee 
 wrote:
>>>
 wow congrats!

 On Thu, Dec 29, 2022 at 1:05 PM Chris Egerton <
 fearthecel...@gmail.com
>>>
 wrote:

> Congrats, Justine!
>
> On Thu, Dec 29, 2022, 15:58 David Jacot  wrote:
>
>> Hi all,
>>
>> The PMC of Apache Kafka is pleased to announce a new Kafka
>> committer
>> Justine
>> Olshan.
>>
>> Justine has been contributing to Kafka since June 2019. She
>>> contributed
> 53
>> PRs including the following KIPs.
>>
>> KIP-480: Sticky Partitioner
>> KIP-516: Topic Identifiers & Topic Deletion State Improvements
>> KIP-854: Separate configuration for producer ID expiry
>> KIP-890: Transactions Server-Side Defense (in progress)
>>
>> Congratulations, Justine!
>>
>> Thanks,
>>
>> David (on behalf of the Apache Kafka PMC)
>>
>

>>>
>>

>> 
>>


Re: [ANNOUNCE] New committer: Viktor Somogyi-Vass

2022-12-20 Thread John Roesler
Congratulations, Viktor!
-John 

On Sat, Dec 17, 2022, at 05:24, Viktor Somogyi-Vass wrote:
> Thanks again everyone!
>
> On Fri, Dec 16, 2022, 18:36 Bill Bejeck  wrote:
>
>> Congratulations, Viktor!
>>
>> -Bill
>>
>> On Fri, Dec 16, 2022 at 12:32 PM Matthias J. Sax  wrote:
>>
>> > Congrats!
>> >
>> > On 12/15/22 7:10 AM, Rajini Sivaram wrote:
>> > > Congratulations, Viktor!
>> > >
>> > > Regards,
>> > >
>> > > Rajini
>> > >
>> > >
>> > > On Thu, Dec 15, 2022 at 11:41 AM Ron Dagostino 
>> > wrote:
>> > >
>> > >> Congrats to you too, Victor!
>> > >>
>> > >> Ron
>> > >>
>> > >>> On Dec 15, 2022, at 4:59 AM, Viktor Somogyi-Vass <
>> > >> viktor.somo...@cloudera.com.invalid> wrote:
>> > >>>
>> > >>> Thank you everyone! :)
>> > >>>
>> >  On Thu, Dec 15, 2022 at 10:22 AM Mickael Maison <
>> > >> mickael.mai...@gmail.com>
>> >  wrote:
>> > 
>> >  Congratulations Viktor!
>> > 
>> > > On Thu, Dec 15, 2022 at 10:06 AM Tamas Barnabas Egyed
>> > >  wrote:
>> > >
>> > > Congratulations, Viktor!
>> > 
>> > >>
>> > >
>> >
>>


Re: [ANNOUNCE] New committer: Josep Prat

2022-12-20 Thread John Roesler
Congratulations, Josep!
-John

On Tue, Dec 20, 2022, at 20:02, Luke Chen wrote:
> Congratulations, Josep!
>
> Luke
>
> On Wed, Dec 21, 2022 at 6:26 AM Viktor Somogyi-Vass
>  wrote:
>
>> Congrats Josep!
>>
>> On Tue, Dec 20, 2022, 21:56 Matthias J. Sax  wrote:
>>
>> > Congrats!
>> >
>> > On 12/20/22 12:01 PM, Josep Prat wrote:
>> > > Thank you all!
>> > >
>> > > ———
>> > > Josep Prat
>> > >
>> > > Aiven Deutschland GmbH
>> > >
>> > > Immanuelkirchstraße 26, 10405 Berlin
>> > >
>> > > Amtsgericht Charlottenburg, HRB 209739 B
>> > >
>> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>> > >
>> > > m: +491715557497
>> > >
>> > > w: aiven.io
>> > >
>> > > e: josep.p...@aiven.io
>> > >
>> > > On Tue, Dec 20, 2022, 20:42 Bill Bejeck  wrote:
>> > >
>> > >> Congratulations Josep!
>> > >>
>> > >> -Bill
>> > >>
>> > >> On Tue, Dec 20, 2022 at 1:11 PM Mickael Maison <
>> > mickael.mai...@gmail.com>
>> > >> wrote:
>> > >>
>> > >>> Congratulations Josep!
>> > >>>
>> > >>> On Tue, Dec 20, 2022 at 6:55 PM Bruno Cadonna 
>> > >> wrote:
>> > 
>> >  Congrats, Josep!
>> > 
>> >  Well deserved!
>> > 
>> >  Best,
>> >  Bruno
>> > 
>> >  On 20.12.22 18:40, Kirk True wrote:
>> > > Congrats Josep!
>> > >
>> > > On Tue, Dec 20, 2022, at 9:33 AM, Jorge Esteban Quilcate Otoya
>> wrote:
>> > >> Congrats Josep!!
>> > >>
>> > >> On Tue, 20 Dec 2022, 17:31 Greg Harris,
>> > >> > > 
>> > >> wrote:
>> > >>
>> > >>> Congratulations Josep!
>> > >>>
>> > >>> On Tue, Dec 20, 2022 at 9:29 AM Chris Egerton <
>> > >>> fearthecel...@gmail.com>
>> > >>> wrote:
>> > >>>
>> >  Congrats Josep! Well-earned.
>> > 
>> >  On Tue, Dec 20, 2022, 12:26 Jun Rao 
>> > >>> wrote:
>> > 
>> > > Hi, Everyone,
>> > >
>> > > The PMC of Apache Kafka is pleased to announce a new Kafka
>> > >>> committer
>> >  Josep
>> > >Prat.
>> > >
>> > > Josep has been contributing to Kafka since May 2021. He
>> > >>> contributed 20
>> >  PRs
>> > > including the following 2 KIPs.
>> > >
>> > > KIP-773 Differentiate metric latency measured in ms and ns
>> > > KIP-744: Migrate TaskMetadata and ThreadMetadata to an
>> interface
>> > >>> with
>> > > internal implementation
>> > >
>> > > Congratulations, Josep!
>> > >
>> > > Thanks,
>> > >
>> > > Jun (on behalf of the Apache Kafka PMC)
>> > >
>> > 
>> > >>>
>> > >>
>> > >
>> > >>>
>> > >>
>> > >
>> >
>>


Re: [DISCUSS] KIP-886 Add Client Producer and Consumer Builders

2022-12-17 Thread John Roesler
Hello Dan,

Thanks for the KIP!

I just saw the vote thread, took a look at the KIP, and had some questions. 
Sorry for the delay in reviewing.

It can be a hassle, but the KIP document should specify the exact public 
interfaces you are proposing. This helps to prevent a gap between what 
reviewers think they are approving and what actually gets shipped.

For example, it’s not clear to me whether you intend to add all the configs to 
the builders right now, or just the common ones.

My only other concern is how we can keep the builders up to date as configs are 
added or their descriptions are refactored. Keeping synonymous APIs consistent 
has been a problem with the Kafka Streams Scala API, for example.

I have two suggestions that might help: (1) propose to update all the JavaDocs 
of the existing Config classes to link to the equivalent builder methods, which 
will help remind people when they add new configs. (2) instead of writing new 
JavaDocs on the builder methods, link them back to the Config definitions. 
Linking in both directions will not only remind contributors to consider both 
classes, but it will also help inform users of their options and the 
relationship between them. 

Thanks again for the KIP!
-John

On Wed, Nov 16, 2022, at 16:50, Dan S wrote:
> Hello all,
>
> I've gotten great feedback from Knowles here, and from Luke Chen on the
> jira, so thanks to both so much. This is my first KIP, and I'm pretty new
> to contributing to kafka, so I'd like to learn a little bit more about the
> process and the way things usually work.
> Should I open a PR first? Should I simply wait for discussion comments or
> votes (I've gotten no votes yet)?
>
> Thanks so much,
>
> Dan
>
> On Fri, Nov 11, 2022 at 1:30 AM Knowles Atchison Jr 
> wrote:
>
>> This would be helpful. For our own client library wrappers we implemented
>> this functionality for any type with defaults for  and
>>  consumers/producers.
>>
>> On Thu, Nov 10, 2022, 6:35 PM Dan S  wrote:
>>
>> > Hello all,
>> >
>> > I think that adding builders for the producer and the consumer in kafka
>> > client would make it much easier for developers to instantiate new
>> > producers and consumers, especially if they are using an IDE with
>> > intellisense, and using the IDE to navigate to the documentation which
>> > could be added to the builder's withXYZ methods.
>> >
>> > Please let me know if you have any comments, questions, or suggestions!
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-886%3A+Add+Client+Producer+and+Consumer+Builders
>> >
>> > Thanks,
>> >
>> > Dan
>> >
>>


Re: [VOTE] KIP-889 Versioned State Stores

2022-12-15 Thread John Roesler
Thanks for the thorough KIP, Victoria!

I'm +1 (binding)

-John

On 2022/12/15 19:56:21 Victoria Xia wrote:
> Hi all,
> 
> I'd like to start a vote on KIP-889 for introducing versioned key-value
> state stores to Kafka Streams:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores
> 
> The discussion thread has been open for a few weeks now and has converged
> among the current participants.
> 
> Thanks,
> Victoria
> 


Re: [VOTE] KIP-893: The Kafka protocol should support nullable structs

2022-12-05 Thread John Roesler
+1 (binding)

Thanks,
-John

On Mon, Dec 5, 2022, at 16:57, Kirk True wrote:
> +1 (non-binding)
>
> On Mon, Dec 5, 2022, at 10:05 AM, Colin McCabe wrote:
>> +1 (binding)
>> 
>> best,
>> Colin
>> 
>> On Mon, Dec 5, 2022, at 10:03, David Jacot wrote:
>> > Hi all,
>> >
>> > As this KIP-893 is trivial and non-controversial, I would like to
>> > start the vote on it. The KIP is here:
>> > https://cwiki.apache.org/confluence/x/YJIODg
>> >
>> > Thanks,
>> > David
>>


Re: Ci stability

2022-11-24 Thread John Roesler
Hi Dan,

I’m not sure if there’s a consistently used tag, but I’ve gotten good mileage 
out of just searching for “flaky” or “flaky test” in Jira. 

If you’re thinking about filing a ticket for a specific test failure you’ve 
seen, I’ve also usually been able to find out whether there’s already a ticket 
by searching for the test class or method name. 

People seem to typically file tickets with “flaky” in the title and then the 
test name. 

Thanks again for your interest in improving the situation!
-John

On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
> Thanks for the reply John! Is there a jira tag or view or something that
> can be used to find all the failing tests and maybe even try to fix them
> (even if fix just means extending a timeout)?
>
>
>
> On Thu, Nov 24, 2022, 16:03 John Roesler  wrote:
>
>> Hi Dan,
>>
>> Thanks for pointing this out. Flaky tests are a perennial problem. We
>> knock them out every now and then, but eventually more spring up.
>>
>> I’ve had some luck in the past filing Jira tickets for the failing tests
>> as they pop up in my PRs. Another thing that seems to motivate people is to
>> open a PR to disable the test in question, as you mention. That can be a
>> bit aggressive, though, so it wouldn’t be my first suggestion.
>>
>> I appreciate you bringing this up. I agree that flaky tests pose a risk to
>> the project because it makes it harder to know whether a PR breaks things
>> or not.
>>
>> Thanks,
>> John
>>
>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
>> > Hello all,
>> >
>> > I've had a pr that has been open for a little over a month (several
>> > feedback cycles happened), and I've never seen a fully passing build
>> (tests
>> > in completely different parts of the codebase seemed to fail, often
>> > timeouts). A cursory look at open PRs seems to indicate that mine is not
>> > the only one. I was wondering if there is a place where all the flaky
>> tests
>> > are being tracked, and if it makes sense to fix (or at least temporarily
>> > disable) them so that confidence in new PRs could be increased.
>> >
>> > Thanks,
>> >
>> > Dan
>>


Re: Ci stability

2022-11-24 Thread John Roesler
Hi Dan,

Thanks for pointing this out. Flaky tests are a perennial problem. We knock 
them out every now and then, but eventually more spring up.

I’ve had some luck in the past filing Jira tickets for the failing tests as 
they pop up in my PRs. Another thing that seems to motivate people is to open a 
PR to disable the test in question, as you mention. That can be a bit 
aggressive, though, so it wouldn’t be my first suggestion.

I appreciate you bringing this up. I agree that flaky tests pose a risk to the 
project because it makes it harder to know whether a PR breaks things or not. 

Thanks,
John

On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
> Hello all,
>
> I've had a pr that has been open for a little over a month (several
> feedback cycles happened), and I've never seen a fully passing build (tests
> in completely different parts of the codebase seemed to fail, often
> timeouts). A cursory look at open PRs seems to indicate that mine is not
> the only one. I was wondering if there is a place where all the flaky tests
> are being tracked, and if it makes sense to fix (or at least temporarily
> disable) them so that confidence in new PRs could be increased.
>
> Thanks,
>
> Dan


Re: Regarding an issue with message distributions in kafka.

2022-11-23 Thread John Roesler
Hi Puneet,

Thanks for the question. It sounds like either the three consumers have 
different group.id configurations or they are using “assign” instead of 
“subscribe”.

You are correct, if the consumers are all in the same consumer group and 
subscribed to the same topic, Kafka will only assign partition 0 to one of them 
at a time. 

I hope this helps!
-John

On Wed, Nov 23, 2022, at 05:14, Puneet Dubey wrote:
> Hi.
>
> In our app we have integrated kafka to publish and subscribe to the 
> events. Based on the documentation and distribution policy we designed 
> our application where we are facing some contradictory behavior. As per 
> the documentation at a given time there can be a maximum of one 
> consumer per partition. Our application is written in spring boot and 
> we are using org.springframework.kafka dependency to manage kafka and 
> kafka clusters are hosted in aws ec2 instances. 
>
> While producing messages we are writing them in a specific partition 
> let's say 0 and we have a group of 3 consumers all reading same 
> partition and belong to the same group. Our expectation was that at a 
> time only one active consumer rest of the two will be idle. But what's 
> happening is all of them are receiving and processing the same message. 
> Due to which desired action is being performed thrice. Please suggest a 
> solution.
>
> -- 
> Thanks & Regards,
> Puneet Dubey
> *Technical Consultant*__
> Epictenet 
> *M.* +91 8269285876
> *P.* +0755 4030158 
> *W.* www.epictenet.com
> *A.* 3 & 11B, Essargee New Fort, BHEL, Bhopal (M.P.) 462022, India
> *HO. *Level 10, 47 York Street, Sydney, NSW 2000, Australia


Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2022-11-22 Thread John Roesler
Thanks for publishing this alternative, Nick!

The benchmark you mentioned in the KIP-844 discussion seems like a compelling 
reason to revisit the built-in transactionality mechanism. I also appreciate 
you analysis, showing that for most use cases, the write batch approach should 
be just fine.

There are a couple of points that would hold me back from approving this KIP 
right now:

1. Loss of coverage for custom stores.
The fact that you can plug in a (relatively) simple implementation of the 
XStateStore interfaces and automagically get a distributed database out of it 
is a significant benefit of Kafka Streams. I'd hate to lose it, so it would be 
better to spend some time and come up with a way to preserve that property. For 
example, can we provide a default implementation of `commit(..)` that 
re-implements the existing checkpoint-file approach? Or perhaps add an 
`isTransactional()` flag to the state store interface so that the runtime can 
decide whether to continue to manage checkpoint files vs delegating 
transactionality to the stores?

2. Guarding against OOME
I appreciate your analysis, but I don't think it's sufficient to say that we 
will solve the memory problem later if it becomes necessary. The experience 
leading to that situation would be quite bad: Imagine, you upgrade to AK 
3.next, your tests pass, so you deploy to production. That night, you get paged 
because your app is now crashing with OOMEs. As with all OOMEs, you'll have a 
really hard time finding the root cause, and once you do, you won't have a 
clear path to resolve the issue. You could only tune down the commit interval 
and cache buffer size until you stop getting crashes.

FYI, I know of multiple cases where people run EOS with much larger commit 
intervals to get better batching than the default, so I don't think this 
pathological case would be as rare as you suspect.

Given that we already have the rudiments of an idea of what we could do to 
prevent this downside, we should take the time to design a solution. We owe it 
to our users to ensure that awesome new features don't come with bitter pills 
unless we can't avoid it.

3. ALOS mode.
On the other hand, I didn't see an indication of how stores will be handled 
under ALOS (aka non-EOS) mode. Theoretically, the transactionality of the store 
and the processing mode are orthogonal. A transactional store would serve ALOS 
just as well as a non-transactional one (if not better). Under ALOS, though, 
the default commit interval is five minutes, so the memory issue is far more 
pressing.

As I see it, we have several options to resolve this point. We could 
demonstrate that transactional stores work just fine for ALOS and we can 
therefore just swap over unconditionally. We could also disable the 
transactional mechanism under ALOS so that stores operate just the same as they 
do today when run in ALOS mode. Finally, we could do the same as in KIP-844 and 
make transactional stores opt-in (it'd be better to avoid the extra opt-in 
mechanism, but it's a good get-out-of-jail-free card).

4. (minor point) Deprecation of methods

You mentioned that the new `commit` method replaces flush, 
updateChangelogOffsets, and checkpoint. It seems to me that the point about 
atomicity and Position also suggests that it replaces the Position callbacks. 
However, the proposal only deprecates `flush`. Should we be deprecating other 
methods as well?

Thanks again for the KIP! It's really nice that you and Alex will get the 
chance to collaborate on both directions so that we can get the best outcome 
for Streams and its users.

-John


On 2022/11/21 15:02:15 Nick Telford wrote:
> Hi everyone,
> 
> As I mentioned in the discussion thread for KIP-844, I've been working on
> an alternative approach to achieving better transactional semantics for
> Kafka Streams StateStores.
> 
> I've published this separately as KIP-892: Transactional Semantics for
> StateStores
> ,
> so that it can be discussed/reviewed separately from KIP-844.
> 
> Alex: I'm especially interested in what you think!
> 
> I have a nearly complete implementation of the changes outlined in this
> KIP, please let me know if you'd like me to push them for review in advance
> of a vote.
> 
> Regards,
> 
> Nick
> 


Re: [VOTE] KIP-884: Add config to configure KafkaClientSupplier in Kafka Streams

2022-11-21 Thread John Roesler
I'm +1 (binding)

Thanks for the KIP!
-John

On 2022/11/17 21:06:29 Hao Li wrote:
> Hi all,
> 
> I would like start a vote on KIP-884:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-884%3A+Add+config+to+configure+KafkaClientSupplier+in+Kafka+Streams
> 
> 
> Thanks,
> Hao
> 


Re: Starting out with Kafka

2022-11-13 Thread John Roesler
Glad to hear it!
-John 

On Fri, Nov 11, 2022, at 14:57, vinay deshpande wrote:
> Hi John,
> Thanks a ton. Things are looking good now.
> Even though Scala was present on my machine the ./gradlew compile was
> unable to find the Scala classes, on installing Scala plugin on IDE and
> then recompiling from terminal the projecting is building fine now.
>
> Thanks John and Ziming for helping out.
>
> Thanks,
> Vinay
>
> On Fri, Nov 11, 2022 at 11:01 AM John Roesler  wrote:
>
>> Hmm,
>>
>> I assume you did, but just to be sure, did you compile from the terminal
>> before trying to build in idea? There are some generated Java classes that
>> might be getting missed.
>>
>> The compile step is probably something like “compileJava compileScala
>> compileTestJava compileTestScala” from the top of my head.
>>
>> I hope this helps!
>> -John
>>
>> Ps: I don’t think image attachments work with the mailing list. Maybe you
>> can use a gist?
>>
>> On Fri, Nov 11, 2022, at 12:51, vinay deshpande wrote:
>> > Hi,
>> > I tried both the suggestions given in the previous mail threads, all
>> > the unit tests passed except for testMuteOnOOM(). But the issue with
>> > IDE persists.
>> > I even tried invalidating the cache a few times, deleted the .bin/ and
>> > .idea/ folder and built again  but there are quite few imports that
>> > aren't being resolved (especially kafka.server, kafka.zk and kafka.log).
>> > I'm attaching the result of the unit test here.
>> > Screen Shot 2022-11-11 at 10.45.48 AM.png
>> > TIA.
>> >
>> > On Fri, Nov 11, 2022 at 5:04 AM John Roesler 
>> wrote:
>> >> Hello Vinay,
>> >>
>> >> One thing I’ve noticed recently is that I have to click the “build”
>> button in intellij before I can use the “run” or “debug” buttons. I’m not
>> sure why.
>> >>
>> >> Welcome to the community!
>> >> -John
>> >>
>> >> On Fri, Nov 11, 2022, at 02:47, deng ziming wrote:
>> >> > Hello, Vinay
>> >> > Kafka uses gradlew as build tool and java/scala as program language,
>> >> > You can firstly use `./gradlew unitTest` to build it using terminal,
>> >> > and reload it in gradle window, sometimes I also change default build
>> >> > tool from IDEA to gradle in Preference/Build/build tools/Gradle:
>> >> >
>> >> > PastedGraphic-1.tiff
>> >> >
>> >> > --
>> >> > Ziming
>> >> >
>> >> >> On Nov 11, 2022, at 13:30, vinay deshpande 
>> wrote:
>> >> >>
>> >> >> Hi All,
>> >> >> I have a basic question: I tried importing kafka source code into
>> intellij
>> >> >> but there are bunch of imports that IDE cannot find like these:
>> >> >>
>> >> >> import kafka.api.ApiVersion;
>> >> >> import kafka.log.CleanerConfig;
>> >> >> import kafka.log.LogConfig;
>> >> >> import kafka.log.LogManager;
>> >> >>
>> >> >>
>> >> >> TIA.
>> >> >>
>> >> >> Thanks,
>> >> >> Vinay
>>


Re: Streams: clarification needed, checkpoint vs. position files

2022-11-11 Thread John Roesler
Hi all,

Just to clarify: there actually is a position file. It was a small detail of 
the IQv2 implementation to add it, otherwise a persistent store's position 
would be lost after a restart.

Otherwise, Sophie is right on the money. The checkpoint refers to an offset in 
the changelog, while the position refers to offsets in the task's input topics 
topics. So they are similar in function and structure, but they refer to two 
different things.

I agree that, given this, it doesn't seem like consolidating them (for example, 
into one file) would be worth it. It would make the code more complicated 
without deduping any information.

I hope this helps, and look forward to what you're cooking up, Nick!
-John

On 2022/11/12 00:50:27 Sophie Blee-Goldman wrote:
> Hey Nick,
> 
> I haven't been following the new IQv2 work very closely so take this with a
> grain of salt,
> but as far as I'm aware there's no such thing as "position files" -- the
> Position is just an
> in-memory object and is related to a user's query against the state store,
> whereas a
> checkpoint file reflects the current state of the store ie how much of the
> changelog it
> contains.
> 
> In other words while these might look like they do similar things, the
> actual usage and
> implementation of Positions vs checkpoint files is pretty much unrelated.
> So I don't think
> it would sense for Streams to try and consolidate these or replace one with
> another.
> 
> Hope this answers your question, and I'll ping John to make sure I'm not
> misleading
> you regarding the usage/intention of Positions
> 
> Sophie
> 
> On Fri, Nov 11, 2022 at 6:48 AM Nick Telford  wrote:
> 
> > Hi everyone,
> >
> > I'm trying to understand how StateStores work internally for some changes
> > that I plan to propose, and I'd like some clarification around checkpoint
> > files and position files.
> >
> > It appears as though position files are relatively new, and were created as
> > part of the IQv2 initiative, as a means to track the position of the local
> > state store so that reads could be bound by particular positions?
> >
> > Checkpoint files look much older, and are managed by the Task itself
> > (actually, ProcessorStateManager). It looks like this is used exclusively
> > for determining a) whether to restore a store, and b) which offsets to
> > restore from?
> >
> > If I've understood the above correctly, is there any scope to potentially
> > replace checkpoint files with StateStore#position()?
> >
> > Regards,
> >
> > Nick
> >
> 


Re: Starting out with Kafka

2022-11-11 Thread John Roesler
Hmm,

I assume you did, but just to be sure, did you compile from the terminal before 
trying to build in idea? There are some generated Java classes that might be 
getting missed.

The compile step is probably something like “compileJava compileScala 
compileTestJava compileTestScala” from the top of my head. 

I hope this helps!
-John

Ps: I don’t think image attachments work with the mailing list. Maybe you can 
use a gist?

On Fri, Nov 11, 2022, at 12:51, vinay deshpande wrote:
> Hi,
> I tried both the suggestions given in the previous mail threads, all 
> the unit tests passed except for testMuteOnOOM(). But the issue with 
> IDE persists.
> I even tried invalidating the cache a few times, deleted the .bin/ and 
> .idea/ folder and built again  but there are quite few imports that 
> aren't being resolved (especially kafka.server, kafka.zk and kafka.log).
> I'm attaching the result of the unit test here.
> Screen Shot 2022-11-11 at 10.45.48 AM.png
> TIA.
>
> On Fri, Nov 11, 2022 at 5:04 AM John Roesler  wrote:
>> Hello Vinay,
>> 
>> One thing I’ve noticed recently is that I have to click the “build” button 
>> in intellij before I can use the “run” or “debug” buttons. I’m not sure why. 
>> 
>> Welcome to the community!
>> -John
>> 
>> On Fri, Nov 11, 2022, at 02:47, deng ziming wrote:
>> > Hello, Vinay
>> > Kafka uses gradlew as build tool and java/scala as program language,
>> > You can firstly use `./gradlew unitTest` to build it using terminal, 
>> > and reload it in gradle window, sometimes I also change default build 
>> > tool from IDEA to gradle in Preference/Build/build tools/Gradle:
>> >
>> > PastedGraphic-1.tiff
>> >
>> > --
>> > Ziming
>> >
>> >> On Nov 11, 2022, at 13:30, vinay deshpande  wrote:
>> >> 
>> >> Hi All,
>> >> I have a basic question: I tried importing kafka source code into intellij
>> >> but there are bunch of imports that IDE cannot find like these:
>> >> 
>> >> import kafka.api.ApiVersion;
>> >> import kafka.log.CleanerConfig;
>> >> import kafka.log.LogConfig;
>> >> import kafka.log.LogManager;
>> >> 
>> >> 
>> >> TIA.
>> >> 
>> >> Thanks,
>> >> Vinay


Re: Starting out with Kafka

2022-11-11 Thread John Roesler
Hello Vinay,

One thing I’ve noticed recently is that I have to click the “build” button in 
intellij before I can use the “run” or “debug” buttons. I’m not sure why. 

Welcome to the community!
-John

On Fri, Nov 11, 2022, at 02:47, deng ziming wrote:
> Hello, Vinay
> Kafka uses gradlew as build tool and java/scala as program language,
> You can firstly use `./gradlew unitTest` to build it using terminal, 
> and reload it in gradle window, sometimes I also change default build 
> tool from IDEA to gradle in Preference/Build/build tools/Gradle:
>
> PastedGraphic-1.tiff
>
> --
> Ziming
>
>> On Nov 11, 2022, at 13:30, vinay deshpande  wrote:
>> 
>> Hi All,
>> I have a basic question: I tried importing kafka source code into intellij
>> but there are bunch of imports that IDE cannot find like these:
>> 
>> import kafka.api.ApiVersion;
>> import kafka.log.CleanerConfig;
>> import kafka.log.LogConfig;
>> import kafka.log.LogManager;
>> 
>> 
>> TIA.
>> 
>> Thanks,
>> Vinay


Re: [VOTE] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-11-10 Thread John Roesler
Thanks for the KIP, Daniel!

I'm no MM expert, but I've read over the KIP and discussion, and it seems 
reasonable to me.

I'm +1 (binding).

Thanks,
-John

On 2022/10/22 07:38:38 Urbán Dániel wrote:
> Hi everyone,
> 
> I would like to start a vote on KIP-710 which aims to support running a 
> dedicated MM2 cluster in distributed mode:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
> 
> Regards,
> Daniel
> 
> 
> -- 
> Ezt az e-mailt átvizsgálta az Avast AntiVirus szoftver.
> www.avast.com
> 


[jira] [Created] (KAFKA-14364) Support evolving serde with Foreign Key Join

2022-11-07 Thread John Roesler (Jira)
John Roesler created KAFKA-14364:


 Summary: Support evolving serde with Foreign Key Join
 Key: KAFKA-14364
 URL: https://issues.apache.org/jira/browse/KAFKA-14364
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


The current implementation of Foreign-Key join uses a hash comparison to 
determine whether it should emit join results or not. See 
[https://github.com/apache/kafka/blob/807c5b4d282e7a7a16d0bb94aa2cda9566a7cc2d/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java#L94-L110]

As specified in KIP-213 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable]
 ), we must do a comparison of this nature in order to get correct results when 
the foreign-key reference changes, as the old reference might emit delayed 
results after the new instance generates its updated results, leading to an 
incorrect final join state.

The hash comparison prevents this race condition by ensuring that any emitted 
results correspond to the _current_ version of the left-hand-side record (and 
therefore that the foreign-key reference itself has not changed).

An undesired side-effect of this is that if users update their serdes (in a 
compatible way), for example to add a new optional field to the record, then 
the resulting hash will change for existing records. This will cause Streams to 
stop emitting results for those records until a new left-hand-side update comes 
in, recording a new hash for those records.

It should be possible to provide a fix. Some ideas:
 * only consider the foreign-key references itself in the hash function (this 
was the original proposal, but we opted to hash the entire record as an 
optimization to suppress unnecessary updates).
 * provide a user-overridable hash function. This would be more flexible, but 
also pushes a lot of complexity onto users, and opens up the possibility to 
completely break semantics.

We will need to design the solution carefully so that we can preserve the 
desired correctness guarantee.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-852: Optimize calculation of size for log in remote tier

2022-11-06 Thread John Roesler
Hi Divij,

Thanks for the KIP!

I’ve read through your write-up, and it sounds reasonable to me. 

I’m +1 (binding)

Thanks,
John

On Tue, Nov 1, 2022, at 05:03, Divij Vaidya wrote:
> Hey folks
>
> The discuss thread for this KIP has been open for a few months with no
> concerns being surfaced. I would like to start a vote for the
> implementation of this KIP.
>
> The KIP is available at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
>
>
> Regards
> Divij Vaidya


Re: [DISCUSS] KIP-872: Add Serializer#serializeToByteBuffer() to reduce memory copying

2022-11-06 Thread John Roesler
Thanks for the reply, ShunKang!

You’re absolutely right, we should not change the behavior of the existing 
method. 

Regarding the new method, I was thinking that this is a good opportunity to 
correct what seems to be strange semantics in the original one. If we keep the 
same semantics and want to correct it later, we’ll be forced to introduce yet 
another method later. This especially makes sense if we’re thinking of 
deprecating the original method. But if you think it’s better to keep it the 
way it is, I’m fine with it. 

I have no other comments.

Thanks again for the KIP,
John

On Sat, Nov 5, 2022, at 11:59, ShunKang Lin wrote:
> Hi John,
>
> Thanks for your comments!
>
> For your first question, I see some unit test cases that give us a
> ByteBuffer not set to read before calling
> `ByteBufferSerializer#serialize(String, ByteBuffer)`, e.g.
> `ArticleSerializer`, `AugmentedArticleSerializer`,
> `AugmentedCommentSerializer` and `CommentSerializer`. If we don't flip the
> ByteBuffer inside the `ByteBufferSerializer#serialize(String, ByteBuffer)`
> it will break user code using `ByteBufferSerializer#serialize(String,
> ByteBuffer)`, and if we don't flip the ByteBuffer inside
> the `ByteBufferSerializer#serializeToByteBuffer(String, ByteBuffer)`, it
> will be even more strange to the user, because
> `ByteBufferSerializer#serialize(String, ByteBuffer)` and
> `ByteBufferSerializer#serializeToByteBuffer(String, ByteBuffer)` require
> users use the ByteBufferSerializer in two different ways. So if we think of
> `ByteBufferSerialize#serializeToByteBuffer(String, ByteBuffer)` as setting
> up a ByteBuffer to read later, is it more acceptable?
>
> For your second question, I plan to ultimately replace byte[] with
> ByteBuffer, I will document the intent in your KIP and JavaDocs later.
>
> I will clarify that if a Serializer implements the new method, then the old
> one will never be called.
>
> Best,
> ShunKang
>
> John Roesler  于2022年11月4日周五 22:42写道:
>
>> Hi ShunKang,
>>
>> Thanks for the KIP!
>>
>> I’ve been wanting to transition toward byte buffers for a while, so this
>> is a nice start.
>>
>> I thought it was a bit weird to flip the buffer inside the serializer, but
>> I see the existing one already does that. I would have thought it would
>> make more sense for the caller to give us a buffer already set up for
>> reading. Do you think it makes sense to adopt this pattern for the new
>> method?
>>
>> Do you plan to keep the new methods as optional indefinitely, or do you
>> plan to ultimately replace byte[] with ByteBuffer? If it’s the latter, then
>> it would be good to document the intent in your KIP and JavaDocs.
>>
>> It would be good to clarify that if a Serializer implements the new
>> method, then the old one will never be called. That way, implementations
>> can just throw an exception on that method instead of implementing both.
>>
>> Thanks again!
>> -John
>>
>> On Wed, Nov 2, 2022, at 20:14, ShunKang Lin wrote:
>> > Bump this thread again : )
>> >
>> > ShunKang Lin 于2022年9月25日 周日23:59写道:
>> >
>> >> Hi all, I'd like to start a new discussion thread on KIP-872 (Kafka
>> >> Client) which proposes that add Serializer#serializeToByteBuffer() to
>> >> reduce memory copying.
>> >>
>> >> KIP:
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
>> >> Thanks, ShunKang
>> >>
>>


Re: [DISCUSS] KIP-872: Add Serializer#serializeToByteBuffer() to reduce memory copying

2022-11-04 Thread John Roesler
Hi ShunKang,

Thanks for the KIP!

I’ve been wanting to transition toward byte buffers for a while, so this is a 
nice start. 

I thought it was a bit weird to flip the buffer inside the serializer, but I 
see the existing one already does that. I would have thought it would make more 
sense for the caller to give us a buffer already set up for reading. Do you 
think it makes sense to adopt this pattern for the new method?

Do you plan to keep the new methods as optional indefinitely, or do you plan to 
ultimately replace byte[] with ByteBuffer? If it’s the latter, then it would be 
good to document the intent in your KIP and JavaDocs.

It would be good to clarify that if a Serializer implements the new method, 
then the old one will never be called. That way, implementations can just throw 
an exception on that method instead of implementing both. 

Thanks again!
-John

On Wed, Nov 2, 2022, at 20:14, ShunKang Lin wrote:
> Bump this thread again : )
>
> ShunKang Lin 于2022年9月25日 周日23:59写道:
>
>> Hi all, I'd like to start a new discussion thread on KIP-872 (Kafka
>> Client) which proposes that add Serializer#serializeToByteBuffer() to
>> reduce memory copying.
>>
>> KIP:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
>> Thanks, ShunKang
>>


Re: [VOTE] KIP-869: Improve Streams State Restoration Visibility

2022-10-20 Thread John Roesler
Thanks for the KIP, Guozhang!

I'm +1 (binding)

-John

On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> Can't wait!
> +1 (non-binding)
>
> On Wed, 12 Oct 2022, 18:02 Guozhang Wang, 
> wrote:
>
>> Hello all,
>>
>> I'd like to start a vote for the following KIP, aiming to improve Kafka
>> Stream's restoration visibility via new metrics and callback methods:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
>>
>>
>> Thanks!
>> -- Guozhang
>>


Re: [ANNOUNCE] New committer: Deng Ziming

2022-10-10 Thread John Roesler
Congratulations, Ziming!

On Mon, Oct 10, 2022, at 12:05, Justine Olshan wrote:
> Congratulations Ziming! I'll always remember the help you provided with
> topic IDs.
> Very well deserved!
>
> On Mon, Oct 10, 2022 at 9:53 AM Matthew Benedict de Detrich
>  wrote:
>
>> Congratulations!
>>
>> On Mon, 10 Oct 2022, 11:30 Jason Gustafson, 
>> wrote:
>>
>> > Hi All
>> >
>> > The PMC for Apache Kafka has invited Deng Ziming to become a committer,
>> > and we are excited to announce that he has accepted!
>> >
>> > Ziming has been contributing to Kafka for about three years. He has
>> > authored
>> > more than 100 patches and helped to review nearly as many. In particular,
>> > he made significant contributions to the KRaft project which had a big
>> part
>> > in reaching our production readiness goal in the 3.3 release:
>> > https://blogs.apache.org/kafka/entry/what-rsquo-s-new-in.
>> >
>> > Please join me in congratulating Ziming! Thanks for all of your
>> > contributions!
>> >
>> > -- Jason, on behalf of the Apache Kafka PMC
>> >
>>


Re: [VOTE] 3.3.1 RC0

2022-09-30 Thread John Roesler
Hi José,

I verified the signatures and ran all the unit tests, as well as the Streams 
integration tests with:

> ./gradlew -version
> 
> 
> Gradle 7.4.2
> 
> 
> Build time:   2022-03-31 15:25:29 UTC
> Revision: 540473b8118064efcc264694cbcaa4b677f61041
> 
> Kotlin:   1.5.31
> Groovy:   3.0.9
> Ant:  Apache Ant(TM) version 1.10.11 compiled on July 10 2021
> JVM:  1.8.0_342 (Private Build 25.342-b07)
> OS:   Linux 5.15.0-48-generic amd64

I'm +1 (binding), pending system test results.

Thanks,
-John

On Fri, Sep 30, 2022, at 11:46, Bill Bejeck wrote:
> Hi,
>
> I did the following to validate the release:
>
>1. Validated all checksums, signatures
>2. Built from source and ran all the unit tests
>3. Ran the ZK and KRaft quickstart
>4. Ran the Raft single quorum test
>5. Ran the Kafka Streams quick start
>
> +1(binding) pending successful system test run
>
> Thanks,
> Bill
>
> On Fri, Sep 30, 2022 at 5:30 AM David Jacot 
> wrote:
>
>> Hey,
>>
>> I performed the following validations:
>> * Verified all checksums and signatures.
>> * Built from source and ran unit tests.
>> * Ran the first quickstart steps for both ZK and KRaft.
>> * Spotchecked the Javadocs.
>>
>> I am +1 (binding), assuming that the system tests look good.
>>
>> Thanks for running the release.
>>
>> Best,
>> David
>>
>> On Fri, Sep 30, 2022 at 2:23 AM José Armando García Sancio
>>  wrote:
>> >
>> > On Thu, Sep 29, 2022 at 2:39 PM José Armando García Sancio
>> >  wrote:
>> > > Please download, test and vote by Tuesday, October 4, 9am PT.
>> >
>> > The vote will be open for 72 hours. Please vote by Sunday, October 2nd,
>> 3 PM PT.
>> >
>> > Thanks!
>> > --
>> > -José
>>


Re: [VOTE] 3.3.0 RC2

2022-09-26 Thread John Roesler
Thanks for running this, David!

I've verified the signatures, looked at the docs, and run the quickstart (ZK 
and KRaft). I also ran the unit tests, as well as all the tests for Streams 
locally.

The docs look a little malformed (the "collapse/expand" button floats over the 
text, the collapsed doc tree is only halfway collapsed, and there's a weird 
empty panel on the right).

We can fix the docs site independent of this release, so I'm +1 (binding).

Thanks,
-John

On Tue, Sep 20, 2022, at 18:17, David Arthur wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the second release candidate for Apache Kafka 3.3.0. Many new
> features and bug fixes are included in this major release of Kafka. A
> significant number of the issues in this release are related to KRaft,
> which will be considered "production ready" as part of this release
> (KIP-833)
>
> KRaft improvements:
> * KIP-778: Online KRaft to KRaft Upgrades
> * KIP-833: Mark KRaft as Production Ready
> * KIP-835: Monitor Quorum health (many new KRaft metrics)
> * KIP-836: Expose voter lag via kafka-metadata-quorum.sh
> * KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft
> * KIP-859: Add Metadata Log Processing Error Related Metrics
>
> Other major improvements include:
> * KIP-618: Exactly-Once Support for Source Connectors
> * KIP-831: Add metric for log recovery progress
> * KIP-827: Expose logdirs total and usable space via Kafka API
> * KIP-834: Add ability to Pause / Resume KafkaStreams Topologies
>
> The full release notes are available here:
> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/RELEASE_NOTES.html
>
> Please download, test and vote by Monday, Sep 26 at 5pm EDT
>
> Also, huge thanks to José for running the release so far. He has done
> the vast majority of the work to prepare this rather large release :)
>
> -
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/javadoc/
>
> * Tag to be voted upon (off 3.3 branch) is the 3.3.0 tag:
> https://github.com/apache/kafka/releases/tag/3.3.0-rc2
>
> * Documentation:  https://kafka.apache.org/33/documentation.html
>
> * Protocol: https://kafka.apache.org/33/protocol.html
>
>
>
>
> Successful Jenkins builds to follow in a future update to this email.
>
>
> Thanks!
> David Arthur


[jira] [Created] (KAFKA-14254) Format timestamps in assignor logs as dates instead of integers

2022-09-21 Thread John Roesler (Jira)
John Roesler created KAFKA-14254:


 Summary: Format timestamps in assignor logs as dates instead of 
integers
 Key: KAFKA-14254
 URL: https://issues.apache.org/jira/browse/KAFKA-14254
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


This is a follow-on task from [https://github.com/apache/kafka/pull/12582]

There is another log line that prints the same timestamp: "Triggering the 
followup rebalance scheduled for ...", which should also be printed as a 
date/time in the same manner as PR 12582.

We should also search the codebase a little to see if we're printing timestamps 
in other log lines that would be better off as date/times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14253) StreamsPartitionAssignor should print the member count in assignment logs

2022-09-21 Thread John Roesler (Jira)
John Roesler created KAFKA-14253:


 Summary: StreamsPartitionAssignor should print the member count in 
assignment logs
 Key: KAFKA-14253
 URL: https://issues.apache.org/jira/browse/KAFKA-14253
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


Debugging rebalance and assignment issues is harder than it needs to be. One 
simple thing that can help is to print out information in the logs that users 
have to compute today.

For example, the StreamsPartitionAssignor prints two messages that contain the 
the newline-delimited group membership:
{code:java}
[StreamsPartitionAssignor] [...-StreamThread-1] stream-thread 
[...-StreamThread-1-consumer] All members participating in this rebalance:

: []

: []

: []{code}
and
{code:java}
[StreamsPartitionAssignor] [...-StreamThread-1] stream-thread 
[...-StreamThread-1-consumer] Assigned tasks [...] including stateful [...] to 
clients as:

=[activeTasks: ([...]) standbyTasks: ([...])]

=[activeTasks: ([...]) standbyTasks: ([...])]

=[activeTasks: ([...]) standbyTasks: ([...])
{code}
 

In both of these cases, it would be nice to:
 # Include the number of members in the group (I.e., "15 members participating" 
and "to 15 clients as")
 # sort the member ids (to help compare the membership and assignment across 
rebalances)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-821: Connect Transforms support for nested structures

2022-09-14 Thread John Roesler
Thanks, all!

I've reviewed the current state of the KIP, and I'm still +1 (binding).

Thanks,
-John

On Fri, Sep 2, 2022, at 12:03, Chris Egerton wrote:
> +1 (binding). Thanks Jorge, great stuff!
>
> We should probably verify with the people that have already cast +1 votes
> that they're still on board, since the design has shifted a bit since the
> last vote was casted.
>
> On 2022/06/28 20:42:14 Jorge Esteban Quilcate Otoya wrote:
>> Hi everyone,
>>
>> I'd like to bump this vote thread. Currently it's missing 1 +1 binding
> vote
>> to pass (2 +1 binding, 1 +1 non-binding).
>>
>> There has been additional discussions to consider array access and
>> deep-scan (similar to JsonPath) but hasn't been included as part of this
>> KIP.
>> The only minor change since the previous votes has been the change of
>> configuration name: from `field.style` to `field.syntax.version`.
>>
>> KIP:
>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
>>
>> Cheers,
>> Jorge.
>>
>> On Fri, 22 Apr 2022 at 00:01, Bill Bejeck  wrote:
>>
>> > Thanks for the KIP, Jorge.
>> >
>> > This looks like a great addition to Kafka Connect.
>> >
>> > +1(binding)
>> >
>> > -Bill
>> >
>> > On Thu, Apr 21, 2022 at 6:41 PM John Roesler  wrote:
>> >
>> > > Thanks for the KIP, Jorge!
>> > >
>> > > I’ve just looked over the KIP, and it looks good to me.
>> > >
>> > > I’m +1 (binding)
>> > >
>> > > Thanks,
>> > > John
>> > >
>> > > On Thu, Apr 21, 2022, at 09:10, Chris Egerton wrote:
>> > > > This is a worthwhile addition to the SMTs that ship out of the box
> with
>> > > > Kafka Connect. +1 non-binding
>> > > >
>> > > > On Thu, Apr 21, 2022, 09:51 Jorge Esteban Quilcate Otoya <
>> > > > quilcate.jo...@gmail.com> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> I'd like to start a vote on KIP-821:
>> > > >>
>> > > >>
>> > >
>> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
>> > > >>
>> > > >> Thanks,
>> > > >> Jorge
>> > > >>
>> > >
>> >
>>


Re: [VOTE] KIP-862: Self-join optimization for stream-stream joins

2022-09-12 Thread John Roesler
Thanks for the updates, Vicky!

I've reviewed the KIP and your POC PR,
and I'm +1 (binding).

Thanks!
-John

On Mon, Sep 12, 2022, at 09:13, Vasiliki Papavasileiou wrote:
> Hey Guozhang,
>
> Great suggestion, I made the change.
>
> Best,
> Vicky
>
> On Fri, Sep 9, 2022 at 10:43 PM Guozhang Wang  wrote:
>
>> Thanks Vicky, that reads much clearer now.
>>
>> Just regarding the value string name itself: "self.join" may be confusing
>> compared to other values that people would think before this config is
>> enabled, self-join are not allowed at all. Maybe we can rename it to
>> "single.store.self.join"?
>>
>> Guozhang
>>
>> On Fri, Sep 9, 2022 at 2:15 AM Vasiliki Papavasileiou
>>  wrote:
>>
>> > Hey Guozhang,
>> >
>> > Ah it seems my text was not very clear :)
>> > With "TOPOLOGY_OPTIMIZATION_CONFIG will be extended to accept a list of
>> > optimization rule configs" I meant that it will accept the new value
>> > strings for each optimization rule. Let me rephrase that in the KIP to
>> make
>> > it clearer.
>> > Is it better now?
>> >
>> > Best,
>> > Vicky
>> >
>> > On Thu, Sep 8, 2022 at 9:07 PM Guozhang Wang  wrote:
>> >
>> > > Thanks Vicky,
>> > >
>> > > I read through the KIP again and it looks good to me. Just a quick
>> > question
>> > > regarding the public config changes: you mentioned "No public
>> interfaces
>> > > will be impacted. The config TOPOLOGY_OPTIMIZATION_CONFIG will be
>> > extended
>> > > to accept a list of optimization rule configs in addition to the global
>> > > values "all" and "none" . But there are no new value strings mentioned
>> in
>> > > this KIP, so that means we will apply this optimization only when `all`
>> > is
>> > > specified in the config right?
>> > >
>> > >
>> > > Guozhang
>> > >
>> > >
>> > > On Thu, Sep 8, 2022 at 12:02 PM Vasiliki Papavasileiou
>> > >  wrote:
>> > >
>> > > > Hello everyone,
>> > > >
>> > > > I'd like to open the vote for KIP-862, which proposes to optimize
>> > > > stream-stream self-joins by using a single state store for the join.
>> > > >
>> > > > The proposal is here:
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-862%3A+Self-join+optimization+for+stream-stream+joins
>> > > >
>> > > > Thanks to all who reviewed the proposal, and thanks in advance for
>> > taking
>> > > > the time to vote!
>> > > >
>> > > > Thank you,
>> > > > Vicky
>> > > >
>> > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>> >
>>
>>
>> --
>> -- Guozhang
>>


[jira] [Created] (KAFKA-14202) IQv2: Expose binary store schema to store implementations

2022-09-06 Thread John Roesler (Jira)
John Roesler created KAFKA-14202:


 Summary: IQv2: Expose binary store schema to store implementations
 Key: KAFKA-14202
 URL: https://issues.apache.org/jira/browse/KAFKA-14202
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


One feature of IQv2 is that store implementations can handle custom queries. 
Many custom query handlers will need to process the key or value bytes, for 
example deserializing them to implement some filter or aggregations, or even 
performing binary operations on them.

For the most part, this should be straightforward for users, since they provide 
Streams with the serdes, the store implementation, and the custom queries.

However, Streams will sometimes pack extra data around the data produced by the 
user-provided serdes. For example, the Timestamped store wrappers add a 
timestamp on the beginning of the value byte array. And in Windowed stores, we 
add window timestamps to the key bytes.

It would be nice to have some generic mechanism to communicate those schemas to 
the user-provided inner store layers to support users who need to write custom 
queries. For example, perhaps we can add an extractor class to the state store 
context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread John Roesler
Thanks for the KIP, Alex!

+1 (binding) from me. 

-John

On Thu, Sep 1, 2022, at 09:51, Guozhang Wang wrote:
> +1, thanks Alex!
>
> On Thu, Sep 1, 2022 at 6:33 AM Bruno Cadonna  wrote:
>
>> Thanks for the KIP!
>>
>> +1 (binding)
>>
>> Best,
>> Bruno
>>
>> On 01.09.22 15:26, Colt McNealy wrote:
>> > +1
>> >
>> > Hi Alex,
>> >
>> > Thank you for your work on the KIP. I'm not a committer so my vote is
>> > non-binding but I strongly support this improvement.
>> >
>> > Thank you,
>> > Colt McNealy
>> > *Founder, LittleHorse.io*
>> >
>> >
>> > On Thu, Sep 1, 2022 at 8:20 AM Alexander Sorokoumov
>> >  wrote:
>> >
>> >> Hi All,
>> >>
>> >> I would like to start a voting thread on KIP-844, which introduces
>> >> transactional state stores to avoid wiping local state on crash failure
>> >> under EOS.
>> >>
>> >> KIP:
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
>> >> Discussion thread:
>> >> https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
>> >> Jira: https://issues.apache.org/jira/browse/KAFKA-12549
>> >>
>> >> Best,
>> >> Alex
>> >>
>> >
>>
>
>
> -- 
> -- Guozhang


Re: Hosting Kafka Videos on ASF YouTube channel

2022-08-25 Thread John Roesler
Thanks all,

I’m also +1 on the Kafka Streams videos. 

Thanks,
John

On Tue, Aug 9, 2022, at 03:54, Mickael Maison wrote:
> Hi,
>
> I checked the four Streams videos
> (https://kafka.apache.org/32/documentation/streams/), they are good
> and don't mention any vendors.
> +1 (binding) for these four videos
>
> For the last video (https://kafka.apache.org/intro and
> https://kafka.apache.org/quickstart) we will have to wait till the
> intro is edited.
>
> Thanks,
> Mickael
>
>
> On Mon, Aug 8, 2022 at 11:12 PM Joe Brockmeier  wrote:
>>
>> Repurpose away. Thanks!
>>
>> On Mon, Aug 8, 2022 at 4:55 PM Bill Bejeck  wrote:
>> >
>> > Hi Joe,
>> >
>> > Thanks that works for me. As for you watching the videos, they are about 
>> > 10 minutes each, and you can watch them at 1.5 - 1.75 playback speed.
>> >
>> > If it's ok with you, I'm going to repurpose this thread as a voting thread 
>> > for the videos.
>> >
>> > I watched the Kafka Streams videos on 
>> > https://kafka.apache.org/32/documentation/streams/, and I can confirm they 
>> > are vendor-neutral.
>> > The other videos and logo that show up at the end are coming from YouTube, 
>> > so once move the videos to the ASF channel, that should go away.
>> >
>> > +1(binding).
>> >
>> > Thanks,
>> > Bill
>> >
>> >
>> >
>> > On Mon, Aug 8, 2022 at 9:46 AM Joe Brockmeier  wrote:
>> >>
>> >> If we can get a +1 from the PMC on each video that they're happy that
>> >> the videos are vendor neutral I think we can do that. I'll also need
>> >> to view them as well. I hope they're not long videos. :-)
>> >>
>> >> On Tue, Aug 2, 2022 at 3:38 PM Bill Bejeck  wrote:
>> >> >
>> >> > Hi Joe,
>> >> >
>> >> > Yes, that is correct.  Sorry, I should have mentioned that in the 
>> >> > original email.  That is the only video where Tim says that.
>> >> > The Kafka Streams videos do not mention Confluent.
>> >> >
>> >> > We're currently pursuing editing the video to remove the "from 
>> >> > Confluent" part.
>> >> > Note that the site also uses the same video on the "quickstart" page, 
>> >> > so both places will be fixed when editing is completed.
>> >> >
>> >> > Can we pursue hosting the Kafka Streams videos for now, then revisit 
>> >> > the "What is Apache Kafka?" when the editing is done?
>> >> >
>> >> > Thanks,
>> >> > Bill
>> >> >
>> >> >
>> >> > On Tue, Aug 2, 2022 at 3:12 PM Joe Brockmeier  wrote:
>> >> >>
>> >> >> Hi Bill,
>> >> >>
>> >> >> I'm not sure changing hosting would quite solve the problem. The first
>> >> >> video I see on this page:
>> >> >>
>> >> >> https://kafka.apache.org/intro
>> >> >>
>> >> >> Starts with "Hi, I'm Bill Berglund from *Confluent*" rather than "Hi,
>> >> >> I'm Bill from Apache Kafka" -- so moving to the ASF Youtube channel
>> >> >> wouldn't completely solve the problem.
>> >> >>
>> >> >> On Tue, Aug 2, 2022 at 3:05 PM Bill Bejeck  wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I am an Apache Kafka® committer and PMC member, and I'm working on 
>> >> >> > our site to address some issues around our embedded videos and 
>> >> >> > branding.
>> >> >> >
>> >> >> > The Kafka site has six embedded videos:  
>> >> >> > https://kafka.apache.org/intro, https://kafka.apache.org/quickstart, 
>> >> >> > and four videos on 
>> >> >> > https://kafka.apache.org/32/documentation/streams/.
>> >> >> >
>> >> >> > The videos are hosted on the Confluent YouTube channel, so the 
>> >> >> > branding on the video is from Confluent.  Since it's coming from 
>> >> >> > YouTube, there's no way to change it.
>> >> >> >
>> >> >> > Would it be possible to upload these videos to the Apache Foundation 
>> >> >> > YouTube channel 
>> >> >> > (https://www.youtube.com/c/TheApacheFoundation/featured)?  Doing 
>> >> >> > this would automatically change the branding to Apache.
>> >> >> >
>> >> >> > Thanks, and I look forward to working with you on this matter.
>> >> >> >
>> >> >> > Bill Bejeck
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Joe Brockmeier
>> >> >> Vice President Marketing & Publicity
>> >> >> j...@apache.org
>> >>
>> >>
>> >>
>> >> --
>> >> Joe Brockmeier
>> >> Vice President Marketing & Publicity
>> >> j...@apache.org
>>
>>
>>
>> --
>> Joe Brockmeier
>> Vice President Marketing & Publicity
>> j...@apache.org


Re: [DISCUSS] KIP-862: Implement self-join optimization

2022-08-12 Thread John Roesler
Thanks for the KIP, Vicky!

Re 1/2, I agree with what you both worked out. 

Re 3: It sounds like you were able to preserve backward compatibility, so I 
don’t think you need to add any new configs. I think you can just switch it on 
if people specify “all”. 

Thanks!
-John


On Thu, Aug 11, 2022, at 11:27, Guozhang Wang wrote:
> Thanks Vicky for your reply!
>
> Re 1/2): I think you have a great point here to adhere with the existing
> implementation, I'm convinced. In that case we do not need to consider
> left/outer-joins, and hence do not need to worry about the extra store in
> the impl.
>
> Re 3): I'm curious how the compatibility is preserved since with
> optimizations turned on, we would use fewer stores and hence the store name
> suffixes would change. In your experiment did you specifically specify the
> store names, e.g. via Materialized? I'd be glad if it turns out to really
> be conveniently backward compatible, and rest with my concerns :)
>
>
> Guozhang
>
> On Thu, Aug 11, 2022 at 4:44 AM Vasiliki Papavasileiou
>  wrote:
>
>> Hi Guozhang,
>>
>> Thank you very much for your comments.
>>
>> Regarding 1: the extra state store is only needed in outer joins since
>> that's the only case we have non-joining records that would need to get
>> emitted when the window closes, right? If we do decide to go with an
>> outer-join implementation, I will make sure to have the extra state store
>> as well. Thank you for pointing it out.
>>
>> Regarding 2: As the self-join is only a physical optimization over an inner
>> join whose two arguments are the same entity, it should return the same
>> results as the inner join. We wouldn't want a user upgrading and enabling
>> the optimization to suddenly see that their joins behave differently and
>> produce different results.
>> As an example, consider the records  and  where A is the key and
>> the number is the value and both are strings. Assume these records are
>> piped into an input topic. And assume we have a self-join (not optimized,
>> so inner join implementation) whose joiner concatenates the values.
>> The output of the join after processing the first record is : .
>> The output of the join after processing the second record is: ,
>> , 
>> So, for an inner join whose two arguments are the same stream, a record
>> does join with itself. And as a user, I would expect the self-join
>> optimization to produce the same results. What do you think?
>>
>> Regarding 3: I did a small experiment and I think the changes I did are
>> backwards compatible. Basically, I created a topology without the
>> optimization, had it process some data and killed it. Then I started it
>> again but with the optimization turned on, and the processing resumed fine
>> as in there was no exception and no extra state stores created and the join
>> results made sense. The optimization is keeping the same state store and
>> doesn't change the names or indices of nodes in the topology. I will
>> however need to add a case for self-joins in the upgrade system tests to
>> make sure that things don't break. Is this sufficient?
>> Regarding the config, one way to go would be to have one config per
>> optimization but I am worried that this will get unwieldy if in the future
>> we have a lot of them and also requires the user to know about the
>> optimizations to be able to benefit from them. Another alternative is to
>> assume that if the TOPOLOGY_OPTIMIZATION_CONFIG is on (`all`), then all
>> optimizations are applied. If the user doesn't want a specific
>> optimization, then they need to turn that one off. So, we will have a
>> config per optimization but they will be on by default.
>>
>> Best,
>> Vicky
>>
>> On Tue, Aug 9, 2022 at 7:03 PM Guozhang Wang  wrote:
>>
>> > Hello Vicky,
>> >
>> > Thanks for the KIP! I made a quick pass and here are some quick thoughts:
>> >
>> > 1. Store Implementation: this may be not directly related to the KIP
>> itself
>> > since its all internal, but the stream-stream join state store
>> > implementation has been changed in
>> > https://issues.apache.org/jira/browse/KAFKA-10847, in which we added a
>> > separate store to maintain all the records that have not found a match
>> yet,
>> > and would emit them when time passed for left/outer joins. In this
>> > optimization, I think we can still go with a single store but we need to
>> > make sure we do not regress on KAFKA-10847, i.e. for records not finding
>> a
>> > match, we should also emit them when time passed by, this would likely
>> rely
>> > on the ability to range-over the only store on its "expired" records. A
>> > good reference would be in the recent works to allow emitting final for
>> > windowed aggregations (cc @Hao Li  who can provide
>> > some more references).
>> >
>> > 2. Join Semantics and Outer-Joins: I think we need to clarify for any
>> > single stream record, would itself also be considered a "match" for
>> itself,
>> > OR should we consider only a different record but with the same key

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-08-12 Thread John Roesler
Thanks, Sagar!

I’m +1 (binding)

Can you add a short explanation to each rejected alternative? I was wondering 
why we wouldn’t provide an overloaded to()/addSink() (the first rejected 
alternative), and I had to look back at the Streams code to see that they both 
already accept the partitioner (I thought it was a config). 

Thanks!
-John

On Tue, Aug 9, 2022, at 13:44, Walker Carlson wrote:
> +1 (non binding)
>
> Walker
>
> On Tue, May 31, 2022 at 4:44 AM Sagar  wrote:
>
>> Hi All,
>>
>> I would like to start a voting thread on
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356
>> .
>>
>> I am just starting this as the discussion thread has been open for 10+
>> days. In case there are some comments, we can always discuss them over
>> there.
>>
>> Thanks!
>> Sagar.
>>


Re: [DISCUSS] Website changes required for Apache projects

2022-07-28 Thread John Roesler
gt; >> as it aligns with what we see now on the Apache® Software Foundation
>> > page.
>> > > >>
>> > > >>
>> > > >> Regarding the branding, that's not in the video file itself but
>> comes
>> > > >> from YouTube and the video's channel.
>> > > >>
>> > > >> I propose that we host the video on the Apache YouTube
>> > > >> <https://www.youtube.com/c/TheApacheFoundation/featured> channel,
>> and
>> > > >> that would take care of the branding issue.
>> > > >>
>> > > >>
>> > > >> What do you think?
>> > > >>
>> > > >>
>> > > >> On Thu, Jul 21, 2022 at 4:19 AM Divij Vaidya <
>> divijvaidy...@gmail.com
>> > >
>> > > >> wrote:
>> > > >>
>> > > >>> Thanks for chiming in with your opinions John/Mickael.
>> > > >>>
>> > > >>> The current set of videos are very helpful and removing them might
>> > be a
>> > > >>> disservice to our users. The most ideal solution would be to host
>> the
>> > > >>> videos on Apache servers without any branding. Another less than
>> > ideal
>> > > >>> solution would be to host a repository of links to educational
>> > content on
>> > > >>> our website.
>> > > >>>
>> > > >>> As for the next steps, I am going to do the following which would
>> > help us
>> > > >>> get answers on whether solution 1 or solution 2 is more feasible.
>> > Please
>> > > >>> let me know if you think we need to do something different here.
>> > > >>> 1. Reach out to ASF legal and ask what permissions/licence would we
>> > > >>> require
>> > > >>> from the video owners to host the videos ourselves.
>> > > >>> 2. Reach out to ASF community mailing list
>> > > >>> <
>> > > >>>
>> > https://www.apache.org/foundation/mailinglists.html#foundation-community
>> > > >>> >
>> > > >>> and ask how other communities are hosting educational content.
>> > > >>>
>> > > >>> There is still an open question about how we decide what content
>> gets
>> > > >>> added
>> > > >>> and what doesn't. I would propose that the model should be the same
>> > as
>> > > >>> accepting code changes i.e. it goes through a community review
>> > requiring
>> > > >>> votes committers/PMC members.
>> > > >>>
>> > > >>> Regards,
>> > > >>> Divij Vaidya
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Thu, Jul 21, 2022 at 3:57 AM John Roesler 
>> > > >>> wrote:
>> > > >>>
>> > > >>> > Hi all,
>> > > >>> >
>> > > >>> > Yes, thanks Divij for driving this!
>> > > >>> >
>> > > >>> > I tend to agree with Mickael about having vendor branding
>> > > >>> > front-and-center like that.
>> > > >>> >
>> > > >>> > On the other hand, I think the video itself is quite nice, and
>> > > >>> > it's a good thing to put in front of newcomers for a human
>> > > >>> > introduction to the project.
>> > > >>> >
>> > > >>> > I took a look at the video on those pages, and I'm not sure
>> > > >>> > if the videos themselves are branded. It looks like the branding
>> > > >>> > marks are markup that YouTube pastes on top of the video.
>> > > >>> >
>> > > >>> > Perhaps a solution is for Kafka to set up a channel of our own
>> > > >>> > and upload the videos there? Or maybe just host the videos
>> > > >>> > as static resources on our site directly? Approaches like those
>> > > >>> > are  probably good policy anyway, because then we
>> > > >>> > would control the content that shows on our site.
>> > > >>> >
>> > > >>> > Thank

Re: [DISCUSS] Website changes required for Apache projects

2022-07-21 Thread John Roesler
Thanks, Divij,

I'm glad you reached out to those two groups. I'm curious to see what
they say.

In response to your comment:
> There is still an open question about how we decide what content gets added
> and what doesn't. I would propose that the model should be the same as
> accepting code changes i.e. it goes through a community review requiring
> votes committers/PMC members.

It seems to me that your proposal is already what we have in place. All
website changes, whether adding images, text, videos, or anything else,
are sent via Github pull request, and then reviewed and merged by
committers in accordance with our project guidelines.

Git-blame shows that this particular video was added to the site as
part of the redesign in https://github.com/apache/kafka-site/pull/269 .
That PR was open for two months, during which time any community
member could have reviewed and commented if they objected to any
of the content, including the video. In fact, the author included rendered
screenshots that include the branding on the PR.

Personally, I would just treat this like any bug. I think we can acknowledge
that it was a miss on the part of the author and reviewer to allow that
branding on the page, but it should be sufficient just to fix the bug. The
fact that one problem slips through code review and ships to prod only
to be noticed two years later doesn't necessarily mean we need to create
new policies and procedures.

Just to be clear, I do agree that this branding is a problem, and I'm grateful
that Mickael noticed it, and I'm supportive of fixing it by moving the video
to an AK-owned channel or self-hosting the video. I just don't want to
over-rotate by proposing a new approval process for website changes in
response to an isolated incident.

Thanks,
-John

On Thu, Jul 21, 2022, at 03:18, Divij Vaidya wrote:
> Thanks for chiming in with your opinions John/Mickael.
>
> The current set of videos are very helpful and removing them might be a
> disservice to our users. The most ideal solution would be to host the
> videos on Apache servers without any branding. Another less than ideal
> solution would be to host a repository of links to educational content on
> our website.
>
> As for the next steps, I am going to do the following which would help us
> get answers on whether solution 1 or solution 2 is more feasible. Please
> let me know if you think we need to do something different here.
> 1. Reach out to ASF legal and ask what permissions/licence would we require
> from the video owners to host the videos ourselves.
> 2. Reach out to ASF community mailing list
> <https://www.apache.org/foundation/mailinglists.html#foundation-community>
> and ask how other communities are hosting educational content.
>
> There is still an open question about how we decide what content gets added
> and what doesn't. I would propose that the model should be the same as
> accepting code changes i.e. it goes through a community review requiring
> votes committers/PMC members.
>
> Regards,
> Divij Vaidya
>
>
>
> On Thu, Jul 21, 2022 at 3:57 AM John Roesler  wrote:
>
>> Hi all,
>>
>> Yes, thanks Divij for driving this!
>>
>> I tend to agree with Mickael about having vendor branding
>> front-and-center like that.
>>
>> On the other hand, I think the video itself is quite nice, and
>> it's a good thing to put in front of newcomers for a human
>> introduction to the project.
>>
>> I took a look at the video on those pages, and I'm not sure
>> if the videos themselves are branded. It looks like the branding
>> marks are markup that YouTube pastes on top of the video.
>>
>> Perhaps a solution is for Kafka to set up a channel of our own
>> and upload the videos there? Or maybe just host the videos
>> as static resources on our site directly? Approaches like those
>> are  probably good policy anyway, because then we
>> would control the content that shows on our site.
>>
>> Thanks,
>> John
>>
>> On Tue, Jul 19, 2022, at 11:48, Mickael Maison wrote:
>> > Hi Divij,
>> >
>> > Thanks for leading this work.
>> >
>> > To be honest I'm not sure what to do with the videos. I'm actually
>> > wondering if these videos should be on our website at all.
>> >
>> > My concerns is that they are branded. I find the content of the videos
>> > very good but I don't think we should include branded content from
>> > vendors on the Apache website, or at least not put it front and
>> > center. This is literally the first thing we show to newcomers,
>> > there's one at the top of both the Intro
>> > (https://kaf

Re: [DISCUSS] Website changes required for Apache projects

2022-07-20 Thread John Roesler
Hi all,

Yes, thanks Divij for driving this!

I tend to agree with Mickael about having vendor branding
front-and-center like that.

On the other hand, I think the video itself is quite nice, and
it's a good thing to put in front of newcomers for a human
introduction to the project.

I took a look at the video on those pages, and I'm not sure
if the videos themselves are branded. It looks like the branding
marks are markup that YouTube pastes on top of the video.

Perhaps a solution is for Kafka to set up a channel of our own
and upload the videos there? Or maybe just host the videos
as static resources on our site directly? Approaches like those
are  probably good policy anyway, because then we
would control the content that shows on our site.

Thanks,
John

On Tue, Jul 19, 2022, at 11:48, Mickael Maison wrote:
> Hi Divij,
>
> Thanks for leading this work.
>
> To be honest I'm not sure what to do with the videos. I'm actually
> wondering if these videos should be on our website at all.
>
> My concerns is that they are branded. I find the content of the videos
> very good but I don't think we should include branded content from
> vendors on the Apache website, or at least not put it front and
> center. This is literally the first thing we show to newcomers,
> there's one at the top of both the Intro
> (https://kafka.apache.org/intro) and quickstart
> (https://kafka.apache.org/quickstart) pages.
>
> If tomorrow another vendor was to open a PR adding their videos to
> other pages, would we allow that? I searched the archives and couldn't
> find a discussion about adding branded third party content to the
> website. If I missed that, please share a link, otherwise I think this
> should be discussed.
>
> Thanks,
> Mickael
>
> On Tue, Jul 19, 2022 at 4:08 PM Divij Vaidya  wrote:
>>
>> Hi community
>>
>> We have managed to fix most of the required items, thanks to Mickael
>> Maison, Luke Chen and Tom Bentley for quick reviews.
>>
>> But we still need to talk about item #5 i.e. the problem with "Embedded
>> videos don't have an image placeholder". Quoting from the ASF guidelines:
>> *Can I embed videos (from YouTube, Vimeo, etc.)?*
>>
>> *Yes, you can embed videos on the website, but they should load only after
>> the user actively wants them to load. Arrange this by showing a placeholder
>> image first and loading the video after the user clicks on the image. Make
>> it clear that users who click the image will load a video from a third
>> party.*
>>
>> *If you don’t want placeholder images, consider self-hosted videos and
>> using an open source player like Plyr .*
>>
>> We seem to have two options:
>> 1. Replace videos on the website with links to the videos OR
>> 2. Take a placeholder image and use JS to trigger playback after the user
>> clicks.
>>
>> I would suggest going with option#1 right now due to time constraints and
>> create a ticket to do (more user friendly) option#2 in the future.* What do
>> you think?*
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Wed, Jul 13, 2022 at 5:10 PM Divij Vaidya 
>> wrote:
>>
>> > Hello Apache Kafka community
>> >
>> > The ASF has a new data privacy policy to comply with the GDPR (the
>> > European Union's General Data Protection Regulation) and we - like all
>> > other ASF projects - have been asked to update our project homepage
>> > accordingly.
>> >
>> > Mickael Maison has kindly traged the initial set of requirements and
>> > listed down the required set of changes at
>> > https://issues.apache.org/jira/browse/KAFKA-13868.
>> >
>> > I would like to bring your attention to a few PRs that address the
>> > required changes and also solicit your comments on how I plan to solve
>> > others.
>> >
>> > 1. Our website is missing privacy policy -> Addressed by adding an item in
>> > the top nav bar https://github.com/apache/kafka-site/pull/421. *Action -
>> > please review the PR.*
>> > 2. It's using Google Analytics -> I would propose that we should get rid
>> > of Google Analytics in favor of Apache recommended Matomo
>> >  for website analytics.
>> > If you folks agree, I would request a Matomo site ID for Apache Kafka to
>> > make the required changes.
>> > *Action - do you agree to this change?*3. It's using Google Fonts -> I
>> > have moved the Google fonts to a self hosted version which is acceptable by
>> > Apache in the PR https://github.com/apache/kafka-site/pull/420.
>> > *Action - please review the PR. *4. It's using scripts hosted on
>> > Cloudflare CDN -> We use JS scripts such as handlebars
>> >  and prism
>> > . Both these libraries are MIT licensed and hence,
>> > could be hosted locally along with the website. I will move them along to
>> > be placed along with the website.
>> > *Action - do you agree to this change?*5. Embedded videos don't have an
>> > image placeholder -> I don't have a proposed solution f

Re: [VOTE] KIP-851: : Add requireStable flag into ListConsumerGroupOffsetsOptions

2022-06-30 Thread John Roesler
Thanks for the KIP, Guozhang!

I’m +1 (binding)

-John

On Thu, Jun 30, 2022, at 21:17, deng ziming wrote:
> Thanks for this KIP,
> we have a kafka-consumer-groups.sh shell which is based on the API you 
> proposed to change, is it worth update it as well?
>
> --
> Best,
> Ziming
>
>> On Jul 1, 2022, at 9:04 AM, Guozhang Wang  wrote:
>> 
>> Hello folks,
>> 
>> I'd like to call out for a vote for the following KIP to expose the
>> requireStable flag inside admin client's options as well:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-851%3A+Add+requireStable+flag+into+ListConsumerGroupOffsetsOptions
>> 
>> Any feedback as well as your votes are welcome.
>> 
>> -- Guozhang


[jira] [Created] (KAFKA-14020) Performance regression in Producer

2022-06-24 Thread John Roesler (Jira)
John Roesler created KAFKA-14020:


 Summary: Performance regression in Producer
 Key: KAFKA-14020
 URL: https://issues.apache.org/jira/browse/KAFKA-14020
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 3.3.0
Reporter: John Roesler


[https://github.com/apache/kafka/commit/f7db6031b84a136ad0e257df722b20faa7c37b8a]
 introduced a 10% performance regression in the KafkaProducer under a default 
config.

 

The context for this result is a benchmark that we run for Kafka Streams. The 
benchmark provisions 5 independent AWS clusters, including one broker node on 
an i3.large and one client node on an i3.large. During a benchmark run, we 
first run the Producer for 10 minutes to generate test data, and then we run 
Kafka Streams under a number of configurations to measure its performance.

Our observation was a 10% regression in throughput under the simplest 
configuration, in which Streams simply consumes from a topic and does nothing 
else. That benchmark actually runs faster than the producer that generates the 
test data, so its thoughput is bounded by the data generator's throughput. 
After investigation, we realized that the regression was in the data generator, 
not the consumer or Streams.

We have numerous benchmark runs leading up to the commit in question, and they 
all show a throughput in the neighborhood of 115,000 records per second. We 
also have 40 runs including and after that commit, and they all show a 
throughput in the neighborhood of 105,000 records per second. A test on [trunk 
with the commit reverted |https://github.com/apache/kafka/pull/12342] shows a 
return to around 115,000 records per second.

Config:
{code:java}
final Properties properties = new Properties();
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, broker);
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
StringSerializer.class);
{code}
Here's the producer code in the data generator. Our tests were running with 
three produceThreads.
{code:java}
 for (int t = 0; t < produceThreads; t++) {
futures.add(executorService.submit(() -> {
int threadTotal = 0;
long lastPrint = start;
final long printInterval = Duration.ofSeconds(10).toMillis();
long now;
try (final org.apache.kafka.clients.producer.Producer 
producer = new KafkaProducer<>(producerConfig(broker))) {
while (limit > (now = System.currentTimeMillis()) - start) {
for (int i = 0; i < 1000; i++) {
final String key = keys.next();
final String data = dataGen.generate();

producer.send(new ProducerRecord<>(topic, key, 
valueBuilder.apply(key, data)));

threadTotal++;
}

if ((now - lastPrint) > printInterval) {
System.out.println(Thread.currentThread().getName() + " 
produced " + numberFormat.format(threadTotal) + " to " + topic + " in " + 
Duration.ofMillis(now - start));
lastPrint = now;
}
}
}
total.addAndGet(threadTotal);
System.out.println(Thread.currentThread().getName() + " finished (" + 
numberFormat.format(threadTotal) + ") in " + Duration.ofMillis(now - start));
}));
}{code}
As you can see, this is a very basic usage.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [VOTE] KIP-840: Config file option for MessageReader/MessageFormatter in ConsoleProducer/ConsoleConsumer

2022-06-08 Thread John Roesler
Thanks for the KIP, Alexandre!

I’m +1 (binding)

-John

On Wed, Jun 8, 2022, at 20:54, deng ziming wrote:
> Thank you for this KIP,
>
> +1 (non-binding)
>
> -- 
> Best,
> Ziming
>
>> On Jun 7, 2022, at 8:53 PM, Alexandre Garnier  wrote:
>> 
>> Hi!
>> 
>> A little reminder to vote for this KIP.
>> 
>> Thanks.
>> 
>> 
>> Le mer. 1 juin 2022 à 10:58, Alexandre Garnier  a écrit :
>>> 
>>> Hi everyone!
>>> 
>>> I propose to start voting for KIP-840:
>>> https://cwiki.apache.org/confluence/x/bBqhD
>>> 
>>> Thanks,
>>> --
>>> Alex


Re: [VOTE] KIP-827: Expose logdirs total and usable space via Kafka API

2022-05-31 Thread John Roesler
Thanks for the KIP Mickael,

I'm +1 (binding)

-John

On Tue, May 31, 2022, at 18:48, Jun Rao wrote:
> Hi, Mickael,
>
> Thanks for the KIP. +1
>
> Jun
>
> On Wed, May 25, 2022 at 7:54 AM Tom Bentley  wrote:
>
>> Hi Mickael,
>>
>> Thanks for the KIP! +1 (binding).
>>
>> Kind regards,
>>
>> Tom
>>
>> On Thu, 19 May 2022 at 11:28, Federico Valeri 
>> wrote:
>>
>> > Thanks Mickael.
>> >
>> > +1 (non binding)
>> >
>> > On Wed, May 18, 2022 at 11:08 AM Divij Vaidya 
>> > wrote:
>> > >
>> > > +1 non binding.
>> > >
>> > > Divij Vaidya
>> > >
>> > >
>> > >
>> > > On Tue, May 17, 2022 at 6:16 PM Igor Soarez  wrote:
>> > >
>> > > > Thanks for this KIP Mickael.
>> > > >
>> > > > +1 non binding
>> > > >
>> > > > --
>> > > > Igor
>> > > >
>> > > > On Tue, May 17, 2022, at 2:48 PM, Luke Chen wrote:
>> > > > > Hi Mickael,
>> > > > >
>> > > > > +1 (binding) from me.
>> > > > > Thanks for the KIP!
>> > > > >
>> > > > > Luke
>> > > > >
>> > > > > On Tue, May 17, 2022 at 9:30 PM Mickael Maison <
>> > mickael.mai...@gmail.com
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > >> Hi,
>> > > > >>
>> > > > >> I'd like to start a vote on KIP-827. It proposes exposing the
>> total
>> > > > >> and usable space of logdirs
>> > > > >> via the DescribeLogDirs API:
>> > > > >>
>> > > > >>
>> > > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Mickael
>> > > > >>
>> > > >
>> >
>> >
>>


Re: [VOTE] KIP-843: Adding metricOrElseCreate method to Metrics

2022-05-31 Thread John Roesler
Generally, I agree with Ismael that having a new, weird name will make it hard 
to keep them straight. Then again, we need to make them different to prevent 
confusion about their semantics. To be clear, I'll be a +1 regardless of how we 
break this dilemma.

One suggestion: We currently have addMetric to add a new metric. We can take 
some inspiration from the Java Map interface and call this new method 
`addMetricIfAbsent`. Having the same prefix should help discovery, and 
following the Map convention should help confusion.

Thanks all,
-John



On Tue, May 31, 2022, at 12:13, Sagar wrote:
> Oh yeah there's another metric function which is get-only. I think we
> should go ahead with getOrCreateMetric.
>
> Thanks!
> Sagar.
>
> On Tue, May 31, 2022 at 10:02 PM Guozhang Wang  wrote:
>
>> I'd prefer the getOrCreateMetric function name, since for the existing "
>> sensor(String name)" function that only takes a single `String` parameter,
>> its semantics is already "get or create". Whereas the existing
>> "metric(MetricName)" function's semantics is "get" only. So in my mind, the
>> inconsistent conventions in function signatures already exist today. And
>> with the other option we would need to educate users that "all the `sensor`
>> functions are get-or-create, but, please remember that the `metric`
>> function with just the metric name is get-only, while other `metric`
>> overrides with more parameters are get-or-create", which I think is even
>> more confusing.
>>
>>
>> Guozhang
>>
>>
>> On Mon, May 30, 2022 at 9:51 PM Sagar  wrote:
>>
>> > Hi Ismael,
>> >
>> > I guess in that case, we will have to go with the name *metric*- similar
>> to
>> > *sensor* - which David pointed out above because I think that's the
>> closest
>> > method which either gets or creates a new sensor. Current addMetric in
>> the
>> > Metrics class throw an IllegalArguementException when the metric already
>> > exists and that's why I still think getOrCreateMetric still signifies the
>> > action correctly. Or how about addOrGetMetric or getOrAddMetric, just
>> > replacing create with add to keep it similar to the already present
>> > addMetric method.
>> >
>> > Thanks!
>> > Sagar.
>> >
>> > On Tue, May 31, 2022 at 1:19 AM Ismael Juma  wrote:
>> >
>> > > I think it's confusing to use two completely different naming
>> conventions
>> > > in the same class. We either stick with the existing convention or we
>> > > create a new one and deprecate old method(s). I am not sure there is
>> > enough
>> > > value in this case for the latter, but it would be good to hear what
>> > others
>> > > think.
>> > >
>> > > Ismael
>> > >
>> > > On Mon, May 30, 2022, 2:08 AM Bruno Cadonna 
>> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I would also lean towards getOrCreateMetric() for the reasons pointed
>> > > > out by Sagar. But I am fine either way.
>> > > >
>> > > > Best,
>> > > > Bruno
>> > > >
>> > > > On 30.05.22 10:54, Sagar wrote:
>> > > > > Hi Bruno/David,
>> > > > >
>> > > > > Thanks for the suggestions. I would personally lean towards using
>> > > > > getOrCreateMetric as it clearly explains the intent. Having said
>> > that,
>> > > if
>> > > > > we want to use just metric(similar to sensor), that should also be
>> > ok.
>> > > > Just
>> > > > > that I feel getOrCreateMetric is easily understandable.
>> > > > >
>> > > > > Thanks!
>> > > > > Sagar.
>> > > > >
>> > > > > On Mon, May 30, 2022 at 2:16 PM David Jacot
>> > > > > > > >
>> > > > > wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> Looking at the current Metrics' API, we have `sensor` which gets
>> or
>> > > > creates
>> > > > >> a sensor. How about using `metric` to follow the same naming
>> > > convention?
>> > > > >>
>> > > > >> Best,
>> > > > >> David
>> > > > >>
>> > > > >> On Mon, May 30, 2022 at 9:18 AM Bruno Cadonna > >
>> > > > wrote:
>> > > > >>>
>> > > > >>> Hi Sagar,
>> > > > >>> Hi Ismael,
>> > > > >>>
>> > > > >>> what about getOrCreateMetric()?
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Bruno
>> > > > >>>
>> > > > >>>
>> > > > >>> On 28.05.22 18:56, Sagar wrote:
>> > > >  Hi Ismael,
>> > > > 
>> > > >  Actually Bruno suggested renaming it to getMetricOrElseCreate
>> and
>> > we
>> > > >  decided to go ahead with that one. These were the only names
>> that
>> > we
>> > > >  considered for the KIP.
>> > > > 
>> > > >  Thanks!
>> > > >  Sagar.
>> > > > 
>> > > > 
>> > > >  On Sat, May 28, 2022 at 8:19 PM Ismael Juma 
>> > > > wrote:
>> > > > 
>> > > > > Thanks for the KIP. The method makes sense, but the name is a
>> bit
>> > > > >> verbose.
>> > > > > Have we considered a more concise name?
>> > > > >
>> > > > > Ismael
>> > > > >
>> > > > > On Tue, May 24, 2022, 4:49 AM Sagar > >
>> > > > >> wrote:
>> > > > >
>> > > > >> Hi All,
>> > > > >>
>> > > > >> I would like to open a voting thread for the following KIP:
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>

Re: [VOTE] KIP-846: Processor-level Streams metrics for records/bytes Producedd

2022-05-31 Thread John Roesler
+1 (binding)

Thanks,
John

On Mon, May 30, 2022, at 13:00, Bill Bejeck wrote:
> +1 (binding)
>
> -Bill
>
> On Mon, May 30, 2022 at 4:49 AM Sagar  wrote:
>
>> +1 (non-binding).
>>
>> Thanks!
>> Sagar.
>>
>> On Mon, May 30, 2022 at 1:11 PM Bruno Cadonna  wrote:
>>
>> > +1 (binding)
>> >
>> > Thanks!
>> > Bruno
>> >
>> > On 30.05.22 09:36, Sophie Blee-Goldman wrote:
>> > > Hey all,
>> > >
>> > >   I'd like to kick off the voting thread for the KIP I proposed to add
>> > > processor-level "bytes/records produced" metrics to Kafka Streams.
>> > >
>> > > Thanks!
>> > >
>> > > KIP-846: Task-level Streams metrics for bytes/records Produced
>> > > <
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211886093
>> > >
>> > >
>> > > Cheers,
>> > > Sophie
>> > >
>> >
>>


Re: [DISCUSS] KIP-846: Task-level Streams metrics for bytes/records Produced

2022-05-28 Thread John Roesler
Thanks for the well motivated and documented KIP, Sophie! I’m in favor of this 
change. 

-John

On Sat, May 28, 2022, at 06:42, Sophie Blee-Goldman wrote:
> Hey all,
>
> I'd like to propose a very small KIP to add two metrics that will help fill
> a gap in the derivable produced and consumed metrics. Please take a look
> and reply here with any questions or concerns.
>
> KIP-846: Task-level Streams metrics for bytes/records Produced
> 
>
> Given the small nature of this I'm going to call for a vote soon, but
> please don't hesitate to raise anything you feel should be discussed in
> more detail first.
>
> Thanks!
> Sophie


Re: [DISCUSS] KIP-844: Transactional State Stores

2022-05-24 Thread John Roesler
Thanks for the KIP, Alex!

I'm really happy to see your proposal. This improvement fills a long-standing 
gap.

I have a few questions:

1. Configuration
The KIP only mentions RocksDB, but of course, Streams also ships with an 
InMemory store, and users also plug in their own custom state stores. It is 
also common to use multiple types of state stores in the same application for 
different purposes.

Against this backdrop, the choice to configure transactionality as a top-level 
config, as well as to configure the store transaction mechanism as a top-level 
config, seems a bit off.

Did you consider instead just adding the option to the RocksDB*StoreSupplier 
classes and the factories in Stores ? It seems like the desire to enable the 
feature by default, but with a feature-flag to disable it was a factor here. 
However, as you pointed out, there are some major considerations that users 
should be aware of, so opt-in doesn't seem like a bad choice, either. You could 
add an Enum argument to those factories like 
`RocksDBTransactionalMechanism.{NONE,

Some points in favor of this approach:
* Avoid "stores that don't support transactions ignore the config" complexity
* Users can choose how to spend their memory budget, making some stores 
transactional and others not
* When we add transactional support to in-memory stores, we don't have to 
figure out what to do with the mechanism config (i.e., what do you set the 
mechanism to when there are multiple kinds of transactional stores in the 
topology?)

2. caching/flushing/transactions
The coupling between memory usage and flushing that you mentioned is a bit 
troubling. It also occurs to me that there seems to be some relationship with 
the existing record cache, which is also an in-memory holding area for records 
that are not yet written to the cache and/or store (albeit with no particular 
semantics). Have you considered how all these components should relate? For 
example, should a "full" WriteBatch actually trigger a flush so that we don't 
get OOMEs? If the proposed transactional mechanism forces all uncommitted 
writes to be buffered in memory, until a commit, then what is the advantage 
over just doing the same thing with the RecordCache and not introducing the 
WriteBatch at all?

3. ALOS
You mentioned that a transactional store can help reduce duplication in the 
case of ALOS. We might want to be careful about claims like that. Duplication 
isn't the way that repeated processing manifests in state stores. Rather, it is 
in the form of dirty reads during reprocessing. This feature may reduce the 
incidence of dirty reads during reprocessing, but not in a predictable way. 
During regular processing today, we will send some records through to the 
changelog in between commit intervals. Under ALOS, if any of those dirty writes 
gets committed to the changelog topic, then upon failure, we have to roll the 
store forward to them anyway, regardless of this new transactional mechanism. 
That's a fixable problem, by the way, but this KIP doesn't seem to fix it. I 
wonder if we should make any claims about the relationship of this feature to 
ALOS if the real-world behavior is so complex.

4. IQ
As a reminder, we have a new IQv2 mechanism now. Should we propose any changes 
to IQv1 to support this transactional mechanism, versus just proposing it for 
IQv2? Certainly, it seems strange only to propose a change for IQv1 and not v2.

Regarding your proposal for IQv1, I'm unsure what the behavior should be for 
readCommitted, since the current behavior also reads out of the RecordCache. I 
guess if readCommitted==false, then we will continue to read from the cache 
first, then the Batch, then the store; and if readCommitted==true, we would 
skip the cache and the Batch and only read from the persistent RocksDB store?

What should IQ do if I request to readCommitted on a non-transactional store?

Thanks again for proposing the KIP, and my apologies for the long reply; I'm 
hoping to air all my concerns in one "batch" to save time for you.

Thanks,
-John

On Tue, May 24, 2022, at 03:45, Alexander Sorokoumov wrote:
> Hi all,
>
> I've written a KIP for making Kafka Streams state stores transactional and
> would like to start a discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
>
> Best,
> Alex


Re: [VOTE] KIP-843: Adding metricOrElseCreate method to Metrics

2022-05-24 Thread John Roesler
Hi again Sagar,

My apologies; I was thinking of the `sensor` method:
org.apache.kafka.common.metrics.Metrics#sensor(java.lang.String, 
org.apache.kafka.common.metrics.MetricConfig, long, 
org.apache.kafka.common.metrics.Sensor.RecordingLevel, 
org.apache.kafka.common.metrics.Sensor...)

I'm in favor of your KIP. Also, sorry for responding to the VOTE thread instead 
of DISCUSS.

I'm +1 (binding)
-John

On Tue, May 24, 2022, at 09:10, John Roesler wrote:
> Hi Sagar,
>
> Thanks for the KIP!
>
> I’m not at my computer right now, but I think I confronted a similar 
> problem a while back for the Streams metrics. I think that I already 
> made the “addMetric” method to be idempotent, so if it’s already 
> registered, the call just returns the old one instead of creating a new 
> one. That way, you no longer have to check up front if the metric is 
> registered. I think that is also motivation for this KIP, right?
>
> Thanks,
> John
>
> On Tue, May 24, 2022, at 06:48, Sagar wrote:
>> Hi All,
>>
>> I would like to open a voting thread for the following KIP:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-843%3A+Adding+metricOrElseCreate+method+to+Metrics
>>
>> Thanks!
>> Sagar.


Re: [VOTE] KIP-843: Adding metricOrElseCreate method to Metrics

2022-05-24 Thread John Roesler
Hi Sagar,

Thanks for the KIP!

I’m not at my computer right now, but I think I confronted a similar problem a 
while back for the Streams metrics. I think that I already made the “addMetric” 
method to be idempotent, so if it’s already registered, the call just returns 
the old one instead of creating a new one. That way, you no longer have to 
check up front if the metric is registered. I think that is also motivation for 
this KIP, right?

Thanks,
John

On Tue, May 24, 2022, at 06:48, Sagar wrote:
> Hi All,
>
> I would like to open a voting thread for the following KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-843%3A+Adding+metricOrElseCreate+method+to+Metrics
>
> Thanks!
> Sagar.


Re: [VOTE] KIP-833: Mark KRaft as Production Ready

2022-05-24 Thread John Roesler
+1 (binding) from me. 

Thanks, Colin!
-John


On Tue, May 24, 2022, at 01:58, David Jacot wrote:
> +1. Thanks Colin!
>
> On Tue, May 24, 2022 at 4:50 AM Luke Chen  wrote:
>>
>> +1 from me.
>> Kraft is coming!!!
>>
>> Luke
>>
>> On Tue, May 24, 2022 at 7:26 AM Israel Ekpo  wrote:
>>
>> > +1 (non-binding) from me.
>> >
>> > I am very happy to finally see this.
>> >
>> > On Mon, May 23, 2022 at 6:08 PM Jason Gustafson > > >
>> > wrote:
>> >
>> > > Thanks Colin. +1 from me. Very exciting!
>> > >
>> > > On Tue, May 17, 2022 at 10:53 AM Colin McCabe 
>> > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'd like to start the vote on KIP-833: Mark KRaft as Production Ready.
>> > > > https://cwiki.apache.org/confluence/x/8xKhD
>> > > >
>> > > > thanks,
>> > > > Colin
>> > > >
>> > >
>> > --
>> > Israel Ekpo
>> > Lead Instructor, IzzyAcademy.com
>> > https://www.youtube.com/c/izzyacademy
>> > https://izzyacademy.com/
>> >


Re: [HEADS-UP] Modification to KIP Template

2022-05-11 Thread John Roesler
+1 from me also. This is a great idea. 

Thanks,
John

On Wed, May 11, 2022, at 02:49, Luke Chen wrote:
> Hi Mickael,
>
> +1 to add "test plan" section to KIP template.
> Thanks for the improvement!
>
> Luke
>
> On Tue, May 10, 2022 at 11:21 PM Mickael Maison 
> wrote:
>
>> Hi,
>>
>> I did not see any comments nor concerns so I went ahead and added the
>> Test Plan section to the KIP template.
>>
>> Thanks,
>> Mickael
>>
>> On Fri, Apr 1, 2022 at 5:53 PM Mickael Maison 
>> wrote:
>> >
>> > Hi,
>> >
>> > Unearthing this old thread as today I stumbled on the issue that
>> > Ismael reported. It looks like this was never fixed!
>> >
>> > The "Test Plan" section was only added in the KIP-template page [0]
>> > and not in the actual KIP-template template [1] that is used when
>> > doing `Create -> KIP-Template` or by clicking on `Create KIP` on
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >
>> > I think this new section makes sense and it's very easy to add it to
>> > the actual template. Before doing it, I just want to ping the dev list
>> > to see if anybody has suggestions or concerns since this was discussed
>> > many years ago now.
>> >
>> > 0: https://cwiki.apache.org/confluence/display/KAFKA/KIP-Template
>> > 1:
>> https://cwiki.apache.org/confluence/pages/templates2/viewpagetemplate.action?entityId=54329345&key=KAFKA
>> >
>> > Thanks,
>> > Mickael
>> >
>> > On Fri, May 27, 2016 at 10:55 AM Ismael Juma  wrote:
>> > >
>> > > Hi Gwen,
>> > >
>> > > Thanks for adding the "Test Plans" section. I think it may be worth
>> adding
>> > > a note about performance testing plans too (whenever relevant). By the
>> way,
>> > > even though the following page has the new section, if I use `Create ->
>> > > KIP-Template`, the new section doesn't appear. Do you know why?
>> > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-Template
>> > >
>> > > Ismael
>> > >
>> > > On Fri, May 27, 2016 at 3:24 AM, Gwen Shapira 
>> wrote:
>> > >
>> > > > Hi Kafka Developers,
>> > > >
>> > > > Just a quick heads-up that I added a new section to the KIP
>> template: "Test
>> > > > Plans".
>> > > > I think its a good habit to think about how a feature will be tested
>> while
>> > > > planning it. I'm talking about high-level notes on system tests, not
>> gritty
>> > > > details.
>> > > >
>> > > > This will apply to new KIPs, not ones in discussion/implementation
>> phases
>> > > > (although if your KIP is under discussion and you want to add test
>> plans,
>> > > > it will be very nice of you).
>> > > >
>> > > > I figured we all agree that thinking a bit about tests is a good
>> idea, so I
>> > > > added it first and started a discussion later. If you strongly
>> object,
>> > > > please respond with strong objections. Wikis are easy to edit :)
>> > > >
>> > > > Gwen
>> > > >
>>


Re: [DISCUSS] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-11 Thread John Roesler
have the chance to catch up to the (paused) active task state
>> > >> before
>> > >>>> they stop
>> > >>>> as well, in which case having them continue feels fine to me.
>> However
>> > >>> this
>> > >>>> is a
>> > >>>> relatively trivial benefit and I would only consider it as a
>> deciding
>> > >>>> factor when all
>> > >>>> things are equal otherwise.
>> > >>>>
>> > >>>> My concern is the more interesting case: when this feature is used
>> to
>> > >>> pause
>> > >>>> only
>> > >>>> one nodes, or some subset of the overall application. In this case,
>> > >> yes,
>> > >>>> the standby
>> > >>>> tasks will indeed fall out of sync. But the only reason I can
>> imagine
>> > >>>> someone using
>> > >>>> the pause feature in such a way is because there is something going
>> > >>> wrong,
>> > >>>> or about
>> > >>>> to go wrong, on that particular node. For example as mentioned
>> above,
>> > >> if
>> > >>>> the user
>> > >>>> wants to cut down on costs without stopping everything, or if the
>> node
>> > >> is
>> > >>>> about to
>> > >>>> run out of disk or needs to be debugged or so on. And in this case,
>> > >>>> continuing to
>> > >>>> process the standby tasks while other instances continue to run
>> would
>> > >>>> pretty much
>> > >>>> defeat the purpose of pausing it entirely, and might have unpleasant
>> > >>>> consequences
>> > >>>> for the unsuspecting developer.
>> > >>>>
>> > >>>> All that said, I don't want to block this KIP so if you have strong
>> > >>>> feelings about the
>> > >>>> standby behavior I'm happy to back down. I'm only pushing back now
>> > >>> because
>> > >>>> it
>> > >>>> felt like there wasn't any particular motivation for the standbys to
>> > >>>> continue processing
>> > >>>> or not, and I figured I'd try to fill in this gap with my thoughts
>> on
>> > >> the
>> > >>>> matter :)
>> > >>>> Either way we should just make sure that this behavior is documented
>> > >>>> clearly,
>> > >>>> since it may be surprising if we decide to only pause active
>> > processing
>> > >>>> (another option
>> > >>>> is to rename the method something like #pauseProcessing or
>> > >>>> #pauseActiveProcessing
>> > >>>> so that it's hard to miss).
>> > >>>>
>> > >>>> Thanks! Sorry for the lengthy response, but hopefully we won't need
>> to
>> > >>>> debate this any
>> > >>>> further. Beyond this I'm satisfied with the latest proposal
>> > >>>>
>> > >>>> On Mon, May 9, 2022 at 5:16 PM John Roesler 
>> > >> wrote:
>> > >>>>
>> > >>>>> Thanks for the updates, Jim!
>> > >>>>>
>> > >>>>> After this discussion and your updates, this KIP looks good to me.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>> John
>> > >>>>>
>> > >>>>> On Mon, May 9, 2022, at 17:52, Jim Hughes wrote:
>> > >>>>>> Hi Sophie, all,
>> > >>>>>>
>> > >>>>>> I've updated the KIP with feedback from the discussion so far:
>> > >>>>>>
>> > >>>>>
>> > >>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
>> > >>>>>>
>> > >>>>>> As a terse summary of my current position:
>> > >>>>>> Pausing will only stop processing and punctuation (respecting
>> > modular
>> > >>>>>> topologies).
>> > >>>>>> Paused topologies will still a) consume from inpu

Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-05-10 Thread John Roesler
+1 from me! Thanks for volunteering, José. 
-John

On Tue, May 10, 2022, at 17:53, José Armando García Sancio wrote:
> Hi all,
>
> I would like to volunteer for the release of Apache Kafka 3.3.0. If
> people agree, I'll start working on the release plan and update this
> thread.
>
> Thanks,
> -José


Re: [VOTE] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-10 Thread John Roesler
Thanks Jim,

I’m +1 (binding)

-John

On Tue, May 10, 2022, at 14:05, Jim Hughes wrote:
> Hi all,
>
> I'm asking for a vote on KIP-834:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
>
> Thanks in advance!
>
> Jim


Re: [VOTE] KIP-832 Allow creating a producer/consumer using a producer/consumer

2022-05-10 Thread John Roesler
+1 (binding)

Thanks, François!
-John

On Tue, May 10, 2022, at 03:09, Bruno Cadonna wrote:
> Hi Francois,
>
> Thanks for the KIP whose link is:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882578
>
> +1 (binding)
>
> Best,
> Bruno
>
> On 09.05.22 23:23, François Rosière wrote:
>> 
>>


Re: [DISCUSS] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-09 Thread John Roesler
ht up.  If standby tasks would run out of
> space, there are probably bigger problems.
>
> If later it is desirable to manage punctuation or standby tasks, then it
> should be easy for future folks to modify things.
>
> Overall, I'd frame this KIP as "pause processing resulting in outputs".
>
> Cheers,
>
> Jim
>
>
>
>> On Mon, May 9, 2022 at 10:33 AM Guozhang Wang  wrote:
>>
>> > I think for named topology we can leave the scope of this KIP as "all or
>> > nothing", i.e. when you pause an instance you pause all of its
>> topologies.
>> > I raised this question in my previous email just trying to clarify if
>> this
>> > is what you have in mind. We can leave the question of finer controlled
>> > pausing behavior for later when we have named topology being exposed via
>> > another KIP.
>> >
>> >
>> > Guozhang
>> >
>> > On Mon, May 9, 2022 at 7:50 AM John Roesler  wrote:
>> >
>> > > Hi Jim,
>> > >
>> > > Thanks for the replies. This all sounds good to me. Just two further
>> > > comments:
>> > >
>> > > 3. It seems like you should aim for the simplest semantics. If the
>> intent
>> > > is to “pause” the instance, then you’d better pause the whole instance.
>> > If
>> > > you leave punctuations and standbys running, I expect we’d see bug
>> > reports
>> > > come in that the instance isn’t really paused.
>> > >
>> > > 5. Since you won the race to write a KIP, I don’t think it makes too
>> much
>> > > sense to worry too much about modular topologies. When they propose
>> their
>> > > KIP, they will have to specify a lot of state management behavior, and
>> > > pause/resume will have to be part of it. If they have some concern
>> about
>> > > your KIP, they’ll chime in. It doesn’t make sense for you to try and
>> > guess
>> > > what that proposal will look like.
>> > >
>> > > To be honest, you’re proposing a KafkaStreams runtime-level
>> pause/resume
>> > > function, not a topology-level one anyway, so it seems pretty clear
>> that
>> > it
>> > > would pause the whole runtime (of a single instance) regardless of any
>> > > modular topologies. If the intent is to pause individual topologies in
>> > the
>> > > future, you’d need a different API anyway.
>> > >
>> > > Thanks!
>> > > -John
>> > >
>> > > On Mon, May 9, 2022, at 08:10, Jim Hughes wrote:
>> > > > Hi John,
>> > > >
>> > > > Long emails are great; responding inline!
>> > > >
>> > > > On Sat, May 7, 2022 at 4:54 PM John Roesler 
>> > wrote:
>> > > >
>> > > >> Thanks for the KIP, Jim!
>> > > >>
>> > > >> This conversation seems to highlight that the KIP needs to specify
>> > > >> some of its behavior as well as its APIs, where the behavior is
>> > > >> observable and significant to users.
>> > > >>
>> > > >> For example:
>> > > >>
>> > > >> 1. Do you plan to have a guarantee that immediately after
>> > > >> calling KafkaStreams.pause(), users should observe that the instance
>> > > >> stops processing new records? Or should they expect that the threads
>> > > >> will continue to process some records and pause asynchronously
>> > > >> (you already answered this in the thread earlier)?
>> > > >>
>> > > >
>> > > > I'm happy to build up to a guarantee of sorts.  My current idea is
>> that
>> > > > pause() does not do anything "exceptional" to get control back from a
>> > > > running topology.  A currently running topology would get to complete
>> > its
>> > > > loop.
>> > > >
>> > > > Separately, I'm still piecing together how commits work.  By some
>> > > > mechanism, after a pause, I do agree that the topology needs to
>> commit
>> > > its
>> > > > work in some manner.
>> > > >
>> > > >
>> > > >> 2. Will the threads continue to poll new records until they
>> naturally
>> > > fill
>> > > >> up the task buffers, or will they immediately pause their Consumers
>

Re: [DISCUSS] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-09 Thread John Roesler
Hi Jim,

Thanks for the replies. This all sounds good to me. Just two further comments:

3. It seems like you should aim for the simplest semantics. If the intent is to 
“pause” the instance, then you’d better pause the whole instance. If you leave 
punctuations and standbys running, I expect we’d see bug reports come in that 
the instance isn’t really paused.

5. Since you won the race to write a KIP, I don’t think it makes too much sense 
to worry too much about modular topologies. When they propose their KIP, they 
will have to specify a lot of state management behavior, and pause/resume will 
have to be part of it. If they have some concern about your KIP, they’ll chime 
in. It doesn’t make sense for you to try and guess what that proposal will look 
like.

To be honest, you’re proposing a KafkaStreams runtime-level pause/resume 
function, not a topology-level one anyway, so it seems pretty clear that it 
would pause the whole runtime (of a single instance) regardless of any modular 
topologies. If the intent is to pause individual topologies in the future, 
you’d need a different API anyway. 

Thanks!
-John

On Mon, May 9, 2022, at 08:10, Jim Hughes wrote:
> Hi John,
>
> Long emails are great; responding inline!
>
> On Sat, May 7, 2022 at 4:54 PM John Roesler  wrote:
>
>> Thanks for the KIP, Jim!
>>
>> This conversation seems to highlight that the KIP needs to specify
>> some of its behavior as well as its APIs, where the behavior is
>> observable and significant to users.
>>
>> For example:
>>
>> 1. Do you plan to have a guarantee that immediately after
>> calling KafkaStreams.pause(), users should observe that the instance
>> stops processing new records? Or should they expect that the threads
>> will continue to process some records and pause asynchronously
>> (you already answered this in the thread earlier)?
>>
>
> I'm happy to build up to a guarantee of sorts.  My current idea is that
> pause() does not do anything "exceptional" to get control back from a
> running topology.  A currently running topology would get to complete its
> loop.
>
> Separately, I'm still piecing together how commits work.  By some
> mechanism, after a pause, I do agree that the topology needs to commit its
> work in some manner.
>
>
>> 2. Will the threads continue to poll new records until they naturally fill
>> up the task buffers, or will they immediately pause their Consumers
>> as well?
>>
>
> Presently, I'm suggesting that consumers would fill up their buffers.
>
>
>> 3. Will threads continue to call (system time) punctuators, or would
>> punctuations also be paused?
>>
>
> In my first pass at thinking through this, I left the punctuators running.
> To be honest, I'm not sure what they do, so my approach is either lucky and
> correct or it could be Very Clearly Wrong.;)
>
>
>> I realize that some of those questions simply may not have occurred to
>> you, so this is not a criticism for leaving them off; I'm just pointing out
>> that although we don't tend to mention implementation details in KIPs,
>> we also can't be too high level, since there are a lot of operational
>> details that users rely on to achieve various behaviors in Streams.
>>
>
> Ayup, I will add some details as we iron out the guarantees, implementation
> details that are at the API level.  This one is tough since internal
> features like NamedTopologies are part of the discussion.
>
>
>
>> A couple more comments:
>>
>> 4. +1 to what Guozhang said. It seems like we should we also do a commit
>> before entering the paused state. That way, any open transactions would
>> be closed and not have to worry about timing out. Even under ALOS, it
>> seems best to go ahead and complete the processing of in-flight records
>> by committing. That way, if anything happens to die while it's paused,
>> existing
>> work won't have to be repeated. Plus, if there are any processors with side
>> effects, users won't have to tolerate weird edge cases where a pause occurs
>> after a processor sees a record, but before the result is sent to its
>> outputs.
>>
>> 5. I noticed that you proposed not to add a PAUSED state, but I didn't
>> follow
>> the rationale. Adding a state seems beneficial for a number of reasons:
>> StreamThreads already use the thread state to determine whether to process
>> or not, so avoiding a new State would just mean adding a separate flag to
>> track
>> and then checking your new flag in addition to the State in the thread.
>> Also,
>> operating Streams applications is a non-trivial task

Re: [DISCUSS] KIP-832 Allow creating a producer/consumer using a producer/consumer config

2022-05-07 Thread John Roesler
Thanks, François!

Those changes look good to me.

Thanks,
-John

On Fri, May 6, 2022, at 13:51, François Rosière wrote:
> The KIP has been updated to reflect the last discussion
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882578#KIP832:Allowcreatingaproducer/consumerusingaproducer/consumerconfig-ProposedChanges
>
>
> Le ven. 6 mai 2022 à 20:44, François Rosière  a
> écrit :
>
>> Hello,
>>
>> No problem to also add a constructor taking the StreamsConfig in the
>> TopologyTestDriver.
>>
>> Summary about the changes to apply:
>>
>>- Create 2 new constructors in KafkaProducer
>>- Create a new constructor in KafkaConsumer and increase de visibility
>>of an existing one
>>- Create a new constructor in TopologyTestDriver
>>
>> Kr,
>>
>> F.
>>
>> Le ven. 6 mai 2022 à 16:57, John Roesler  a écrit :
>>
>>> Thanks for the KIP, François!
>>>
>>> I'm generally in favor of your KIP, since you're
>>> proposing to follow the existing pattern of the
>>> constructors for both Producer and Consumer,
>>> but with the config object instead of Properties
>>> or Map configs. Also, because we already have
>>> this pattern in Streams, and we are just
>>> extending it to Producer and Consumer.
>>>
>>> Following on the KIP-378 discussion, I do still think
>>> this is somewhat of an abuse of the Config objects,
>>> and it would be better to have a formal dependency
>>> injection interface, but I also don't want to let perfect
>>> be the enemy of good. Since it looks like this approach
>>> works, and there is also some precedent for it already,
>>> I'd be inclined to approve it.
>>>
>>> Since KIP-378 didn't make it over the finish line, and it
>>> seems like a small expansion to your proposal, do you
>>> mind also adding the StreamsConfig to the
>>> TopologyTestDriver constructors? That way, we can go
>>> ahead and resolve both KIPs at once.
>>>
>>> Thank you,
>>> -John
>>>
>>>
>>> On Fri, May 6, 2022, at 06:06, François Rosière wrote:
>>> > To stay consistent with existing code, we should simply add 2
>>> constructors.
>>> > One with ser/deser and one without.
>>> > So that, users have the choice to use one or the other.
>>> > I updated the KIP accordingly.
>>> >
>>> > Le ven. 6 mai 2022 à 12:55, François Rosière <
>>> francois.rosi...@gmail.com> a
>>> > écrit :
>>> >
>>> >> On the other hand, the KafkaConsumer constructor with a config +
>>> >> serializer and deserializer already exists but is not public.
>>> >> It would also complexify a bit the caller to not have the
>>> >> serializer/deserializer exposed at constructor level.
>>> >>
>>> >> Once the KIP would have been implemented, for streams, instead of
>>> having a
>>> >> custom config (already possible), I may simply define a custom
>>> >> KafkaClientSupplier reusing the custom configs of both the producer
>>> and the
>>> >> consumer.
>>> >> This supplier currently creates producers and consumers using the
>>> >> constructors with a map of config + serializer/deserializer.
>>> >>
>>> >> So, it seems it's easier to have the constructor with 3 parameters.
>>> But in
>>> >> any case, it will work if the config can be accessed...
>>> >>
>>> >> Le ven. 6 mai 2022 à 12:14, François Rosière <
>>> francois.rosi...@gmail.com>
>>> >> a écrit :
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> We may create a constructor with a single parameter which is the
>>> config
>>> >>> but then, I would need to give the serializer/deserializer by also
>>> >>> overriding the config.
>>> >>> Like I would do for the interceptors.
>>> >>> So, no real opinion on that, both solutions are ok for me.
>>> >>> Maybe easier to take the approach of the single parameter.
>>> >>>
>>> >>> Hope it respond to the question.
>>> >>>
>>> >>> Kr,
>>> >>>
>>> >>> F.
>>> >>>
>>> >>> Le ven. 6 mai 2022 à 11:59, Bruno Cadonna  a
>>> écrit :
>>&g

Re: [DISCUSS] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-07 Thread John Roesler
Thanks for the KIP, Jim!

This conversation seems to highlight that the KIP needs to specify
some of its behavior as well as its APIs, where the behavior is
observable and significant to users.

For example:

1. Do you plan to have a guarantee that immediately after
calling KafkaStreams.pause(), users should observe that the instance
stops processing new records? Or should they expect that the threads
will continue to process some records and pause asynchronously
(you already answered this in the thread earlier)?

2. Will the threads continue to poll new records until they naturally fill
up the task buffers, or will they immediately pause their Consumers
as well?

3. Will threads continue to call (system time) punctuators, or would
punctuations also be paused?

I realize that some of those questions simply may not have occurred to
you, so this is not a criticism for leaving them off; I'm just pointing out
that although we don't tend to mention implementation details in KIPs,
we also can't be too high level, since there are a lot of operational
details that users rely on to achieve various behaviors in Streams.

A couple more comments:

4. +1 to what Guozhang said. It seems like we should we also do a commit
before entering the paused state. That way, any open transactions would
be closed and not have to worry about timing out. Even under ALOS, it
seems best to go ahead and complete the processing of in-flight records
by committing. That way, if anything happens to die while it's paused, existing
work won't have to be repeated. Plus, if there are any processors with side
effects, users won't have to tolerate weird edge cases where a pause occurs
after a processor sees a record, but before the result is sent to its outputs.

5. I noticed that you proposed not to add a PAUSED state, but I didn't follow
the rationale. Adding a state seems beneficial for a number of reasons:
StreamThreads already use the thread state to determine whether to process
or not, so avoiding a new State would just mean adding a separate flag to track
and then checking your new flag in addition to the State in the thread. Also,
operating Streams applications is a non-trivial task, and users rely on the 
State
(and transitions) to understand Streams's behavior. Adding a PAUSED state
is an elegant way to communicate to operators what is happening with the
application. Note that the person digging though logs and metrics, trying
to understand why the application isn't doing anything is probably not going
to be the same person who is calling pause() and resume(). Also, if you add
a state, you don't need `isPaused()`.

5b. If you buy the arguments to go ahead and commit as well as the
argument to add a State, then I'd also suggest to follow the existing patterns
for the shutdown states by also adding PAUSING. That
way, you'll also expose a way to understand that Streams received the signal
to pause, and that it's still processing and committing some records in
preparation to enter a PAUSED state. I'm not sure if a RESUMING state would
also make sense.

And that's all I have to say about that. I hope you don't find my
long message offputting. I'm fundamentally in favor of your KIP,
and I think with a little more explanation in the KIP, and a few
small tweaks to the proposal, we'll be able to provide good
ergonomics to our users.

Thanks,
-John

On Sat, May 7, 2022, at 00:06, Guozhang Wang wrote:
> I'm in favor of the "just pausing the instance itself“ option as well. As
> for EOS, the point is that when the processing is paused, we would not
> trigger any `producer.send` during the time, and the transaction timeout is
> sort of relying on that behavior, so my point was that it's probably better
> to also commit the processing before we pause it.
>
>
> Guozhang
>
> On Fri, May 6, 2022 at 6:12 PM Jim Hughes 
> wrote:
>
>> Hi Matthias,
>>
>> Since the only thing which will be paused is processing the topology, I
>> think we can let commits happen naturally.
>>
>> Good point about getting the paused state to new members; it is seeming
>> like the "building block" approach is a good one to keep things simple at
>> first.
>>
>> Cheers,
>>
>> Jim
>>
>> On Fri, May 6, 2022 at 8:31 PM Matthias J. Sax  wrote:
>>
>> > I think it's tricky to propagate a pauseAll() via the rebalance
>> > protocol. New members joining the group would need to get paused, too?
>> > Could there be weird race conditions with overlapping pauseAll() and
>> > resumeAll() calls on different instanced while there could be a errors /
>> > network partitions or similar?
>> >
>> > I would argue that similar to IQ, we provide the basic building blocks,
>> > and leave it the user users to implement cross instance management for a
>> > pauseAll() scenario. -- Also, if there is really demand, we can always
>> > add pauseAll()/resumeAll() as follow up work.
>> >
>> > About named typologies: I agree to Jim to not include them in this KIP
>> > as they are not a public feature yet. If we mak

Re: [DISCUSS] KIP-832 Allow creating a producer/consumer using a producer/consumer config

2022-05-06 Thread John Roesler
Thanks for the KIP, François!

I'm generally in favor of your KIP, since you're
proposing to follow the existing pattern of the
constructors for both Producer and Consumer,
but with the config object instead of Properties
or Map configs. Also, because we already have
this pattern in Streams, and we are just
extending it to Producer and Consumer.

Following on the KIP-378 discussion, I do still think
this is somewhat of an abuse of the Config objects,
and it would be better to have a formal dependency
injection interface, but I also don't want to let perfect
be the enemy of good. Since it looks like this approach
works, and there is also some precedent for it already,
I'd be inclined to approve it.

Since KIP-378 didn't make it over the finish line, and it
seems like a small expansion to your proposal, do you
mind also adding the StreamsConfig to the
TopologyTestDriver constructors? That way, we can go
ahead and resolve both KIPs at once.

Thank you,
-John


On Fri, May 6, 2022, at 06:06, François Rosière wrote:
> To stay consistent with existing code, we should simply add 2 constructors.
> One with ser/deser and one without.
> So that, users have the choice to use one or the other.
> I updated the KIP accordingly.
>
> Le ven. 6 mai 2022 à 12:55, François Rosière  a
> écrit :
>
>> On the other hand, the KafkaConsumer constructor with a config +
>> serializer and deserializer already exists but is not public.
>> It would also complexify a bit the caller to not have the
>> serializer/deserializer exposed at constructor level.
>>
>> Once the KIP would have been implemented, for streams, instead of having a
>> custom config (already possible), I may simply define a custom
>> KafkaClientSupplier reusing the custom configs of both the producer and the
>> consumer.
>> This supplier currently creates producers and consumers using the
>> constructors with a map of config + serializer/deserializer.
>>
>> So, it seems it's easier to have the constructor with 3 parameters. But in
>> any case, it will work if the config can be accessed...
>>
>> Le ven. 6 mai 2022 à 12:14, François Rosière 
>> a écrit :
>>
>>> Hello,
>>>
>>> We may create a constructor with a single parameter which is the config
>>> but then, I would need to give the serializer/deserializer by also
>>> overriding the config.
>>> Like I would do for the interceptors.
>>> So, no real opinion on that, both solutions are ok for me.
>>> Maybe easier to take the approach of the single parameter.
>>>
>>> Hope it respond to the question.
>>>
>>> Kr,
>>>
>>> F.
>>>
>>> Le ven. 6 mai 2022 à 11:59, Bruno Cadonna  a écrit :
>>>
 Hi Francois,

 Thank you for updating the KIP!

 Now the motivation of the KIP is much clearer.

 I would still be interested in:

  >> 2. Why do you only want to change/add the constructors that take the
  >> properties objects and de/serializers and you do not also want to
  >> add/change the constructors that take only the properties?


 Best,
 Bruno

 On 05.05.22 23:15, François Rosière wrote:
 > Hello Bruno,
 >
 > The KIP as been updated. Feel free to give more feedbacks and I will
 > complete accordingly.
 >
 > Kr,
 >
 > F.
 >
 > Le jeu. 5 mai 2022 à 22:22, Bruno Cadonna  a
 écrit :
 >
 >> Hi Francois,
 >>
 >> Thanks for the KIP!
 >>
 >> Here my first feedback:
 >>
 >> 1. Could you please extend the motivation section, so that it is clear
 >> for a non-Spring dev why the change is needed? Usually, a motivation
 >> section benefits a lot from an actual example.
 >> Extending the motivation section would also make the KIP more
 >> self-contained which is important IMO since this is kind of a log of
 the
 >> major changes to Kafka. Descriptions of major changes should not
 >> completely depend on external links (which may become dead in future).
 >> Referencing external resources to point to more details or give
 context
 >> is useful, though.
 >>
 >> 2. Why do you only want to change/add the constructors that take the
 >> properties objects and de/serializers and you do not also want to
 >> add/change the constructors that take only the properties?
 >>
 >> 3. I found the following stalled KIP whose motivation is really
 similar
 >> to yours:
 >>
 >>
 >>
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-378%3A+Enable+Dependency+Injection+for+Kafka+Streams+handlers
 >>
 >> That KIP is also the reason why Kafka Streams still has the
 constructors
 >> with the StreamsConfig parameter. Maybe you want to mention this KIP
 in
 >> yours or even incorporate the remaining topology test driver API
 changes
 >> in your KIP.
 >> Some related links:
 >> - https://github.com/apache/kafka/pull/5344#issuecomment-413350338
 >> - https://github.com/apache/kafka/pull/10484
 >> 

Re: [VOTE] KIP-821: Connect Transforms support for nested structures

2022-04-21 Thread John Roesler
Thanks for the KIP, Jorge!

I’ve just looked over the KIP, and it looks good to me.

I’m +1 (binding)

Thanks,
John

On Thu, Apr 21, 2022, at 09:10, Chris Egerton wrote:
> This is a worthwhile addition to the SMTs that ship out of the box with
> Kafka Connect. +1 non-binding
>
> On Thu, Apr 21, 2022, 09:51 Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'd like to start a vote on KIP-821:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
>>
>> Thanks,
>> Jorge
>>


  1   2   3   4   5   6   7   8   9   10   >