Re: Streams/RocksDB: Why Universal Compaction?

2023-07-27 Thread Guozhang Wang
Thanks Colt!

For a library's default configs, I think the principle should be "it
runs appropriately out of the box the first time you play with it."
So I'm not suggesting we try to make sure it is a generally good
combination for a wide range of production usage, since in that case
most people would prefer some customization rather than blindly
accepting the default config values anyway. Hence, what I had in mind
regarding "benchmarks" is something light, like running the stateful
examples in our tutorials
(https://github.com/apache/kafka/tree/trunk/streams/examples/src/main/java/org/apache/kafka/streams/examples)
and seeing if the new config gives better performance overall. It does
not need to be very comprehensive. If you could help with that
validation, it would be great.


Guozhang

On Wed, Jul 26, 2023 at 8:38 AM Colt McNealy wrote:
>
> Guozhang,
>
> Thanks for your response. That makes a lot of sense; I can't promise any
> super-formal benchmarks but we will definitely play with the configurations
> you sent and report back within a month about our high-level findings.
>
> For our purposes (a workflow engine), we will mostly monitor workflow
> execution metrics + state store restoration times. But in the interest of a
> formal benchmark that could be included in a KIP—what monitoring software
> tooling and setup environment would you recommend? If it doesn't involve
> writing copious amounts of custom code, perhaps (no promises) my team could
> put something together that's more suitable for a general Streams audience
> rather than just our own internal usage.
>
> Cheers,
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Sun, Jul 23, 2023 at 11:21 AM Guozhang Wang wrote:
>
> > Yeah, I can shed some light here: I used Universal originally since at
> > the beginning of the Kafka Streams journey there were user reports
> > complaining about its storage amplification. But soon enough (around
> > 2019) I realized that, as an OOTB config, level compaction may be
> > preferable.
> >
> > I had a PR dating back to that time where I suggested changing a bunch
> > of OOTB configs for RocksDB, including the compaction config:
> > https://github.com/apache/kafka/pull/6406/files. Unfortunately, it was
> > not merged since I wanted to run some benchmarks to make sure it does
> > not have any gotchas, but I never got the time to do so. I would be very
> > happy, in fact, if someone could pick that up, re-examine whether the
> > changes still make sense, and if so drive it through and merge it.
> >
> > Guozhang
> >
> >
> > > On Sun, Jul 23, 2023 at 10:29 AM Matthias J. Sax wrote:
> > >
> > > Do you happen to know?
> > >
> > >
> > > -------- Forwarded Message --------
> > > Subject: Streams/RocksDB: Why Universal Compaction?
> > > Date: Fri, 23 Jun 2023 13:19:36 -0700
> > > From: Colt McNealy 
> > > Reply-To: users@kafka.apache.org
> > > To: users@kafka.apache.org
> > >
> > > Hello there!
> > >
> > > I was wondering if anyone (perhaps an early developer or power-user of
> > > Kafka Streams) knows why the Streams developers made the default setting
> > > for RocksDB compaction "Universal" compaction rather than "Level"
> > > compaction?
> > >
> > > My understanding (in which I am extremely UNconfident) is as follows—
> > >
> > > Supposedly Universal compaction leads to lower write amplification after
> > > compaction finishes. In a run of Universal compaction, all data is
> > > compacted; as per the RocksDB documentation it is possible for temporary
> > > write amplification of up to 2x during this process. There have also been
> > > reports of "write stalls" during this process [1].
> > >
> > > In Level compaction, only certain levels (tiers of SST files) are
> > > compacted at once, meaning that the compaction process is shorter and
> > > less intensive, but that write amplification after compaction finishes
> > > is higher than with universal compaction.
> > >
> > > Can anyone confirm/deny/correct this?
> > >
> > > [1] https://github.com/solana-labs/solana/issues/14586 (not
> > > Streams-related, but it is RocksDB)
> > >
> > > Thanks in advance,
> > > Colt McNealy
> > >
> > > *Founder, LittleHorse.dev*
> > >
> >


Re: Consuming an entire partition with control messages

2023-07-27 Thread Matthias J. Sax
Well, `kafka-consumer-groups.sh` can only display the difference between
"committed offset" and "end offset". It cannot know what the "right"
offset to commit is. It is really the responsibility of the
consumers to commit correctly.


-Matthias
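
To make the lag of "1" concrete, here is a minimal sketch of the arithmetic described above. The partition contents and offsets are made up for illustration, and `reported_lag` is a hypothetical helper, not part of any Kafka tool or client. It assumes the transaction marker occupies one offset at the end of the partition.

```python
def reported_lag(committed_offset: int, end_offset: int) -> int:
    """Lag as a plain difference between end offset and committed offset."""
    return end_offset - committed_offset


# Hypothetical partition: five data records (offsets 0..4) followed by
# one transaction commit marker (offset 5).
log = ["data", "data", "data", "data", "data", "commit-marker"]
end_offset = len(log)  # 6, the offset the next record would receive

# A consumer that commits "last delivered record + 1" never accounts
# for the marker, because the marker is not delivered to the application.
last_data_offset = max(i for i, kind in enumerate(log) if kind == "data")
commit_without_marker = last_data_offset + 1

# A consumer that steps over the marker commits the position after it.
commit_after_marker = end_offset

print(reported_lag(commit_without_marker, end_offset))  # 1
print(reported_lag(commit_after_marker, end_offset))    # 0
```

In this sketch the tool-side computation is identical in both cases; only the offset the consumer chose to commit differs.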

On 7/27/23 1:03 AM, Vincent Maurin wrote:
> Thank you Matthias for your answer. I opened an issue on the aiokafka
> project as a follow-up; let's see how we can resolve it there:
> https://github.com/aio-libs/aiokafka/issues/911
>
> As mentioned in the issue, some tools like kafka-consumer-groups.sh
> also display a lag of "1" in this kind of situation.
>
> Best regards,
>
> Vincent
>
> On 13/06/2023 17:27, Matthias J. Sax wrote:

>> Sounds like a bug in the aiokafka library to me.
>>
>> If the last message in a topic partition is a tx-marker, the consumer
>> should step over it and report the correct position after the marker.
>>
>> The official KafkaConsumer (i.e., the Java one) does exactly that.
>>
>> -Matthias
>>
>> On 5/30/23 8:41 AM, Vincent Maurin wrote:

>>> Hello!
>>>
>>> I am working on an exactly-once stream processor in Python, using the
>>> aiokafka client library. My program stores state in memory that is
>>> recovered from a changelog topic, like in Kafka Streams.
>>>
>>> On each processing loop, I consume messages and produce messages
>>> to an output topic and to my changelog topic, within a transaction.
>>>
>>> When I need to restart a runner, to restore the state in memory, I
>>> have a routine consuming the changelog topic from the beginning to the
>>> "end" with the read_committed isolation level. Here I am struggling to
>>> define when to stop my recovery:
>>> * my current (maybe) working solution is to loop over "poll" until
>>> poll no longer returns any messages
>>> * I tried something based on the end offsets, then checking
>>> the consumer position, but with control messages at the end of the
>>> partition, I run into an issue where the position is one below the end
>>> offset and doesn't go further
>>>
>>> I had a quick look at
>>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
>>> but it is a bit hard to figure out what is going on there
>>>
>>> Best regards,
>>> Vincent
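
The two stopping rules discussed in this thread can be combined in one restore loop: stop when the position reaches the end offset, and fall back to a cap on consecutive empty polls in case the position never gets there. The following is only a sketch under stated assumptions. `restore_state`, `FakeConsumer`, `Record` and `max_idle_polls` are hypothetical names. The consumer is duck-typed using the method names `seek_to_beginning`, `end_offsets`, `position` and `getmany` that `AIOKafkaConsumer` exposes, and the fake models a consumer that already steps over markers as Matthias describes. A real run would use a `TopicPartition` instead of the string used here.

```python
import asyncio
from collections import namedtuple

# Stand-in for a consumer record; only the fields used below.
Record = namedtuple("Record", "offset key value")


async def restore_state(consumer, partition, max_idle_polls=3):
    """Replay `partition` from the start until position reaches the end offset."""
    state = {}
    await consumer.seek_to_beginning(partition)
    end_offset = (await consumer.end_offsets([partition]))[partition]
    idle_polls = 0
    while await consumer.position(partition) < end_offset:
        batches = await consumer.getmany(partition, timeout_ms=1000)
        records = batches.get(partition, [])
        if not records:
            # Fallback: give up after several empty polls in a row, in
            # case the position stays below the end offset.
            idle_polls += 1
            if idle_polls >= max_idle_polls:
                break
            continue
        idle_polls = 0
        for record in records:
            state[record.key] = record.value  # last write per key wins
    return state


class FakeConsumer:
    """In-memory stand-in whose position steps over control markers."""

    def __init__(self, log):
        self._log = log  # list of Record, with None for a control marker
        self._position = 0

    async def seek_to_beginning(self, *partitions):
        self._position = 0

    async def end_offsets(self, partitions):
        return {p: len(self._log) for p in partitions}

    async def position(self, partition):
        return self._position

    async def getmany(self, *partitions, timeout_ms=0):
        batch = []
        # Hand out at most two data records per poll, skipping markers.
        while self._position < len(self._log) and len(batch) < 2:
            entry = self._log[self._position]
            self._position += 1
            if entry is not None:
                batch.append(entry)
        # Step over any markers that directly follow the batch.
        while self._position < len(self._log) and self._log[self._position] is None:
            self._position += 1
        return {partitions[0]: batch} if batch else {}


# Hypothetical changelog: two transactions, each closed by a marker.
log = [Record(0, "a", 1), Record(1, "b", 2), None, Record(3, "a", 3), None]
restored = asyncio.run(restore_state(FakeConsumer(log), "changelog-0"))
print(restored)
```

With a client whose position stays one below the end offset, the loop above would end through the idle-poll cap rather than the position check, which is why both conditions are kept.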


Re: Consuming an entire partition with control messages

2023-07-27 Thread Vincent Maurin
Thank you Matthias for your answer. I opened an issue on the aiokafka
project as a follow-up; let's see how we can resolve it there:
https://github.com/aio-libs/aiokafka/issues/911

As mentioned in the issue, some tools like kafka-consumer-groups.sh also
display a lag of "1" in this kind of situation.


Best regards,

Vincent
