Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-18 Thread Guozhang Wang
Hi Radai,

This is an interesting idea indeed. However, since this KIP has already been
voted on, I'd suggest we discuss it in a separate thread so that we do not
drag this one out too long (as you can tell from the KIP number, it has been
proposed for quite some time, and it would be great to get this feature in
and then consider further improvements later).

As for handling large messages, please correct me if I'm thinking about this
the wrong way (my main knowledge about this comes from
https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297).
If we consider leveraging headers to split / stitch segments of a large
message (or have already done so), could we still use different header values
to indicate the sequence of the segments? E.g. suppose the header keys are all
the same, the header values could still be "m1-s1" (message one, segment
one), with the last message of m1 being "m1-s5d" (message one, segment five,
and also the end segment), etc.
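
For illustration only, a minimal sketch of that header-value scheme (the
header key name "fragment-seq" and the helper below are hypothetical, not part
of the KIP or of Kafka itself):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

final class SegmentHeaders {
    // Builds the per-segment values described above: "m1-s1" ... "m1-s5d",
    // where the trailing "d" marks the final segment of the large message.
    static String segmentHeaderValue(String messageId, int segment, int total) {
        String value = messageId + "-s" + segment;
        return segment == total ? value + "d" : value;
    }

    // Attaches the sequence marker to one fragment before it is sent.
    static void tagFragment(ProducerRecord<String, byte[]> fragment,
                            String messageId, int segment, int total) {
        fragment.headers().add("fragment-seq",
                segmentHeaderValue(messageId, segment, total)
                        .getBytes(StandardCharsets.UTF_8));
    }
}

e.g. segmentHeaderValue("m1", 1, 5) returns "m1-s1" and
segmentHeaderValue("m1", 5, 5) returns "m1-s5d".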


Guozhang

On Mon, Dec 16, 2019 at 9:36 AM Senthilnathan Muthusamy
 wrote:

> Hi Radai
>
> Thanks for the suggestion. This is a really cool feature and a specific
> scenario for handling the fragments... However, I would strongly recommend
> coming up with a separate KIP to discuss this scenario so that we will have
> a better design in place, and also so that we don't divert the intent of the
> current KIP...
>
> Appreciate your valuable feedback!
>
> Regards,
> Senthil
>
> -Original Message-
> From: radai 
> Sent: Thursday, December 12, 2019 11:40 AM
> To: dev@kafka.apache.org
> Subject: Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> may I suggest that if, under "header" strategy, multiple records are found
> with identical header values they are ALL kept?
> this would be useful in cases where users send larger payloads than max
> record size to kafka and are forced to fragment them - by setting the same
> header in all fragments it would become possible to properly log-compact
> topics with such fragmented payloads.
>
> On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
> >
> > Thanks Jun for confirming!
> >
> > I have updated the KIP (added recommendation section and special case in
> handling LEO record for non-offset based compaction strategy). Please
> review and let me know if you have any other feedback.
> >
> > Regards,
> > Senthil
> >
> > -Original Message-
> > From: Jun Rao 
> > Sent: Tuesday, November 26, 2019 4:36 PM
> > To: dev 
> > Subject: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi, Senthil,
> >
> > Sorry for the delay.
> >
> > 51. It seems that we can just remove the last record from the batch, but
> keeps the batch during compaction. The batch level metadata is enough to
> preserve the log end offset.
> >
> > 53. Yes, your understanding is correct. So we could recommend users to
> set "
> > max.compaction.lag.ms" properly if they care about deletes.
> >
> > Could you add both to the KIP?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Nov 26, 2019 at 5:09 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
> >
> > > Hi Gouzhang & Jun,
> > >
> > > Can one of you please confirm/respond to the below mail so that I
> > > will go ahead and update the KIP and proceed.
> > >
> > > Thanks
> > > Senthil
> > >
> > > - Senthil
> > > 
> > > From: Senthilnathan Muthusamy 
> > > Sent: Wednesday, November 20, 2019 5:04:20 PM
> > > To: dev@kafka.apache.org 
> > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > 
> > >
> > > Hi Gouzhang & Jun,
> > >
> > > Thanks for the details on the scenarios.
> > >
> > > #51 => thanks for the details, Guozhang, with the example. Won't the
> > > followers be syncing the LEO with the leader as well? If yes, always
> > > keeping the last record (i.e. not compacting it under the non-offset
> > > strategies) would work, and this is needed only if the new strategy ends
> > > up removing the LEO record, right? Also I wasn't able to retrieve Jason's
> > > mail related to creating an empty message... Can you please forward it if
> > > you have it? Wondering how that can solve this particular issue unless we
> > > create a record for a random key that won't conflict with the
> > > producer/consumer keys for that topic/partition.
> > >
> > > #53 => I see that t

RE: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-16 Thread Senthilnathan Muthusamy
Hi Radai

Thanks for the suggestion. This is a really cool feature and a specific scenario
for handling the fragments... However, I would strongly recommend coming up with
a separate KIP to discuss this scenario so that we will have a better design in
place, and also so that we don't divert the intent of the current KIP...

Appreciate your valuable feedback!

Regards,
Senthil

-Original Message-
From: radai  
Sent: Thursday, December 12, 2019 11:40 AM
To: dev@kafka.apache.org
Subject: Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

may I suggest that if, under "header" strategy, multiple records are found with 
identical header values they are ALL kept?
this would be useful in cases where users send larger payloads than max record 
size to kafka and are forced to fragment them - by setting the same header in 
all fragments it would become possible to properly log-compact topics with such 
fragmented payloads.

On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy 
 wrote:
>
> Thanks Jun for confirming!
>
> I have updated the KIP (added recommendation section and special case in 
> handling LEO record for non-offset based compaction strategy). Please review 
> and let me know if you have any other feedback.
>
> Regards,
> Senthil
>
> -Original Message-
> From: Jun Rao 
> Sent: Tuesday, November 26, 2019 4:36 PM
> To: dev 
> Subject: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Sorry for the delay.
>
> 51. It seems that we can just remove the last record from the batch, but 
> keeps the batch during compaction. The batch level metadata is enough to 
> preserve the log end offset.
>
> 53. Yes, your understanding is correct. So we could recommend users to set "
> max.compaction.lag.ms" properly if they care about deletes.
>
> Could you add both to the KIP?
>
> Thanks,
>
> Jun
>
>
> On Tue, Nov 26, 2019 at 5:09 AM Senthilnathan Muthusamy 
>  wrote:
>
> > Hi Gouzhang & Jun,
> >
> > Can one of you please confirm/respond to the below mail so that I 
> > will go ahead and update the KIP and proceed.
> >
> > Thanks
> > Senthil
> >
> > - Senthil
> > ____
> > From: Senthilnathan Muthusamy 
> > Sent: Wednesday, November 20, 2019 5:04:20 PM
> > To: dev@kafka.apache.org 
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > 
> >
> > Hi Gouzhang & Jun,
> >
> > Thanks for the details on the scenarios.
> >
> > #51 => thanks for the details, Guozhang, with the example. Won't the
> > followers be syncing the LEO with the leader as well? If yes, always
> > keeping the last record (i.e. not compacting it under the non-offset
> > strategies) would work, and this is needed only if the new strategy ends
> > up removing the LEO record, right? Also I wasn't able to retrieve Jason's
> > mail related to creating an empty message... Can you please forward it if
> > you have it? Wondering how that can solve this particular issue unless we
> > create a record for a random key that won't conflict with the
> > producer/consumer keys for that topic/partition.
> >
> > #53 => I see that this can happen when a low produce rate keeps the log
> > ineligible for compaction for an unbounded duration, whereby
> > "delete.retention.ms" kicks in and removes the tombstone record. If
> > that's the case (please correct me if I am missing any other scenarios),
> > then we can suggest that Kafka users set "segment.ms" &
> > "max.compaction.lag.ms" (as compaction won't happen on the active
> > segment) to be smaller than "delete.retention.ms", and that should
> > address this scenario, right?
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Jun Rao 
> > Sent: Wednesday, November 13, 2019 9:31 AM
> > To: dev 
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi, Seth,
> >
> > 51. The difference is that with the offset compaction strategy, the 
> > message corresponding to the last offset is always the winning 
> > record and will never be removed. But with the new strategies, it's 
> > now possible that the message corresponding to the last offset is a 
> > losing record and needs to be removed.
> >
> > 53. Similarly, with the offset compaction strategy, if we see a 
> > non-tombstone record after a tombstone record, the non-tombstone 
> > record is always the winning one. However, with the new strategies, 
> > that non-tombstone record with a larger offset could be a losing 
> > record.

Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-12 Thread radai
may I suggest that if, under "header" strategy, multiple records are
found with identical header values they are ALL kept?
this would be useful in cases where users send larger payloads than
max record size to kafka and are forced to fragment them - by setting
the same header in all fragments it would become possible to properly
log-compact topics with such fragmented payloads.
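
For illustration, a producer-side sketch of this idea (the topic name, key,
header key "large-msg-id", and chunk size below are all hypothetical, and the
"keep all records with identical header values" behavior is only a proposal in
this thread, not something Kafka's cleaner does today):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FragmentedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = new byte[5 * 1024 * 1024]; // larger than the max record size
        int chunkSize = 900 * 1024;                 // each fragment stays under the limit
        String fragmentGroupId = UUID.randomUUID().toString();

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            for (int start = 0; start < payload.length; start += chunkSize) {
                byte[] chunk = Arrays.copyOfRange(payload, start,
                        Math.min(start + chunkSize, payload.length));
                ProducerRecord<String, byte[]> fragment =
                        new ProducerRecord<>("my-topic", "entity-42", chunk);
                // Every fragment carries the SAME header value, so that under the
                // proposed rule the cleaner would keep the whole group instead of
                // collapsing it down to a single fragment per key.
                fragment.headers().add("large-msg-id",
                        fragmentGroupId.getBytes(StandardCharsets.UTF_8));
                producer.send(fragment);
            }
        }
    }
}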

On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy
 wrote:
>
> Thanks Jun for confirming!
>
> I have updated the KIP (added recommendation section and special case in 
> handling LEO record for non-offset based compaction strategy). Please review 
> and let me know if you have any other feedback.
>
> Regards,
> Senthil
>
> -Original Message-
> From: Jun Rao 
> Sent: Tuesday, November 26, 2019 4:36 PM
> To: dev 
> Subject: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Sorry for the delay.
>
> 51. It seems that we can just remove the last record from the batch, but 
> keeps the batch during compaction. The batch level metadata is enough to 
> preserve the log end offset.
>
> 53. Yes, your understanding is correct. So we could recommend users to set "
> max.compaction.lag.ms" properly if they care about deletes.
>
> Could you add both to the KIP?
>
> Thanks,
>
> Jun
>
>
> On Tue, Nov 26, 2019 at 5:09 AM Senthilnathan Muthusamy 
>  wrote:
>
> > Hi Gouzhang & Jun,
> >
> > Can one of you please confirm/respond to the below mail so that I will
> > go ahead and update the KIP and proceed.
> >
> > Thanks
> > Senthil
> >
> > - Senthil
> > ____
> > From: Senthilnathan Muthusamy 
> > Sent: Wednesday, November 20, 2019 5:04:20 PM
> > To: dev@kafka.apache.org 
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > 
> >
> > Hi Gouzhang & Jun,
> >
> > Thanks for the details on the scenarios.
> >
> > #51 => thanks for the details, Guozhang, with the example. Won't the
> > followers be syncing the LEO with the leader as well? If yes, always
> > keeping the last record (i.e. not compacting it under the non-offset
> > strategies) would work, and this is needed only if the new strategy ends
> > up removing the LEO record, right? Also I wasn't able to retrieve Jason's
> > mail related to creating an empty message... Can you please forward it if
> > you have it? Wondering how that can solve this particular issue unless we
> > create a record for a random key that won't conflict with the
> > producer/consumer keys for that topic/partition.
> >
> > #53 => I see that this can happen when a low produce rate keeps the log
> > ineligible for compaction for an unbounded duration, whereby
> > "delete.retention.ms" kicks in and removes the tombstone record. If
> > that's the case (please correct me if I am missing any other scenarios),
> > then we can suggest that Kafka users set "segment.ms" &
> > "max.compaction.lag.ms" (as compaction won't happen on the active
> > segment) to be smaller than "delete.retention.ms", and that should
> > address this scenario, right?
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Jun Rao 
> > Sent: Wednesday, November 13, 2019 9:31 AM
> > To: dev 
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi, Seth,
> >
> > 51. The difference is that with the offset compaction strategy, the
> > message corresponding to the last offset is always the winning record
> > and will never be removed. But with the new strategies, it's now
> > possible that the message corresponding to the last offset is a losing
> > record and needs to be removed.
> >
> > 53. Similarly, with the offset compaction strategy, if we see a
> > non-tombstone record after a tombstone record, the non-tombstone
> > record is always the winning one. However, with the new strategies,
> > that non-tombstone record with a larger offset could be a losing
> > record. The question is then how do we retain the tombstone long
> > enough so that we could still recognize that the non-tombstone record 
> > should be ignored.
> >
> > Thanks,
> >
> > Jun
> >
> > -Original Message-
> > From: Guozhang Wang 
> > Sent: Tuesday, November 12, 2019 6:09 PM
> > To: dev 
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hello Senthil,
> >
> > Let me try to re-iterate on Jun's comments with some context here:
> >
>

RE: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Senthilnathan Muthusamy
Thanks Jun for confirming!

I have updated the KIP (added a recommendation section and a special case for
handling the LEO record under the non-offset-based compaction strategies).
Please review and let me know if you have any other feedback.

Regards,
Senthil

-Original Message-
From: Jun Rao  
Sent: Tuesday, November 26, 2019 4:36 PM
To: dev 
Subject: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi, Senthil,

Sorry for the delay.

51. It seems that we can just remove the last record from the batch, but keep
the batch during compaction. The batch-level metadata is enough to preserve the
log end offset.

53. Yes, your understanding is correct. So we could recommend users to set "
max.compaction.lag.ms" properly if they care about deletes.

Could you add both to the KIP?

Thanks,

Jun


On Tue, Nov 26, 2019 at 5:09 AM Senthilnathan Muthusamy 
 wrote:

> Hi Gouzhang & Jun,
>
> Can one of you please confirm/respond to the below mail so that I will 
> go ahead and update the KIP and proceed.
>
> Thanks
> Senthil
>
> - Senthil
> 
> From: Senthilnathan Muthusamy 
> Sent: Wednesday, November 20, 2019 5:04:20 PM
> To: dev@kafka.apache.org 
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> 
>
> Hi Gouzhang & Jun,
>
> Thanks for the details on the scenarios.
>
> #51 => thanks for the details, Guozhang, with the example. Won't the
> followers be syncing the LEO with the leader as well? If yes, always keeping
> the last record (i.e. not compacting it under the non-offset strategies)
> would work, and this is needed only if the new strategy ends up removing the
> LEO record, right? Also I wasn't able to retrieve Jason's mail related to
> creating an empty message... Can you please forward it if you have it?
> Wondering how that can solve this particular issue unless we create a record
> for a random key that won't conflict with the producer/consumer keys for
> that topic/partition.
>
> #53 => I see that this can happen when a low produce rate keeps the log
> ineligible for compaction for an unbounded duration, whereby
> "delete.retention.ms" kicks in and removes the tombstone record. If that's
> the case (please correct me if I am missing any other scenarios), then we
> can suggest that Kafka users set "segment.ms" & "max.compaction.lag.ms" (as
> compaction won't happen on the active segment) to be smaller than
> "delete.retention.ms", and that should address this scenario, right?
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Jun Rao 
> Sent: Wednesday, November 13, 2019 9:31 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Seth,
>
> 51. The difference is that with the offset compaction strategy, the 
> message corresponding to the last offset is always the winning record 
> and will never be removed. But with the new strategies, it's now 
> possible that the message corresponding to the last offset is a losing 
> record and needs to be removed.
>
> 53. Similarly, with the offset compaction strategy, if we see a 
> non-tombstone record after a tombstone record, the non-tombstone 
> record is always the winning one. However, with the new strategies, 
> that non-tombstone record with a larger offset could be a losing 
> record. The question is then how do we retain the tombstone long 
> enough so that we could still recognize that the non-tombstone record should 
> be ignored.
>
> Thanks,
>
> Jun
>
> -Original Message-
> From: Guozhang Wang 
> Sent: Tuesday, November 12, 2019 6:09 PM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Senthil,
>
> Let me try to re-iterate on Jun's comments with some context here:
>
> 51: today with the offset-only compaction strategy, the last record of 
> the log (we call it the log-end-record, whose offset is 
> log-end-offset) would always be preserved and not compacted. This is 
> kinda important for replication since followers reason about the 
> log-end-offset on the leader.
> Consider this case: three replicas of a partition, leader 1 and 
> follower 2 and 3.
>
> Leader 1 has records a, b, c, d and d is the current last record of 
> the partition, the current log-end-offset is 3 (assuming record a's 
> offset is 0).
> Follower 2 has replicated a, b, c, d. Log-end-offset is 3 Follower 3 
> has replicated a, b, c but not yet replicated d. Log-end-offset is 2.
>
> NOTE that compaction triggering is independent on each broker; it is
> possible that leader 1 triggers compaction and deletes record d, while
> other followers have not triggered compaction yet. At this moment the
> leader's log becomes a, b, c. Now l

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Jun Rao
Hi, Senthil,

Sorry for the delay.

51. It seems that we can just remove the last record from the batch, but
keep the batch during compaction. The batch-level metadata is enough to
preserve the log end offset.

53. Yes, your understanding is correct. So we could recommend users to set "
max.compaction.lag.ms" properly if they care about deletes.

Could you add both to the KIP?

Thanks,

Jun


On Tue, Nov 26, 2019 at 5:09 AM Senthilnathan Muthusamy
 wrote:

> Hi Gouzhang & Jun,
>
> Can one of you please confirm/respond to the below mail so that I will go
> ahead and update the KIP and proceed.
>
> Thanks
> Senthil
>
> - Senthil
> 
> From: Senthilnathan Muthusamy 
> Sent: Wednesday, November 20, 2019 5:04:20 PM
> To: dev@kafka.apache.org 
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> 
>
> Hi Gouzhang & Jun,
>
> Thanks for the details on the scenarios.
>
> #51 => thanks for the details, Guozhang, with the example. Won't the
> followers be syncing the LEO with the leader as well? If yes, always keeping
> the last record (i.e. not compacting it under the non-offset strategies)
> would work, and this is needed only if the new strategy ends up removing the
> LEO record, right? Also I wasn't able to retrieve Jason's mail related to
> creating an empty message... Can you please forward it if you have it?
> Wondering how that can solve this particular issue unless we create a record
> for a random key that won't conflict with the producer/consumer keys for
> that topic/partition.
>
> #53 => I see that this can happen when a low produce rate keeps the log
> ineligible for compaction for an unbounded duration, whereby
> "delete.retention.ms" kicks in and removes the tombstone record. If that's
> the case (please correct me if I am missing any other scenarios), then we
> can suggest that Kafka users set "segment.ms" & "max.compaction.lag.ms" (as
> compaction won't happen on the active segment) to be smaller than
> "delete.retention.ms", and that should address this scenario, right?
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Jun Rao 
> Sent: Wednesday, November 13, 2019 9:31 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Seth,
>
> 51. The difference is that with the offset compaction strategy, the
> message corresponding to the last offset is always the winning record and
> will never be removed. But with the new strategies, it's now possible that
> the message corresponding to the last offset is a losing record and needs
> to be removed.
>
> 53. Similarly, with the offset compaction strategy, if we see a
> non-tombstone record after a tombstone record, the non-tombstone record is
> always the winning one. However, with the new strategies, that
> non-tombstone record with a larger offset could be a losing record. The
> question is then how do we retain the tombstone long enough so that we
> could still recognize that the non-tombstone record should be ignored.
>
> Thanks,
>
> Jun
>
> -Original Message-
> From: Guozhang Wang 
> Sent: Tuesday, November 12, 2019 6:09 PM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Senthil,
>
> Let me try to re-iterate on Jun's comments with some context here:
>
> 51: today with the offset-only compaction strategy, the last record of the
> log (we call it the log-end-record, whose offset is log-end-offset) would
> always be preserved and not compacted. This is kinda important for
> replication since followers reason about the log-end-offset on the leader.
> Consider this case: three replicas of a partition, leader 1 and follower 2
> and 3.
>
> Leader 1 has records a, b, c, d and d is the current last record of the
> partition, the current log-end-offset is 3 (assuming record a's offset is
> 0).
> Follower 2 has replicated a, b, c, d. Log-end-offset is 3 Follower 3 has
> replicated a, b, c but not yet replicated d. Log-end-offset is 2.
>
> NOTE that compaction triggering is independent on each broker; it is
> possible that leader 1 triggers compaction and deletes record d, while
> other followers have not triggered compaction yet. At this moment the
> leader's log becomes a, b, c. Now let's say follower 3 fetches from the
> leader after the compaction, it will no longer see record d.
>
> Now suppose there's a leader migration and follower 3 becomes the new
> leader, it would accept new appends (say, it's e), and record e would be
> appended at *offset 3 *on new leader 3's log. But follower 2's offset 3's
> record is d still. Later let's say follower 2 also triggers compaction and
> also fetches the new record e from new leade

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Senthilnathan Muthusamy
Hi Gouzhang & Jun,

Can one of you please confirm/respond to the below mail so that I will go ahead 
and update the KIP and proceed.

Thanks
Senthil

- Senthil

From: Senthilnathan Muthusamy 
Sent: Wednesday, November 20, 2019 5:04:20 PM
To: dev@kafka.apache.org 
Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction



Hi Gouzhang & Jun,

Thanks for the details on the scenarios.

#51 => thanks for the details, Guozhang, with the example. Won't the followers
be syncing the LEO with the leader as well? If yes, always keeping the last
record (i.e. not compacting it under the non-offset strategies) would work, and
this is needed only if the new strategy ends up removing the LEO record, right?
Also I wasn't able to retrieve Jason's mail related to creating an empty
message... Can you please forward it if you have it? Wondering how that can
solve this particular issue unless we create a record for a random key that
won't conflict with the producer/consumer keys for that topic/partition.

#53 => I see that this can happen when a low produce rate keeps the log
ineligible for compaction for an unbounded duration, whereby
"delete.retention.ms" kicks in and removes the tombstone record. If that's the
case (please correct me if I am missing any other scenarios), then we can
suggest that Kafka users set "segment.ms" & "max.compaction.lag.ms" (as
compaction won't happen on the active segment) to be smaller than
"delete.retention.ms", and that should address this scenario, right?

Thanks,
Senthil

-Original Message-
From: Jun Rao 
Sent: Wednesday, November 13, 2019 9:31 AM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi, Seth,

51. The difference is that with the offset compaction strategy, the message 
corresponding to the last offset is always the winning record and will never be 
removed. But with the new strategies, it's now possible that the message 
corresponding to the last offset is a losing record and needs to be removed.

53. Similarly, with the offset compaction strategy, if we see a non-tombstone 
record after a tombstone record, the non-tombstone record is always the winning 
one. However, with the new strategies, that non-tombstone record with a larger 
offset could be a losing record. The question is then how do we retain the 
tombstone long enough so that we could still recognize that the non-tombstone 
record should be ignored.

Thanks,

Jun

-Original Message-
From: Guozhang Wang 
Sent: Tuesday, November 12, 2019 6:09 PM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hello Senthil,

Let me try to re-iterate on Jun's comments with some context here:

51: today with the offset-only compaction strategy, the last record of the log 
(we call it the log-end-record, whose offset is log-end-offset) would always be 
preserved and not compacted. This is kinda important for replication since 
followers reason about the log-end-offset on the leader.
Consider this case: three replicas of a partition, leader 1 and follower 2 and 
3.

Leader 1 has records a, b, c, d and d is the current last record of the 
partition, the current log-end-offset is 3 (assuming record a's offset is 0).
Follower 2 has replicated a, b, c, d. Log-end-offset is 3 Follower 3 has 
replicated a, b, c but not yet replicated d. Log-end-offset is 2.

NOTE that compaction triggering is independent on each broker; it is possible
that leader 1 triggers compaction and deletes record d, while other followers
have not triggered compaction yet. At this moment the leader's log becomes a,
b, c. Now let's say follower 3 fetches from the leader after the compaction, it
will no longer see record d.

Now suppose there's a leader migration and follower 3 becomes the new leader, 
it would accept new appends (say, it's e), and record e would be appended at 
*offset 3 *on new leader 3's log. But follower 2's offset 3's record is d 
still. Later let's say follower 2 also triggers compaction and also fetches the 
new record e from new leader 3:

Follower 2's log would be *a(0), b(1), c(2), e(4)* where the numbers in
brackets are offset numbers; while leader 3's log would be *a(0), b(1), c(2),
e(3)*. Now you see the two logs diverge in offsets, although their log entries
are the same.

-

One way to resolve this, is to simply never remove the last message during 
compaction. Another way (suggested by Jason in the old VOTE thread) is to 
create an empty message batch to "take up" that offset slot.


53: Again here's some context on when we can delete a tombstone (null):
during compaction, if we see the latest record for a certain key is a tombstone 
we can remove all old records BUT that tombstone itself cannot be removed 
immediately since the old records may already be fetched by some consumers and 
that tombstone may not be fetched by consumer yet. Also that tombstone may have 
not been repli

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-20 Thread Senthilnathan Muthusamy


Hi Gouzhang & Jun,

Thanks for the details on the scenarios.

#51 => thanks for the details, Guozhang, with the example. Won't the followers
be syncing the LEO with the leader as well? If yes, always keeping the last
record (i.e. not compacting it under the non-offset strategies) would work, and
this is needed only if the new strategy ends up removing the LEO record, right?
Also I wasn't able to retrieve Jason's mail related to creating an empty
message... Can you please forward it if you have it? Wondering how that can
solve this particular issue unless we create a record for a random key that
won't conflict with the producer/consumer keys for that topic/partition.

#53 => I see that this can happen when a low produce rate keeps the log
ineligible for compaction for an unbounded duration, whereby
"delete.retention.ms" kicks in and removes the tombstone record. If that's the
case (please correct me if I am missing any other scenarios), then we can
suggest that Kafka users set "segment.ms" & "max.compaction.lag.ms" (as
compaction won't happen on the active segment) to be smaller than
"delete.retention.ms", and that should address this scenario, right?

Thanks,
Senthil
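
(To make the #53 suggestion above concrete, here is a sketch of the
corresponding topic-level overrides; the topic name and the specific values
are hypothetical and workload-dependent, and on older brokers the --zookeeper
form of kafka-configs.sh may be required instead:

# Roll segments and force compaction well before tombstones become removable
# (delete.retention.ms defaults to 1 day = 86400000 ms).
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic --alter \
  --add-config segment.ms=3600000,max.compaction.lag.ms=21600000,delete.retention.ms=86400000
)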

-Original Message-
From: Jun Rao  
Sent: Wednesday, November 13, 2019 9:31 AM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi, Seth,

51. The difference is that with the offset compaction strategy, the message 
corresponding to the last offset is always the winning record and will never be 
removed. But with the new strategies, it's now possible that the message 
corresponding to the last offset is a losing record and needs to be removed.

53. Similarly, with the offset compaction strategy, if we see a non-tombstone 
record after a tombstone record, the non-tombstone record is always the winning 
one. However, with the new strategies, that non-tombstone record with a larger 
offset could be a losing record. The question is then how do we retain the 
tombstone long enough so that we could still recognize that the non-tombstone 
record should be ignored.

Thanks,

Jun

-Original Message-
From: Guozhang Wang  
Sent: Tuesday, November 12, 2019 6:09 PM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hello Senthil,

Let me try to re-iterate on Jun's comments with some context here:

51: today with the offset-only compaction strategy, the last record of the log 
(we call it the log-end-record, whose offset is log-end-offset) would always be 
preserved and not compacted. This is kinda important for replication since 
followers reason about the log-end-offset on the leader.
Consider this case: three replicas of a partition, leader 1 and follower 2 and 
3.

Leader 1 has records a, b, c, d and d is the current last record of the 
partition, the current log-end-offset is 3 (assuming record a's offset is 0).
Follower 2 has replicated a, b, c, d. Log-end-offset is 3 Follower 3 has 
replicated a, b, c but not yet replicated d. Log-end-offset is 2.

NOTE that compaction triggering is independent on each broker; it is possible
that leader 1 triggers compaction and deletes record d, while other followers
have not triggered compaction yet. At this moment the leader's log becomes a,
b, c. Now let's say follower 3 fetches from the leader after the compaction, it
will no longer see record d.

Now suppose there's a leader migration and follower 3 becomes the new leader, 
it would accept new appends (say, it's e), and record e would be appended at 
*offset 3 *on new leader 3's log. But follower 2's offset 3's record is d 
still. Later let's say follower 2 also triggers compaction and also fetches the 
new record e from new leader 3:

Follower 2's log would be *a(0), b(1), c(2), e(4)* where the numbers in
brackets are offset numbers; while leader 3's log would be *a(0), b(1), c(2),
e(3)*. Now you see the two logs diverge in offsets, although their log entries
are the same.

-

One way to resolve this, is to simply never remove the last message during 
compaction. Another way (suggested by Jason in the old VOTE thread) is to 
create an empty message batch to "take up" that offset slot.


53: Again here's some context on when we can delete a tombstone (null):
during compaction, if we see the latest record for a certain key is a tombstone 
we can remove all old records BUT that tombstone itself cannot be removed 
immediately since the old records may already be fetched by some consumers and 
that tombstone may not be fetched by consumer yet. Also that tombstone may have 
not been replicated to all other followers yet while the old records have 
already been replicated. Hence we have some config on the broker to "delay" the 
removal of the tombstone itself. You can find this config named
"delete.retention.ms" in https://kafka.apache.org/documentation/#brokerconfigs

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-13 Thread Jun Rao
Hi, Seth,

51. The difference is that with the offset compaction strategy, the message
corresponding to the last offset is always the winning record and will
never be removed. But with the new strategies, it's now possible that the
message corresponding to the last offset is a losing record and needs to be
removed.

53. Similarly, with the offset compaction strategy, if we see a
non-tombstone record after a tombstone record, the non-tombstone record is
always the winning one. However, with the new strategies, that
non-tombstone record with a larger offset could be a losing record. The
question is then how do we retain the tombstone long enough so that we
could still recognize that the non-tombstone record should be ignored.

Thanks,

Jun

On Mon, Nov 11, 2019 at 5:15 PM Senthilnathan Muthusamy
 wrote:

> Hi Jun,
>
> Thanks for the response and please find below the response!
>
> #50 - got it...
>
> #51 - not sure how the last record will be deleted because of this new
> compaction strategy. The reason I am asking is that the compaction is based
> on the offset map, and the new strategy logic is purely within the offset
> map... the offset map will always keep track of the latest offset
> irrespective of the compaction strategy. You can have a look at the PR for
> the new compaction strategy changes:
> https://github.com/apache/kafka/pull/7528/files
>
> #52 - sure, I have updated the JIRA to include these details in the wiki.
>
> #53 - as I pointed out in #51, the tombstone handling is orthogonal to this
> change (i.e. the tombstone is handled within the LogCleaner and the
> compaction strategy is applied by the offset map). This is my understanding
> of the tombstone handling based on the code walk-through... please let me
> know if I am missing anything here...
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:32 PM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for bringing back this KIP. Overall, this seems like a useful
> feature. A few comments below.
>
> 50. One use case for the timestamp based compaction is to resolve
> conflicts during data center failures. The failover of a data center
> typically takes much longer than a millisecond. So, the timestamp could be
> enough to determine the value to keep.
>
> 51. With the timestamp/header strategy, it seems that it may now be
> possible that the last record could be removed during compaction. For
> example, if the active segment is empty, the last record in the previous
> segment could be removed due to compaction. A new replica then won't see
> the true end offset of the partition. If that replica ever becomes the
> leader, it could write a different record on the same end offset, which
> will be weird.
>
> 52. With the timestamp/header strategy, the behavior of the application
> may need to change. In particular, the application can't just blindly take
> the record with a larger offset and assume that it's the value to keep.
> It needs to check the timestamp or the header now. So, it would be useful
> to at least document this.
>
> 53. This also adds complexity for deletes. Currently, we use a null
> payload to indicate a delete tombstone. The tombstone can be removed once
> all previous records with the same key have been removed. If the new
> strategies apply to tombstones, it's not clear when a tombstone can be
> removed since subsequent records could have timestamp/sequenceId smaller
> than that in the tombstone. It would be useful to think this through and
> document the expected behavior.
>
> Jun
>
> On Tue, Nov 5, 2019 at 11:37 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
>
> > Hi Guozhang,
> >
> > Sure and I have made a note in the JIRA item to make sure the wiki is
> > updated.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Guozhang Wang 
> > Sent: Monday, November 4, 2019 11:00 AM
> > To: dev 
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hello Senthilnathan,
> >
> > Thanks for revamping on the KIP. I have only one comment about the
> > wiki otherwise LGTM.
> >
> > 1. We should emphasize that the newly introduced config yields to the
> > existing "log.cleanup.policy", i.e. if the latter's value is `delete`
> > not `compact`, then the previous config would be ignored.
> >
> >
> > Guozhang
> >
> > On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> >
> > > Hi all,
> > >
> > > I will start the vote thread shortly for this updated KIP. If there
> > > are any

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-12 Thread Guozhang Wang
Hello Senthil,

Let me try to re-iterate on Jun's comments with some context here:

51: today with the offset-only compaction strategy, the last record of the
log (we call it the log-end-record, whose offset is log-end-offset) would
always be preserved and not compacted. This is kinda important for
replication since followers reason about the log-end-offset on the leader.
Consider this case: three replicas of a partition, leader 1 and follower 2
and 3.

Leader 1 has records a, b, c, d and d is the current last record of the
partition, the current log-end-offset is 3 (assuming record a's offset is
0).
Follower 2 has replicated a, b, c, d. Log-end-offset is 3
Follower 3 has replicated a, b, c but not yet replicated d. Log-end-offset
is 2.

NOTE that compaction triggering is independent on each broker; it is
possible that leader 1 triggers compaction and deletes record d, while
other followers have not triggered compaction yet. At this moment the
leader's log becomes a, b, c. Now let's say follower 3 fetches from the
leader after the compaction, it will no longer see record d.

Now suppose there's a leader migration and follower 3 becomes the new
leader, it would accept new appends (say, it's e), and record e would be
appended at *offset 3 *on new leader 3's log. But follower 2's offset 3's
record is d still. Later let's say follower 2 also triggers compaction and
also fetches the new record e from new leader 3:

Follower 2's log would be *a(0), b(1), c(2), e(4)* where the numbers in
brackets are offset numbers; while leader 3's log would be *a(0), b(1),
c(2), e(3)*. Now you see the two logs diverge in offsets, although their
log entries are the same.

-

One way to resolve this, is to simply never remove the last message during
compaction. Another way (suggested by Jason in the old VOTE thread) is to
create an empty message batch to "take up" that offset slot.


53: Again here's some context on when we can delete a tombstone (null):
during compaction, if we see the latest record for a certain key is a
tombstone we can remove all old records BUT that tombstone itself cannot be
removed immediately since the old records may already be fetched by some
consumers and that tombstone may not be fetched by consumer yet. Also that
tombstone may have not been replicated to all other followers yet while the
old records have already been replicated. Hence we have some config on the
broker to "delay" the removal of the tombstone itself. You can find this
config named "delete.retention.ms" in
https://kafka.apache.org/documentation/#brokerconfigs

Now consider the timestamp / header based compaction strategies: a later
record may still be deprecated by an earlier tombstone, so if that tombstone
has already been removed then the log compaction thread would not remove that
later record, and hence the logic would be broken. That's why we also need to
consider "delaying" the removal of the tombstone in this case.

Personally I think we can still piggy-back on the "delete.retention.ms"
config, since its default value is 86400000 ms == 1 day, and we just need to
document that if you have timestamp / header based compaction, then it's
YOUR responsibility as the Kafka user to make sure that the timestamp /
header out-of-ordering is smaller than the value of "delete.retention.ms".
Otherwise some later records with smaller timestamps / headers may not be
compacted correctly, since the tombstone is already gone and hence we no
longer have the "proof" to remove them.


Does that make sense to you?

Guozhang


On Tue, Nov 12, 2019 at 9:15 AM Senthilnathan Muthusamy
 wrote:

> Hi Jun,
>
> Thanks for the response and please find below the response!
>
> #50 - got it...
>
> #51 - not sure how the last record will be deleted because of this new
> compaction strategy. The reason I am asking is that the compaction is based
> on the offset map, and the new strategy logic is purely within the offset
> map... the offset map will always keep track of the latest offset
> irrespective of the compaction strategy. You can have a look at the PR for
> the new compaction strategy changes:
> https://github.com/apache/kafka/pull/7528/files
>
> #52 - sure, I have updated the JIRA to include these details in the wiki.
>
> #53 - as I pointed out in #51, the tombstone handling is orthogonal to this
> change (i.e. the tombstone is handled within the LogCleaner and the
> compaction strategy is applied by the offset map). This is my understanding
> of the tombstone handling based on the code walk-through... please let me
> know if I am missing anything here...
>
> Thanks,
> Senthil
>
> -----Original Message-
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:32 PM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for bringing back this KIP. Overall, this seems like a useful
> fe

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-11 Thread Senthilnathan Muthusamy
Hi Jun,

Thanks for the response and please find below the response!

#50 - got it...

#51 - not sure how the last record will be deleted because of this new
compaction strategy. The reason I am asking is that the compaction is based on
the offset map, and the new strategy logic is purely within the offset map...
the offset map will always keep track of the latest offset irrespective of the
compaction strategy. You can have a look at the PR for the new compaction
strategy changes: https://github.com/apache/kafka/pull/7528/files

#52 - sure, I have updated the JIRA to include these details in the wiki.

#53 - as I pointed out in #51, the tombstone handling is orthogonal to this
change (i.e. the tombstone is handled within the LogCleaner and the compaction
strategy is applied by the offset map). This is my understanding of the
tombstone handling based on the code walk-through... please let me know if I am
missing anything here...

Thanks,
Senthil

-Original Message-
From: Jun Rao  
Sent: Thursday, November 7, 2019 4:32 PM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi, Senthil,

Thanks for bringing back this KIP. Overall, this seems like a useful feature. A 
few comments below.

50. One use case for the timestamp based compaction is to resolve conflicts
during data center failures. The failover of a data center typically takes
much longer than a millisecond. So, the timestamp could be enough to determine
the value to keep.

51. With the timestamp/header strategy, it seems that it may now be possible 
that the last record could be removed during compaction. For example, if the 
active segment is empty, the last record in the previous segment could be 
removed due to compaction. A new replica then won't see the true end offset of 
the partition. If that replica ever becomes the leader, it could write a 
different record on the same end offset, which will be weird.

52. With the timestamp/header strategy, the behavior of the application may
need to change. In particular, the application can't just blindly take the
record with a larger offset and assume that it's the value to keep. It needs
to check the timestamp or the header now. So, it would be useful to at least
document this.

53. This also adds complexity for deletes. Currently, we use a null payload to 
indicate a delete tombstone. The tombstone can be removed once all previous 
records with the same key have been removed. If the new strategies apply to 
tombstones, it's not clear when a tombstone can be removed since subsequent 
records could have timestamp/sequenceId smaller than that in the tombstone. It 
would be useful to think this through and document the expected behavior.

Jun

On Tue, Nov 5, 2019 at 11:37 AM Senthilnathan Muthusamy 
 wrote:

> Hi Guozhang,
>
> Sure and I have made a note in the JIRA item to make sure the wiki is 
> updated.
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Guozhang Wang 
> Sent: Monday, November 4, 2019 11:00 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Senthilnathan,
>
> Thanks for revamping on the KIP. I have only one comment about the 
> wiki otherwise LGTM.
>
> 1. We should emphasize that the newly introduced config yields to the 
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` 
> not `compact`, then the previous config would be ignored.
>
>
> Guozhang
>
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy < 
> senth...@microsoft.com.invalid> wrote:
>
> > Hi all,
> >
> > I will start the vote thread shortly for this updated KIP. If there 
> > are any more thoughts I would love to hear them.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-----
> > From: Senthilnathan Muthusamy 
> > Sent: Thursday, October 31, 2019 3:51 AM
> > To: dev@kafka.apache.org
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Matthias
> >
> > Thanks for the response.
> >
> > (1) Yes
> >
> > (2) Yes, and the config name will be the same (i.e.
> > `log.cleaner.compaction.strategy` &
> > `log.cleaner.compaction.strategy.header`) at broker level and topic 
> > level (to override broker level default compact strategy). Please 
> > let me know if we need to keep it in different naming convention. Note:
> > Broker level (which will be in the server.properties) configuration 
> > is optional and default it to offset. Topic level configuration will 
> > be default to broker level config...
> >
> > (3) The new way avoids another config parameter, and also, in future, if
> > any new strategy like header needs additional info, no additional config
> > would be required. As this was already discussed and it was agreed to
> > have a separate config, I will revert it.
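
(For reference, under the naming discussed in point (2) quoted above, a
topic-level override might look like the sketch below; these strategy configs
are only proposed by this KIP and do not exist in released Kafka, and the
topic name and header key are hypothetical:

bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic --alter \
  --add-config log.cleaner.compaction.strategy=header,log.cleaner.compaction.strategy.header=version
)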

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-07 Thread Jun Rao
Hi, Senthil,

Thanks for bringing back this KIP. Overall, this seems like a useful
feature. A few comments below.

50. One use case for the timestamp based compaction is to resolve conflicts
during data center failures. The failover of a data center typically takes
much longer than a millisecond. So, the timestamp could be enough to
determine the value to keep.

51. With the timestamp/header strategy, it seems that it may now be
possible that the last record could be removed during compaction. For
example, if the active segment is empty, the last record in the previous
segment could be removed due to compaction. A new replica then won't see
the true end offset of the partition. If that replica ever becomes the
leader, it could write a different record on the same end offset, which
will be weird.

52. With the timestamp/header strategy, the behavior of the application may
need to change. In particular, the application can't just blindly take the
record with a larger offset and assume that it's the value to keep. It needs
to check the timestamp or the header now. So, it would be useful to
at least document this.

53. This also adds complexity for deletes. Currently, we use a null payload
to indicate a delete tombstone. The tombstone can be removed once all
previous records with the same key have been removed. If the new strategies
apply to tombstones, it's not clear when a tombstone can be removed since
subsequent records could have timestamp/sequenceId smaller than that in the
tombstone. It would be useful to think this through and document the
expected behavior.

Jun
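
To illustrate point 52 above, a minimal consumer-side sketch (the topic name,
group id, and the timestamp-wins rule are assumptions matching a
timestamp-based strategy; a header-based strategy would compare the configured
header value instead):

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LatestByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "kip280-example");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, ConsumerRecord<String, String>> latest = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-compacted-topic"));
            while (true) {
                for (ConsumerRecord<String, String> rec :
                        consumer.poll(Duration.ofMillis(500))) {
                    ConsumerRecord<String, String> prev = latest.get(rec.key());
                    // With timestamp-based compaction, a larger offset no longer
                    // implies the winning record; compare timestamps instead of
                    // blindly overwriting the previous value for this key.
                    if (prev == null || rec.timestamp() >= prev.timestamp()) {
                        latest.put(rec.key(), rec);
                    }
                }
            }
        }
    }
}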

On Tue, Nov 5, 2019 at 11:37 AM Senthilnathan Muthusamy
 wrote:

> Hi Guozhang,
>
> Sure and I have made a note in the JIRA item to make sure the wiki is
> updated.
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Guozhang Wang 
> Sent: Monday, November 4, 2019 11:00 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Senthilnathan,
>
> Thanks for revamping on the KIP. I have only one comment about the wiki
> otherwise LGTM.
>
> 1. We should emphasize that the newly introduced config yields to the
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` not
> `compact`, then the previous config would be ignored.
>
>
> Guozhang
>
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
>
> > Hi all,
> >
> > I will start the vote thread shortly for this updated KIP. If there
> > are any more thoughts I would love to hear them.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-----
> > From: Senthilnathan Muthusamy 
> > Sent: Thursday, October 31, 2019 3:51 AM
> > To: dev@kafka.apache.org
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Matthias
> >
> > Thanks for the response.
> >
> > (1) Yes
> >
> > (2) Yes, and the config name will be the same (i.e.
> > `log.cleaner.compaction.strategy` &
> > `log.cleaner.compaction.strategy.header`) at broker level and topic
> > level (to override broker level default compact strategy). Please let
> > me know if we need to keep it in different naming convention. Note:
> > Broker level (which will be in the server.properties) configuration is
> > optional and default it to offset. Topic level configuration will be
> > default to broker level config...
> >
> > (3) The new way avoids another config parameter, and also, in future, if
> > any new strategy like header needs additional info, no additional config
> > would be required. As this was already discussed and it was agreed to
> > have a separate config, I will revert it. KIP updated...
> >
> > (4) Done
> >
> > (5) Updated
> >
> > (6) Updated to pick the first header in the list
> >
> > Please let me know if you have any other questions.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Matthias J. Sax 
> > Sent: Thursday, October 31, 2019 12:13 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Thanks for picking up this KIP, Senthil.
> >
> > (1) As far as I remember, the main issue of the original proposal was
> > a missing topic level configuration for the compaction strategy. With
> > this being addressed, I am in favor of this KIP.
> >
> > (2) With regard to (1), it seems we would need a new topic level
> > config `compaction.strategy`, and `log.cleaner.compaction.strategy`
> > would be the default strategy (ie, broker level config) if a topic does
> not overwrite it?
> >
> > (3) Why did 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Matthias J. Sax
Thanks for updating the KIP, Senthil.

@Eric: good point about using the last found header for the key instead
of the first!
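
For reference, a small sketch of what "last found header" lookup means with
the Java client's Headers API (the header key "version" and the 8-byte long
encoding of its value are hypothetical; the broker-side cleaner would apply
the same pick-the-last-occurrence semantics):

import java.nio.ByteBuffer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

final class CompactionKeyLookup {
    // Picks the LAST occurrence of the header if a record carries the key more
    // than once, matching the semantics suggested above.
    static long compactionVersion(ConsumerRecord<?, ?> record) {
        Header header = record.headers().lastHeader("version"); // last, not first
        if (header == null || header.value() == null) {
            return -1L; // no usable header on this record
        }
        return ByteBuffer.wrap(header.value()).getLong();
    }
}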

I don't have any further comments at this point.


-Matthias

On 11/5/19 11:37 AM, Senthilnathan Muthusamy wrote:
> Hi Guozhang,
> 
> Sure and I have made a note in the JIRA item to make sure the wiki is updated.
> 
> Thanks,
> Senthil
> 
> -Original Message-
> From: Guozhang Wang  
> Sent: Monday, November 4, 2019 11:00 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hello Senthilnathan,
> 
> Thanks for revamping on the KIP. I have only one comment about the wiki 
> otherwise LGTM.
> 
> 1. We should emphasize that the newly introduced config yields to the 
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` not 
> `compact`, then the previous config would be ignored.
> 
> 
> Guozhang
> 
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy 
>  wrote:
> 
>> Hi all,
>>
>> I will start the vote thread shortly for this updated KIP. If there 
>> are any more thoughts I would love to hear them.
>>
>> Thanks,
>> Senthil
>>
>> -----Original Message-
>> From: Senthilnathan Muthusamy 
>> Sent: Thursday, October 31, 2019 3:51 AM
>> To: dev@kafka.apache.org
>> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> Hi Matthias
>>
>> Thanks for the response.
>>
>> (1) Yes
>>
>> (2) Yes, and the config name will be the same (i.e.
>> `log.cleaner.compaction.strategy` &
>> `log.cleaner.compaction.strategy.header`) at broker level and topic 
>> level (to override broker level default compact strategy). Please let 
>> me know if we need to keep it in different naming convention. Note: 
>> Broker level (which will be in the server.properties) configuration is 
>> optional and default it to offset. Topic level configuration will be 
>> default to broker level config...
>>
>> (3) The new way avoids another config parameter, and also, in future, if
>> any new strategy like header needs additional info, no additional config
>> would be required. As this was already discussed and it was agreed to
>> have a separate config, I will revert it. KIP updated...
>>
>> (4) Done
>>
>> (5) Updated
>>
>> (6) Updated to pick the first header in the list
>>
>> Please let me know if you have any other questions.
>>
>> Thanks,
>> Senthil
>>
>> -Original Message-
>> From: Matthias J. Sax 
>> Sent: Thursday, October 31, 2019 12:13 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> Thanks for picking up this KIP, Senthil.
>>
>> (1) As far as I remember, the main issue of the original proposal was 
>> a missing topic level configuration for the compaction strategy. With 
>> this being addressed, I am in favor of this KIP.
>>
>> (2) With regard to (1), it seems we would need a new topic level 
>> config `compaction.strategy`, and `log.cleaner.compaction.strategy` 
>> would be the default strategy (ie, broker level config) if a topic does not 
>> overwrite it?
>>
>> (3) Why did you remove `log.cleaner.compaction.strategy.header`
>> parameter and change the accepted values of 
>> `log.cleaner.compaction.strategy` to "header." instead of keeping 
>> "header"? The original approach seems to be cleaner, and I think this 
>> was discussed on the original discuss thread already.
>>
>> (4) Nit: For the "timestamp" compaction strategy you changed the KIP 
>> to
>>
>> -> `The record [create] timestamp`
>>
>> This is misleading IMHO, because the actual record timestamp depends on
>> the broker/log configuration `(log.)message.timestamp.type`, which can be
>> either `CreateTime` or `LogAppendTime`. I would just remove "create" to
>> keep it unspecified.
>>
>> (5) Nit: the section "Public Interfaces" should list the newly 
>> introduced configs -- configuration parameters are a public interface.
>>
>> (6) What do you mean by "first level header lookup"? The term "first 
>> level" indicates some hierarchy, but headers don't have any hierarchy 
>> -- it's just a list of key-value pairs? If you mean the _order_ of the 
>> headers, ie, pick the first header in the list that matches the key, 
>> please rephrase it to make it clearer.
>>
>>
>>
>> @Tom: I agree with all you are saying, however, I still think that 
>

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Senthilnathan Muthusamy
Hi Guozhang,

Sure and I have made a note in the JIRA item to make sure the wiki is updated.

Thanks,
Senthil

-Original Message-
From: Guozhang Wang  
Sent: Monday, November 4, 2019 11:00 AM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hello Senthilnathan,

Thanks for revamping the KIP. I have only one comment about the wiki;
otherwise LGTM.

1. We should emphasize that the newly introduced config yields to the existing 
"log.cleanup.policy", i.e. if the latter's value is `delete` not `compact`, 
then the previous config would be ignored.
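
(A broker-level sketch of that interaction; the log.cleaner.compaction.strategy*
names are the ones proposed by this KIP and are not configs that exist in
released Kafka, and the header key "version" is hypothetical:

# server.properties sketch: the strategy settings below only take effect when
# the cleanup policy is "compact"; with "delete" they would be ignored.
log.cleanup.policy=compact
log.cleaner.compaction.strategy=header
log.cleaner.compaction.strategy.header=version
)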


Guozhang

On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy 
 wrote:

> Hi all,
>
> I will start the vote thread shortly for this updated KIP. If there 
> are any more thoughts I would love to hear them.
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Senthilnathan Muthusamy 
> Sent: Thursday, October 31, 2019 3:51 AM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi Matthias
>
> Thanks for the response.
>
> (1) Yes
>
> (2) Yes, and the config name will be the same (i.e.
> `log.cleaner.compaction.strategy` &
> `log.cleaner.compaction.strategy.header`) at broker level and topic 
> level (to override broker level default compact strategy). Please let 
> me know if we need to keep it in different naming convention. Note: 
> Broker level (which will be in the server.properties) configuration is 
> optional and defaults to offset. Topic level configuration will 
> default to the broker level config...
>
> (3) This new approach avoids another config parameter, and in future, if 
> any new strategy like header needs additional info, no 
> additional config is required. As this got discussed already and agreed 
> to have a separate config, I will revert it. KIP updated...
>
> (4) Done
>
> (5) Updated
>
> (6) Updated to pick the first header in the list
>
> Please let me know if you have any other questions.
>
> Thanks,
> Senthil
>
> -----Original Message-
> From: Matthias J. Sax 
> Sent: Thursday, October 31, 2019 12:13 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Thanks for picking up this KIP, Senthil.
>
> (1) As far as I remember, the main issue of the original proposal was 
> a missing topic level configuration for the compaction strategy. With 
> this being addressed, I am in favor of this KIP.
>
> (2) With regard to (1), it seems we would need a new topic level 
> config `compaction.strategy`, and `log.cleaner.compaction.strategy` 
> would be the default strategy (ie, broker level config) if a topic does not 
> overwrite it?
>
> (3) Why did you remove `log.cleaner.compaction.strategy.header`
> parameter and change the accepted values of 
> `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping 
> "header"? The original approach seems to be cleaner, and I think this 
> was discussed on the original discuss thread already.
>
> (4) Nit: For the "timestamp" compaction strategy you changed the KIP 
> to
>
> -> `The record [create] timestamp`
>
> This is misleading IMHO, because it depends on the broker/log 
> configuration `(log.)message.timestamp.type` that can either be 
> `CreateTime` or `LogAppendTime` what the actual record timestamp is. I 
> would just remove "create" to keep it unspecified.
>
> (5) Nit: the section "Public Interfaces" should list the newly 
> introduced configs -- configuration parameters are a public interface.
>
> (6) What do you mean by "first level header lookup"? The term "first 
> level" indicates some hierarchy, but headers don't have any hierarchy 
> -- it's just a list of key-value pairs? If you mean the _order_ of the 
> headers, ie, pick the first header in the list that matches the key, 
> please rephrase it to make it clearer.
>
>
>
> @Tom: I agree with all you are saying, however, I still think that 
> this KIP will improve the overall situation, because everything you 
> pointed out is actually true with offset based compaction, too.
>
> The KIP is not a silver bullet that solves all issues for interleaved 
> writes, but I personally believe, it's a good improvement.
>
>
>
> -Matthias
>
>
> On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> > Hi,
> >
> > Please let me know if anyone has any questions on this updated KIP-280...
> >
> > Thanks,
> >
> > Senthil
> >
> > -Original Message-
> > From: Senthilnathan Muthusamy 
> > Sent: Monday, October 28, 2019 11:36 PM
> > To: dev@kafka.apache.org
> > Subject: RE

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Senthilnathan Muthusamy
Thanks for pointing it out Eric. Updated the KIP...

Regards,
Senthil

-Original Message-
From: Guozhang Wang  
Sent: Monday, November 4, 2019 11:52 AM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Eric,

I think that's a good point, in `Headers.java` we also designed the API to only 
have `Header lastHeader(String key);`. I think picking the last header for that 
key would make more sense since internally it is organized as a list, such that 
newer headers can be considered as "overwriting" the older headers.


Guozhang

On Mon, Nov 4, 2019 at 11:31 AM Eric Azama  wrote:

> Hi Senthilnathan,
>
> Regarding Matthias's point 6, what is the reasoning for choosing the 
> first occurrence of the configured header? I believe this corresponds 
> to the oldest value for given key. If there are multiple values for a 
> key, it seems more intuitive that the newest value is the one that 
> should be used for compaction.
>
> Thanks,
> Eric
>
> On Mon, Nov 4, 2019 at 11:00 AM Guozhang Wang  wrote:
>
> > Hello Senthilnathan,
> >
> > Thanks for revamping on the KIP. I have only one comment about the 
> > wiki otherwise LGTM.
> >
> > 1. We should emphasize that the newly introduced config yields to 
> > the existing "log.cleanup.policy", i.e. if the latter's value is 
> > `delete` not `compact`, then the previous config would be ignored.
> >
> >
> > Guozhang
> >
> > On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy 
> >  wrote:
> >
> > > Hi all,
> > >
> > > I will start the vote thread shortly for this updated KIP. If 
> > > there are any more thoughts I would love to hear them.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -Original Message-
> > > From: Senthilnathan Muthusamy 
> > > Sent: Thursday, October 31, 2019 3:51 AM
> > > To: dev@kafka.apache.org
> > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Matthias
> > >
> > > Thanks for the response.
> > >
> > > (1) Yes
> > >
> > > (2) Yes, and the config name will be the same (i.e.
> > > `log.cleaner.compaction.strategy` &
> > > `log.cleaner.compaction.strategy.header`) at broker level and 
> > > topic
> level
> > > (to override broker level default compact strategy). Please let me 
> > > know
> > if
> > > we need to keep it in different naming convention. Note: Broker 
> > > level (which will be in the server.properties) configuration is 
> > > optional and defaults to offset. Topic level configuration will 
> > > default to the broker 
> > > level config...
> > >
> > > (3) This new approach avoids another config parameter, and in future, 
> > > if any new strategy like header needs additional info, no additional 
> > > config is required. As this got discussed already and agreed to have a 
> > > separate config, I will revert it. KIP updated...
> > >
> > > (4) Done
> > >
> > > (5) Updated
> > >
> > > (6) Updated to pick the first header in the list
> > >
> > > Please let me know if you have any other questions.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -Original Message-
> > > From: Matthias J. Sax 
> > > Sent: Thursday, October 31, 2019 12:13 AM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Thanks for picking up this KIP, Senthil.
> > >
> > > (1) As far as I remember, the main issue of the original proposal 
> > > was a missing topic level configuration for the compaction 
> > > strategy. With
> this
> > > being addressed, I am in favor of this KIP.
> > >
> > > (2) With regard to (1), it seems we would need a new topic level 
> > > config `compaction.strategy`, and 
> > > `log.cleaner.compaction.strategy` would be
> the
> > > default strategy (ie, broker level config) if a topic does not
> overwrite
> > it?
> > >
> > > (3) Why did you remove `log.cleaner.compaction.strategy.header`
> > > parameter and change the accepted values of 
> > > `log.cleaner.compaction.strategy` to "header.<key>" instead of 
> > > keeping "header"? The original approach seems to be cleaner, and I 
> > > think this
> was
> > > discussed on the original discuss thread already.
> > >
> > > (4) Nit: F

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Guozhang Wang
Eric,

I think that's a good point, in `Headers.java` we also designed the API to
only have `Header lastHeader(String key);`. I think picking the last header
for that key would make more sense since internally it is organized as a
list, such that newer headers can be considered as "overwriting" the older headers.


Guozhang
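
To make the `Headers` semantics concrete, here is a minimal sketch of the API
behaviour being discussed; the header name "version" and the topic/key names are
only illustrative, not something the KIP fixes:

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.header.Header;

    public class HeaderLookupSketch {
        public static void main(String[] args) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("compacted-topic", "user-42", "payload");
            // Headers is an ordered list, so adding the same key twice keeps both entries...
            record.headers().add("version", "1".getBytes(StandardCharsets.UTF_8));
            record.headers().add("version", "2".getBytes(StandardCharsets.UTF_8));
            // ...and lastHeader() resolves to the newest one, which is why picking the
            // last (rather than the first) matching header lines up with the existing API.
            Header latest = record.headers().lastHeader("version");
            System.out.println(new String(latest.value(), StandardCharsets.UTF_8)); // prints 2
        }
    }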

On Mon, Nov 4, 2019 at 11:31 AM Eric Azama  wrote:

> Hi Senthilnathan,
>
> Regarding Matthias's point 6, what is the reasoning for choosing the first
> occurrence of the configured header? I believe this corresponds to the
> oldest value for given key. If there are multiple values for a key, it
> seems more intuitive that the newest value is the one that should be used
> for compaction.
>
> Thanks,
> Eric
>
> On Mon, Nov 4, 2019 at 11:00 AM Guozhang Wang  wrote:
>
> > Hello Senthilnathan,
> >
> > Thanks for revamping on the KIP. I have only one comment about the wiki
> > otherwise LGTM.
> >
> > 1. We should emphasize that the newly introduced config yields to the
> > existing "log.cleanup.policy", i.e. if the latter's value is `delete` not
> > `compact`, then the previous config would be ignored.
> >
> >
> > Guozhang
> >
> > On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy
> >  wrote:
> >
> > > Hi all,
> > >
> > > I will start the vote thread shortly for this updated KIP. If there are
> > > any more thoughts I would love to hear them.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -Original Message-
> > > From: Senthilnathan Muthusamy 
> > > Sent: Thursday, October 31, 2019 3:51 AM
> > > To: dev@kafka.apache.org
> > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Matthias
> > >
> > > Thanks for the response.
> > >
> > > (1) Yes
> > >
> > > (2) Yes, and the config name will be the same (i.e.
> > > `log.cleaner.compaction.strategy` &
> > > `log.cleaner.compaction.strategy.header`) at broker level and topic
> level
> > > (to override broker level default compact strategy). Please let me know
> > if
> > > we need to keep it in different naming convention. Note: Broker level
> > > (which will be in the server.properties) configuration is optional and
> > > defaults to offset. Topic level configuration will default to the 
> > > broker level config...
> > >
> > > (3) This new approach avoids another config parameter, and in future, 
> > > if any new strategy like header needs additional info, no additional 
> > > config is required. As this got discussed already and agreed to have a 
> > > separate config, I will revert it. KIP updated...
> > >
> > > (4) Done
> > >
> > > (5) Updated
> > >
> > > (6) Updated to pick the first header in the list
> > >
> > > Please let me know if you have any other questions.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -Original Message-
> > > From: Matthias J. Sax 
> > > Sent: Thursday, October 31, 2019 12:13 AM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Thanks for picking up this KIP, Senthil.
> > >
> > > (1) As far as I remember, the main issue of the original proposal was a
> > > missing topic level configuration for the compaction strategy. With
> this
> > > being addressed, I am in favor of this KIP.
> > >
> > > (2) With regard to (1), it seems we would need a new topic level config
> > > `compaction.strategy`, and `log.cleaner.compaction.strategy` would be
> the
> > > default strategy (ie, broker level config) if a topic does not
> overwrite
> > it?
> > >
> > > (3) Why did you remove `log.cleaner.compaction.strategy.header`
> > > parameter and change the accepted values of
> > > `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
> > > "header"? The original approach seems to be cleaner, and I think this
> was
> > > discussed on the original discuss thread already.
> > >
> > > (4) Nit: For the "timestamp" compaction strategy you changed the KIP to
> > >
> > > -> `The record [create] timestamp`
> > >
> > > This is misleading IMHO, because it depends on the broker/log
> > > configuration `(log.)message.timestamp.type` that can either be
> > > `CreateTi

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Eric Azama
Hi Senthilnathan,

Regarding Matthias's point 6, what is the reasoning for choosing the first
occurrence of the configured header? I believe this corresponds to the
oldest value for a given key. If there are multiple values for a key, it
seems more intuitive that the newest value is the one that should be used
for compaction.

Thanks,
Eric

On Mon, Nov 4, 2019 at 11:00 AM Guozhang Wang  wrote:

> Hello Senthilnathan,
>
> Thanks for revamping on the KIP. I have only one comment about the wiki
> otherwise LGTM.
>
> 1. We should emphasize that the newly introduced config yields to the
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` not
> `compact`, then the previous config would be ignored.
>
>
> Guozhang
>
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy
>  wrote:
>
> > Hi all,
> >
> > I will start the vote thread shortly for this updated KIP. If there are
> > any more thoughts I would love to hear them.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Senthilnathan Muthusamy 
> > Sent: Thursday, October 31, 2019 3:51 AM
> > To: dev@kafka.apache.org
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Matthias
> >
> > Thanks for the response.
> >
> > (1) Yes
> >
> > (2) Yes, and the config name will be the same (i.e.
> > `log.cleaner.compaction.strategy` &
> > `log.cleaner.compaction.strategy.header`) at broker level and topic level
> > (to override broker level default compact strategy). Please let me know
> if
> > we need to keep it in different naming convention. Note: Broker level
> > (which will be in the server.properties) configuration is optional and
> > defaults to offset. Topic level configuration will default to the broker
> > level config...
> >
> > (3) This new approach avoids another config parameter, and in future, if
> > any new strategy like header needs additional info, no additional
> > config is required. As this got discussed already and agreed to have a
> > separate config, I will revert it. KIP updated...
> >
> > (4) Done
> >
> > (5) Updated
> >
> > (6) Updated to pick the first header in the list
> >
> > Please let me know if you have any other questions.
> >
> > Thanks,
> > Senthil
> >
> > -Original Message-
> > From: Matthias J. Sax 
> > Sent: Thursday, October 31, 2019 12:13 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Thanks for picking up this KIP, Senthil.
> >
> > (1) As far as I remember, the main issue of the original proposal was a
> > missing topic level configuration for the compaction strategy. With this
> > being addressed, I am in favor of this KIP.
> >
> > (2) With regard to (1), it seems we would need a new topic level config
> > `compaction.strategy`, and `log.cleaner.compaction.strategy` would be the
> > default strategy (ie, broker level config) if a topic does not overwrite
> it?
> >
> > (3) Why did you remove `log.cleaner.compaction.strategy.header`
> > parameter and change the accepted values of
> > `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
> > "header"? The original approach seems to be cleaner, and I think this was
> > discussed on the original discuss thread already.
> >
> > (4) Nit: For the "timestamp" compaction strategy you changed the KIP to
> >
> > -> `The record [create] timestamp`
> >
> > This is misleading IMHO, because it depends on the broker/log
> > configuration `(log.)message.timestamp.type` that can either be
> > `CreateTime` or `LogAppendTime` what the actual record timestamp is. I
> > would just remove "create" to keep it unspecified.
> >
> > (5) Nit: the section "Public Interfaces" should list the newly introduced
> > configs -- configuration parameters are a public interface.
> >
> > (6) What do you mean by "first level header lookup"? The term "first
> > level" indicates some hierarchy, but headers don't have any hierarchy --
> > it's just a list of key-value pairs? If you mean the _order_ of the
> > headers, ie, pick the first header in the list that matches the key,
> please
> > rephrase it to make it clearer.
> >
> >
> >
> > @Tom: I agree with all you are saying, however, I still think that this
> > KIP will improve the overall situation, because everything you pointed
> out
> > is actuall

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Guozhang Wang
Hello Senthilnathan,

Thanks for revamping the KIP. I have only one comment about the wiki
otherwise LGTM.

1. We should emphasize that the newly introduced config yields to the
existing "log.cleanup.policy", i.e. if the latter's value is `delete` not
`compact`, then the previous config would be ignored.


Guozhang
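
For illustration, a rough sketch of how a topic might be configured once the KIP
lands. `cleanup.policy` is an existing config; `compaction.strategy` is only the
topic-level name proposed in this thread, so treat it (and the example topic name)
as hypothetical:

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class CompactionStrategyConfigSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-profiles");
                Collection<AlterConfigOp> ops = Arrays.asList(
                        // The strategy only matters when the topic is actually compacted;
                        // with cleanup.policy=delete the strategy config would simply be ignored.
                        new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact"),
                                AlterConfigOp.OpType.SET),
                        // Proposed (not yet released) topic-level override of the broker
                        // default log.cleaner.compaction.strategy.
                        new AlterConfigOp(new ConfigEntry("compaction.strategy", "header"),
                                AlterConfigOp.OpType.SET));
                admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
            }
        }
    }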

On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy
 wrote:

> Hi all,
>
> I will start the vote thread shortly for this updated KIP. If there are
> any more thoughts I would love to hear them.
>
> Thanks,
> Senthil
>
> -Original Message-
> From: Senthilnathan Muthusamy 
> Sent: Thursday, October 31, 2019 3:51 AM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi Matthias
>
> Thanks for the response.
>
> (1) Yes
>
> (2) Yes, and the config name will be the same (i.e.
> `log.cleaner.compaction.strategy` &
> `log.cleaner.compaction.strategy.header`) at broker level and topic level
> (to override broker level default compact strategy). Please let me know if
> we need to keep it in different naming convention. Note: Broker level
> (which will be in the server.properties) configuration is optional and
> defaults to offset. Topic level configuration will default to the broker
> level config...
>
> (3) This new approach avoids another config parameter, and in future, if
> any new strategy like header needs additional info, no additional
> config is required. As this got discussed already and agreed to have a separate
> config, I will revert it. KIP updated...
>
> (4) Done
>
> (5) Updated
>
> (6) Updated to pick the first header in the list
>
> Please let me know if you have any other questions.
>
> Thanks,
> Senthil
>
> -----Original Message-
> From: Matthias J. Sax 
> Sent: Thursday, October 31, 2019 12:13 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Thanks for picking up this KIP, Senthil.
>
> (1) As far as I remember, the main issue of the original proposal was a
> missing topic level configuration for the compaction strategy. With this
> being addressed, I am in favor of this KIP.
>
> (2) With regard to (1), it seems we would need a new topic level config
> `compaction.strategy`, and `log.cleaner.compaction.strategy` would be the
> default strategy (ie, broker level config) if a topic does not overwrite it?
>
> (3) Why did you remove `log.cleaner.compaction.strategy.header`
> parameter and change the accepted values of
> `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
> "header"? The original approach seems to be cleaner, and I think this was
> discussed on the original discuss thread already.
>
> (4) Nit: For the "timestamp" compaction strategy you changed the KIP to
>
> -> `The record [create] timestamp`
>
> This is misleading IMHO, because it depends on the broker/log
> configuration `(log.)message.timestamp.type` that can either be
> `CreateTime` or `LogAppendTime` what the actual record timestamp is. I
> would just remove "create" to keep it unspecified.
>
> (5) Nit: the section "Public Interfaces" should list the newly introduced
> configs -- configuration parameters are a public interface.
>
> (6) What do you mean by "first level header lookup"? The term "first
> level" indicates some hierarchy, but headers don't have any hierarchy --
> it's just a list of key-value pairs? If you mean the _order_ of the
> headers, ie, pick the first header in the list that matches the key, please
> rephrase it to make it clearer.
>
>
>
> @Tom: I agree with all you are saying, however, I still think that this
> KIP will improve the overall situation, because everything you pointed out
> is actually true with offset based compaction, too.
>
> The KIP is not a silver bullet that solves all issues for interleaved
> writes, but I personally believe, it's a good improvement.
>
>
>
> -Matthias
>
>
> On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> > Hi,
> >
> > Please let me know if anyone has any questions on this updated KIP-280...
> >
> > Thanks,
> >
> > Senthil
> >
> > -Original Message-
> > From: Senthilnathan Muthusamy 
> > Sent: Monday, October 28, 2019 11:36 PM
> > To: dev@kafka.apache.org
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Tom,
> >
> > Sorry for the delayed response.
> >
> > Regarding the fallback-to-offset decision for both timestamp & header
> > value: it is based on the previous author's discussion
> > https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Senthilnathan Muthusamy
Hi all,

I will start the vote thread shortly for this updated KIP. If there are any 
more thoughts I would love to hear them.

Thanks,
Senthil

-Original Message-
From: Senthilnathan Muthusamy  
Sent: Thursday, October 31, 2019 3:51 AM
To: dev@kafka.apache.org
Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction

Hi Matthias

Thanks for the response.

(1) Yes

(2) Yes, and the config name will be the same (i.e. 
`log.cleaner.compaction.strategy` & `log.cleaner.compaction.strategy.header`) 
at broker level and topic level (to override broker level default compact 
strategy). Please let me know if we need to keep it in different naming 
convention. Note: Broker level (which will be in the server.properties) 
configuration is optional and defaults to offset. Topic level configuration 
will default to the broker level config...

(3) This new approach avoids another config parameter, and in future, if any new 
strategy like header needs additional info, no additional config is required. 
As this got discussed already and agreed to have a separate config, I will revert 
it. KIP updated...

(4) Done

(5) Updated

(6) Updated to pick the first header in the list

Please let me know if you have any other questions.

Thanks,
Senthil

-Original Message-
From: Matthias J. Sax 
Sent: Thursday, October 31, 2019 12:13 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Thanks for picking up this KIP, Senthil.

(1) As far as I remember, the main issue of the original proposal was a missing 
topic level configuration for the compaction strategy. With this being 
addressed, I am in favor of this KIP.

(2) With regard to (1), it seems we would need a new topic level config 
`compaction.strategy`, and `log.cleaner.compaction.strategy` would be the 
default strategy (ie, broker level config) if a topic does not overwrite it?

(3) Why did you remove `log.cleaner.compaction.strategy.header`
parameter and change the accepted values of `log.cleaner.compaction.strategy` 
to "header." instead of keeping "header"? The original approach seems to 
be cleaner, and I think this was discussed on the original discuss thread 
already.

(4) Nit: For the "timestamp" compaction strategy you changed the KIP to

-> `The record [create] timestamp`

This is misleading IMHO, because it depends on the broker/log configuration 
`(log.)message.timestamp.type` that can either be `CreateTime` or 
`LogAppendTime` what the actual record timestamp is. I would just remove 
"create" to keep it unspecified.

(5) Nit: the section "Public Interfaces" should list the newly introduced 
configs -- configuration parameters are a public interface.

(6) What do you mean by "first level header lookup"? The term "first level" 
indicates some hierarchy, but headers don't have any hierarchy -- it's just a 
list of key-value pairs? If you mean the _order_ of the headers, ie, pick the 
first header in the list that matches the key, please rephrase it to make it 
clearer.



@Tom: I agree with all you are saying, however, I still think that this KIP 
will improve the overall situation, because everything you pointed out is 
actually true with offset based compaction, too.

The KIP is not a silver bullet that solves all issues for interleaved writes, 
but I personally believe, it's a good improvement.



-Matthias


On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> Hi,
> 
> Please let me know if anyone has any questions on this updated KIP-280...
> 
> Thanks,
> 
> Senthil
> 
> -Original Message-
> From: Senthilnathan Muthusamy 
> Sent: Monday, October 28, 2019 11:36 PM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Tom,
> 
> Sorry for the delayed response.
> 
> Regarding the fallback-to-offset decision for both timestamp & header value:
> it is based on the previous author's discussion
> https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E
> and as per that discussion, it is really required to avoid duplicates.
> 
> And the timestamp strategy is from the original KIP author and we are keeping 
> it as is.
> 
> Finally, on the sequence order guarantee by the producer: it is not feasible 
> to wait for the ack in async / multi-threaded / multi-process scenarios, hence the 
> header-sequence-based compaction strategy, with the producer responsible for 
> generating a unique sequence at the topic-partition-key level.
> 
> Hoping this clarifies all your questions. Pl

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-31 Thread Senthilnathan Muthusamy
Hi Matthias

Thanks for the response.

(1) Yes

(2) Yes, and the config name will be the same (i.e. 
`log.cleaner.compaction.strategy` & `log.cleaner.compaction.strategy.header`) 
at broker level and topic level (to override broker level default compact 
strategy). Please let me know if we need to keep it in different naming 
convention. Note: Broker level (which will be in the server.properties) 
configuration is optional and defaults to offset. Topic level configuration 
will default to the broker level config...

(3) This new approach avoids another config parameter, and in future, if any new 
strategy like header needs additional info, no additional config is required. 
As this got discussed already and agreed to have a separate config, I will revert 
it. KIP updated...

(4) Done

(5) Updated

(6) Updated to pick the first header in the list

Please let me know if you have any other questions.

Thanks,
Senthil

-Original Message-
From: Matthias J. Sax 
Sent: Thursday, October 31, 2019 12:13 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Thanks for picking up this KIP, Senthil.

(1) As far as I remember, the main issue of the original proposal was a missing 
topic level configuration for the compaction strategy. With this being 
addressed, I am in favor of this KIP.

(2) With regard to (1), it seems we would need a new topic level config 
`compaction.strategy`, and `log.cleaner.compaction.strategy` would be the 
default strategy (ie, broker level config) if a topic does not overwrite it?

(3) Why did you remove `log.cleaner.compaction.strategy.header`
parameter and change the accepted values of `log.cleaner.compaction.strategy` 
to "header." instead of keeping "header"? The original approach seems to 
be cleaner, and I think this was discussed on the original discuss thread 
already.

(4) Nit: For the "timestamp" compaction strategy you changed the KIP to

-> `The record [create] timestamp`

This is misleading IMHO, because it depends on the broker/log configuration 
`(log.)message.timestamp.type` that can either be `CreateTime` or 
`LogAppendTime` what the actual record timestamp is. I would just remove 
"create" to keep it unspecified.

(5) Nit: the section "Public Interfaces" should list the newly introduced 
configs -- configuration parameters are a public interface.

(6) What do you mean by "first level header lookup"? The term "first level" 
indicates some hierarchy, but headers don't have any hierarchy -- it's just a 
list of key-value pairs? If you mean the _order_ of the headers, ie, pick the 
first header in the list that matches the key, please rephrase it to make it 
clearer.



@Tom: I agree with all you are saying, however, I still think that this KIP 
will improve the overall situation, because everything you pointed out is 
actually true with offset based compaction, too.

The KIP is not a silver bullet that solves all issues for interleaved writes, 
but I personally believe, it's a good improvement.



-Matthias


On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> Hi,
> 
> Please let me know if anyone has any questions on this updated KIP-280...
> 
> Thanks,
> 
> Senthil
> 
> -Original Message-
> From: Senthilnathan Muthusamy 
> Sent: Monday, October 28, 2019 11:36 PM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Tom,
> 
> Sorry for the delayed response.
> 
> Regarding the fallback-to-offset decision for both timestamp & header value:
> it is based on the previous author's discussion
> https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E
> and as per that discussion, it is really required to avoid duplicates.
> 
> And the timestamp strategy is from the original KIP author and we are keeping 
> it as is.
> 
> Finally, on the sequence order guarantee by the producer: it is not feasible 
> to wait for the ack in async / multi-threaded / multi-process scenarios, hence the 
> header-sequence-based compaction strategy, with the producer responsible for 
> generating a unique sequence at the topic-partition-key level.
> 
> Hoping this clarifies all your questions. Please let us know if you have any 
> further questions.
> 
> @Guozhang Wang / @Matthias J. Sax, I see you both had a detailed discussion on 
> the original KIP with the previous author and it would be great to hear your inputs 
> as well.
> 
> Thanks,
> Senthil
> 
> -Original Message-
> From: Tom 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-31 Thread Matthias J. Sax
Thanks for picking up this KIP, Senthil.

(1) As far as I remember, the main issue of the original proposal was a
missing topic level configuration for the compaction strategy. With this
being addressed, I am in favor of this KIP.

(2) With regard to (1), it seems we would need a new topic level config
`compaction.strategy`, and `log.cleaner.compaction.strategy` would be
the default strategy (ie, broker level config) if a topic does not
overwrite it?

(3) Why did you remove `log.cleaner.compaction.strategy.header`
parameter and change the accepted values of
`log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
"header"? The original approach seems to be cleaner, and I think this
was discussed on the original discuss thread already.

(4) Nit: For the "timestamp" compaction strategy you changed the KIP to

-> `The record [create] timestamp`

This is misleading IMHO, because what the actual record timestamp is depends
on the broker/log configuration `(log.)message.timestamp.type`, which can be
either `CreateTime` or `LogAppendTime`. I
would just remove "create" to keep it unspecified.

(5) Nit: the section "Public Interfaces" should list the newly
introduced configs -- configuration parameters are a public interface.

(6) What do you mean by "first level header lookup"? The term "first
level" indicates some hierarchy, but headers don't have any hierarchy --
it's just a list of key-value pairs? If you mean the _order_ of the
headers, ie, pick the first header in the list that matches the key,
please rephrase it to make it clearer.



@Tom: I agree with all you are saying, however, I still think that this
KIP will improve the overall situation, because everything you pointed
out is actually true with offset based compaction, too.

The KIP is not a silver bullet that solves all issues for interleaved
writes, but I personally believe, it's a good improvement.



-Matthias
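
A small sketch of point (4), with example names only: the value a "timestamp"
strategy would compare is whatever timestamp the record ends up with, and that is
governed by `message.timestamp.type`:

    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TimestampStrategySketch {
        public static ProducerRecord<String, String> example() {
            // With message.timestamp.type=CreateTime the broker keeps this producer-supplied
            // timestamp; with LogAppendTime it is overwritten at append time, so a
            // "timestamp" compaction strategy would effectively compare append order again.
            return new ProducerRecord<>(
                    "compacted-topic",   // example topic name
                    null,                // partition: let the partitioner decide
                    1_571_700_000_000L,  // explicit timestamp, ms since epoch
                    "user-42",           // key the cleaner compacts on
                    "payload-v2");
        }
    }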


On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> Hi,
> 
> Please let me know if anyone has any questions on this updated KIP-280...
> 
> Thanks,
> 
> Senthil
> 
> -Original Message-
> From: Senthilnathan Muthusamy  
> Sent: Monday, October 28, 2019 11:36 PM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Tom,
> 
> Sorry for the delayed response.
> 
> Regarding the fallback-to-offset decision for both timestamp & header value:
> it is based on the previous author's discussion
> https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E
> and as per that discussion, it is really required to avoid duplicates.
> 
> And the timestamp strategy is from the original KIP author and we are keeping 
> it as is.
> 
> Finally, on the sequence order guarantee by the producer: it is not feasible 
> to wait for the ack in async / multi-threaded / multi-process scenarios, hence the 
> header-sequence-based compaction strategy, with the producer responsible for 
> generating a unique sequence at the topic-partition-key level.
> 
> Hoping this clarifies all your questions. Please let us know if you have any 
> further questions.
> 
> @Guozhang Wang / @Matthias J. Sax, I see you both had a detailed discussion on 
> the original KIP with the previous author and it would be great to hear your inputs 
> as well.
> 
> Thanks,
> Senthil
> 
> -Original Message-
> From: Tom Bentley 
> Sent: Tuesday, October 22, 2019 2:32 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Senthilnathan,
> 
> In the motivation isn't it a little misleading to say "On the producer side, 
> we clearly preserve an order for the two messages, <K1, V1> and <K1, V2>"? 
> IMHO, the semantics of the producer are clear that having an observed 
> order of sending records from different producers is not sufficient to 
> guarantee ordering on the broker. You really need to send the 2nd record only 
> after the 1st record is acked. It's the difficulty of achieving that in 
> practice that's the true motivation for your KIP.
> 
> I can see the attraction of using timestamps, but it would be helpful to 
> explain how that really solves the problem. When the producers are in 
> different processes on different machines you're relying on their clocks 
> being synchronized, which is a whole subject in itself. Even if they're 
> synchronized the resolution of System.currentTimeMillis() is typically many 
> milli

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-30 Thread Senthilnathan Muthusamy
Hi,

Please let me know if anyone has any questions on this updated KIP-280...

Thanks,

Senthil

-Original Message-
From: Senthilnathan Muthusamy  
Sent: Monday, October 28, 2019 11:36 PM
To: dev@kafka.apache.org
Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction

Hi Tom,

Sorry for the delayed response.

Regarding the fallback-to-offset decision for both timestamp & header value: it is 
based on the previous author's discussion 
https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E
and as per that discussion, it is really required to avoid duplicates.

And the timestamp strategy is from the original KIP author and we are keeping 
it as is.

Finally, on the sequence order guarantee by the producer: it is not feasible to 
wait for the ack in async / multi-threaded / multi-process scenarios, hence the 
header-sequence-based compaction strategy, with the producer responsible for 
generating a unique sequence at the topic-partition-key level.

Hoping this clarifies all your questions. Please let us know if you have any 
further questions.

@Guozhang Wang / @Matthias J. Sax, I see you both had a detailed discussion on 
the original KIP with the previous author and it would be great to hear your inputs as 
well.

Thanks,
Senthil

-Original Message-
From: Tom Bentley 
Sent: Tuesday, October 22, 2019 2:32 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi Senthilnathan,

In the motivation isn't it a little misleading to say "On the producer side, we 
clearly preserve an order for the two messages, <K1, V1> and <K1, V2>"? IMHO, 
the semantics of the producer are clear that having an observed 
order of sending records from different producers is not sufficient to 
guarantee ordering on the broker. You really need to send the 2nd record only 
after the 1st record is acked. It's the difficulty of achieving that in 
practice that's the true motivation for your KIP.

I can see the attraction of using timestamps, but it would be helpful to 
explain how that really solves the problem. When the producers are in different 
processes on different machines you're relying on their clocks being 
synchronized, which is a whole subject in itself. Even if they're synchronized 
the resolution of System.currentTimeMillis() is typically many milliseconds. If 
your producers are in different threads of the same process that could be a 
real problem because it makes ties quite likely.
And you don't explain why it's OK to resolve ties using the offset. The basis 
of your argument is that the offset is giving you the wrong answer.
So it seems to me that using it as a tiebreaker is just narrowing the chances 
of getting the wrong answer. Maybe none of this matters for your use case, but 
I think it should be spelled out in the KIP, because it surely would matter for 
similar use cases.

Using a sequence at least removes the problem of ties, but the interesting bit 
is now in how you deal with races between threads/processes in getting a 
sequence number allocated (which is out of scope of the KIP, I guess).
How is resolving that race any simpler than resolving the motivating race by 
waiting for the ack of the first record sent?

Kind regards,

Tom

On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy 
 wrote:

> Hi All,
>
> We are bringing KIP-280 back to life with a small correction for the 
> discussion & voting. Thanks to previous author Luis Cabral on the
> KIP-280 initiation and we are taking over to complete and get it into 2.4...
>
> Below is the correction that we made to the existing KIP-280:
>
>   *   Allowing the compact strategy configuration at the topic level as
> the log compaction is at the topic level and a broker can have 
> multiple topics. This allows the flexibility to have the strategy at 
> both broker level (i.e. for all topics within the broker) and topic 
> level (i.e. for a subset of topics within a broker) as well...
>
> KIP-280:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> PULL REQUEST: https://github.com/apache/kafka/pull/7528

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-29 Thread Senthilnathan Muthusamy
Hi Tom,

Sorry for the delayed response.

Regarding the fallback-to-offset decision for both timestamp & header value: it is 
based on the previous author's discussion 
https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E
 and as per the discussion, it is really required to avoid duplicates.

And the timestamp strategy is from the original KIP author and we are keeping 
it as is.

Finally, on the sequence order guarantee by the producer: it is not feasible to 
wait for the ack in async / multi-threaded / multi-process scenarios, hence the 
header-sequence-based compaction strategy, with the producer responsible for 
generating a unique sequence at the topic-partition-key level.

Hoping this clarifies all your questions. Please let us know if you have any 
further questions.

@Guozhang Wang / @Matthias J. Sax, I see you both had a detailed discussion on 
the original KIP with the previous author and it would be great to hear your inputs as 
well.

Thanks,
Senthil
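
For what that producer-side responsibility could look like in practice, a rough
sketch; the header name, its encoding, and the per-key counter are illustrative
only, the KIP does not prescribe them:

    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SequenceHeaderSketch {
        // One counter per record key; since a key always maps to the same partition,
        // this yields a strictly increasing sequence per topic-partition-key from
        // this producer instance.
        private final Map<String, AtomicLong> sequences = new ConcurrentHashMap<>();

        public void send(Producer<String, String> producer, String topic, String key, String value) {
            long seq = sequences.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
            // Big-endian encoding keeps the byte order consistent with the numeric order
            // for non-negative sequences.
            record.headers().add("sequence", ByteBuffer.allocate(Long.BYTES).putLong(seq).array());
            producer.send(record);
        }
    }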

-Original Message-
From: Tom Bentley  
Sent: Tuesday, October 22, 2019 2:32 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi Senthilnathan,

In the motivation isn't it a little misleading to say "On the producer side, we 
clearly preserve an order for the two messages, <K1, V1> and <K1, V2>"? IMHO, 
the semantics of the producer are clear that having an observed 
order of sending records from different producers is not sufficient to 
guarantee ordering on the broker. You really need to send the 2nd record only 
after the 1st record is acked. It's the difficulty of achieving that in 
practice that's the true motivation for your KIP.

I can see the attraction of using timestamps, but it would be helpful to 
explain how that really solves the problem. When the producers are in different 
processes on different machines you're relying on their clocks being 
synchronized, which is a whole subject in itself. Even if they're synchronized 
the resolution of System.currentTimeMillis() is typically many milliseconds. If 
your producers are in different threads of the same process that could be a 
real problem because it makes ties quite likely.
And you don't explain why it's OK to resolve ties using the offset. The basis 
of your argument is that the offset is giving you the wrong answer.
So it seems to me that using it as a tiebreaker is just narrowing the chances 
of getting the wrong answer. Maybe none of this matters for your use case, but 
I think it should be spelled out in the KIP, because it surely would matter for 
similar use cases.

Using a sequence at least removes the problem of ties, but the interesting bit 
is now in how you deal with races between threads/processes in getting a 
sequence number allocated (which is out of scope of the KIP, I guess).
How is resolving that race any simpler than resolving the motivating race by 
waiting for the ack of the first record sent?

Kind regards,

Tom

On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy 
 wrote:

> Hi All,
>
> We are bringing KIP-280 back to life with a small correction for the 
> discussion & voting. Thanks to previous author Luis Cabral on the 
> KIP-280 initiation and we are taking over to complete and get it into 2.4...
>
> Below is the correction that we made to the existing KIP-280:
>
>   *   Allowing the compact strategy configuration at the topic level as
> the log compaction is at the topic level and a broker can have 
> multiple topics. This allows the flexibility to have the strategy at 
> both broker level (i.e. for all topics within the broker) and topic 
> level (i.e. for a subset of topics within a broker) as well...
>
> KIP-280:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> PULL REQUEST: https://github.com/apache/kafka/pull/7528 (unit test coverage in 
> progress)
>
> Previous Thread DISCUSS:
> https://lists.apache.org/thread.html/79aa6e50d7c737ddf83455dd8063692a535a1afa558620fe1a1496d3@%3Cdev.kafka.apache.org%3E

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-22 Thread Tom Bentley
Hi Senthilnathan,

In the motivation isn't it a little misleading to say "On the producer
side, we clearly preserve an order for the two messages, <K1, V1> and <K1, V2>"? IMHO, the semantics of the producer are clear that having an observed
order of sending records from different producers is not sufficient to
guarantee ordering on the broker. You really need to send the 2nd record
only after the 1st record is acked. It's the difficulty of achieving that
in practice that's the true motivation for your KIP.

I can see the attraction of using timestamps, but it would be helpful to
explain how that really solves the problem. When the producers are in
different processes on different machines you're relying on their clocks
being synchronized, which is a whole subject in itself. Even if they're
synchronized the resolution of System.currentTimeMillis() is typically many
milliseconds. If your producers are in different threads of the same
process that could be a real problem because it makes ties quite likely.
And you don't explain why it's OK to resolve ties using the offset. The
basis of your argument is that the offset is giving you the wrong answer.
So it seems to me that using it as a tiebreaker is just narrowing the
chances of getting the wrong answer. Maybe none of this matters for your
use case, but I think it should be spelled out in the KIP, because it
surely would matter for similar use cases.

Using a sequence at least removes the problem of ties, but the interesting
bit is now in how you deal with races between threads/processes in getting
a sequence number allocated (which is out of scope of the KIP, I guess).
How is resolving that race any simpler than resolving the motivating race
by waiting for the ack of the first record sent?

Kind regards,

Tom
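
To spell out the alternative Tom mentions, a minimal sketch (example topic and key
names) of sending the 2nd record only after the 1st is acked; the cost is one
broker round trip per record:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AckOrderedSendSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Blocking on the returned future guarantees V1 is acked before V2 is
                // ever sent, so the broker sees the two records in the intended order.
                producer.send(new ProducerRecord<>("compacted-topic", "user-42", "v1")).get();
                producer.send(new ProducerRecord<>("compacted-topic", "user-42", "v2")).get();
            }
        }
    }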

On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy
 wrote:

> Hi All,
>
> We are bringing KIP-280 back to life with a small correction for the
> discussion & voting. Thanks to previous author Luis Cabral on the KIP-280
> initiation and we are taking over to complete and get it into 2.4...
>
> Below is the correction that we made to the existing KIP-280:
>
>   *   Allowing the compact strategy configuration at the topic level as
> the log compaction is at the topic level and a broker can have multiple
> topics. This allows the flexibility to have the strategy at both broker
> level (i.e. for all topics within the broker) and topic level (i.e. for a
> subset of topics within a broker) as well...
>
> KIP-280:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> PULL REQUEST: https://github.com/apache/kafka/pull/7528 (unit test
> coverage in progress)
>
> Previous Thread DISCUSS:
> https://lists.apache.org/thread.html/79aa6e50d7c737ddf83455dd8063692a535a1afa558620fe1a1496d3@%3Cdev.kafka.apache.org%3E
> Previous Thread VOTE:
> https://lists.apache.org/thread.html/b2ecd73ce849741f0c40b4f801c3f7650583497812713e240e1ac2b7@%3Cdev.kafka.apache.org%3E
>
> Appreciate your timely action.
>
> PS: Initiating a separate thread as I was not able to reply to the
> existing threads...
>
> Thanks,
> Senthil
>


[DISCUSS] KIP-280: Enhanced log compaction

2019-10-21 Thread Senthilnathan Muthusamy
Hi All,

We are bringing KIP-280 back to life with a small correction for the discussion & 
voting. Thanks to previous author Luis Cabral on the KIP-280 initiation and we 
are taking over to complete and get it into 2.4...

Below is the correction that we made to the existing KIP-280:

  *   Allowing the compact strategy configuration at the topic level as the log 
compaction is at the topic level and a broker can have multiple topics. This 
allows the flexibility to have the strategy at both broker level (i.e. for all 
topics within the broker) and topic level (i.e. for a subset of topics within a 
broker) as well...

KIP-280: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
PULL REQUEST: https://github.com/apache/kafka/pull/7528 (unit test coverage in 
progress)

Previous Thread DISCUSS: 
https://lists.apache.org/thread.html/79aa6e50d7c737ddf83455dd8063692a535a1afa558620fe1a1496d3@%3Cdev.kafka.apache.org%3E
Previous Thread VOTE: 
https://lists.apache.org/thread.html/b2ecd73ce849741f0c40b4f801c3f7650583497812713e240e1ac2b7@%3Cdev.kafka.apache.org%3E

Appreciate your timely action.

PS: Initiating a separate thread as I was not able to reply to the existing 
threads...

Thanks,
Senthil


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-10-22 Thread Luís Cabral
 Since this is not moving forward, how about I proceed with the currently 
documented changes, and any improvements (such as configuration changes) can be 
taken up afterwards by whoever wants it under a different KIP?
On Thursday, October 11, 2018, 9:47:12 AM GMT+2, Luís Cabral 
 wrote:  
 
  Hi Matthias,

How can this be done?

Kind Regards,
Luis
 
On Sunday, September 30, 2018, 9:10:01 PM GMT+2, Matthias J. Sax 
 wrote:  
 
 Luis,

What is the status of this KIP?

I tend to agree, that introducing the feature only globally, might be
less useful (I would assume that people want to turn it on, on a
per-topic basis). As I am not familiar with the corresponding code, I
cannot judge the complexity to add topic level configs, however, it
seems to be worth to include it in the KIP.


-Matthias



On 9/21/18 1:59 PM, Bertus Greeff wrote:
> Someone pointed out to me that my scenario is also resolved by using Kafka 
> transactions.  Zombie fencing which is essentially my scenario was one of the 
> scenarios that transactions were designed to solve.  I was going to use the 
> ideas of this KIP to solve it but using transactions seems even better 
> because out of order messages never even make it into the topic.  They are 
> blocked by the broker.
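
For context on the fencing mechanism referenced above, a minimal sketch of a
transactional producer (example ids and topic names); this is existing Kafka
functionality, orthogonal to the KIP itself:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TransactionalProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // A stable transactional.id is what enables zombie fencing: when a new instance
            // with the same id calls initTransactions(), the broker bumps the producer epoch
            // and subsequently rejects writes from the older ("zombie") instance.
            props.put("transactional.id", "order-writer-1");
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("events", "user-42", "v1"));
                producer.commitTransaction();
            }
        }
    }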
> 
> -Original Message-
> From: Guozhang Wang  
> Sent: Saturday, September 1, 2018 11:33 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hello Luis,
> 
> Thanks for your thoughtful responses, here are my two cents:
> 
> 1) I think having the new configs with per-topic granularity would not 
> introduce much memory overhead or logic complexity, as all you need is to 
> remember this at the topic metadata cache. If I've missed some details about 
> the complexity, could you elaborate a bit more?
> 
> 2) I agree with you: the current `ConfigDef.Validator` only scope on the 
> validated config value itself, and hence cannot be dependent on another 
> config.
> 
> 4) I think Jun's point is that since we need the latest message in the log 
> segment for the timestamp tracking, we cannot delete it actually: with offset 
> based only policy, this is naturally guaranteed; but now with other policy, 
> it is not guaranteed to never be compacted away, and hence we need to 
> "enforce" to special-handle this case and not delete it.
> 
> 
> 
> Guozhang
> 
> 
> 
> Guozhang
> 
> 
> On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral 
> wrote:
> 
>> Hi all,
>>
>> Since there has been a rejuvenated interest in this KIP, it felt 
>> better to downgrade it back down to [DISCUSSION], as we aren't really 
>> voting on it anymore.
>> I'll try to address the currently pending questions on the following 
>> points, so please bear with me while we go through them all:
>>
>> 1) Configuration: Topic vs Global
>>
>> Here we all agree that having a configuration per-topic is the best 
>> option. However, this is not possible with the current design of the 
>> compaction solution. Yes, it is true that "some" properties that 
>> relate to compaction are configurable per-topic, but those are only 
>> the properties that act outside(!) of the actual compaction logic, 
>> such as configuring the start-compaction trigger with "ratio" or 
>> compaction duration with "lag.ms".
>> This logic can, of course, be re-designed to suit our wishes, but this 
>> is not a direct effort, and if we have spent months arguing about the 
>> extra 8 bytes per record, for sure we would spend another few dozen 
>> months discussing the memory impact that doing this change to the 
>> properties will invariably have.
>> As such, I will limit this KIP to ONLY have these configurations globally.
>>
>> 2) Configuration: Fail-fast vs Fallback
>>
>>
>> Ideally, I would also like to prevent the application to start if the 
>> configuration is somehow invalid.
>> However (another 'however'), the way the configuration is handled 
>> prevents adding dependencies between them, so we can't add logic that 
>> says "configuration X is invalid if configuration Y is so-and-such".
>> Again, this can be re-designed to add this feature to the 
>> configuration logic, but it would again be a big change just by 
>> itself, so this KIP is again limited to use ONLY what is already in place.
>>
>> 3) Documenting the memory impact on the KIP
>>
>> This is now added to the KIP, though this topic is more complicated 
>> than 'memory impact'. E.g.: this change doesn't translate to an actual 
>> memory impact, it just means that the compaction will potentially 
>> encompass less

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-10-11 Thread Luís Cabral
 Hi Matthias,

How can this be done?

Kind Regards,
Luis
 
On Sunday, September 30, 2018, 9:10:01 PM GMT+2, Matthias J. Sax 
 wrote:  
 
 Luis,

What is the status of this KIP?

I tend to agree, that introducing the feature only globally, might be
less useful (I would assume that people want to turn it on, on a
per-topic basis). As I am not familiar with the corresponding code, I
cannot judge the complexity to add topic level configs, however, it
seems to be worth to include it in the KIP.


-Matthias



On 9/21/18 1:59 PM, Bertus Greeff wrote:
> Someone pointed out to me that my scenario is also resolved by using Kafka 
> transactions.  Zombie fencing which is essentially my scenario was one of the 
> scenarios that transactions were designed to solve.  I was going to use the 
> ideas of this KIP to solve it but using transactions seems even better 
> because out of order messages never even make it into the topic.  They are 
> blocked by the broker.
> 
> -Original Message-
> From: Guozhang Wang  
> Sent: Saturday, September 1, 2018 11:33 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hello Luis,
> 
> Thanks for your thoughtful responses, here are my two cents:
> 
> 1) I think having the new configs with per-topic granularity would not 
> introduce much memory overhead or logic complexity, as all you need is to 
> remember this at the topic metadata cache. If I've missed some details about 
> the complexity, could you elaborate a bit more?
> 
> 2) I agree with you: the current `ConfigDef.Validator` only scope on the 
> validated config value itself, and hence cannot be dependent on another 
> config.
> 
> 4) I think Jun's point is that since we need the latest message in the log 
> segment for the timestamp tracking, we cannot delete it actually: with offset 
> based only policy, this is naturally guaranteed; but now with other policy, 
> it is not guaranteed to never be compacted away, and hence we need to 
> "enforce" to special-handle this case and not delete it.
> 
> 
> 
> Guozhang
> 
> 
> 
> Guozhang
> 
> 
> On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral 
> wrote:
> 
>> Hi all,
>>
>> Since there has been a rejuvenated interest in this KIP, it felt 
>> better to downgrade it back down to [DISCUSSION], as we aren't really 
>> voting on it anymore.
>> I'll try to address the currently pending questions on the following 
>> points, so please bear with me while we go through them all:
>>
>> 1) Configuration: Topic vs Global
>>
>> Here we all agree that having a configuration per-topic is the best 
>> option. However, this is not possible with the current design of the 
>> compaction solution. Yes, it is true that "some" properties that 
>> relate to compaction are configurable per-topic, but those are only 
>> the properties that act outside(!) of the actual compaction logic, 
>> such as configuring the start-compaction trigger with "ratio" or 
>> compaction duration with "lag.ms ".
>> This logic can, of course, be re-designed to suit our wishes, but this 
>> is not a direct effort, and if we have spent months arguing about the 
>> extra 8 bytes per record, for sure we would spend another few dozen 
>> months discussing the memory impact that doing this change to the 
>> properties will invariably have.
>> As such, I will limit this KIP to ONLY have these configurations globally.
>>
>> 2) Configuration: Fail-fast vs Fallback
>>
>>
>> Ideally, I would also like to prevent the application to start if the 
>> configuration is somehow invalid.
>> However (another 'however'), the way the configuration is handled 
>> prevents adding dependencies between them, so we can't add logic that 
>> says "configuration X is invalid if configuration Y is so-and-such".
>> Again, this can be re-designed to add this feature to the 
>> configuration logic, but it would again be a big change just by 
>> itself, so this KIP is again limited to use ONLY what is already in place.
>>
>> 3) Documenting the memory impact on the KIP
>>
>> This is now added to the KIP, though this topic is more complicated 
>> than 'memory impact'. E.g.: this change doesn't translate to an actual 
>> memory impact, it just means that the compaction will potentially 
>> encompass less records per execution.
>>
>> 4) Documenting how we deal with the last message in the log
>>
>> I have 2 interpretations for this request: "the last message in the log"
>> or "the last message with a shared key on the log"
>> For the former: there is n

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-30 Thread Matthias J. Sax
Luis,

What is the status of this KIP?

I tend to agree that introducing the feature only globally might be
less useful (I would assume that people want to turn it on on a
per-topic basis). As I am not familiar with the corresponding code, I
cannot judge the complexity of adding topic-level configs; however, it
seems worth including in the KIP.


-Matthias



On 9/21/18 1:59 PM, Bertus Greeff wrote:
> Someone pointed out to me that my scenario is also resolved by using Kafka 
> transactions.  Zombie fencing which is essentially my scenario was one of the 
> scenarios that transactions were designed to solve.  I was going to use the 
> ideas of this KIP to solve it but using transactions seems even better 
> because out of order messages never even make it into the topic.  They are 
> blocked by the broker.
> 
> -Original Message-
> From: Guozhang Wang  
> Sent: Saturday, September 1, 2018 11:33 AM
> To: dev 
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hello Luis,
> 
> Thanks for your thoughtful responses, here are my two cents:
> 
> 1) I think having the new configs with per-topic granularity would not 
> introduce much memory overhead or logic complexity, as all you need is to 
> remember this at the topic metadata cache. If I've missed some details about 
> the complexity, could you elaborate a bit more?
> 
> 2) I agree with you: the current `ConfigDef.Validator` only scope on the 
> validated config value itself, and hence cannot be dependent on another 
> config.
> 
> 4) I think Jun's point is that since we need the latest message in the log 
> segment for the timestamp tracking, we cannot delete it actually: with offset 
> based only policy, this is naturally guaranteed; but now with other policy, 
> it is not guaranteed to never be compacted away, and hence we need to 
> "enforce" to special-handle this case and not delete it.
> 
> 
> 
> Guozhang
> 
> 
> 
> Guozhang
> 
> 
> On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral 
> wrote:
> 
>> Hi all,
>>
>> Since there has been a rejuvenated interest in this KIP, it felt 
>> better to downgrade it back down to [DISCUSSION], as we aren't really 
>> voting on it anymore.
>> I'll try to address the currently pending questions on the following 
>> points, so please bear with me while we go through them all:
>>
>> 1) Configuration: Topic vs Global
>>
>> Here we all agree that having a configuration per-topic is the best 
>> option. However, this is not possible with the current design of the 
>> compaction solution. Yes, it is true that "some" properties that 
>> relate to compaction are configurable per-topic, but those are only 
>> the properties that act outside(!) of the actual compaction logic, 
>> such as configuring the start-compaction trigger with "ratio" or 
>> compaction duration with "lag.ms ".
>> This logic can, of course, be re-designed to suit our wishes, but this 
>> is not a direct effort, and if we have spent months arguing about the 
>> extra 8 bytes per record, for sure we would spend another few dozen 
>> months discussing the memory impact that doing this change to the 
>> properties will invariably have.
>> As such, I will limit this KIP to ONLY have these configurations globally.
>>
>> 2) Configuration: Fail-fast vs Fallback
>>
>>
>> Ideally, I would also like to prevent the application to start if the 
>> configuration is somehow invalid.
>> However (another 'however'), the way the configuration is handled 
>> prevents adding dependencies between them, so we can't add logic that 
>> says "configuration X is invalid if configuration Y is so-and-such".
>> Again, this can be re-designed to add this feature to the 
>> configuration logic, but it would again be a big change just by 
>> itself, so this KIP is again limited to use ONLY what is already in place.
>>
>> 3) Documenting the memory impact on the KIP
>>
>> This is now added to the KIP, though this topic is more complicated 
>> than 'memory impact'. E.g.: this change doesn't translate to an actual 
>> memory impact, it just means that the compaction will potentially 
>> encompass less records per execution.
>>
>> 4) Documenting how we deal with the last message in the log
>>
>> I have 2 interpretations for this request: "the last message in the log"
>> or "the last message with a shared key on the log"
>> For the former: there is no change to the logic on how the last 
>> message is handled. Only the "tail" gets compacted, so the "h

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-21 Thread Bertus Greeff
Someone pointed out to me that my scenario is also resolved by using Kafka 
transactions.  Zombie fencing, which is essentially my scenario, was one of the 
scenarios that transactions were designed to solve.  I was going to use the 
ideas of this KIP to solve it, but using transactions seems even better because 
out-of-order messages never even make it into the topic.  They are blocked by 
the broker.
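
To make the transactional route concrete, here is a minimal sketch of a fencing-style producer, assuming one fixed transactional.id per logical writer; the broker address, topic name, and values are placeholders, not anything prescribed by this KIP:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.ProducerFencedException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class FencedWriter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");               // placeholder broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // All instances of this logical writer share one transactional.id,
            // so a newer instance fences off any older ("zombie") one.
            props.put("transactional.id", "orders-writer-1");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "key-1", "state-v42"));
                producer.commitTransaction();
            } catch (ProducerFencedException e) {
                // A newer producer with the same transactional.id took over: this
                // instance is a zombie and must stop writing.
            }
        }
    }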

-Original Message-
From: Guozhang Wang  
Sent: Saturday, September 1, 2018 11:33 AM
To: dev 
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hello Luis,

Thanks for your thoughtful responses, here are my two cents:

1) I think having the new configs with per-topic granularity would not 
introduce much memory overhead or logic complexity, as all you need is to 
remember this at the topic metadata cache. If I've missed some details about 
the complexity, could you elaborate a bit more?

2) I agree with you: the current `ConfigDef.Validator` only scope on the 
validated config value itself, and hence cannot be dependent on another config.

4) I think Jun's point is that since we need the latest message in the log 
segment for the timestamp tracking, we cannot delete it actually: with offset 
based only policy, this is naturally guaranteed; but now with other policy, it 
is not guaranteed to never be compacted away, and hence we need to "enforce" to 
special-handle this case and not delete it.



Guozhang



Guozhang


On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral 
wrote:

> Hi all,
>
> Since there has been a rejuvenated interest in this KIP, it felt 
> better to downgrade it back down to [DISCUSSION], as we aren't really 
> voting on it anymore.
> I'll try to address the currently pending questions on the following 
> points, so please bear with me while we go through them all:
>
> 1) Configuration: Topic vs Global
>
> Here we all agree that having a configuration per-topic is the best 
> option. However, this is not possible with the current design of the 
> compaction solution. Yes, it is true that "some" properties that 
> relate to compaction are configurable per-topic, but those are only 
> the properties that act outside(!) of the actual compaction logic, 
> such as configuring the start-compaction trigger with "ratio" or 
> compaction duration with "lag.ms ".
> This logic can, of course, be re-designed to suit our wishes, but this 
> is not a direct effort, and if we have spent months arguing about the 
> extra 8 bytes per record, for sure we would spend another few dozen 
> months discussing the memory impact that doing this change to the 
> properties will invariably have.
> As such, I will limit this KIP to ONLY have these configurations globally.
>
> 2) Configuration: Fail-fast vs Fallback
>
>
> Ideally, I would also like to prevent the application to start if the 
> configuration is somehow invalid.
> However (another 'however'), the way the configuration is handled 
> prevents adding dependencies between them, so we can't add logic that 
> says "configuration X is invalid if configuration Y is so-and-such".
> Again, this can be re-designed to add this feature to the 
> configuration logic, but it would again be a big change just by 
> itself, so this KIP is again limited to use ONLY what is already in place.
>
> 3) Documenting the memory impact on the KIP
>
> This is now added to the KIP, though this topic is more complicated 
> than 'memory impact'. E.g.: this change doesn't translate to an actual 
> memory impact, it just means that the compaction will potentially 
> encompass less records per execution.
>
> 4) Documenting how we deal with the last message in the log
>
> I have 2 interpretations for this request: "the last message in the log"
> or "the last message with a shared key on the log"
> For the former: there is no change to the logic on how the last 
> message is handled. Only the "tail" gets compacted, so the "head" 
> (which includes the last message) still keeps the last message
>
> 5) Documenting how the key deletion will be handled
>
> I'm having some trouble understanding this one; do you mean how keys 
> are deleted in general, or?
>
> Cheers,
> Luis Cabral
>
>On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao 
> 
> wrote:
>
>  Hi, Luis,
>
> Thanks for the reply. A few more comments below.
>
> 1. About the topic level configuration. It seems that it's useful for 
> the new configs to be at the topic level. Currently, the following 
> configs related to compaction are already at the topic level.
>
> min.cleanable.dirty.ratio
> min.compaction.lag.ms
> cleanup.policy
>
> 2. Have you documented the memory impact in the KIP?
>
> 3. Could y

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-01 Thread Guozhang Wang
Hello Luis,

Thanks for your thoughtful responses, here are my two cents:

1) I think having the new configs with per-topic granularity would not
introduce much memory overhead or logic complexity, as all you need is to
remember this at the topic metadata cache. If I've missed some details
about the complexity, could you elaborate a bit more?

2) I agree with you: the current `ConfigDef.Validator` only has scope over the
validated config value itself, and hence cannot depend on another
config.

4) I think Jun's point is that since we need the latest message in the log
segment for timestamp tracking, we cannot actually delete it: with the
offset-based-only policy this is naturally guaranteed; but with another
policy it is not guaranteed to never be compacted away, and hence we need
to "enforce" special handling of this case and not delete it.



Guozhang



Guozhang


On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral 
wrote:

> Hi all,
>
> Since there has been a rejuvenated interest in this KIP, it felt better to
> downgrade it back down to [DISCUSSION], as we aren't really voting on it
> anymore.
> I'll try to address the currently pending questions on the following
> points, so please bear with me while we go through them all:
>
> 1) Configuration: Topic vs Global
>
> Here we all agree that having a configuration per-topic is the best
> option. However, this is not possible with the current design of the
> compaction solution. Yes, it is true that "some" properties that relate to
> compaction are configurable per-topic, but those are only the properties
> that act outside(!) of the actual compaction logic, such as configuring the
> start-compaction trigger with "ratio" or compaction duration with "lag.ms
> ".
> This logic can, of course, be re-designed to suit our wishes, but this is
> not a direct effort, and if we have spent months arguing about the extra 8
> bytes per record, for sure we would spend another few dozen months
> discussing the memory impact that doing this change to the properties will
> invariably have.
> As such, I will limit this KIP to ONLY have these configurations globally.
>
> 2) Configuration: Fail-fast vs Fallback
>
>
> Ideally, I would also like to prevent the application to start if the
> configuration is somehow invalid.
> However (another 'however'), the way the configuration is handled prevents
> adding dependencies between them, so we can't add logic that says
> "configuration X is invalid if configuration Y is so-and-such".
> Again, this can be re-designed to add this feature to the configuration
> logic, but it would again be a big change just by itself, so this KIP is
> again limited to use ONLY what is already in place.
>
> 3) Documenting the memory impact on the KIP
>
> This is now added to the KIP, though this topic is more complicated than
> 'memory impact'. E.g.: this change doesn't translate to an actual memory
> impact, it just means that the compaction will potentially encompass less
> records per execution.
>
> 4) Documenting how we deal with the last message in the log
>
> I have 2 interpretations for this request: "the last message in the log"
> or "the last message with a shared key on the log"
> For the former: there is no change to the logic on how the last message is
> handled. Only the "tail" gets compacted, so the "head" (which includes the
> last message) still keeps the last message
>
> 5) Documenting how the key deletion will be handled
>
> I'm having some trouble understanding this one; do you mean how keys are
> deleted in general, or?
>
> Cheers,
> Luis Cabral
>
>On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao 
> wrote:
>
>  Hi, Luis,
>
> Thanks for the reply. A few more comments below.
>
> 1. About the topic level configuration. It seems that it's useful for the
> new configs to be at the topic level. Currently, the following configs
> related to compaction are already at the topic level.
>
> min.cleanable.dirty.ratio
> min.compaction.lag.ms
> cleanup.policy
>
> 2. Have you documented the memory impact in the KIP?
>
> 3. Could you document how we deal with the last message in the log, which
> is potentially cleanable now?
>
> 4. Could you document how key deletion will be handled?
>
> 10. As for Jason's proposal on CompactionStrategy, it does make the feature
> more general. On the other hand, it will be useful not to require
> user-level code if the compaction value only comes from the header.
>
> 20. "If compaction.strategy.header is chosen and compaction.strategy.header
> is not set, the KIP falls back to offset." I am wondering if it's better to
> just fail the configuration in the case.
>
> Jun
>
>
>
> On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang  wrote:
>
> > Regarding "broker-agnostic of headers": there are some KIPs from Streams
> to
> > use headers for internal purposes as well, e.g. KIP-258 and KIP-213 (I
> > admit there may be a conflict with user space, but practically I think it
> > is very rare). So I think we are very 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-29 Thread Luís Cabral
Hi all,

Since there has been renewed interest in this KIP, it felt better to 
downgrade it back to [DISCUSSION], as we aren't really voting on it 
anymore.
I'll try to address the currently pending questions on the following points, so 
please bear with me while we go through them all:

1) Configuration: Topic vs Global

Here we all agree that having a configuration per-topic is the best option. 
However, this is not possible with the current design of the compaction 
solution. Yes, it is true that "some" properties related to compaction are 
configurable per-topic, but those are only the properties that act outside(!) 
of the actual compaction logic, such as configuring the start-compaction 
trigger with "ratio" or the compaction duration with "lag.ms".
This logic can, of course, be re-designed to suit our wishes, but this is not a 
trivial effort, and if we have spent months arguing about the extra 8 bytes per 
record, we would surely spend another few dozen months discussing the memory 
impact that this change to the properties will invariably have.
As such, I will limit this KIP to ONLY have these configurations globally.
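
For contrast, the existing per-topic knobs mentioned above ("ratio" and "lag.ms") can already be set as topic-level config overrides. A minimal AdminClient sketch, assuming a client and broker new enough to support incrementalAlterConfigs and using a placeholder topic name:

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class TopicCompactionConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-compacted-topic");
                Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(
                    topic,
                    List.of(
                        // Existing per-topic knob: how dirty the log must be before cleaning starts.
                        new AlterConfigOp(new ConfigEntry("min.cleanable.dirty.ratio", "0.3"),
                                          AlterConfigOp.OpType.SET),
                        // Existing per-topic knob: how long a record is protected from compaction.
                        new AlterConfigOp(new ConfigEntry("min.compaction.lag.ms", "60000"),
                                          AlterConfigOp.OpType.SET)));
                admin.incrementalAlterConfigs(updates).all().get();
            }
        }
    }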

2) Configuration: Fail-fast vs Fallback


Ideally, I would also like to prevent the application from starting if the 
configuration is somehow invalid.
However (another 'however'), the way the configuration is handled prevents 
adding dependencies between configurations, so we can't add logic that says 
"configuration X is invalid if configuration Y is so-and-such".
Again, this can be re-designed to add this feature to the configuration logic, 
but it would again be a big change just by itself, so this KIP is again limited 
to using ONLY what is already in place.
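
To illustrate the limitation: a ConfigDef.Validator is handed one key and one value at a time, so a check like the sketch below can reject a bad strategy name, but it has no way to see whether a companion config was also provided. The config names here are only for illustration:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigException;

    public class CompactionStrategyValidator implements ConfigDef.Validator {
        private static final List<String> ALLOWED = Arrays.asList("offset", "timestamp", "header");

        @Override
        public void ensureValid(String name, Object value) {
            // The validator only sees this single (name, value) pair. It can check
            // that the strategy itself is one of the allowed values ...
            if (value == null || !ALLOWED.contains(value.toString())) {
                throw new ConfigException(name, value, "must be one of " + ALLOWED);
            }
            // ... but it cannot express "if the strategy is 'header', then the
            // header-name config must also be set", because that other config's
            // value is simply not available here.
        }
    }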

3) Documenting the memory impact on the KIP

This is now added to the KIP, though this topic is more complicated than 
'memory impact'. E.g.: this change doesn't translate to an actual memory 
impact; it just means that the compaction will potentially encompass fewer 
records per execution.

4) Documenting how we deal with the last message in the log

I have 2 interpretations for this request: "the last message in the log" or 
"the last message with a shared key on the log".
For the former: there is no change to the logic for how the last message is 
handled. Only the "tail" gets compacted, so the "head" (which includes the last 
message) still keeps the last message.

5) Documenting how the key deletion will be handled

I'm having some trouble understanding this one; do you mean how keys are 
deleted in general, or something else?

Cheers,
Luis Cabral

   On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao  
wrote:  
 
 Hi, Luis,

Thanks for the reply. A few more comments below.

1. About the topic level configuration. It seems that it's useful for the
new configs to be at the topic level. Currently, the following configs
related to compaction are already at the topic level.

min.cleanable.dirty.ratio
min.compaction.lag.ms
cleanup.policy

2. Have you documented the memory impact in the KIP?

3. Could you document how we deal with the last message in the log, which
is potentially cleanable now?

4. Could you document how key deletion will be handled?

10. As for Jason's proposal on CompactionStrategy, it does make the feature
more general. On the other hand, it will be useful not to require
user-level code if the compaction value only comes from the header.

20. "If compaction.strategy.header is chosen and compaction.strategy.header
is not set, the KIP falls back to offset." I am wondering if it's better to
just fail the configuration in the case.

Jun



On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang  wrote:

> Regarding "broker-agnostic of headers": there are some KIPs from Streams to
> use headers for internal purposes as well, e.g. KIP-258 and KIP-213 (I
> admit there may be a conflict with user space, but practically I think it
> is very rare). So I think we are very likely going to make Kafka internals
> to be "headers-aware" anyways.
>
> Regarding the general API: I think it is a good idea in general, but it may
> still have limits: note that right now our KIP enforce a header type to be
> long, and we have a very careful discussion about the fall-back policy if
> header does not have the specified key or if the value is not long-typed;
> but if we enforce long type version in the interface, it would require
> users trying to customizing their compaction logic (think: based on some
> value payload field) to transform their fields to long as well. So I feel
> we can still go with the current proposed approach, and only consider this
> general API if we observe it does have a general usage requirement. By that
> time we can still extend the config values of "log.cleaner.compaction.
> strategy" to "offset, timestamp, header, myFuncName".
>
> @Bertus
>
> Thanks for your feedback, I believe the proposed config is indeed for both
> global (for the whole broker) and per-topic, Luís can confirm if this is
> the case, and update the wiki 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-23 Thread Jun Rao
Hi, Luis,

Thanks for the reply. A few more comments below.

1. About the topic level configuration. It seems that it's useful for the
new configs to be at the topic level. Currently, the following configs
related to compaction are already at the topic level.

min.cleanable.dirty.ratio
min.compaction.lag.ms
cleanup.policy

2. Have you documented the memory impact in the KIP?

3. Could you document how we deal with the last message in the log, which
is potentially cleanable now?

4. Could you document how key deletion will be handled?

10. As for Jason's proposal on CompactionStrategy, it does make the feature
more general. On the other hand, it will be useful not to require
user-level code if the compaction value only comes from the header.

20. "If compaction.strategy.header is chosen and compaction.strategy.header
is not set, the KIP falls back to offset." I am wondering if it's better to
just fail the configuration in the case.

Jun



On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang  wrote:

> Regarding "broker-agnostic of headers": there are some KIPs from Streams to
> use headers for internal purposes as well, e.g. KIP-258 and KIP-213 (I
> admit there may be a conflict with user space, but practically I think it
> is very rare). So I think we are very likely going to make Kafka internals
> to be "headers-aware" anyways.
>
> Regarding the general API: I think it is a good idea in general, but it may
> still have limits: note that right now our KIP enforce a header type to be
> long, and we have a very careful discussion about the fall-back policy if
> header does not have the specified key or if the value is not long-typed;
> but if we enforce long type version in the interface, it would require
> users trying to customizing their compaction logic (think: based on some
> value payload field) to transform their fields to long as well. So I feel
> we can still go with the current proposed approach, and only consider this
> general API if we observe it does have a general usage requirement. By that
> time we can still extend the config values of "log.cleaner.compaction.
> strategy" to "offset, timestamp, header, myFuncName".
>
> @Bertus
>
> Thanks for your feedback, I believe the proposed config is indeed for both
> global (for the whole broker) and per-topic, Luís can confirm if this is
> the case, and update the wiki page to make it clear.
>
>
> Guozhang
>
>
> On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff <
> bgre...@microsoft.com.invalid> wrote:
>
> > I'm interested to know the status of this KIP.  I see that the status is
> > "Voting".  How long does this normally take?
> >
> > We want to use Kafka and this KIP provides exactly the log compaction
> > logic that we want for many of our projects.
> >
> > One piece of feedback that I have is that log.cleaner.compaction.
> strategy
> > and log.cleaner.compaction.strategy.header needs to be per topic.  The
> > text of the KIP makes it sound that the config is only available for all
> > topics but this makes no sense.  Different topics will need different
> > strategies and/or headers.
> >
> > From the KIP:
> > Provide the configuration for the individual topics
> > None of the configurations for log compaction are available at topic
> > level, so adding it there is not a part of this KIP
> >
> >
> >
> > On 2018/04/05 08:44:00, Luís Cabral  wrote:
> > > Hello all,>
> > > Starting a discussion for this feature.>
> > KIP-280 : https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > Pull-4822 : https://github.com/apache/kafka/pull/4822
> >
> > > Kind Regards,Luís>
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-16 Thread Guozhang Wang
Regarding "broker-agnostic of headers": there are some KIPs from Streams to
use headers for internal purposes as well, e.g. KIP-258 and KIP-213 (I
admit there may be a conflict with user space, but practically I think it
is very rare). So I think we are very likely going to make Kafka internals
to be "headers-aware" anyways.

Regarding the general API: I think it is a good idea in general, but it may
still have limits: note that right now our KIP enforces the header type to be
long, and we have had a very careful discussion about the fall-back policy if
the header does not have the specified key or if the value is not long-typed;
but if we enforce a long-typed version in the interface, it would require
users trying to customize their compaction logic (think: based on some
value payload field) to transform their fields to long as well. So I feel
we can still go with the currently proposed approach, and only consider this
general API if we observe it does have a general usage requirement. By that
time we can still extend the config values of "log.cleaner.compaction.
strategy" to "offset, timestamp, header, myFuncName".

@Bertus

Thanks for your feedback. I believe the proposed config is indeed intended both
globally (for the whole broker) and per-topic; Luís can confirm whether this is
the case and update the wiki page to make it clear.


Guozhang


On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff <
bgre...@microsoft.com.invalid> wrote:

> I'm interested to know the status of this KIP.  I see that the status is
> "Voting".  How long does this normally take?
>
> We want to use Kafka and this KIP provides exactly the log compaction
> logic that we want for many of our projects.
>
> One piece of feedback that I have is that log.cleaner.compaction.strategy
> and log.cleaner.compaction.strategy.header needs to be per topic.  The
> text of the KIP makes it sound that the config is only available for all
> topics but this makes no sense.  Different topics will need different
> strategies and/or headers.
>
> From the KIP:
> Provide the configuration for the individual topics
> None of the configurations for log compaction are available at topic
> level, so adding it there is not a part of this KIP
>
>
>
> On 2018/04/05 08:44:00, Luís Cabral  wrote:
> > Hello all,>
> > Starting a discussion for this feature.>
> > KIP-280 : https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > Pull-4822 : https://github.com/apache/kafka/pull/4822
>
> > Kind Regards,Luís>
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-16 Thread Bertus Greeff
I'm interested to know the status of this KIP.  I see that the status is 
"Voting".  How long does this normally take?

We want to use Kafka and this KIP provides exactly the log compaction logic 
that we want for many of our projects.

One piece of feedback that I have is that log.cleaner.compaction.strategy and 
log.cleaner.compaction.strategy.header need to be per-topic.  The text of the 
KIP makes it sound like the config is only available globally for all topics, but this 
makes no sense.  Different topics will need different strategies and/or headers.

From the KIP:
Provide the configuration for the individual topics
None of the configurations for log compaction are available at topic level, so 
adding it there is not a part of this KIP



On 2018/04/05 08:44:00, Luís Cabral  wrote:
> Hello all,>
> Starting a discussion for this feature.>
> KIP-280 : https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> Pull-4822 : https://github.com/apache/kafka/pull/4822

> Kind Regards,Luís>


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-06-01 Thread Guozhang Wang
Hello Luis,

Please feel free to continue with the voting process, as there seem to be no
further comments on this thread (I have synced with Jun and Ismael
separately offline and they agree with the approach of adding the
fields in the offset map for all cases).

We can still continue reviewing the PR while voting on the thread so
that it can get into trunk earlier for the next release.



Guozhang


On Mon, May 28, 2018 at 11:04 AM, Matthias J. Sax 
wrote:

> Luis,
>
> this week is feature freeze for the upcoming 2.0 release and most people
> focus on getting their PR merged. Thus, this and the next week (until
> code freeze) KIPs for 2.1 are not a high priority for most people.
>
> Please bear with us. Thanks for your understanding.
>
>
> -Matthias
>
> On 5/28/18 5:21 AM, Luís Cabral wrote:
> >  Hi Guozhang,
> >
> > It doesn't look like there will be much feedback here.
> > Is it alright if I just update the spec back to a standardized behaviour
> and move this along?
> >
> > Cheers,Luis
> > On Thursday, May 24, 2018, 11:20:01 AM GMT+2, Luis Cabral <
> luis_cab...@yahoo.com> wrote:
> >
> >  Hi Jun / Ismael,
> >
> > Any chance to get your opinion on this?
> > Thanks in advance!
> >
> > Regards,
> > Luís
> >
> >> On 22 May 2018, at 17:30, Guozhang Wang  wrote:
> >>
> >> Hello Luís,
> >>
> >> While reviewing your PR I realized my previous calculation on the memory
> >> usage was incorrect: in fact, in the current implementation, each entry
> in
> >> the memory-bounded cache is 16 (default MD5 hash digest length) + 8
> (long
> >> type) = 24 bytes, and if we add the long-typed version value it is 32
> >> bytes. I.e. each entry will be increased by 33%, not doubling.
> >>
> >> After redoing the math I'm bit leaning towards just adding this entry
> for
> >> all cases rather than treating timestamp differently with others (sorry
> for
> >> being back and forth, but I just want to make sure we've got a good
> balance
> >> between efficiency and semantics consistency). I've also chatted with
> Jun
> >> and Ismael about this (cc'ed), and maybe you guys can chime in here as
> well.
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral
> 
> >> wrote:
> >>
> >>> Hi Matthias / Guozhang,
> >>>
> >>> Were the questions clarified?
> >>> Please feel free to add more feedback, otherwise it would be nice to
> move
> >>> this topic onwards 
> >>>
> >>> Kind Regards,
> >>> Luís Cabral
> >>>
> >>> From: Guozhang Wang
> >>> Sent: 09 May 2018 20:00
> >>> To: dev@kafka.apache.org
> >>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >>>
> >>> I have thought about being consistency in strategy v.s. practical
> concerns
> >>> about storage convenience to its impact on compaction effectiveness.
> >>>
> >>> The different between timestamp and the header key-value pairs is that
> for
> >>> the latter, as I mentioned before, "it is arguably out of Kafka's
> control,
> >>> and indeed users may (mistakenly) generate many records with the same
> key
> >>> and the same header value." So giving up tie breakers may result in
> very
> >>> poor compaction effectiveness when it happens, while for timestamps the
> >>> likelihood of this is considered very small.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax  >
> >>> wrote:
> >>>
> >>>> Thanks.
> >>>>
> >>>> To reverse the question: if this argument holds, why does it not apply
> >>>> to the case when the header key is used as compaction attribute?
> >>>>
> >>>> I am not against keeping both records in case timestamps are equal,
> but
> >>>> shouldn't we apply the same strategy for all cases and don't use
> offset
> >>>> as tie-breaker at all?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>> On 5/6/18 8:47 PM, Guozhang Wang wrote:
> >>>>> Hello Matthias,
> >>>>>
> >>>>> The related discussion was in the PR:
> >>>>> https://github.com/apache/kafka/pull/4822#discus

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-28 Thread Matthias J. Sax
Luis,

this week is feature freeze for the upcoming 2.0 release and most people
are focused on getting their PRs merged. Thus, during this and the next week
(until code freeze), KIPs for 2.1 are not a high priority for most people.

Please bear with us. Thanks for your understanding.


-Matthias

On 5/28/18 5:21 AM, Luís Cabral wrote:
>  Hi Guozhang,
> 
> It doesn't look like there will be much feedback here.
> Is it alright if I just update the spec back to a standardized behaviour and 
> move this along?
> 
> Cheers,Luis
> On Thursday, May 24, 2018, 11:20:01 AM GMT+2, Luis Cabral 
>  wrote:  
>  
>  Hi Jun / Ismael, 
> 
> Any chance to get your opinion on this?
> Thanks in advance!
> 
> Regards,
> Luís
> 
>> On 22 May 2018, at 17:30, Guozhang Wang  wrote:
>>
>> Hello Luís,
>>
>> While reviewing your PR I realized my previous calculation on the memory
>> usage was incorrect: in fact, in the current implementation, each entry in
>> the memory-bounded cache is 16 (default MD5 hash digest length) + 8 (long
>> type) = 24 bytes, and if we add the long-typed version value it is 32
>> bytes. I.e. each entry will be increased by 33%, not doubling.
>>
>> After redoing the math I'm bit leaning towards just adding this entry for
>> all cases rather than treating timestamp differently with others (sorry for
>> being back and forth, but I just want to make sure we've got a good balance
>> between efficiency and semantics consistency). I've also chatted with Jun
>> and Ismael about this (cc'ed), and maybe you guys can chime in here as well.
>>
>>
>> Guozhang
>>
>>
>> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral 
>> wrote:
>>
>>> Hi Matthias / Guozhang,
>>>
>>> Were the questions clarified?
>>> Please feel free to add more feedback, otherwise it would be nice to move
>>> this topic onwards 
>>>
>>> Kind Regards,
>>> Luís Cabral
>>>
>>> From: Guozhang Wang
>>> Sent: 09 May 2018 20:00
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>>
>>> I have thought about being consistency in strategy v.s. practical concerns
>>> about storage convenience to its impact on compaction effectiveness.
>>>
>>> The different between timestamp and the header key-value pairs is that for
>>> the latter, as I mentioned before, "it is arguably out of Kafka's control,
>>> and indeed users may (mistakenly) generate many records with the same key
>>> and the same header value." So giving up tie breakers may result in very
>>> poor compaction effectiveness when it happens, while for timestamps the
>>> likelihood of this is considered very small.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax 
>>> wrote:
>>>
>>>> Thanks.
>>>>
>>>> To reverse the question: if this argument holds, why does it not apply
>>>> to the case when the header key is used as compaction attribute?
>>>>
>>>> I am not against keeping both records in case timestamps are equal, but
>>>> shouldn't we apply the same strategy for all cases and don't use offset
>>>> as tie-breaker at all?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>> On 5/6/18 8:47 PM, Guozhang Wang wrote:
>>>>> Hello Matthias,
>>>>>
>>>>> The related discussion was in the PR:
>>>>> https://github.com/apache/kafka/pull/4822#discussion_r184588037
>>>>>
>>>>> The concern is that, to use offset as tie breaker we need to double the
>>>>> entry size of the entry in bounded compaction cache, and hence largely
>>>>> reduce the effectiveness of the compaction itself. Since with
>>>> milliseconds
>>>>> timestamp the scenario of ties with the same key is expected to be
>>>> small, I
>>>>> think it would be a reasonable tradeoff to make.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>> On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax >>>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I just updated myself on this KIP. One question (maybe it was
>>> discussed
>>>>>> and I missed it). What is the motivation to not use the offset as tie
>>>>>> breaker for the "timestamp" cas

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-28 Thread Luís Cabral
 Hi Guozhang,

It doesn't look like there will be much feedback here.
Is it alright if I just update the spec back to a standardized behaviour and 
move this along?

Cheers,
Luis
On Thursday, May 24, 2018, 11:20:01 AM GMT+2, Luis Cabral 
<luis_cab...@yahoo.com> wrote:  
 
 Hi Jun / Ismael, 

Any chance to get your opinion on this?
Thanks in advance!

Regards,
Luís

> On 22 May 2018, at 17:30, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Hello Luís,
> 
> While reviewing your PR I realized my previous calculation on the memory
> usage was incorrect: in fact, in the current implementation, each entry in
> the memory-bounded cache is 16 (default MD5 hash digest length) + 8 (long
> type) = 24 bytes, and if we add the long-typed version value it is 32
> bytes. I.e. each entry will be increased by 33%, not doubling.
> 
> After redoing the math I'm bit leaning towards just adding this entry for
> all cases rather than treating timestamp differently with others (sorry for
> being back and forth, but I just want to make sure we've got a good balance
> between efficiency and semantics consistency). I've also chatted with Jun
> and Ismael about this (cc'ed), and maybe you guys can chime in here as well.
> 
> 
> Guozhang
> 
> 
> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
> 
>> Hi Matthias / Guozhang,
>> 
>> Were the questions clarified?
>> Please feel free to add more feedback, otherwise it would be nice to move
>> this topic onwards 
>> 
>> Kind Regards,
>> Luís Cabral
>> 
>> From: Guozhang Wang
>> Sent: 09 May 2018 20:00
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>> 
>> I have thought about being consistency in strategy v.s. practical concerns
>> about storage convenience to its impact on compaction effectiveness.
>> 
>> The different between timestamp and the header key-value pairs is that for
>> the latter, as I mentioned before, "it is arguably out of Kafka's control,
>> and indeed users may (mistakenly) generate many records with the same key
>> and the same header value." So giving up tie breakers may result in very
>> poor compaction effectiveness when it happens, while for timestamps the
>> likelihood of this is considered very small.
>> 
>> 
>> Guozhang
>> 
>> 
>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io>
>> wrote:
>> 
>>> Thanks.
>>> 
>>> To reverse the question: if this argument holds, why does it not apply
>>> to the case when the header key is used as compaction attribute?
>>> 
>>> I am not against keeping both records in case timestamps are equal, but
>>> shouldn't we apply the same strategy for all cases and don't use offset
>>> as tie-breaker at all?
>>> 
>>> 
>>> -Matthias
>>> 
>>>> On 5/6/18 8:47 PM, Guozhang Wang wrote:
>>>> Hello Matthias,
>>>> 
>>>> The related discussion was in the PR:
>>>> https://github.com/apache/kafka/pull/4822#discussion_r184588037
>>>> 
>>>> The concern is that, to use offset as tie breaker we need to double the
>>>> entry size of the entry in bounded compaction cache, and hence largely
>>>> reduce the effectiveness of the compaction itself. Since with
>>> milliseconds
>>>> timestamp the scenario of ties with the same key is expected to be
>>> small, I
>>>> think it would be a reasonable tradeoff to make.
>>>> 
>>>> 
>>>> Guozhang
>>>> 
>>>> On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax <matth...@confluent.io
>>> 
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I just updated myself on this KIP. One question (maybe it was
>> discussed
>>>>> and I missed it). What is the motivation to not use the offset as tie
>>>>> breaker for the "timestamp" case? Isn't this inconsistent behavior?
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>>> On 5/2/18 2:07 PM, Guozhang Wang wrote:
>>>>>> Hello Luís,
>>>>>> 
>>>>>> Sorry for the late reply.
>>>>>> 
>>>>>> My understanding is that such duplicates will only happen if the
>>>>> non-offset
>>>>>> version value, either the timestamp or some long-typed header key,
>>

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-24 Thread Luis Cabral
Hi Jun / Ismael, 

Any chance to get your opinion on this?
Thanks in advance!

Regards,
Luís

> On 22 May 2018, at 17:30, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Hello Luís,
> 
> While reviewing your PR I realized my previous calculation on the memory
> usage was incorrect: in fact, in the current implementation, each entry in
> the memory-bounded cache is 16 (default MD5 hash digest length) + 8 (long
> type) = 24 bytes, and if we add the long-typed version value it is 32
> bytes. I.e. each entry will be increased by 33%, not doubling.
> 
> After redoing the math I'm bit leaning towards just adding this entry for
> all cases rather than treating timestamp differently with others (sorry for
> being back and forth, but I just want to make sure we've got a good balance
> between efficiency and semantics consistency). I've also chatted with Jun
> and Ismael about this (cc'ed), and maybe you guys can chime in here as well.
> 
> 
> Guozhang
> 
> 
> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
> 
>> Hi Matthias / Guozhang,
>> 
>> Were the questions clarified?
>> Please feel free to add more feedback, otherwise it would be nice to move
>> this topic onwards 
>> 
>> Kind Regards,
>> Luís Cabral
>> 
>> From: Guozhang Wang
>> Sent: 09 May 2018 20:00
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>> 
>> I have thought about being consistency in strategy v.s. practical concerns
>> about storage convenience to its impact on compaction effectiveness.
>> 
>> The different between timestamp and the header key-value pairs is that for
>> the latter, as I mentioned before, "it is arguably out of Kafka's control,
>> and indeed users may (mistakenly) generate many records with the same key
>> and the same header value." So giving up tie breakers may result in very
>> poor compaction effectiveness when it happens, while for timestamps the
>> likelihood of this is considered very small.
>> 
>> 
>> Guozhang
>> 
>> 
>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io>
>> wrote:
>> 
>>> Thanks.
>>> 
>>> To reverse the question: if this argument holds, why does it not apply
>>> to the case when the header key is used as compaction attribute?
>>> 
>>> I am not against keeping both records in case timestamps are equal, but
>>> shouldn't we apply the same strategy for all cases and don't use offset
>>> as tie-breaker at all?
>>> 
>>> 
>>> -Matthias
>>> 
>>>> On 5/6/18 8:47 PM, Guozhang Wang wrote:
>>>> Hello Matthias,
>>>> 
>>>> The related discussion was in the PR:
>>>> https://github.com/apache/kafka/pull/4822#discussion_r184588037
>>>> 
>>>> The concern is that, to use offset as tie breaker we need to double the
>>>> entry size of the entry in bounded compaction cache, and hence largely
>>>> reduce the effectiveness of the compaction itself. Since with
>>> milliseconds
>>>> timestamp the scenario of ties with the same key is expected to be
>>> small, I
>>>> think it would be a reasonable tradeoff to make.
>>>> 
>>>> 
>>>> Guozhang
>>>> 
>>>> On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax <matth...@confluent.io
>>> 
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I just updated myself on this KIP. One question (maybe it was
>> discussed
>>>>> and I missed it). What is the motivation to not use the offset as tie
>>>>> breaker for the "timestamp" case? Isn't this inconsistent behavior?
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>>> On 5/2/18 2:07 PM, Guozhang Wang wrote:
>>>>>> Hello Luís,
>>>>>> 
>>>>>> Sorry for the late reply.
>>>>>> 
>>>>>> My understanding is that such duplicates will only happen if the
>>>>> non-offset
>>>>>> version value, either the timestamp or some long-typed header key,
>> are
>>>>> the
>>>>>> same (i.e. we cannot break ties).
>>>>>> 
>>>>>> 1. For timestamp, which is in milli-seconds, I think in practice the
>>>>>> likelihood of records with the same key and the same mil

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-22 Thread Guozhang Wang
Hello Luís,

While reviewing your PR I realized my previous calculation of the memory
usage was incorrect: in fact, in the current implementation, each entry in
the memory-bounded cache is 16 (the default MD5 hash digest length) + 8 (long
type) = 24 bytes, and if we add the long-typed version value it becomes 32
bytes. I.e. each entry will grow by 33%, not double.
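
In terms of the cleaner's dedupe buffer, the 24-byte vs 32-byte entry sizes translate roughly as in the back-of-the-envelope sketch below, assuming the default log.cleaner.dedupe.buffer.size of 128 MB and ignoring load-factor effects:

    public class OffsetMapCapacity {
        public static void main(String[] args) {
            long dedupeBufferBytes = 128L * 1024 * 1024;  // default log.cleaner.dedupe.buffer.size

            long currentEntry  = 16 + 8;      // MD5 key hash + offset           = 24 bytes
            long proposedEntry = 16 + 8 + 8;  // MD5 key hash + offset + version = 32 bytes

            System.out.printf("keys per cleaner pass today:    ~%,d%n", dedupeBufferBytes / currentEntry);
            System.out.printf("keys per cleaner pass proposed: ~%,d%n", dedupeBufferBytes / proposedEntry);
            // ~5.6M vs ~4.2M distinct keys per pass: about a 25% reduction in coverage,
            // matching the 33% growth in per-entry size mentioned above.
        }
    }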

After redoing the math I'm leaning a bit towards just adding this entry for
all cases rather than treating timestamp differently from the others (sorry for
going back and forth, but I just want to make sure we've got a good balance
between efficiency and semantic consistency). I've also chatted with Jun
and Ismael about this (cc'ed), and maybe you guys can chime in here as well.


Guozhang


On Tue, May 22, 2018 at 6:45 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hi Matthias / Guozhang,
>
> Were the questions clarified?
> Please feel free to add more feedback, otherwise it would be nice to move
> this topic onwards 
>
> Kind Regards,
> Luís Cabral
>
> From: Guozhang Wang
> Sent: 09 May 2018 20:00
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> I have thought about being consistency in strategy v.s. practical concerns
> about storage convenience to its impact on compaction effectiveness.
>
> The different between timestamp and the header key-value pairs is that for
> the latter, as I mentioned before, "it is arguably out of Kafka's control,
> and indeed users may (mistakenly) generate many records with the same key
> and the same header value." So giving up tie breakers may result in very
> poor compaction effectiveness when it happens, while for timestamps the
> likelihood of this is considered very small.
>
>
> Guozhang
>
>
> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > Thanks.
> >
> > To reverse the question: if this argument holds, why does it not apply
> > to the case when the header key is used as compaction attribute?
> >
> > I am not against keeping both records in case timestamps are equal, but
> > shouldn't we apply the same strategy for all cases and don't use offset
> > as tie-breaker at all?
> >
> >
> > -Matthias
> >
> > On 5/6/18 8:47 PM, Guozhang Wang wrote:
> > > Hello Matthias,
> > >
> > > The related discussion was in the PR:
> > > https://github.com/apache/kafka/pull/4822#discussion_r184588037
> > >
> > > The concern is that, to use offset as tie breaker we need to double the
> > > entry size of the entry in bounded compaction cache, and hence largely
> > > reduce the effectiveness of the compaction itself. Since with
> > milliseconds
> > > timestamp the scenario of ties with the same key is expected to be
> > small, I
> > > think it would be a reasonable tradeoff to make.
> > >
> > >
> > > Guozhang
> > >
> > > On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax <matth...@confluent.io
> >
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I just updated myself on this KIP. One question (maybe it was
> discussed
> > >> and I missed it). What is the motivation to not use the offset as tie
> > >> breaker for the "timestamp" case? Isn't this inconsistent behavior?
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 5/2/18 2:07 PM, Guozhang Wang wrote:
> > >>> Hello Luís,
> > >>>
> > >>> Sorry for the late reply.
> > >>>
> > >>> My understanding is that such duplicates will only happen if the
> > >> non-offset
> > >>> version value, either the timestamp or some long-typed header key,
> are
> > >> the
> > >>> same (i.e. we cannot break ties).
> > >>>
> > >>> 1. For timestamp, which is in milli-seconds, I think in practice the
> > >>> likelihood of records with the same key and the same milli-sec
> > timestamp
> > >>> are very small. And hence the duplicate amount should be very small.
> > >>>
> > >>> 2. For long-typed header key, it is arguably out of Kafka's control,
> > and
> > >>> indeed users may (mistakenly) generate many records with the same key
> > and
> > >>> the same header value.
> > >>>
> > >>>
> > >>> So I'd like to propose a counter-offer: for 1), we still use only 8
> > bytes
> > >>>

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-22 Thread Luís Cabral
Hi Matthias / Guozhang,

Were the questions clarified?
Please feel free to add more feedback; otherwise, it would be nice to move this 
topic onwards.

Kind Regards,
Luís Cabral

From: Guozhang Wang
Sent: 09 May 2018 20:00
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

I have thought about being consistency in strategy v.s. practical concerns
about storage convenience to its impact on compaction effectiveness.

The different between timestamp and the header key-value pairs is that for
the latter, as I mentioned before, "it is arguably out of Kafka's control,
and indeed users may (mistakenly) generate many records with the same key
and the same header value." So giving up tie breakers may result in very
poor compaction effectiveness when it happens, while for timestamps the
likelihood of this is considered very small.


Guozhang


On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks.
>
> To reverse the question: if this argument holds, why does it not apply
> to the case when the header key is used as compaction attribute?
>
> I am not against keeping both records in case timestamps are equal, but
> shouldn't we apply the same strategy for all cases and don't use offset
> as tie-breaker at all?
>
>
> -Matthias
>
> On 5/6/18 8:47 PM, Guozhang Wang wrote:
> > Hello Matthias,
> >
> > The related discussion was in the PR:
> > https://github.com/apache/kafka/pull/4822#discussion_r184588037
> >
> > The concern is that, to use offset as tie breaker we need to double the
> > entry size of the entry in bounded compaction cache, and hence largely
> > reduce the effectiveness of the compaction itself. Since with
> milliseconds
> > timestamp the scenario of ties with the same key is expected to be
> small, I
> > think it would be a reasonable tradeoff to make.
> >
> >
> > Guozhang
> >
> > On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >> Hi,
> >>
> >> I just updated myself on this KIP. One question (maybe it was discussed
> >> and I missed it). What is the motivation to not use the offset as tie
> >> breaker for the "timestamp" case? Isn't this inconsistent behavior?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 5/2/18 2:07 PM, Guozhang Wang wrote:
> >>> Hello Luís,
> >>>
> >>> Sorry for the late reply.
> >>>
> >>> My understanding is that such duplicates will only happen if the
> >> non-offset
> >>> version value, either the timestamp or some long-typed header key, are
> >> the
> >>> same (i.e. we cannot break ties).
> >>>
> >>> 1. For timestamp, which is in milli-seconds, I think in practice the
> >>> likelihood of records with the same key and the same milli-sec
> timestamp
> >>> are very small. And hence the duplicate amount should be very small.
> >>>
> >>> 2. For long-typed header key, it is arguably out of Kafka's control,
> and
> >>> indeed users may (mistakenly) generate many records with the same key
> and
> >>> the same header value.
> >>>
> >>>
> >>> So I'd like to propose a counter-offer: for 1), we still use only 8
> bytes
> >>> and allows for potential duplicates due to ties; for 2) we use 16 bytes
> >> to
> >>> always break ties. The motivation for distinguishing 1) and 2), is that
> >> my
> >>> expectation for 1) would be much common, and hence worth special
> handling
> >>> it to be more effective in cleaning. WDYT?
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>>
> >>> On Wed, May 2, 2018 at 2:36 AM, Luís Cabral
> >> <luis_cab...@yahoo.com.invalid>
> >>> wrote:
> >>>
> >>>>  Hi Guozhang,
> >>>>
> >>>> Have you managed to have a look at my reply?
> >>>> How do you feel about this?
> >>>>
> >>>> Kind Regards,
> >>>> Luís Cabral
> >>>> On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
> >>>> luis_cab...@yahoo.com> wrote:
> >>>>
> >>>>   Hi Guozhang,
> >>>>
> >>>> I understand the argument, but this is a hazardous compromise for
> using
> >>>> Kafka as an event store (as is my original intention).
> >>>>
> >>>> I expect to have many dupli

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-09 Thread Guozhang Wang
I have thought about consistency in strategy vs. the practical concerns
about storage overhead and its impact on compaction effectiveness.

The difference between the timestamp and the header key-value pairs is that for
the latter, as I mentioned before, "it is arguably out of Kafka's control,
and indeed users may (mistakenly) generate many records with the same key
and the same header value." So giving up tie breakers may result in very
poor compaction effectiveness when it happens, while for timestamps the
likelihood of this is considered very small.
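
The practical difference being debated can be reduced to the per-key comparison the cleaner would apply. A hedged sketch follows; the real cleaner works on hashed keys and offsets, and the exact rule is precisely what the KIP is deciding:

    // Illustrative only: shows how the tie-breaking choice changes which records survive.
    public final class CompactionRule {

        /** Keep-both semantics: on an equal version, neither record supersedes the other. */
        static boolean supersedesWithoutTieBreak(long newVersion, long oldVersion) {
            return newVersion > oldVersion;   // equal versions => both copies remain after compaction
        }

        /** Offset tie-break: on an equal version, the record with the higher offset wins. */
        static boolean supersedesWithOffsetTieBreak(long newVersion, long newOffset,
                                                    long oldVersion, long oldOffset) {
            if (newVersion != oldVersion) {
                return newVersion > oldVersion;
            }
            return newOffset > oldOffset;     // requires storing the offset alongside the version
        }
    }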


Guozhang


On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax 
wrote:

> Thanks.
>
> To reverse the question: if this argument holds, why does it not apply
> to the case when the header key is used as compaction attribute?
>
> I am not against keeping both records in case timestamps are equal, but
> shouldn't we apply the same strategy for all cases and don't use offset
> as tie-breaker at all?
>
>
> -Matthias
>
> On 5/6/18 8:47 PM, Guozhang Wang wrote:
> > Hello Matthias,
> >
> > The related discussion was in the PR:
> > https://github.com/apache/kafka/pull/4822#discussion_r184588037
> >
> > The concern is that, to use offset as tie breaker we need to double the
> > entry size of the entry in bounded compaction cache, and hence largely
> > reduce the effectiveness of the compaction itself. Since with
> milliseconds
> > timestamp the scenario of ties with the same key is expected to be
> small, I
> > think it would be a reasonable tradeoff to make.
> >
> >
> > Guozhang
> >
> > On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax 
> > wrote:
> >
> >> Hi,
> >>
> >> I just updated myself on this KIP. One question (maybe it was discussed
> >> and I missed it). What is the motivation to not use the offset as tie
> >> breaker for the "timestamp" case? Isn't this inconsistent behavior?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 5/2/18 2:07 PM, Guozhang Wang wrote:
> >>> Hello Luís,
> >>>
> >>> Sorry for the late reply.
> >>>
> >>> My understanding is that such duplicates will only happen if the
> >> non-offset
> >>> version value, either the timestamp or some long-typed header key, are
> >> the
> >>> same (i.e. we cannot break ties).
> >>>
> >>> 1. For timestamp, which is in milli-seconds, I think in practice the
> >>> likelihood of records with the same key and the same milli-sec
> timestamp
> >>> are very small. And hence the duplicate amount should be very small.
> >>>
> >>> 2. For long-typed header key, it is arguably out of Kafka's control,
> and
> >>> indeed users may (mistakenly) generate many records with the same key
> and
> >>> the same header value.
> >>>
> >>>
> >>> So I'd like to propose a counter-offer: for 1), we still use only 8
> bytes
> >>> and allows for potential duplicates due to ties; for 2) we use 16 bytes
> >> to
> >>> always break ties. The motivation for distinguishing 1) and 2), is that
> >> my
> >>> expectation for 1) would be much common, and hence worth special
> handling
> >>> it to be more effective in cleaning. WDYT?
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>>
> >>> On Wed, May 2, 2018 at 2:36 AM, Luís Cabral
> >> 
> >>> wrote:
> >>>
>   Hi Guozhang,
> 
>  Have you managed to have a look at my reply?
>  How do you feel about this?
> 
>  Kind Regards,
>  Luís Cabral
>  On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
>  luis_cab...@yahoo.com> wrote:
> 
>    Hi Guozhang,
> 
>  I understand the argument, but this is a hazardous compromise for
> using
>  Kafka as an event store (as is my original intention).
> 
>  I expect to have many duplicated messages in Kafka as the overall
>  architecture being used allows for the producer to re-send a fresh
> >> state of
>  the backed data into Kafka.Though this scenario is not common, as the
>  intention is for Kafka to bear the weight of replaying all the records
> >> for
>  new consumers, but it will occasionally happen.
> 
>  As there are plenty of records which are not updated frequently, this
>  would leave the topic with a surplus of quite a few million duplicate
>  records (and increasing every time the above mentioned function is
> >> applied).
> 
>  I would prefer to have the temporary memory footprint of 8 bytes per
>  record whenever the compaction is run (only when not in 'offset'
> mode),
>  than allowing for the topic to run into this state.
> 
>  What do you think? Is this scenario too specific for me, or do you
> >> believe
>  that it could happen to other clients as well?
> 
>  Thanks again for the continued discussion!
>  Cheers,
>  LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
>  wangg...@gmail.com> wrote:
> 
>   Hello Luis,
> 
>  When the comparing the version returns `equal`, the original proposal
> >> is to
> 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-06 Thread Matthias J. Sax
Thanks.

To reverse the question: if this argument holds, why does it not apply
to the case when the header key is used as compaction attribute?

I am not against keeping both records in case timestamps are equal, but
shouldn't we apply the same strategy for all cases and not use the offset
as a tie-breaker at all?


-Matthias

On 5/6/18 8:47 PM, Guozhang Wang wrote:
> Hello Matthias,
> 
> The related discussion was in the PR:
> https://github.com/apache/kafka/pull/4822#discussion_r184588037
> 
> The concern is that, to use offset as tie breaker we need to double the
> entry size of the entry in bounded compaction cache, and hence largely
> reduce the effectiveness of the compaction itself. Since with milliseconds
> timestamp the scenario of ties with the same key is expected to be small, I
> think it would be a reasonable tradeoff to make.
> 
> 
> Guozhang
> 
> On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax 
> wrote:
> 
>> Hi,
>>
>> I just updated myself on this KIP. One question (maybe it was discussed
>> and I missed it). What is the motivation to not use the offset as tie
>> breaker for the "timestamp" case? Isn't this inconsistent behavior?
>>
>>
>> -Matthias
>>
>>
>> On 5/2/18 2:07 PM, Guozhang Wang wrote:
>>> Hello Luís,
>>>
>>> Sorry for the late reply.
>>>
>>> My understanding is that such duplicates will only happen if the
>> non-offset
>>> version value, either the timestamp or some long-typed header key, are
>> the
>>> same (i.e. we cannot break ties).
>>>
>>> 1. For timestamp, which is in milli-seconds, I think in practice the
>>> likelihood of records with the same key and the same milli-sec timestamp
>>> are very small. And hence the duplicate amount should be very small.
>>>
>>> 2. For long-typed header key, it is arguably out of Kafka's control, and
>>> indeed users may (mistakenly) generate many records with the same key and
>>> the same header value.
>>>
>>>
>>> So I'd like to propose a counter-offer: for 1), we still use only 8 bytes
>>> and allows for potential duplicates due to ties; for 2) we use 16 bytes
>> to
>>> always break ties. The motivation for distinguishing 1) and 2), is that
>> my
>>> expectation for 1) would be much common, and hence worth special handling
>>> it to be more effective in cleaning. WDYT?
>>>
>>>
>>> Guozhang
>>>
>>>
>>>
>>> On Wed, May 2, 2018 at 2:36 AM, Luís Cabral
>> 
>>> wrote:
>>>
  Hi Guozhang,

 Have you managed to have a look at my reply?
 How do you feel about this?

 Kind Regards,
 Luís Cabral
 On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
 luis_cab...@yahoo.com> wrote:

   Hi Guozhang,

 I understand the argument, but this is a hazardous compromise for using
 Kafka as an event store (as is my original intention).

 I expect to have many duplicated messages in Kafka as the overall
 architecture being used allows for the producer to re-send a fresh
>> state of
 the backed data into Kafka.Though this scenario is not common, as the
 intention is for Kafka to bear the weight of replaying all the records
>> for
 new consumers, but it will occasionally happen.

 As there are plenty of records which are not updated frequently, this
 would leave the topic with a surplus of quite a few million duplicate
 records (and increasing every time the above mentioned function is
>> applied).

 I would prefer to have the temporary memory footprint of 8 bytes per
 record whenever the compaction is run (only when not in 'offset' mode),
 than allowing for the topic to run into this state.

 What do you think? Is this scenario too specific for me, or do you
>> believe
 that it could happen to other clients as well?

 Thanks again for the continued discussion!
 Cheers,
 LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
 wangg...@gmail.com> wrote:

  Hello Luis,

 When the comparing the version returns `equal`, the original proposal
>> is to
 use the offset as the tie breaker. My previous comment is that

 1) when we build the map calling `put`, if there is already an entry for
 the key, compare its stored version, and replace if the put record's
 version is "no smaller than" the stored record: this is because when
 building the map we are always going from smaller offsets to larger
>> ones.

 2) when making a second pass to determine if each record should be
>> retained
 based on the map, we do not try to break the tie if the map's returned
 version is the same but always treat it as "keep". In this case when we
>> are
 comparing a record with itself stored in the offset map, version
>> comparison
 would return `equals`. As I mentioned in the PR, one caveat is that we
>> may
 indeed have multiple records with the same key and the same version, but
 once a new versioned 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-06 Thread Guozhang Wang
Hello Matthias,

The related discussion was in the PR:
https://github.com/apache/kafka/pull/4822#discussion_r184588037

The concern is that, to use the offset as a tie breaker, we need to double the
size of each entry in the bounded compaction cache, and hence largely reduce
the effectiveness of the compaction itself. Since with millisecond timestamps
the scenario of ties on the same key is expected to be rare, I think it would
be a reasonable tradeoff to make.
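
To put rough numbers on that concern, here is a back-of-the-envelope sketch in
Scala (the buffer size and the MD5-sized key hash are illustrative assumptions,
not figures from the KIP):

    // Hypothetical sizing of the bounded compaction cache. Each slot holds a
    // hashed key plus the value used for comparison; keeping the offset as a
    // tie breaker grows that value from 8 to 16 bytes per slot.
    val memory      = 128 * 1024 * 1024          // dedup buffer, e.g. 128 MB
    val hashSize    = 16                         // MD5 digest of the record key
    val versionSize = 8                          // timestamp or long header value
    val offsetSize  = 8                          // extra bytes for tie breaking

    val slotsWithoutTieBreak = memory / (hashSize + versionSize)              // ~5.6M keys
    val slotsWithTieBreak    = memory / (hashSize + versionSize + offsetSize) // ~4.2M keys

Fewer slots per cleaning pass means fewer distinct keys can be deduplicated in
one round, which is the effectiveness loss being discussed.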


Guozhang

On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax 
wrote:

> Hi,
>
> I just updated myself on this KIP. One question (maybe it was discussed
> and I missed it). What is the motivation to not use the offset as tie
> breaker for the "timestamp" case? Isn't this inconsistent behavior?
>
>
> -Matthias
>
>
> On 5/2/18 2:07 PM, Guozhang Wang wrote:
> > Hello Luís,
> >
> > Sorry for the late reply.
> >
> > My understanding is that such duplicates will only happen if the
> non-offset
> > version value, either the timestamp or some long-typed header key, are
> the
> > same (i.e. we cannot break ties).
> >
> > 1. For timestamp, which is in milli-seconds, I think in practice the
> > likelihood of records with the same key and the same milli-sec timestamp
> > are very small. And hence the duplicate amount should be very small.
> >
> > 2. For long-typed header key, it is arguably out of Kafka's control, and
> > indeed users may (mistakenly) generate many records with the same key and
> > the same header value.
> >
> >
> > So I'd like to propose a counter-offer: for 1), we still use only 8 bytes
> > and allows for potential duplicates due to ties; for 2) we use 16 bytes
> to
> > always break ties. The motivation for distinguishing 1) and 2), is that
> my
> > expectation for 1) would be much common, and hence worth special handling
> > it to be more effective in cleaning. WDYT?
> >
> >
> > Guozhang
> >
> >
> >
> > On Wed, May 2, 2018 at 2:36 AM, Luís Cabral
> 
> > wrote:
> >
> >>  Hi Guozhang,
> >>
> >> Have you managed to have a look at my reply?
> >> How do you feel about this?
> >>
> >> Kind Regards,
> >> Luís Cabral
> >> On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
> >> luis_cab...@yahoo.com> wrote:
> >>
> >>   Hi Guozhang,
> >>
> >> I understand the argument, but this is a hazardous compromise for using
> >> Kafka as an event store (as is my original intention).
> >>
> >> I expect to have many duplicated messages in Kafka as the overall
> >> architecture being used allows for the producer to re-send a fresh
> state of
> >> the backed data into Kafka.Though this scenario is not common, as the
> >> intention is for Kafka to bear the weight of replaying all the records
> for
> >> new consumers, but it will occasionally happen.
> >>
> >> As there are plenty of records which are not updated frequently, this
> >> would leave the topic with a surplus of quite a few million duplicate
> >> records (and increasing every time the above mentioned function is
> applied).
> >>
> >> I would prefer to have the temporary memory footprint of 8 bytes per
> >> record whenever the compaction is run (only when not in 'offset' mode),
> >> than allowing for the topic to run into this state.
> >>
> >> What do you think? Is this scenario too specific for me, or do you
> believe
> >> that it could happen to other clients as well?
> >>
> >> Thanks again for the continued discussion!
> >> Cheers,
> >> LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
> >> wangg...@gmail.com> wrote:
> >>
> >>  Hello Luis,
> >>
> >> When the comparing the version returns `equal`, the original proposal
> is to
> >> use the offset as the tie breaker. My previous comment is that
> >>
> >> 1) when we build the map calling `put`, if there is already an entry for
> >> the key, compare its stored version, and replace if the put record's
> >> version is "no smaller than" the stored record: this is because when
> >> building the map we are always going from smaller offsets to larger
> ones.
> >>
> >> 2) when making a second pass to determine if each record should be
> retained
> >> based on the map, we do not try to break the tie if the map's returned
> >> version is the same but always treat it as "keep". In this case when we
> are
> >> comparing a record with itself stored in the offset map, version
> comparison
> >> would return `equals`. As I mentioned in the PR, one caveat is that we
> may
> >> indeed have multiple records with the same key and the same version, but
> >> once a new versioned record is appended it will be deleted.
> >>
> >>
> >> Does that make sense?
> >>
> >> Guozhang
> >>
> >>
> >> On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral
>  >>>
> >> wrote:
> >>
> >>>  Hi,
> >>>
> >>> I was updating the PR to match the latest decisions and noticed (or
> >>> rather, the integration tests noticed) that without storing the offset,
> >>> then the cache doesn't know when to keep 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-04 Thread Guozhang Wang
Thanks Luís, I do not have other comments on this KIP. I'd also like to
ping Jason and Jun to take a look at this one.


Guozhang

On Thu, May 3, 2018 at 1:40 AM, Luís Cabral 
wrote:

>
> Hi Guozhang,
>
> No worries, looking at the traffic on this project, I'm sure you have your
> hands full.
> Anyway, that proposal seems quite reasonable :- KIP is now updated to
> reflect those points.
>
> Are there any more topics you would like to address here?
>
> Cheers,
> LuisOn Wednesday, May 2, 2018, 11:07:54 PM GMT+2, Guozhang Wang <
> wangg...@gmail.com> wrote:
>
>  Hello Luís,
>
> Sorry for the late reply.
>
> My understanding is that such duplicates will only happen if the non-offset
> version value, either the timestamp or some long-typed header key, are the
> same (i.e. we cannot break ties).
>
> 1. For timestamp, which is in milli-seconds, I think in practice the
> likelihood of records with the same key and the same milli-sec timestamp
> are very small. And hence the duplicate amount should be very small.
>
> 2. For long-typed header key, it is arguably out of Kafka's control, and
> indeed users may (mistakenly) generate many records with the same key and
> the same header value.
>
>
> So I'd like to propose a counter-offer: for 1), we still use only 8 bytes
> and allows for potential duplicates due to ties; for 2) we use 16 bytes to
> always break ties. The motivation for distinguishing 1) and 2), is that my
> expectation for 1) would be much common, and hence worth special handling
> it to be more effective in cleaning. WDYT?
>
>
> Guozhang
>
>
>
> On Wed, May 2, 2018 at 2:36 AM, Luís Cabral  >
> wrote:
>
> >  Hi Guozhang,
> >
> > Have you managed to have a look at my reply?
> > How do you feel about this?
> >
> > Kind Regards,
> > Luís Cabral
> >On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
> > luis_cab...@yahoo.com> wrote:
> >
> >  Hi Guozhang,
> >
> > I understand the argument, but this is a hazardous compromise for using
> > Kafka as an event store (as is my original intention).
> >
> > I expect to have many duplicated messages in Kafka as the overall
> > architecture being used allows for the producer to re-send a fresh state
> of
> > the backed data into Kafka.Though this scenario is not common, as the
> > intention is for Kafka to bear the weight of replaying all the records
> for
> > new consumers, but it will occasionally happen.
> >
> > As there are plenty of records which are not updated frequently, this
> > would leave the topic with a surplus of quite a few million duplicate
> > records (and increasing every time the above mentioned function is
> applied).
> >
> > I would prefer to have the temporary memory footprint of 8 bytes per
> > record whenever the compaction is run (only when not in 'offset' mode),
> > than allowing for the topic to run into this state.
> >
> > What do you think? Is this scenario too specific for me, or do you
> believe
> > that it could happen to other clients as well?
> >
> > Thanks again for the continued discussion!
> > Cheers,
> > LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
> > wangg...@gmail.com> wrote:
> >
> >  Hello Luis,
> >
> > When the comparing the version returns `equal`, the original proposal is
> to
> > use the offset as the tie breaker. My previous comment is that
> >
> > 1) when we build the map calling `put`, if there is already an entry for
> > the key, compare its stored version, and replace if the put record's
> > version is "no smaller than" the stored record: this is because when
> > building the map we are always going from smaller offsets to larger ones.
> >
> > 2) when making a second pass to determine if each record should be
> retained
> > based on the map, we do not try to break the tie if the map's returned
> > version is the same but always treat it as "keep". In this case when we
> are
> > comparing a record with itself stored in the offset map, version
> comparison
> > would return `equals`. As I mentioned in the PR, one caveat is that we
> may
> > indeed have multiple records with the same key and the same version, but
> > once a new versioned record is appended it will be deleted.
> >
> >
> > Does that make sense?
> >
> > Guozhang
> >
> >
> > On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral
>  > >
> > wrote:
> >
> > >  Hi,
> > >
> > > I was updating the PR to match the latest decisions and noticed (or
> > > rather, the integration tests noticed) that without storing the offset,
> > > then the cache doesn't know when to keep the record itself.
> > >
> > > This is because, after the cache is populated, all the records are
> > > compared against the stored ones, so "Record{key:A,offset:1,
> version:1}"
> > > will compare against itself and be flagged as "don't keep", since we
> only
> > > compared based on the version and didn't check to see if the offset was
> > the
> > > same or not.
> > >
> > > 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-03 Thread Luís Cabral
 
Hi Guozhang,

No worries, looking at the traffic on this project, I'm sure you have your 
hands full.
Anyway, that proposal seems quite reasonable :-) The KIP is now updated to reflect
those points.

Are there any more topics you would like to address here?

Cheers,
Luis

On Wednesday, May 2, 2018, 11:07:54 PM GMT+2, Guozhang Wang  wrote:
 
 Hello Luís,

Sorry for the late reply.

My understanding is that such duplicates will only happen if the non-offset
version value, either the timestamp or some long-typed header key, are the
same (i.e. we cannot break ties).

1. For timestamp, which is in milli-seconds, I think in practice the
likelihood of records with the same key and the same milli-sec timestamp
are very small. And hence the duplicate amount should be very small.

2. For long-typed header key, it is arguably out of Kafka's control, and
indeed users may (mistakenly) generate many records with the same key and
the same header value.


So I'd like to propose a counter-offer: for 1), we still use only 8 bytes
and allows for potential duplicates due to ties; for 2) we use 16 bytes to
always break ties. The motivation for distinguishing 1) and 2), is that my
expectation for 1) would be much common, and hence worth special handling
it to be more effective in cleaning. WDYT?


Guozhang



On Wed, May 2, 2018 at 2:36 AM, Luís Cabral 
wrote:

>  Hi Guozhang,
>
> Have you managed to have a look at my reply?
> How do you feel about this?
>
> Kind Regards,
> Luís Cabral
>    On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
> luis_cab...@yahoo.com> wrote:
>
>  Hi Guozhang,
>
> I understand the argument, but this is a hazardous compromise for using
> Kafka as an event store (as is my original intention).
>
> I expect to have many duplicated messages in Kafka as the overall
> architecture being used allows for the producer to re-send a fresh state of
> the backed data into Kafka.Though this scenario is not common, as the
> intention is for Kafka to bear the weight of replaying all the records for
> new consumers, but it will occasionally happen.
>
> As there are plenty of records which are not updated frequently, this
> would leave the topic with a surplus of quite a few million duplicate
> records (and increasing every time the above mentioned function is applied).
>
> I would prefer to have the temporary memory footprint of 8 bytes per
> record whenever the compaction is run (only when not in 'offset' mode),
> than allowing for the topic to run into this state.
>
> What do you think? Is this scenario too specific for me, or do you believe
> that it could happen to other clients as well?
>
> Thanks again for the continued discussion!
> Cheers,
> Luis    On Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
> wangg...@gmail.com> wrote:
>
>  Hello Luis,
>
> When the comparing the version returns `equal`, the original proposal is to
> use the offset as the tie breaker. My previous comment is that
>
> 1) when we build the map calling `put`, if there is already an entry for
> the key, compare its stored version, and replace if the put record's
> version is "no smaller than" the stored record: this is because when
> building the map we are always going from smaller offsets to larger ones.
>
> 2) when making a second pass to determine if each record should be retained
> based on the map, we do not try to break the tie if the map's returned
> version is the same but always treat it as "keep". In this case when we are
> comparing a record with itself stored in the offset map, version comparison
> would return `equals`. As I mentioned in the PR, one caveat is that we may
> indeed have multiple records with the same key and the same version, but
> once a new versioned record is appended it will be deleted.
>
>
> Does that make sense?
>
> Guozhang
>
>
> On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral  >
> wrote:
>
> >  Hi,
> >
> > I was updating the PR to match the latest decisions and noticed (or
> > rather, the integration tests noticed) that without storing the offset,
> > then the cache doesn't know when to keep the record itself.
> >
> > This is because, after the cache is populated, all the records are
> > compared against the stored ones, so "Record{key:A,offset:1,version:1}"
> > will compare against itself and be flagged as "don't keep", since we only
> > compared based on the version and didn't check to see if the offset was
> the
> > same or not.
> >
> > This sort of invalidates not storing the offset in the cache,
> > unfortunately, and the binary footprint increases two-fold when "offset"
> is
> > not used as a compaction strategy.
> >
> > Guozhang: Is it ok with you if we go back on this decision and leave the
> > offset as a tie-breaker?
> >
> >
> > Kind Regards,Luis
> >
> >    On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral
> >  wrote:
> >
> >  Hi,
> >
> > The KIP is now 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-02 Thread Guozhang Wang
Hello Luís,

Sorry for the late reply.

My understanding is that such duplicates will only happen if the non-offset
version value, either the timestamp or some long-typed header key, are the
same (i.e. we cannot break ties).

1. For the timestamp, which is in milliseconds, I think in practice the
likelihood of records with the same key and the same millisecond timestamp
is very small. And hence the duplicate amount should be very small.

2. For long-typed header key, it is arguably out of Kafka's control, and
indeed users may (mistakenly) generate many records with the same key and
the same header value.


So I'd like to propose a counter-offer: for 1), we still use only 8 bytes
and allow for potential duplicates due to ties; for 2), we use 16 bytes to
always break ties. The motivation for distinguishing 1) and 2) is that I
expect 1) to be much more common, and hence worth special handling to make
the cleaning more effective. WDYT?
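
As a sketch of what that counter-offer would mean for the cache entries (the
case classes are made up for illustration; the actual storage is the log
cleaner's offset map, not these types):

    // 8 bytes per key: ties on the timestamp are tolerated and may leave duplicates.
    final case class TimestampEntry(timestamp: Long)
    // 16 bytes per key: the offset is kept next to the header value so that
    // ties can always be broken deterministically.
    final case class HeaderEntry(headerValue: Long, offset: Long)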


Guozhang



On Wed, May 2, 2018 at 2:36 AM, Luís Cabral 
wrote:

>  Hi Guozhang,
>
> Have you managed to have a look at my reply?
> How do you feel about this?
>
> Kind Regards,
> Luís Cabral
> On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <
> luis_cab...@yahoo.com> wrote:
>
>   Hi Guozhang,
>
> I understand the argument, but this is a hazardous compromise for using
> Kafka as an event store (as is my original intention).
>
> I expect to have many duplicated messages in Kafka as the overall
> architecture being used allows for the producer to re-send a fresh state of
> the backed data into Kafka.Though this scenario is not common, as the
> intention is for Kafka to bear the weight of replaying all the records for
> new consumers, but it will occasionally happen.
>
> As there are plenty of records which are not updated frequently, this
> would leave the topic with a surplus of quite a few million duplicate
> records (and increasing every time the above mentioned function is applied).
>
> I would prefer to have the temporary memory footprint of 8 bytes per
> record whenever the compaction is run (only when not in 'offset' mode),
> than allowing for the topic to run into this state.
>
> What do you think? Is this scenario too specific for me, or do you believe
> that it could happen to other clients as well?
>
> Thanks again for the continued discussion!
> Cheers,
> LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <
> wangg...@gmail.com> wrote:
>
>  Hello Luis,
>
> When the comparing the version returns `equal`, the original proposal is to
> use the offset as the tie breaker. My previous comment is that
>
> 1) when we build the map calling `put`, if there is already an entry for
> the key, compare its stored version, and replace if the put record's
> version is "no smaller than" the stored record: this is because when
> building the map we are always going from smaller offsets to larger ones.
>
> 2) when making a second pass to determine if each record should be retained
> based on the map, we do not try to break the tie if the map's returned
> version is the same but always treat it as "keep". In this case when we are
> comparing a record with itself stored in the offset map, version comparison
> would return `equals`. As I mentioned in the PR, one caveat is that we may
> indeed have multiple records with the same key and the same version, but
> once a new versioned record is appended it will be deleted.
>
>
> Does that make sense?
>
> Guozhang
>
>
> On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral  >
> wrote:
>
> >  Hi,
> >
> > I was updating the PR to match the latest decisions and noticed (or
> > rather, the integration tests noticed) that without storing the offset,
> > then the cache doesn't know when to keep the record itself.
> >
> > This is because, after the cache is populated, all the records are
> > compared against the stored ones, so "Record{key:A,offset:1,version:1}"
> > will compare against itself and be flagged as "don't keep", since we only
> > compared based on the version and didn't check to see if the offset was
> the
> > same or not.
> >
> > This sort of invalidates not storing the offset in the cache,
> > unfortunately, and the binary footprint increases two-fold when "offset"
> is
> > not used as a compaction strategy.
> >
> > Guozhang: Is it ok with you if we go back on this decision and leave the
> > offset as a tie-breaker?
> >
> >
> > Kind Regards,Luis
> >
> >On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral
> >  wrote:
> >
> >  Hi,
> >
> > The KIP is now updated with the results of the byte array discussion.
> >
> > This is my first contribution to Kafka, so I'm not sure on what the
> > processes are. Is it now acceptable to take this into a vote, or should I
> > ask for more contributors to join the discussion first?
> >
> > Kind Regards,LuisOn Friday, April 27, 2018, 6:12:03 AM GMT+2,
> Guozhang
> > Wang 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-02 Thread Luís Cabral
 Hi Guozhang,

Have you managed to have a look at my reply?
How do you feel about this?

Kind Regards,
Luís Cabral
On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral 
 wrote:  
 
  Hi Guozhang,

I understand the argument, but this is a hazardous compromise for using Kafka 
as an event store (as is my original intention).

I expect to have many duplicated messages in Kafka as the overall architecture 
being used allows for the producer to re-send a fresh state of the backed data 
into Kafka.Though this scenario is not common, as the intention is for Kafka to 
bear the weight of replaying all the records for new consumers, but it will 
occasionally happen.

As there are plenty of records which are not updated frequently, this would 
leave the topic with a surplus of quite a few million duplicate records (and 
increasing every time the above mentioned function is applied).

I would prefer to have the temporary memory footprint of 8 bytes per record 
whenever the compaction is run (only when not in 'offset' mode), than allowing 
for the topic to run into this state.

What do you think? Is this scenario too specific for me, or do you believe that 
it could happen to other clients as well?

Thanks again for the continued discussion!
Cheers,
LuisOn Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang 
 wrote:  
 
 Hello Luis,

When the comparing the version returns `equal`, the original proposal is to
use the offset as the tie breaker. My previous comment is that

1) when we build the map calling `put`, if there is already an entry for
the key, compare its stored version, and replace if the put record's
version is "no smaller than" the stored record: this is because when
building the map we are always going from smaller offsets to larger ones.

2) when making a second pass to determine if each record should be retained
based on the map, we do not try to break the tie if the map's returned
version is the same but always treat it as "keep". In this case when we are
comparing a record with itself stored in the offset map, version comparison
would return `equals`. As I mentioned in the PR, one caveat is that we may
indeed have multiple records with the same key and the same version, but
once a new versioned record is appended it will be deleted.


Does that make sense?

Guozhang


On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral 
wrote:

>  Hi,
>
> I was updating the PR to match the latest decisions and noticed (or
> rather, the integration tests noticed) that without storing the offset,
> then the cache doesn't know when to keep the record itself.
>
> This is because, after the cache is populated, all the records are
> compared against the stored ones, so "Record{key:A,offset:1,version:1}"
> will compare against itself and be flagged as "don't keep", since we only
> compared based on the version and didn't check to see if the offset was the
> same or not.
>
> This sort of invalidates not storing the offset in the cache,
> unfortunately, and the binary footprint increases two-fold when "offset" is
> not used as a compaction strategy.
>
> Guozhang: Is it ok with you if we go back on this decision and leave the
> offset as a tie-breaker?
>
>
> Kind Regards,Luis
>
>    On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral
>  wrote:
>
>  Hi,
>
> The KIP is now updated with the results of the byte array discussion.
>
> This is my first contribution to Kafka, so I'm not sure on what the
> processes are. Is it now acceptable to take this into a vote, or should I
> ask for more contributors to join the discussion first?
>
> Kind Regards,Luis    On Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang
> Wang  wrote:
>
>  Hello Luís,
>
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
>
> You are right, offset is a long.
>
> As for timestamp / other types, I left a comment in your PR about handling
> tie breakers.
>
> > Given these arguments, is this point something that you absolutely must
> have?
>
> No I do not have a strong use case in mind to go with arbitrary byte
> arrays, was just thinking that if we are going to enhance log compaction
> why not generalize it more :)
>
> Your concern about the memory usage makes sense. I'm happy to take my
> suggestion back and enforce only long typed fields.
>
>
> Guozhang
>
>
>
>
>
> On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral  >
> wrote:
>
> >  Hi,
> >
> > bq. have a integer typed OffsetMap (for offset)
> >
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
> >
> >
> > bq. long typed OffsetMap (for timestamp)
> >
> > We would still need to store the offset, as it is functioning as a
> > tie-breaker. Not that this is a big deal, we can be easily have both (as
> > currently done on the PR).
> >
> >
> > bq. For the byte array typed offset map, we can use 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-30 Thread Luís Cabral
 Hi Guozhang,

I understand the argument, but this is a hazardous compromise for using Kafka 
as an event store (as is my original intention).

I expect to have many duplicated messages in Kafka as the overall architecture 
being used allows for the producer to re-send a fresh state of the backed data 
into Kafka. Though this scenario is not common, as the intention is for Kafka to 
bear the weight of replaying all the records for new consumers, it will 
occasionally happen.

As there are plenty of records which are not updated frequently, this would 
leave the topic with a surplus of quite a few million duplicate records (and 
increasing every time the above mentioned function is applied).

I would prefer to have the temporary memory footprint of 8 bytes per record 
whenever the compaction is run (only when not in 'offset' mode), than allowing 
for the topic to run into this state.

What do you think? Is this scenario too specific for me, or do you believe that 
it could happen to other clients as well?

Thanks again for the continued discussion!
Cheers,
Luis

On Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang  wrote:
 
 Hello Luis,

When the comparing the version returns `equal`, the original proposal is to
use the offset as the tie breaker. My previous comment is that

1) when we build the map calling `put`, if there is already an entry for
the key, compare its stored version, and replace if the put record's
version is "no smaller than" the stored record: this is because when
building the map we are always going from smaller offsets to larger ones.

2) when making a second pass to determine if each record should be retained
based on the map, we do not try to break the tie if the map's returned
version is the same but always treat it as "keep". In this case when we are
comparing a record with itself stored in the offset map, version comparison
would return `equals`. As I mentioned in the PR, one caveat is that we may
indeed have multiple records with the same key and the same version, but
once a new versioned record is appended it will be deleted.


Does that make sense?

Guozhang


On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral 
wrote:

>  Hi,
>
> I was updating the PR to match the latest decisions and noticed (or
> rather, the integration tests noticed) that without storing the offset,
> then the cache doesn't know when to keep the record itself.
>
> This is because, after the cache is populated, all the records are
> compared against the stored ones, so "Record{key:A,offset:1,version:1}"
> will compare against itself and be flagged as "don't keep", since we only
> compared based on the version and didn't check to see if the offset was the
> same or not.
>
> This sort of invalidates not storing the offset in the cache,
> unfortunately, and the binary footprint increases two-fold when "offset" is
> not used as a compaction strategy.
>
> Guozhang: Is it ok with you if we go back on this decision and leave the
> offset as a tie-breaker?
>
>
> Kind Regards,Luis
>
>    On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral
>  wrote:
>
>  Hi,
>
> The KIP is now updated with the results of the byte array discussion.
>
> This is my first contribution to Kafka, so I'm not sure on what the
> processes are. Is it now acceptable to take this into a vote, or should I
> ask for more contributors to join the discussion first?
>
> Kind Regards,Luis    On Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang
> Wang  wrote:
>
>  Hello Luís,
>
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
>
> You are right, offset is a long.
>
> As for timestamp / other types, I left a comment in your PR about handling
> tie breakers.
>
> > Given these arguments, is this point something that you absolutely must
> have?
>
> No I do not have a strong use case in mind to go with arbitrary byte
> arrays, was just thinking that if we are going to enhance log compaction
> why not generalize it more :)
>
> Your concern about the memory usage makes sense. I'm happy to take my
> suggestion back and enforce only long typed fields.
>
>
> Guozhang
>
>
>
>
>
> On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral  >
> wrote:
>
> >  Hi,
> >
> > bq. have a integer typed OffsetMap (for offset)
> >
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
> >
> >
> > bq. long typed OffsetMap (for timestamp)
> >
> > We would still need to store the offset, as it is functioning as a
> > tie-breaker. Not that this is a big deal, we can be easily have both (as
> > currently done on the PR).
> >
> >
> > bq. For the byte array typed offset map, we can use a general hashmap,
> > where the hashmap's CAPACITY will be reasoned from the given "val memory:
> > Int" parameter
> >
> > If you have a map with 128 byte capacity, then store a value with 16
> bytes
> > and 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Guozhang Wang
Hello Luis,

When the version comparison returns `equal`, the original proposal is to
use the offset as the tie breaker. My previous comment is that

1) when we build the map calling `put`, if there is already an entry for
the key, compare its stored version, and replace if the put record's
version is "no smaller than" the stored record: this is because when
building the map we are always going from smaller offsets to larger ones.

2) when making a second pass to determine if each record should be retained
based on the map, we do not try to break the tie if the map's returned
version is the same but always treat it as "keep". In this case when we are
comparing a record with itself stored in the offset map, version comparison
would return `equals`. As I mentioned in the PR, one caveat is that we may
indeed have multiple records with the same key and the same version, but
once a new versioned record is appended it will be deleted.
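
A minimal Scala sketch of the two steps above, assuming a simple map from key
to version (illustrative names only, not the actual LogCleaner/OffsetMap API):

    import java.nio.ByteBuffer
    import scala.collection.mutable

    val cache = mutable.Map.empty[ByteBuffer, Long]   // key -> greatest version seen

    // 1) Building the map: replace the stored version if the incoming one is
    //    "no smaller than" it; since we scan from smaller offsets to larger
    //    ones, the later record wins on a tie.
    def put(key: ByteBuffer, version: Long): Unit =
      if (cache.get(key).forall(version >= _)) cache.update(key, version)

    // 2) Second pass: a tie is treated as "keep", so a record compared against
    //    its own cache entry is retained; the caveat is that older duplicates
    //    carrying the very same version survive too, until a newer version is
    //    appended.
    def shouldRetain(key: ByteBuffer, version: Long): Boolean =
      cache.get(key).forall(version >= _)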


Does that make sense?

Guozhang


On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral 
wrote:

>  Hi,
>
> I was updating the PR to match the latest decisions and noticed (or
> rather, the integration tests noticed) that without storing the offset,
> then the cache doesn't know when to keep the record itself.
>
> This is because, after the cache is populated, all the records are
> compared against the stored ones, so "Record{key:A,offset:1,version:1}"
> will compare against itself and be flagged as "don't keep", since we only
> compared based on the version and didn't check to see if the offset was the
> same or not.
>
> This sort of invalidates not storing the offset in the cache,
> unfortunately, and the binary footprint increases two-fold when "offset" is
> not used as a compaction strategy.
>
> Guozhang: Is it ok with you if we go back on this decision and leave the
> offset as a tie-breaker?
>
>
> Kind Regards,Luis
>
> On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral
>  wrote:
>
>   Hi,
>
> The KIP is now updated with the results of the byte array discussion.
>
> This is my first contribution to Kafka, so I'm not sure on what the
> processes are. Is it now acceptable to take this into a vote, or should I
> ask for more contributors to join the discussion first?
>
> Kind Regards,LuisOn Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang
> Wang  wrote:
>
>  Hello Luís,
>
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
>
> You are right, offset is a long.
>
> As for timestamp / other types, I left a comment in your PR about handling
> tie breakers.
>
> > Given these arguments, is this point something that you absolutely must
> have?
>
> No I do not have a strong use case in mind to go with arbitrary byte
> arrays, was just thinking that if we are going to enhance log compaction
> why not generalize it more :)
>
> Your concern about the memory usage makes sense. I'm happy to take my
> suggestion back and enforce only long typed fields.
>
>
> Guozhang
>
>
>
>
>
> On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral  >
> wrote:
>
> >  Hi,
> >
> > bq. have a integer typed OffsetMap (for offset)
> >
> > Offset is an integer? I've only noticed it being resolved as a long so
> far.
> >
> >
> > bq. long typed OffsetMap (for timestamp)
> >
> > We would still need to store the offset, as it is functioning as a
> > tie-breaker. Not that this is a big deal, we can be easily have both (as
> > currently done on the PR).
> >
> >
> > bq. For the byte array typed offset map, we can use a general hashmap,
> > where the hashmap's CAPACITY will be reasoned from the given "val memory:
> > Int" parameter
> >
> > If you have a map with 128 byte capacity, then store a value with 16
> bytes
> > and another with 32 bytes, how many free slots do you have left in this
> map?
> >
> > You can make this work, but I think you would need to re-design the whole
> > log cleaner approach, which implies changing some of the already existing
> > configurations (like "log.cleaner.io.buffer.load.factor"). I would
> rather
> > maintain backwards compatibility as much as possible in this KIP, and if
> > this means that using "foo" / "bar" or "2.1-a" / "3.20-b" as record
> > versions is not viable, then so be it.
> >
> > Given these arguments, is this point something that you absolutely must
> > have? I'm still sort of hoping that you are just entertaining the idea
> and
> > are ok with having a long (now conceded to be unsigned, so the byte
> arrays
> > can be compared directly).
> >
> >
> > Kind Regards,Luis
> >
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Luís Cabral
 Hi,

I was updating the PR to match the latest decisions and noticed (or rather, the 
integration tests noticed) that without storing the offset, then the cache 
doesn't know when to keep the record itself.

This is because, after the cache is populated, all the records are compared 
against the stored ones, so "Record{key:A,offset:1,version:1}" will compare 
against itself and be flagged as "don't keep", since we only compared based on 
the version and didn't check to see if the offset was the same or not.

This sort of invalidates not storing the offset in the cache, unfortunately, 
and the binary footprint increases two-fold when "offset" is not used as a 
compaction strategy.
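
For reference, a small sketch of the retention check with the offset kept as a
tie-breaker, which avoids the "compared against itself and flagged as don't
keep" problem described above (illustrative names, not the PR's actual code):

    import java.nio.ByteBuffer
    import scala.collection.mutable

    final case class Entry(version: Long, offset: Long)
    val cache = mutable.Map.empty[ByteBuffer, Entry]   // key -> latest (version, offset)

    // A record survives if it carries a newer version than the cached one, or
    // if it is exactly the record the cache points at (same version AND same
    // offset); an older duplicate with an equal version but a smaller offset
    // is discarded.
    def shouldRetain(key: ByteBuffer, version: Long, offset: Long): Boolean =
      cache.get(key) match {
        case Some(e) => version > e.version || (version == e.version && offset == e.offset)
        case None    => true
      }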

Guozhang: Is it ok with you if we go back on this decision and leave the offset 
as a tie-breaker?


Kind Regards,Luis

On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral 
 wrote:  
 
  Hi,

The KIP is now updated with the results of the byte array discussion.

This is my first contribution to Kafka, so I'm not sure on what the processes 
are. Is it now acceptable to take this into a vote, or should I ask for more 
contributors to join the discussion first?

Kind Regards,Luis    On Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang Wang 
 wrote:  
 
 Hello Luís,

> Offset is an integer? I've only noticed it being resolved as a long so
far.

You are right, offset is a long.

As for timestamp / other types, I left a comment in your PR about handling
tie breakers.

> Given these arguments, is this point something that you absolutely must
have?

No I do not have a strong use case in mind to go with arbitrary byte
arrays, was just thinking that if we are going to enhance log compaction
why not generalize it more :)

Your concern about the memory usage makes sense. I'm happy to take my
suggestion back and enforce only long typed fields.


Guozhang





On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral 
wrote:

>  Hi,
>
> bq. have a integer typed OffsetMap (for offset)
>
> Offset is an integer? I've only noticed it being resolved as a long so far.
>
>
> bq. long typed OffsetMap (for timestamp)
>
> We would still need to store the offset, as it is functioning as a
> tie-breaker. Not that this is a big deal, we can be easily have both (as
> currently done on the PR).
>
>
> bq. For the byte array typed offset map, we can use a general hashmap,
> where the hashmap's CAPACITY will be reasoned from the given "val memory:
> Int" parameter
>
> If you have a map with 128 byte capacity, then store a value with 16 bytes
> and another with 32 bytes, how many free slots do you have left in this map?
>
> You can make this work, but I think you would need to re-design the whole
> log cleaner approach, which implies changing some of the already existing
> configurations (like "log.cleaner.io.buffer.load.factor"). I would rather
> maintain backwards compatibility as much as possible in this KIP, and if
> this means that using "foo" / "bar" or "2.1-a" / "3.20-b" as record
> versions is not viable, then so be it.
>
> Given these arguments, is this point something that you absolutely must
> have? I'm still sort of hoping that you are just entertaining the idea and
> are ok with having a long (now conceded to be unsigned, so the byte arrays
> can be compared directly).
>
>
> Kind Regards,Luis
>



-- 
-- Guozhang    

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Luís Cabral
 Hi,

The KIP is now updated with the results of the byte array discussion.

This is my first contribution to Kafka, so I'm not sure on what the processes 
are. Is it now acceptable to take this into a vote, or should I ask for more 
contributors to join the discussion first?

Kind Regards,
Luis

On Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang Wang  wrote:
 
 Hello Luís,

> Offset is an integer? I've only noticed it being resolved as a long so
far.

You are right, offset is a long.

As for timestamp / other types, I left a comment in your PR about handling
tie breakers.

> Given these arguments, is this point something that you absolutely must
have?

No I do not have a strong use case in mind to go with arbitrary byte
arrays, was just thinking that if we are going to enhance log compaction
why not generalize it more :)

Your concern about the memory usage makes sense. I'm happy to take my
suggestion back and enforce only long typed fields.


Guozhang





On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral 
wrote:

>  Hi,
>
> bq. have a integer typed OffsetMap (for offset)
>
> Offset is an integer? I've only noticed it being resolved as a long so far.
>
>
> bq. long typed OffsetMap (for timestamp)
>
> We would still need to store the offset, as it is functioning as a
> tie-breaker. Not that this is a big deal, we can be easily have both (as
> currently done on the PR).
>
>
> bq. For the byte array typed offset map, we can use a general hashmap,
> where the hashmap's CAPACITY will be reasoned from the given "val memory:
> Int" parameter
>
> If you have a map with 128 byte capacity, then store a value with 16 bytes
> and another with 32 bytes, how many free slots do you have left in this map?
>
> You can make this work, but I think you would need to re-design the whole
> log cleaner approach, which implies changing some of the already existing
> configurations (like "log.cleaner.io.buffer.load.factor"). I would rather
> maintain backwards compatibility as much as possible in this KIP, and if
> this means that using "foo" / "bar" or "2.1-a" / "3.20-b" as record
> versions is not viable, then so be it.
>
> Given these arguments, is this point something that you absolutely must
> have? I'm still sort of hoping that you are just entertaining the idea and
> are ok with having a long (now conceded to be unsigned, so the byte arrays
> can be compared directly).
>
>
> Kind Regards,Luis
>



-- 
-- Guozhang  

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-26 Thread Guozhang Wang
Hello Luís,

> Offset is an integer? I've only noticed it being resolved as a long so
far.

You are right, offset is a long.

As for timestamp / other types, I left a comment in your PR about handling
tie breakers.

> Given these arguments, is this point something that you absolutely must
have?

No I do not have a strong use case in mind to go with arbitrary byte
arrays, was just thinking that if we are going to enhance log compaction
why not generalize it more :)

Your concern about the memory usage makes sense. I'm happy to take my
suggestion back and enforce only long typed fields.


Guozhang





On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral 
wrote:

>  Hi,
>
> bq. have a integer typed OffsetMap (for offset)
>
> Offset is an integer? I've only noticed it being resolved as a long so far.
>
>
> bq. long typed OffsetMap (for timestamp)
>
> We would still need to store the offset, as it is functioning as a
> tie-breaker. Not that this is a big deal, we can be easily have both (as
> currently done on the PR).
>
>
> bq. For the byte array typed offset map, we can use a general hashmap,
> where the hashmap's CAPACITY will be reasoned from the given "val memory:
> Int" parameter
>
> If you have a map with 128 byte capacity, then store a value with 16 bytes
> and another with 32 bytes, how many free slots do you have left in this map?
>
> You can make this work, but I think you would need to re-design the whole
> log cleaner approach, which implies changing some of the already existing
> configurations (like "log.cleaner.io.buffer.load.factor"). I would rather
> maintain backwards compatibility as much as possible in this KIP, and if
> this means that using "foo" / "bar" or "2.1-a" / "3.20-b" as record
> versions is not viable, then so be it.
>
> Given these arguments, is this point something that you absolutely must
> have? I'm still sort of hoping that you are just entertaining the idea and
> are ok with having a long (now conceded to be unsigned, so the byte arrays
> can be compared directly).
>
>
> Kind Regards,Luis
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-26 Thread Luís Cabral
 Hi,

bq. have a integer typed OffsetMap (for offset)

Offset is an integer? I've only noticed it being resolved as a long so far.


bq. long typed OffsetMap (for timestamp)

We would still need to store the offset, as it is functioning as a tie-breaker. 
Not that this is a big deal, we can be easily have both (as currently done on 
the PR).


bq. For the byte array typed offset map, we can use a general hashmap, where 
the hashmap's CAPACITY will be reasoned from the given "val memory: Int" 
parameter

If you have a map with 128 byte capacity, then store a value with 16 bytes and 
another with 32 bytes, how many free slots do you have left in this map?
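
A tiny worked example of that point (numbers purely illustrative): with
fixed-size entries the slot count is known up front, but with variable-size
values it is not.

    // Fixed-size entries: capacity is a simple division, as the cleaner does today.
    val memory    = 128                 // bytes, purely illustrative
    val entrySize = 16 + 8              // hashed key + fixed 8-byte version
    val slots     = memory / entrySize  // 5 slots, regardless of what is stored

    // Variable-size header values: after inserting a 16-byte and a 32-byte
    // value (plus their hashed keys), the room left depends on what arrives
    // next, so the existing load-factor based sizing no longer applies.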

You can make this work, but I think you would need to re-design the whole log 
cleaner approach, which implies changing some of the already existing 
configurations (like "log.cleaner.io.buffer.load.factor"). I would rather 
maintain backwards compatibility as much as possible in this KIP, and if this 
means that using "foo" / "bar" or "2.1-a" / "3.20-b" as record versions is not 
viable, then so be it.
 
Given these arguments, is this point something that you absolutely must have? 
I'm still sort of hoping that you are just entertaining the idea and are ok 
with having a long (now conceded to be unsigned, so the byte arrays can be 
compared directly).


Kind Regards,Luis


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-25 Thread Guozhang Wang
Luis,

Just sharing my two cents here: we can extend the OffsetMap trait, making
its "offset" type a generic template, and have a integer typed OffsetMap
(for offset), and a long typed OffsetMap (for timestamp), and a byte array
typed OffsetMap (record header fields).

For the byte array typed offset map, we can use a general hashmap, where
the hashmap's CAPACITY will be reasoned from the given "val memory: Int"
parameter, for example:
http://java-performance.info/memory-consumption-of-java-data-types-2/
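
A rough sketch of that generic-template idea (hypothetical signatures; the real
trait in core/src/main/scala/kafka/log/OffsetMap.scala looks different):

    import java.nio.ByteBuffer

    trait VersionedOffsetMap[V] {
      def put(key: ByteBuffer, value: V): Unit
      def get(key: ByteBuffer): Option[V]
      def slots: Int
    }

    // For the byte-array variant, capacity is derived from the memory budget
    // and an assumed average per-entry footprint (64 bytes is just a guess).
    class HeaderBytesOffsetMap(memory: Int, avgEntryBytes: Int = 64)
        extends VersionedOffsetMap[Array[Byte]] {
      override val slots: Int = memory / avgEntryBytes
      private val map = new java.util.HashMap[ByteBuffer, Array[Byte]](slots)
      override def put(key: ByteBuffer, value: Array[Byte]): Unit = map.put(key, value)
      override def get(key: ByteBuffer): Option[Array[Byte]] = Option(map.get(key))
    }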


Guozhang


On Tue, Apr 24, 2018 at 1:40 PM, Luis Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hi Guozhang,
>
> I mean here:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/
> OffsetMap.scala
>
>
> It seems that this class was designed with a great focus on preventing
> memory starvation, and this class is also very central to the changes
> required for this KIP.
>
> So far this has not been of great concern, as enhancing that logic with an
> extra 8 bytes is quite easy (have a look at the pull request for more
> details).
> Is it alright to enhance this with a variable amount of bytes, though? Or
> to replace it with a direct record cache?
>
> Kind Regards
> Luis
>
>
> > On 24 Apr 2018, at 20:30, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > Not sure if I fully understand your question, but here's my
> understanding:
> > In LogCleaner we call:
> >
> > "val records = MemoryRecords.readableRecords(readBuffer)"
> >
> > Which returns a MemoryRecords object, and then call filterInto with a
> given
> > customized RecordFilter that instantiates "checkBatchRetention(batch:
> > RecordBatch)". Note that `RecordBatch` is just an iterator of `Record`,
> > which contains the headers, so we can just access that header there.
> >
> >
> > Guozhang
> >
> >
> > On Tue, Apr 24, 2018 at 12:41 AM, Luís Cabral
> <luis_cab...@yahoo.com.invalid
> >> wrote:
> >
> >>
> >> Hi Guozhang,
> >>
> >> As much as I would like to move on from this topic, I've now tried to
> >> implement it into the pull request, and could not find a viable way to
> >> store a variable size byte array into the current concept of the log
> >> cleaner (with long the current approach just always considers it to be 8
> >> bytes).
> >>
> >> Do you have any suggestions on how to handle this issue there?
> >>
> >> Kind Regards,
> >> Luis
> >>
> >>On Tuesday, April 24, 2018, 1:11:11 AM GMT+2, Luís Cabral <
> >> luis_cab...@yahoo.com> wrote:
> >>
> >> That is definitely clearer, KIP updated!
> >>
> >>
> >>
> >> From: Guozhang Wang
> >> Sent: 23 April 2018 23:44
> >> To: dev@kafka.apache.org
> >> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >>
> >>
> >>
> >> Thanks Luís. The KIP looks good to me. Just that what I left as a minor:
> >>
> >>
> >>
> >> `When both records being compared contain a matching "compaction value",
> >>
> >> then the record with the highest offset will be kept;`
> >>
> >>
> >>
> >> I understand your intent, it's just that the sentence itself is a bit
> >>
> >> misleading, I think what you actually meant to say:
> >>
> >>
> >>
> >> `When both records being compared contain a matching "compaction value"
> and
> >>
> >> their corresponding byte arrays are considered equal, then the record
> with
> >>
> >> the highest offset will be 

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Luis Cabral
Hi Guozhang,

I mean here: 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetMap.scala


It seems that this class was designed with a great focus on preventing memory 
starvation, and this class is also very central to the changes required for 
this KIP.

So far this has not been of great concern, as enhancing that logic with an 
extra 8 bytes is quite easy (have a look at the pull request for more details). 
Is it alright to enhance this with a variable amount of bytes, though? Or to 
replace it with a direct record cache? 

Kind Regards
Luis 


> On 24 Apr 2018, at 20:30, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Not sure if I fully understand your question, but here's my understanding:
> In LogCleaner we call:
> 
> "val records = MemoryRecords.readableRecords(readBuffer)"
> 
> Which returns a MemoryRecords object, and then call filterInto with a given
> customized RecordFilter that instantiates "checkBatchRetention(batch:
> RecordBatch)". Note that `RecordBatch` is just an iterator of `Record`,
> which contains the headers, so we can just access that header there.
> 
> 
> Guozhang
> 
> 
> On Tue, Apr 24, 2018 at 12:41 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
>> wrote:
> 
>> 
>> Hi Guozhang,
>> 
>> As much as I would like to move on from this topic, I've now tried to
>> implement it into the pull request, and could not find a viable way to
>> store a variable size byte array into the current concept of the log
>> cleaner (with long the current approach just always considers it to be 8
>> bytes).
>> 
>> Do you have any suggestions on how to handle this issue there?
>> 
>> Kind Regards,
>> Luis
>> 
>>On Tuesday, April 24, 2018, 1:11:11 AM GMT+2, Luís Cabral <
>> luis_cab...@yahoo.com> wrote:
>> 
>> That is definitely clearer, KIP updated!
>> 
>> 
>> 
>> From: Guozhang Wang
>> Sent: 23 April 2018 23:44
>> To: dev@kafka.apache.org
>> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>> 
>> 
>> 
>> Thanks Luís. The KIP looks good to me. Just that what I left as a minor:
>> 
>> 
>> 
>> `When both records being compared contain a matching "compaction value",
>> 
>> then the record with the highest offset will be kept;`
>> 
>> 
>> 
>> I understand your intent, it's just that the sentence itself is a bit
>> 
>> misleading, I think what you actually meant to say:
>> 
>> 
>> 
>> `When both records being compared contain a matching "compaction value" and
>> 
>> their corresponding byte arrays are considered equal, then the record with
>> 
>> the highest offset will be kept;`
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Guozhang
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Mon, Apr 23, 2018 at 1:54 PM, Luís Cabral <luis_cab...@yahoo.com.invalid
>>> 
>> 
>> wrote:
>> 
>> 
>> 
>>> Hello Guozhang,
>> 
>>> 
>> 
>>> The KIP is now updated to reflect this choice in strategy.
>> 
>>> Please let me know your thoughts there.
>> 
>>> 
>> 
>>> Kind Regards,
>> 
>>> Luís
>> 
>>> 
>> 
>>> From: Guozhang Wang
>> 
>>> Sent: 23 April 2018 19:32
>> 
>>> To: dev@kafka.apache.org
>> 
>>> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>> 
>>> 
>> 
>>> Hi Luis,
>> 
>>> 
>> 
>>> I think by "generalizing it" we could go beyond numerical values, and
>> 
>>> that's why I suggested we do not need to require that the type s

Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Guozhang Wang
Not sure if I fully understand your question, but here's my understanding:
In LogCleaner we call:

"val records = MemoryRecords.readableRecords(readBuffer)"

Which returns a MemoryRecords object, and then call filterInto with a given
customized RecordFilter that instantiates "checkBatchRetention(batch:
RecordBatch)". Note that `RecordBatch` is just an iterator of `Record`,
which contains the headers, so we can just access that header there.
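
As a minimal sketch of that idea (illustrative only; the Header shape is the key/value interface quoted elsewhere in this thread, and the helper name below is made up, not the KIP's actual code):

interface Header { String key(); byte[] value(); }

final class CompactionHeaderLookup {
    // Returns the raw bytes of the configured compaction header, or null when
    // the record carries no such header (in which case the cleaner would fall
    // back to plain offset-based compaction for that record).
    static byte[] compactionValue(Header[] headers, String configuredKey) {
        if (headers == null) return null;
        for (Header h : headers) {
            if (configuredKey.equals(h.key())) return h.value();
        }
        return null;
    }
}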


Guozhang


On Tue, Apr 24, 2018 at 12:41 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
> wrote:

>
> Hi Guozhang,
>
> As much as I would like to move on from this topic, I've now tried to
> implement it into the pull request, and could not find a viable way to
> store a variable size byte array into the current concept of the log
> cleaner (with long the current approach just always considers it to be 8
> bytes).
>
> Do you have any suggestions on how to handle this issue there?
>
> Kind Regards,
> Luis
>
> On Tuesday, April 24, 2018, 1:11:11 AM GMT+2, Luís Cabral <
> luis_cab...@yahoo.com> wrote:
>
> That is definitely clearer, KIP updated!
>
>
>
> From: Guozhang Wang
> Sent: 23 April 2018 23:44
> To: dev@kafka.apache.org
> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
>
>
> Thanks Luís. The KIP looks good to me. Just that what I left as a minor:
>
>
>
> `When both records being compared contain a matching "compaction value",
>
> then the record with the highest offset will be kept;`
>
>
>
> I understand your intent, it's just that the sentence itself is a bit
>
> misleading, I think what you actually meant to say:
>
>
>
> `When both records being compared contain a matching "compaction value" and
>
> their corresponding byte arrays are considered equal, then the record with
>
> the highest offset will be kept;`
>
>
>
>
>
>
>
> Guozhang
>
>
>
>
>
>
>
> On Mon, Apr 23, 2018 at 1:54 PM, Luís Cabral <luis_cab...@yahoo.com.invalid
> >
>
> wrote:
>
>
>
> > Hello Guozhang,
>
> >
>
> > The KIP is now updated to reflect this choice in strategy.
>
> > Please let me know your thoughts there.
>
> >
>
> > Kind Regards,
>
> > Luís
>
> >
>
> > From: Guozhang Wang
>
> > Sent: 23 April 2018 19:32
>
> > To: dev@kafka.apache.org
>
> > Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> >
>
> > Hi Luis,
>
> >
>
> > I think by "generalizing it" we could go beyond numerical values, and
>
> > that's why I suggested we do not need to require that the type serialized
>
> > to the bytes have any numerical semantics since it has to ben serialized
> to
>
> > a byte array anyways. I understand that for your use case, the intended
>
> > record header compaction value is a number, but imagine if someone else
>
> > wants to compact the same-keyed messages based on some record header
>
> > key-value pair whose value types before serializing to bytes are not
>
> > numbers at all, but just some strings:
>
> >
>
> > key: "A", value: "a1", header: ["bar" -> "a".bytes()],
>
> > key: "A", value: "a2", header: ["bar" -> "c".bytes()],
>
> > key: "A", value: "a3", header: ["bar" -> "b".bytes()],
>
> >
>
> >
>
> > Could we allow them to use that header for compaction as well?
>
> >
>
> >
>
> > Now going back to your use case, for numbers that could be negative
> values,
>
> > as long as users are aware of the requirement and change the default
>
> > encoding schemes when they generate the producer record while set

Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Luís Cabral
 
Hi Guozhang,

As much as I would like to move on from this topic, I've now tried to implement 
it into the pull request, and could not find a viable way to store a variable 
size byte array into the current concept of the log cleaner (with a long, the 
current approach just always considers it to be 8 bytes).

Do you have any suggestions on how to handle this issue there?

Kind Regards,
Luis

On Tuesday, April 24, 2018, 1:11:11 AM GMT+2, Luís Cabral 
<luis_cab...@yahoo.com> wrote:  
 
That is definitely clearer, KIP updated!

  

From: Guozhang Wang
Sent: 23 April 2018 23:44
To: dev@kafka.apache.org
Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

  

Thanks Luís. The KIP looks good to me. Just that what I left as a minor:

  

`When both records being compared contain a matching "compaction value",

then the record with the highest offset will be kept;`

  

I understand your intent, it's just that the sentence itself is a bit

misleading, I think what you actually meant to say:

  

`When both records being compared contain a matching "compaction value" and

their corresponding byte arrays are considered equal, then the record with

the highest offset will be kept;`

  

  

  

Guozhang

  

  

  

On Mon, Apr 23, 2018 at 1:54 PM, Luís Cabral <luis_cab...@yahoo.com.invalid>

wrote:

  

> Hello Guozhang,

>  

> The KIP is now updated to reflect this choice in strategy.

> Please let me know your thoughts there.

>  

> Kind Regards,

> Luís

>  

> From: Guozhang Wang

> Sent: 23 April 2018 19:32

> To: dev@kafka.apache.org

> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

>  

> Hi Luis,

>  

> I think by "generalizing it" we could go beyond numerical values, and

> that's why I suggested we do not need to require that the type serialized

> to the bytes have any numerical semantics since it has to ben serialized to

> a byte array anyways. I understand that for your use case, the intended

> record header compaction value is a number, but imagine if someone else

> wants to compact the same-keyed messages based on some record header

> key-value pair whose value types before serializing to bytes are not

> numbers at all, but just some strings:

>  

> key: "A", value: "a1", header: ["bar" -> "a".bytes()],

> key: "A", value: "a2", header: ["bar" -> "c".bytes()],

> key: "A", value: "a3", header: ["bar" -> "b".bytes()],

>  

>  

> Could we allow them to use that header for compaction as well?

>  

>  

> Now going back to your use case, for numbers that could be negative values,

> as long as users are aware of the requirement and change the default

> encoding schemes when they generate the producer record while setting the

> headers so that the serialized bytes still obey the value that should be OK

> (again, as I said, we push this responsibility to users to define the right

> serde mechanism, but that seems to be more flexible). For example: -INF

> serialized to 0x, -INF+1 serialized to 0x0001, etc.

>  

>  

>  

> Guozhang

>  

>  

>  

>  

>  

> On Mon, Apr 23, 2018 at 10:19 AM, Luís Cabral

> <luis_cab...@yahoo.com.invalid

> > wrote:

>  

> > Hello Guozhang,

> >

> > Thanks for the fast reply!

> >

> > As for the matter of the timestamp, it’s now added to the KIP, so I hope

> > this is correctly addressed.

> > Kindly let me know if you would like some adaptions to the concept.

> >

> >

> > bq. The issue that I do not understand completely is why you'd keep

> saying

> > that why we need to convert it to a String, first then converting to any

> > other fields.

> >

> > Maybe I’m over-engineering it again, and the problem can be simplified to

> > restricting this to values greater than or equal to zero, which ends up

> &

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
That is definitely clearer, KIP updated!

From: Guozhang Wang
Sent: 23 April 2018 23:44
To: dev@kafka.apache.org
Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

Thanks Luís. The KIP looks good to me. Just that what I left as a minor:

`When both records being compared contain a matching "compaction value",
then the record with the highest offset will be kept;`

I understand your intent, it's just that the sentence itself is a bit
misleading, I think what you actually meant to say:

`When both records being compared contain a matching "compaction value" and
their corresponding byte arrays are considered equal, then the record with
the highest offset will be kept;`



Guozhang



On Mon, Apr 23, 2018 at 1:54 PM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hello Guozhang,
>
> The KIP is now updated to reflect this choice in strategy.
> Please let me know your thoughts there.
>
> Kind Regards,
> Luís
>
> From: Guozhang Wang
> Sent: 23 April 2018 19:32
> To: dev@kafka.apache.org
> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi Luis,
>
> I think by "generalizing it" we could go beyond numerical values, and
> that's why I suggested we do not need to require that the type serialized
> to the bytes have any numerical semantics since it has to ben serialized to
> a byte array anyways. I understand that for your use case, the intended
> record header compaction value is a number, but imagine if someone else
> wants to compact the same-keyed messages based on some record header
> key-value pair whose value types before serializing to bytes are not
> numbers at all, but just some strings:
>
> key: "A", value: "a1", header: ["bar" -> "a".bytes()],
> key: "A", value: "a2", header: ["bar" -> "c".bytes()],
> key: "A", value: "a3", header: ["bar" -> "b".bytes()],
>
>
> Could we allow them to use that header for compaction as well?
>
>
> Now going back to your use case, for numbers that could be negative values,
> as long as users are aware of the requirement and change the default
> encoding schemes when they generate the producer record while setting the
> headers so that the serialized bytes still obey the value that should be OK
> (again, as I said, we push this responsibility to users to define the right
> serde mechanism, but that seems to be more flexible). For example: -INF
> serialized to 0x, -INF+1 serialized to 0x0001, etc.
>
>
>
> Guozhang
>
>
>
>
>
> On Mon, Apr 23, 2018 at 10:19 AM, Luís Cabral
> <luis_cab...@yahoo.com.invalid
> > wrote:
>
> > Hello Guozhang,
> >
> > Thanks for the fast reply!
> >
> > As for the matter of the timestamp, it’s now added to the KIP, so I hope
> > this is correctly addressed.
> > Kindly let me know if you would like some adaptions to the concept.
> >
> >
> > bq. The issue that I do not understand completely is why you'd keep
> saying
> > that why we need to convert it to a String, first then converting to any
> > other fields.
> >
> > Maybe I’m over-engineering it again, and the problem can be simplified to
> > restricting this to values greater than or equal to zero, which ends up
> > being ok for my own use case...
> > This would then generally guarantee the lexicographic ordering, as you
> say.
> > Is this what you mean? Should I then add this restriction to the KIP?
> >
> > Cheers,
> > Luis
> >
> > From: Guozhang Wang
> > Sent: 23 April 2018 17:55
> > To: dev@kafka.apache.org
> > Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hello Luis,
> >
> > Thanks for your email, replying to your points in the following:
> >
> > > I don't personally see advantages in it, but also the only disadvantage
> > that I can think of is putting multiple meanings on this field.
> >
> > If we do not treat timestamp as a special value of the config, then I
> > cannot use the timestamp field of the record as the compaction value,
> since
> > we will only look into the record header other than the default offset,
> > right? Then users wanting to use the timestamp as the compaction value
> have
> > to put that timestamp into the record header with a name, which
> duplicates
> > the field unnecessary. So to me without treating it as a special value we
> > are doomed to have duplicate record field.
> >
> > > Having it this way would jeopardize my own particular use case, as I
> need
> > to have an incremental number represe

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
Thanks Luís. The KIP looks good to me. Just one minor point that I left:

`When both records being compared contain a matching "compaction value",
then the record with the highest offset will be kept;`

I understand your intent; it's just that the sentence itself is a bit
misleading. I think what you actually meant to say is:

`When both records being compared contain a matching "compaction value" and
their corresponding byte arrays are considered equal, then the record with
the highest offset will be kept;`
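
In code form, the intended rule reads roughly like this (a sketch only, assuming the compaction values are compared as unsigned byte arrays; Arrays.compareUnsigned needs Java 9+):

final class CompactionTieBreak {
    // The new record wins if its compaction value is strictly greater (compared
    // as unsigned bytes); when the byte arrays are equal, the highest offset wins.
    static boolean keepNewRecord(byte[] oldValue, long oldOffset, byte[] newValue, long newOffset) {
        int cmp = java.util.Arrays.compareUnsigned(newValue, oldValue);
        return cmp != 0 ? cmp > 0 : newOffset > oldOffset;
    }
}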



Guozhang



On Mon, Apr 23, 2018 at 1:54 PM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hello Guozhang,
>
> The KIP is now updated to reflect this choice in strategy.
> Please let me know your thoughts there.
>
> Kind Regards,
> Luís
>
> From: Guozhang Wang
> Sent: 23 April 2018 19:32
> To: dev@kafka.apache.org
> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hi Luis,
>
> I think by "generalizing it" we could go beyond numerical values, and
> that's why I suggested we do not need to require that the type serialized
> to the bytes have any numerical semantics since it has to ben serialized to
> a byte array anyways. I understand that for your use case, the intended
> record header compaction value is a number, but imagine if someone else
> wants to compact the same-keyed messages based on some record header
> key-value pair whose value types before serializing to bytes are not
> numbers at all, but just some strings:
>
> key: "A", value: "a1", header: ["bar" -> "a".bytes()],
> key: "A", value: "a2", header: ["bar" -> "c".bytes()],
> key: "A", value: "a3", header: ["bar" -> "b".bytes()],
>
>
> Could we allow them to use that header for compaction as well?
>
>
> Now going back to your use case, for numbers that could be negative values,
> as long as users are aware of the requirement and change the default
> encoding schemes when they generate the producer record while setting the
> headers so that the serialized bytes still obey the value that should be OK
> (again, as I said, we push this responsibility to users to define the right
> serde mechanism, but that seems to be more flexible). For example: -INF
> serialized to 0x, -INF+1 serialized to 0x0001, etc.
>
>
>
> Guozhang
>
>
>
>
>
> On Mon, Apr 23, 2018 at 10:19 AM, Luís Cabral
> <luis_cab...@yahoo.com.invalid
> > wrote:
>
> > Hello Guozhang,
> >
> > Thanks for the fast reply!
> >
> > As for the matter of the timestamp, it’s now added to the KIP, so I hope
> > this is correctly addressed.
> > Kindly let me know if you would like some adaptions to the concept.
> >
> >
> > bq. The issue that I do not understand completely is why you'd keep
> saying
> > that why we need to convert it to a String, first then converting to any
> > other fields.
> >
> > Maybe I’m over-engineering it again, and the problem can be simplified to
> > restricting this to values greater than or equal to zero, which ends up
> > being ok for my own use case...
> > This would then generally guarantee the lexicographic ordering, as you
> say.
> > Is this what you mean? Should I then add this restriction to the KIP?
> >
> > Cheers,
> > Luis
> >
> > From: Guozhang Wang
> > Sent: 23 April 2018 17:55
> > To: dev@kafka.apache.org
> > Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hello Luis,
> >
> > Thanks for your email, replying to your points in the following:
> >
> > > I don't personally see advantages in it, but also the only disadvantage
> > that I can think of is putting multiple meanings on this field.
> >
> > If we do not treat timestamp as a special value of the config, then I
> > cannot use the timestamp field of the record as the compaction value,
> since
> > we will only look into the record header other than the default offset,
> > right? Then users wanting to use the timestamp as the compaction value
> have
> > to put that timestamp into the record header with a name, which
> duplicates
> > the field unnecessary. So to me without treating it as a special value we
> > are doomed to have duplicate record field.
> >
> > > Having it this way would jeopardize my own particular use case, as I
> need
> > to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
> > 52, et cetera)
> >
> > The issue that I do not understand completely is why you'd keep saying
> t

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hello Guozhang,

The KIP is now updated to reflect this choice in strategy.
Please let me know your thoughts there.

Kind Regards,
Luís

From: Guozhang Wang
Sent: 23 April 2018 19:32
To: dev@kafka.apache.org
Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

Hi Luis,

I think by "generalizing it" we could go beyond numerical values, and
that's why I suggested we do not need to require that the type serialized
to the bytes have any numerical semantics since it has to ben serialized to
a byte array anyways. I understand that for your use case, the intended
record header compaction value is a number, but imagine if someone else
wants to compact the same-keyed messages based on some record header
key-value pair whose value types before serializing to bytes are not
numbers at all, but just some strings:

key: "A", value: "a1", header: ["bar" -> "a".bytes()],
key: "A", value: "a2", header: ["bar" -> "c".bytes()],
key: "A", value: "a3", header: ["bar" -> "b".bytes()],


Could we allow them to use that header for compaction as well?


Now going back to your use case, for numbers that could be negative values,
as long as users are aware of the requirement and change the default
encoding schemes when they generate the producer record while setting the
headers so that the serialized bytes still obey the value that should be OK
(again, as I said, we push this responsibility to users to define the right
serde mechanism, but that seems to be more flexible). For example: -INF
serialized to 0x, -INF+1 serialized to 0x0001, etc.



Guozhang





On Mon, Apr 23, 2018 at 10:19 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
> wrote:

> Hello Guozhang,
>
> Thanks for the fast reply!
>
> As for the matter of the timestamp, it’s now added to the KIP, so I hope
> this is correctly addressed.
> Kindly let me know if you would like some adaptions to the concept.
>
>
> bq. The issue that I do not understand completely is why you'd keep saying
> that why we need to convert it to a String, first then converting to any
> other fields.
>
> Maybe I’m over-engineering it again, and the problem can be simplified to
> restricting this to values greater than or equal to zero, which ends up
> being ok for my own use case...
> This would then generally guarantee the lexicographic ordering, as you say.
> Is this what you mean? Should I then add this restriction to the KIP?
>
> Cheers,
> Luis
>
> From: Guozhang Wang
> Sent: 23 April 2018 17:55
> To: dev@kafka.apache.org
> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Luis,
>
> Thanks for your email, replying to your points in the following:
>
> > I don't personally see advantages in it, but also the only disadvantage
> that I can think of is putting multiple meanings on this field.
>
> If we do not treat timestamp as a special value of the config, then I
> cannot use the timestamp field of the record as the compaction value, since
> we will only look into the record header other than the default offset,
> right? Then users wanting to use the timestamp as the compaction value have
> to put that timestamp into the record header with a name, which duplicates
> the field unnecessary. So to me without treating it as a special value we
> are doomed to have duplicate record field.
>
> > Having it this way would jeopardize my own particular use case, as I need
> to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
> 52, et cetera)
>
> The issue that I do not understand completely is why you'd keep saying that
> why we need to convert it to a String, first then converting to any other
> fields. Since the header is organized in:
>
> public interface Header {
>
> String key();
>
> byte[] value();
>
> }
>
>
> Which means that the header value can be of any types. So with your use
> case why can't you just serialize your incremental version number into a
> byte array directly, whose lexico-order obeys the version number value?? I
> think the default byte serialization mechanism of the integer is sufficient
> for this purpose (assuming that increment number is int).
>
>
>
> Guozhang
>
>
>
>
> On Mon, Apr 23, 2018 at 2:30 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
> >
> wrote:
>
> >  Hi Guozhang,
> >
> > Thank you very much for the patience in explaining your points, I've
> > learnt quite a bit in researching and experimenting after your replies.
> >
> >
> > bq. I still think it is worth defining `timestamp` as a special
> compaction
> > value
> >
> > I don't personally see advantages in it,

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
Hi Luis,

I think by "generalizing it" we could go beyond numerical values, and
that's why I suggested we do not need to require that the type serialized
to the bytes have any numerical semantics since it has to be serialized to
a byte array anyways. I understand that for your use case, the intended
record header compaction value is a number, but imagine if someone else
wants to compact the same-keyed messages based on some record header
key-value pair whose value types before serializing to bytes are not
numbers at all, but just some strings:

key: "A", value: "a1", header: ["bar" -> "a".bytes()],
key: "A", value: "a2", header: ["bar" -> "c".bytes()],
key: "A", value: "a3", header: ["bar" -> "b".bytes()],


Could we allow them to use that header for compaction as well?


Now going back to your use case, for numbers that could be negative values,
as long as users are aware of the requirement and change the default
encoding schemes when they generate the producer record while setting the
headers so that the serialized bytes still obey the value that should be OK
(again, as I said, we push this responsibility to users to define the right
serde mechanism, but that seems to be more flexible). For example: -INF
serialized to 0x, -INF+1 serialized to 0x0001, etc.
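
One concrete way to build such an encoding (only a sketch of the idea above, not something prescribed by the KIP) is to flip the sign bit before writing the value big-endian, so the most negative long maps to all-zero bytes and unsigned lexicographic byte order matches numeric order:

import java.nio.ByteBuffer;

final class OrderPreservingLongCodec {
    // XOR-ing with Long.MIN_VALUE flips the sign bit: Long.MIN_VALUE encodes to
    // 0x00...00 and Long.MAX_VALUE to 0xFF...FF, so comparing the resulting
    // bytes as unsigned values agrees with comparing the original longs.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }
}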



Guozhang





On Mon, Apr 23, 2018 at 10:19 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
> wrote:

> Hello Guozhang,
>
> Thanks for the fast reply!
>
> As for the matter of the timestamp, it’s now added to the KIP, so I hope
> this is correctly addressed.
> Kindly let me know if you would like some adaptions to the concept.
>
>
> bq. The issue that I do not understand completely is why you'd keep saying
> that why we need to convert it to a String, first then converting to any
> other fields.
>
> Maybe I’m over-engineering it again, and the problem can be simplified to
> restricting this to values greater than or equal to zero, which ends up
> being ok for my own use case...
> This would then generally guarantee the lexicographic ordering, as you say.
> Is this what you mean? Should I then add this restriction to the KIP?
>
> Cheers,
> Luis
>
> From: Guozhang Wang
> Sent: 23 April 2018 17:55
> To: dev@kafka.apache.org
> Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Luis,
>
> Thanks for your email, replying to your points in the following:
>
> > I don't personally see advantages in it, but also the only disadvantage
> that I can think of is putting multiple meanings on this field.
>
> If we do not treat timestamp as a special value of the config, then I
> cannot use the timestamp field of the record as the compaction value, since
> we will only look into the record header other than the default offset,
> right? Then users wanting to use the timestamp as the compaction value have
> to put that timestamp into the record header with a name, which duplicates
> the field unnecessary. So to me without treating it as a special value we
> are doomed to have duplicate record field.
>
> > Having it this way would jeopardize my own particular use case, as I need
> to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
> 52, et cetera)
>
> The issue that I do not understand completely is why you'd keep saying that
> why we need to convert it to a String, first then converting to any other
> fields. Since the header is organized in:
>
> public interface Header {
>
> String key();
>
> byte[] value();
>
> }
>
>
> Which means that the header value can be of any types. So with your use
> case why can't you just serialize your incremental version number into a
> byte array directly, whose lexico-order obeys the version number value?? I
> think the default byte serialization mechanism of the integer is sufficient
> for this purpose (assuming that increment number is int).
>
>
>
> Guozhang
>
>
>
>
> On Mon, Apr 23, 2018 at 2:30 AM, Luís Cabral <luis_cab...@yahoo.com.invalid
> >
> wrote:
>
> >  Hi Guozhang,
> >
> > Thank you very much for the patience in explaining your points, I've
> > learnt quite a bit in researching and experimenting after your replies.
> >
> >
> > bq. I still think it is worth defining `timestamp` as a special
> compaction
> > value
> >
> > I don't personally see advantages in it, but also the only disadvantage
> > that I can think of is putting multiple meanings on this field, which
> does
> > not seem enough to dissuade anyone, so I've added it to the KIP as a
> > compromise.
> > (please also see the pull request in 

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hello Guozhang,

Thanks for the fast reply!

As for the matter of the timestamp, it’s now added to the KIP, so I hope this 
is correctly addressed.
Kindly let me know if you would like some adaptions to the concept.


bq. The issue that I do not understand completely is why you'd keep saying that 
why we need to convert it to a String, first then converting to any other 
fields.

Maybe I’m over-engineering it again, and the problem can be simplified to 
restricting this to values greater than or equal to zero, which ends up being 
ok for my own use case...
This would then generally guarantee the lexicographic ordering, as you say.
Is this what you mean? Should I then add this restriction to the KIP?

Cheers,
Luis

From: Guozhang Wang
Sent: 23 April 2018 17:55
To: dev@kafka.apache.org
Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

Hello Luis,

Thanks for your email, replying to your points in the following:

> I don't personally see advantages in it, but also the only disadvantage
that I can think of is putting multiple meanings on this field.

If we do not treat timestamp as a special value of the config, then I
cannot use the timestamp field of the record as the compaction value, since
we will only look into the record header other than the default offset,
right? Then users wanting to use the timestamp as the compaction value have
to put that timestamp into the record header with a name, which duplicates
the field unnecessary. So to me without treating it as a special value we
are doomed to have duplicate record field.

> Having it this way would jeopardize my own particular use case, as I need
to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
52, et cetera)

The issue that I do not understand completely is why you'd keep saying that
why we need to convert it to a String, first then converting to any other
fields. Since the header is organized in:

public interface Header {

String key();

byte[] value();

}


Which means that the header value can be of any types. So with your use
case why can't you just serialize your incremental version number into a
byte array directly, whose lexico-order obeys the version number value?? I
think the default byte serialization mechanism of the integer is sufficient
for this purpose (assuming that increment number is int).



Guozhang




On Mon, Apr 23, 2018 at 2:30 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

>  Hi Guozhang,
>
> Thank you very much for the patience in explaining your points, I've
> learnt quite a bit in researching and experimenting after your replies.
>
>
> bq. I still think it is worth defining `timestamp` as a special compaction
> value
>
> I don't personally see advantages in it, but also the only disadvantage
> that I can think of is putting multiple meanings on this field, which does
> not seem enough to dissuade anyone, so I've added it to the KIP as a
> compromise.
> (please also see the pull request in case you want to confirm the
> implementation matches your idea)
>
>
> bq. Should it be "the record with the highest value will be kept"?
>
>
> That is describing a scenario where the records being compared have the
> same value, in which case the offset is used as a tie-breaker.
> With trying to cover as much as possible, the "Proposed Changes" may have
> became confusing to read, sorry for that...
>
>
> bq. Users are then responsible to encode their compaction field according
> to the byte array lexico-ordering to full fill their ordering semantics. It
> is more flexible to enforce users to encode their compaction field always
> as a long type.
>
> This was indeed my focus on the previous replies, since I am not sure how
> this would work without adding a lot of responsibility on the client side.
> So, rather than trying to debate best practices, since I don't know which
> ones are being followed in this project, I will instead debate my own
> selfish need for this feature:
> Having it this way would jeopardize my own particular use case, as I need
> to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
> 52, et cetera). It does not totally invalidate it, since we can always
> convert it to String on the client side and left-pad with 0's to the max
> length of a long, but it seems a shame to have to do this as it would
> increase the data transfer size (I'm trying to avoid it becoming a
> bottleneck during high throughput periods). This would likely mean that I
> would start abusing the "timestamp" approach discussed above, as it keeps
> the messages nimble, but it would again be a shame to be forced into such a
> hacky solution.
> This is how I see it, and why I would like to avoid it. But maybe there is
> some smarter way that you know of on how to handle it on the client side

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
Hello Luis,

Thanks for your email, replying to your points in the following:

> I don't personally see advantages in it, but also the only disadvantage
that I can think of is putting multiple meanings on this field.

If we do not treat timestamp as a special value of the config, then I
cannot use the timestamp field of the record as the compaction value, since
we will only look into the record header other than the default offset,
right? Then users wanting to use the timestamp as the compaction value have
to put that timestamp into the record header with a name, which duplicates
the field unnecessarily. So to me, without treating it as a special value, we
are doomed to have a duplicate record field.

> Having it this way would jeopardize my own particular use case, as I need
to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
52, et cetera)

The issue that I do not understand completely is why you'd keep saying that
why we need to convert it to a String, first then converting to any other
fields. Since the header is organized in:

public interface Header {

String key();

byte[] value();

}


Which means that the header value can be of any type. So with your use
case, why can't you just serialize your incremental version number into a
byte array directly, whose lexicographic order obeys the version number value? I
think the default byte serialization mechanism of the integer is sufficient
for this purpose (assuming that the increment number is an int).
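
For example, a minimal sketch of that default serialization (assuming, as above, a non-negative int version; negative values would need an adjusted encoding, as discussed elsewhere in this thread):

import java.nio.ByteBuffer;

final class VersionHeaderBytes {
    // For versions >= 0, the unsigned lexicographic order of these four
    // big-endian bytes matches the numeric order of the versions.
    static byte[] encode(int version) {
        if (version < 0) throw new IllegalArgumentException("non-negative versions only");
        return ByteBuffer.allocate(Integer.BYTES).putInt(version).array();
    }
}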



Guozhang




On Mon, Apr 23, 2018 at 2:30 AM, Luís Cabral 
wrote:

>  Hi Guozhang,
>
> Thank you very much for the patience in explaining your points, I've
> learnt quite a bit in researching and experimenting after your replies.
>
>
> bq. I still think it is worth defining `timestamp` as a special compaction
> value
>
> I don't personally see advantages in it, but also the only disadvantage
> that I can think of is putting multiple meanings on this field, which does
> not seem enough to dissuade anyone, so I've added it to the KIP as a
> compromise.
> (please also see the pull request in case you want to confirm the
> implementation matches your idea)
>
>
> bq. Should it be "the record with the highest value will be kept"?
>
>
> That is describing a scenario where the records being compared have the
> same value, in which case the offset is used as a tie-breaker.
> With trying to cover as much as possible, the "Proposed Changes" may have
> became confusing to read, sorry for that...
>
>
> bq. Users are then responsible to encode their compaction field according
> to the byte array lexico-ordering to full fill their ordering semantics. It
> is more flexible to enforce users to encode their compaction field always
> as a long type.
>
> This was indeed my focus on the previous replies, since I am not sure how
> this would work without adding a lot of responsibility on the client side.
> So, rather than trying to debate best practices, since I don't know which
> ones are being followed in this project, I will instead debate my own
> selfish need for this feature:
> Having it this way would jeopardize my own particular use case, as I need
> to have an incremental number representing the version (i.e.: 1, 2, 3, 5,
> 52, et cetera). It does not totally invalidate it, since we can always
> convert it to String on the client side and left-pad with 0's to the max
> length of a long, but it seems a shame to have to do this as it would
> increase the data transfer size (I'm trying to avoid it becoming a
> bottleneck during high throughput periods). This would likely mean that I
> would start abusing the "timestamp" approach discussed above, as it keeps
> the messages nimble, but it would again be a shame to be forced into such a
> hacky solution.
> This is how I see it, and why I would like to avoid it. But maybe there is
> some smarter way that you know of on how to handle it on the client side
> that would invalidate these concerns?
> Please let me know, and I would also greatly value some more feedback from
> other people regarding this topic, so please don't be shy!
>
> Kind Regards,
> Luis
>
> On Friday, April 20, 2018, 7:41:30 PM GMT+2, Guozhang Wang wrote:
>
>  Hi Luís,
>
> What I'm thinking primarily is that we only need to compare the compaction
> values as LONG for the offset and timestmap "type" (I still think it is
> worth defining `timestamp` as a special compaction value, with the reasons
> below).
>
> Not sure if you've seen my other comment earlier regarding the offset /
> timestmap, I'm pasting / editing them here to illustrate my idea:
>
> --
>
> I think maybe we have a mis-communication here: I'm not against the idea of
> using headers, but just trying to argue that we could make `timestamp`
> field a special config value that is referring to the timestamp field in
> the metadata. So from log cleaner's pov:
>
> 1. if the config value is "offset", look into the offset field, 

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
 Hi Guozhang,

Thank you very much for the patience in explaining your points, I've learnt 
quite a bit in researching and experimenting after your replies.


bq. I still think it is worth defining `timestamp` as a special compaction value

I don't personally see advantages in it, but also the only disadvantage that I 
can think of is putting multiple meanings on this field, which does not seem 
enough to dissuade anyone, so I've added it to the KIP as a compromise. 
(please also see the pull request in case you want to confirm the 
implementation matches your idea)


bq. Should it be "the record with the highest value will be kept"?


That is describing a scenario where the records being compared have the same 
value, in which case the offset is used as a tie-breaker. 
In trying to cover as much as possible, the "Proposed Changes" section may have 
become confusing to read, sorry for that...


bq. Users are then responsible to encode their compaction field according to 
the byte array lexico-ordering to full fill their ordering semantics. It is 
more flexible to enforce users to encode their compaction field always as a 
long type.

This was indeed my focus on the previous replies, since I am not sure how this 
would work without adding a lot of responsibility on the client side. 
So, rather than trying to debate best practices, since I don't know which ones 
are being followed in this project, I will instead debate my own selfish need 
for this feature: 
Having it this way would jeopardize my own particular use case, as I need to 
have an incremental number representing the version (i.e.: 1, 2, 3, 5, 52, et 
cetera). It does not totally invalidate it, since we can always convert it to 
String on the client side and left-pad with 0's to the max length of a long, 
but it seems a shame to have to do this as it would increase the data transfer 
size (I'm trying to avoid it becoming a bottleneck during high throughput 
periods). This would likely mean that I would start abusing the "timestamp" 
approach discussed above, as it keeps the messages nimble, but it would again 
be a shame to be forced into such a hacky solution.
This is how I see it, and why I would like to avoid it. But maybe there is some 
smarter way that you know of on how to handle it on the client side that would 
invalidate these concerns?
Please let me know, and I would also greatly value some more feedback from 
other people regarding this topic, so please don't be shy! 
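
For reference, the left-padding workaround mentioned above would look roughly like this (a sketch only): the version is rendered as a zero-padded decimal string whose UTF-8 bytes sort in numeric order for non-negative values, at the cost of 19 bytes per record instead of 8.

import java.nio.charset.StandardCharsets;

final class PaddedVersionBytes {
    // Zero-pads to 19 digits, the width of Long.MAX_VALUE, so that plain
    // lexicographic comparison of the resulting bytes matches numeric
    // comparison for versions >= 0.
    static byte[] encode(long version) {
        if (version < 0) throw new IllegalArgumentException("non-negative versions only");
        return String.format("%019d", version).getBytes(StandardCharsets.UTF_8);
    }
}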

Kind Regards,
Luis

On Friday, April 20, 2018, 7:41:30 PM GMT+2, Guozhang Wang wrote:
 
 Hi Luís,

What I'm thinking primarily is that we only need to compare the compaction
values as LONG for the offset and timestmap "type" (I still think it is
worth defining `timestamp` as a special compaction value, with the reasons
below).

Not sure if you've seen my other comment earlier regarding the offset /
timestmap, I'm pasting / editing them here to illustrate my idea:

--

I think maybe we have a mis-communication here: I'm not against the idea of
using headers, but just trying to argue that we could make `timestamp`
field a special config value that is referring to the timestamp field in
the metadata. So from log cleaner's pov:

1. if the config value is "offset", look into the offset field, *comparing
their value as long*
2. if the config value is "timestamp", look into the timestamp field,
*comparing
their value as long*
3. otherwise, say the config value is "foo", search for key "foo" in the
message header, comparing the value as *byte arrays*

I.e. "offset" and "timestamp" are treated as special cases other than case
3) above.

--

I think your main concern is that "Although the byte[] can be compared, it
is not actually comparable as the versioning is based on a long", while I'm
thinking we can indeed generalize it: there is not hard reasons that the
"compaction value" has to be a long, and since the goal of this KIP is to
generalize the log compaction logic to consider header fields, why not
allowing it to be of any types than enforcing them still to be a long type?
Users are then responsible to encode their compaction field according to
the byte array lexico-ordering to full fill their ordering semantics. It is
more flexible to enforce users to encode their compaction field always as a
long type. Let me know WDYT.



Also I have some minor comments on the wiki itself:

1) "When both records being compared contain a matching "compaction value",
then the record with the highest offset will be kept;"

Should it be "the record with the highest value will be kept"?




Guozhang


On Fri, Apr 20, 2018 at 1:05 AM, Luís Cabral 
wrote:

>  Guozhang, is this reply ok with you?
>
>
> If you insist on the byte[] comparison directly, then I would need some
> suggestions on how to represent a "version" with it, and then the KIP could
> be changed to that.
>    On Tuesday, April 17, 2018, 2:44:16 PM GMT+2, 

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Guozhang Wang
Hi Luís,

What I'm thinking primarily is that we only need to compare the compaction
values as LONG for the offset and timestamp "type" (I still think it is
worth defining `timestamp` as a special compaction value, with the reasons
below).

Not sure if you've seen my other comment earlier regarding the offset /
timestamp, I'm pasting / editing them here to illustrate my idea:

--

I think maybe we have a mis-communication here: I'm not against the idea of
using headers, but just trying to argue that we could make `timestamp`
field a special config value that is referring to the timestamp field in
the metadata. So from log cleaner's pov:

1. if the config value is "offset", look into the offset field, *comparing
their value as long*
2. if the config value is "timestamp", look into the timestamp field,
*comparing
their value as long*
3. otherwise, say the config value is "foo", search for key "foo" in the
message header, comparing the value as *byte arrays*

I.e. "offset" and "timestamp" are treated as special cases other than case
3) above.
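
In code form, that dispatch would look roughly as follows (a sketch only; the names are made up, the missing-header fallback is an assumption rather than something decided in the KIP, and Arrays.compareUnsigned needs Java 9+):

final class CompactionValueCompare {
    interface Header { String key(); byte[] value(); }   // shape quoted elsewhere in this thread

    // "offset" and "timestamp" compare the record metadata as longs; any other
    // strategy names a header whose value bytes are compared lexicographically
    // as unsigned bytes.
    static int compare(String strategy,
                       long offsetA, long timestampA, Header[] headersA,
                       long offsetB, long timestampB, Header[] headersB) {
        switch (strategy) {
            case "offset":    return Long.compare(offsetA, offsetB);
            case "timestamp": return Long.compare(timestampA, timestampB);
            default:          return java.util.Arrays.compareUnsigned(
                                         find(headersA, strategy), find(headersB, strategy));
        }
    }

    // Assumption for this sketch: a record without the configured header sorts lowest.
    static byte[] find(Header[] headers, String key) {
        for (Header h : headers) {
            if (key.equals(h.key())) return h.value();
        }
        return new byte[0];
    }
}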

--

I think your main concern is that "Although the byte[] can be compared, it
is not actually comparable as the versioning is based on a long", while I'm
thinking we can indeed generalize it: there is no hard reason that the
"compaction value" has to be a long, and since the goal of this KIP is to
generalize the log compaction logic to consider header fields, why not
allow it to be of any type rather than enforcing it to still be a long type?
Users are then responsible for encoding their compaction field according to
the byte-array lexicographic ordering to fulfill their ordering semantics. That is
more flexible than enforcing users to always encode their compaction field as a
long type. Let me know WDYT.



Also I have some minor comments on the wiki itself:

1) "When both records being compared contain a matching "compaction value",
then the record with the highest offset will be kept;"

Should it be "the record with the highest value will be kept"?




Guozhang


On Fri, Apr 20, 2018 at 1:05 AM, Luís Cabral 
wrote:

>  Guozhang, is this reply ok with you?
>
>
> If you insist on the byte[] comparison directly, then I would need some
> suggestions on how to represent a "version" with it, and then the KIP could
> be changed to that.
> On Tuesday, April 17, 2018, 2:44:16 PM GMT+2, Luís Cabral <
> luis_cab...@yahoo.com> wrote:
>
>  Oops, missed that email...
>
> bq. It is because when we compare the bytes we do not treat them as longs at
> all, so we just compare them based on bytes; I admit that if users's header
> types have some semantic meanings (e.g. it is encoded from a long) then we
> are forcing them to choose the encoder that obeys key lexicographic
> ordering; but I felt it is more general than enforcing any fields that may
> be used for log cleaner to be defined as a special type.
>
> Yes, you can compare bytes between each other (its what that code does).
> You can then assume (or infer) that the encoding used allows for
> lexicographic ordering, which I hope you do not do a lot of. This is
> (logically) the same as converting to String and then comparing the
> strings, except that it allows for abstracting from the String encoding
> (again, either with assumptions or with inferred knowledge).
> This is purely academic, however, as the versioning is based on a long,
> which is not compatible with this approach. So, is this comment a
> fact-check stating that it is possible to compare byte[] overall, or is it
> about trying to use it in this KIP?
>
> Cheers
>
> PS (because I'm stubborn): It is still not comparable, this comparison is
> all based on assumptions about the content of the byte array, but I hope we
> can leave this stuff to Stack Overflow instead of debating it here :)
>
>



-- 
-- Guozhang


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Luís Cabral
 Guozhang, is this reply ok with you?


If you insist on the byte[] comparison directly, then I would need some 
suggestions on how to represent a "version" with it, and then the KIP could be 
changed to that.
On Tuesday, April 17, 2018, 2:44:16 PM GMT+2, Luís Cabral 
 wrote:  
 
 Oops, missed that email...

bq. It is because when we compare the bytes we do not treat them as longs at 
all, so we just compare them based on bytes; I admit that if users's header 
types have some semantic meanings (e.g. it is encoded from a long) then we 
are forcing them to choose the encoder that obeys key lexicographic ordering; 
but I felt it is more general than enforcing any fields that may be used for log 
cleaner to be defined as a special type.

Yes, you can compare bytes between each other (its what that code does). You 
can then assume (or infer) that the encoding used allows for lexicographic 
ordering, which I hope you do not do a lot of. This is (logically) the same as 
converting to String and then comparing the strings, except that it allows for 
abstracting from the String encoding (again, either with assumptions or with 
inferred knowledge).
This is purely academic, however, as the versioning is based on a long, which 
is not compatible with this approach. So, is this comment a fact-check stating 
that it is possible to compare byte[] overall, or is it about trying to use it 
in this KIP?

Cheers

PS (because I'm stubborn): It is still not comparable, this comparison is all 
based on assumptions about the content of the byte array, but I hope we can 
leave this stuff to Stack Overflow instead of debating it here :)
  

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Oops, missed that email...

bq. It is because when we compare the bytes we do not treat them as longs at 
all, so we just compare them based on bytes; I admit that if users's header 
types have some semantic meanings (e.g. it is encoded from a long) then we 
are forcing them to choose the encoder that obeys key lexicographic ordering; 
but I felt it is more general than enforcing any fields that may be used for log 
cleaner to be defined as a special type.

Yes, you can compare bytes between each other (its what that code does). You 
can then assume (or infer) that the encoding used allows for lexicographic 
ordering, which I hope you do not do a lot of. This is (logically) the same as 
converting to String and then comparing the strings, except that it allows for 
abstracting from the String encoding (again, either with assumptions or with 
inferred knowledge).
This is purely academic, however, as the versioning is based on a long, which 
is not compatible with this approach. So, is this comment a fact-check stating 
that it is possible to compare byte[] overall, or is it about trying to use it 
in this KIP?

Cheers

PS (because I'm stubborn): It is still not comparable, this comparison is all 
based on assumptions about the content of the byte array, but I hope we can 
leave this stuff to Stack Overflow instead of debating it here :)


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Ted Yu
Can you respond to:
http://search-hadoop.com/m/Kafka/uyzND1OlYaSzZ3SM1?subj=Re+RE+DISCUSS+KIP+280+Enhanced+log+compaction
Original message
From: Luís Cabral <luis_cab...@yahoo.com.INVALID>
Date: 4/17/18 2:41 AM (GMT-08:00)
To: dev@kafka.apache.org
Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction
Hi all,
There aren't that many discussions on this KIP, does that mean it should now 
move to voting? I'm not sure on the process here...
Cheers


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Hi all,
There aren't that many discussions on this KIP, does that mean it should now 
move to voting? I'm not sure on the process here...
Cheers


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Yup, lazy copy-paste punishment :P


Guozhang

On Wed, Apr 11, 2018 at 10:19 AM, Ted Yu  wrote:

> bq. 2. if the config value is "timestamp", look into the offset field;
>
> I think you meant looking into timestamp field.
>
> Cheers
>
> On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang 
> wrote:
>
> > > I do not mean that it is "used", but if what you meant is that you
> would
> > prefer to use that field instead of a header?
> > > This is in relation to a previous point of yours:
> >
> > I think maybe we have a mis-communication here: I'm not against the idea
> of
> > using headers, but just trying to argue that we could make `timestamp`
> > field a special config value that is referring to the timestamp field in
> > the metadata. So from log cleaner's pov:
> >
> > 1. if the config value is "offset", look into the offset field,
> > 2. if the config value is "timestamp", look into the offset field;
> > 2. otherwise, say the config value is "foo", search for key "foo" in the
> > message header.
> >
> >
> > > get super-inconsistent results, which make me reluctant to rely on it:
> > https://codebunk.com/b/704211525/
> >
> > Hmm, could you elaborate which part of the results are inconsistent? I
> > cannot tell directly from the console output of the code you posted.
> >
> >
> >
> > Guozhang
> >
> >
> >
> > On Wed, Apr 11, 2018 at 9:16 AM, Luís Cabral
>  > >
> > wrote:
> >
> > > Hi Guozhang,
> > >
> > >
> > > bq. I'm not sure I understand you statement that it is used to
> determine
> > > the "version" of the record
> > >
> > > I do not mean that it is "used", but if what you meant is that you
> would
> > > prefer to use that field instead of a header?
> > > This is in relation to a previous point of yours:
> > > >>> 1) I'm also in favor of making the `timestamp` a preserved config
> > > value along with `offset`, for which we would not go into the headers
> to
> > > look for the matching key, but directly look into the timestamp field
> of
> > > the message.
> > >
> > >
> > >
> > > bq. Regarding the byte arrays: I think byte arrays are indeed
> > > comparable, right?
> > >
> > > As far as I am aware, they are not comparable. Then again, I am not
> aware
> > > of everything that exists everywhere :)
> > > I just experimented with the code you mentioned and get
> > super-inconsistent
> > > results, which make me reluctant to rely on it:
> https://codebunk.com/b/
> > > 704211525/
> > >
> > >
> > >
> > > Thank you again for the comments.
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
If you are referring to, for example:

-4611686018427387904 > 0
-4611686018427387904 > 4611686018427387903


It is because when we compare the bytes we do not treat them as longs at
all, so we just compare them based on bytes; I admit that if users's header
types have some semantic meanings (e.g. it is encoded from a long) then we
are forcing them to choose the encoder that obeys key lexicographic
ordering; but I felt it is more general than enforcing any fields that may
be used for log cleaner to be defined as a special type.
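
A quick way to see this (a standalone check, Java 9+ for Arrays.compareUnsigned): -4611686018427387904 is 0xC000000000000000 in two's complement, so its big-endian bytes sort above those of both 0 and 4611686018427387903 (0x3FFFFFFFFFFFFFFF) when compared as unsigned bytes.

import java.nio.ByteBuffer;
import java.util.Arrays;

final class SignedLongBytesDemo {
    static byte[] bytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        // Both print "true": the negative value's leading 0xC0 byte outranks 0x00 and 0x3F.
        System.out.println(Arrays.compareUnsigned(bytes(-4611686018427387904L), bytes(0L)) > 0);
        System.out.println(Arrays.compareUnsigned(bytes(-4611686018427387904L), bytes(4611686018427387903L)) > 0);
    }
}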

Guozhang



On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang  wrote:

> > I do not mean that it is "used", but if what you meant is that you
> would prefer to use that field instead of a header?
> > This is in relation to a previous point of yours:
>
> I think maybe we have a mis-communication here: I'm not against the idea
> of using headers, but just trying to argue that we could make `timestamp`
> field a special config value that is referring to the timestamp field in
> the metadata. So from log cleaner's pov:
>
> 1. if the config value is "offset", look into the offset field,
> 2. if the config value is "timestamp", look into the offset field;
> 2. otherwise, say the config value is "foo", search for key "foo" in the
> message header.
>
>
> > get super-inconsistent results, which make me reluctant to rely on it:
> https://codebunk.com/b/704211525/
>
> Hmm, could you elaborate which part of the results are inconsistent? I
> cannot tell directly from the console output of the code you posted.
>
>
>
> Guozhang
>
>
>
> On Wed, Apr 11, 2018 at 9:16 AM, Luís Cabral <
> luis_cab...@yahoo.com.invalid> wrote:
>
>> Hi Guozhang,
>>
>>
>> bq. I'm not sure I understand you statement that it is used to determine
>> the "version" of the record
>>
>> I do not mean that it is "used", but if what you meant is that you would
>> prefer to use that field instead of a header?
>> This is in relation to a previous point of yours:
>> >>> 1) I'm also in favor of making the `timestamp` a preserved config
>> value along with `offset`, for which we would not go into the headers to
>> look for the matching key, but directly look into the timestamp field of
>> the message.
>>
>>
>>
>> bq. Regarding the byte arrays: I think byte arrays are indeed
>> comparable, right?
>>
>> As far as I am aware, they are not comparable. Then again, I am not aware
>> of everything that exists everywhere :)
>> I just experimented with the code you mentioned and get
>> super-inconsistent results, which make me reluctant to rely on it:
>> https://codebunk.com/b/704211525/
>>
>>
>>
>> Thank you again for the comments.
>>
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Ted Yu
bq. 2. if the config value is "timestamp", look into the offset field;

I think you meant looking into timestamp field.

Cheers

On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang  wrote:

> > I do not mean that it is "used", but if what you meant is that you would
> prefer to use that field instead of a header?
> > This is in relation to a previous point of yours:
>
> I think maybe we have a mis-communication here: I'm not against the idea of
> using headers, but just trying to argue that we could make `timestamp`
> field a special config value that is referring to the timestamp field in
> the metadata. So from log cleaner's pov:
>
> 1. if the config value is "offset", look into the offset field,
> 2. if the config value is "timestamp", look into the offset field;
> 2. otherwise, say the config value is "foo", search for key "foo" in the
> message header.
>
>
> > get super-inconsistent results, which make me reluctant to rely on it:
> https://codebunk.com/b/704211525/
>
> Hmm, could you elaborate which part of the results are inconsistent? I
> cannot tell directly from the console output of the code you posted.
>
>
>
> Guozhang
>
>
>
> On Wed, Apr 11, 2018 at 9:16 AM, Luís Cabral  >
> wrote:
>
> > Hi Guozhang,
> >
> >
> > bq. I'm not sure I understand your statement that it is used to determine
> > the "version" of the record
> >
> > I do not mean that it is "used", but if what you meant is that you would
> > prefer to use that field instead of a header?
> > This is in relation to a previous point of yours:
> > >>> 1) I'm also in favor of making the `timestamp` a preserved config
> > value along with `offset`, for which we would not go into the headers to
> > look for the matching key, but directly look into the timestamp field of
> > the message.
> >
> >
> >
> > bq. Regarding the byte arrays: I think byte arrays are indeed
> > comparable, right?
> >
> > As far as I am aware, they are not comparable. Then again, I am not aware
> > of everything that exists everywhere :)
> > I just experimented with the code you mentioned and get super-inconsistent
> > results, which make me reluctant to rely on it: https://codebunk.com/b/704211525/
> >
> >
> >
> > Thank you again for the comments.
> >
>
>
>
> --
> -- Guozhang
>


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
> I do not mean that it is "used", but if what you meant is that you would
prefer to use that field instead of a header?
> This is in relation to a previous point of yours:

I think maybe we have a mis-communication here: I'm not against the idea of
using headers, but just trying to argue that we could make `timestamp`
field a special config value that is referring to the timestamp field in
the metadata. So from log cleaner's pov:

1. if the config value is "offset", look into the offset field,
2. if the config value is "timestamp", look into the offset field;
2. otherwise, say the config value is "foo", search for key "foo" in the
message header.
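
Purely as an illustration (the method and names below are hypothetical, not the
actual cleaner code; the "timestamp" branch reads the record's metadata
timestamp, which is what was meant here, as Ted points out in his follow-up),
the lookup could be sketched as:

import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical sketch of resolving the "compaction value" for a record, given the
// configured strategy, the record's offset/timestamp metadata, and its headers
// flattened into a key -> raw bytes map.
final class CompactionValueSketch {
    static long resolve(String strategy, long offset, long timestamp, Map<String, byte[]> headers) {
        if ("offset".equals(strategy)) {
            return offset;                       // today's default behaviour
        }
        if ("timestamp".equals(strategy)) {
            return timestamp;                    // record metadata timestamp, no header lookup
        }
        byte[] value = headers.get(strategy);    // e.g. strategy = "foo" -> header key "foo"
        if (value == null || value.length != Long.BYTES) {
            return offset;                       // fall back to the offset if the header is missing/unusable
        }
        return ByteBuffer.wrap(value).getLong(); // assumes an 8-byte big-endian long value
    }
}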


> get super-inconsistent results, which make me reluctant to rely on it:
https://codebunk.com/b/704211525/

Hmm, could you elaborate which part of the results are inconsistent? I
cannot tell directly from the console output of the code you posted.



Guozhang



On Wed, Apr 11, 2018 at 9:16 AM, Luís Cabral 
wrote:

> Hi Guozhang,
>
>
> bq. I'm not sure I understand your statement that it is used to determine
> the "version" of the record
>
> I do not mean that it is "used", but if what you meant is that you would
> prefer to use that field instead of a header?
> This is in relation to a previous point of yours:
> >>> 1) I'm also in favor of making the `timestamp` a preserved config
> value along with `offset`, for which we would not go into the headers to
> look for the matching key, but directly look into the timestamp field of
> the message.
>
>
>
> bq. Regarding the byte arrays: I think byte arrays are indeed
> comparable, right?
>
> As far as I am aware, they are not comparable. Then again, I am not aware
> of everything that exists everywhere :)
> I just experimented with the code you mentioned and get super-inconsistent
> results, which make me reluctant to rely on it: https://codebunk.com/b/704211525/
>
>
>
> Thank you again for the comments.
>



-- 
-- Guozhang


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral
Hi Guozhang,


bq. I'm not sure I understand your statement that it is used to determine the 
"version" of the record

I do not mean that it is "used", but if what you meant is that you would prefer 
to use that field instead of a header?
This is in relation to a previous point of yours:
>>> 1) I'm also in favor of making the `timestamp` a preserved config value 
>>> along with `offset`, for which we would not go into the headers to look for 
>>> the matching key, but directly look into the timestamp field of the message.



bq. Regarding the byte arrays: I think byte arrays are indeed comparable, right?

As far as I am aware, they are not comparable. Then again, I am not aware of 
everything that exists everywhere :)
I just experimented with the code you mentioned and get super-inconsistent 
results, which make me reluctant to rely on it: 
https://codebunk.com/b/704211525/



Thank you again for the comments.


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Hello Luís,

Regarding the timestamp: it is designed to be mainly used for indicating
the time when this record is generated (i.e. CREATE_TIME at the producer
side, it will set the timestamp), or when the record has been appended to
Kafka brokers (i.e. LOG_APPEND_TIME at the broker side, where producer
would not set it, and even if they do it will be ignored). I'm not sure I
understand your statement that it is used to determine the "version" of the
record? Could you point me to the KIP regarding this field?

Regarding the byte arrays: I think byte arrays are indeed comparable,
right? We can compare them lexicographically, for example:
https://github.com/apache/kafka/blob/196bcfca0c56420793f85514d1602bde564b0651/clients/src/main/java/org/apache/kafka/common/utils/Bytes.java#L144


Guozhang


On Wed, Apr 11, 2018 at 2:09 AM, Luís Cabral 
wrote:

>
> Hi all,
>
>
> On my own previous statement:
> bq. Not that I mind doing it directly (I intend to use a Java client), but
> please be aware that a String binary representation is based on the charset
> encoding, while the Long binary representation varies according to the
> language.
>
>
> I went back to double check this, and it seems that parsing the binary
> directly to long is already done (e.g.: the 'timestamp' header), so I guess
> this is OK and I was simply over-engineering it.The KIP is now adapted to
> use long directly.
>
>
> On Guozhang's statement:
>  bq. I'm also in favor of making the `timestamp` a preserved config value
> along with `offset`, for which we would not go into the headers to look for
> the matching key, but directly look into the timestamp field of the message.
>
>
> In reviewing the previous point I think I understood this a bit better, in
> that you mean using the already existing 'timestamp' field, which the
> client can customize, in order to determine the version of the record. Would
> you then be OK with the client hijacking this when/if they need to use
> incremental versioning instead? There are some KIP's opened regarding this
> field, and this field itself is already used for other things, so it
> already has some meanings attached to it. I personally prefer to avoid
> attaching multiple meanings to a single field, but if it allows for this
> feature to go through, then I am OK with it. Please let me know your
> thoughts here.
>
> Cheers!
>
>


-- 
-- Guozhang


Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral

Hi all,


On my own previous statement:
bq. Not that I mind doing it directly (I intend to use a Java client), but 
please be aware that a String binary representation is based on the charset 
encoding, while the Long binary representation varies according to the language.


I went back to double check this, and it seems that parsing the binary directly 
to long is already done (e.g.: the 'timestamp' header), so I guess this is OK 
and I was simply over-engineering it. The KIP is now adapted to use long 
directly.


On Guozhang's statement:
 bq. I'm also in favor of making the `timestamp` a preserved config value along 
with `offset`, for which we would not go into the headers to look for the 
matching key, but directly look into the timestamp field of the message.


In reviewing the previous point I think I understood this a bit better, in that 
you mean using the already existing 'timestamp' field, which the client can 
customize, in order to determine the version of the record. Would you then be OK 
with the client hijacking this when/if they need to use incremental versioning 
instead? There are some KIP's opened regarding this field, and this field 
itself is already used for other things, so it already has some meanings 
attached to it. I personally prefer to avoid attaching multiple meanings to a 
single field, but if it allows for this feature to go through, then I am OK 
with it. Please let me know your thoughts here.

Cheers!



RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-10 Thread Luís Cabral
Hi Guozhang,

Thank you for the feedback!


bq. I'm also in favor of making the `timestamp` a preserved config value along 
with `offset`, for which we would not go into the headers to look for the 
matching key, but directly look into the timestamp field of the message.

Do you mean using an automatically filled timestamp, or a timestamp set by the 
record producer client?
Conceptually, I would also prefer the record version to be on the same level as 
“key”. Sadly, however, that means a huge non-backwards compatible change in the 
API, as the stream between the client and the server is done via 
serialization/deserialization -- we would be moving the byte positions in the 
new release of Kafka, and possibly corrupt pre-existing topics.


bq. About the `long` conversion from `byte[]` values, why couldn't we just 
compare on the byte arrays directly, than enforcing it to be encoded as a long 
type, then comparing on two longs?

Sadly, byte arrays are not comparable. You can use them to check for equality, 
but they don’t give you any representation of being greater or lesser than 
another byte array until they are converted into an actually comparable value.


Cheers


From: Guozhang Wang
Sent: 09 April 2018 22:19
To: dev@kafka.apache.org
Cc: Konstantin Chukhlomin
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Thanks for the KIP.

1) I'm also in favor of making the `timestamp` a preserved config value
along with `offset`, for which we would not go into the headers to look for
the matching key, but directly look into the timestamp field of the message.

2) About the `long` conversion from `byte[]` values, why couldn't we just
compare on the byte arrays directly, than enforcing it to be encoded as a
long type, then comparing on two longs? If we directly compare the bytes,
we can 2.a) allow flexible types as the compaction keys, 2.b) even for long
typed compaction key, comparing their encoded bytes directly should still
work for positive values (for negative timestamp support we already have a
KIP here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support,
cc'ing the proposer as well, in which the encoded bytes could still be
correctly compared lexicographically).
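
A quick self-contained check of that claim (illustrative only; it assumes the
8-byte big-endian encoding and, as noted, only holds once negative values are
excluded):

import java.nio.ByteBuffer;
import java.util.Arrays;

public class PositiveLongByteOrderCheck {
    static byte[] enc(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }

    public static void main(String[] args) {
        long[] ascending = {0L, 1L, 42L, 1_523_000_000_000L, Long.MAX_VALUE};
        for (int i = 1; i < ascending.length; i++) {
            // For non-negative longs, unsigned byte-wise order agrees with numeric order.
            boolean agrees = Arrays.compareUnsigned(enc(ascending[i - 1]), enc(ascending[i])) < 0;
            System.out.println(ascending[i - 1] + " < " + ascending[i] + " byte-wise: " + agrees); // true
        }
    }
}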


Guozhang


On Mon, Apr 9, 2018 at 11:34 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks for clarification.
>
> I understand the config name now. Makes sense.
>
>
> > bq. It might also be good, to elaborate why you suggest "long" for the
> > compaction value in the KIP itself.
> >
> > I would prefer that the definition simply be "long", to keep the code
> contract cleaner, and leave the clients to infer the usages according to
> their scenarios.
> > But if its important, then I can add it (let me know).
>
> I think adding couple of examples would be good enough.
>
>
> > bq. One more thought: the KIP basically allows that a record with larger
> offset is deleted while a record with smaller offset is preserved (if the
> record with smaller offset has a larger "compaction value" than the record
> > with the larger offset). I don't see an issue with this atm, just wanted to
> point it out, as it seems to be an important change in behavior (compaction
> does not strictly "move forward" any longer if you wish).
> >
> > This is the whole purpose of the KIP, though. Is it not clear that this
> is the intention?
>
> The KIP is clear. Just wanted to point it out. Not sure if others have
> concerns about this change in behavior.
>
> > bq. Do you effectively mean, that the value has exactly 8 bytes?
> >
> > Not exactly... At the moment, the code in the pull request expects that
> the header value supports "byte[] => String => Long" as a valid conversion,
> so having a direct "long-as-byte[]" is not considered to be valid.
> > I am not sure which practice is preferred in Kafka, but given the base
> approach of "byte[]" as the header value, it seemed saner to support a
> String value rather than a Long.
> > Sadly, since Kafka needs to comprehend and process the actual value, I
> could not find a way to support both approaches.
> >
> > This last part is tricky, and where I expected more debates to arise,
> though in the end it can be boiled down to a "Long vs String" thing, so I
> hope it doesn't become a blocker...
>
> That's a tricky question... Personally, I think a direct byte[]->long
> conversion would be straight forward. What is the advantage of the
> intermediate "String" type?
>
> We could also introduce a magic byte that indicates the type of the value
> -- but I am not sure if we would need this flexibility. It also makes
> serializing the value on the client side more complex.
>
> Whatever the decision is, the KIP should explain what value format is
> expected in detail.

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Luís Cabral
Hi Matthias,


bq. I think adding couple of examples would be good enough.

I’ll add them to the KIP then.


bq. That's a tricky question... Personally, I think a direct byte[]->long 
conversion would be straight forward. What is the advantage of the intermediate 
"String" type?

Not that I mind doing it directly (I intend to use a Java client), but please 
be aware that a String binary representation is based on the charset encoding, 
while the Long binary representation varies according to the language.
If it is intended that Kafka remains language agnostic, then I recommend 
keeping String (following the code used for ‘key’, I set it to UTF-8).
I’ll add the currently intended approach and explain why in the KIP, and then 
adapt as the discussions unfold.
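
To make the trade-off concrete, a minimal sketch of the two decodings being
discussed (the header name and values are made up; neither is the KIP's final
format):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeaderValueDecoding {
    // Option (a): the client writes the version as UTF-8 text, e.g. "42".
    static long decodeUtf8Text(byte[] value) {
        return Long.parseLong(new String(value, StandardCharsets.UTF_8));
    }

    // Option (b): the client writes the version as 8 big-endian bytes.
    static long decodeBigEndianLong(byte[] value) {
        return ByteBuffer.wrap(value).getLong(); // expects exactly 8 bytes
    }

    public static void main(String[] args) {
        byte[] asText = "42".getBytes(StandardCharsets.UTF_8);
        byte[] asBinary = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        System.out.println(decodeUtf8Text(asText));        // 42
        System.out.println(decodeBigEndianLong(asBinary)); // 42
    }
}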


Cheers


From: Matthias J. Sax
Sent: 09 April 2018 20:35
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Thanks for clarification.

I understand the config name now. Makes sense.


> bq. It might also be good, to elaborate why you suggest "long" for the 
> compaction value in the KIP itself.
> 
> I would prefer that the definition simply be "long", to keep the code 
> contract cleaner, and leave the clients to infer the usages according to 
> their scenarios.
> But if its important, then I can add it (let me know).

I think adding couple of examples would be good enough.


> bq. One more thought: the KIP basically allows that a record with larger 
> offset is deleted while a record with smaller offset is preserved (if the 
> record with smaller offset has a larger "compaction value" than the record 
> with the larger offset). I don't see an issue with this atm, just wanted to 
> point it out, as it seems to be an important change in behavior (compaction 
> does not strictly "move forward" any longer if you wish).
> 
> This is the whole purpose of the KIP, though. Is it not clear that this is 
> the intention?

The KIP is clear. Just wanted to point it out. Not sure if others have
concerns about this change in behavior.

> bq. Do you effectively mean, that the value has exactly 8 bytes?
> 
> Not exactly... At the moment, the code in the pull request expects that the 
> header value supports "byte[] => String => Long" as a valid conversion, so 
> having a direct "long-as-byte[]" is not considered to be valid.
> I am not sure which practice is preferred in Kafka, but given the base 
> approach of "byte[]" as the header value, it seemed saner to support a String 
> value rather than a Long.
> Sadly, since Kafka needs to comprehend and process the actual value, I could 
> not find a way to support both approaches.
> 
> This last part is tricky, and where I expected more debates to arise, though 
> in the end it can be boiled down to a "Long vs String" thing, so I hope it 
> doesn't become a blocker...

That's a tricky question... Personally, I think a direct byte[]->long
conversion would be straight forward. What is the advantage of the
intermediate "String" type?

We could also introduce a magic byte that indicates the type of the value
-- but I am not sure if we would need this flexibility. It also makes
serializing the value on the client side more complex.

Whatever the decision is, the KIP should explain what value format is
expected in detail.



-Matthias


On 4/9/18 2:20 AM, Luís Cabral wrote:
> Hi,
> 
> 
> bq. About naming: the broker config has `cleaner` in it, while the topic 
> config does not. It might be more consistent if either both have `cleaner` or 
> none of them? (Personally, I would prefer to strip `cleaner` for the broker 
> config.)
> 
> 
> Do you mean the "log.cleaner." prefix attached to the global config?
> If so, then this seems to be the naming approach used in this project, so I 
> would rather stick to it (changing this will likely lead to a bigger --off 
> topic-- discussion).
> 
> 
> 
> bq. The sentence "toggle the compaction strategy to this approach" does not 
> make clear that it should be the default -- even if you follow a common Kafka 
> pattern, the KIP should make it explicit (new people might not be familiar 
> with the pattern and cannot infer from the name itself that it is supposed to 
> be a global default setting that can be overwritten on a per-topic bases with 
> the second config).
> 
> 
> Sorry, I'm having some problems understanding this part.
> If it is about the meaning of the properties (global vs local), then isn't 
> this explained in the Kafka Docs? 
>   https://kafka.apache.org/documentation
> 
> Otherwise, could you kindly elaborate?
> 
> 
> 
> bq. It might also be good, to elaborate why you suggest "long" for the 
> compaction value in the KIP itself.

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Guozhang Wang
e, if two messages have the same "compaction value" in the
> header? (For timestamps, there is the same issue, and one idea was to use
> the offset as tie-breaker)
> >>>
> >>> Sorry, I forgot to mention that in the KIP. In the pull request used
> with the KIP you can see that it is indeed using the offset as a
> tie-breaker in case the header values are the same.
> >>> I’ll make this clear by adding it as part of the proposed changes.
> >
> > Think you forgot to actually add this case. :)
> >
> >
> > One more question:
> >
> >> cast-able to "long"
> >
> > Do you effectively mean, that the value has exactly 8 bytes?
> >
> >
> > -Matthias
> >
> > On 4/8/18 3:44 AM, Luís Cabral wrote:
> >> Hi Matthias,
> >>
> >>
> >> bq. Why do we need two new configs? Why is the topic config
> `compaction.strategy` not sufficient?
> >>
> >> As I understand these configurations, one allows you to configure the
> default for all topics while the other allows you to configure a single
> topic directly.
> >> If this is incorrect, or if having a global toggle is not desired, then
> I have no issues with having only the topic-relevant configuration.
> >>
> >>
> >> bq. For Kafka Streams we did think about a timestamp base compaction at
> some point (internal brain storming)---we never thought this through in
> details, but it might be a good idea to discuss it in this KIP and maybe
> piggy-back it if we want it (as a second pre-defined strategy "timestamp"
> next to "offset"?)
> >>
> >> The reason why I went for a “long” value here was mainly to support the
> 2 most common versioning patterns around: incremental numerals and
> timestamp (long representing milliseconds since 0h, January 1, 1970 GMT).
> >> Is this not enough to represent the strategy you guys had in mind? I
> would love to hear more about those discussions so this KIP can fulfil some
> more requirements that I am not aware of at the moment.
> >>
> >>
> >> bq. With the header approach it is not ensured that each record uses a
> unique "compaction value" (in contrast to offsets). Thus, what should the
> behaviour be, if two messages have the same "compaction value" in the
> header? (For timestamps, there is the same issue, and one idea was to use
> the offset as tie-breaker)
> >>
> >> Sorry, I forgot to mention that in the KIP. In the pull request used
> with the KIP you can see that it is indeed using the offset as a
> tie-breaker in case the header values are the same.
> >> I’ll make this clear by adding it as part of the proposed changes.
> >>
> >>
> >> bq. What should the behaviour be, if a message does not encode the
> "compaction key" in the header?
> >>
> >> The intention is that if both records being compared don’t have this
> value, then the offset is used instead. However, if only one of these
> records doesn’t have it, then whichever record has a “compaction key” is
> kept (as the other is considered to be anomalous).
> >> I’ll also add this to the proposed changes in the KIP to highlight
> these fall-back behaviours.
> >>
> >>
> >> Thank you for the feedback and looking forward for more replies!
> >>
> >> Cheers
> >>
> >>
> >> From: Matthias J. Sax
> >> Sent: 08 April 2018 05:29
> >> To: dev@kafka.apache.org
> >> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >>
> >> Luís,
> >>
> >> thanks a lot for this KIP. Very interesting idea.
> >>
> >> Couple of questions:
> >>
> >>   - Why do we need two new configs? Why is the topic config
> >> `compaction.strategy` not sufficient?
> >>
> >>   - For Kafka Streams we did think about a timestamp base compaction at
> >> some point (internal brain storming)---we never thought this through in
> >> details, but it might be a good idea to discuss it in this KIP and maybe
> >> piggy-back it if we want it (as a second pre-defined strategy
> >> "timestamp" next to "offset"?)
> >>
> >>   - With the header approach it is not ensured that each record uses a
> >> unique "compaction value" (in contrast to offsets). Thus, what should
> >> the behavior be, if two messages have the same "compaction value" in the
> >> header? (For timestamps, there is the same issue, and one idea was to
>> use the offset as tie-breaker)

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Matthias J. Sax
>> it if we want it (as a second pre-defined strategy "timestamp" next to 
>> "offset"?)
>>
>> The reason why I went for a “long” value here was mainly to support the 2 
>> most common versioning patterns around: incremental numerals and timestamp 
>> (long representing milliseconds since 0h, January 1, 1970 GMT).
>> Is this not enough to represent the strategy you guys had in mind? I would 
>> love to hear more about those discussions so this KIP can fulfil some more 
>> requirements that I am not aware of at the moment.
>>
>>
>> bq. With the header approach it is not ensured that each record uses a 
>> unique "compaction value" (in contrast to offsets). Thus, what should the 
>> behaviour be, if two messages have the same "compaction value" in the 
>> header? (For timestamps, there is the same issue, and one idea was to use 
>> the offset as tie-breaker)
>>
>> Sorry, I forgot to mention that in the KIP. In the pull request used with 
>> the KIP you can see that it is indeed using the offset as a tie-breaker in 
>> case the header values are the same.
>> I’ll make this clear by adding it as part of the proposed changes.
>>
>>
>> bq. What should the behaviour be, if a message does not encode the 
>> "compaction key" in the header?
>>
>> The intention is that if both records being compared don’t have this value, 
>> then the offset is used instead. However, if only one of these records 
>> doesn’t have it, then whichever record has a “compaction key” is kept (as 
>> the other is considered to be anomalous).
>> I’ll also add this to the proposed changes in the KIP to highlight these 
>> fall-back behaviours.
>>
>>
>> Thank you for the feedback and looking forward for more replies!
>>
>> Cheers
>>
>>
>> From: Matthias J. Sax
>> Sent: 08 April 2018 05:29
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> Luís,
>>
>> thanks a lot for this KIP. Very interesting idea.
>>
>> Couple of questions:
>>
>>   - Why do we need two new configs? Why is the topic config
>> `compaction.strategy` not sufficient?
>>
>>   - For Kafka Streams we did think about a timestamp base compaction at
>> some point (internal brain storming)---we never thought this through in
>> details, but it might be a good idea to discuss it in this KIP and maybe
>> piggy-back it if we want it (as a second pre-defined strategy
>> "timestamp" next to "offset"?)
>>
>>   - With the header approach it is not ensured that each record uses a
>> unique "compaction value" (in contrast to offsets). Thus, what should
>> the behavior be, if two messages have the same "compaction value" in the
>> header? (For timestamps, there is the same issue, and one idea was to
>> use the offset as tie-breaker)
>>
>>   - What should the behavior be, if a message does not encode the
>> "compaction key" in the header?
>>
>>
>> -Matthias
>>
>>
>> On 4/5/18 11:59 PM, Luís Cabral wrote:
>>>   
>>> Thank you very much for taking the time to read it.
>>>
>>> bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
>>> I've made the 'OCC' into a link pointing to the appropriate Wiki page 
>>> explaining what it is. This is not a particularly important part of the 
>>> change, it is just to reference the similarity between this proposal and 
>>> the version control offered by OCC.
>>>
>>> bq. Is it possible to enumerate the keys ?
>>> Do you mean hard-coding the header key used, rather than using a free-text 
>>> solution? If I were to hard-code header with key "version", for example, 
>>> then this may conflict with other clients that already use this header for 
>>> something else, making it cumbersome for them to try and use this strategy, 
>>> should they want it.
>>> If I misunderstood your points, then please correct me. I appreciate the 
>>> feedback!
>>>
>>> On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu <yuzhih...@gmail.com> wrote:
>>>   
>>>   In the 'Proposed Changes' section, can you expand 'OCC' ?
>>>
>>> bq. Specifically changing this to anything other than "*offset*"
>>>
>>> Is it possible to enumerate the keys ? In the future, more metadata would
>>> be defined in record header - it is better to avoid collision.
>>>
>>> Cheers
>>>
>>> On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
>>> wrote:
>>>
>>>>
>>>> This is embarassingly hard to fix... going again...
>>>> 
>>>> KIP-280:  https://cwiki.apache.org/confluence/display/
>>>> KAFKA/KIP-280%3A+Enhanced+log+compaction
>>>> -
>>>> Pull-4822:  https://github.com/apache/kafka/pull/4822
>>>>
>>>>
>>>>     On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>>>> <luis_cab...@yahoo.com.INVALID> wrote:
>>>>
>>>>   Fixing the links:KIP-280:  https://cwiki.apache.org/confluence/display/
>>>> KAFKA/KIP-280%3A+Enhanced+log+compactionPull-4822:  https://
>>>> github.com/apache/kafka/pull/4822
>>>>
>>>>
>>>> On 2018/04/0508:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote:
>>>>> Helloall,>
>>>>> Starting adiscussion for this feature.>
>>>>> KIP-280  :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 280%3A+Enhanced+log+compactionPull-4822:  https://github.com/apache/
>>>> kafka/pull/4822>
>>>>
>>>>> KindRegards,Luís>
>>>>
>>>>   
>>
>>
>>





Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Luís Cabral
d that each record uses a 
>> unique "compaction value" (in contrast to offsets). Thus, what should the 
>> behaviour be, if two messages have the same "compaction value" in the 
>> header? (For timestamps, there is the same issue, and one idea was to use 
>> the offset as tie-breaker)
>> 
>> Sorry, I forgot to mention that in the KIP. In the pull request used with 
>> the KIP you can see that it is indeed using the offset as a tie-breaker in 
>> case the header values are the same.
>> I’ll make this clear by adding it as part of the proposed changes.

Think you forgot to actually add this case. :)


One more question:

> cast-able to "long"

Do you effectively mean, that the value has exactly 8 bytes?


-Matthias

On 4/8/18 3:44 AM, Luís Cabral wrote:
> Hi Matthias,
> 
> 
> bq. Why do we need two new configs? Why is the topic config 
> `compaction.strategy` not sufficient?
> 
> As I understand these configurations, one allows you to configure the default 
> for all topics while the other allows you to configure a single topic 
> directly.
> If this is incorrect, or if having a global toggle is not desired, then I 
> have no issues with having only the topic-relevant configuration.
> 
> 
> bq. For Kafka Streams we did think about a timestamp base compaction at some 
> point (internal brain storming)---we never thought this through in details, 
> but it might be a good idea to discuss it in this KIP and maybe piggy-back it 
> if we want it (as a second pre-defined strategy "timestamp" next to "offset"?)
> 
> The reason why I went for a “long” value here was mainly to support the 2 
> most common versioning patterns around: incremental numerals and timestamp 
> (long representing milliseconds since 0h, January 1, 1970 GMT).
> Is this not enough to represent the strategy you guys had in mind? I would 
> love to hear more about those discussions so this KIP can fulfil some more 
> requirements that I am not aware of at the moment.
> 
> 
> bq. With the header approach it is not ensured that each record uses a unique 
> "compaction value" (in contrast to offsets). Thus, what should the behaviour 
> be, if two messages have the same "compaction value" in the header? (For 
> timestamps, there is the same issue, and one idea was to use the offset as 
> tie-breaker)
> 
> Sorry, I forgot to mention that in the KIP. In the pull request used with the 
> KIP you can see that it is indeed using the offset as a tie-breaker in case 
> the header values are the same.
> I’ll make this clear by adding it as part of the proposed changes.
> 
> 
> bq. What should the behaviour be, if a message does not encode the 
> "compaction key" in the header?
> 
> The intention is that if both records being compared don’t have this value, 
> then the offset is used instead. However, if only one of these records 
> doesn’t have it, then whichever record has a “compaction key” is kept (as the 
> other is considered to be anomalous).
> I’ll also add this to the proposed changes in the KIP to highlight these 
> fall-back behaviours.
> 
> 
> Thank you for the feedback and looking forward for more replies!
> 
> Cheers
> 
> 
> From: Matthias J. Sax
> Sent: 08 April 2018 05:29
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Luís,
> 
> thanks a lot for this KIP. Very interesting idea.
> 
> Couple of questions:
> 
>  - Why do we need two new configs? Why is the topic config
> `compaction.strategy` not sufficient?
> 
>  - For Kafka Streams we did think about a timestamp base compaction at
> some point (internal brain storming)---we never thought this through in
> details, but it might be a good idea to discuss it in this KIP and maybe
> piggy-back it if we want it (as a second pre-defined strategy
> "timestamp" next to "offset"?)
> 
>  - With the header approach it is not ensured that each record uses a
> unique "compaction value" (in contrast to offsets). Thus, what should
> the behavior be, if two messages have the same "compaction value" in the
> header? (For timestamps, there is the same issue, and one idea was to
> use the offset as tie-breaker)
> 
>  - What should the behavior be, if a message does not encode the
> "compaction key" in the header?
> 
> 
> -Matthias
> 
> 
> On 4/5/18 11:59 PM, Luís Cabral wrote:
>>  
>> Thank you very much for taking the time to read it.
>>
>> bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
>> I've made the 'OCC' into a link pointing to the appropriate Wiki page 
>> explaining what it is. T

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-08 Thread Matthias J. Sax
Thanks for clarification. This makes sense to me.

About naming: the broker config has `cleaner` in it, while the topic
config does not. It might be more consistent if either both have
`cleaner` or none of them? (Personally, I would prefer to strip
`cleaner` for the broker config.)

The sentence "toggle the compaction strategy to this approach" does not
make clear that it should be the default -- even if you follow a common
Kafka pattern, the KIP should make it explicit (new people might not be
familiar with the pattern and cannot infer from the name itself that it
is supposed to be a global default setting that can be overwritten on a
per-topic bases with the second config).

For the timestamp compaction, long is good enough. The idea behind my
comment was really to add "timestamp" as a reserved value for the
parameter, and this should use the record's metadata timestamp (instead
of a header key named 'timestamp'). Before you update the KIP, others
might want to share their opinion about this.

It might also be good, to elaborate why you suggest "long" for the
compaction value in the KIP itself.

One more thought: the KIP basically allows that a record with larger
offset is deleted while a record with smaller offset is preserved (if
the record with smaller offset has a larger "compaction value" than the
record with the larger offset). I don't see an issue with this atm, just
wanted to point it out, as it seems to be an important change in
behavior (compaction does not strictly "move forward" any longer if you
wish).

>> bq. With the header approach it is not ensured that each record uses a 
>> unique "compaction value" (in contrast to offsets). Thus, what should the 
>> behaviour be, if two messages have the same "compaction value" in the 
>> header? (For timestamps, there is the same issue, and one idea was to use 
>> the offset as tie-breaker)
>> 
>> Sorry, I forgot to mention that in the KIP. In the pull request used with 
>> the KIP you can see that it is indeed using the offset as a tie-breaker in 
>> case the header values are the same.
>> I’ll make this clear by adding it as part of the proposed changes.

Think you forgot to actually add this case. :)


One more question:

> cast-able to "long"

Do you effectively mean, that the value has exactly 8 bytes?


-Matthias

On 4/8/18 3:44 AM, Luís Cabral wrote:
> Hi Matthias,
> 
> 
> bq. Why do we need two new configs? Why is the topic config 
> `compaction.strategy` not sufficient?
> 
> As I understand these configurations, one allows you to configure the default 
> for all topics while the other allows you to configure a single topic 
> directly.
> If this is incorrect, or if having a global toggle is not desired, then I 
> have no issues with having only the topic-relevant configuration.
> 
> 
> bq. For Kafka Streams we did think about a timestamp base compaction at some 
> point (internal brain storming)---we never thought this through in details, 
> but it might be a good idea to discuss it in this KIP and maybe piggy-back it 
> if we want it (as a second pre-defined strategy "timestamp" next to "offset"?)
> 
> The reason why I went for a “long” value here was mainly to support the 2 
> most common versioning patterns around: incremental numerals and timestamp 
> (long representing milliseconds since 0h, January 1, 1970 GMT).
> Is this not enough to represent the strategy you guys had in mind? I would 
> love to hear more about those discussions so this KIP can fulfil some more 
> requirements that I am not aware of at the moment.
> 
> 
> bq. With the header approach it is not ensured that each record uses a unique 
> "compaction value" (in contrast to offsets). Thus, what should the behaviour 
> be, if two messages have the same "compaction value" in the header? (For 
> timestamps, there is the same issue, and one idea was to use the offset as 
> tie-breaker)
> 
> Sorry, I forgot to mention that in the KIP. In the pull request used with the 
> KIP you can see that it is indeed using the offset as a tie-breaker in case 
> the header values are the same.
> I’ll make this clear by adding it as part of the proposed changes.
> 
> 
> bq. What should the behaviour be, if a message does not encode the 
> "compaction key" in the header?
> 
> The intention is that if both records being compared don’t have this value, 
> then the offset is used instead. However, if only one of these records 
> doesn’t have it, then whichever record has a “compaction key” is kept (as the 
> other is considered to be anomalous).
> I’ll also add this to the proposed changes in the KIP to highlight these 
> fall-back behaviours.
> 
> 
> Thank you fo

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-08 Thread Luís Cabral
Hi Matthias,


bq. Why do we need two new configs? Why is the topic config 
`compaction.strategy` not sufficient?

As I understand these configurations, one allows you to configure the default 
for all topics while the other allows you to configure a single topic directly.
If this is incorrect, or if having a global toggle is not desired, then I have 
no issues with having only the topic-relevant configuration.


bq. For Kafka Streams we did think about a timestamp base compaction at some 
point (internal brain storming)---we never thought this through in details, but 
it might be a good idea to discuss it in this KIP and maybe piggy-back it if we 
want it (as a second pre-defined strategy "timestamp" next to "offset"?)

The reason why I went for a “long” value here was mainly to support the 2 most 
common versioning patterns around: incremental numerals and timestamp (long 
representing milliseconds since 0h, January 1, 1970 GMT).
Is this not enough to represent the strategy you guys had in mind? I would love 
to hear more about those discussions so this KIP can fulfil some more 
requirements that I am not aware of at the moment.


bq. With the header approach it is not ensured that each record uses a unique 
"compaction value" (in contrast to offsets). Thus, what should the behaviour 
be, if two messages have the same "compaction value" in the header? (For 
timestamps, there is the same issue, and one idea was to use the offset as 
tie-breaker)

Sorry, I forgot to mention that in the KIP. In the pull request used with the 
KIP you can see that it is indeed using the offset as a tie-breaker in case the 
header values are the same.
I’ll make this clear by adding it as part of the proposed changes.


bq. What should the behaviour be, if a message does not encode the "compaction 
key" in the header?

The intention is that if both records being compared don’t have this value, 
then the offset is used instead. However, if only one of these records doesn’t 
have it, then whichever record has a “compaction key” is kept (as the other is 
considered to be anomalous).
I’ll also add this to the proposed changes in the KIP to highlight these 
fall-back behaviours.
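
A minimal sketch of these rules taken together (hypothetical names, not the code
in the pull request): given two records with the same key, a null value means the
record does not carry the compaction header.

public class CompactionTieBreakSketch {
    static boolean firstWins(Long value1, long offset1, Long value2, long offset2) {
        if (value1 == null && value2 == null) return offset1 >= offset2; // neither has the header: offset decides
        if (value1 == null || value2 == null) return value1 != null;     // only one has it: that record is kept
        if (!value1.equals(value2)) return value1 > value2;              // higher compaction value wins
        return offset1 >= offset2;                                       // equal values: offset is the tie-breaker
    }

    public static void main(String[] args) {
        System.out.println(firstWins(5L, 10, 3L, 11));   // true: lower offset survives, its value is higher
        System.out.println(firstWins(7L, 20, 7L, 21));   // false: equal values, the higher offset wins
        System.out.println(firstWins(null, 30, 2L, 31)); // false: the record carrying the header is kept
    }
}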


Thank you for the feedback and looking forward for more replies!

Cheers


From: Matthias J. Sax
Sent: 08 April 2018 05:29
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Luís,

thanks a lot for this KIP. Very interesting idea.

Couple of questions:

 - Why do we need two new configs? Why is the topic config
`compaction.strategy` not sufficient?

 - For Kafka Streams we did think about a timestamp base compaction at
some point (internal brain storming)---we never thought this through in
details, but it might be a good idea to discuss it in this KIP and maybe
piggy-back it if we want it (as a second pre-defined strategy
"timestamp" next to "offset"?)

 - With the header approach it is not ensured that each record uses a
unique "compaction value" (in contrast to offsets). Thus, what should
the behavior be, if two messages have the same "compaction value" in the
header? (For timestamps, there is the same issue, and one idea was to
use the offset as tie-breaker)

 - What should the behavior be, if a message does not encode the
"compaction key" in the header?


-Matthias


On 4/5/18 11:59 PM, Luís Cabral wrote:
>  
> Thank you very much for taking the time to read it.
> 
> bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
> I've made the 'OCC' into a link pointing to the appropriate Wiki page 
> explaining what it is. This is not a particularly important part of the 
> change, it is just to reference the similarity between this proposal and the 
> version control offered by OCC.
> 
> bq. Is it possible to enumerate the keys ?
> Do you mean hard-coding the header key used, rather than using a free-text 
> solution? If I were to hard-code header with key "version", for example, then 
> this may conflict with other clients that already use this header for 
> something else, making it cumbersome for them to try and use this strategy, 
> should they want it.
> If I misunderstood your points, then please correct me. I appreciate the 
> feedback!
>
> On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu <yuzhih...@gmail.com> wrote:
>  
>  In the 'Proposed Changes' section, can you expand 'OCC' ?
> 
> bq. Specifically changing this to anything other than "*offset*"
> 
> Is it possible to enumerate the keys ? In the future, more metadata would
> be defined in record header - it is better to avoid collision.
> 
> Cheers
> 
> On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
> 
>>
>> This is embarassingly hard to fix... going again...
>> -

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-07 Thread Matthias J. Sax
Luís,

thanks a lot for this KIP. Very interesting idea.

Couple of questions:

 - Why do we need two new configs? Why is the topic config
`compaction.strategy` not sufficient?

 - For Kafka Streams we did think about a timestamp base compaction at
some point (internal brain storming)---we never thought this through in
details, but it might be a good idea to discuss it in this KIP and maybe
piggy-back it if we want it (as a second pre-defined strategy
"timestamp" next to "offset"?)

 - With the header approach it is not ensured that each record uses a
unique "compaction value" (in contrast to offsets). Thus, what should
the behavior be, if two messages have the same "compaction value" in the
header? (For timestamps, there is the same issue, and one idea was to
use the offset as tie-breaker)

 - What should the behavior be, if a message does not encode the
"compaction key" in the header?


-Matthias


On 4/5/18 11:59 PM, Luís Cabral wrote:
>  
> Thank you very much for taking the time to read it.
> 
> bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
> I've made the 'OCC' into a link pointing to the appropriate Wiki page 
> explaining what it is. This is not a particularly important part of the 
> change, it is just to reference the similarity between this proposal and the 
> version control offered by OCC.
> 
> bq. Is it possible to enumerate the keys ?
> Do you mean hard-coding the header key used, rather than using a free-text 
> solution? If I were to hard-code header with key "version", for example, then 
> this may conflict with other clients that already use this header for 
> something else, making it cumbersome for them to try and use this strategy, 
> should they want it.
> If I misunderstood your points, then please correct me. I appreciate the 
> feedback!
>
> On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu  wrote:
>  
>  In the 'Proposed Changes' section, can you expand 'OCC' ?
> 
> bq. Specifically changing this to anything other than "*offset*"
> 
> Is it possible to enumerate the keys ? In the future, more metadata would
> be defined in record header - it is better to avoid collision.
> 
> Cheers
> 
> On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral 
> wrote:
> 
>>
>> This is embarassingly hard to fix... going again...
>> 
>> KIP-280:  https://cwiki.apache.org/confluence/display/
>> KAFKA/KIP-280%3A+Enhanced+log+compaction
>> -
>> Pull-4822:  https://github.com/apache/kafka/pull/4822
>>
>>
>>     On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>>  wrote:
>>
>>   Fixing the links:KIP-280:  https://cwiki.apache.org/confluence/display/
>> KAFKA/KIP-280%3A+Enhanced+log+compactionPull-4822:  https://
>> github.com/apache/kafka/pull/4822
>>
>>
>> On 2018/04/0508:44:00, Luís Cabral  wrote:
>>> Helloall,>
>>> Starting adiscussion for this feature.>
>>> KIP-280  :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 280%3A+Enhanced+log+compactionPull-4822:  https://github.com/apache/
>> kafka/pull/4822>
>>
>>> KindRegards,Luís>
>>
>>  





Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-06 Thread Luís Cabral
 
Thank you very much for taking the time to read it.

bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
I've made the 'OCC' into a link pointing to the appropriate Wiki page 
explaining what it is. This is not a particularly important part of the change, 
it is just to reference the similarity between this proposal and the version 
control offered by OCC.

bq. Is it possible to enumerate the keys ?
Do you mean hard-coding the header key used, rather than using a free-text 
solution? If I were to hard-code header with key "version", for example, then 
this may conflict with other clients that already use this header for something 
else, making it cumbersome for them to try and use this strategy, should they 
want it.
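
For example, a producer opting into this with an arbitrary, user-chosen header key
might look like the sketch below (topic name, header key "sequence" and the UTF-8
text encoding are illustrative assumptions only):

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class VersionedHeaderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("my-compacted-topic", "user-42", "payload-v7");
            // The record's "compaction value", written under the freely chosen header key.
            record.headers().add("sequence", "7".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}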
If I misunderstood your points, then please correct me. I appreciate the 
feedback!

On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu  wrote:
 
 In the 'Proposed Changes' section, can you expand 'OCC' ?

bq. Specifically changing this to anything other than "*offset*"

Is it possible to enumerate the keys ? In the future, more metadata would
be defined in record header - it is better to avoid collision.

Cheers

On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral 
wrote:

>
> This is embarassingly hard to fix... going again...
> 
> KIP-280:  https://cwiki.apache.org/confluence/display/
> KAFKA/KIP-280%3A+Enhanced+log+compaction
> -
> Pull-4822:  https://github.com/apache/kafka/pull/4822
>
>
>    On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>  wrote:
>
>  Fixing the links:KIP-280:  https://cwiki.apache.org/confluence/display/
> KAFKA/KIP-280%3A+Enhanced+log+compactionPull-4822:  https://
> github.com/apache/kafka/pull/4822
>
>
> On 2018/04/0508:44:00, Luís Cabral  wrote:
> > Helloall,>
> > Starting adiscussion for this feature.>
> >KIP-280  :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 280%3A+Enhanced+log+compactionPull-4822:  https://github.com/apache/
> kafka/pull/4822>
>
> > KindRegards,Luís>
>
>  

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Ted Yu
In the 'Proposed Changes' section, can you expand 'OCC' ?

bq. Specifically changing this to anything other than "*offset*"

Is it possible to enumerate the keys ? In the future, more metadata would
be defined in record header - it is better to avoid collision.

Cheers

On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral 
wrote:

>
> This is embarassingly hard to fix... going again...
> 
> KIP-280:  https://cwiki.apache.org/confluence/display/
> KAFKA/KIP-280%3A+Enhanced+log+compaction
> -
> Pull-4822:  https://github.com/apache/kafka/pull/4822
>
>
> On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>  wrote:
>
>   Fixing the links:KIP-280:  https://cwiki.apache.org/confluence/display/
> KAFKA/KIP-280%3A+Enhanced+log+compactionPull-4822:  https://
> github.com/apache/kafka/pull/4822
>
>
> On 2018/04/0508:44:00, Luís Cabral  wrote:
> > Helloall,>
> > Starting adiscussion for this feature.>
> >KIP-280   :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 280%3A+Enhanced+log+compactionPull-4822:  https://github.com/apache/
> kafka/pull/4822>
>
> > KindRegards,Luís>
>
>


Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Luís Cabral
 
This is embarrassingly hard to fix... going again...

KIP-280:  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
-
Pull-4822:  https://github.com/apache/kafka/pull/4822


On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral 
 wrote:  
 
Fixing the links:
KIP-280:  https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
Pull-4822:  https://github.com/apache/kafka/pull/4822


On 2018/04/05 08:44:00, Luís Cabral  wrote: 
> Hello all,
> Starting a discussion for this feature.
> KIP-280   :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> Pull-4822 :  https://github.com/apache/kafka/pull/4822
> Kind Regards,
> Luís
  

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Luís Cabral
Fixing the links:
KIP-280:  https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
Pull-4822:  https://github.com/apache/kafka/pull/4822


On 2018/04/05 08:44:00, Luís Cabral  wrote: 
> Hello all,
> Starting a discussion for this feature.
> KIP-280   :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> Pull-4822 :  https://github.com/apache/kafka/pull/4822
> Kind Regards,
> Luís



[DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Luís Cabral
Hello all,
Starting a discussion for this feature.
KIP-280   :  https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
Pull-4822 :  https://github.com/apache/kafka/pull/4822
Kind Regards,
Luís