Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-18 Thread Guozhang Wang
> Senthil > > -Original Message- > From: radai > Sent: Thursday, December 12, 2019 11:40 AM > To: dev@kafka.apache.org > Subject: Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction > > may I suggest that if, under "header" strategy, multiple records a

RE: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-16 Thread Senthilnathan Muthusamy
of the current KIP... Appreciate your valuable feedback! Regards, Senthil -Original Message- From: radai Sent: Thursday, December 12, 2019 11:40 AM To: dev@kafka.apache.org Subject: Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction may I suggest that if, under "header"

Re: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-12-12 Thread radai
cord for non-offset based compaction strategy). Please review > and let me know if you have any other feedback. > > Regards, > Senthil > > -Original Message- > From: Jun Rao > Sent: Tuesday, November 26, 2019 4:36 PM > To: dev > Subject: [EXTERNAL] Re: [DI

RE: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Senthilnathan Muthusamy
: Tuesday, November 26, 2019 4:36 PM To: dev Subject: [EXTERNAL] Re: [DISCUSS] KIP-280: Enhanced log compaction Hi, Senthil, Sorry for the delay. 51. It seems that we can just remove the last record from the batch, but keeps the batch during compaction. The batch level metadata is enough

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Jun Rao
> ahead and update the KIP and proceed. > > Thanks > Senthil > > - Senthil > > From: Senthilnathan Muthusamy > Sent: Wednesday, November 20, 2019 5:04:20 PM > To: dev@kafka.apache.org > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-26 Thread Senthilnathan Muthusamy
.org Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction Hi Guozhang & Jun, Thanks for the details on the scenarios. #51 => thanks for the details with the example, Guozhang. Won't followers be syncing the LEO with the leader as well? If yes, keeping last record always (without compact

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-20 Thread Senthilnathan Muthusamy
ght? Thanks, Senthil -Original Message- From: Jun Rao Sent: Wednesday, November 13, 2019 9:31 AM To: dev Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hi, Seth, 51. The difference is that with the offset compaction strategy, the message corresponding to the last offset i

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-13 Thread Jun Rao
> > Thanks, > Senthil > > -Original Message- > From: Jun Rao > Sent: Thursday, November 7, 2019 4:32 PM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hi, Senthil, > > Thanks for bringing back this KIP. Overall, this seems like a

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-12 Thread Guozhang Wang
pact strategy > is by the offsetmap). This is my understanding of the tombstone based on > the code walk-through... please let me know if I am missing anything here... > > Thanks, > Senthil > > -----Original Message- > From: Jun Rao > Sent: Thursday, November

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-11 Thread Senthilnathan Muthusamy
... Thanks, Senthil -Original Message- From: Jun Rao Sent: Thursday, November 7, 2019 4:32 PM To: dev Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hi, Senthil, Thanks for bringing back this KIP. Overall, this seems like a useful feature. A few comments below. 50. One use case

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-07 Thread Jun Rao
hil > > -Original Message- > From: Guozhang Wang > Sent: Monday, November 4, 2019 11:00 AM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Senthilnathan, > > Thanks for revamping on the KIP. I have only one comment about th

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Matthias J. Sax
de a note in the JIRA item to make sure the wiki is updated. > > Thanks, > Senthil > > -Original Message- > From: Guozhang Wang > Sent: Monday, November 4, 2019 11:00 AM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Senthilnatha

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Senthilnathan Muthusamy
Hi Guozhang, Sure and I have made a note in the JIRA item to make sure the wiki is updated. Thanks, Senthil -Original Message- From: Guozhang Wang Sent: Monday, November 4, 2019 11:00 AM To: dev Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hello Senthilnathan, Thanks

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-05 Thread Senthilnathan Muthusamy
Thanks for pointing it out Eric. Updated the KIP... Regards, Senthil -Original Message- From: Guozhang Wang Sent: Monday, November 4, 2019 11:52 AM To: dev Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Eric, I think that's a good point, in `Headers.java` we also designed

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Guozhang Wang
IP. If there are > > > any more thoughts I would love to hear them. > > > > > > Thanks, > > > Senthil > > > > > > -Original Message- > > > From: Senthilnathan Muthusamy > > > Sent: Thursday, October 31, 2

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Eric Azama
> > > I will start the vote thread shortly for this updated KIP. If there are > > any more thoughts I would love to hear them. > > > > Thanks, > > Senthil > > > > -Original Message- > > From: Senthilnathan Muthusamy > > Sent: Thursday, October 31,

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Guozhang Wang
al Message- > From: Senthilnathan Muthusamy > Sent: Thursday, October 31, 2019 3:51 AM > To: dev@kafka.apache.org > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction > > Hi Matthias > > Thanks for the response. > > (1) Yes > > (2) Yes, and

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-11-04 Thread Senthilnathan Muthusamy
t me know if you have any other questions. Thanks, Senthil -Original Message- From: Matthias J. Sax Sent: Thursday, October 31, 2019 12:13 AM To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Thanks for picking up this KIP, Senthil. (1) As far as I reme

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-31 Thread Senthilnathan Muthusamy
ther questions. Thanks, Senthil -Original Message- From: Matthias J. Sax Sent: Thursday, October 31, 2019 12:13 AM To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Thanks for picking up this KIP, Senthil. (1) As far as I remember, the main i

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-31 Thread Matthias J. Sax
questions on this updated KIP-280... > > Thanks, > > Senthil > > -Original Message- > From: Senthilnathan Muthusamy > Sent: Monday, October 28, 2019 11:36 PM > To: dev@kafka.apache.org > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction > > Hi

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-30 Thread Senthilnathan Muthusamy
Hi, Please let me know if anyone has any questions on this updated KIP-280... Thanks, Senthil -Original Message- From: Senthilnathan Muthusamy Sent: Monday, October 28, 2019 11:36 PM To: dev@kafka.apache.org Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction Hi Tom, Sorry

RE: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-29 Thread Senthilnathan Muthusamy
had a detailed discussion on the original KIP with the previous author and it would be great to hear your inputs as well. Thanks, Senthil -Original Message- From: Tom Bentley Sent: Tuesday, October 22, 2019 2:32 AM To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Re: [DISCUSS] KIP-280: Enhanced log compaction

2019-10-22 Thread Tom Bentley
Hi Senthilnathan, In the motivation isn't it a little misleading to say "On the producer side, we clearly preserve an order for the two messages, "? IMHO, the semantics of the producer are clear that having an observed order of sending records from different producers is not sufficient to

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-10-22 Thread Luís Cabral
order messages never even make it into the topic.  They are > blocked by the broker. > > -Original Message- > From: Guozhang Wang > Sent: Saturday, September 1, 2018 11:33 AM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Luis,

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-10-11 Thread Luís Cabral
order messages never even make it into the topic.  They are > blocked by the broker. > > -Original Message- > From: Guozhang Wang > Sent: Saturday, September 1, 2018 11:33 AM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Luis,

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-30 Thread Matthias J. Sax
rom: Guozhang Wang > Sent: Saturday, September 1, 2018 11:33 AM > To: dev > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Luis, > > Thanks for your thoughtful responses, here are my two cents: > > 1) I think having the new configs with per-topic g

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-21 Thread Bertus Greeff
better because out of order messages never even make it into the topic. They are blocked by the broker. -Original Message- From: Guozhang Wang Sent: Saturday, September 1, 2018 11:33 AM To: dev Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hello Luis, Thanks for your

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-09-01 Thread Guozhang Wang
Hello Luis, Thanks for your thoughtful responses, here are my two cents: 1) I think having the new configs with per-topic granularity would not introduce much memory overhead or logic complexity, as all you need is to remember this at the topic metadata cache. If I've missed some details about

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-29 Thread Luís Cabral
Hi all, Since there has been a rejuvenated interest in this KIP, it felt better to downgrade it back down to [DISCUSSION], as we aren't really voting on it anymore. I'll try to address the currently pending questions on the following points, so please bear with me while we go through them all:

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-23 Thread Jun Rao
Hi, Luis, Thanks for the reply. A few more comments below. 1. About the topic level configuration. It seems that it's useful for the new configs to be at the topic level. Currently, the following configs related to compaction are already at the topic level. min.cleanable.dirty.ratio

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-16 Thread Guozhang Wang
Regarding "broker-agnostic of headers": there are some KIPs from Streams to use headers for internal purposes as well, e.g. KIP-258 and KIP-213 (I admit there may be a conflict with user space, but practically I think it is very rare). So I think we are very likely going to make Kafka internals to

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-08-16 Thread Bertus Greeff
I'm interested to know the status of this KIP. I see that the status is "Voting". How long does this normally take? We want to use Kafka and this KIP provides exactly the log compaction logic that we want for many of our projects. One piece of feedback that I have is that

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-06-01 Thread Guozhang Wang
ime in here as > well. > >> > >> > >> Guozhang > >> > >> > >> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral > > >> wrote: > >> > >>> Hi Matthias / Guozhang, > >>> > >>> Were the questi

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-28 Thread Matthias J. Sax
>> wrote: >> >>> Hi Matthias / Guozhang, >>> >>> Were the questions clarified? >>> Please feel free to add more feedback, otherwise it would be nice to move >>> this topic onwards  >>> >>> Kind Regards, >>> Luís

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-28 Thread Luís Cabral
ified? >> Please feel free to add more feedback, otherwise it would be nice to move >> this topic onwards  >> >> Kind Regards, >> Luís Cabral >> >> From: Guozhang Wang >> Sent: 09 May 2018 20:00 >> To: dev@kafka.apache.org >> Subject: Re: [D

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-24 Thread Luis Cabral
e questions clarified? >> Please feel free to add more feedback, otherwise it would be nice to move >> this topic onwards  >> >> Kind Regards, >> Luís Cabral >> >> From: Guozhang Wang >> Sent: 09 May 2018 20:00 >> To: dev@kafka.apache.org >

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-22 Thread Guozhang Wang
e the questions clarified? > Please feel free to add more feedback, otherwise it would be nice to move > this topic onwards  > > Kind Regards, > Luís Cabral > > From: Guozhang Wang > Sent: 09 May 2018 20:00 > To: dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-280: Enh

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-22 Thread Luís Cabral
Hi Matthias / Guozhang, Were the questions clarified? Please feel free to add more feedback, otherwise it would be nice to move this topic onwards  Kind Regards, Luís Cabral From: Guozhang Wang Sent: 09 May 2018 20:00 To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-280: Enhanced log

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-09 Thread Guozhang Wang
I have thought about consistency in strategy vs. practical concerns about storage convenience and its impact on compaction effectiveness. The difference between timestamp and the header key-value pairs is that for the latter, as I mentioned before, "it is arguably out of Kafka's control, and

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-06 Thread Matthias J. Sax
Thanks. To reverse the question: if this argument holds, why does it not apply to the case when the header key is used as compaction attribute? I am not against keeping both records in case timestamps are equal, but shouldn't we apply the same strategy for all cases and don't use offset as

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-06 Thread Guozhang Wang
Hello Matthias, The related discussion was in the PR: https://github.com/apache/kafka/pull/4822#discussion_r184588037 The concern is that, to use offset as tie breaker we need to double the size of each entry in the bounded compaction cache, and hence greatly reduce the effectiveness of the

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-04 Thread Guozhang Wang
Thanks Luís, I do not have other comments on this KIP. I'd also like to ping Jason and Jun to take a look at this one. Guozhang On Thu, May 3, 2018 at 1:40 AM, Luís Cabral wrote: > > Hi Guozhang, > > No worries, looking at the traffic on this project, I'm sure

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-03 Thread Luís Cabral
Hi Guozhang, No worries, looking at the traffic on this project, I'm sure you have your hands full. Anyway, that proposal seems quite reasonable. KIP is now updated to reflect those points. Are there any more topics you would like to address here? Cheers, Luis On Wednesday, May 2,

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-02 Thread Guozhang Wang
Hello Luís, Sorry for the late reply. My understanding is that such duplicates will only happen if the non-offset version value, either the timestamp or some long-typed header key, are the same (i.e. we cannot break ties). 1. For timestamp, which is in milli-seconds, I think in practice the

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-05-02 Thread Luís Cabral
Hi Guozhang, Have you managed to have a look at my reply? How do you feel about this? Kind Regards, Luís Cabral On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral wrote: Hi Guozhang, I understand the argument, but this is a hazardous compromise for

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-30 Thread Luís Cabral
Hi Guozhang, I understand the argument, but this is a hazardous compromise for using Kafka as an event store (as is my original intention). I expect to have many duplicated messages in Kafka as the overall architecture being used allows for the producer to re-send a fresh state of the backed

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Guozhang Wang
Hello Luis, When comparing the versions returns `equal`, the original proposal is to use the offset as the tie breaker. My previous comment is that 1) when we build the map calling `put`, if there is already an entry for the key, compare its stored version, and replace if the put record's
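
To illustrate the tie-breaker being discussed here, the following is a minimal sketch, not the code from the pull request. It assumes a map that stores a (version, offset) pair per key and replaces an entry only when the incoming pair is strictly greater, so the offset decides between equal versions. The names `VersionedOffsetMap`, `put`, `is_latest`, and `compact` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


class VersionedOffsetMap:
    """Hypothetical stand-in for the compaction cache: key -> (version, offset)."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, Tuple[int, int]] = {}

    def put(self, key: Hashable, version: int, offset: int) -> bool:
        """Store the record if it beats the current entry; return True if stored."""
        current = self._entries.get(key)
        # Tuples compare element by element: version first, then offset,
        # so the offset only matters when the versions are equal.
        if current is None or (version, offset) > current:
            self._entries[key] = (version, offset)
            return True
        return False

    def is_latest(self, key: Hashable, version: int, offset: int) -> bool:
        """True only for the exact record that won the comparison for this key."""
        return self._entries.get(key) == (version, offset)


def compact(records: List[Tuple[str, int]]) -> List[Tuple[int, str, int]]:
    """Two passes: build the map, then keep only the winning record per key."""
    cache = VersionedOffsetMap()
    for offset, (key, version) in enumerate(records):
        cache.put(key, version, offset)
    return [
        (offset, key, version)
        for offset, (key, version) in enumerate(records)
        if cache.is_latest(key, version, offset)
    ]
```

Storing the offset next to the version is what lets the second pass pick exactly one of two records with equal versions; it is also the extra per-entry cost raised elsewhere in this thread.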

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Luís Cabral
Hi, I was updating the PR to match the latest decisions and noticed (or rather, the integration tests noticed) that without storing the offset, then the cache doesn't know when to keep the record itself. This is because, after the cache is populated, all the records are compared against the

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-27 Thread Luís Cabral
Hi, The KIP is now updated with the results of the byte array discussion. This is my first contribution to Kafka, so I'm not sure on what the processes are. Is it now acceptable to take this into a vote, or should I ask for more contributors to join the discussion first? Kind Regards, Luis

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-26 Thread Guozhang Wang
Hello Luís, > Offset is an integer? I've only noticed it being resolved as a long so far. You are right, offset is a long. As for timestamp / other types, I left a comment in your PR about handling tie breakers. > Given these arguments, is this point something that you absolutely must have?

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-26 Thread Luís Cabral
Hi, bq. have a integer typed OffsetMap (for offset) Offset is an integer? I've only noticed it being resolved as a long so far. bq. long typed OffsetMap (for timestamp) We would still need to store the offset, as it is functioning as a tie-breaker.  Not that this is a big deal, we can be

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-25 Thread Guozhang Wang

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Luis Cabral
>> That is definitely clearer, KIP updated! >> >> >> >> From: Guozhang Wang >> Sent

Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Guozhang Wang
> That is definitely clearer, KIP updated! > > > > From: Guozhang Wang > Sent: 23 April

Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Luís Cabral
That is definitely clearer, KIP updated! From: Guozhang Wang Sent: 23 April 2018 23:44 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Thanks Luís. The KIP

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
That is definitely clearer, KIP updated! From: Guozhang Wang Sent: 23 April 2018 23:44 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Thanks Luís. The KIP looks good to me. Just that what I left as a minor: `When both records being compared contain

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
uís Cabral <luis_cab...@yahoo.com.invalid> wrote: > Hello Guozhang, > > The KIP is now updated to reflect this choice in strategy. > Please let me know your thoughts there. > > Kind Regards, > Luís > > From: Guozhang Wang > Sent: 23 April 2018 19:32 > To: dev@kafka.ap

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hello Guozhang, The KIP is now updated to reflect this choice in strategy. Please let me know your thoughts there. Kind Regards, Luís From: Guozhang Wang Sent: 23 April 2018 19:32 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Hi Luis, I think

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
greater than or equal to zero, which ends up > being ok for my own use case... > This would then generally guarantee the lexicographic ordering, as you say. > Is this what you mean? Should I then add this restriction to the KIP? > > Cheers, > Luis > > From: Guozhang Wang

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
the lexicographic ordering, as you say. Is this what you mean? Should I then add this restriction to the KIP? Cheers, Luis From: Guozhang Wang Sent: 23 April 2018 17:55 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Hello Luis, Thanks for your email

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
Hello Luis, Thanks for your email, replying to your points in the following: > I don't personally see advantages in it, but also the only disadvantage that I can think of is putting multiple meanings on this field. If we do not treat timestamp as a special value of the config, then I cannot use

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hi Guozhang, Thank you very much for the patience in explaining your points, I've learnt quite a bit in researching and experimenting after your replies. bq. I still think it is worth defining `timestamp` as a special compaction value I don't personally see advantages in it, but also the

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Guozhang Wang
Hi Luís, What I'm thinking primarily is that we only need to compare the compaction values as LONG for the offset and timestamp "type" (I still think it is worth defining `timestamp` as a special compaction value, with the reasons below). Not sure if you've seen my other comment earlier

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Luís Cabral
Guozhang, is this reply ok with you? If you insist on the byte[] comparison directly, then I would need some suggestions on how to represent a "version" with it, and then the KIP could be changed to that. On Tuesday, April 17, 2018, 2:44:16 PM GMT+2, Luís Cabral

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Oops, missed that email... bq. It is because when we compare the bytes we do not treat them as longs at all, so we just compare them based on bytes; I admit that if users' header types have some semantic meanings (e.g. it is encoded from a long) we are forcing them to choose the encoder

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Ted Yu
Can you respond to: http://search-hadoop.com/m/Kafka/uyzND1OlYaSzZ3SM1?subj=Re+RE+DISCUSS+KIP+280+Enhanced+log+compaction Original message From: Luís Cabral <luis_cab...@yahoo.com.INVALID> Date: 4/17/18 2:41 AM (GMT-08:00) To: dev@kafka.apache.org Subject: Re: RE: [D

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Hi all, There aren't that many discussions on this KIP, does that mean it should now move to voting? I'm not sure on the process here... Cheers

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Yup, lazy copy-paste punishment :P Guozhang On Wed, Apr 11, 2018 at 10:19 AM, Ted Yu wrote: > bq. 2. if the config value is "timestamp", look into the offset field; > > I think you meant looking into timestamp field. > > Cheers > > On Wed, Apr 11, 2018 at 10:18 AM,

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
If you are referring to, for example: -4611686018427387904 > 0; -4611686018427387904 > 4611686018427387903. It is because when we compare the bytes we do not treat them as longs at all, so we just compare them based on bytes; I admit that if users' header types have some semantic meanings (e.g.
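
The effect in this example can be reproduced with a small sketch. It assumes the header value is a long encoded as 8 big-endian two's-complement bytes (the default byte order of Java's `ByteBuffer`) and that the bytes are compared lexicographically as unsigned values; both are assumptions about the setup under discussion, and `encode_long` and `bytes_greater` are hypothetical helper names.

```python
import struct


def encode_long(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian two's-complement bytes."""
    return struct.pack(">q", value)


def bytes_greater(a: int, b: int) -> bool:
    """Compare two longs by their encoded bytes rather than numerically.

    Python compares bytes objects lexicographically, byte by byte,
    treating each byte as an unsigned value in the range 0..255.
    """
    return encode_long(a) > encode_long(b)


NEGATIVE = -4611686018427387904  # encodes as C0 00 00 00 00 00 00 00

# The sign bit makes the first byte of a negative long large (0x80..0xFF),
# so every negative value sorts above every non-negative one.
print(bytes_greater(NEGATIVE, 0))
print(bytes_greater(NEGATIVE, 4611686018427387903))

# Among non-negative values, byte order and numeric order agree.
print(bytes_greater(2, 1))
```

Under these assumptions, restricting header values to be greater than or equal to zero, as suggested elsewhere in the thread, keeps the byte-wise order consistent with the numeric order.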

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Ted Yu
bq. 2. if the config value is "timestamp", look into the offset field; I think you meant looking into timestamp field. Cheers On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang wrote: > > I do not mean that it is "used", but if what you meant is that you would > prefer to use

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
> I do not mean that it is "used", but if what you meant is that you would prefer to use that field instead of a header? > This is in relation to a previous point of yours: I think maybe we have a mis-communication here: I'm not against the idea of using headers, but just trying to argue that we

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral
Hi Guozhang, bq. I'm not sure I understand you statement that it is used to determine the "version" of the record I do not mean that it is "used", but if what you meant is that you would prefer to use that field instead of a header? This is in relation to a previous point of yours: >>> 1) I'm

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Hello Luís, Regarding the timestamp: it is designed to be mainly used for indicating the time when this record is generated (i.e. CREATE_TIME at the producer side, it will set the timestamp), or when the record has been appended to Kafka brokers (i.e. LOG_APPEND_TIME at the broker side, where

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral
Hi all, On my own previous statement: bq. Not that I mind doing it directly (I intend to use a Java client), but please be aware that a String binary representation is based on the charset encoding, while the Long binary representation varies according to the language. I went back to double

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-10 Thread Luís Cabral
: Guozhang Wang Sent: 09 April 2018 22:19 To: dev@kafka.apache.org Cc: Konstantin Chukhlomin Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Thanks for the KIP. 1) I'm also in favor of making the `timestamp` a preserved config value along with `offset`, for which we would not go into the headers

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Luís Cabral
sets). Thus, what should the >> behaviour be, if two messages have the same "compaction value" in the >> header? (For timestamps, there is the same issue, and one idea was to use >> the offset as tie-breaker) >> >> Sorry, I forgot to mention that in the KIP. In the pull

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Guozhang Wang
y 1, 1970 GMT). > >> Is this not enough to represent the strategy you guys had in mind? I > would love to hear more about those discussions so this KIP can fulfil some > more requirements that I am not aware of at the moment. > >> > >>

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Matthias J. Sax
>> behaviour be, if two messages have the same "compaction value" in the >> header? (For timestamps, there is the same issue, and one idea was to use >> the offset as tie-breaker) >> >> Sorry, I forgot to mention that in the KIP. In the pull request used with

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-09 Thread Luís Cabral
> > bq. What should the behaviour be, if a message does not encode the > "compaction key" in the header? > > The intention is that if both records being compared don’t have this value, > then the offset is used instead. However, if only one of these records >

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-08 Thread Matthias J. Sax
compaction key” is kept (as the > other is considered to be anomalous). > I’ll also add this to the proposed changes in the KIP to highlight these > fall-back behaviours. > > > Thank you for the feedback and looking forward for more replies! > > Cheers > > > From: Matth

RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-08 Thread Luís Cabral
kept (as the other is considered to be anomalous). I’ll also add this to the proposed changes in the KIP to highlight these fall-back behaviours. Thank you for the feedback and looking forward for more replies! Cheers From: Matthias J. Sax Sent: 08 April 2018 05:29 To: dev@kafka.apache.org Subje
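
The fall-back behaviours described in this exchange can be sketched as a single decision function. This is an illustration under assumptions, not the logic from the pull request: the rule for the case where both records carry a value (higher value wins, offset breaks ties) is taken from the tie-breaker idea mentioned elsewhere in the thread, and `surviving_record` is a hypothetical name.

```python
from typing import Optional


def surviving_record(
    old_value: Optional[int],
    old_offset: int,
    new_value: Optional[int],
    new_offset: int,
) -> str:
    """Return "old" or "new" for two records that share the same key.

    A value of None means the record has no compaction value in its header.
    """
    if old_value is None and new_value is None:
        # Neither record has the value: fall back to the offset.
        return "new" if new_offset > old_offset else "old"
    if old_value is None:
        # Only the new record has the value; the other is treated as anomalous.
        return "new"
    if new_value is None:
        # Only the old record has the value; the other is treated as anomalous.
        return "old"
    # Assumption: both have a value, so compare values and break ties by offset.
    return "new" if (new_value, new_offset) > (old_value, old_offset) else "old"
```

For example, `surviving_record(None, 10, 4, 3)` returns `"new"` even though the new record has the lower offset, because it is the only one carrying a compaction value.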

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-07 Thread Matthias J. Sax
Luís, thanks a lot for this KIP. Very interesting idea. Couple of questions: - Why do we need two new configs? Why is the topic config `compaction.strategy` not sufficient? - For Kafka Streams we did think about a timestamp-based compaction at some point (internal brainstorming)---we never

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-06 Thread Luís Cabral
Thank you very much for taking the time to read it. bq. In the 'Proposed Changes' section, can you expand 'OCC' ? I've made the 'OCC' into a link pointing to the appropriate Wiki page explaining what it is. This is not a particularly important part of the change, it is just to reference the

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Ted Yu
In the 'Proposed Changes' section, can you expand 'OCC' ? bq. Specifically changing this to anything other than "*offset*" Is it possible to enumerate the keys ? In the future, more metadata would be defined in record header - it is better to avoid collision. Cheers On Thu, Apr 5, 2018 at 2:05

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Luís Cabral
This is embarrassingly hard to fix... going again... KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction - Pull-4822: https://github.com/apache/kafka/pull/4822 On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral

Re: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-05 Thread Luís Cabral
Fixing the links: KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction Pull-4822: https://github.com/apache/kafka/pull/4822 On 2018/04/05 08:44:00, Luís Cabral wrote: > Hello all, > > Starting a discussion for this feature. >