RE: [EXTERNAL] Re: [VOTE] KIP-280: Enhanced log compaction

2019-12-16 Thread Senthilnathan Muthusamy
Sure Guozhang! Working on it and will post it when it is ready...

Thanks,
Senthil

-----Original Message-----
From: Guozhang Wang  
Sent: Sunday, December 8, 2019 6:47 PM
To: dev 
Subject: [EXTERNAL] Re: [VOTE] KIP-280: Enhanced log compaction

Thanks for the updated KIP, recasting my vote +1 on it again.

Thanks for driving the KIP discussion, and please feel free to ping the 
community when the PR is ready for reviews! :) One minor recommendation is to 
break it into smaller PRs to help on faster reviews and code merges.


Guozhang

On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy 
 wrote:

> Jun,
>
> If the updated KIP looks good, can you please vote for it.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:33 PM
> To: dev 
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for the KIP. Added a few more comments on the discussion thread.
>
> Jun
>
> On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy < 
> senth...@microsoft.com.invalid> wrote:
>
> > Thanks Matthias!
> >
> > Received 2 binding +1s... looking for one more binding +1!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax 
> > Sent: Wednesday, November 6, 2019 12:10 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > +1 (binding)
> >
> > On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> > >
> > > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> > >
> > > Regards,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Guozhang Wang
> > > Sent: Monday, November 4, 2019 11:01 AM
> > > To: dev
> > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > >
> > > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> > >
> > > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I would like to start the vote on the updated KIP-280: Enhanced 
> > >> log compaction. Thanks to Guozhang, Matthias & Tom for the 
> > >> valuable feedback on the discussion thread...
> > >>
> > > >> KIP:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > > >>
> > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
>


--
-- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2019-12-08 Thread Guozhang Wang
Thanks for the updated KIP, recasting my vote +1 on it again.

Thanks for driving the KIP discussion, and please feel free to ping the
community when the PR is ready for reviews! :) One minor recommendation is
to break it into smaller PRs to help on faster reviews and code merges.


Guozhang

On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy
 wrote:

> Jun,
>
> If the updated KIP looks good, can you please vote for it.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:33 PM
> To: dev 
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for the KIP. Added a few more comments on the discussion thread.
>
> Jun
>
> On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
>
> > Thanks Matthias!
> >
> > Received 2 binding +1s... looking for one more binding +1!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax 
> > Sent: Wednesday, November 6, 2019 12:10 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > +1 (binding)
> >
> > On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> > >
> > > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> > >
> > > Regards,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Guozhang Wang
> > > Sent: Monday, November 4, 2019 11:01 AM
> > > To: dev
> > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > >
> > > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> > >
> > > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I would like to start the vote on the updated KIP-280: Enhanced log
> > >> compaction. Thanks to Guozhang, Matthias & Tom for the valuable
> > >> feedback on the discussion thread...
> > >>
> > > >> KIP:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > > >>
> > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
>


-- 
-- Guozhang


RE: [EXTERNAL] Re: [VOTE] KIP-280: Enhanced log compaction

2019-12-02 Thread Senthilnathan Muthusamy
Thanks Jun, I have added a new recommendation for consumers to the KIP.

As we have 3 binding +1 votes (Guozhang, Matthias & Jun), I am considering this
KIP accepted.

Regards,
Senthil

-----Original Message-----
From: Jun Rao  
Sent: Wednesday, November 27, 2019 9:16 AM
To: dev 
Subject: [EXTERNAL] Re: [VOTE] KIP-280: Enhanced log compaction

Hi, Senthil,

Thanks for the updated KIP. +1 from me.

Could you also add in the recommendation section what the users should do when 
consuming the compacted topic with the new strategies (e.g. have to be more 
careful with what records to keep)?

Jun


On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy 
 wrote:

> Jun,
>
> If the updated KIP looks good, can you please vote for it.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:33 PM
> To: dev 
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for the KIP. Added a few more comments on the discussion thread.
>
> Jun
>
> On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy < 
> senth...@microsoft.com.invalid> wrote:
>
> > Thanks Matthias!
> >
> > Received 2 binding +1s... looking for one more binding +1!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax 
> > Sent: Wednesday, November 6, 2019 12:10 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > +1 (binding)
> >
> > On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> > >
> > > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> > >
> > > Regards,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Guozhang Wang
> > > Sent: Monday, November 4, 2019 11:01 AM
> > > To: dev
> > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > >
> > > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> > >
> > > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I would like to start the vote on the updated KIP-280: Enhanced 
> > >> log compaction. Thanks to Guozhang, Matthias & Tom for the 
> > >> valuable feedback on the discussion thread...
> > >>
> > > >> KIP:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > > >>
> > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
>


Re: [VOTE] KIP-280: Enhanced log compaction

2019-11-27 Thread Jun Rao
Hi, Senthil,

Thanks for the updated KIP. +1 from me.

Could you also add in the recommendation section what the users should do
when consuming the compacted topic with the new strategies (e.g. have to be
more careful with what records to keep)?

Jun


On Tue, Nov 26, 2019 at 10:24 PM Senthilnathan Muthusamy
 wrote:

> Jun,
>
> If the updated KIP looks good, can you please vote for it.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Jun Rao 
> Sent: Thursday, November 7, 2019 4:33 PM
> To: dev 
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi, Senthil,
>
> Thanks for the KIP. Added a few more comments on the discussion thread.
>
> Jun
>
> On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
>
> > Thanks Matthias!
> >
> > Received 2 binding +1s... looking for one more binding +1!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax 
> > Sent: Wednesday, November 6, 2019 12:10 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > +1 (binding)
> >
> > On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> > >
> > > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> > >
> > > Regards,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Guozhang Wang
> > > Sent: Monday, November 4, 2019 11:01 AM
> > > To: dev
> > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > >
> > > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> > >
> > > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I would like to start the vote on the updated KIP-280: Enhanced log
> > >> compaction. Thanks to Guozhang, Matthias & Tom for the valuable
> > >> feedback on the discussion thread...
> > >>
> > > >> KIP:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > > >>
> > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
>


RE: [VOTE] KIP-280: Enhanced log compaction

2019-11-26 Thread Senthilnathan Muthusamy
Jun,

If the updated KIP looks good, can you please vote for it.

Thanks,
Senthil

-----Original Message-----
From: Jun Rao  
Sent: Thursday, November 7, 2019 4:33 PM
To: dev 
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hi, Senthil,

Thanks for the KIP. Added a few more comments on the discussion thread.

Jun

On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy 
 wrote:

> Thanks Matthias!
>
> Received 2 binding +1s... looking for one more binding +1!
>
> Regards,
> Senthil
>
> -----Original Message-----
> From: Matthias J. Sax 
> Sent: Wednesday, November 6, 2019 12:10 AM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> +1 (binding)
>
> On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> >
> > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Guozhang Wang
> > Sent: Monday, November 4, 2019 11:01 AM
> > To: dev
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> >
> > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
> >
> >> Hi all,
> >>
> >> I would like to start the vote on the updated KIP-280: Enhanced log 
> >> compaction. Thanks to Guozhang, Matthias & Tom for the valuable 
> >> feedback on the discussion thread...
> >>
> >> KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> >>
> >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> >>
> >> Thanks,
> >> Senthil
> >>
> >
> >
> > --
> > -- Guozhang
> >
>
>


Re: [VOTE] KIP-280: Enhanced log compaction

2019-11-07 Thread Jun Rao
Hi, Senthil,

Thanks for the KIP. Added a few more comments on the discussion thread.

Jun

On Wed, Nov 6, 2019 at 3:38 AM Senthilnathan Muthusamy
 wrote:

> Thanks Matthias!
>
> Received 2 binding +1s... looking for one more binding +1!
>
> Regards,
> Senthil
>
> -----Original Message-----
> From: Matthias J. Sax 
> Sent: Wednesday, November 6, 2019 12:10 AM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> +1 (binding)
>
> On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> > Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
> >
> > So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
> >
> > Regards,
> > Senthil
> >
> > -----Original Message-----
> > From: Guozhang Wang
> > Sent: Monday, November 4, 2019 11:01 AM
> > To: dev
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> >
> > On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy <
> senth...@microsoft.com.invalid> wrote:
> >
> >> Hi all,
> >>
> >> I would like to start the vote on the updated KIP-280: Enhanced log
> >> compaction. Thanks to Guozhang, Matthias & Tom for the valuable
> >> feedback on the discussion thread...
> >>
> >> KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> >>
> >> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
> >>
> >> Thanks,
> >> Senthil
> >>
> >
> >
> > --
> > -- Guozhang
> >
>
>


RE: [VOTE] KIP-280: Enhanced log compaction

2019-11-06 Thread Senthilnathan Muthusamy
Thanks Matthias! 

Received 2 binding +1s... looking for one more binding +1!

Regards,
Senthil

-----Original Message-----
From: Matthias J. Sax  
Sent: Wednesday, November 6, 2019 12:10 AM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

+1 (binding)

On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
>
> So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
>
> Regards,
> Senthil
>
> -----Original Message-----
> From: Guozhang Wang
> Sent: Monday, November 4, 2019 11:01 AM
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> 
> On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy 
>  wrote:
> 
>> Hi all,
>>
>> I would like to start the vote on the updated KIP-280: Enhanced log 
>> compaction. Thanks to Guozhang, Matthias & Tom for the valuable 
>> feedback on the discussion thread...
>>
>> KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>>
>> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
>>
>> Thanks,
>> Senthil
>>
> 
> 
> --
> -- Guozhang
> 



Re: [VOTE] KIP-280: Enhanced log compaction

2019-11-06 Thread Matthias J. Sax
+1 (binding)

On 11/5/19 11:44 AM, Senthilnathan Muthusamy wrote:
> Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.
>
> So far got 1 binding +1... waiting for 2 more binding +1s... thanks!
>
> Regards,
> Senthil
>
> -----Original Message-----
> From: Guozhang Wang
> Sent: Monday, November 4, 2019 11:01 AM
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).
> 
> On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy 
>  wrote:
> 
>> Hi all,
>>
>> I would like to start the vote on the updated KIP-280: Enhanced log 
>> compaction. Thanks to Guozhang, Matthias & Tom for the valuable 
>> feedback on the discussion thread...
>>
>> KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>>
>> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
>>
>> Thanks,
>> Senthil
>>
> 
> 
> --
> -- Guozhang
> 





RE: [VOTE] KIP-280: Enhanced log compaction

2019-11-05 Thread Senthilnathan Muthusamy
Thanks Guozhang, and I have made a note in the JIRA item to update the wiki.

So far got 1 binding +1... waiting for 2 more binding +1s... thanks!

Regards,
Senthil

-----Original Message-----
From: Guozhang Wang  
Sent: Monday, November 4, 2019 11:01 AM
To: dev 
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

I only have one minor comment on the DISCUSS thread, otherwise I'm +1 (binding).

On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy 
 wrote:

> Hi all,
>
> I would like to start the vote on the updated KIP-280: Enhanced log 
> compaction. Thanks to Guozhang, Matthias & Tom for the valuable 
> feedback on the discussion thread...
>
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
>
> Thanks,
> Senthil
>


--
-- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2019-11-04 Thread Guozhang Wang
I only have one minor comment on the DISCUSS thread, otherwise I'm +1
(binding).

On Mon, Nov 4, 2019 at 9:53 AM Senthilnathan Muthusamy
 wrote:

> Hi all,
>
> I would like to start the vote on the updated KIP-280: Enhanced log
> compaction. Thanks to Guozhang, Matthias & Tom for the valuable feedback on
> the discussion thread...
>
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> JIRA: https://issues.apache.org/jira/browse/KAFKA-7061
>
> Thanks,
> Senthil
>


-- 
-- Guozhang


RE: [VOTE] KIP-280: Enhanced log compaction

2018-08-16 Thread Bertus Greeff
I want to add my support for the Header strategy.  That's the one we care about
as well, and we would very much want to have KIP-280 move forward.  Being able to
provide an external sequence is very important.  Time is not precise enough in
a distributed system, but there are known good ways to have an absolute sequence.
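
To illustrate why an external sequence matters, here is a minimal sketch in
plain Java. It is not Kafka code: VersionedRecord, compactByOffset and
compactBySequence are hypothetical names made up for this example, and the
tie-breaking rule (higher offset wins on equal sequence) is an assumption. It
only shows that when an older update arrives late, offset-based compaction
keeps the stale value while sequence-based compaction keeps the newest one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a record; this is not Kafka's Record class.
final class VersionedRecord {
    final String key;
    final long offset;    // position in the log (arrival order)
    final long sequence;  // externally provided sequence, e.g. carried in a header
    final String value;

    VersionedRecord(String key, long offset, long sequence, String value) {
        this.key = key;
        this.offset = offset;
        this.sequence = sequence;
        this.value = value;
    }
}

final class SequenceCompactionSketch {
    // Offset-style compaction: the record that arrived last wins for each key.
    static Map<String, VersionedRecord> compactByOffset(List<VersionedRecord> log) {
        Map<String, VersionedRecord> retained = new LinkedHashMap<>();
        for (VersionedRecord r : log) {
            VersionedRecord current = retained.get(r.key);
            if (current == null || r.offset > current.offset) {
                retained.put(r.key, r);
            }
        }
        return retained;
    }

    // Sequence-style compaction: the highest external sequence wins for each key.
    // Assumption for this sketch: on equal sequences, the higher offset wins.
    static Map<String, VersionedRecord> compactBySequence(List<VersionedRecord> log) {
        Map<String, VersionedRecord> retained = new LinkedHashMap<>();
        for (VersionedRecord r : log) {
            VersionedRecord current = retained.get(r.key);
            if (current == null
                    || r.sequence > current.sequence
                    || (r.sequence == current.sequence && r.offset > current.offset)) {
                retained.put(r.key, r);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        List<VersionedRecord> log = new ArrayList<>();
        // The newer state (sequence 2) is written first; the older one arrives late.
        log.add(new VersionedRecord("user-1", 0, 2, "newer state"));
        log.add(new VersionedRecord("user-1", 1, 1, "older state, arrived late"));

        System.out.println(compactByOffset(log).get("user-1").value);
        System.out.println(compactBySequence(log).get("user-1").value);
    }
}
```

Running main prints "older state, arrived late" for the offset variant and
"newer state" for the sequence variant.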

-----Original Message-----
From: Luís Cabral  
Sent: Thursday, August 16, 2018 10:32 AM
To: dev@kafka.apache.org
Subject: RE: [VOTE] KIP-280: Enhanced log compaction

Hi,

@Guozhang & @Jun:
There were some left over comments from when the topic was still fresh (you may 
have to read the email chain to refresh your memory). Are these now clarified 
for you?

@Jason:
Was the reason opaque to you? I think we can avoid adding something so simple 
to the description.
As for the suggestion, that is indeed a wonderful idea. I suggest that you 
create a KIP and address the topic with the rest of the community.

Kind Regards,
Luis

From: Jason Gustafson
Sent: 14 August 2018 01:25
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hey Luis,

Thanks for the explanation. I'd suggest adding the use case to the motivation 
section.

I think my only hesitation about the header-based compaction is that it is the 
first time we are putting a schema requirement on header values. I wonder if 
it's better to leave Kafka agnostic. For example, maybe the compaction strategy 
could be a plugin which allows custom derivation of the compaction key. Say 
something like this:

class CompactionKey {
  byte[] key;
  long version;
}

interface CompactionStrategy {
  CompactionKey deriveCompactionKey(Record record);
}

The idea is to leave schemas in the hands of users. Have you considered 
something like that?

Thanks,
Jason

On Sat, Aug 11, 2018 at 2:04 AM, Luís Cabral 
wrote:

> Hi Jason,
>
> The initial (and still only) requirement I wanted out of this KIP was 
> to have the header strategy.
> This is because I want to be able to version either by date/time or by an
> (externally provided) sequence. This is especially important if you are
> running in multiple environments, which may cause the commonly known 
> issue of the clocks being slightly de-synchronized.
> Having the timestamp strategy was added to the KIP as the result of 
> the discussions, where it was seen as a potential benefit for other 
> clients who may prefer that.
>
> Cheers,
> Luís
>
> From: Jason Gustafson
> Sent: 10 August 2018 22:38
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi Luis,
>
> It's still not very clear to me why we need the header-based strategy. 
> Can you elaborate why having the timestamp-based approach alone is not 
> sufficient? The use case in the motivation just describes a "most 
> recent snapshot" use case.
>
> Thanks,
> Jason
>
> On Thu, Aug 9, 2018 at 4:36 AM, Luís Cabral
> wrote:
>
> > Hi,
> >
> >
> > So, after a "short" break, I've finally managed to find time to 
> > resume this KIP. Sorry to all for the delay.
> >
> > Continuing the conversation of the configurations being global vs
> > topic, I've checked this and it seems that they are only available
> > globally.
> >
> > This configuration is passed to the log cleaner via
> > "CleanerConfig.scala", which only accepts global configurations. This
> > seems intentional, as the log cleaner is not mutable and doesn't get
> > instantiated that often. I think that changing this to accept per-topic
> > configuration would be very nice, but perhaps as a part of a different
> > KIP.
> >
> >
> > Following the Kafka documentation, these are the settings I'm referring
> > to:
> >
> > -- --
> >
> > Updating Log Cleaner Configs
> >
> > Log cleaner configs may be updated dynamically at cluster-default level
> > used by all brokers. The changes take effect on the next iteration of
> > log cleaning. One or more of these configs may be updated:
> >
> > * log.cleaner.threads
> >
> > * log.cleaner.io.max.bytes.per.second
> >
> > * log.cleaner.dedupe.buffer.size
> >
> > * log.cleaner.io.buffer.size
> >
> > * log.cleaner.io.buffer.load.factor
> >
> > * log.cleaner.backoff.ms
> >
> > -- --
> >
> >
> >
> > Please feel free to confirm, otherwise I will update the KIP to 
> > reflect these configuration nuances in the next few days.
> >
> >
> > Best Regards,
> >
> > Luis
> >
> >
> >
> > On Monday, July 9, 2018, 1:57:38 PM GMT+2, Andras Beni < 
> > andrasb...@cloudera.com.INVALID> wrot

RE: [VOTE] KIP-280: Enhanced log compaction

2018-08-16 Thread Luís Cabral
Hi,

@Guozhang & @Jun:
There were some left over comments from when the topic was still fresh (you may 
have to read the email chain to refresh your memory). Are these now clarified 
for you?

@Jason:
Was the reason opaque to you? I think we can avoid adding something so simple 
to the description.
As for the suggestion, that is indeed a wonderful idea. I suggest that you 
create a KIP and address the topic with the rest of the community.

Kind Regards,
Luis

From: Jason Gustafson
Sent: 14 August 2018 01:25
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hey Luis,

Thanks for the explanation. I'd suggest adding the use case to the
motivation section.

I think my only hesitation about the header-based compaction is that it is
the first time we are putting a schema requirement on header values. I
wonder if it's better to leave Kafka agnostic. For example, maybe the
compaction strategy could be a plugin which allows custom derivation of the
compaction key. Say something like this:

class CompactionKey {
  byte[] key;
  long version;
}

interface CompactionStrategy {
  CompactionKey deriveCompactionKey(Record record);
}

The idea is to leave schemas in the hands of users. Have you considered
something like that?

Thanks,
Jason

On Sat, Aug 11, 2018 at 2:04 AM, Luís Cabral 
wrote:

> Hi Jason,
>
> The initial (and still only) requirement I wanted out of this KIP was to
> have the header strategy.
> This is because I want to be able to version either by date/time or by an
> (externally provided) sequence. This is especially important if you are
> running in multiple environments, which may cause the commonly known issue
> of the clocks being slightly de-synchronized.
> Having the timestamp strategy was added to the KIP as the result of the
> discussions, where it was seen as a potential benefit for other clients who
> may prefer that.
>
> Cheers,
> Luís
>
> From: Jason Gustafson
> Sent: 10 August 2018 22:38
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi Luis,
>
> It's still not very clear to me why we need the header-based strategy. Can
> you elaborate why having the timestamp-based approach alone is not
> sufficient? The use case in the motivation just describes a "most recent
> snapshot" use case.
>
> Thanks,
> Jason
>
> On Thu, Aug 9, 2018 at 4:36 AM, Luís Cabral
> wrote:
>
> > Hi,
> >
> >
> > So, after a "short" break, I've finally managed to find time to resume
> > this KIP. Sorry to all for the delay.
> >
> > Continuing the conversation of the configurations being global vs topic,
> > I've checked this and it seems that they are only available globally.
> >
> > This configuration is passed to the log cleaner via
> > "CleanerConfig.scala", which only accepts global configurations. This
> > seems intentional, as the log cleaner is not mutable and doesn't get
> > instantiated that often. I think that changing this to accept per-topic
> > configuration would be very nice, but perhaps as a part of a different
> > KIP.
> >
> >
> > Following the Kafka documentation, these are the settings I'm referring
> > to:
> >
> > -- --
> >
> > Updating Log Cleaner Configs
> >
> > Log cleaner configs may be updated dynamically at cluster-default level
> > used by all brokers. The changes take effect on the next iteration of
> > log cleaning. One or more of these configs may be updated:
> >
> > * log.cleaner.threads
> >
> > * log.cleaner.io.max.bytes.per.second
> >
> > * log.cleaner.dedupe.buffer.size
> >
> > * log.cleaner.io.buffer.size
> >
> > * log.cleaner.io.buffer.load.factor
> >
> > * log.cleaner.backoff.ms
> >
> > -- --
> >
> >
> >
> > Please feel free to confirm, otherwise I will update the KIP to reflect
> > these configuration nuances in the next few days.
> >
> >
> > Best Regards,
> >
> > Luis
> >
> >
> >
> > On Monday, July 9, 2018, 1:57:38 PM GMT+2, Andras Beni <
> > andrasb...@cloudera.com.INVALID> wrote:
> >
> >
> >
> >
> >
> > Hi Luís,
> >
> > Can you please clarify how the header value has to be encoded in case log
> > compaction strategy is 'header'. As I see current PR reads varLong in
> > CleanerCache.extractVersion and read String and uses toLong in
> > Cleaner.extractVersion while the KIP says no more than 'the header value
> > (which must be of type "long")'.
> >
> > Otherwise +1 for the KIP
> >
> > As for current implementation: i

Re: [VOTE] KIP-280: Enhanced log compaction

2018-08-13 Thread Jason Gustafson
Hey Luis,

Thanks for the explanation. I'd suggest adding the use case to the
motivation section.

I think my only hesitation about the header-based compaction is that it is
the first time we are putting a schema requirement on header values. I
wonder if it's better to leave Kafka agnostic. For example, maybe the
compaction strategy could be a plugin which allows custom derivation of the
compaction key. Say something like this:

class CompactionKey {
  byte[] key;
  long version;
}

interface CompactionStrategy {
  CompactionKey deriveCompactionKey(Record record);
}

The idea is to leave schemas in the hands of users. Have you considered
something like that?
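
To make the plugin idea a bit more concrete, here is a minimal sketch of how
two different user-owned header schemas could sit behind such an interface.
Everything beyond CompactionKey and CompactionStrategy is an assumption for
illustration: SketchRecord is a hypothetical stand-in for the record type, and
the two strategy classes and their encodings are not taken from the KIP or the
PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for a Kafka record: just a key and its headers.
final class SketchRecord {
    final byte[] key;
    final Map<String, byte[]> headers;

    SketchRecord(byte[] key, Map<String, byte[]> headers) {
        this.key = key;
        this.headers = headers;
    }
}

final class CompactionKey {
    final byte[] key;
    final long version;

    CompactionKey(byte[] key, long version) {
        this.key = key;
        this.version = version;
    }
}

interface CompactionStrategy {
    CompactionKey deriveCompactionKey(SketchRecord record);
}

// Assumed schema 1: the header value is an 8-byte big-endian long.
final class BigEndianHeaderStrategy implements CompactionStrategy {
    private final String headerName;

    BigEndianHeaderStrategy(String headerName) {
        this.headerName = headerName;
    }

    @Override
    public CompactionKey deriveCompactionKey(SketchRecord record) {
        byte[] raw = record.headers.get(headerName);
        if (raw == null || raw.length != Long.BYTES) {
            throw new IllegalArgumentException("missing or malformed header: " + headerName);
        }
        // ByteBuffer reads in big-endian order by default.
        return new CompactionKey(record.key, ByteBuffer.wrap(raw).getLong());
    }
}

// Assumed schema 2: the header value is a decimal number encoded as UTF-8 text.
final class DecimalStringHeaderStrategy implements CompactionStrategy {
    private final String headerName;

    DecimalStringHeaderStrategy(String headerName) {
        this.headerName = headerName;
    }

    @Override
    public CompactionKey deriveCompactionKey(SketchRecord record) {
        byte[] raw = record.headers.get(headerName);
        if (raw == null) {
            throw new IllegalArgumentException("missing header: " + headerName);
        }
        // Long.parseLong throws NumberFormatException for non-numeric text.
        String text = new String(raw, StandardCharsets.UTF_8);
        return new CompactionKey(record.key, Long.parseLong(text));
    }
}
```

With something along these lines, the broker would only ever see a key and a
long, and the question of how the header bytes are encoded stays with whoever
writes the strategy.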

Thanks,
Jason

On Sat, Aug 11, 2018 at 2:04 AM, Luís Cabral 
wrote:

> Hi Jason,
>
> The initial (and still only) requirement I wanted out of this KIP was to
> have the header strategy.
> This is because I want to be able to version either by date/time or by an
> (externally provided) sequence. This is especially important if you are
> running in multiple environments, where the clocks are commonly slightly
> de-synchronized.
> Having the timestamp strategy was added to the KIP as the result of the
> discussions, where it was seen as a potential benefit for other clients who
> may prefer that.
>
> Cheers,
> Luís
>
> From: Jason Gustafson
> Sent: 10 August 2018 22:38
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hi Luis,
>
> It's still not very clear to me why we need the header-based strategy. Can
> you elaborate why having the timestamp-based approach alone is not
> sufficient? The use case in the motivation just describes a "most recent
> snapshot" use case.
>
> Thanks,
> Jason
>
> On Thu, Aug 9, 2018 at 4:36 AM, Luís Cabral wrote:
>
> > Hi,
> >
> >
> > So, after a "short" break, I've finally managed to find time to resume
> > this KIP. Sorry to all for the delay.
> >
> > Continuing the conversation of the configurations being global vs  topic,
> > I've checked this and it seems that they are only available globally.
> >
> > This configuration is passed to the log cleaner via
> "CleanerConfig.scala",
> > which only accepts global configurations. This seems intentional, as the
> > log cleaner is not mutable and doesn't get instantiated that often. I
> think
> > that changing this to accept per-topic configuration would be very nice,
> > but perhaps as a part of a different KIP.
> >
> >
> > Following the Kafka documentation, these are the settings I'm referring
> to:
> >
> > -- --
> >
> > Updating Log Cleaner Configs
> >
> > Log cleaner configs may be updated dynamically at cluster-default
> > level used by all brokers. The changes take effect on the next iteration
> of
> > log cleaning. One or more of these configs may be updated:
> >
> > * log.cleaner.threads
> >
> > * log.cleaner.io.max.bytes.per.second
> >
> > * log.cleaner.dedupe.buffer.size
> >
> > * log.cleaner.io.buffer.size
> >
> > * log.cleaner.io.buffer.load.factor
> >
> > * log.cleaner.backoff.ms
> >
> > -- --
> >
> >
> >
> > Please feel free to confirm, otherwise I will update the KIP to reflect
> > these configuration nuances in the next few days.
> >
> >
> > Best Regards,
> >
> > Luis
> >
> >
> >
> > On Monday, July 9, 2018, 1:57:38 PM GMT+2, Andras Beni <
> > andrasb...@cloudera.com.INVALID> wrote:
> >
> >
> >
> >
> >
> > Hi Luís,
> >
> > Can you please clarify how the header value has to be encoded in case log
> > compaction strategy is 'header'. As I see current PR reads varLong in
> > CleanerCache.extractVersion and read String and uses toLong in
> > Cleaner.extractVersion while the KIP says no more than 'the header value
> > (which must be of type "long")'.
> >
> > Otherwise +1 for the KIP
> >
> > As for current implementation: it seems in Cleaner class header key
> > "version" is hardwired.
> >
> > Andras
> >
> >
> >
> > On Fri, Jul 6, 2018 at 10:36 PM Jun Rao  wrote:
> >
> > > Hi, Guozhang,
> > >
> > > For #4, what you suggested could make sense for timestamp based de-dup,
> > but
> > > not sure how general it is since the KIP also supports de-dup based on
> > > header.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Jul 6, 

RE: [VOTE] KIP-280: Enhanced log compaction

2018-08-11 Thread Luís Cabral
Hi Jason,

The initial (and still only) requirement I wanted out of this KIP was to have
the header strategy.
This is because I want to be able to version either by date/time or by an
(externally provided) sequence. This is especially important if you are running
in multiple environments, where the clocks are commonly slightly
de-synchronized.
The timestamp strategy was added to the KIP as a result of the discussions,
where it was seen as a potential benefit for other clients who may prefer that.
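
As a rough illustration of the clock problem, the sketch below compacts the
same two updates once by timestamp and once by an external sequence. The class
names, the field layout and the tie-breaking rule (on equal versions the later
record in log order wins) are assumptions made for this example only, not the
actual cleaner code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical update carrying both a producer timestamp and an external sequence.
final class VersionedUpdate {
    final String key;
    final long timestampMs;
    final long sequence;
    final String value;

    VersionedUpdate(String key, long timestampMs, long sequence, String value) {
        this.key = key;
        this.timestampMs = timestampMs;
        this.sequence = sequence;
        this.value = value;
    }
}

final class CompactionSketch {
    // Keeps, per key, the update with the highest version.
    // Assumption: on equal versions the later record in log order wins.
    static Map<String, VersionedUpdate> compact(List<VersionedUpdate> log,
                                                ToLongFunction<VersionedUpdate> version) {
        Map<String, VersionedUpdate> winners = new LinkedHashMap<>();
        for (VersionedUpdate update : log) {
            VersionedUpdate current = winners.get(update.key);
            if (current == null
                    || version.applyAsLong(update) >= version.applyAsLong(current)) {
                winners.put(update.key, update);
            }
        }
        return winners;
    }
}
```

Take a log holding ("k", timestamp 1000, sequence 1, "old") followed by
("k", timestamp 990, sequence 2, "new"), where the second producer's clock runs
10 ms behind. Compacting with `u -> u.timestampMs` keeps "old", while
compacting with `u -> u.sequence` keeps "new".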

Cheers,
Luís

From: Jason Gustafson
Sent: 10 August 2018 22:38
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hi Luis,

It's still not very clear to me why we need the header-based strategy. Can
you elaborate why having the timestamp-based approach alone is not
sufficient? The use case in the motivation just describes a "most recent
snapshot" use case.

Thanks,
Jason

On Thu, Aug 9, 2018 at 4:36 AM, Luís Cabral 
wrote:

> Hi,
>
>
> So, after a "short" break, I've finally managed to find time to resume
> this KIP. Sorry to all for the delay.
>
> Continuing the conversation of the configurations being global vs  topic,
> I've checked this and it seems that they are only available globally.
>
> This configuration is passed to the log cleaner via "CleanerConfig.scala",
> which only accepts global configurations. This seems intentional, as the
> log cleaner is not mutable and doesn't get instantiated that often. I think
> that changing this to accept per-topic configuration would be very nice,
> but perhaps as a part of a different KIP.
>
>
> Following the Kafka documentation, these are the settings I'm referring to:
>
> -- --
>
> Updating Log Cleaner Configs
>
> Log cleaner configs may be updated dynamically at cluster-default
> level used by all brokers. The changes take effect on the next iteration of
> log cleaning. One or more of these configs may be updated:
>
> * log.cleaner.threads
>
> * log.cleaner.io.max.bytes.per.second
>
> * log.cleaner.dedupe.buffer.size
>
> * log.cleaner.io.buffer.size
>
> * log.cleaner.io.buffer.load.factor
>
> * log.cleaner.backoff.ms
>
> -- --
>
>
>
> Please feel free to confirm, otherwise I will update the KIP to reflect
> these configuration nuances in the next few days.
>
>
> Best Regards,
>
> Luis
>
>
>
> On Monday, July 9, 2018, 1:57:38 PM GMT+2, Andras Beni <
> andrasb...@cloudera.com.INVALID> wrote:
>
>
>
>
>
> Hi Luís,
>
> Can you please clarify how the header value has to be encoded when the log
> compaction strategy is 'header'? As I see it, the current PR reads a varLong
> in CleanerCache.extractVersion, but reads a String and uses toLong in
> Cleaner.extractVersion, while the KIP says no more than 'the header value
> (which must be of type "long")'.
>
> Otherwise +1 for the KIP
>
> As for current implementation: it seems in Cleaner class header key
> "version" is hardwired.
>
> Andras
>
>
>
> On Fri, Jul 6, 2018 at 10:36 PM Jun Rao  wrote:
>
> > Hi, Guozhang,
> >
> > For #4, what you suggested could make sense for timestamp based de-dup,
> but
> > not sure how general it is since the KIP also supports de-dup based on
> > header.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jul 6, 2018 at 1:12 PM, Guozhang Wang 
> wrote:
> >
> > > Hello Jun,
> > > Thanks for your feedbacks. I'd agree on #3 that it's worth adding a
> > special
> > > check to not delete the last message, since although unlikely, it is
> > still
> > > possible that a new active segment gets rolled out but contains no data
> > > yet, and hence the actual last message in this case would be in a
> > > "compact-able" segment.
> > >
> > > For the second part of #4 you raised, maybe we could educate users to
> > set "
> > > message.timestamp.difference.max.ms" to be no larger than "
> > > log.cleaner.delete.retention.ms" (its default value is
> Long.MAX_VALUE)?
> > A
> > > more aggressive approach would be changing the default value of the
> > former
> > > to be the value of the latter if:
> > >
> > > 1. cleanup.policy = compact OR compact,delete
> > > 2. log.cleaner.compaction.strategy != offset
> > >
> > > Because in this case I think it makes sense to really allow users send
> > any
> > > data longer than "log.cleaner.delete.retention.ms", WDYT?
> > >
> > >
> > > Guozhang
> > >

Re: [VOTE] KIP-280: Enhanced log compaction

2018-08-10 Thread Jason Gustafson
 be useful to document this impact in the wiki and the
> > > > release notes.
> > > >
> > > > 3. Yes, it's unlikely for the last message to be removed in the
> current
> > > > implementation since we never clean the active segment. However, in
> > > theory,
> > > > this can happen. So it would be useful to guard this explicitly.
> > > >
> > > > 4. Just thought about another issue. We probably want to be a bit
> > careful
> > > > with key deletion. Currently, one can delete a key by sending a
> message
> > > > with a delete tombstone (a null payload). To prevent a reader from
> > > missing
> > > > a deletion if it's removed too quickly, we depend on a configuration
> > > > log.cleaner.delete.retention.ms (defaults to 1 day). The delete
> > > tombstone
> > > > will only be physically removed from the log after that amount of
> time.
> > > The
> > > > expectation is that a reader should finish reading to the end of the
> > log
> > > > after consuming a message within that configured time. With the new
> > > > strategy, we have similar, but slightly different problems. The first
> > > > problem is that the delete tombstone may be delivered earlier than an
> > > > outdated record in offset order to a consumer. In order for the
> > consumer
> > > > not to take the outdated record, the consumer should cache the
> deletion
> > > > > tombstone for some configured amount of time. We can probably
> piggyback
> > > this
> > > > on log.cleaner.delete.retention.ms, but we need to document this.
> The
> > > > second problem is that once the delete tombstone is physically
> removed
> > > from
> > > > > the log, how can we prevent outdated records from being added (otherwise,
> > they
> > > > will never be garbage collected)? Not sure what's the best way to do
> > > this.
> > > > One possible way is to push this back to the application and require
> > the
> > > > user not to publish outdated records after
> > log.cleaner.delete.retention.
> > > ms
> > > > .
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Jul 4, 2018 at 11:11 AM, Luís Cabral wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > -:  1. I guess both new configurations will be at the topic level?
> > > > >
> > > > > They will exist in the global configuration, at the very least.
> > > > > I would like to have them on the topic level as well, but there is
> an
> > > > > inconsistency between the cleanup/compaction properties that exist
> > > “only
> > > > > globally” vs “globally + per topic”.
> > > > > I haven’t gotten around to investigating why, and if that reason
> > would
> > > > > then also impact the properties I’m suggesting. At first glance
> they
> > > seem
> > > > > to belong with the properties that are "only globally” configured,
> > but
> > > > > Guozhang has written earlier with a suggestion of a compaction
> > property
> > > > > that works for both (though I haven’t had the time to look into it
> > yet,
> > > > > unfortunately).
> > > > >
> > > > > -:  2. Since the log cleaner now needs to keep both the offset and
> > > > another
> > > > > long (say timestamp) in the de-dup map, it reduces the number of
> keys
> > > > that
> > > > > we can keep in the map and thus may require more rounds of
> cleaning.
> > > This
> > > > > is probably not a big issue, but it will be useful to document this
> > > > impact
> > > > > in the KIP.
> > > > >
> > > > > As a reader, I tend to prefer brief documentation on new features
> > (they
> > > > > tend to be too many for me to find the willpower to read a 200-page
> > > essay
> > > > > about each one), so that influences me to avoid writing about every
> > > > > micro-impact that may exist, and simply leave it inferred (as you
> > have
> > > > just
> > > > > done).
> > > > > But I also don’t feel strongly enough about it to argue either way.
> > So,
> > > > > after reading my argumen

Re: [VOTE] KIP-280: Enhanced log compaction

2018-08-09 Thread Luís Cabral
is that a reader should finish reading to the end of the
> log
> > > after consuming a message within that configured time. With the new
> > > strategy, we have similar, but slightly different problems. The first
> > > problem is that the delete tombstone may be delivered earlier than an
> > > outdated record in offset order to a consumer. In order for the
> consumer
> > > not to take the outdated record, the consumer should cache the deletion
> > > > tombstone for some configured amount of time. We can probably piggyback
> > this
> > > on log.cleaner.delete.retention.ms, but we need to document this. The
> > > second problem is that once the delete tombstone is physically removed
> > from
> > > > the log, how can we prevent outdated records from being added (otherwise,
> they
> > > will never be garbage collected)? Not sure what's the best way to do
> > this.
> > > One possible way is to push this back to the application and require
> the
> > > user not to publish outdated records after
> log.cleaner.delete.retention.
> > ms
> > > .
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Jul 4, 2018 at 11:11 AM, Luís Cabral wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > -:  1. I guess both new configurations will be at the topic level?
> > > >
> > > > They will exist in the global configuration, at the very least.
> > > > I would like to have them on the topic level as well, but there is an
> > > > inconsistency between the cleanup/compaction properties that exist
> > “only
> > > > globally” vs “globally + per topic”.
> > > > I haven’t gotten around to investigating why, and if that reason
> would
> > > > then also impact the properties I’m suggesting. At first glance they
> > seem
> > > > to belong with the properties that are "only globally” configured,
> but
> > > > Guozhang has written earlier with a suggestion of a compaction
> property
> > > > that works for both (though I haven’t had the time to look into it
> yet,
> > > > unfortunately).
> > > >
> > > > -:  2. Since the log cleaner now needs to keep both the offset and
> > > another
> > > > long (say timestamp) in the de-dup map, it reduces the number of keys
> > > that
> > > > we can keep in the map and thus may require more rounds of cleaning.
> > This
> > > > is probably not a big issue, but it will be useful to document this
> > > impact
> > > > in the KIP.
> > > >
> > > > As a reader, I tend to prefer brief documentation on new features
> (they
> > > > tend to be too many for me to find the willpower to read a 200-page
> > essay
> > > > about each one), so that influences me to avoid writing about every
> > > > micro-impact that may exist, and simply leave it inferred (as you
> have
> > > just
> > > > done).
> > > > But I also don’t feel strongly enough about it to argue either way.
> So,
> > > > after reading my argument, if you still insist, I’ll happily add this
> > > there.
> > > >
> > > > -: 3. With the new cleaning strategy, we want to be a bit careful
> with
> > > > removing the last message in a partition (which is possible now). We
> > need
> > > > to preserve the offset of the last message so that we don't reuse the
> > > > > offset for a different message. One way is to simply never remove the
> last
> > > > message. Another way (suggested by Jason) is to create an empty
> message
> > > > batch.
> > > >
> > > > That is a good point, but isn’t the last message always kept
> > regardless?
> > > > In all of my tests with this approach, never have I seen it being
> > > removed.
> > > > This is not because I made it so while changing the code, it was
> simply
> > > > like this before, which made me smile!
> > > > Given these results, I just *assumed* (oops) that these scenarios
> > > > > represented the reality, so the compaction would only affect the
> > > > “tail”,
> > > > > while the “head” remained untouched. Now that you say it's possible
> that
> > > the
> > > > last message actually gets overwritten somehow, I guess a new bullet
> > > point
> > > > will have to be added to the KIP for this (after I’ve found the time

Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-09 Thread Andras Beni
> > > > >
> > > > -:  1. I guess both new configurations will be at the topic level?
> > > >
> > > > They will exist in the global configuration, at the very least.
> > > > I would like to have them on the topic level as well, but there is an
> > > > inconsistency between the cleanup/compaction properties that exist
> > “only
> > > > globally” vs “globally + per topic”.
> > > > I haven’t gotten around to investigating why, and if that reason
> would
> > > > then also impact the properties I’m suggesting. At first glance they
> > seem
> > > > to belong with the properties that are "only globally” configured,
> but
> > > > Guozhang has written earlier with a suggestion of a compaction
> property
> > > > that works for both (though I haven’t had the time to look into it
> yet,
> > > > unfortunately).
> > > >
> > > > -:  2. Since the log cleaner now needs to keep both the offset and
> > > another
> > > > long (say timestamp) in the de-dup map, it reduces the number of keys
> > > that
> > > > we can keep in the map and thus may require more rounds of cleaning.
> > This
> > > > is probably not a big issue, but it will be useful to document this
> > > impact
> > > > in the KIP.
> > > >
> > > > As a reader, I tend to prefer brief documentation on new features
> (they
> > > > tend to be too many for me to find the willpower to read a 200-page
> > essay
> > > > about each one), so that influences me to avoid writing about every
> > > > micro-impact that may exist, and simply leave it inferred (as you
> have
> > > just
> > > > done).
> > > > But I also don’t feel strongly enough about it to argue either way.
> So,
> > > > after reading my argument, if you still insist, I’ll happily add this
> > > there.
> > > >
> > > > -: 3. With the new cleaning strategy, we want to be a bit careful
> with
> > > > removing the last message in a partition (which is possible now). We
> > need
> > > > to preserve the offset of the last message so that we don't reuse the
> > > > > offset for a different message. One way is to simply never remove the
> last
> > > > message. Another way (suggested by Jason) is to create an empty
> message
> > > > batch.
> > > >
> > > > That is a good point, but isn’t the last message always kept
> > regardless?
> > > > In all of my tests with this approach, never have I seen it being
> > > removed.
> > > > This is not because I made it so while changing the code, it was
> simply
> > > > like this before, which made me smile!
> > > > Given these results, I just *assumed* (oops) that these scenarios
> > > > > represented the reality, so the compaction would only affect the
> > > > “tail”,
> > > > > while the “head” remained untouched. Now that you say it's possible
> that
> > > the
> > > > last message actually gets overwritten somehow, I guess a new bullet
> > > point
> > > > will have to be added to the KIP for this (after I’ve found the time
> to
> > > > review the portion of the code that enacts this behaviour).
> > > >
> > > > Kind Regards,
> > > > Luís Cabral
> > > >
> > > > From: Jun Rao
> > > > Sent: 03 July 2018 23:58
> > > > To: dev
> > > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > > >
> > > > Hi, Luis,
> > > >
> > > > Thanks for the KIP. Overall, this seems a useful KIP. A few comments
> > > below.
> > > >
> > > > 1. I guess both new configurations will be at the topic level?
> > > > 2. Since the log cleaner now needs to keep both the offset and
> another
> > > long
> > > > (say timestamp) in the de-dup map, it reduces the number of keys that
> > we
> > > > can keep in the map and thus may require more rounds of cleaning.
> This
> > is
> > > > probably not a big issue, but it will be useful to document this
> impact
> > > in
> > > > the KIP.
> > > > 3. With the new cleaning strategy, we want to be a bit careful with
> > > > removing the last message in a partition (which is possible now). We
> > need
> > > > to preserve the offset of the last message so that we don't reuse the
> > > > > offset for a different message. One way is to simply never remove the
> last
> > > > message. Another way (suggested by Jason) is to create an empty
> message
> > > > batch.
> > > >
> > > > Jun
> > > >
> > > > On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Any takers on having a look at this KIP and voting on it?
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 280%3A+Enhanced+log+compaction
> > > > >
> > > > > Cheers,
> > > > > Luis
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-06 Thread Jun Rao
p, it reduces the number of keys
> > that
> > > we can keep in the map and thus may require more rounds of cleaning.
> This
> > > is probably not a big issue, but it will be useful to document this
> > impact
> > > in the KIP.
> > >
> > > As a reader, I tend to prefer brief documentation on new features (they
> > > tend to be too many for me to find the willpower to read a 200-page
> essay
> > > about each one), so that influences me to avoid writing about every
> > > micro-impact that may exist, and simply leave it inferred (as you have
> > just
> > > done).
> > > But I also don’t feel strongly enough about it to argue either way. So,
> > > after reading my argument, if you still insist, I’ll happily add this
> > there.
> > >
> > > -: 3. With the new cleaning strategy, we want to be a bit careful with
> > > removing the last message in a partition (which is possible now). We
> need
> > > to preserve the offset of the last message so that we don't reuse the
> > > > offset for a different message. One way is to simply never remove the last
> > > message. Another way (suggested by Jason) is to create an empty message
> > > batch.
> > >
> > > That is a good point, but isn’t the last message always kept
> regardless?
> > > In all of my tests with this approach, never have I seen it being
> > removed.
> > > This is not because I made it so while changing the code, it was simply
> > > like this before, which made me smile!
> > > Given these results, I just *assumed* (oops) that these scenarios
> > > > represented the reality, so the compaction would only affect the
> > > “tail”,
> > > > while the “head” remained untouched. Now that you say it's possible that
> > the
> > > last message actually gets overwritten somehow, I guess a new bullet
> > point
> > > will have to be added to the KIP for this (after I’ve found the time to
> > > review the portion of the code that enacts this behaviour).
> > >
> > > Kind Regards,
> > > Luís Cabral
> > >
> > > From: Jun Rao
> > > Sent: 03 July 2018 23:58
> > > To: dev
> > > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > >
> > > Hi, Luis,
> > >
> > > Thanks for the KIP. Overall, this seems a useful KIP. A few comments
> > below.
> > >
> > > 1. I guess both new configurations will be at the topic level?
> > > 2. Since the log cleaner now needs to keep both the offset and another
> > long
> > > (say timestamp) in the de-dup map, it reduces the number of keys that
> we
> > > can keep in the map and thus may require more rounds of cleaning. This
> is
> > > probably not a big issue, but it will be useful to document this impact
> > in
> > > the KIP.
> > > 3. With the new cleaning strategy, we want to be a bit careful with
> > > removing the last message in a partition (which is possible now). We
> need
> > > to preserve the offset of the last message so that we don't reuse the
> > > > offset for a different message. One way is to simply never remove the last
> > > message. Another way (suggested by Jason) is to create an empty message
> > > batch.
> > >
> > > Jun
> > >
> > > On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral wrote:
> > >
> > > > Hi all,
> > > >
> > > > Any takers on having a look at this KIP and voting on it?
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 280%3A+Enhanced+log+compaction
> > > >
> > > > Cheers,
> > > > Luis
> > > >
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-06 Thread Ismael Juma
Thanks for the KIP, Luis. A brief comment below.

On Wed, Jul 4, 2018 at 11:11 AM Luís Cabral 
wrote:

> As a reader, I tend to prefer brief documentation on new features (they
> tend to be too many for me to find the willpower to read a 200-page essay
> about each one), so that influences me to avoid writing about every
> micro-impact that may exist, and simply leave it inferred (as you have just
> done).
> But I also don’t feel strongly enough about it to argue either way. So,
> after reading my argument, if you still insist, I’ll happily add this there.
>

KIPs are not your typical user-level documentation. We strive to document
details like the one Jun pointed out, as they're beneficial during review
and also for understanding the operational impact.
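
For a sense of scale, here is a back-of-the-envelope sketch of that impact.
It assumes each de-dup map entry currently holds a 16-byte key hash plus an
8-byte offset, and that the new strategies add one more 8-byte long; the
128 MiB buffer and the 0.9 load factor are example inputs. All of these
figures are assumptions to be checked against the actual cleaner code.

```java
// Rough capacity estimate for the cleaner's de-dup map (illustrative only).
final class DedupeCapacitySketch {
    // Number of keys the map can hold before it is considered full.
    static long usableEntries(long bufferBytes, int bytesPerEntry, double loadFactor) {
        long slots = bufferBytes / bytesPerEntry;
        return (long) (slots * loadFactor);
    }

    public static void main(String[] args) {
        long bufferBytes = 128L * 1024 * 1024; // example buffer size (assumption)
        double loadFactor = 0.9;               // example load factor (assumption)

        // Assumed layout today: 16-byte hash + 8-byte offset.
        long offsetOnly = usableEntries(bufferBytes, 16 + 8, loadFactor);
        // Assumed layout with an extra 8-byte version (timestamp or header value).
        long withVersion = usableEntries(bufferBytes, 16 + 8 + 8, loadFactor);

        System.out.println("keys per pass, offset only:  " + offsetOnly);
        System.out.println("keys per pass, with version: " + withVersion);
    }
}
```

Under these assumptions an entry grows from 24 to 32 bytes, so the same buffer
holds about three quarters as many keys, and an admin would need roughly a
third more buffer to cover the same number of keys in one cleaning pass.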

Ismael


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-06 Thread Guozhang Wang
insist, I’ll happily add this
> there.
> >
> > -: 3. With the new cleaning strategy, we want to be a bit careful with
> > removing the last message in a partition (which is possible now). We need
> > to preserve the offset of the last message so that we don't reuse the
> > > offset for a different message. One way is to simply never remove the last
> > message. Another way (suggested by Jason) is to create an empty message
> > batch.
> >
> > That is a good point, but isn’t the last message always kept regardless?
> > In all of my tests with this approach, never have I seen it being
> removed.
> > This is not because I made it so while changing the code, it was simply
> > like this before, which made me smile!
> > Given these results, I just *assumed* (oops) that these scenarios
> > > represented the reality, so the compaction would only affect the
> > “tail”,
> > > while the “head” remained untouched. Now that you say it's possible that
> the
> > last message actually gets overwritten somehow, I guess a new bullet
> point
> > will have to be added to the KIP for this (after I’ve found the time to
> > review the portion of the code that enacts this behaviour).
> >
> > Kind Regards,
> > Luís Cabral
> >
> > From: Jun Rao
> > Sent: 03 July 2018 23:58
> > To: dev
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > Hi, Luis,
> >
> > Thanks for the KIP. Overall, this seems a useful KIP. A few comments
> below.
> >
> > 1. I guess both new configurations will be at the topic level?
> > 2. Since the log cleaner now needs to keep both the offset and another
> long
> > (say timestamp) in the de-dup map, it reduces the number of keys that we
> > can keep in the map and thus may require more rounds of cleaning. This is
> > probably not a big issue, but it will be useful to document this impact
> in
> > the KIP.
> > 3. With the new cleaning strategy, we want to be a bit careful with
> > removing the last message in a partition (which is possible now). We need
> > to preserve the offset of the last message so that we don't reuse the
> > > offset for a different message. One way is to simply never remove the last
> > message. Another way (suggested by Jason) is to create an empty message
> > batch.
> >
> > Jun
> >
> > On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral wrote:
> >
> > > Hi all,
> > >
> > > Any takers on having a look at this KIP and voting on it?
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 280%3A+Enhanced+log+compaction
> > >
> > > Cheers,
> > > Luis
> > >
> >
> >
>



-- 
-- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-06 Thread Jun Rao
Hi, Luis,

1. The cleaning policy is configurable at both global and topic level. The
global one is named log.cleanup.policy and the topic-level one is named
cleanup.policy, obtained by just stripping the "log." prefix. We can probably
do the same for the new configs.

2. Since this KIP may require an admin to configure a larger dedup buffer
size, it would be useful to document this impact in the wiki and the
release notes.

3. Yes, it's unlikely for the last message to be removed in the current
implementation since we never clean the active segment. However, in theory,
this can happen. So it would be useful to guard this explicitly.

4. Just thought about another issue. We probably want to be a bit careful
with key deletion. Currently, one can delete a key by sending a message
with a delete tombstone (a null payload). To prevent a reader from missing
a deletion if it's removed too quickly, we depend on a configuration
log.cleaner.delete.retention.ms (defaults to 1 day). The delete tombstone
will only be physically removed from the log after that amount of time. The
expectation is that a reader should finish reading to the end of the log
after consuming a message within that configured time. With the new
strategy, we have similar, but slightly different problems. The first
problem is that the delete tombstone may be delivered earlier than an
outdated record in offset order to a consumer. In order for the consumer
not to take the outdated record, the consumer should cache the deletion
tombstone for some configured amount of time. We can probably piggyback this
on log.cleaner.delete.retention.ms, but we need to document this. The
second problem is that once the delete tombstone is physically removed from
the log, how can we prevent outdated records from being added (otherwise, they
will never be garbage collected)? Not sure what's the best way to do this.
One possible way is to push this back to the application and require the
user not to publish outdated records after log.cleaner.delete.retention.ms.
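
To illustrate the consumer-side caching described in point 4, here is a
minimal sketch of a view that remembers tombstones for a while so that an
outdated record arriving later in offset order is ignored. VersionedView, its
methods and the explicit clock parameter are hypothetical names for this
example, not an existing consumer API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side materialized view keyed by record key.
final class VersionedView {
    private static final class Entry {
        final long version;
        final String value; // null marks a cached delete tombstone
        final long seenAtMs;

        Entry(long version, String value, long seenAtMs) {
            this.version = version;
            this.value = value;
            this.seenAtMs = seenAtMs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long tombstoneRetentionMs;

    VersionedView(long tombstoneRetentionMs) {
        this.tombstoneRetentionMs = tombstoneRetentionMs;
    }

    // Applies a record (value == null is a tombstone); returns false if it is outdated.
    boolean apply(String key, long version, String value, long nowMs) {
        Entry current = entries.get(key);
        if (current != null && version < current.version) {
            return false; // an equal or newer version, possibly a tombstone, was already seen
        }
        entries.put(key, new Entry(version, value, nowMs));
        return true;
    }

    // Returns the current value, or null if the key is absent or deleted.
    String get(String key) {
        Entry entry = entries.get(key);
        return entry == null ? null : entry.value;
    }

    // Forgets tombstones that have been cached for at least the retention time.
    void expireTombstones(long nowMs) {
        entries.values().removeIf(
            e -> e.value == null && nowMs - e.seenAtMs >= tombstoneRetentionMs);
    }
}
```

Once a tombstone has been expired, a late record with a lower version is
accepted again. That is the same gap as the second problem above, so the
producer-side requirement would still apply.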

Thanks,

Jun

On Wed, Jul 4, 2018 at 11:11 AM, Luís Cabral 
wrote:

> Hi Jun,
>
> -:  1. I guess both new configurations will be at the topic level?
>
> They will exist in the global configuration, at the very least.
> I would like to have them on the topic level as well, but there is an
> inconsistency between the cleanup/compaction properties that exist “only
> globally” vs “globally + per topic”.
> I haven’t gotten around to investigating why, and if that reason would
> then also impact the properties I’m suggesting. At first glance they seem
> to belong with the properties that are "only globally” configured, but
> Guozhang has written earlier with a suggestion of a compaction property
> that works for both (though I haven’t had the time to look into it yet,
> unfortunately).
>
> -:  2. Since the log cleaner now needs to keep both the offset and another
> long (say timestamp) in the de-dup map, it reduces the number of keys that
> we can keep in the map and thus may require more rounds of cleaning. This
> is probably not a big issue, but it will be useful to document this impact
> in the KIP.
>
> As a reader, I tend to prefer brief documentation on new features (they
> tend to be too many for me to find the willpower to read a 200-page essay
> about each one), so that influences me to avoid writing about every
> micro-impact that may exist, and simply leave it inferred (as you have just
> done).
> But I also don’t feel strongly enough about it to argue either way. So,
> after reading my argument, if you still insist, I’ll happily add this there.
>
> -: 3. With the new cleaning strategy, we want to be a bit careful with
> removing the last message in a partition (which is possible now). We need
> to preserve the offset of the last message so that we don't reuse the
> offset for a different message. One way is to simply never remove the last
> message. Another way (suggested by Jason) is to create an empty message
> batch.
>
> That is a good point, but isn’t the last message always kept regardless?
> In all of my tests with this approach, never have I seen it being removed.
> This is not because I made it so while changing the code, it was simply
> like this before, which made me smile!
> Given these results, I just *assumed* (oops) that these scenarios
> represented the reality, so the compaction would only affect the “tail”,
> while the “head” remained untouched. Now that you say it's possible that the
> last message actually gets overwritten somehow, I guess a new bullet point
> will have to be added to the KIP for this (after I’ve found the time to
> review the portion of the code that enacts this behaviour).
>
> Kind Regards,
> Luís Cabral
>
> From: Jun Rao
> Sent: 03 July 2018 23:58
> To: dev
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>

RE: [VOTE] KIP-280: Enhanced log compaction

2018-07-04 Thread Luís Cabral
Hi Jun,

-:  1. I guess both new configurations will be at the topic level?

They will exist in the global configuration, at the very least.
I would like to have them on the topic level as well, but there is an 
inconsistency between the cleanup/compaction properties that exist “only 
globally” vs “globally + per topic”.
I haven’t gotten around to investigating why, and if that reason would then 
also impact the properties I’m suggesting. At first glance they seem to belong 
with the properties that are "only globally” configured, but Guozhang has 
written earlier with a suggestion of a compaction property that works for both 
(though I haven’t had the time to look into it yet, unfortunately).

-:  2. Since the log cleaner now needs to keep both the offset and another long 
(say timestamp) in the de-dup map, it reduces the number of keys that we can 
keep in the map and thus may require more rounds of cleaning. This is probably 
not a big issue, but it will be useful to document this impact in the KIP.

As a reader, I tend to prefer brief documentation on new features (they tend to 
be too many for me to find the willpower to read a 200-page essay about each 
one), so that influences me to avoid writing about every micro-impact that may 
exist, and simply leave it inferred (as you have just done).
But I also don’t feel strongly enough about it to argue either way. So, after 
reading my argument, if you still insist, I’ll happily add this there.

-: 3. With the new cleaning strategy, we want to be a bit careful with removing 
the last message in a partition (which is possible now). We need to preserve 
the offset of the last message so that we don't reuse the offset for a 
different message. One way is to simply never remove the last message. Another way
(suggested by Jason) is to create an empty message batch.

That is a good point, but isn’t the last message always kept regardless? In all 
of my tests with this approach, never have I seen it being removed. This is not 
because I made it so while changing the code, it was simply like this before, 
which made me smile!
Given these results, I just *assumed* (oops) that these scenarios represented 
the reality, so the compaction would only affect the “tail”, while the “head”
remained untouched. Now that you say it's possible that the last message
actually gets overwritten somehow, I guess a new bullet point will have to be 
added to the KIP for this (after I’ve found the time to review the portion of 
the code that enacts this behaviour).

Kind Regards,
Luís Cabral
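
A minimal sketch of the "never remove the last message" option from point 3,
assuming records are (offset, key, version) tuples and that the highest version
per key wins. The function `compact` and this record shape are illustrative
assumptions and do not mirror the log cleaner's actual code.

```python
def compact(records):
    """Compact records given as (offset, key, version) tuples in offset order.

    Keeps the highest-version record per key and, in addition, always keeps
    the last record so that the final offset of the partition is preserved.
    """
    if not records:
        return []
    best = {}  # key -> (version, offset) of the record to keep
    for offset, key, version in records:
        current = best.get(key)
        # On a version tie the later offset wins, as in offset-based compaction.
        if current is None or (version, offset) >= current:
            best[key] = (version, offset)
    keep = {offset for _, offset in best.values()}
    keep.add(records[-1][0])  # never drop the last record
    return [record for record in records if record[0] in keep]
```

With input `[(0, "a", 2), (1, "b", 1), (2, "a", 1)]` the record at offset 2 is
outdated for key "a", yet it is retained because it is the last record in the
partition.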

From: Jun Rao
Sent: 03 July 2018 23:58
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hi, Luis,

Thanks for the KIP. Overall, this seems a useful KIP. A few comments below.

1. I guess both new configurations will be at the topic level?
2. Since the log cleaner now needs to keep both the offset and another long
(say timestamp) in the de-dup map, it reduces the number of keys that we
can keep in the map and thus may require more rounds of cleaning. This is
probably not a big issue, but it will be useful to document this impact in
the KIP.
3. With the new cleaning strategy, we want to be a bit careful with
removing the last message in a partition (which is possible now). We need
to preserve the offset of the last message so that we don't reuse the
offset for a different message. One way is to simply never remove the last
message. Another way (suggested by Jason) is to create an empty message
batch.

Jun

On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
wrote:

> Hi all,
>
> Any takers on having a look at this KIP and voting on it?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> Cheers,
> Luis
>



RE: [VOTE] KIP-280: Enhanced log compaction

2018-07-04 Thread Luís Cabral
Hi Jason,

There’s a “Motivation” chapter in the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction#KIP-280:Enhancedlogcompaction-Motivation

Is it still unclear after reading that?

Kind Regards,
Luís Cabral


From: Jason Gustafson
Sent: 03 July 2018 23:45
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Sorry to join the discussion late. Can you add to the motivation the
use cases for header-based compaction? This is not very clear to me.

Thanks,
Jason

On Mon, Jul 2, 2018 at 11:00 AM, Guozhang Wang  wrote:

> Hi Luis,
>
> I believe that compaction property is indeed overridable at per-topic
> level, as in
>
> https://github.com/apache/kafka/blob/0cacbcf30e0a90ab9fad7bc310e5477cf959f1fd/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L116
>
> And also documented in https://kafka.apache.org/documentation/#topicconfigs
>
> Is that right?
>
>
>
> Guozhang
>
> On Mon, Jul 2, 2018 at 7:41 AM, Luís Cabral  >
> wrote:
>
> >  Hi Guozhang,
> >
> > You are right that it is not straightforward to add a dependent property
> > validation.
> > Though it is possible to re-design it to allow for this, that effort
> would
> > be better placed under its own KIP, if it really becomes useful for other
> > properties as well.
> > Given this, the fallback-to-offset behaviour currently documented will be
> > used.
> >
> > Also, while analyzing this, I noticed that the existing compaction
> > properties only exist globally, and not per topic.
> > I don't understand why this is, but it again feels like something out of
> > scope for this KIP.
> > Given this, the KIP was updated to only include the global configuration
> > properties, removing the per-topic configs.
> >
> > I'll soon update the PR according to the documentation, but I trust the
> > KIP doesn't need that to close, right?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 2:00:08 PM GMT+2, Luís Cabral
> >  wrote:
> >
> >   Hi Guozhang,
> >
> > At the moment the KIP has your vote, Matthias' and Ted's.
> > Should I ask someone else to have a look?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison <
> > mickael.mai...@gmail.com> wrote:
> >
> >  +1 (non binding). Thanks for the KIP!
> >
> > On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang 
> > wrote:
> > > Hi Luis,
> > >
> > > Regarding the minor suggest, I agree it would be better to make it as
> > > mandatory, but it might be a bit tricky because it is a conditional
> > > mandatory one depending on the other config's value. Would like to see
> > your
> > > updated PR.
> > >
> > > Regarding the KIP itself, both Matthias and myself can recast our votes
> > to
> > > the updated wiki, while we still need one more committer to vote
> > according
> > > to the bylaws.
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral
> > 
> > > wrote:
> > >
> > >>  Hi,
> > >>
> > >> Thank you all for having a look!
> > >>
> > >> The KIP is now updated with the result of these late discussions,
> > though I
> > >> did take some liberty with this part:
> > >>
> > >>
> > >>- If the "compaction.strategy.header" configuration is not set (or
> is
> > >> blank), then the compaction strategy will fallback to "offset";
> > >>
> > >>
> > >> Alternatively, we can also set it to be a mandatory property when the
> > >> strategy is "header" and fail the application to start via a config
> > >> validation (I would honestly prefer this, but its up to your taste).
> > >>
> > >> Anyway, this is now a minute detail that can be adapted during the
> final
> > >> stage of this KIP, so are you all alright with me changing the status
> to
> > >> [ACCEPTED]?
> > >>
> > >> Cheers,
> > >> Luis
> > >>
> > >>
> > >>On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
> > >> yuzhih...@gmail.com> wrote:
> > >>
> > >>  +1
> > >>
> > >> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral
> >  > >> >
> > >> wrote:
> > >>
> > >> > Hi Ted,
> > >> > Can I also get your input on this?
> > >> >
> > >> > bq. +1 from my side for using `compaction.strategy` with values
> > >> > "offset","timestamp" and "header" and `compaction.strategy.header`
> > >> > -Matthias
> > >> >
> > >> > bq. +1 from me as well.
> > >> > -Guozhang
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Luis
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>



Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-03 Thread Jun Rao
Hi, Luis,

Thanks for the KIP. Overall, this seems a useful KIP. A few comments below.

1. I guess both new configurations will be at the topic level?
2. Since the log cleaner now needs to keep both the offset and another long
(say timestamp) in the de-dup map, it reduces the number of keys that we
can keep in the map and thus may require more rounds of cleaning. This is
probably not a big issue, but it will be useful to document this impact in
the KIP.
3. With the new cleaning strategy, we want to be a bit careful with
removing the last message in a partition (which is possible now). We need
to preserve the offset of the last message so that we don't reuse the
offset for a different message. One way is to simply never remove the last
message. Another way (suggested by Jason) is to create an empty message
batch.

Jun
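
The impact in point 2 can be estimated with a short calculation. This is a
rough sketch under stated assumptions: each de-dup map entry holds a 16-byte
hash of the key plus one 8-byte long per stored field, the map has a 128 MiB
buffer, and it is treated as full at a 0.9 load factor. These figures are
illustrative and are not taken from this thread.

```python
def max_keys(buffer_bytes, bytes_per_entry, load_factor=0.9):
    """Number of keys a fixed-size de-dup map can hold before it counts as full."""
    slots = buffer_bytes // bytes_per_entry
    return int(slots * load_factor)


BUFFER_BYTES = 128 * 1024 * 1024  # assumed de-dup buffer size
HASH_BYTES = 16                   # assumed size of the hashed key
LONG_BYTES = 8                    # one long: the offset, or the extra version field

# Today: hash + offset. With the new strategy: hash + offset + another long.
offset_only = max_keys(BUFFER_BYTES, HASH_BYTES + LONG_BYTES)
offset_plus_version = max_keys(BUFFER_BYTES, HASH_BYTES + 2 * LONG_BYTES)

print(offset_only, offset_plus_version, offset_plus_version / offset_only)
```

Under these assumptions the entry grows from 24 to 32 bytes, so each cleaning
round covers roughly three quarters of the keys it could cover before.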

On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
wrote:

> Hi all,
>
> Any takers on having a look at this KIP and voting on it?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> Cheers,
> Luis
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-03 Thread Jason Gustafson
Sorry to join the discussion late. Can you add to the motivation the
use cases for header-based compaction? This is not very clear to me.

Thanks,
Jason

On Mon, Jul 2, 2018 at 11:00 AM, Guozhang Wang  wrote:

> Hi Luis,
>
> I believe that compaction property is indeed overridable at per-topic
> level, as in
>
> https://github.com/apache/kafka/blob/0cacbcf30e0a90ab9fad7bc310e5477cf959f1fd/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L116
>
> And also documented in https://kafka.apache.org/documentation/#topicconfigs
>
> Is that right?
>
>
>
> Guozhang
>
> On Mon, Jul 2, 2018 at 7:41 AM, Luís Cabral  >
> wrote:
>
> >  Hi Guozhang,
> >
> > You are right that it is not straightforward to add a dependent property
> > validation.
> > Though it is possible to re-design it to allow for this, that effort
> would
> > be better placed under its own KIP, if it really becomes useful for other
> > properties as well.
> > Given this, the fallback-to-offset behaviour currently documented will be
> > used.
> >
> > Also, while analyzing this, I noticed that the existing compaction
> > properties only exist globally, and not per topic.
> > I don't understand why this is, but it again feels like something out of
> > scope for this KIP.
> > Given this, the KIP was updated to only include the global configuration
> > properties, removing the per-topic configs.
> >
> > I'll soon update the PR according to the documentation, but I trust the
> > KIP doesn't need that to close, right?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 2:00:08 PM GMT+2, Luís Cabral
> >  wrote:
> >
> >   Hi Guozhang,
> >
> > At the moment the KIP has your vote, Matthias' and Ted's.
> > Should I ask someone else to have a look?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison <
> > mickael.mai...@gmail.com> wrote:
> >
> >  +1 (non binding). Thanks for the KIP!
> >
> > On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang 
> > wrote:
> > > Hi Luis,
> > >
> > > Regarding the minor suggest, I agree it would be better to make it as
> > > mandatory, but it might be a bit tricky because it is a conditional
> > > mandatory one depending on the other config's value. Would like to see
> > your
> > > updated PR.
> > >
> > > Regarding the KIP itself, both Matthias and myself can recast our votes
> > to
> > > the updated wiki, while we still need one more committer to vote
> > according
> > > to the bylaws.
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral
> > 
> > > wrote:
> > >
> > >>  Hi,
> > >>
> > >> Thank you all for having a look!
> > >>
> > >> The KIP is now updated with the result of these late discussions,
> > though I
> > >> did take some liberty with this part:
> > >>
> > >>
> > >>- If the "compaction.strategy.header" configuration is not set (or
> is
> > >> blank), then the compaction strategy will fallback to "offset";
> > >>
> > >>
> > >> Alternatively, we can also set it to be a mandatory property when the
> > >> strategy is "header" and fail the application to start via a config
> > >> validation (I would honestly prefer this, but its up to your taste).
> > >>
> > >> Anyway, this is now a minute detail that can be adapted during the
> final
> > >> stage of this KIP, so are you all alright with me changing the status
> to
> > >> [ACCEPTED]?
> > >>
> > >> Cheers,
> > >> Luis
> > >>
> > >>
> > >>On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
> > >> yuzhih...@gmail.com> wrote:
> > >>
> > >>  +1
> > >>
> > >> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral
> >  > >> >
> > >> wrote:
> > >>
> > >> > Hi Ted,
> > >> > Can I also get your input on this?
> > >> >
> > >> > bq. +1 from my side for using `compaction.strategy` with values
> > >> > "offset","timestamp" and "header" and `compaction.strategy.header`
> > >> > -Matthias
> > >> >
> > >> > bq. +1 from me as well.
> > >> > -Guozhang
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Luis
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-02 Thread Guozhang Wang
Hi Luis,

I believe that compaction property is indeed overridable at per-topic
level, as in

https://github.com/apache/kafka/blob/0cacbcf30e0a90ab9fad7bc310e5477cf959f1fd/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L116

And also documented in https://kafka.apache.org/documentation/#topicconfigs

Is that right?



Guozhang

On Mon, Jul 2, 2018 at 7:41 AM, Luís Cabral 
wrote:

>  Hi Guozhang,
>
> You are right that it is not straightforward to add a dependent property
> validation.
> Though it is possible to re-design it to allow for this, that effort would
> be better placed under its own KIP, if it really becomes useful for other
> properties as well.
> Given this, the fallback-to-offset behaviour currently documented will be
> used.
>
> Also, while analyzing this, I noticed that the existing compaction
> properties only exist globally, and not per topic.
> I don't understand why this is, but it again feels like something out of
> scope for this KIP.
> Given this, the KIP was updated to only include the global configuration
> properties, removing the per-topic configs.
>
> I'll soon update the PR according to the documentation, but I trust the
> KIP doesn't need that to close, right?
>
> Cheers,
> Luis
>
> On Monday, July 2, 2018, 2:00:08 PM GMT+2, Luís Cabral
>  wrote:
>
>   Hi Guozhang,
>
> At the moment the KIP has your vote, Matthias' and Ted's.
> Should I ask someone else to have a look?
>
> Cheers,
> Luis
>
> On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison <
> mickael.mai...@gmail.com> wrote:
>
>  +1 (non binding). Thanks for the KIP!
>
> On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang 
> wrote:
> > Hi Luis,
> >
> > Regarding the minor suggest, I agree it would be better to make it as
> > mandatory, but it might be a bit tricky because it is a conditional
> > mandatory one depending on the other config's value. Would like to see
> your
> > updated PR.
> >
> > Regarding the KIP itself, both Matthias and myself can recast our votes
> to
> > the updated wiki, while we still need one more committer to vote
> according
> > to the bylaws.
> >
> >
> > Guozhang
> >
> > On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral
> 
> > wrote:
> >
> >>  Hi,
> >>
> >> Thank you all for having a look!
> >>
> >> The KIP is now updated with the result of these late discussions,
> though I
> >> did take some liberty with this part:
> >>
> >>
> >>- If the "compaction.strategy.header" configuration is not set (or is
> >> blank), then the compaction strategy will fallback to "offset";
> >>
> >>
> >> Alternatively, we can also set it to be a mandatory property when the
> >> strategy is "header" and fail the application to start via a config
> >> validation (I would honestly prefer this, but its up to your taste).
> >>
> >> Anyway, this is now a minute detail that can be adapted during the final
> >> stage of this KIP, so are you all alright with me changing the status to
> >> [ACCEPTED]?
> >>
> >> Cheers,
> >> Luis
> >>
> >>
> >>On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
> >> yuzhih...@gmail.com> wrote:
> >>
> >>  +1
> >>
> >> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral
>  >> >
> >> wrote:
> >>
> >> > Hi Ted,
> >> > Can I also get your input on this?
> >> >
> >> > bq. +1 from my side for using `compaction.strategy` with values
> >> > "offset","timestamp" and "header" and `compaction.strategy.header`
> >> > -Matthias
> >> >
> >> > bq. +1 from me as well.
> >> > -Guozhang
> >> >
> >> >
> >> > Cheers,
> >> > Luis
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > -- Guozhang
>



-- 
-- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-02 Thread Luís Cabral
 Hi Guozhang,

You are right that it is not straightforward to add a dependent property 
validation. 
Though it is possible to re-design it to allow for this, that effort would be 
better placed under its own KIP, if it really becomes useful for other 
properties as well.
Given this, the fallback-to-offset behaviour currently documented will be used.

Also, while analyzing this, I noticed that the existing compaction properties 
only exist globally, and not per topic. 
I don't understand why this is, but it again feels like something out of scope 
for this KIP.
Given this, the KIP was updated to only include the global configuration 
properties, removing the per-topic configs.

I'll soon update the PR according to the documentation, but I trust the KIP 
doesn't need that to close, right?

Cheers,
Luis

On Monday, July 2, 2018, 2:00:08 PM GMT+2, Luís Cabral 
 wrote:  
 
  Hi Guozhang,

At the moment the KIP has your vote, Matthias' and Ted's.
Should I ask someone else to have a look?

Cheers,
Luis

    On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison 
 wrote:  
 
 +1 (non binding). Thanks for the KIP!

On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang  wrote:
> Hi Luis,
>
> Regarding the minor suggest, I agree it would be better to make it as
> mandatory, but it might be a bit tricky because it is a conditional
> mandatory one depending on the other config's value. Would like to see your
> updated PR.
>
> Regarding the KIP itself, both Matthias and myself can recast our votes to
> the updated wiki, while we still need one more committer to vote according
> to the bylaws.
>
>
> Guozhang
>
> On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral 
> wrote:
>
>>  Hi,
>>
>> Thank you all for having a look!
>>
>> The KIP is now updated with the result of these late discussions, though I
>> did take some liberty with this part:
>>
>>
>>    - If the "compaction.strategy.header" configuration is not set (or is
>> blank), then the compaction strategy will fallback to "offset";
>>
>>
>> Alternatively, we can also set it to be a mandatory property when the
>> strategy is "header" and fail the application to start via a config
>> validation (I would honestly prefer this, but its up to your taste).
>>
>> Anyway, this is now a minute detail that can be adapted during the final
>> stage of this KIP, so are you all alright with me changing the status to
>> [ACCEPTED]?
>>
>> Cheers,
>> Luis
>>
>>
>>    On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
>> yuzhih...@gmail.com> wrote:
>>
>>  +1
>>
>> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral > >
>> wrote:
>>
>> > Hi Ted,
>> > Can I also get your input on this?
>> >
>> > bq. +1 from my side for using `compaction.strategy` with values
>> > "offset","timestamp" and "header" and `compaction.strategy.header`
>> > -Matthias
>> >
>> > bq. +1 from me as well.
>> > -Guozhang
>> >
>> >
>> > Cheers,
>> > Luis
>> >
>> >
>> >
>>
>
>
>
> --
> -- Guozhang    

Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-02 Thread Luís Cabral
 Hi Guozhang,

At the moment the KIP has your vote, Matthias' and Ted's.
Should I ask someone else to have a look?

Cheers,
Luis

On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison 
 wrote:  
 
 +1 (non binding). Thanks for the KIP!

On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang  wrote:
> Hi Luis,
>
> Regarding the minor suggest, I agree it would be better to make it as
> mandatory, but it might be a bit tricky because it is a conditional
> mandatory one depending on the other config's value. Would like to see your
> updated PR.
>
> Regarding the KIP itself, both Matthias and myself can recast our votes to
> the updated wiki, while we still need one more committer to vote according
> to the bylaws.
>
>
> Guozhang
>
> On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral 
> wrote:
>
>>  Hi,
>>
>> Thank you all for having a look!
>>
>> The KIP is now updated with the result of these late discussions, though I
>> did take some liberty with this part:
>>
>>
>>    - If the "compaction.strategy.header" configuration is not set (or is
>> blank), then the compaction strategy will fallback to "offset";
>>
>>
>> Alternatively, we can also set it to be a mandatory property when the
>> strategy is "header" and fail the application to start via a config
>> validation (I would honestly prefer this, but its up to your taste).
>>
>> Anyway, this is now a minute detail that can be adapted during the final
>> stage of this KIP, so are you all alright with me changing the status to
>> [ACCEPTED]?
>>
>> Cheers,
>> Luis
>>
>>
>>    On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
>> yuzhih...@gmail.com> wrote:
>>
>>  +1
>>
>> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral > >
>> wrote:
>>
>> > Hi Ted,
>> > Can I also get your input on this?
>> >
>> > bq. +1 from my side for using `compaction.strategy` with values
>> > "offset","timestamp" and "header" and `compaction.strategy.header`
>> > -Matthias
>> >
>> > bq. +1 from me as well.
>> > -Guozhang
>> >
>> >
>> > Cheers,
>> > Luis
>> >
>> >
>> >
>>
>
>
>
> --
> -- Guozhang  

Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-02 Thread Mickael Maison
+1 (non binding). Thanks for the KIP!

On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang  wrote:
> Hi Luis,
>
> Regarding the minor suggest, I agree it would be better to make it as
> mandatory, but it might be a bit tricky because it is a conditional
> mandatory one depending on the other config's value. Would like to see your
> updated PR.
>
> Regarding the KIP itself, both Matthias and myself can recast our votes to
> the updated wiki, while we still need one more committer to vote according
> to the bylaws.
>
>
> Guozhang
>
> On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral 
> wrote:
>
>>  Hi,
>>
>> Thank you all for having a look!
>>
>> The KIP is now updated with the result of these late discussions, though I
>> did take some liberty with this part:
>>
>>
>>- If the "compaction.strategy.header" configuration is not set (or is
>> blank), then the compaction strategy will fallback to "offset";
>>
>>
>> Alternatively, we can also set it to be a mandatory property when the
>> strategy is "header" and fail the application to start via a config
>> validation (I would honestly prefer this, but its up to your taste).
>>
>> Anyway, this is now a minute detail that can be adapted during the final
>> stage of this KIP, so are you all alright with me changing the status to
>> [ACCEPTED]?
>>
>> Cheers,
>> Luis
>>
>>
>> On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
>> yuzhih...@gmail.com> wrote:
>>
>>  +1
>>
>> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral > >
>> wrote:
>>
>> > Hi Ted,
>> > Can I also get your input on this?
>> >
>> > bq. +1 from my side for using `compaction.strategy` with values
>> > "offset","timestamp" and "header" and `compaction.strategy.header`
>> > -Matthias
>> >
>> > bq. +1 from me as well.
>> > -Guozhang
>> >
>> >
>> > Cheers,
>> > Luis
>> >
>> >
>> >
>>
>
>
>
> --
> -- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-29 Thread Guozhang Wang
Hi Luis,

Regarding the minor suggestion, I agree it would be better to make it
mandatory, but it might be a bit tricky because it is a conditional
mandatory one depending on the other config's value. Would like to see your
updated PR.

Regarding the KIP itself, both Matthias and I can recast our votes on
the updated wiki, while we still need one more committer to vote according
to the bylaws.


Guozhang

On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral 
wrote:

>  Hi,
>
> Thank you all for having a look!
>
> The KIP is now updated with the result of these late discussions, though I
> did take some liberty with this part:
>
>
>- If the "compaction.strategy.header" configuration is not set (or is
> blank), then the compaction strategy will fallback to "offset";
>
>
> Alternatively, we can also set it to be a mandatory property when the
> strategy is "header" and fail the application to start via a config
> validation (I would honestly prefer this, but its up to your taste).
>
> Anyway, this is now a minute detail that can be adapted during the final
> stage of this KIP, so are you all alright with me changing the status to
> [ACCEPTED]?
>
> Cheers,
> Luis
>
>
> On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
> yuzhih...@gmail.com> wrote:
>
>  +1
>
> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral  >
> wrote:
>
> > Hi Ted,
> > Can I also get your input on this?
> >
> > bq. +1 from my side for using `compaction.strategy` with values
> > "offset","timestamp" and "header" and `compaction.strategy.header`
> > -Matthias
> >
> > bq. +1 from me as well.
> > -Guozhang
> >
> >
> > Cheers,
> > Luis
> >
> >
> >
>



-- 
-- Guozhang


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-28 Thread Luís Cabral
 Hi,

Thank you all for having a look!

The KIP is now updated with the result of these late discussions, though I did 
take some liberty with this part:

   
   - If the "compaction.strategy.header" configuration is not set (or is 
blank), then the compaction strategy will fall back to "offset";


Alternatively, we can also set it to be a mandatory property when the strategy 
is "header" and fail the application to start via a config validation (I would 
honestly prefer this, but it's up to your taste).

Anyway, this is now a minute detail that can be adapted during the final stage 
of this KIP, so are you all alright with me changing the status to [ACCEPTED]?

Cheers,
Luis
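
The two alternatives above (fall back to "offset", or fail config validation)
can be sketched as follows. The property names come from this thread; the
function `resolve_strategy`, its `strict` flag, and the default of "offset"
when nothing is set are assumptions made for illustration only.

```python
VALID_STRATEGIES = {"offset", "timestamp", "header"}


def resolve_strategy(config, strict=False):
    """Return (strategy, header_key) for the two proposed properties.

    strict=False: a missing or blank header name falls back to "offset".
    strict=True:  a missing or blank header name is a configuration error.
    """
    strategy = config.get("compaction.strategy", "offset")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown compaction.strategy: {strategy!r}")
    if strategy != "header":
        return strategy, None  # the header property is ignored here
    header_key = (config.get("compaction.strategy.header") or "").strip()
    if header_key:
        return "header", header_key
    if strict:
        raise ValueError(
            "compaction.strategy.header is required when "
            "compaction.strategy is 'header'"
        )
    return "offset", None  # documented fallback behaviour
```

The strict variant surfaces a misconfiguration at startup, while the fallback
variant keeps the broker running with the existing offset-based behaviour.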


On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu  
wrote:  
 
 +1

On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral 
wrote:

> Hi Ted,
> Can I also get your input on this?
>
> bq. +1 from my side for using `compaction.strategy` with values
> "offset","timestamp" and "header" and `compaction.strategy.header`
> -Matthias
>
> bq. +1 from me as well.
> -Guozhang
>
>
> Cheers,
> Luis
>
>
>  

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-28 Thread Ted Yu
+1

On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral 
wrote:

> Hi Ted,
> Can I also get your input on this?
>
> bq. +1 from my side for using `compaction.strategy` with values
> "offset","timestamp" and "header" and `compaction.strategy.header`
> -Matthias
>
> bq. +1 from me as well.
> -Guozhang
>
>
> Cheers,
> Luis
>
>
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-28 Thread Luís Cabral
Hi Ted,
Can I also get your input on this?

bq. +1 from my side for using `compaction.strategy` with values 
"offset","timestamp" and "header" and `compaction.strategy.header`
-Matthias

bq. +1 from me as well.
-Guozhang 


Cheers,
Luis




Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-25 Thread Guozhang Wang
`offset`, `timestamp`) as valid configs to enable header based
> compaction.
> >>
> >> Personally, I prefer either adding a config or going with
> >> `header=`. Using `_timestamp_`, `_offset_`, and `` might be
> >> good enough (even if this is the solution I like least)---for this case,
> >> we should state explicitly, that the whole space of `_*_` is reserved
> >> and users are not allowed to set those for header compaction. In fact, I
> >> would also add a check for the config that only allows for `_offset_`
> >> and `_timestamp_` and throws an exception for all other `_*_` configs.
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 6/18/18 2:03 PM, Luís Cabral wrote:
> >>> I’m ok with that...
> >>>
> >>> Ted / Matthias?
> >>>
> >>>
> >>> From: Guozhang Wang
> >>> Sent: 18 June 2018 22:49
> >>> To: dev@kafka.apache.org
> >>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>>
> >>> How about make the preserved values to be "_offset_" and "_timestamp_"
> >>> then? Currently in the KIP they are reserved as "offset" and
> "timestamp".
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
> >> 
> >>> wrote:
> >>>
> >>>> Hi Guozhang,
> >>>>
> >>>> Yes, that is what I meant (separate configs).
> >>>> Though I would still prefer to keep it as it is, as it's a much simpler
> >> and
> >>>> cleaner approach – I’m not so sure that a potential client would
> really
> >> be
> >>>> so inconvenienced for having to use “_offset_” or “_timestamp_” as a
> >> header
> >>>>
> >>>> Cheers,
> >>>> Luís
> >>>>
> >>>>
> >>>> From: Guozhang Wang
> >>>> Sent: 18 June 2018 19:35
> >>>> To: dev@kafka.apache.org
> >>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>>>
> >>>> Hello Luís,
> >>>>
> >>>> I agree that having an expression evaluation as a config value is not
> >> the
> >>>> best approach; if there are better ideas to allow users to specify the
> >>>> header key which happen to be the same as the preserved config values
> >>>> "offset" and "timestamp" (although the likelihood may be small, as Ted
> >>>> mentioned there may be more preserved config values added in the
> >> future),
> >>>> then I'd be happily follow the suggestions. For example, we could have
> >> the
> >>>> config value for header keys as "header-"? Is that what
> you've
> >>>> suggested? Or do you suggest using two configs instead, and the second
> >>>> config specifying the key name, and will only be considered if the
> first
> >>>> (i.e. current proposed) config's value is `header`, otherwise be
> >> ignored?
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>>
> >>>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
> >>>>  >>>>> wrote:
> >>>>
> >>>>>   Hi Ted / Guozhang / Matthias,
> >>>>>
> >>>>> @Ted: I've now added your argument to the "Rejected Alternatives"
> >> portion
> >>>>> of the KIP. Please keep in mind that I would like to keep this as
> >>>> backwards
> >>>>> compatible as possible, so a lot of decisions are inferred from that
> >>>> intent.
> >>>>>
> >>>>> @Guozhang: IMHO, adding expression evaluation to configuration is an
> >>>>> incorrect approach. If you absolutely insist on having this clear
> >>>>> distinction between header/key, then I would suggest instead to have
> a
> >>>>> dedicated property for the "key" part. Of course, this is your
> project
> >> so
> >>>>> I'll just continue whatever approach moves this KIP forward...
> >>>>>
> >>>>> @Matthias: Sorry, but update the KIP according to what?
> >>>>>
> >>>>> Cheers,
> >>>>> Luís
> >>>>>
> >>>>> On Monday, June 18, 2

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-25 Thread Matthias J. Sax
+1 from my side for using `compaction.strategy` with values "offset",
"timestamp" and "header" and `compaction.strategy.header`
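
For readers of the archive, the semantics of the three values can be sketched roughly as below. This is only an illustrative Python model, not Kafka code: the `Record` class, `rank` and `compact` are hypothetical names, header values are assumed to be integers, and the rule "compare by offset when either record lacks the header" is one possible reading of the "defaults to 'offset'" fallback described in the quoted mail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    # Hypothetical, simplified stand-in for a log record.
    key: str
    offset: int
    timestamp: int
    headers: Dict[str, int] = field(default_factory=dict)


def rank(record: Record, strategy: str, header_key: Optional[str]) -> Optional[int]:
    """Return the value the strategy compares on, or None if it is unavailable."""
    if strategy == "timestamp":
        return record.timestamp
    if strategy == "header":
        return record.headers.get(header_key)  # None when the header is absent
    return record.offset


def compact(records: List[Record], strategy: str = "offset",
            header_key: Optional[str] = None) -> List[Record]:
    """Keep one record per key, chosen according to the strategy."""
    kept: Dict[str, Record] = {}
    for rec in records:  # records are assumed to arrive in offset order
        old = kept.get(rec.key)
        if old is None:
            kept[rec.key] = rec
            continue
        new_rank = rank(rec, strategy, header_key)
        old_rank = rank(old, strategy, header_key)
        if new_rank is None or old_rank is None:
            # Assumed fallback: compare by offset when the header is missing.
            new_rank, old_rank = rec.offset, old.offset
        # On a tie the later record (higher offset) wins.
        if new_rank >= old_rank:
            kept[rec.key] = rec
    return sorted(kept.values(), key=lambda r: r.offset)
```

With `compaction.strategy: header` and `compaction.strategy.header: xyz`, the call would be `compact(log, "header", "xyz")`; with the default, `compact(log)` behaves like today's offset-based compaction.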

-Matthias

On 6/25/18 1:25 AM, Luís Cabral wrote:
>  Hi,
> 
> So, is everyone OK using the approach with 2 properties?
> 
> E.g.:
> 
> Scenario 1:
>     compaction.strategy: offset
> 
>     :- Behaviour is the same as what currently exists, where the compaction 
> is done only via the 'offset'
> 
> 
> Scenario 2:
>     compaction.strategy: timestamp
> 
>     :- Similar to 'offset', but the record timestamp is used instead
> 
> 
> Scenario 3:
>     compaction.strategy: header
> 
>     compaction.strategy.header: xyz
> 
>     :- Searches the headers for 'xyz' key when performing the compaction. 
> Defaults to 'offset' strategy if this header does not exist (special note on 
> the '.header' suffix, as this would allow additional strategies to add 
> whatever extra configuration they need).
> 
> Scenario 4 (hypothetical future):
>     compaction.strategy: foo
> 
>     compaction.strategy.foo.name: bar
>     compaction.strategy.foo.order: DESC
>     compaction.strategy.foo.fallback: timestamp
> 
> 
>     :- This one is just to show what I meant with the '.header' suffix 
> mentioned in {Scenario 3}
> 
> 
> 
> Regards,
> Luís
> 
> 
> On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang 
>  wrote:  
>  
>  Hi Matthias,
> 
> Yes, we are effectively assigning the whole space of Strings minus
> current preserved ones as header keys; honestly I think in practice users
> wanting to use `_something_` would be very rare, but I admit it may still
> be possible in theory.
> 
> I think Luis' point about "header=" is that having an expression
> evaluation as the config value is a bit weird, and thinking about it twice
> it is still not flawless: we can still argue that we are effectively
> assigning the whole sub-space of "header=*" of Strings for headers, and
> what if users want to use preserved value falling into that sub-space
> (again, should not really happen in practice, just being paranoid here).
> 
> It seems that two configs are the common choice that everyone is happy with.
> 
> Guozhang
> 
> 
> On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax 
> wrote:
> 
>> Luis,
>>
>> I meant to update the "Rejected Alternative" sections, what you have
>> done already. Thx.
>>
>> Originally, I also had the idea about a second config, but thought it
>> might be easier to just change the allowed values to be `offset`,
>> `timestamp`, `header=`. (We try to keep the number of configs small
>> if possible, as more configs are more confusing to users.)
>>
>> I don't think that using `_offset_`, `_timestamp_` and `` solves
>> the problem because users still might use `_something_` as header key --
>> and if we want to introduce a new compaction strategy "something" later
>> we face the same issues as without the underscores. We only reduce the
>> likelihood that it happens.
>>
>> Using `header=` as prefix or introducing a second config, that is only
>> effective if the strategy is set to `header` seems to be a cleaner
>> solution.
>>
>> @Luis: why do you think that using `header=` is an "incorrect
>> approach"?
>>
>>> Though I would still prefer to keep it as it is, as it's a much simpler
>>> and cleaner approach – I’m not so sure that a potential client would
>>> really be so inconvenienced for having to use “_offset_” or
>>> “_timestamp_” as a header
>>
>> I don't think that it's about the issue that people cannot use
>> `_offset_` or `_timestamp_` in their header (by "use" I mean for
>> compaction). With the current KIP, they cannot use `offset` or
>> `timestamp` either. The issue is, that we cannot introduce a new system
>> supported compaction strategy in the future without potentially breaking
>> something, as we basically assign the whole space of Strings (minus
>> `offset`, `timestamp`) as valid configs to enable header based compaction.
>>
>> Personally, I prefer either adding a config or going with
>> `header=`. Using `_timestamp_`, `_offset_`, and `` might be
>> good enough (even if this is the solution I like least)---for this case,
>> we should state explicitly, that the whole space of `_*_` is reserved
>> and users are not allowed to set those for header compaction. In fact, I
>> would also add a check for the config that only allows for `_offset_`
>> and `_timestamp_` and throws an exception for all other `_*_` c

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-25 Thread Luís Cabral
 Hi,

So, is everyone OK using the approach with 2 properties?

E.g.:

Scenario 1:
    compaction.strategy: offset

    :- Behaviour is the same as what currently exists, where the compaction is 
done only via the 'offset'


Scenario 2:
    compaction.strategy: timestamp

    :- Similar to 'offset', but the record timestamp is used instead


Scenario 3:
    compaction.strategy: header

    compaction.strategy.header: xyz

    :- Searches the headers for 'xyz' key when performing the compaction. 
Defaults to 'offset' strategy if this header does not exist (special note on 
the '.header' suffix, as this would allow additional strategies to add whatever 
extra configuration they need).

Scenario 4 (hypothetical future):
    compaction.strategy: foo

    compaction.strategy.foo.name: bar
    compaction.strategy.foo.order: DESC
    compaction.strategy.foo.fallback: timestamp


    :- This one is just to show what I meant with the '.header' suffix 
mentioned in {Scenario 3}
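
As a rough illustration of Scenarios 1-3, the two properties could be resolved along these lines. This is a Python sketch under stated assumptions, not the actual broker code: `resolve_strategy` is a hypothetical helper, defaulting to 'offset' when nothing is set is assumed from Scenario 1, and rejecting a 'header' strategy without a header name is an assumption the proposal does not spell out.

```python
from typing import Mapping, Optional, Tuple

STRATEGY_KEY = "compaction.strategy"
HEADER_KEY = "compaction.strategy.header"
BUILT_IN = ("offset", "timestamp", "header")


def resolve_strategy(props: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (strategy, header name or None) from the two properties."""
    # Assumption: an unset strategy keeps the existing offset-based behaviour.
    strategy = props.get(STRATEGY_KEY, "offset")
    if strategy not in BUILT_IN:
        raise ValueError(f"unknown {STRATEGY_KEY}: {strategy!r}")
    if strategy != "header":
        # The second property is only considered for the 'header' strategy.
        return strategy, None
    header = props.get(HEADER_KEY, "")
    if not header:
        # Assumption: a header strategy without a header name is an error.
        raise ValueError(f"{HEADER_KEY} must be set when {STRATEGY_KEY}=header")
    return strategy, header
```

Because the header name lives in its own property, a header literally called "offset" or "timestamp" no longer clashes with the strategy names, and a future strategy such as 'foo' can read its own `compaction.strategy.foo.*` properties.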



Regards,
Luís


On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang 
 wrote:  
 
 Hi Matthias,

Yes, we are effectively assigning the whole space of Strings minus
current preserved ones as header keys; honestly I think in practice users
wanting to use `_something_` would be very rare, but I admit it may still
be possible in theory.

I think Luis' point about "header=" is that having an expression
evaluation as the config value is a bit weird, and thinking about it twice
it is still not flawless: we can still argue that we are effectively
assigning the whole sub-space of "header=*" of Strings for headers, and
what if users want to use preserved value falling into that sub-space
(again, should not really happen in practice, just being paranoid here).

It seems that two configs are the common choice that everyone is happy with.

Guozhang


On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax 
wrote:

> Luis,
>
> I meant to update the "Rejected Alternative" sections, what you have
> done already. Thx.
>
> Originally, I also had the idea about a second config, but thought it
> might be easier to just change the allowed values to be `offset`,
> `timestamp`, `header=`. (We try to keep the number of configs small
> if possible, as more configs are more confusing to users.)
>
> I don't think that using `_offset_`, `_timestamp_` and `` solves
> the problem because users still might use `_something_` as header key --
> and if we want to introduce a new compaction strategy "something" later
> we face the same issues as without the underscores. We only reduce the
> likelihood that it happens.
>
> Using `header=` as prefix or introducing a second config, that is only
> effective if the strategy is set to `header` seems to be a cleaner
> solution.
>
> @Luis: why do you think that using `header=` is an "incorrect
> approach"?
>
> > Though I would still prefer to keep it as it is, as it's a much simpler
> > and cleaner approach – I’m not so sure that a potential client would
> > really be so inconvenienced for having to use “_offset_” or
> > “_timestamp_” as a header
>
> I don't think that it's about the issue that people cannot use
> `_offset_` or `_timestamp_` in their header (by "use" I mean for
> compaction). With the current KIP, they cannot use `offset` or
> `timestamp` either. The issue is, that we cannot introduce a new system
> supported compaction strategy in the future without potentially breaking
> something, as we basically assign the whole space of Strings (minus
> `offset`, `timestamp`) as valid configs to enable header based compaction.
>
> Personally, I prefer either adding a config or going with
> `header=`. Using `_timestamp_`, `_offset_`, and `` might be
> good enough (even if this is the solution I like least)---for this case,
> we should state explicitly, that the whole space of `_*_` is reserved
> and users are not allowed to set those for header compaction. In fact, I
> would also add a check for the config that only allows for `_offset_`
> and `_timestamp_` and throws an exception for all other `_*_` configs.
>
>
> -Matthias
>
>
> On 6/18/18 2:03 PM, Luís Cabral wrote:
> > I’m ok with that...
> >
> > Ted / Matthias?
> >
> >
> > From: Guozhang Wang
> > Sent: 18 June 2018 22:49
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > How about make the preserved values to be "_offset_" and "_timestamp_"
> > then? Currently in the KIP they are reserved as "offset" and "timestamp".
> >
> >
> > Guozhang
> >
> > On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
> 
> > wrote:
> >
> >> Hi Guozhang,
> >>
> >

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Guozhang Wang
Hi Matthias,

Yes, we are effectively assigning the whole space of Strings minus
current preserved ones as header keys; honestly I think in practice users
wanting to use `_something_` would be very rare, but I admit it may still
be possible in theory.

I think Luis' point about "header=" is that having an expression
evaluation as the config value is a bit weird, and thinking about it twice
it is still not flawless: we can still argue that we are effectively
assigning the whole sub-space of "header=*" of Strings for headers, and
what if users want to use preserved value falling into that sub-space
(again, should not really happen in practice, just being paranoid here).
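
To illustrate the sub-space concern, a single-config parser for the "header=" variant might look like the sketch below. It is only a hypothetical Python model (`parse_strategy` is not an existing Kafka function), and it assumes the header name is whatever follows the "header=" prefix.

```python
from typing import Optional, Tuple

PREFIX = "header="


def parse_strategy(value: str) -> Tuple[str, Optional[str]]:
    """Parse a single config value into (strategy, header name or None)."""
    if value in ("offset", "timestamp"):
        return value, None
    if value.startswith(PREFIX) and len(value) > len(PREFIX):
        # Everything after the prefix is taken verbatim as the header name.
        return "header", value[len(PREFIX):]
    raise ValueError(f"unsupported compaction strategy: {value!r}")
```

Every string starting with "header=" is now claimed for header names, so the config value itself has to be parsed rather than simply compared, which is the "expression evaluation" objection raised earlier in the thread.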

It seems that two configs are the common choice that everyone is happy with.

Guozhang


On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax 
wrote:

> Luis,
>
> I meant to update the "Rejected Alternative" sections, what you have
> done already. Thx.
>
> Originally, I also had the idea about a second config, but thought it
> might be easier to just change the allowed values to be `offset`,
> `timestamp`, `header=`. (We try to keep the number of configs small
> if possible, as more configs are more confusing to users.)
>
> I don't think that using `_offset_`, `_timestamp_` and `` solves
> the problem because users still might use `_something_` as header key --
> and if we want to introduce a new compaction strategy "something" later
> we face the same issues as without the underscores. We only reduce the
> likelihood that it happens.
>
> Using `header=` as prefix or introducing a second config, that is only
> effective if the strategy is set to `header` seems to be a cleaner
> solution.
>
> @Luis: why do you think that using `header=` is an "incorrect
> approach"?
>
> > Though I would still prefer to keep it as it is, as it's a much simpler
> > and cleaner approach – I’m not so sure that a potential client would
> > really be so inconvenienced for having to use “_offset_” or
> > “_timestamp_” as a header
>
> I don't think that it's about the issue that people cannot use
> `_offset_` or `_timestamp_` in their header (by "use" I mean for
> compaction). With the current KIP, they cannot use `offset` or
> `timestamp` either. The issue is, that we cannot introduce a new system
> supported compaction strategy in the future without potentially breaking
> something, as we basically assign the whole space of Strings (minus
> `offset`, `timestamp`) as valid configs to enable header based compaction.
>
> Personally, I prefer either adding a config or going with
> `header=`. Using `_timestamp_`, `_offset_`, and `` might be
> good enough (even if this is the solution I like least)---for this case,
> we should state explicitly, that the whole space of `_*_` is reserved
> and users are not allowed to set those for header compaction. In fact, I
> would also add a check for the config that only allows for `_offset_`
> and `_timestamp_` and throws an exception for all other `_*_` configs.
>
>
> -Matthias
>
>
> On 6/18/18 2:03 PM, Luís Cabral wrote:
> > I’m ok with that...
> >
> > Ted / Matthias?
> >
> >
> > From: Guozhang Wang
> > Sent: 18 June 2018 22:49
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > How about make the preserved values to be "_offset_" and "_timestamp_"
> > then? Currently in the KIP they are reserved as "offset" and "timestamp".
> >
> >
> > Guozhang
> >
> > On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
> 
> > wrote:
> >
> >> Hi Guozhang,
> >>
> >> Yes, that is what I meant (separate configs).
> >> Though I would still prefer to keep it as it is, as its a much simpler
> and
> >> cleaner approach – I’m not so sure that a potential client would really
> be
> >> so inconvenienced for having to use “_offset” or “_timestamp_” as a
> header
> >>
> >> Cheers,
> >> Luís
> >>
> >>
> >> From: Guozhang Wang
> >> Sent: 18 June 2018 19:35
> >> To: dev@kafka.apache.org
> >> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>
> >> Hello Luís,
> >>
> >> I agree that having an expression evaluation as a config value is not
> the
> >> best approach; if there are better ideas to allow users to specify the
> >> header key which happen to be the same as the preserved config values
> >> "offset" and "timestamp" (although the likelihood may be small, as Ted
> >> mentioned there may be more preserve

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Matthias J. Sax
Luis,

I meant to update the "Rejected Alternatives" section, which you have
already done. Thx.

Originally, I also had the idea about a second config, but thought it
might be easier to just change the allowed values to be `offset`,
`timestamp`, `header=`. (We try to keep the number of configs small
if possible, as more configs are more confusing to users.)

I don't think that using `_offset_`, `_timestamp_` and `` solves
the problem because users still might use `_something_` as header key --
and if we want to introduce a new compaction strategy "something" later
we face the same issues as without the underscores. We only reduce the
likelihood that it happens.

Using `header=` as prefix or introducing a second config, that is only
effective if the strategy is set to `header` seems to be a cleaner solution.

@Luis: why do you think that using `header=` is an "incorrect
approach"?

> Though I would still prefer to keep it as it is, as it's a much simpler and
> cleaner approach – I’m not so sure that a potential client would
> really be so inconvenienced for having to use “_offset_” or
> “_timestamp_” as a header

I don't think that it's about the issue that people cannot use
`_offset_` or `_timestamp_` in their header (by "use" I mean for
compaction). With the current KIP, they cannot use `offset` or
`timestamp` either. The issue is, that we cannot introduce a new system
supported compaction strategy in the future without potentially breaking
something, as we basically assign the whole space of Strings (minus
`offset`, `timestamp`) as valid configs to enable header based compaction.

Personally, I prefer either adding a config or going with
`header=`. Using `_timestamp_`, `_offset_`, and `` might be
good enough (even if this is the solution I like least)---for this case,
we should state explicitly, that the whole space of `_*_` is reserved
and users are not allowed to set those for header compaction. In fact, I
would also add a check for the config that only allows for `_offset_`
and `_timestamp_` and throws an exception for all other `_*_` configs.
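
A minimal sketch of such a check, in Python for illustration only: `parse_single_config` is a hypothetical name, and treating any value that starts and ends with an underscore as reserved is an assumed interpretation of `_*_`.

```python
import re
from typing import Optional, Tuple

RESERVED = re.compile(r"^_.*_$")          # the whole `_*_` space
BUILT_IN = {"_offset_", "_timestamp_"}    # the only reserved values accepted today


def parse_single_config(value: str) -> Tuple[str, Optional[str]]:
    """Return (strategy, header name or None) for the single-config variant."""
    if not value:
        raise ValueError("compaction strategy must not be empty")
    if value in BUILT_IN:
        return value.strip("_"), None
    if RESERVED.match(value):
        # Reserved for future strategies, so it cannot be used as a header name.
        raise ValueError(f"{value!r} is reserved and not a supported strategy")
    return "header", value
```

Rejecting unknown `_*_` values up front means a later `_something_` strategy could be added without silently changing the meaning of an existing topic config.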


-Matthias


On 6/18/18 2:03 PM, Luís Cabral wrote:
> I’m ok with that...
> 
> Ted / Matthias?
> 
> 
> From: Guozhang Wang
> Sent: 18 June 2018 22:49
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> 
> How about make the preserved values to be "_offset_" and "_timestamp_"
> then? Currently in the KIP they are reserved as "offset" and "timestamp".
> 
> 
> Guozhang
> 
> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral 
> wrote:
> 
>> Hi Guozhang,
>>
>> Yes, that is what I meant (separate configs).
>> Though I would still prefer to keep it as it is, as its a much simpler and
>> cleaner approach – I’m not so sure that a potential client would really be
>> so inconvenienced for having to use “_offset” or “_timestamp_” as a header
>>
>> Cheers,
>> Luís
>>
>>
>> From: Guozhang Wang
>> Sent: 18 June 2018 19:35
>> To: dev@kafka.apache.org
>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>>
>> Hello Luís,
>>
>> I agree that having an expression evaluation as a config value is not the
>> best approach; if there are better ideas to allow users to specify the
>> header key which happen to be the same as the preserved config values
>> "offset" and "timestamp" (although the likelihood may be small, as Ted
>> mentioned there may be more preserved config values added in the future),
>> then I'd be happily follow the suggestions. For example, we could have the
>> config value for header keys as "header-"? Is that what you've
>> suggested? Or do you suggest using two configs instead, and the second
>> config specifying the key name, and will only be considered if the first
>> (i.e. current proposed) config's value is `header`, otherwise be ignored?
>>
>>
>> Guozhang
>>
>>
>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
>> >> wrote:
>>
>>>  Hi Ted / Guozhang / Matthias,
>>>
>>> @Ted: I've now added your argument to the "Rejected Alternatives" portion
>>> of the KIP. Please keep in mind that I would like to keep this as
>> backwards
>>> compatible as possible, so a lot of decisions are inferred from that
>> intent.
>>>
>>> @Guozhang: IMHO, adding expression evaluation to configuration is an
>>> incorrect approach. If you absolutely insist on having this clear
>>> distinction between header/key, then I would suggest instead to have a
>>> dedicated property for the "key" part. Of course, this is your projec

RE: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Luís Cabral
I’m ok with that...

Ted / Matthias?


From: Guozhang Wang
Sent: 18 June 2018 22:49
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

How about making the reserved values "_offset_" and "_timestamp_"
then? Currently in the KIP they are reserved as "offset" and "timestamp".


Guozhang

On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral 
wrote:

> Hi Guozhang,
>
> Yes, that is what I meant (separate configs).
> Though I would still prefer to keep it as it is, as its a much simpler and
> cleaner approach – I’m not so sure that a potential client would really be
> so inconvenienced for having to use “_offset” or “_timestamp_” as a header
>
> Cheers,
> Luís
>
>
> From: Guozhang Wang
> Sent: 18 June 2018 19:35
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hello Luís,
>
> I agree that having an expression evaluation as a config value is not the
> best approach; if there are better ideas to allow users to specify the
> header key which happen to be the same as the preserved config values
> "offset" and "timestamp" (although the likelihood may be small, as Ted
> mentioned there may be more preserved config values added in the future),
> then I'd be happily follow the suggestions. For example, we could have the
> config value for header keys as "header-"? Is that what you've
> suggested? Or do you suggest using two configs instead, and the second
> config specifying the key name, and will only be considered if the first
> (i.e. current proposed) config's value is `header`, otherwise be ignored?
>
>
> Guozhang
>
>
> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
>  > wrote:
>
> >  Hi Ted / Guozhang / Matthias,
> >
> > @Ted: I've now added your argument to the "Rejected Alternatives" portion
> > of the KIP. Please keep in mind that I would like to keep this as
> backwards
> > compatible as possible, so a lot of decisions are inferred from that
> intent.
> >
> > @Guozhang: IMHO, adding expression evaluation to configuration is an
> > incorrect approach. If you absolutely insist on having this clear
> > distinction between header/key, then I would suggest instead to have a
> > dedicated property for the "key" part. Of course, this is your project so
> > I'll just continue whatever approach moves this KIP forward...
> >
> > @Matthias: Sorry, but update the KIP according to what?
> >
> > Cheers,
> > Luís
> >
> > On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> > matth...@confluent.io> wrote:
> >
> >  Well, for "offset" and "timestamp" policy, not communication between
> > both is required.
> >
> > Only if headers are used, user A should communicate the corresponding
> > header key to user B.
> >
> >
> > @Luis: can you update the KIP accordingly?
> >
> >
> >
> > -Matthias
> >
> > On 6/17/18 5:36 PM, Ted Yu wrote:
> > > My previous reply was just an alternative for consideration.
> > >
> > > bq.  than a second user B can add a header with key "offset" and thus
> > break
> > > the intention of user A
> > >
> > > I didn't see such scenario after reading the KIP. Maybe add this as
> > > reasoning for the current approach ?
> > >
> > > I wonder how user B gets to know the intention of user A. Meaning, if
> > user
> > > B doesn't follow the norm set by user A, there still would be issue,
> > right ?
> > >
> > >
> > > On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> Let me rephrase your answer to make sure I understand what you
> suggest:
> > >>
> > >> If compaction strategy is configured to use "offset", and if there is
> a
> > >> header in the record with `key == offset`, than we should use the
> value
> > >> of the record header instead of the actual record offset?
> > >>
> > >> Do I understand this correctly? If yes, what is the advantage of doing
> > >> this? From my point of view, it might be problematic, because if user
> A
> > >> creates a topic and configures "offset" compaction (with the intend
> that
> > >> the record offset should be uses), than a second user B can add a
> header
> > >> with key "offset" and thus break the intention of user A.
> > >>
> > >> Also, if existing to

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Guozhang Wang
How about making the reserved values "_offset_" and "_timestamp_"
then? Currently in the KIP they are reserved as "offset" and "timestamp".


Guozhang

On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral 
wrote:

> Hi Guozhang,
>
> Yes, that is what I meant (separate configs).
> Though I would still prefer to keep it as it is, as its a much simpler and
> cleaner approach – I’m not so sure that a potential client would really be
> so inconvenienced for having to use “_offset” or “_timestamp_” as a header
>
> Cheers,
> Luís
>
>
> From: Guozhang Wang
> Sent: 18 June 2018 19:35
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>
> Hello Luís,
>
> I agree that having an expression evaluation as a config value is not the
> best approach; if there are better ideas to allow users to specify the
> header key which happen to be the same as the preserved config values
> "offset" and "timestamp" (although the likelihood may be small, as Ted
> mentioned there may be more preserved config values added in the future),
> then I'd be happily follow the suggestions. For example, we could have the
> config value for header keys as "header-"? Is that what you've
> suggested? Or do you suggest using two configs instead, and the second
> config specifying the key name, and will only be considered if the first
> (i.e. current proposed) config's value is `header`, otherwise be ignored?
>
>
> Guozhang
>
>
> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
>  > wrote:
>
> >  Hi Ted / Guozhang / Matthias,
> >
> > @Ted: I've now added your argument to the "Rejected Alternatives" portion
> > of the KIP. Please keep in mind that I would like to keep this as
> backwards
> > compatible as possible, so a lot of decisions are inferred from that
> intent.
> >
> > @Guozhang: IMHO, adding expression evaluation to configuration is an
> > incorrect approach. If you absolutely insist on having this clear
> > distinction between header/key, then I would suggest instead to have a
> > dedicated property for the "key" part. Of course, this is your project so
> > I'll just continue whatever approach moves this KIP forward...
> >
> > @Matthias: Sorry, but update the KIP according to what?
> >
> > Cheers,
> > Luís
> >
> > On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> > matth...@confluent.io> wrote:
> >
> >  Well, for "offset" and "timestamp" policy, not communication between
> > both is required.
> >
> > Only if headers are used, user A should communicate the corresponding
> > header key to user B.
> >
> >
> > @Luis: can you update the KIP accordingly?
> >
> >
> >
> > -Matthias
> >
> > On 6/17/18 5:36 PM, Ted Yu wrote:
> > > My previous reply was just an alternative for consideration.
> > >
> > > bq.  than a second user B can add a header with key "offset" and thus
> > break
> > > the intention of user A
> > >
> > > I didn't see such scenario after reading the KIP. Maybe add this as
> > > reasoning for the current approach ?
> > >
> > > I wonder how user B gets to know the intention of user A. Meaning, if
> > user
> > > B doesn't follow the norm set by user A, there still would be issue,
> > right ?
> > >
> > >
> > > On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> Let me rephrase your answer to make sure I understand what you
> suggest:
> > >>
> > >> If compaction strategy is configured to use "offset", and if there is
> a
> > >> header in the record with `key == offset`, than we should use the
> value
> > >> of the record header instead of the actual record offset?
> > >>
> > >> Do I understand this correctly? If yes, what is the advantage of doing
> > >> this? From my point of view, it might be problematic, because if user
> A
> > >> creates a topic and configures "offset" compaction (with the intend
> that
> > >> the record offset should be uses), than a second user B can add a
> header
> > >> with key "offset" and thus break the intention of user A.
> > >>
> > >> Also, if existing topics might have data with record header key
> > >> "offset", the change would not be backward compatible either.
> > >>
> > > >

RE: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Luís Cabral
Hi Guozhang,

Yes, that is what I meant (separate configs).
Though I would still prefer to keep it as it is, as it's a much simpler and
cleaner approach – I’m not so sure that a potential client would really be so
inconvenienced for having to use “_offset_” or “_timestamp_” as a header.

Cheers,
Luís


From: Guozhang Wang
Sent: 18 June 2018 19:35
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hello Luís,

I agree that having an expression evaluation as a config value is not the
best approach; if there are better ideas to allow users to specify the
header key which happens to be the same as the preserved config values
"offset" and "timestamp" (although the likelihood may be small, as Ted
mentioned there may be more preserved config values added in the future),
then I'd happily follow the suggestions. For example, we could have the
config value for header keys as "header-"? Is that what you've
suggested? Or do you suggest using two configs instead, and the second
config specifying the key name, and will only be considered if the first
(i.e. current proposed) config's value is `header`, otherwise be ignored?


Guozhang


On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral  wrote:

>  Hi Ted / Guozhang / Matthias,
>
> @Ted: I've now added your argument to the "Rejected Alternatives" portion
> of the KIP. Please keep in mind that I would like to keep this as backwards
> compatible as possible, so a lot of decisions are inferred from that intent.
>
> @Guozhang: IMHO, adding expression evaluation to configuration is an
> incorrect approach. If you absolutely insist on having this clear
> distinction between header/key, then I would suggest instead to have a
> dedicated property for the "key" part. Of course, this is your project so
> I'll just continue whatever approach moves this KIP forward...
>
> @Matthias: Sorry, but update the KIP according to what?
>
> Cheers,
> Luís
>
> On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> matth...@confluent.io> wrote:
>
>  Well, for "offset" and "timestamp" policy, no communication between
> both is required.
>
> Only if headers are used, user A should communicate the corresponding
> header key to user B.
>
>
> @Luis: can you update the KIP accordingly?
>
>
>
> -Matthias
>
> On 6/17/18 5:36 PM, Ted Yu wrote:
> > My previous reply was just an alternative for consideration.
> >
> > bq.  than a second user B can add a header with key "offset" and thus
> break
> > the intention of user A
> >
> > I didn't see such scenario after reading the KIP. Maybe add this as
> > reasoning for the current approach ?
> >
> > I wonder how user B gets to know the intention of user A. Meaning, if
> user
> > B doesn't follow the norm set by user A, there still would be issue,
> right ?
> >
> >
> > On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
> > wrote:
> >
> >> Let me rephrase your answer to make sure I understand what you suggest:
> >>
> >> If compaction strategy is configured to use "offset", and if there is a
> >> header in the record with `key == offset`, then we should use the value
> >> of the record header instead of the actual record offset?
> >>
> >> Do I understand this correctly? If yes, what is the advantage of doing
> >> this? From my point of view, it might be problematic, because if user A
> >> creates a topic and configures "offset" compaction (with the intent that
> >> the record offset should be used), then a second user B can add a header
> >> with key "offset" and thus break the intention of user A.
> >>
> >> Also, if existing topics might have data with record header key
> >> "offset", the change would not be backward compatible either.
> >>
> >>
> >> -Matthias
> >>
> >> On 6/16/18 6:59 PM, Ted Yu wrote:
> >>> Pardon the brevity in my previous reply.
> >>> I was talking about this bullet:
> >>>
> >>> bq. When this configuration is set to anything other than "*offset*"
> or "
> >>> *timestamp*", then the record headers are scanned for a key matching
> this
> >>> value.
> >>>
> >>> My point is that if matching key in the header is found, its value
> should
> >>> take precedence over the value of the configuration.
> >>> I understand that such interpretation may have slight performance cost.
> >>>
> >>> Cheers
> >>>
> >>> On Sat, Jun 16, 2018 at 6:

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Guozhang Wang
Hello Luís,

I agree that having an expression evaluation as a config value is not the
best approach; if there are better ideas for letting users specify a
header key that happens to be the same as one of the reserved config values
"offset" and "timestamp" (the likelihood may be small but, as Ted
mentioned, more reserved config values may be added in the future),
then I'd happily follow the suggestions. For example, we could have the
config value for header keys as "header-"? Is that what you've
suggested? Or do you suggest using two configs instead, where the second
config specifies the key name and is only considered if the first
(i.e. currently proposed) config's value is `header`, and is ignored otherwise?
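
To make the two-config alternative concrete, here is a minimal sketch of how
the pair of values could be resolved. It is written in Python for brevity and
is not taken from the KIP or from any Kafka code; the function name
`resolve_strategy` and both parameter names are hypothetical.

```python
from typing import Optional, Tuple


def resolve_strategy(strategy: str,
                     header_key: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Resolve the two-config variant (hypothetical names).

    `strategy` stands for the currently proposed config; `header_key` stands
    for the second config and is only consulted when strategy == "header".
    """
    if strategy in ("offset", "timestamp"):
        # The second config is ignored for the built-in strategies.
        return (strategy, None)
    if strategy == "header":
        if not header_key:
            raise ValueError("strategy 'header' needs a header key")
        return ("header", header_key)
    raise ValueError(f"unknown compaction strategy: {strategy!r}")


# A header that is itself named "offset" no longer clashes with the
# built-in strategy of the same name.
print(resolve_strategy("header", "offset"))
print(resolve_strategy("offset", "anything"))
```

With this shape, any value other than the three known ones is rejected, so a
strategy added later cannot be selected by accident.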


Guozhang


On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral  wrote:

>  Hi Ted / Guozhang / Matthias,
>
> @Ted: I've now added your argument to the "Rejected Alternatives" portion
> of the KIP. Please keep in mind that I would like to keep this as backwards
> compatible as possible, so a lot of decisions are inferred from that intent.
>
> @Guozhang: IMHO, adding expression evaluation to configuration is an
> incorrect approach. If you absolutely insist on having this clear
> distinction between header/key, then I would suggest instead to have a
> dedicated property for the "key" part. Of course, this is your project so
> I'll just continue whatever approach moves this KIP forward...
>
> @Matthias: Sorry, but update the KIP according to what?
>
> Cheers,
> Luís
>
> On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> matth...@confluent.io> wrote:
>
>  Well, for "offset" and "timestamp" policy, not communication between
> both is required.
>
> Only if headers are used, user A should communicate the corresponding
> header key to user B.
>
>
> @Luis: can you update the KIP accordingly?
>
>
>
> -Matthias
>
> On 6/17/18 5:36 PM, Ted Yu wrote:
> > My previous reply was just an alternative for consideration.
> >
> > bq.  than a second user B can add a header with key "offset" and thus
> break
> > the intention of user A
> >
> > I didn't see such scenario after reading the KIP. Maybe add this as
> > reasoning for the current approach ?
> >
> > I wonder how user B gets to know the intention of user A. Meaning, if
> user
> > B doesn't follow the norm set by user A, there still would be issue,
> right ?
> >
> >
> > On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
> > wrote:
> >
> >> Let me rephrase your answer to make sure I understand what you suggest:
> >>
> >> If compaction strategy is configured to use "offset", and if there is a
> >> header in the record with `key == offset`, than we should use the value
> >> of the record header instead of the actual record offset?
> >>
> >> Do I understand this correctly? If yes, what is the advantage of doing
> >> this? From my point of view, it might be problematic, because if user A
> >> creates a topic and configures "offset" compaction (with the intend that
> >> the record offset should be uses), than a second user B can add a header
> >> with key "offset" and thus break the intention of user A.
> >>
> >> Also, if existing topics might have data with record header key
> >> "offset", the change would not be backward compatible either.
> >>
> >>
> >> -Matthias
> >>
> >> On 6/16/18 6:59 PM, Ted Yu wrote:
> >>> Pardon the brevity in my previous reply.
> >>> I was talking about this bullet:
> >>>
> >>> bq. When this configuration is set to anything other than "*offset*"
> or "
> >>> *timestamp*", then the record headers are scanned for a key matching
> this
> >>> value.
> >>>
> >>> My point is that if matching key in the header is found, its value
> should
> >>> take precedence over the value of the configuration.
> >>> I understand that such interpretation may have slight performance cost.
> >>>
> >>> Cheers
> >>>
> >>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
> matth...@confluent.io>
> >>> wrote:
> >>>
> >>>> Ted,
> >>>>
> >>>> I am also not sure what you mean by "Shouldn't the selection in header
> >>>> have higher precedence over the configuration"? What selection do you
> >>>> mean? And want configuration?
> >>>>

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-18 Thread Luís Cabral
 Hi Ted / Guozhang / Matthias,

@Ted: I've now added your argument to the "Rejected Alternatives" portion of 
the KIP. Please keep in mind that I would like to keep this as backwards 
compatible as possible, so a lot of decisions are inferred from that intent.

@Guozhang: IMHO, adding expression evaluation to configuration is an incorrect 
approach. If you absolutely insist on having this clear distinction between 
header/key, then I would instead suggest having a dedicated property for the 
"key" part. Of course, this is your project so I'll just continue with whatever 
approach moves this KIP forward...

@Matthias: Sorry, but update the KIP according to what?

Cheers,
Luís

On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax 
 wrote:  
 
 Well, for "offset" and "timestamp" policy, not communication between
both is required.

Only if headers are used, user A should communicate the corresponding
header key to user B.


@Luis: can you update the KIP accordingly?



-Matthias

On 6/17/18 5:36 PM, Ted Yu wrote:
> My previous reply was just an alternative for consideration.
> 
> bq.  than a second user B can add a header with key "offset" and thus break
> the intention of user A
> 
> I didn't see such scenario after reading the KIP. Maybe add this as
> reasoning for the current approach ?
> 
> I wonder how user B gets to know the intention of user A. Meaning, if user
> B doesn't follow the norm set by user A, there still would be issue, right ?
> 
> 
> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
> wrote:
> 
>> Let me rephrase your answer to make sure I understand what you suggest:
>>
>> If compaction strategy is configured to use "offset", and if there is a
>> header in the record with `key == offset`, than we should use the value
>> of the record header instead of the actual record offset?
>>
>> Do I understand this correctly? If yes, what is the advantage of doing
>> this? From my point of view, it might be problematic, because if user A
>> creates a topic and configures "offset" compaction (with the intend that
>> the record offset should be uses), than a second user B can add a header
>> with key "offset" and thus break the intention of user A.
>>
>> Also, if existing topics might have data with record header key
>> "offset", the change would not be backward compatible either.
>>
>>
>> -Matthias
>>
>> On 6/16/18 6:59 PM, Ted Yu wrote:
>>> Pardon the brevity in my previous reply.
>>> I was talking about this bullet:
>>>
>>> bq. When this configuration is set to anything other than "*offset*" or "
>>> *timestamp*", then the record headers are scanned for a key matching this
>>> value.
>>>
>>> My point is that if matching key in the header is found, its value should
>>> take precedence over the value of the configuration.
>>> I understand that such interpretation may have slight performance cost.
>>>
>>> Cheers
>>>
>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax 
>>> wrote:
>>>
>>>> Ted,
>>>>
>>>> I am also not sure what you mean by "Shouldn't the selection in header
>>>> have higher precedence over the configuration"? What selection do you
>>>> mean? And want configuration?
>>>>
>>>>
>>>> About the first point, I think this is actually a valid concern: To
>>>> address this issue, it seems that we would need to change the accepted
>>>> format of the config. Instead of "offset", "timestamp", "",
>>>> we could replace the last one with "header=".
>>>>
>>>> WDYT?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
>>>>> If selection exists in header, the selection should override the config
>>>> value.
>>>>> Cheers
>>>>>  Original message From: Luis Cabral
>>>>  Date: 6/15/18  1:40 AM  (GMT-08:00) To:
>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
>> compaction
>>>>> Hi,
>>>>>
>>>>> bq. Can the value be determined now ? My thinking is that what if there
>>>> is a third compaction strategy proposed in the future ? We should guard
>>>> against user unknowingly choosing the 'future' strategy.
>>>>>
>>>>> The idea is that the header name to use is flexible, which protects

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-17 Thread Matthias J. Sax
Well, for the "offset" and "timestamp" policies, no communication between
the two users is required.

Only if headers are used should user A communicate the corresponding
header key to user B.


@Luis: can you update the KIP accordingly?



-Matthias

On 6/17/18 5:36 PM, Ted Yu wrote:
> My previous reply was just an alternative for consideration.
> 
> bq.  than a second user B can add a header with key "offset" and thus break
> the intention of user A
> 
> I didn't see such scenario after reading the KIP. Maybe add this as
> reasoning for the current approach ?
> 
> I wonder how user B gets to know the intention of user A. Meaning, if user
> B doesn't follow the norm set by user A, there still would be issue, right ?
> 
> 
> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
> wrote:
> 
>> Let me rephrase your answer to make sure I understand what you suggest:
>>
>> If compaction strategy is configured to use "offset", and if there is a
>> header in the record with `key == offset`, than we should use the value
>> of the record header instead of the actual record offset?
>>
>> Do I understand this correctly? If yes, what is the advantage of doing
>> this? From my point of view, it might be problematic, because if user A
>> creates a topic and configures "offset" compaction (with the intend that
>> the record offset should be uses), than a second user B can add a header
>> with key "offset" and thus break the intention of user A.
>>
>> Also, if existing topics might have data with record header key
>> "offset", the change would not be backward compatible either.
>>
>>
>> -Matthias
>>
>> On 6/16/18 6:59 PM, Ted Yu wrote:
>>> Pardon the brevity in my previous reply.
>>> I was talking about this bullet:
>>>
>>> bq. When this configuration is set to anything other than "*offset*" or "
>>> *timestamp*", then the record headers are scanned for a key matching this
>>> value.
>>>
>>> My point is that if matching key in the header is found, its value should
>>> take precedence over the value of the configuration.
>>> I understand that such interpretation may have slight performance cost.
>>>
>>> Cheers
>>>
>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax 
>>> wrote:
>>>
>>>> Ted,
>>>>
>>>> I am also not sure what you mean by "Shouldn't the selection in header
>>>> have higher precedence over the configuration"? What selection do you
>>>> mean? And want configuration?
>>>>
>>>>
>>>> About the first point, I think this is actually a valid concern: To
>>>> address this issue, it seems that we would need to change the accepted
>>>> format of the config. Instead of "offset", "timestamp", "",
>>>> we could replace the last one with "header=".
>>>>
>>>> WDYT?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
>>>>> If selection exists in header, the selection should override the config
>>>> value.
>>>>> Cheers
>>>>>  Original message From: Luis Cabral
>>>>  Date: 6/15/18  1:40 AM  (GMT-08:00) To:
>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
>> compaction
>>>>> Hi,
>>>>>
>>>>> bq. Can the value be determined now ? My thinking is that what if there
>>>> is a third compaction strategy proposed in the future ? We should guard
>>>> against user unknowingly choosing the 'future' strategy.
>>>>>
>>>>> The idea is that the header name to use is flexible, which protects
>>>> current clients that may want to use this from having to adapt their
>>>> already existing header names (they can just specify a new name).
>>>>>
>>>>> bq. Shouldn't the selection in header have higher precedence over the
>>>> configuration ?
>>>>>
>>>>> Not sure what you mean here, could you clarify?
>>>>>
>>>>> bq. Please create JIRA if you haven't already.
>>>>>
>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
>>>>>
>>>>> Cheers,
>>>>> Luís
>>>>>
>>>>>> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
>>>>>>
>>>>>> bq. When this configuration is set to anything other than "*offset*"
>> or
>>>> "
>>>>>> *timestamp*", then the record headers are scanned for a key matching
>>>> this
>>>>>> value.
>>>>>>
>>>>>> Can the value be determined now ? My thinking is that what if there
>> is a
>>>>>> third compaction strategy proposed in the future ? We should guard
>>>> against
>>>>>> user unknowingly choosing the 'future' strategy.
>>>>>>
>>>>>> bq. If this header is found
>>>>>>
>>>>>> Shouldn't the selection in header have higher precedence over the
>>>> configuration
>>>>>> ?
>>>>>>
>>>>>> Please create JIRA if you haven't already.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
>>>> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Any takers on having a look at this KIP and voting on it?
>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>> 280%3A+Enhanced+log+compaction
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Luis
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 





Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-17 Thread Guozhang Wang
I think refactoring the value `header-key` to `header=` is a
better idea, as it allows users to specify a header key that happens to
have the same name as either `offset` or `timestamp`.


Guozhang

On Sun, Jun 17, 2018 at 5:36 PM, Ted Yu  wrote:

> My previous reply was just an alternative for consideration.
>
> bq.  than a second user B can add a header with key "offset" and thus break
> the intention of user A
>
> I didn't see such scenario after reading the KIP. Maybe add this as
> reasoning for the current approach ?
>
> I wonder how user B gets to know the intention of user A. Meaning, if user
> B doesn't follow the norm set by user A, there still would be issue, right
> ?
>
>
> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
> wrote:
>
> > Let me rephrase your answer to make sure I understand what you suggest:
> >
> > If compaction strategy is configured to use "offset", and if there is a
> > header in the record with `key == offset`, than we should use the value
> > of the record header instead of the actual record offset?
> >
> > Do I understand this correctly? If yes, what is the advantage of doing
> > this? From my point of view, it might be problematic, because if user A
> > creates a topic and configures "offset" compaction (with the intend that
> > the record offset should be uses), than a second user B can add a header
> > with key "offset" and thus break the intention of user A.
> >
> > Also, if existing topics might have data with record header key
> > "offset", the change would not be backward compatible either.
> >
> >
> > -Matthias
> >
> > On 6/16/18 6:59 PM, Ted Yu wrote:
> > > Pardon the brevity in my previous reply.
> > > I was talking about this bullet:
> > >
> > > bq. When this configuration is set to anything other than "*offset*"
> or "
> > > *timestamp*", then the record headers are scanned for a key matching
> this
> > > value.
> > >
> > > My point is that if matching key in the header is found, its value
> should
> > > take precedence over the value of the configuration.
> > > I understand that such interpretation may have slight performance cost.
> > >
> > > Cheers
> > >
> > > On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> Ted,
> > >>
> > >> I am also not sure what you mean by "Shouldn't the selection in header
> > >> have higher precedence over the configuration"? What selection do you
> > >> mean? And want configuration?
> > >>
> > >>
> > >> About the first point, I think this is actually a valid concern: To
> > >> address this issue, it seems that we would need to change the accepted
> > >> format of the config. Instead of "offset", "timestamp",
> "",
> > >> we could replace the last one with "header=".
> > >>
> > >> WDYT?
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 6/15/18 3:06 AM, Ted Yu wrote:
> > >>> If selection exists in header, the selection should override the
> config
> > >> value.
> > >>> Cheers
> > >>>  Original message From: Luis Cabral
> > >>  Date: 6/15/18  1:40 AM  (GMT-08:00)
> To:
> > >> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
> > compaction
> > >>> Hi,
> > >>>
> > >>> bq. Can the value be determined now ? My thinking is that what if
> there
> > >> is a third compaction strategy proposed in the future ? We should
> guard
> > >> against user unknowingly choosing the 'future' strategy.
> > >>>
> > >>> The idea is that the header name to use is flexible, which protects
> > >> current clients that may want to use this from having to adapt their
> > >> already existing header names (they can just specify a new name).
> > >>>
> > >>> bq. Shouldn't the selection in header have higher precedence over the
> > >> configuration ?
> > >>>
> > >>> Not sure what you mean here, could you clarify?
> > >>>
> > >>> bq. Please create JIRA if you haven't already.
> > >>>
> > >>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
> > >>>
> > >>>

Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-17 Thread Ted Yu
My previous reply was just an alternative for consideration.

bq.  than a second user B can add a header with key "offset" and thus break
the intention of user A

I didn't see such a scenario after reading the KIP. Maybe add this as
reasoning for the current approach?

I wonder how user B gets to know the intention of user A. Meaning, if user
B doesn't follow the norm set by user A, there would still be an issue, right?


On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax 
wrote:

> Let me rephrase your answer to make sure I understand what you suggest:
>
> If compaction strategy is configured to use "offset", and if there is a
> header in the record with `key == offset`, than we should use the value
> of the record header instead of the actual record offset?
>
> Do I understand this correctly? If yes, what is the advantage of doing
> this? From my point of view, it might be problematic, because if user A
> creates a topic and configures "offset" compaction (with the intend that
> the record offset should be uses), than a second user B can add a header
> with key "offset" and thus break the intention of user A.
>
> Also, if existing topics might have data with record header key
> "offset", the change would not be backward compatible either.
>
>
> -Matthias
>
> On 6/16/18 6:59 PM, Ted Yu wrote:
> > Pardon the brevity in my previous reply.
> > I was talking about this bullet:
> >
> > bq. When this configuration is set to anything other than "*offset*" or "
> > *timestamp*", then the record headers are scanned for a key matching this
> > value.
> >
> > My point is that if matching key in the header is found, its value should
> > take precedence over the value of the configuration.
> > I understand that such interpretation may have slight performance cost.
> >
> > Cheers
> >
> > On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax 
> > wrote:
> >
> >> Ted,
> >>
> >> I am also not sure what you mean by "Shouldn't the selection in header
> >> have higher precedence over the configuration"? What selection do you
> >> mean? And want configuration?
> >>
> >>
> >> About the first point, I think this is actually a valid concern: To
> >> address this issue, it seems that we would need to change the accepted
> >> format of the config. Instead of "offset", "timestamp", "",
> >> we could replace the last one with "header=".
> >>
> >> WDYT?
> >>
> >>
> >> -Matthias
> >>
> >> On 6/15/18 3:06 AM, Ted Yu wrote:
> >>> If selection exists in header, the selection should override the config
> >> value.
> >>> Cheers
> >>>  Original message From: Luis Cabral
> >>  Date: 6/15/18  1:40 AM  (GMT-08:00) To:
> >> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
> compaction
> >>> Hi,
> >>>
> >>> bq. Can the value be determined now ? My thinking is that what if there
> >> is a third compaction strategy proposed in the future ? We should guard
> >> against user unknowingly choosing the 'future' strategy.
> >>>
> >>> The idea is that the header name to use is flexible, which protects
> >> current clients that may want to use this from having to adapt their
> >> already existing header names (they can just specify a new name).
> >>>
> >>> bq. Shouldn't the selection in header have higher precedence over the
> >> configuration ?
> >>>
> >>> Not sure what you mean here, could you clarify?
> >>>
> >>> bq. Please create JIRA if you haven't already.
> >>>
> >>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
> >>>
> >>> Cheers,
> >>> Luís
> >>>
> >>>> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
> >>>>
> >>>> bq. When this configuration is set to anything other than "*offset*"
> or
> >> "
> >>>> *timestamp*", then the record headers are scanned for a key matching
> >> this
> >>>> value.
> >>>>
> >>>> Can the value be determined now ? My thinking is that what if there
> is a
> >>>> third compaction strategy proposed in the future ? We should guard
> >> against
> >>>> user unknowingly choosing the 'future' strategy.
> >>>>
> >>>> bq. If this header is found
> >>>>
> >>>> Shouldn't the selection in header have higher precedence over the
> >> configuration
> >>>> ?
> >>>>
> >>>> Please create JIRA if you haven't already.
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
> >> 
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Any takers on having a look at this KIP and voting on it?
> >>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>> 280%3A+Enhanced+log+compaction
> >>>>>
> >>>>> Cheers,
> >>>>> Luis
> >>>>>
> >>>
> >>
> >>
> >
>
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-17 Thread Matthias J. Sax
Let me rephrase your answer to make sure I understand what you suggest:

If the compaction strategy is configured to use "offset", and if there is a
header in the record with `key == offset`, then we should use the value
of the record header instead of the actual record offset?

Do I understand this correctly? If yes, what is the advantage of doing
this? From my point of view, it might be problematic, because if user A
creates a topic and configures "offset" compaction (with the intent that
the record offset should be used), then a second user B can add a header
with key "offset" and thus break the intention of user A.

Also, if existing topics might have data with record header key
"offset", the change would not be backward compatible either.
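
The concern can be illustrated with a small sketch, written in Python for
brevity. It is a toy model, not the log cleaner: the `Record` class, the
`compact` function, the integer header values and the tie-breaking rule
(higher offset wins on equal versions) are all assumptions made for the
example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    key: str
    offset: int
    headers: Dict[str, int] = field(default_factory=dict)


def compact(records: List[Record],
            version_of: Callable[[Record], int]) -> Dict[str, Record]:
    """Keep, per key, the record with the highest version.

    Ties are broken by the higher offset (an assumption of this sketch).
    """
    survivors: Dict[str, Record] = {}
    for r in records:
        cur = survivors.get(r.key)
        if cur is None or (version_of(r), r.offset) >= (version_of(cur), cur.offset):
            survivors[r.key] = r
    return survivors


def by_offset(r: Record) -> int:
    # What user A intended: plain "offset" compaction.
    return r.offset


def by_header_override(r: Record) -> int:
    # The interpretation under discussion: a header named "offset"
    # takes precedence over the real record offset.
    return r.headers.get("offset", r.offset)


# User B writes a record carrying a header that happens to be named "offset".
log = [Record("k", 0, {"offset": 99}), Record("k", 1)]

print(compact(log, by_offset)["k"].offset)           # 1: the latest record survives
print(compact(log, by_header_override)["k"].offset)  # 0: the older record survives
```

The same topic data yields a different survivor once the header is allowed to
override the offset, which is why existing topics with such a header would
not be handled in a backward-compatible way.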


-Matthias

On 6/16/18 6:59 PM, Ted Yu wrote:
> Pardon the brevity in my previous reply.
> I was talking about this bullet:
> 
> bq. When this configuration is set to anything other than "*offset*" or "
> *timestamp*", then the record headers are scanned for a key matching this
> value.
> 
> My point is that if matching key in the header is found, its value should
> take precedence over the value of the configuration.
> I understand that such interpretation may have slight performance cost.
> 
> Cheers
> 
> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax 
> wrote:
> 
>> Ted,
>>
>> I am also not sure what you mean by "Shouldn't the selection in header
>> have higher precedence over the configuration"? What selection do you
>> mean? And want configuration?
>>
>>
>> About the first point, I think this is actually a valid concern: To
>> address this issue, it seems that we would need to change the accepted
>> format of the config. Instead of "offset", "timestamp", "",
>> we could replace the last one with "header=".
>>
>> WDYT?
>>
>>
>> -Matthias
>>
>> On 6/15/18 3:06 AM, Ted Yu wrote:
>>> If selection exists in header, the selection should override the config
>> value.
>>> Cheers
>>>  Original message From: Luis Cabral
>>  Date: 6/15/18  1:40 AM  (GMT-08:00) To:
>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>>> Hi,
>>>
>>> bq. Can the value be determined now ? My thinking is that what if there
>> is a third compaction strategy proposed in the future ? We should guard
>> against user unknowingly choosing the 'future' strategy.
>>>
>>> The idea is that the header name to use is flexible, which protects
>> current clients that may want to use this from having to adapt their
>> already existing header names (they can just specify a new name).
>>>
>>> bq. Shouldn't the selection in header have higher precedence over the
>> configuration ?
>>>
>>> Not sure what you mean here, could you clarify?
>>>
>>> bq. Please create JIRA if you haven't already.
>>>
>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
>>>
>>> Cheers,
>>> Luís
>>>
>>>> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
>>>>
>>>> bq. When this configuration is set to anything other than "*offset*" or
>> "
>>>> *timestamp*", then the record headers are scanned for a key matching
>> this
>>>> value.
>>>>
>>>> Can the value be determined now ? My thinking is that what if there is a
>>>> third compaction strategy proposed in the future ? We should guard
>> against
>>>> user unknowingly choosing the 'future' strategy.
>>>>
>>>> bq. If this header is found
>>>>
>>>> Shouldn't the selection in header have higher precedence over the
>> configuration
>>>> ?
>>>>
>>>> Please create JIRA if you haven't already.
>>>>
>>>> Thanks
>>>>
>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
>> 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Any takers on having a look at this KIP and voting on it?
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 280%3A+Enhanced+log+compaction
>>>>>
>>>>> Cheers,
>>>>> Luis
>>>>>
>>>
>>
>>
> 





Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-16 Thread Ted Yu
Pardon the brevity in my previous reply.
I was talking about this bullet:

bq. When this configuration is set to anything other than "*offset*" or "
*timestamp*", then the record headers are scanned for a key matching this
value.

My point is that if a matching key is found in the header, its value should
take precedence over the value of the configuration.
I understand that such an interpretation may have a slight performance cost.

Cheers

On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax 
wrote:

> Ted,
>
> I am also not sure what you mean by "Shouldn't the selection in header
> have higher precedence over the configuration"? What selection do you
> mean? And want configuration?
>
>
> About the first point, I think this is actually a valid concern: To
> address this issue, it seems that we would need to change the accepted
> format of the config. Instead of "offset", "timestamp", "",
> we could replace the last one with "header=".
>
> WDYT?
>
>
> -Matthias
>
> On 6/15/18 3:06 AM, Ted Yu wrote:
> > If selection exists in header, the selection should override the config
> value.
> > Cheers
> > ---- Original message From: Luis Cabral
>  Date: 6/15/18  1:40 AM  (GMT-08:00) To:
> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> > Hi,
> >
> > bq. Can the value be determined now ? My thinking is that what if there
> is a third compaction strategy proposed in the future ? We should guard
> against user unknowingly choosing the 'future' strategy.
> >
> > The idea is that the header name to use is flexible, which protects
> current clients that may want to use this from having to adapt their
> already existing header names (they can just specify a new name).
> >
> > bq. Shouldn't the selection in header have higher precedence over the
> configuration ?
> >
> > Not sure what you mean here, could you clarify?
> >
> > bq. Please create JIRA if you haven't already.
> >
> > Done: https://issues.apache.org/jira/browse/KAFKA-7061
> >
> > Cheers,
> > Luís
> >
> >> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
> >>
> >> bq. When this configuration is set to anything other than "*offset*" or
> "
> >> *timestamp*", then the record headers are scanned for a key matching
> this
> >> value.
> >>
> >> Can the value be determined now ? My thinking is that what if there is a
> >> third compaction strategy proposed in the future ? We should guard
> against
> >> user unknowingly choosing the 'future' strategy.
> >>
> >> bq. If this header is found
> >>
> >> Shouldn't the selection in header have higher precedence over the
> configuration
> >> ?
> >>
> >> Please create JIRA if you haven't already.
> >>
> >> Thanks
> >>
> >> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
> 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Any takers on having a look at this KIP and voting on it?
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 280%3A+Enhanced+log+compaction
> >>>
> >>> Cheers,
> >>> Luis
> >>>
> >
>
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-16 Thread Matthias J. Sax
Ted,

I am also not sure what you mean by "Shouldn't the selection in header
have higher precedence over the configuration"? What selection do you
mean? And what configuration?


About the first point, I think this is actually a valid concern: To
address this issue, it seems that we would need to change the accepted
format of the config. Instead of "offset", "timestamp", "",
we could replace the last one with "header=".
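
As a rough sketch of what parsing such a format could look like (Python for
brevity; this is not code from the KIP). It assumes that the part after
`header=` is the header key name, and the function name `parse_strategy` is
hypothetical.

```python
from typing import Optional, Tuple


def parse_strategy(value: str) -> Tuple[str, Optional[str]]:
    """Parse a single config value into (strategy, header key).

    Accepted forms, as assumed for this sketch:
      "offset", "timestamp", or "header=" followed by a header key name.
    """
    if value in ("offset", "timestamp"):
        return (value, None)
    prefix = "header="
    if value.startswith(prefix) and len(value) > len(prefix):
        return ("header", value[len(prefix):])
    # Bare unknown words are rejected instead of being treated as header
    # names, so a strategy added in the future cannot be picked unknowingly.
    raise ValueError(f"unsupported compaction strategy: {value!r}")


print(parse_strategy("timestamp"))
print(parse_strategy("header=offset"))
```

Because header names only ever appear behind the prefix, a header called
"offset" or "timestamp" stays distinguishable from the built-in strategies.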

WDYT?


-Matthias

On 6/15/18 3:06 AM, Ted Yu wrote:
> If selection exists in header, the selection should override the config value.
> Cheers
>  Original message From: Luis Cabral 
>  Date: 6/15/18  1:40 AM  (GMT-08:00) To: 
> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log compaction 
> Hi,
> 
> bq. Can the value be determined now ? My thinking is that what if there is a 
> third compaction strategy proposed in the future ? We should guard against 
> user unknowingly choosing the 'future' strategy.
> 
> The idea is that the header name to use is flexible, which protects current 
> clients that may want to use this from having to adapt their already existing 
> header names (they can just specify a new name).
> 
> bq. Shouldn't the selection in header have higher precedence over the 
> configuration ?
> 
> Not sure what you mean here, could you clarify?
> 
> bq. Please create JIRA if you haven't already.
> 
> Done: https://issues.apache.org/jira/browse/KAFKA-7061
> 
> Cheers,
> Luís 
> 
>> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
>>
>> bq. When this configuration is set to anything other than "*offset*" or "
>> *timestamp*", then the record headers are scanned for a key matching this
>> value.
>>
>> Can the value be determined now ? My thinking is that what if there is a
>> third compaction strategy proposed in the future ? We should guard against
>> user unknowingly choosing the 'future' strategy.
>>
>> bq. If this header is found
>>
>> Shouldn't the selection in header have higher precedence over the 
>> configuration
>> ?
>>
>> Please create JIRA if you haven't already.
>>
>> Thanks
>>
>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
>> wrote:
>>
>>> Hi all,
>>>
>>> Any takers on having a look at this KIP and voting on it?
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 280%3A+Enhanced+log+compaction
>>>
>>> Cheers,
>>> Luis
>>>
> 





Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-15 Thread Ted Yu
If a selection exists in the header, the selection should override the config value.
Cheers
-------- Original message --------
From: Luis Cabral
Date: 6/15/18 1:40 AM (GMT-08:00)
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-280: Enhanced log compaction
Hi,

bq. Can the value be determined now ? My thinking is that what if there is a 
third compaction strategy proposed in the future ? We should guard against user 
unknowingly choosing the 'future' strategy.

The idea is that the header name to use is flexible, which protects current 
clients that may want to use this from having to adapt their already existing 
header names (they can just specify a new name).

bq. Shouldn't the selection in header have higher precedence over the 
configuration ?

Not sure what you mean here, could you clarify?

bq. Please create JIRA if you haven't already.

Done: https://issues.apache.org/jira/browse/KAFKA-7061

Cheers,
Luís 

> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
> 
> bq. When this configuration is set to anything other than "*offset*" or "
> *timestamp*", then the record headers are scanned for a key matching this
> value.
> 
> Can the value be determined now ? My thinking is that what if there is a
> third compaction strategy proposed in the future ? We should guard against
> user unknowingly choosing the 'future' strategy.
> 
> bq. If this header is found
> 
> Shouldn't the selection in header have higher precedence over the
> configuration?
> 
> Please create JIRA if you haven't already.
> 
> Thanks
> 
> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
> wrote:
> 
>> Hi all,
>> 
>> Any takers on having a look at this KIP and voting on it?
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> 
>> Cheers,
>> Luis
>> 



Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-15 Thread Luis Cabral
Hi,

bq. Can the value be determined now ? My thinking is that what if there is a 
third compaction strategy proposed in the future ? We should guard against user 
unknowingly choosing the 'future' strategy.

The idea is that the header name to use is flexible, which protects current 
clients that may want to use this from having to adapt their already existing 
header names (they can just specify a new name).

bq. Shouldn't the selection in header have higher precedence over the 
configuration ?

Not sure what you mean here, could you clarify?

bq. Please create JIRA if you haven't already.

Done: https://issues.apache.org/jira/browse/KAFKA-7061

Cheers,
Luís 

> On 11 Jun 2018, at 01:50, Ted Yu  wrote:
> 
> bq. When this configuration is set to anything other than "*offset*" or "
> *timestamp*", then the record headers are scanned for a key matching this
> value.
> 
> Can the value be determined now ? My thinking is that what if there is a
> third compaction strategy proposed in the future ? We should guard against
> user unknowingly choosing the 'future' strategy.
> 
> bq. If this header is found
> 
> Shouldn't the selection in header have higher precedence over the
> configuration?
> 
> Please create JIRA if you haven't already.
> 
> Thanks
> 
> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
> wrote:
> 
>> Hi all,
>> 
>> Any takers on having a look at this KIP and voting on it?
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> 
>> Cheers,
>> Luis
>> 



Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-10 Thread Ted Yu
bq. When this configuration is set to anything other than "*offset*" or "
*timestamp*", then the record headers are scanned for a key matching this
value.

Can the value be determined now ? My thinking is that what if there is a
third compaction strategy proposed in the future ? We should guard against
user unknowingly choosing the 'future' strategy.
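
As a rough illustration of the behaviour quoted above (not the KIP's actual implementation), the configured value could be resolved as in the Python sketch below. The function name compaction_version, the dict-based record layout and the integer header values are assumptions made for the example. What happens when the header is missing is not settled in this message, so the sketch simply returns None. It also shows the concern raised here: any value other than the two built-in names is silently treated as a header name, so a strategy name added later would change the meaning of an existing setting.

```python
# Hypothetical sketch: names and record layout are illustrative, not Kafka APIs.
BUILT_IN_STRATEGIES = ("offset", "timestamp")


def compaction_version(strategy, record):
    """Return the value used to compare two records that share a key.

    `record` is assumed to be a dict with 'offset', 'timestamp' and
    'headers' (a dict mapping header name to an integer value).
    """
    if strategy == "offset":
        return record["offset"]
    if strategy == "timestamp":
        return record["timestamp"]
    # Anything else is interpreted as the name of a record header.
    # A missing header yields None; the fallback is left to the caller.
    return record["headers"].get(strategy)


record = {"offset": 42, "timestamp": 1000, "headers": {"version": 7}}
print(compaction_version("offset", record))
print(compaction_version("version", record))
```

With this shape, a topic configured with a header called "sequence" today would behave differently if "sequence" were later introduced as a third built-in strategy, which is the case the question above asks to guard against.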

bq. If this header is found

Shouldn't the selection in header have higher precedence over the
configuration?

Please create JIRA if you haven't already.

Thanks

On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
wrote:

> Hi all,
>
> Any takers on having a look at this KIP and voting on it?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> Cheers,
> Luis
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-10 Thread Matthias J. Sax
+1 (binding)

-Matthias

On 6/9/18 12:39 AM, Luís Cabral wrote:
> Hi all,
> 
> Any takers on having a look at this KIP and voting on it?
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> 
> Cheers,
> Luis
> 





Re: [VOTE] KIP-280: Enhanced log compaction

2018-06-04 Thread Guozhang Wang
Thanks Luís for working on this KIP. I'm +1.

One note is that there is another KIP in progress, KIP-228, that allows
negative timestamps (I'm cc'ing the contributor of that KIP as well). One
related issue from KIP-228 is the handling of "-1": we will treat all other
negative values normally, as some time before 1970 Jan 1st UTC, but keep the
behavior of treating "-1" as unknown.

So depending on which KIP gets merged first, we need to make sure that, in
the end, the broker behaves as follows when using timestamp as the
comparison value:

1. if neither of them is -1, just compare the values normally as currently
proposed in KIP-280.
2. if one of them is -1 and the other is not, choose the other.
3. if both of them are -1, use offset as tie-breaker.
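
The three rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not broker code: the function name should_replace and the (timestamp, offset) tuple layout are made up for the example, and the fallback to offset when two known timestamps are equal is an assumption that the list above does not spell out.

```python
UNKNOWN_TIMESTAMP = -1  # "-1" keeps its meaning of "unknown"


def should_replace(current, candidate):
    """Return True if `candidate` should win over `current` for the same key.

    Both arguments are hypothetical (timestamp, offset) tuples.
    """
    cur_ts, cur_offset = current
    cand_ts, cand_offset = candidate
    cur_known = cur_ts != UNKNOWN_TIMESTAMP
    cand_known = cand_ts != UNKNOWN_TIMESTAMP

    if cur_known and cand_known:
        # Rule 1: compare normally; other negative values are ordinary
        # timestamps before the epoch.
        if cand_ts != cur_ts:
            return cand_ts > cur_ts
        # Assumption: equal timestamps fall back to the offset.
        return cand_offset > cur_offset
    if cur_known != cand_known:
        # Rule 2: exactly one is unknown, so the known one wins.
        return cand_known
    # Rule 3: both unknown, so the offset is the tie-breaker.
    return cand_offset > cur_offset
```

For example, should_replace((-1, 1), (50, 2)) is True because only the candidate has a known timestamp, while should_replace((-5, 1), (-2, 2)) is True because both values are known and -2 is the later of the two.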


Guozhang

On Mon, Jun 4, 2018 at 2:04 AM, Luís Cabral 
wrote:

>  Hi all,
>
>
> After a thorough discussion, this KIP is now ready to go into a vote:
>
> KIP-280: Enhanced log compaction - Apache Kafka - Apache Software
> Foundation
>
> Kind Regards,
> Luís Cabral
>
> On Friday, June 1, 2018, 8:51:50 PM GMT+2, Guozhang Wang <
> wangg...@gmail.com> wrote:
>
>  Hello Luis,
>
> Please feel free to continue with the voting process, as there seem to be no
> further comments on this thread (I have synced with Jun and Ismael
> separately offline and they agree with the approach of adding the
> fields to the offset map for all cases).
>
> We can still continue on reviewing the PR while voting on the thread so
> that it can get in earlier into trunk for the next release.
>
>
>
> Guozhang
>
>
> On Mon, May 28, 2018 at 11:04 AM, Matthias J. Sax 
> wrote:
>
> > Luis,
> >
> > this week is feature freeze for the upcoming 2.0 release and most people
> > focus on getting their PR merged. Thus, this and the next week (until
> > code freeze) KIPs for 2.1 are not a high priority for most people.
> >
> > Please bear with us. Thanks for your understanding.
> >
> >
> > -Matthias
> >
> > On 5/28/18 5:21 AM, Luís Cabral wrote:
> > >  Hi Guozhang,
> > >
> > > It doesn't look like there will be much feedback here.
> > > > Is it alright if I just update the spec back to a standardized behaviour
> > > > and move this along?
> > > >
> > > > Cheers,
> > > > Luis
> > > >
> > > > On Thursday, May 24, 2018, 11:20:01 AM GMT+2, Luis Cabral <
> > > > luis_cab...@yahoo.com> wrote:
> > >
> > >  Hi Jun / Ismael,
> > >
> > > Any chance to get your opinion on this?
> > > Thanks in advance!
> > >
> > > Regards,
> > > Luís
> > >
> > >> On 22 May 2018, at 17:30, Guozhang Wang  wrote:
> > >>
> > >> Hello Luís,
> > >>
> > > >> While reviewing your PR I realized my previous calculation of the memory
> > > >> usage was incorrect: in fact, in the current implementation, each entry in
> > > >> the memory-bounded cache is 16 (default MD5 hash digest length) + 8 (long
> > > >> type) = 24 bytes, and if we add the long-typed version value it is 32
> > > >> bytes. I.e. each entry will be increased by 33%, not doubled.
> > > >>
> > > >> After redoing the math I'm leaning a bit towards just adding this entry for
> > > >> all cases rather than treating timestamp differently from the others (sorry
> > > >> for going back and forth, but I just want to make sure we've got a good
> > > >> balance between efficiency and semantic consistency). I've also chatted with
> > > >> Jun and Ismael about this (cc'ed), and maybe you guys can chime in here as
> > > >> well.
> > >>
> > >>
> > >> Guozhang
> > >>
> > >>
> > > >> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral
> > > >> wrote:
> > >>
> > >>> Hi Matthias / Guozhang,
> > >>>
> > >>> Were the questions clarified?
> > > >>> Please feel free to add more feedback, otherwise it would be nice to move
> > > >>> this topic onwards.
> > >>>
> > >>> Kind Regards,
> > >>> Luís Cabral
> > >>>
> > >>> From: Guozhang Wang
> > >>> Sent: 09 May 2018 20:00
> > >>> To: dev@kafka.apache.org
> > >>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> > >>>
> > > >>> I have thought about consistency in strategy vs. practical concerns
> > > >>> about storage convenience and its impact on compaction effectiveness.
> > > >>>
> > > >>> The difference between timestamp and the header key-value pairs is that for
> > > >>> the latter, as I mentioned before, "it is arguably out of Kafka's control,
> > > >>> and indeed users may (mistakenly) generate many records with the same key
> > > >>> and the same header value." So giving up tie breakers may result in very
> > > >>> poor compaction effectiveness when it happens, while for timestamps the
> > > >>> likelihood of this is considered very small.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>>
> > > >>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io>
> > > >>> wrote:
> > >>>
> >  Thanks.
> > 
> >  To reverse the question: if this argument holds, why does it not
> apply
> >