Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-22 Thread Guozhang Wang
The update looks good to me. Thanks Damian.

Guozhang

On Fri, Aug 19, 2016 at 11:05 AM, Damian Guy  wrote:

> Thanks Jun.
>
> Everyone - i've updated the KIP to use a comma-separated list of
> cleanup.policies as suggested. I know we have already voted on this
> proposal, so are there any objections to this change?
>
> Thanks,
> Damian
>
> On Fri, 19 Aug 2016 at 18:38 Jun Rao  wrote:
>
> > Damian,
> >
> > Yes, using comma-separated policies does seem more extensible for the
> > future. If we want to adopt it, it's better to do it as part of this KIP.
> > Perhaps you can just update the KIP and ask this thread to see if there
> is
> > any objections with the change.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Aug 19, 2016 at 10:01 AM, Damian Guy 
> wrote:
> >
> > > Hi Grant,
> > >
> > > I apologise - I seemed to have skipped over Joel's email.
> > > It is not something we considered, but seems valid.
> > > I'm not sure if we should do it as part of this KIP or revisit it
> if/when
> > > we have more cleanup policies?
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 19 Aug 2016 at 15:57 Grant Henke  wrote:
> > >
> > > > Thanks for this KIP Damien.
> > > >
> > > > I know this vote has passed and I think its is okay as is, but I had
> > > > similar thoughts as Joel about combining compaction policies.
> > > >
> > > > I'm just wondering if it makes sense to allow specifying multiple
> > > > > comma-separated policies "compact,delete" as opposed to
> > > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever
> come
> > up
> > > > > with more policies. The order could potentially indicate
> precedence.
> > > > >
> > > >
> > > > Out of curiosity was the option of supporting a list of policies
> > > rejected?
> > > > Is it something we may consider adding later but didn't want to
> include
> > > in
> > > > the scope of this KIP?
> > > >
> > > > Thanks,
> > > > Grant
> > > >
> > > >
> > > >
> > > > On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy 
> > wrote:
> > > >
> > > > > Thanks for the proposal. I'm +1 overall with a thought somewhat
> > related
> > > > to
> > > > > Jun's comments.
> > > > >
> > > > > While there may not yet be a sensible use case for it, it should be
> > (in
> > > > > theory) legal to have compact_and_delete with size based retention
> as
> > > > well.
> > > > > I'm just wondering if it makes sense to allow specifying multiple
> > > > > comma-separated policies "compact,delete" as opposed to
> > > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever
> come
> > up
> > > > > with more policies. The order could potentially indicate
> precedence.
> > > > > Anyway, it is just a thought - it may end up being very confusing
> for
> > > > > users.
> > > > >
> > > > > @Jason - I agree this could be used to handle offset expiration as
> > > well.
> > > > We
> > > > > can discuss that separately though; and if we do that we would want
> > to
> > > > also
> > > > > deprecate the retention field in the commit requests.
> > > > >
> > > > > Joel
> > > > >
> > > > > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy 
> > > > wrote:
> > > > >
> > > > > > Thanks Jason.
> > > > > > The log retention.ms will be set to a value that greater than
> the
> > > > window
> > > > > > retention time. So as windows expire, they eventually get cleaned
> > up
> > > by
> > > > > the
> > > > > > broker. It doesn't matter if old windows are around for sometime
> > > beyond
> > > > > > their usefulness, more that they do eventually get removed and
> the
> > > log
> > > > > > doesn't grow indefinitely (as it does now).
> > > > > >
> > > > > > Damian
> > > > > >
> > > > > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson  >
> > > > wrote:
> > > > > >
> > > > > > > Hey Damian,
> > > > > > >
> > > > > > > That's true, but it would avoid the need to write tombstones
> for
> > > the
> > > > > > > expired offsets I guess. I'm actually not sure it's a great
> idea
> > > > > anyway.
> > > > > > > One thing we've talked about is not expiring any offsets as
> long
> > > as a
> > > > > > group
> > > > > > > is alive, which would require some custom expiration logic.
> > There's
> > > > > also
> > > > > > > the fact that we'd risk expiring group metadata which is stored
> > in
> > > > the
> > > > > > same
> > > > > > > log. Having a builtin expiration mechanism may be more useful
> for
> > > the
> > > > > > > compacted topics we maintain in Connect, but I think there too
> we
> > > > might
> > > > > > > need some custom logic. For example, expiring connector configs
> > > > purely
> > > > > > > based on time doesn't seem like what we'd want.
> > > > > > >
> > > > > > > By the way, I wonder if you could describe the expected usage
> in
> > a
> > > > > little
> > > > > > > more detail in the KIP for those of us who are not as familiar
> > with
> > > > > Kafka
> > > > > > > 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-19 Thread Damian Guy
Thanks Jun.

Everyone - i've updated the KIP to use a comma-separated list of
cleanup.policies as suggested. I know we have already voted on this
proposal, so are there any objections to this change?

Thanks,
Damian

On Fri, 19 Aug 2016 at 18:38 Jun Rao  wrote:

> Damian,
>
> Yes, using comma-separated policies does seem more extensible for the
> future. If we want to adopt it, it's better to do it as part of this KIP.
> Perhaps you can just update the KIP and ask this thread to see if there is
> any objections with the change.
>
> Thanks,
>
> Jun
>
> On Fri, Aug 19, 2016 at 10:01 AM, Damian Guy  wrote:
>
> > Hi Grant,
> >
> > I apologise - I seemed to have skipped over Joel's email.
> > It is not something we considered, but seems valid.
> > I'm not sure if we should do it as part of this KIP or revisit it if/when
> > we have more cleanup policies?
> >
> > Thanks,
> > Damian
> >
> > On Fri, 19 Aug 2016 at 15:57 Grant Henke  wrote:
> >
> > > Thanks for this KIP Damien.
> > >
> > > I know this vote has passed and I think its is okay as is, but I had
> > > similar thoughts as Joel about combining compaction policies.
> > >
> > > I'm just wondering if it makes sense to allow specifying multiple
> > > > comma-separated policies "compact,delete" as opposed to
> > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come
> up
> > > > with more policies. The order could potentially indicate precedence.
> > > >
> > >
> > > Out of curiosity was the option of supporting a list of policies
> > rejected?
> > > Is it something we may consider adding later but didn't want to include
> > in
> > > the scope of this KIP?
> > >
> > > Thanks,
> > > Grant
> > >
> > >
> > >
> > > On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy 
> wrote:
> > >
> > > > Thanks for the proposal. I'm +1 overall with a thought somewhat
> related
> > > to
> > > > Jun's comments.
> > > >
> > > > While there may not yet be a sensible use case for it, it should be
> (in
> > > > theory) legal to have compact_and_delete with size based retention as
> > > well.
> > > > I'm just wondering if it makes sense to allow specifying multiple
> > > > comma-separated policies "compact,delete" as opposed to
> > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come
> up
> > > > with more policies. The order could potentially indicate precedence.
> > > > Anyway, it is just a thought - it may end up being very confusing for
> > > > users.
> > > >
> > > > @Jason - I agree this could be used to handle offset expiration as
> > well.
> > > We
> > > > can discuss that separately though; and if we do that we would want
> to
> > > also
> > > > deprecate the retention field in the commit requests.
> > > >
> > > > Joel
> > > >
> > > > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy 
> > > wrote:
> > > >
> > > > > Thanks Jason.
> > > > > The log retention.ms will be set to a value that greater than the
> > > window
> > > > > retention time. So as windows expire, they eventually get cleaned
> up
> > by
> > > > the
> > > > > broker. It doesn't matter if old windows are around for sometime
> > beyond
> > > > > their usefulness, more that they do eventually get removed and the
> > log
> > > > > doesn't grow indefinitely (as it does now).
> > > > >
> > > > > Damian
> > > > >
> > > > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson 
> > > wrote:
> > > > >
> > > > > > Hey Damian,
> > > > > >
> > > > > > That's true, but it would avoid the need to write tombstones for
> > the
> > > > > > expired offsets I guess. I'm actually not sure it's a great idea
> > > > anyway.
> > > > > > One thing we've talked about is not expiring any offsets as long
> > as a
> > > > > group
> > > > > > is alive, which would require some custom expiration logic.
> There's
> > > > also
> > > > > > the fact that we'd risk expiring group metadata which is stored
> in
> > > the
> > > > > same
> > > > > > log. Having a builtin expiration mechanism may be more useful for
> > the
> > > > > > compacted topics we maintain in Connect, but I think there too we
> > > might
> > > > > > need some custom logic. For example, expiring connector configs
> > > purely
> > > > > > based on time doesn't seem like what we'd want.
> > > > > >
> > > > > > By the way, I wonder if you could describe the expected usage in
> a
> > > > little
> > > > > > more detail in the KIP for those of us who are not as familiar
> with
> > > > Kafka
> > > > > > Streams. Is the intention to have the log retain only the most
> > recent
> > > > > > window? In that case, would you always set the log retention time
> > to
> > > > the
> > > > > > window length? And I suppose a consumer would do a seek to the
> > start
> > > of
> > > > > the
> > > > > > window (as soon as KIP-33 is available) and consume from there in
> > > order
> > > > > to
> > > > > > read the current state?
> > > > > >
> > > > > > Thanks,
> > 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-19 Thread Jun Rao
Damian,

Yes, using comma-separated policies does seem more extensible for the
future. If we want to adopt it, it's better to do it as part of this KIP.
Perhaps you can just update the KIP and ask this thread to see if there is
any objections with the change.

Thanks,

Jun

On Fri, Aug 19, 2016 at 10:01 AM, Damian Guy  wrote:

> Hi Grant,
>
> I apologise - I seemed to have skipped over Joel's email.
> It is not something we considered, but seems valid.
> I'm not sure if we should do it as part of this KIP or revisit it if/when
> we have more cleanup policies?
>
> Thanks,
> Damian
>
> On Fri, 19 Aug 2016 at 15:57 Grant Henke  wrote:
>
> > Thanks for this KIP Damien.
> >
> > I know this vote has passed and I think its is okay as is, but I had
> > similar thoughts as Joel about combining compaction policies.
> >
> > I'm just wondering if it makes sense to allow specifying multiple
> > > comma-separated policies "compact,delete" as opposed to
> > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> > > with more policies. The order could potentially indicate precedence.
> > >
> >
> > Out of curiosity was the option of supporting a list of policies
> rejected?
> > Is it something we may consider adding later but didn't want to include
> in
> > the scope of this KIP?
> >
> > Thanks,
> > Grant
> >
> >
> >
> > On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy  wrote:
> >
> > > Thanks for the proposal. I'm +1 overall with a thought somewhat related
> > to
> > > Jun's comments.
> > >
> > > While there may not yet be a sensible use case for it, it should be (in
> > > theory) legal to have compact_and_delete with size based retention as
> > well.
> > > I'm just wondering if it makes sense to allow specifying multiple
> > > comma-separated policies "compact,delete" as opposed to
> > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> > > with more policies. The order could potentially indicate precedence.
> > > Anyway, it is just a thought - it may end up being very confusing for
> > > users.
> > >
> > > @Jason - I agree this could be used to handle offset expiration as
> well.
> > We
> > > can discuss that separately though; and if we do that we would want to
> > also
> > > deprecate the retention field in the commit requests.
> > >
> > > Joel
> > >
> > > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy 
> > wrote:
> > >
> > > > Thanks Jason.
> > > > The log retention.ms will be set to a value that greater than the
> > window
> > > > retention time. So as windows expire, they eventually get cleaned up
> by
> > > the
> > > > broker. It doesn't matter if old windows are around for sometime
> beyond
> > > > their usefulness, more that they do eventually get removed and the
> log
> > > > doesn't grow indefinitely (as it does now).
> > > >
> > > > Damian
> > > >
> > > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson 
> > wrote:
> > > >
> > > > > Hey Damian,
> > > > >
> > > > > That's true, but it would avoid the need to write tombstones for
> the
> > > > > expired offsets I guess. I'm actually not sure it's a great idea
> > > anyway.
> > > > > One thing we've talked about is not expiring any offsets as long
> as a
> > > > group
> > > > > is alive, which would require some custom expiration logic. There's
> > > also
> > > > > the fact that we'd risk expiring group metadata which is stored in
> > the
> > > > same
> > > > > log. Having a builtin expiration mechanism may be more useful for
> the
> > > > > compacted topics we maintain in Connect, but I think there too we
> > might
> > > > > need some custom logic. For example, expiring connector configs
> > purely
> > > > > based on time doesn't seem like what we'd want.
> > > > >
> > > > > By the way, I wonder if you could describe the expected usage in a
> > > little
> > > > > more detail in the KIP for those of us who are not as familiar with
> > > Kafka
> > > > > Streams. Is the intention to have the log retain only the most
> recent
> > > > > window? In that case, would you always set the log retention time
> to
> > > the
> > > > > window length? And I suppose a consumer would do a seek to the
> start
> > of
> > > > the
> > > > > window (as soon as KIP-33 is available) and consume from there in
> > order
> > > > to
> > > > > read the current state?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy 
> > > > wrote:
> > > > >
> > > > > > Thanks Jun
> > > > > >
> > > > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
> > > > > >
> > > > > > > Hi, Damian,
> > > > > > >
> > > > > > > I was just wondering if we should disable size-based retention
> in
> > > the
> > > > > > > compact_and_delete mode. So, it sounds like that there could
> be a
> > > use
> > > > > > case
> > > > > > > for that. Since by default, the size-based retention is
> > 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-19 Thread Damian Guy
Hi Grant,

I apologise - I seemed to have skipped over Joel's email.
It is not something we considered, but seems valid.
I'm not sure if we should do it as part of this KIP or revisit it if/when
we have more cleanup policies?

Thanks,
Damian

On Fri, 19 Aug 2016 at 15:57 Grant Henke  wrote:

> Thanks for this KIP Damien.
>
> I know this vote has passed and I think its is okay as is, but I had
> similar thoughts as Joel about combining compaction policies.
>
> I'm just wondering if it makes sense to allow specifying multiple
> > comma-separated policies "compact,delete" as opposed to
> > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> > with more policies. The order could potentially indicate precedence.
> >
>
> Out of curiosity was the option of supporting a list of policies rejected?
> Is it something we may consider adding later but didn't want to include in
> the scope of this KIP?
>
> Thanks,
> Grant
>
>
>
> On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy  wrote:
>
> > Thanks for the proposal. I'm +1 overall with a thought somewhat related
> to
> > Jun's comments.
> >
> > While there may not yet be a sensible use case for it, it should be (in
> > theory) legal to have compact_and_delete with size based retention as
> well.
> > I'm just wondering if it makes sense to allow specifying multiple
> > comma-separated policies "compact,delete" as opposed to
> > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> > with more policies. The order could potentially indicate precedence.
> > Anyway, it is just a thought - it may end up being very confusing for
> > users.
> >
> > @Jason - I agree this could be used to handle offset expiration as well.
> We
> > can discuss that separately though; and if we do that we would want to
> also
> > deprecate the retention field in the commit requests.
> >
> > Joel
> >
> > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy 
> wrote:
> >
> > > Thanks Jason.
> > > The log retention.ms will be set to a value that greater than the
> window
> > > retention time. So as windows expire, they eventually get cleaned up by
> > the
> > > broker. It doesn't matter if old windows are around for sometime beyond
> > > their usefulness, more that they do eventually get removed and the log
> > > doesn't grow indefinitely (as it does now).
> > >
> > > Damian
> > >
> > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson 
> wrote:
> > >
> > > > Hey Damian,
> > > >
> > > > That's true, but it would avoid the need to write tombstones for the
> > > > expired offsets I guess. I'm actually not sure it's a great idea
> > anyway.
> > > > One thing we've talked about is not expiring any offsets as long as a
> > > group
> > > > is alive, which would require some custom expiration logic. There's
> > also
> > > > the fact that we'd risk expiring group metadata which is stored in
> the
> > > same
> > > > log. Having a builtin expiration mechanism may be more useful for the
> > > > compacted topics we maintain in Connect, but I think there too we
> might
> > > > need some custom logic. For example, expiring connector configs
> purely
> > > > based on time doesn't seem like what we'd want.
> > > >
> > > > By the way, I wonder if you could describe the expected usage in a
> > little
> > > > more detail in the KIP for those of us who are not as familiar with
> > Kafka
> > > > Streams. Is the intention to have the log retain only the most recent
> > > > window? In that case, would you always set the log retention time to
> > the
> > > > window length? And I suppose a consumer would do a seek to the start
> of
> > > the
> > > > window (as soon as KIP-33 is available) and consume from there in
> order
> > > to
> > > > read the current state?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy 
> > > wrote:
> > > >
> > > > > Thanks Jun
> > > > >
> > > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
> > > > >
> > > > > > Hi, Damian,
> > > > > >
> > > > > > I was just wondering if we should disable size-based retention in
> > the
> > > > > > compact_and_delete mode. So, it sounds like that there could be a
> > use
> > > > > case
> > > > > > for that. Since by default, the size-based retention is
> infinite, I
> > > > think
> > > > > > we can just leave the KIP as it is.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy <
> damian@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The only concrete example i can think of is a case for limiting
> > > disk
> > > > > > usage.
> > > > > > > Say, i had something like Connect running that was tracking
> > changes
> > > > in
> > > > > a
> > > > > > > database. Downstream i don't really care about every change, i
> > just
> > > > > want
> > > > > > > the 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-19 Thread Grant Henke
Thanks for this KIP Damien.

I know this vote has passed and I think its is okay as is, but I had
similar thoughts as Joel about combining compaction policies.

I'm just wondering if it makes sense to allow specifying multiple
> comma-separated policies "compact,delete" as opposed to
> "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> with more policies. The order could potentially indicate precedence.
>

Out of curiosity was the option of supporting a list of policies rejected?
Is it something we may consider adding later but didn't want to include in
the scope of this KIP?

Thanks,
Grant



On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy  wrote:

> Thanks for the proposal. I'm +1 overall with a thought somewhat related to
> Jun's comments.
>
> While there may not yet be a sensible use case for it, it should be (in
> theory) legal to have compact_and_delete with size based retention as well.
> I'm just wondering if it makes sense to allow specifying multiple
> comma-separated policies "compact,delete" as opposed to
> "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> with more policies. The order could potentially indicate precedence.
> Anyway, it is just a thought - it may end up being very confusing for
> users.
>
> @Jason - I agree this could be used to handle offset expiration as well. We
> can discuss that separately though; and if we do that we would want to also
> deprecate the retention field in the commit requests.
>
> Joel
>
> On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy  wrote:
>
> > Thanks Jason.
> > The log retention.ms will be set to a value that greater than the window
> > retention time. So as windows expire, they eventually get cleaned up by
> the
> > broker. It doesn't matter if old windows are around for sometime beyond
> > their usefulness, more that they do eventually get removed and the log
> > doesn't grow indefinitely (as it does now).
> >
> > Damian
> >
> > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson  wrote:
> >
> > > Hey Damian,
> > >
> > > That's true, but it would avoid the need to write tombstones for the
> > > expired offsets I guess. I'm actually not sure it's a great idea
> anyway.
> > > One thing we've talked about is not expiring any offsets as long as a
> > group
> > > is alive, which would require some custom expiration logic. There's
> also
> > > the fact that we'd risk expiring group metadata which is stored in the
> > same
> > > log. Having a builtin expiration mechanism may be more useful for the
> > > compacted topics we maintain in Connect, but I think there too we might
> > > need some custom logic. For example, expiring connector configs purely
> > > based on time doesn't seem like what we'd want.
> > >
> > > By the way, I wonder if you could describe the expected usage in a
> little
> > > more detail in the KIP for those of us who are not as familiar with
> Kafka
> > > Streams. Is the intention to have the log retain only the most recent
> > > window? In that case, would you always set the log retention time to
> the
> > > window length? And I suppose a consumer would do a seek to the start of
> > the
> > > window (as soon as KIP-33 is available) and consume from there in order
> > to
> > > read the current state?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy 
> > wrote:
> > >
> > > > Thanks Jun
> > > >
> > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
> > > >
> > > > > Hi, Damian,
> > > > >
> > > > > I was just wondering if we should disable size-based retention in
> the
> > > > > compact_and_delete mode. So, it sounds like that there could be a
> use
> > > > case
> > > > > for that. Since by default, the size-based retention is infinite, I
> > > think
> > > > > we can just leave the KIP as it is.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy  >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The only concrete example i can think of is a case for limiting
> > disk
> > > > > usage.
> > > > > > Say, i had something like Connect running that was tracking
> changes
> > > in
> > > > a
> > > > > > database. Downstream i don't really care about every change, i
> just
> > > > want
> > > > > > the latest values, so compaction could be enabled. However, the
> > kafka
> > > > > > cluster has limited disk space so we need to limit the size of
> each
> > > > > > partition.
> > > > > > In a previous life i have done the same, just without compaction
> > > turned
> > > > > on.
> > > > > >
> > > > > > Besides, i don't think it costs us anything in terms of added
> > > > complexity
> > > > > to
> > > > > > enable it for time & size based retention - the code already does
> > > this
> > > > > for
> > > > > > us.
> > > > > >
> > > > > > Thanks,
> > > > > > Damian
> > > > > >
> > > 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-15 Thread Joel Koshy
Thanks for the proposal. I'm +1 overall with a thought somewhat related to
Jun's comments.

While there may not yet be a sensible use case for it, it should be (in
theory) legal to have compact_and_delete with size based retention as well.
I'm just wondering if it makes sense to allow specifying multiple
comma-separated policies "compact,delete" as opposed to
"compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
with more policies. The order could potentially indicate precedence.
Anyway, it is just a thought - it may end up being very confusing for users.

@Jason - I agree this could be used to handle offset expiration as well. We
can discuss that separately though; and if we do that we would want to also
deprecate the retention field in the commit requests.

Joel

On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy  wrote:

> Thanks Jason.
> The log retention.ms will be set to a value that greater than the window
> retention time. So as windows expire, they eventually get cleaned up by the
> broker. It doesn't matter if old windows are around for sometime beyond
> their usefulness, more that they do eventually get removed and the log
> doesn't grow indefinitely (as it does now).
>
> Damian
>
> On Fri, 12 Aug 2016 at 20:25 Jason Gustafson  wrote:
>
> > Hey Damian,
> >
> > That's true, but it would avoid the need to write tombstones for the
> > expired offsets I guess. I'm actually not sure it's a great idea anyway.
> > One thing we've talked about is not expiring any offsets as long as a
> group
> > is alive, which would require some custom expiration logic. There's also
> > the fact that we'd risk expiring group metadata which is stored in the
> same
> > log. Having a builtin expiration mechanism may be more useful for the
> > compacted topics we maintain in Connect, but I think there too we might
> > need some custom logic. For example, expiring connector configs purely
> > based on time doesn't seem like what we'd want.
> >
> > By the way, I wonder if you could describe the expected usage in a little
> > more detail in the KIP for those of us who are not as familiar with Kafka
> > Streams. Is the intention to have the log retain only the most recent
> > window? In that case, would you always set the log retention time to the
> > window length? And I suppose a consumer would do a seek to the start of
> the
> > window (as soon as KIP-33 is available) and consume from there in order
> to
> > read the current state?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy 
> wrote:
> >
> > > Thanks Jun
> > >
> > > On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
> > >
> > > > Hi, Damian,
> > > >
> > > > I was just wondering if we should disable size-based retention in the
> > > > compact_and_delete mode. So, it sounds like that there could be a use
> > > case
> > > > for that. Since by default, the size-based retention is infinite, I
> > think
> > > > we can just leave the KIP as it is.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy 
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The only concrete example i can think of is a case for limiting
> disk
> > > > usage.
> > > > > Say, i had something like Connect running that was tracking changes
> > in
> > > a
> > > > > database. Downstream i don't really care about every change, i just
> > > want
> > > > > the latest values, so compaction could be enabled. However, the
> kafka
> > > > > cluster has limited disk space so we need to limit the size of each
> > > > > partition.
> > > > > In a previous life i have done the same, just without compaction
> > turned
> > > > on.
> > > > >
> > > > > Besides, i don't think it costs us anything in terms of added
> > > complexity
> > > > to
> > > > > enable it for time & size based retention - the code already does
> > this
> > > > for
> > > > > us.
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede 
> > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > The motivation for this KIP is to handle joins and windows in
> Kafka
> > > > > > streams better and since Streams supports time-based windows, the
> > KIP
> > > > > > suggests combining time-based deletion and compaction.
> > > > > >
> > > > > > It might make sense to do the same for size-based windows, but
> can
> > > you
> > > > > > think of a concrete use case? If not, perhaps we can come back to
> > it.
> > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao 
> wrote:
> > > > > >
> > > > > >> Hi, Damian,
> > > > > >>
> > > > > >> Thanks for the proposal. It makes sense to use time-based
> deletion
> > > > > >> retention and compaction together, as you mentioned in the
> > KStream.
> > > > > >>
> > > > > >> Is there a use case where we want to combine size-based deletion
> > > > 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-15 Thread Damian Guy
Thanks Jason.
The log retention.ms will be set to a value that greater than the window
retention time. So as windows expire, they eventually get cleaned up by the
broker. It doesn't matter if old windows are around for sometime beyond
their usefulness, more that they do eventually get removed and the log
doesn't grow indefinitely (as it does now).

Damian

On Fri, 12 Aug 2016 at 20:25 Jason Gustafson  wrote:

> Hey Damian,
>
> That's true, but it would avoid the need to write tombstones for the
> expired offsets I guess. I'm actually not sure it's a great idea anyway.
> One thing we've talked about is not expiring any offsets as long as a group
> is alive, which would require some custom expiration logic. There's also
> the fact that we'd risk expiring group metadata which is stored in the same
> log. Having a builtin expiration mechanism may be more useful for the
> compacted topics we maintain in Connect, but I think there too we might
> need some custom logic. For example, expiring connector configs purely
> based on time doesn't seem like what we'd want.
>
> By the way, I wonder if you could describe the expected usage in a little
> more detail in the KIP for those of us who are not as familiar with Kafka
> Streams. Is the intention to have the log retain only the most recent
> window? In that case, would you always set the log retention time to the
> window length? And I suppose a consumer would do a seek to the start of the
> window (as soon as KIP-33 is available) and consume from there in order to
> read the current state?
>
> Thanks,
> Jason
>
> On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy  wrote:
>
> > Thanks Jun
> >
> > On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
> >
> > > Hi, Damian,
> > >
> > > I was just wondering if we should disable size-based retention in the
> > > compact_and_delete mode. So, it sounds like that there could be a use
> > case
> > > for that. Since by default, the size-based retention is infinite, I
> think
> > > we can just leave the KIP as it is.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > The only concrete example i can think of is a case for limiting disk
> > > usage.
> > > > Say, i had something like Connect running that was tracking changes
> in
> > a
> > > > database. Downstream i don't really care about every change, i just
> > want
> > > > the latest values, so compaction could be enabled. However, the kafka
> > > > cluster has limited disk space so we need to limit the size of each
> > > > partition.
> > > > In a previous life i have done the same, just without compaction
> turned
> > > on.
> > > >
> > > > Besides, i don't think it costs us anything in terms of added
> > complexity
> > > to
> > > > enable it for time & size based retention - the code already does
> this
> > > for
> > > > us.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede 
> wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > The motivation for this KIP is to handle joins and windows in Kafka
> > > > > streams better and since Streams supports time-based windows, the
> KIP
> > > > > suggests combining time-based deletion and compaction.
> > > > >
> > > > > It might make sense to do the same for size-based windows, but can
> > you
> > > > > think of a concrete use case? If not, perhaps we can come back to
> it.
> > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:
> > > > >
> > > > >> Hi, Damian,
> > > > >>
> > > > >> Thanks for the proposal. It makes sense to use time-based deletion
> > > > >> retention and compaction together, as you mentioned in the
> KStream.
> > > > >>
> > > > >> Is there a use case where we want to combine size-based deletion
> > > > retention
> > > > >> and compaction together?
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy  >
> > > > wrote:
> > > > >>
> > > > >> > Hi Jason,
> > > > >> >
> > > > >> > Thanks for your input - appreciated.
> > > > >> >
> > > > >> > 1. Would it make sense to use this KIP in the consumer
> coordinator
> > > to
> > > > >> > > expire offsets based on the topic's retention time? Currently,
> > we
> > > > >> have a
> > > > >> > > periodic task which scans the full cache to check which
> offsets
> > > can
> > > > be
> > > > >> > > expired, but we might be able to get rid of this if we had a
> > > > callback
> > > > >> to
> > > > >> > > update the cache when a segment was deleted. Technically
> offsets
> > > can
> > > > >> be
> > > > >> > > given their own expiration time, but it seems questionable
> > whether
> > > > we
> > > > >> > need
> > > > >> > > this going forward (the new consumer doesn't even expose it at
> > the
> > > > >> > moment).
> > > > >> > >
> > > > >> >
> > > > >> > The KIP in its current form isn't 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-12 Thread Jason Gustafson
Hey Damian,

That's true, but it would avoid the need to write tombstones for the
expired offsets I guess. I'm actually not sure it's a great idea anyway.
One thing we've talked about is not expiring any offsets as long as a group
is alive, which would require some custom expiration logic. There's also
the fact that we'd risk expiring group metadata which is stored in the same
log. Having a builtin expiration mechanism may be more useful for the
compacted topics we maintain in Connect, but I think there too we might
need some custom logic. For example, expiring connector configs purely
based on time doesn't seem like what we'd want.

By the way, I wonder if you could describe the expected usage in a little
more detail in the KIP for those of us who are not as familiar with Kafka
Streams. Is the intention to have the log retain only the most recent
window? In that case, would you always set the log retention time to the
window length? And I suppose a consumer would do a seek to the start of the
window (as soon as KIP-33 is available) and consume from there in order to
read the current state?

Thanks,
Jason

On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy  wrote:

> Thanks Jun
>
> On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:
>
> > Hi, Damian,
> >
> > I was just wondering if we should disable size-based retention in the
> > compact_and_delete mode. So, it sounds like that there could be a use
> case
> > for that. Since by default, the size-based retention is infinite, I think
> > we can just leave the KIP as it is.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy 
> wrote:
> >
> > > Hi,
> > >
> > > The only concrete example i can think of is a case for limiting disk
> > usage.
> > > Say, i had something like Connect running that was tracking changes in
> a
> > > database. Downstream i don't really care about every change, i just
> want
> > > the latest values, so compaction could be enabled. However, the kafka
> > > cluster has limited disk space so we need to limit the size of each
> > > partition.
> > > In a previous life i have done the same, just without compaction turned
> > on.
> > >
> > > Besides, i don't think it costs us anything in terms of added
> complexity
> > to
> > > enable it for time & size based retention - the code already does this
> > for
> > > us.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede  wrote:
> > >
> > > > Jun,
> > > >
> > > > The motivation for this KIP is to handle joins and windows in Kafka
> > > > streams better and since Streams supports time-based windows, the KIP
> > > > suggests combining time-based deletion and compaction.
> > > >
> > > > It might make sense to do the same for size-based windows, but can
> you
> > > > think of a concrete use case? If not, perhaps we can come back to it.
> > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:
> > > >
> > > >> Hi, Damian,
> > > >>
> > > >> Thanks for the proposal. It makes sense to use time-based deletion
> > > >> retention and compaction together, as you mentioned in the KStream.
> > > >>
> > > >> Is there a use case where we want to combine size-based deletion
> > > retention
> > > >> and compaction together?
> > > >>
> > > >> Jun
> > > >>
> > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy 
> > > wrote:
> > > >>
> > > >> > Hi Jason,
> > > >> >
> > > >> > Thanks for your input - appreciated.
> > > >> >
> > > >> > 1. Would it make sense to use this KIP in the consumer coordinator
> > to
> > > >> > > expire offsets based on the topic's retention time? Currently,
> we
> > > >> have a
> > > >> > > periodic task which scans the full cache to check which offsets
> > can
> > > be
> > > >> > > expired, but we might be able to get rid of this if we had a
> > > callback
> > > >> to
> > > >> > > update the cache when a segment was deleted. Technically offsets
> > can
> > > >> be
> > > >> > > given their own expiration time, but it seems questionable
> whether
> > > we
> > > >> > need
> > > >> > > this going forward (the new consumer doesn't even expose it at
> the
> > > >> > moment).
> > > >> > >
> > > >> >
> > > >> > The KIP in its current form isn't adding a callback. So you'd
> still
> > > >> need to
> > > >> > scan the cache and remove any expired offsets, however you
> wouldn't
> > > send
> > > >> > the tombstone messages.
> > > >> > Having a callback sounds useful, though it isn't clear to me how
> you
> > > >> would
> > > >> > know which offsets to remove from the cache on segment deletion? I
> > > will
> > > >> > look into it.
> > > >> >
> > > >> >
> > > >> > > 2. This KIP could also be useful for expiration in the case of a
> > > cache
> > > >> > > maintained on the client, but I don't see an obvious way that
> we'd
> > > be
> > > >> > able
> > > >> > > to leverage it since there's no indication to the client when a
> > > >> 

Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-12 Thread Damian Guy
Thanks Jun

On Fri, 12 Aug 2016 at 16:41 Jun Rao  wrote:

> Hi, Damian,
>
> I was just wondering if we should disable size-based retention in the
> compact_and_delete mode. So, it sounds like that there could be a use case
> for that. Since by default, the size-based retention is infinite, I think
> we can just leave the KIP as it is.
>
> Thanks,
>
> Jun
>
> On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy  wrote:
>
> > Hi,
> >
> > The only concrete example i can think of is a case for limiting disk
> usage.
> > Say, i had something like Connect running that was tracking changes in a
> > database. Downstream i don't really care about every change, i just want
> > the latest values, so compaction could be enabled. However, the kafka
> > cluster has limited disk space so we need to limit the size of each
> > partition.
> > In a previous life i have done the same, just without compaction turned
> on.
> >
> > Besides, i don't think it costs us anything in terms of added complexity
> to
> > enable it for time & size based retention - the code already does this
> for
> > us.
> >
> > Thanks,
> > Damian
> >
> > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede  wrote:
> >
> > > Jun,
> > >
> > > The motivation for this KIP is to handle joins and windows in Kafka
> > > streams better and since Streams supports time-based windows, the KIP
> > > suggests combining time-based deletion and compaction.
> > >
> > > It might make sense to do the same for size-based windows, but can you
> > > think of a concrete use case? If not, perhaps we can come back to it.
> > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:
> > >
> > >> Hi, Damian,
> > >>
> > >> Thanks for the proposal. It makes sense to use time-based deletion
> > >> retention and compaction together, as you mentioned in the KStream.
> > >>
> > >> Is there a use case where we want to combine size-based deletion
> > retention
> > >> and compaction together?
> > >>
> > >> Jun
> > >>
> > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy 
> > wrote:
> > >>
> > >> > Hi Jason,
> > >> >
> > >> > Thanks for your input - appreciated.
> > >> >
> > >> > 1. Would it make sense to use this KIP in the consumer coordinator
> to
> > >> > > expire offsets based on the topic's retention time? Currently, we
> > >> have a
> > >> > > periodic task which scans the full cache to check which offsets
> can
> > be
> > >> > > expired, but we might be able to get rid of this if we had a
> > callback
> > >> to
> > >> > > update the cache when a segment was deleted. Technically offsets
> can
> > >> be
> > >> > > given their own expiration time, but it seems questionable whether
> > we
> > >> > need
> > >> > > this going forward (the new consumer doesn't even expose it at the
> > >> > moment).
> > >> > >
> > >> >
> > >> > The KIP in its current form isn't adding a callback. So you'd still
> > >> need to
> > >> > scan the cache and remove any expired offsets, however you wouldn't
> > send
> > >> > the tombstone messages.
> > >> > Having a callback sounds useful, though it isn't clear to me how you
> > >> would
> > >> > know which offsets to remove from the cache on segment deletion? I
> > will
> > >> > look into it.
> > >> >
> > >> >
> > >> > > 2. This KIP could also be useful for expiration in the case of a
> > cache
> > >> > > maintained on the client, but I don't see an obvious way that we'd
> > be
> > >> > able
> > >> > > to leverage it since there's no indication to the client when a
> > >> segment
> > >> > has
> > >> > > been deleted (unless they reload the cache from the beginning of
> the
> > >> > log).
> > >> > > One approach I can think of would be to write corresponding
> > >> tombstones as
> > >> > > necessary when a segment is removed, but that seems pretty heavy.
> > Have
> > >> > you
> > >> > > considered this problem?
> > >> > >
> > >> > >
> > >> > We've not considered this and I'm not sure we want to as part of
> this
> > >> KIP.
> > >> >
> > >> > Thanks,
> > >> > Damian
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy  >
> > >> > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > We have created KIP 71: Enable log compaction and deletion to
> > >> co-exist`
> > >> > > >
> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > >> > > >
> > >> > > > Please take a look. Feedback is appreciated.
> > >> > > >
> > >> > > > Thank you
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-12 Thread Jun Rao
Hi, Damian,

I was just wondering if we should disable size-based retention in the
compact_and_delete mode. So, it sounds like that there could be a use case
for that. Since by default, the size-based retention is infinite, I think
we can just leave the KIP as it is.

Thanks,

Jun

On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy  wrote:

> Hi,
>
> The only concrete example i can think of is a case for limiting disk usage.
> Say, i had something like Connect running that was tracking changes in a
> database. Downstream i don't really care about every change, i just want
> the latest values, so compaction could be enabled. However, the kafka
> cluster has limited disk space so we need to limit the size of each
> partition.
> In a previous life i have done the same, just without compaction turned on.
>
> Besides, i don't think it costs us anything in terms of added complexity to
> enable it for time & size based retention - the code already does this for
> us.
>
> Thanks,
> Damian
>
> On Fri, 12 Aug 2016 at 05:30 Neha Narkhede  wrote:
>
> > Jun,
> >
> > The motivation for this KIP is to handle joins and windows in Kafka
> > streams better and since Streams supports time-based windows, the KIP
> > suggests combining time-based deletion and compaction.
> >
> > It might make sense to do the same for size-based windows, but can you
> > think of a concrete use case? If not, perhaps we can come back to it.
> > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:
> >
> >> Hi, Damian,
> >>
> >> Thanks for the proposal. It makes sense to use time-based deletion
> >> retention and compaction together, as you mentioned in the KStream.
> >>
> >> Is there a use case where we want to combine size-based deletion
> retention
> >> and compaction together?
> >>
> >> Jun
> >>
> >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy 
> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > Thanks for your input - appreciated.
> >> >
> >> > 1. Would it make sense to use this KIP in the consumer coordinator to
> >> > > expire offsets based on the topic's retention time? Currently, we
> >> have a
> >> > > periodic task which scans the full cache to check which offsets can
> be
> >> > > expired, but we might be able to get rid of this if we had a
> callback
> >> to
> >> > > update the cache when a segment was deleted. Technically offsets can
> >> be
> >> > > given their own expiration time, but it seems questionable whether
> we
> >> > need
> >> > > this going forward (the new consumer doesn't even expose it at the
> >> > moment).
> >> > >
> >> >
> >> > The KIP in its current form isn't adding a callback. So you'd still
> >> need to
> >> > scan the cache and remove any expired offsets, however you wouldn't
> send
> >> > the tombstone messages.
> >> > Having a callback sounds useful, though it isn't clear to me how you
> >> would
> >> > know which offsets to remove from the cache on segment deletion? I
> will
> >> > look into it.
> >> >
> >> >
> >> > > 2. This KIP could also be useful for expiration in the case of a
> cache
> >> > > maintained on the client, but I don't see an obvious way that we'd
> be
> >> > able
> >> > > to leverage it since there's no indication to the client when a
> >> segment
> >> > has
> >> > > been deleted (unless they reload the cache from the beginning of the
> >> > log).
> >> > > One approach I can think of would be to write corresponding
> >> tombstones as
> >> > > necessary when a segment is removed, but that seems pretty heavy.
> Have
> >> > you
> >> > > considered this problem?
> >> > >
> >> > >
> >> > We've not considered this and I'm not sure we want to as part of this
> >> KIP.
> >> >
> >> > Thanks,
> >> > Damian
> >> >
> >> >
> >> > >
> >> > >
> >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy 
> >> > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > We have created KIP 71: Enable log compaction and deletion to
> >> co-exist`
> >> > > >
> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> >> > > >
> >> > > > Please take a look. Feedback is appreciated.
> >> > > >
> >> > > > Thank you
> >> > > >
> >> > >
> >> >
> >>
> >
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-12 Thread Damian Guy
Hi Jason,

It is still not clear to me how adding a callback would help. You could
remove the periodic task, but you would still need to scan the entire cache
to remove the expired offsets. Am i missing something?

Thanks,
Damian


On 11 August 2016 at 10:00, Damian Guy  wrote:

> Hi Jason,
>
> Thanks for your input - appreciated.
>
> 1. Would it make sense to use this KIP in the consumer coordinator to
>> expire offsets based on the topic's retention time? Currently, we have a
>> periodic task which scans the full cache to check which offsets can be
>> expired, but we might be able to get rid of this if we had a callback to
>> update the cache when a segment was deleted. Technically offsets can be
>> given their own expiration time, but it seems questionable whether we need
>> this going forward (the new consumer doesn't even expose it at the
>> moment).
>>
>
> The KIP in its current form isn't adding a callback. So you'd still need
> to scan the cache and remove any expired offsets, however you wouldn't send
> the tombstone messages.
> Having a callback sounds useful, though it isn't clear to me how you would
> know which offsets to remove from the cache on segment deletion? I will
> look into it.
>
>
>> 2. This KIP could also be useful for expiration in the case of a cache
>> maintained on the client, but I don't see an obvious way that we'd be able
>> to leverage it since there's no indication to the client when a segment
>> has
>> been deleted (unless they reload the cache from the beginning of the log).
>> One approach I can think of would be to write corresponding tombstones as
>> necessary when a segment is removed, but that seems pretty heavy. Have you
>> considered this problem?
>>
>>
> We've not considered this and I'm not sure we want to as part of this KIP.
>
> Thanks,
> Damian
>
>
>>
>>
>> On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy  wrote:
>>
>> > Hi,
>> >
>> > We have created KIP 71: Enable log compaction and deletion to co-exist`
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
>> >
>> > Please take a look. Feedback is appreciated.
>> >
>> > Thank you
>> >
>>
>
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-12 Thread Damian Guy
Hi,

The only concrete example i can think of is a case for limiting disk usage.
Say, i had something like Connect running that was tracking changes in a
database. Downstream i don't really care about every change, i just want
the latest values, so compaction could be enabled. However, the kafka
cluster has limited disk space so we need to limit the size of each
partition.
In a previous life i have done the same, just without compaction turned on.

Besides, i don't think it costs us anything in terms of added complexity to
enable it for time & size based retention - the code already does this for
us.

Thanks,
Damian

On Fri, 12 Aug 2016 at 05:30 Neha Narkhede  wrote:

> Jun,
>
> The motivation for this KIP is to handle joins and windows in Kafka
> streams better and since Streams supports time-based windows, the KIP
> suggests combining time-based deletion and compaction.
>
> It might make sense to do the same for size-based windows, but can you
> think of a concrete use case? If not, perhaps we can come back to it.
> On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:
>
>> Hi, Damian,
>>
>> Thanks for the proposal. It makes sense to use time-based deletion
>> retention and compaction together, as you mentioned in the KStream.
>>
>> Is there a use case where we want to combine size-based deletion retention
>> and compaction together?
>>
>> Jun
>>
>> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy  wrote:
>>
>> > Hi Jason,
>> >
>> > Thanks for your input - appreciated.
>> >
>> > 1. Would it make sense to use this KIP in the consumer coordinator to
>> > > expire offsets based on the topic's retention time? Currently, we
>> have a
>> > > periodic task which scans the full cache to check which offsets can be
>> > > expired, but we might be able to get rid of this if we had a callback
>> to
>> > > update the cache when a segment was deleted. Technically offsets can
>> be
>> > > given their own expiration time, but it seems questionable whether we
>> > need
>> > > this going forward (the new consumer doesn't even expose it at the
>> > moment).
>> > >
>> >
>> > The KIP in its current form isn't adding a callback. So you'd still
>> need to
>> > scan the cache and remove any expired offsets, however you wouldn't send
>> > the tombstone messages.
>> > Having a callback sounds useful, though it isn't clear to me how you
>> would
>> > know which offsets to remove from the cache on segment deletion? I will
>> > look into it.
>> >
>> >
>> > > 2. This KIP could also be useful for expiration in the case of a cache
>> > > maintained on the client, but I don't see an obvious way that we'd be
>> > able
>> > > to leverage it since there's no indication to the client when a
>> segment
>> > has
>> > > been deleted (unless they reload the cache from the beginning of the
>> > log).
>> > > One approach I can think of would be to write corresponding
>> tombstones as
>> > > necessary when a segment is removed, but that seems pretty heavy. Have
>> > you
>> > > considered this problem?
>> > >
>> > >
>> > We've not considered this and I'm not sure we want to as part of this
>> KIP.
>> >
>> > Thanks,
>> > Damian
>> >
>> >
>> > >
>> > >
>> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy 
>> > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > We have created KIP 71: Enable log compaction and deletion to
>> co-exist`
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
>> > > >
>> > > > Please take a look. Feedback is appreciated.
>> > > >
>> > > > Thank you
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-11 Thread Neha Narkhede
Jun,

The motivation for this KIP is to handle joins and windows in Kafka streams
better and since Streams supports time-based windows, the KIP suggests
combining time-based deletion and compaction.

It might make sense to do the same for size-based windows, but can you
think of a concrete use case? If not, perhaps we can come back to it.
On Thu, Aug 11, 2016 at 3:08 PM Jun Rao  wrote:

> Hi, Damian,
>
> Thanks for the proposal. It makes sense to use time-based deletion
> retention and compaction together, as you mentioned in the KStream.
>
> Is there a use case where we want to combine size-based deletion retention
> and compaction together?
>
> Jun
>
> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy  wrote:
>
> > Hi Jason,
> >
> > Thanks for your input - appreciated.
> >
> > 1. Would it make sense to use this KIP in the consumer coordinator to
> > > expire offsets based on the topic's retention time? Currently, we have
> a
> > > periodic task which scans the full cache to check which offsets can be
> > > expired, but we might be able to get rid of this if we had a callback
> to
> > > update the cache when a segment was deleted. Technically offsets can be
> > > given their own expiration time, but it seems questionable whether we
> > need
> > > this going forward (the new consumer doesn't even expose it at the
> > moment).
> > >
> >
> > The KIP in its current form isn't adding a callback. So you'd still need
> to
> > scan the cache and remove any expired offsets, however you wouldn't send
> > the tombstone messages.
> > Having a callback sounds useful, though it isn't clear to me how you
> would
> > know which offsets to remove from the cache on segment deletion? I will
> > look into it.
> >
> >
> > > 2. This KIP could also be useful for expiration in the case of a cache
> > > maintained on the client, but I don't see an obvious way that we'd be
> > able
> > > to leverage it since there's no indication to the client when a segment
> > has
> > > been deleted (unless they reload the cache from the beginning of the
> > log).
> > > One approach I can think of would be to write corresponding tombstones
> as
> > > necessary when a segment is removed, but that seems pretty heavy. Have
> > you
> > > considered this problem?
> > >
> > >
> > We've not considered this and I'm not sure we want to as part of this
> KIP.
> >
> > Thanks,
> > Damian
> >
> >
> > >
> > >
> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We have created KIP 71: Enable log compaction and deletion to
> co-exist`
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > >
> > > > Please take a look. Feedback is appreciated.
> > > >
> > > > Thank you
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-11 Thread Jun Rao
Hi, Damian,

Thanks for the proposal. It makes sense to use time-based deletion
retention and compaction together, as you mentioned in the KStream.

Is there a use case where we want to combine size-based deletion retention
and compaction together?

Jun

On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy  wrote:

> Hi Jason,
>
> Thanks for your input - appreciated.
>
> 1. Would it make sense to use this KIP in the consumer coordinator to
> > expire offsets based on the topic's retention time? Currently, we have a
> > periodic task which scans the full cache to check which offsets can be
> > expired, but we might be able to get rid of this if we had a callback to
> > update the cache when a segment was deleted. Technically offsets can be
> > given their own expiration time, but it seems questionable whether we
> need
> > this going forward (the new consumer doesn't even expose it at the
> moment).
> >
>
> The KIP in its current form isn't adding a callback. So you'd still need to
> scan the cache and remove any expired offsets, however you wouldn't send
> the tombstone messages.
> Having a callback sounds useful, though it isn't clear to me how you would
> know which offsets to remove from the cache on segment deletion? I will
> look into it.
>
>
> > 2. This KIP could also be useful for expiration in the case of a cache
> > maintained on the client, but I don't see an obvious way that we'd be
> able
> > to leverage it since there's no indication to the client when a segment
> has
> > been deleted (unless they reload the cache from the beginning of the
> log).
> > One approach I can think of would be to write corresponding tombstones as
> > necessary when a segment is removed, but that seems pretty heavy. Have
> you
> > considered this problem?
> >
> >
> We've not considered this and I'm not sure we want to as part of this KIP.
>
> Thanks,
> Damian
>
>
> >
> >
> > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy 
> wrote:
> >
> > > Hi,
> > >
> > > We have created KIP 71: Enable log compaction and deletion to co-exist`
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > >
> > > Please take a look. Feedback is appreciated.
> > >
> > > Thank you
> > >
> >
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-11 Thread Damian Guy
Hi Jason,

Thanks for your input - appreciated.

1. Would it make sense to use this KIP in the consumer coordinator to
> expire offsets based on the topic's retention time? Currently, we have a
> periodic task which scans the full cache to check which offsets can be
> expired, but we might be able to get rid of this if we had a callback to
> update the cache when a segment was deleted. Technically offsets can be
> given their own expiration time, but it seems questionable whether we need
> this going forward (the new consumer doesn't even expose it at the moment).
>

The KIP in its current form isn't adding a callback. So you'd still need to
scan the cache and remove any expired offsets, however you wouldn't send
the tombstone messages.
Having a callback sounds useful, though it isn't clear to me how you would
know which offsets to remove from the cache on segment deletion? I will
look into it.


> 2. This KIP could also be useful for expiration in the case of a cache
> maintained on the client, but I don't see an obvious way that we'd be able
> to leverage it since there's no indication to the client when a segment has
> been deleted (unless they reload the cache from the beginning of the log).
> One approach I can think of would be to write corresponding tombstones as
> necessary when a segment is removed, but that seems pretty heavy. Have you
> considered this problem?
>
>
We've not considered this and I'm not sure we want to as part of this KIP.

Thanks,
Damian


>
>
> On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy  wrote:
>
> > Hi,
> >
> > We have created KIP 71: Enable log compaction and deletion to co-exist`
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> >
> > Please take a look. Feedback is appreciated.
> >
> > Thank you
> >
>


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-10 Thread Jason Gustafson
Hi Damian,

Thanks for the KIP. We have a number of use cases in which we maintain a
materialized cache of a compacted topic. The consumer coordinator, for
example, has a cache of consumer offsets which is populated from the
__consumer_offsets topic. Kafka Connect also uses this pattern for its own
offset and config storage. The key distinction in the latter case is that
the cache is maintained on the client. So a couple questions about the
potential impact of this KIP on these use cases:

1. Would it make sense to use this KIP in the consumer coordinator to
expire offsets based on the topic's retention time? Currently, we have a
periodic task which scans the full cache to check which offsets can be
expired, but we might be able to get rid of this if we had a callback to
update the cache when a segment was deleted. Technically offsets can be
given their own expiration time, but it seems questionable whether we need
this going forward (the new consumer doesn't even expose it at the moment).
2. This KIP could also be useful for expiration in the case of a cache
maintained on the client, but I don't see an obvious way that we'd be able
to leverage it since there's no indication to the client when a segment has
been deleted (unless they reload the cache from the beginning of the log).
One approach I can think of would be to write corresponding tombstones as
necessary when a segment is removed, but that seems pretty heavy. Have you
considered this problem?

It may not be necessary to address this problem in this KIP, but since the
need for expiration seems very common for this use case, it could save a
lot of duplicated effort if the broker provided a builtin mechanism for it.

Thanks,
Jason


On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy  wrote:

> Hi,
>
> We have created KIP 71: Enable log compaction and deletion to co-exist`
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 71%3A+Enable+log+compaction+and+deletion+to+co-exist
>
> Please take a look. Feedback is appreciated.
>
> Thank you
>


[DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-08 Thread Damian Guy
Hi,

We have created KIP 71: Enable log compaction and deletion to co-exist`

https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist

Please take a look. Feedback is appreciated.

Thank you