[VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-28 Thread Joel Koshy
> > - It seems that the consumer will need to write log.retention.min.timestamp periodically to ZooKeeper as dynamic configuration of the topic, so that the broker can pick up log.retention.min.timestamp. However, this introduces a dependency of the consumer on ZooKeeper, which is undesirable. Note

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-26 Thread Dong Lin
Hey Bill, I have some follow-up questions after Jun's questions: - It seems that the consumer will need to write log.retention.min.timestamp periodically to ZooKeeper as dynamic configuration of the topic, so that the broker can pick up log.retention.min.timestamp. However, this introduces

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-25 Thread Jun Rao
Bill, That's a good question. I am thinking of the following approach for implementing trim(): (1) client issues metadata request to the broker to determine the leader of topic/partitions and groups topic/partitions by the leader broker; (2) client sends a TrimRequest to each broker with
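Step (1) above, grouping topic/partitions by their leader broker, can be sketched as follows. This is a minimal illustration only: the `TopicPartition` class here is a hypothetical stand-in defined in the sketch (not the Kafka client class), the leader map is assumed to come from a metadata response, and the contents of the TrimRequest in step (2) are not modeled because the message is cut off at that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

final class TrimGroupingSketch {
    /** Hypothetical stand-in for a topic/partition pair; not Kafka's own class. */
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicPartition)) return false;
            TopicPartition other = (TopicPartition) o;
            return partition == other.partition && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    /**
     * Groups partitions by leader broker id so that one request per broker
     * can be built. The input map (partition -> leader id) is assumed to
     * have been filled in from a metadata response.
     */
    static Map<Integer, List<TopicPartition>> groupByLeader(
            Map<TopicPartition, Integer> leaderByPartition) {
        // TreeMap keeps broker ids sorted, which makes the result deterministic.
        Map<Integer, List<TopicPartition>> byLeader = new TreeMap<>();
        for (Map.Entry<TopicPartition, Integer> e : leaderByPartition.entrySet()) {
            byLeader.computeIfAbsent(e.getValue(), id -> new ArrayList<>())
                    .add(e.getKey());
        }
        return byLeader;
    }
}
```

With this grouping, the client would send one request per map key rather than one per partition.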

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-24 Thread Bill Warshaw
Hi Jun, Those are valid concerns. For our particular use case, application events triggering the timestamp update will never occur more than once an hour, and we maintain a sliding window so that we don't delete messages too close to what our consumers may be reading. For more general use cases,

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-24 Thread Jun Rao
Hi, Bill, Thanks for the proposal. Sorry for the late reply. The motivation of the proposal makes sense: don't delete the messages until the application tells you so. I am wondering if the current proposal is the best way to address the need though. There are a couple of issues that I saw with

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-11 Thread Guozhang Wang
+1. On Fri, Oct 7, 2016 at 3:35 PM, Gwen Shapira wrote: > +1 (binding) > > On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw wrote: > > Bumping for visibility. KIP is here: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+ >

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-07 Thread Gwen Shapira
+1 (binding) On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw wrote: > Bumping for visibility. KIP is here: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy > > On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-10-05 Thread Bill Warshaw
Bumping for visibility. KIP is here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw wrote: > Hello Guozhang, > > KIP-71 seems unrelated to this KIP. KIP-47 is just adding a

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-08-24 Thread Bill Warshaw
Hello Guozhang, KIP-71 seems unrelated to this KIP. KIP-47 is just adding a new deletion policy (minimum timestamp), while KIP-71 is allowing deletion and compaction to coexist. They both will touch LogManager, but the change for KIP-47 is very isolated. On Wed, Aug 24, 2016 at 2:21 PM

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-08-24 Thread Guozhang Wang
Hi Bill, I would like to reason if there is any correlation between this KIP and KIP-71 https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist I feel they are orthogonal but would like to double check with you. Guozhang On Wed, Aug 24,

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-08-24 Thread Bill Warshaw
I'd like to re-awaken this voting thread now that KIP-33 has merged. This KIP is now completely unblocked. I have a working branch off of trunk with my proposed fix, including testing. On Mon, May 9, 2016 at 8:30 PM Guozhang Wang wrote: > Jay, Bill: > > I'm thinking of one

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-09 Thread Guozhang Wang
Jay, Bill: I'm thinking of one general use case of using timestamp rather than offset for log deletion, which is that for expiration handling in data replication, when the source data store decides to expire some data records based on their timestamps, today we need to configure the corresponding

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Bill Warshaw
Yes, I'd agree that offset is a more precise configuration than timestamp. If there was a way to set a partition-level configuration, I would rather use log.retention.min.offset than timestamp. If you have an approach in mind I'd be open to investigating it. On Mon, May 2, 2016 at 5:33 PM, Jay

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Jay Kreps
Gotcha, good point. But barring that limitation, you agree that that makes more sense? -Jay On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw wrote: > The problem with offset as a config option is that offsets are partition-specific, so we'd need a per-partition config. This

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Jay Kreps
I think you are saying you considered a kind of trim() API that would synchronously chop off the tail of the log starting from a given offset. That would be one option, but what I was saying was slightly different: in the proposal you have where there is a config that controls retention that the

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Bill Warshaw
1. Initially I looked at using the actual offset, by adding a call to AdminUtils to just delete anything in a given topic/partition to a given offset. I ran into a lot of trouble here trying to work out how the system would recognize that every broker had successfully deleted that range from the

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Jay Kreps
Two comments: 1. Is there a reason to use physical time rather than offset? The idea is for the consumer to say when it has consumed something so it can be deleted, right? It seems like offset would be a much more precise way to do this--i.e. the consumer says "I have checkpointed

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Guozhang Wang
Thanks. I'm +1 on this proposal given the comment above. On Mon, May 2, 2016 at 9:34 AM, Bill Warshaw wrote: > Yeah 1 and 2 could easily be combined into the same predicate. > > On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang wrote: > > > Can we do 1 and

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Bill Warshaw
Yeah 1 and 2 could easily be combined into the same predicate. On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang wrote: > Can we do 1 and 2 in one pass, and 3 in another pass? It may result in different results but semantically it should be acceptable. Arguably saving one

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Guozhang Wang
Can we do 1 and 2 in one pass, and 3 in another pass? It may produce different results but semantically it should be acceptable. Arguably saving one pass on the segment list may not be huge, but if it is straightforward to do I'd suggest choosing this option. Guozhang On Mon, May 2, 2016 at

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-02 Thread Bill Warshaw
Conditions 1, 2 and 3 will all be checked sequentially. If any of the three conditions is true, that segment will be deleted. This is what it looks like in my commit:

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-05-01 Thread Guozhang Wang
Thanks Bill. Read through the KIP, LGTM overall. One clarification question: With this KIP the LogManager's cleanup logic would be, for each segment: 1) delete the segment if its last timestamp is < current timestamp - log.retention.time (ms, minutes, hours, etc). 2) delete the segment if its
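The per-segment check described in this thread can be sketched as a pair of predicates combined with a logical OR. This is an illustration under stated assumptions, not the code from the commit mentioned elsewhere in the thread: `Segment` is a hypothetical stand-in for a log segment, the second condition in the message above is cut off in the archive and is deliberately left out, and the comparison against `log.retention.min.timestamp` (delete when the segment's last timestamp is below it) is an assumption based on the KIP title.

```java
import java.util.ArrayList;
import java.util.List;

final class RetentionSketch {
    /** Hypothetical stand-in for a log segment; not Kafka's LogSegment. */
    static final class Segment {
        final long baseOffset;
        final long largestTimestampMs;

        Segment(long baseOffset, long largestTimestampMs) {
            this.baseOffset = baseOffset;
            this.largestTimestampMs = largestTimestampMs;
        }
    }

    /** Condition 1 from the thread: last timestamp < current time - retention time. */
    static boolean olderThanRetention(Segment s, long nowMs, long retentionMs) {
        return s.largestTimestampMs < nowMs - retentionMs;
    }

    /** Assumed form of the new policy: last timestamp < log.retention.min.timestamp. */
    static boolean belowMinTimestamp(Segment s, long minTimestampMs) {
        return s.largestTimestampMs < minTimestampMs;
    }

    /** A segment is selected for deletion if any modeled condition holds. */
    static List<Segment> deletable(List<Segment> segments, long nowMs,
                                   long retentionMs, long minTimestampMs) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : segments) {
            if (olderThanRetention(s, nowMs, retentionMs)
                    || belowMinTimestamp(s, minTimestampMs)) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Because the conditions are joined with OR in a single loop, the segment list is walked once, which is the single-pass form discussed in the replies. Details such as never deleting the active segment are not modeled here.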

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-04-28 Thread Bill Warshaw
I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been accepted and is in progress. I've updated the KIP (https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy). I have a commit with the functionality for KIP-47 ready to go once KIP-33

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-03-09 Thread Gwen Shapira
For convenience, the KIP is here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy Do you mind updating the KIP with the time formats we plan on supporting in the configuration? On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw

[VOTE] KIP-47 - Add timestamp-based log deletion policy

2016-03-09 Thread Bill Warshaw
Hello, I'd like to initiate the vote for KIP-47. Thanks, Bill Warshaw