Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-12 Thread John Roesler
Thanks for looking at it, Bill. I initially agreed with you, but Manikumar asked me to check if it's really a regression before calling it a blocker. I tested 2.3 and found the same (buggy) behavior, so I don't think we can call it a regression, and therefore, it's also not a blocker. I'm still

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-12 Thread Bill Bejeck
This could be a significant performance issue for some, so I think this fix needs to go into 2.4. Just my 2 cents.
-Bill

On Mon, Nov 11, 2019 at 5:57 PM John Roesler wrote:
> Ok, created: https://github.com/apache/kafka/pull/7681
>
> I'm on the fence about whether we should file this as a

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-11 Thread John Roesler
Ok, created: https://github.com/apache/kafka/pull/7681 I'm on the fence about whether we should file this as a 2.4.0 blocker. It _sounds_ like this would have a pretty big impact on performance. I'm not convinced about any correctness problems, though, since the changelogs are only configured

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-11 Thread John Roesler
Hi all, I've just been looking over the code and Guozhang's reply... I think that the reply is reasonable, but it seems like the code may not be precisely implementing this logic. As an entry point, in `StreamThread#runOnce`: If the state is `PARTITIONS_ASSIGNED`, we'll call

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-10 Thread Navinder Brar
Thanks Guozhang. The jira is filed: [KAFKA-9169] Standby Tasks point ask for incorrect offsets on resuming post suspension - ASF JIRA. On Monday, 11 November, 2019, 03:10:37 am IST,

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-10 Thread Guozhang Wang
If a standby task is suspended, it will write the checkpoint file again after flushing its state stores. When it resumes, it does not re-initialize the position on the consumer, and hence it is still the task-manager's responsibility to set the right starting offset from the latest checkpoint
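To make the point above concrete, here is a minimal sketch of a resume step that seeks each standby changelog partition to its latest checkpointed offset. It is only an illustration under assumptions: `FakeConsumer`, `StandbyResumeSketch`, `seekToCheckpoints`, and the partition name are hypothetical stand-ins invented for this example, not the Kafka Streams classes or the code in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a consumer; this is NOT the Kafka client API.
final class FakeConsumer {
    private final Map<String, Long> positions = new HashMap<>();

    // Record the next offset to read for a partition.
    void seek(String partition, long offset) {
        positions.put(partition, offset);
    }

    // Return the current position, or 0 if the partition was never positioned.
    long position(String partition) {
        return positions.getOrDefault(partition, 0L);
    }
}

// Hypothetical helper illustrating the task-manager's responsibility on resume.
final class StandbyResumeSketch {
    // Seek every standby changelog partition to its latest checkpointed offset,
    // overwriting whatever position the consumer held before suspension.
    static void seekToCheckpoints(FakeConsumer consumer, Map<String, Long> checkpointedOffsets) {
        for (Map.Entry<String, Long> entry : checkpointedOffsets.entrySet()) {
            consumer.seek(entry.getKey(), entry.getValue());
        }
    }
}
```

In this sketch, a consumer still positioned at a stale offset for a partition ends up at the checkpointed offset once `seekToCheckpoints` has run, which is the behavior the reply describes as the task-manager's job.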

Re: Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-09 Thread Guozhang Wang
Hello Navinder, Sorry for the late reply and thanks for bringing this up. I think this is indeed a bug that needs to be fixed. The rationale behind it was the following: for restoring active tasks and processing standby tasks, we are using the same consumer client within the thread (the

Why standby tasks read from the StandbyTasks::checkpointedOffsets in assignStandbyPartitions()

2019-11-04 Thread Navinder Brar
Hi, Please let me know if this is not the correct forum to ask this. I have a doubt that I was hoping someone could clear up for me. In TaskManager::updateNewAndRestoringTasks(), the function assignStandbyPartitions() gets called for all the running standby tasks where it populates the Map: