[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-632408026 > What is the intrinsic limitation that did not allow old `OffsetRangeTracker` to be refactored for this use case? or why we want to have both? > `GrowableOffsetRangeTracker` and `OffsetRangeTracker` should be applied to different kind of `OffsetRange`. `GrowableOffsetRangeTracker` is for the `OffsetRange` that the end could be changed during execution time, which mostly happens in streaming case. `OffsetRangeTracker` is for the range that we know what the exact end is, which is perfect for batch. The reason that I didn't make them into one is because I don't want to introduce additional complexity to `OffsetRangeTracker`. It's doable to have the dynamic one as general case where the range with fixed end is a special case. But I want to make them specifically with less confusion. > Does this mean also that we might need `GrowableBytekeyRangeTracker` and basically 'dynamic' versions for every `RestrictionTracker` ? I think it will depend on the actual usage. If we have an application scenario that requires for a dynamic version, then I would say yes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-631794981 Thanks for your help! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-631204566 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-630963389 > > > Should we be using the RangeEndEstimator when providing progress/splitting for ranges not ending at `Long.MAX_VALUE`? > > > Lets say the range estimate is bad and is `MAX_VALUE - 3` but the real end is `5000`, then after a split we end up with `[0, (MAX_VALUE - 3) * 0.5)` and `[(MAX_VALUE - 3) * 0.5, MAX_VALUE)`. We may quickly learn that the residual is empty and then lose all effective progress on the primary. > > > > > > I can see the benefit of using `RangeEndEstimator` for the finite range here. But as long as we don't modify the range end to estimate end or use estimate ed in `tryClaim`, we still cannot say the residual is empty. > > That is true but I was thinking it would make better splitting decisions instead of creating a bunch of empty splits trimming the range down. The advantage of not using the estimator is that we don't have to invoke since it could be expensive for the user and in many situations will produce a value greater than `to`. > > We can leave it out for now unless some compelling use case comes up. Do we want to have a TODO here to track this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-630952259 > Should we be using the RangeEndEstimator when providing progress/splitting for ranges not ending at `Long.MAX_VALUE`? > > Lets say the range estimate is bad and is `MAX_VALUE - 3` but the real end is `5000`, then after a split we end up with `[0, (MAX_VALUE - 3) * 0.5)` and `[(MAX_VALUE - 3) * 0.5, MAX_VALUE)`. We may quickly learn that the residual is empty and then lose all effective progress on the primary. I can see the benefit of using `RangeEndEstimator` for the finite range here. But as long as we don't modify the range end to estimate end or use estimate ed in `tryClaim`, we still cannot say the residual is empty. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-630536199 Latest changes are for addressing comments and using double during computation. @lukecwik PTAL. Thanks for your help! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker
boyuanzz commented on pull request #11715: URL: https://github.com/apache/beam/pull/11715#issuecomment-629344196 R: @lukecwik CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org