[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-21 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-632408026


   > What is the intrinsic limitation that did not allow old 
`OffsetRangeTracker` to be refactored for this use case? or why we want to have 
both?
   > 
   `GrowableOffsetRangeTracker` and `OffsetRangeTracker` should be applied to 
different kind of `OffsetRange`. `GrowableOffsetRangeTracker` is for the 
`OffsetRange` that the end could be changed during execution time, which mostly 
happens in streaming case. `OffsetRangeTracker` is for the range that we know 
what the exact end is, which is perfect for batch.  The reason that I didn't 
make them into one is because I don't want to introduce additional complexity 
to `OffsetRangeTracker`.  It's doable to have the dynamic one as general case 
where the range with fixed end is a special case. But I want to make them 
specifically with less confusion.
   
   > Does this mean also that we might need `GrowableBytekeyRangeTracker` and 
basically 'dynamic' versions for every `RestrictionTracker` ?
   
   I think it will depend on the actual usage. If we have an application 
scenario that requires for a dynamic version, then I would say yes. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-20 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-631794981


   Thanks for your help!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-19 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-631204566


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-19 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-630963389


   > > > Should we be using the RangeEndEstimator when providing 
progress/splitting for ranges not ending at `Long.MAX_VALUE`?
   > > > Lets say the range estimate is bad and is `MAX_VALUE - 3` but the real 
end is `5000`, then after a split we end up with `[0, (MAX_VALUE - 3) * 0.5)` 
and `[(MAX_VALUE - 3) * 0.5, MAX_VALUE)`. We may quickly learn that the 
residual is empty and then lose all effective progress on the primary.
   > > 
   > > 
   > > I can see the benefit of using `RangeEndEstimator` for the finite range 
here. But as long as we don't modify the range end to estimate end or use 
estimate ed in `tryClaim`, we still cannot say the residual is empty.
   > 
   > That is true but I was thinking it would make better splitting decisions 
instead of creating a bunch of empty splits trimming the range down. The 
advantage of not using the estimator is that we don't have to invoke since it 
could be expensive for the user and in many situations will produce a value 
greater than `to`.
   > 
   > We can leave it out for now unless some compelling use case comes up.
   
   Do we want to have a TODO here to track this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-19 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-630952259


   > Should we be using the RangeEndEstimator when providing progress/splitting 
for ranges not ending at `Long.MAX_VALUE`?
   > 
   > Lets say the range estimate is bad and is `MAX_VALUE - 3` but the real end 
is `5000`, then after a split we end up with `[0, (MAX_VALUE - 3) * 0.5)` and 
`[(MAX_VALUE - 3) * 0.5, MAX_VALUE)`. We may quickly learn that the residual is 
empty and then lose all effective progress on the primary.
   
   I can see the benefit of using `RangeEndEstimator` for the finite range 
here.  But as long as we don't modify the range end to estimate end or use 
estimate ed in `tryClaim`, we still cannot say the residual is empty.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-18 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-630536199


   Latest changes are for addressing comments and using double during 
computation. @lukecwik PTAL. Thanks for your help!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11715: [BEAM-9977] Implement GrowableOffsetRangeTracker

2020-05-15 Thread GitBox


boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-629344196


   R: @lukecwik 
   CC: @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org