himanshug opened a new issue #8846: VersionedIntervalTimeline performance 
corner case with high number of overlapping segments
URL: https://github.com/apache/incubator-druid/issues/8846
 
 
   Recently all of our historical nodes restarted on a cluster that was serving 
about 50000 segments of the following nature:
   each segment's interval is 24 hours, and each successive segment 
overlaps the previous one by 1439 minutes (24 hours is 1440 minutes). For 
example, segment intervals and versions might look like
   
   2019-01-01T00:00:00.000Z - 2019-01-01T23:59:00.000Z , v1
   2019-01-01T00:01:00.000Z - 2019-01-02T00:00:00.000Z , v2
   2019-01-01T00:02:00.000Z - 2019-01-02T00:01:00.000Z , v3
   2019-01-01T00:03:00.000Z - 2019-01-02T00:02:00.000Z , v4
   ...
   ...
   
   The restart triggered a sequence of `VersionedIntervalTimeline.remove(..)` 
calls, one per segment. The broker and coordinator never recovered and needed a 
forced restart, because `VersionedIntervalTimeline.remove(..)` becomes very 
expensive in the above scenario and the calls never finished.
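
To give a feel for how dense the overlap is, here is a rough sketch (not Druid code). It follows the description above: 24-hour intervals, each shifted by one minute from the previous one. The `Segment` class and the `staggered` and `overlapCount` helpers are hypothetical names made up for this illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {
    // Hypothetical stand-in for a segment; this is not Druid's DataSegment.
    static final class Segment {
        final Instant start;
        final Instant end;
        final String version;

        Segment(Instant start, Instant end, String version) {
            this.start = start;
            this.end = end;
            this.version = version;
        }

        // Treats intervals as half-open: [start, end).
        boolean overlaps(Segment other) {
            return start.isBefore(other.end) && other.start.isBefore(end);
        }
    }

    // Builds `count` 24-hour segments, each starting one minute after the previous one.
    static List<Segment> staggered(int count) {
        Instant base = Instant.parse("2019-01-01T00:00:00Z");
        List<Segment> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Instant start = base.plus(Duration.ofMinutes(i));
            out.add(new Segment(start, start.plus(Duration.ofHours(24)), "v" + (i + 1)));
        }
        return out;
    }

    // Counts how many other segments overlap the segment at `index`.
    static long overlapCount(List<Segment> segments, int index) {
        Segment target = segments.get(index);
        long n = 0;
        for (int i = 0; i < segments.size(); i++) {
            if (i != index && target.overlaps(segments.get(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Segment> segments = staggered(5000);
        System.out.println(overlapCount(segments, 0));    // prints 1439
        System.out.println(overlapCount(segments, 2500)); // prints 2878
    }
}
```

So a segment in the middle of the range overlaps with almost 2900 others, and any per-removal work that touches overlapping entries is repeated for every one of the roughly 50000 segments.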
   
   I did a quick prototype that batches multiple 
`VersionedIntervalTimeline.remove(..)` calls into a single 
`VersionedIntervalTimeline.removeAll(..)` call, which could be used when data 
servers go down and which makes a few optimizations possible. The batched call 
first removes all the entries from `allTimelineEntries`, then removes them from 
`complete/incompletePartitionTimeline`, and then adjusts those timelines based 
on the state of `allTimelineEntries`. In the batched version, 
`allTimelineEntries` has significantly fewer entries by the time the adjustment 
runs, so none of the unnecessary corrections to 
`complete/incompletePartitionTimeline` that happen with non-batched removals 
are made.
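
The batching idea can be sketched with a toy model. This is not the prototype and not Druid's `VersionedIntervalTimeline`: `ToyTimeline`, its `visible` map and its `rebuilds` counter are hypothetical. The toy recomputes its derived state from scratch on every change, which is an assumption made to keep the sketch short, whereas the real class corrects its timelines incrementally. It only shows that removing everything from the entry map first and adjusting the derived state once gives the same result with far fewer corrections.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class BatchedRemovalSketch {
    /** Toy model only; the real VersionedIntervalTimeline is more involved. */
    static final class ToyTimeline {
        // Stands in for allTimelineEntries: version -> {startMinute, endMinute}, half-open.
        final Map<Integer, long[]> allEntries = new HashMap<>();
        // Stands in for the derived partition timelines: chunk start -> highest covering version.
        final TreeMap<Long, Integer> visible = new TreeMap<>();
        // Counts how often the derived state had to be recomputed.
        int rebuilds = 0;

        void add(int version, long start, long end) {
            allEntries.put(version, new long[] {start, end});
        }

        // Non-batched removal: the derived state is corrected after every single entry.
        void remove(int version) {
            allEntries.remove(version);
            rebuild();
        }

        // Batched removal: drop all entries first, then correct the derived state once.
        void removeAll(Collection<Integer> versions) {
            allEntries.keySet().removeAll(versions);
            rebuild();
        }

        void rebuild() {
            rebuilds++;
            visible.clear();
            TreeSet<Long> boundaries = new TreeSet<>();
            for (long[] iv : allEntries.values()) {
                boundaries.add(iv[0]);
                boundaries.add(iv[1]);
            }
            for (long chunkStart : boundaries) {
                Integer winner = null;
                for (Map.Entry<Integer, long[]> e : allEntries.entrySet()) {
                    long[] iv = e.getValue();
                    boolean covers = iv[0] <= chunkStart && chunkStart < iv[1];
                    if (covers && (winner == null || e.getKey() > winner)) {
                        winner = e.getKey();
                    }
                }
                if (winner != null) {
                    visible.put(chunkStart, winner);
                }
            }
        }
    }

    // Builds `count` entries of 1440 minutes each, staggered by one minute.
    static ToyTimeline staggered(int count) {
        ToyTimeline t = new ToyTimeline();
        for (int i = 0; i < count; i++) {
            t.add(i + 1, i, i + 1440L);
        }
        t.rebuild();
        return t;
    }

    public static void main(String[] args) {
        List<Integer> gone = new ArrayList<>();
        for (int v = 1; v <= 150; v++) {
            gone.add(v);
        }

        ToyTimeline oneByOne = staggered(200);
        for (int v : gone) {
            oneByOne.remove(v);
        }

        ToyTimeline batched = staggered(200);
        batched.removeAll(gone);

        System.out.println(oneByOne.visible.equals(batched.visible)); // prints true
        System.out.println(oneByOne.rebuilds + " vs " + batched.rebuilds); // prints 151 vs 2
    }
}
```

Both timelines end in the same state, but the one-by-one path corrects the derived state 150 times after the initial build while the batched path corrects it once.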
   
   Creating this issue to discuss other proposed solutions.
   
   
   
   
   
