clintropolis opened a new pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559
 
 
   
   ### Description
   
This PR fixes an integer overflow issue that occurs when too many values are written to a column within a single segment, preferring to fail at segment creation time instead of writing a sad segment with negative values.
   
   This issue was first noticed with multi-value columns, exploding at query 
time with an error of the form:
   
   ```
   org.apache.druid.java.util.common.IAE: Index[-131072] < 0
        at 
org.apache.druid.segment.data.GenericIndexed.checkIndex(GenericIndexed.java:269)
        at 
org.apache.druid.segment.data.GenericIndexed.access$300(GenericIndexed.java:79)
        at 
org.apache.druid.segment.data.GenericIndexed$3.get(GenericIndexed.java:696)
        at 
org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.loadBuffer(CompressedVSizeColumnarIntsSupplier.java:437)
        at 
org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.get(CompressedVSizeColumnarIntsSupplier.java:350)
        at 
org.apache.druid.segment.data.SliceIndexedInts.get(SliceIndexedInts.java:60)
   ```
   
but it would also occur for any serializer given more than `Integer.MAX_VALUE` rows as input.
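
   The underlying failure mode is ordinary Java `int` wraparound: once a value counter passes `Integer.MAX_VALUE`, it becomes a large negative number, which later surfaces as the negative index in the stack trace above. A minimal sketch (not Druid code, just the arithmetic):

   ```java
   public class OverflowSketch {
       public static void main(String[] args) {
           // An int counter at its maximum...
           int valueCount = Integer.MAX_VALUE;
           // ...wraps to a negative value when one more value is counted,
           // which is how negative offsets end up written into a segment.
           valueCount++;
           System.out.println(valueCount); // -2147483648
       }
   }
   ```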
   
   To fix this, primitive column serializers now check the number of values (the row count in most cases, or the total number of values for multi-value strings) to ensure it does not exceed what can be expressed in the column header, so it won't cause any issues at query time. A new exception, `ColumnCapacityExceededException`, has been added, which produces an error message of the form
   ```
   Too many values to store for %s column, try reducing maxRowsPerSegment
   ```
   
   where `%s` is the column name (which all the serializers now know).
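
   A hedged sketch of the shape of this check (class and method names here are illustrative, not necessarily the actual Druid implementation): the serializer tracks its value count in a `long` and fails fast once the count can no longer be represented as a non-negative `int`.

   ```java
   // Illustrative sketch of the overflow guard; names are assumptions,
   // not the exact Druid classes.
   public class ColumnCapacityExceededException extends RuntimeException {
       public ColumnCapacityExceededException(String columnName) {
           super(String.format(
               "Too many values to store for %s column, try reducing maxRowsPerSegment",
               columnName
           ));
       }

       // A serializer would call something like this before accepting
       // another value, using a long counter so the check itself cannot overflow.
       public static void checkCapacity(long valueCount, String columnName) {
           if (valueCount > Integer.MAX_VALUE) {
               throw new ColumnCapacityExceededException(columnName);
           }
       }
   }
   ```

   Counting in a `long` is the key design point: the guard must be performed in a wider type, since by the time an `int` counter has gone negative the information needed to detect the overflow is already lost.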
   
   I added a bunch of tests to confirm that this works, and also marked them all `@Ignore` because they take forever to run. The same `IAE` error can be reproduced by running `V3CompressedVSizeColumnarMultiIntsSerializerTest.testTooManyValues` without the modifications, to confirm that overflow has occurred.
   
   I also added a `CompressedDoublesSerdeTest` that copies `CompressedFloatsSerdeTest`, since I noticed there wasn't a test for double columns.
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * Primitive column serializers and related tests
   
