clintropolis opened a new pull request #9559: error on value counter overflow instead of writing sad segments URL: https://github.com/apache/druid/pull/9559 ### Description This PR fixes an integer overflow issue if too many values are written to a column within a single segment, preferring to fail at segment creation time instead of write a sad segment with negative values. This issue was first noticed with multi-value columns, exploding at query time with an error of the form: ``` org.apache.druid.java.util.common.IAE: Index[-131072] < 0 at org.apache.druid.segment.data.GenericIndexed.checkIndex(GenericIndexed.java:269) at org.apache.druid.segment.data.GenericIndexed.access$300(GenericIndexed.java:79) at org.apache.druid.segment.data.GenericIndexed$3.get(GenericIndexed.java:696) at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.loadBuffer(CompressedVSizeColumnarIntsSupplier.java:437) at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.get(CompressedVSizeColumnarIntsSupplier.java:350) at org.apache.druid.segment.data.SliceIndexedInts.get(SliceIndexedInts.java:60) ``` but would also occur for any serializer given more than `Integer.MAX_VALUES` rows as input. To fix, primitive column serializers now check the number of values (row count in most cases, total number of values for the case of multi-value strings) to ensure that it does not extend beyond the values that will be expressed in the column header and won't cause any issues at query time. A new exception, `ColumnCapacityExceededException` has been added which will give an error message that suggests ``` Too many values to store for %s column, try reducing maxRowsPerSegment ``` where `%s` is the column name (which all the serializers now know). I added a bunch of tests to confirm that this works, and also marked them all `@Ignore` because they take forever to run. The same `IAE` error can be replicated by running `V3CompressedVSizeColumnarMultiIntsSerializerTest.testTooManyValues` without the modifications to check that overflow has occurred. I also added a `CompressedDoublesSerdeTest` that copies `CompressedFloatsSerdeTest` since i noticed there wasn't a test for double columns. <hr> This PR has: - [ ] been self-reviewed. - [ ] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. --> <hr> ##### Key changed/added classes in this PR * Primitive column serializers and related tests
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org