[ https://issues.apache.org/jira/browse/DRILL-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned DRILL-7324:
----------------------------------

    Assignee: Paul Rogers

> Many vector-validity errors from unit tests
> -------------------------------------------
>
>                 Key: DRILL-7324
>                 URL: https://issues.apache.org/jira/browse/DRILL-7324
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Drill's value vectors contain many counts that must be maintained in sync.
> Drill provides a utility, {{BatchValidator}}, to check (a subset of) these
> values for consistency.
>
> The {{IteratorValidatorBatchIterator}} class is used in tests to validate the
> state of each operator (AKA "record batch") as Drill runs the Volcano
> iterator. This class can also validate vectors by setting the
> {{VALIDATE_VECTORS}} constant to {{true}}.
>
> With this constant enabled, the unit tests were run and many failed. Examples:
>
> {noformat}
> [INFO] Running org.apache.drill.TestUnionDistinct
> 18:44:26.742 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:26.745 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> [INFO] Running org.apache.drill.TestUnionDistinct
> 8:44:48.302 [22d4256e-c90b-847c-5104-02d6cdf5223e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:48.703 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> n_regionkey - IntVector: Row count = 2, but value count = 25
> 18:44:48.731 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> n_nationkey - IntVector: Row count = 4, but value count = 25
> n_regionkey - IntVector: Row count = 4, but value count = 25
> 18:44:49.039 [22d4256f-6b39-d2ab-d145-4f2b0db315a3:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> 18:44:49.363 [22d4256e-3d91-850f-9ab4-5939219ac0d0:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> c_custkey - IntVector: Row count = 4, but value count = 1500
> 18:44:49.597 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:49.610 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> 18:44:53.029 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> n_nationkey - IntVector: Row count = 0, but value count = 25
> n_name - VarCharVector: Row count = 0, but value count = 25
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.033 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.331 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_name - VarCharVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.337 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.646 [22d42569-c293-ced0-c3d0-e9153cc4a70a:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> Running org.apache.drill.TestTpchSingleMode
> 18:45:01.299 [22d42563-0ed6-1501-86a1-4cb375a9cad4:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> Running org.apache.drill.TestMergeFilterPlan
> 18:45:03.738 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> o_orderkey - IntVector: Row count = 561, but value count = 15000
> o_orderdate - DateVector: Row count = 561, but value count = 15000
> o_orderpriority - VarCharVector: Row count = 561, but value count = 15000
> 18:45:03.828 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> l_orderkey - IntVector: Row count = 20580, but value count = 32767
> l_commitdate - DateVector: Row count = 20580, but value count = 32767
> l_receiptdate - DateVector: Row count = 20580, but value count = 32767
> 18:45:03.990 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> l_orderkey - IntVector: Row count = 17317, but value count = 27408
> l_commitdate - DateVector: Row count = 17317, but value count = 27408
> l_receiptdate - DateVector: Row count = 17317, but value count = 27408
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.041 s - in org.apache.drill.TestMergeFilterPlan
> 18:45:04.929 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> o_orderkey - IntVector: Row count = 2287, but value count = 15000
> o_custkey - IntVector: Row count = 2287, but value count = 15000
> o_orderdate - DateVector: Row count = 2287, but value count = 15000
> 18:45:04.944 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> r_name - VarCharVector: Row count = 1, but value count = 5
> [INFO] Running org.apache.drill.TestSelectWithOption
> 18:45:06.120 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> l_quantity - Float8Vector: Row count = 594, but value count = 32767
> l_extendedprice - Float8Vector: Row count = 594, but value count = 32767
> l_discount - Float8Vector: Row count = 594, but value count = 32767
> l_shipdate - DateVector: Row count = 594, but value count = 32767
> 18:45:06.156 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> l_quantity - Float8Vector: Row count = 543, but value count = 27408
> l_extendedprice - Float8Vector: Row count = 543, but value count = 27408
> l_discount - Float8Vector: Row count = 543, but value count = 27408
> l_shipdate - DateVector: Row count = 543, but value count = 27408
> {noformat}
>
> And many, many more. (Note that the test names might not be accurate: Maven
> runs multiple tests in parallel and it is hard to correlate log messages with
> tests in this output format.)
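
The messages above all report the same kind of mismatch: a vector's value count differs from the row count of the batch that carries it. A minimal sketch of such a check follows, for illustration only. It is not Drill's {{BatchValidator}}; the class {{VectorCountCheck}}, the holder type {{VectorInfo}}, and the method {{validate}} are hypothetical names, and the real utility checks more than this one invariant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a batch consistency check; NOT Drill's BatchValidator. */
public class VectorCountCheck {

    /** Hypothetical description of one vector in a batch: name, type, value count. */
    static final class VectorInfo {
        final String name;
        final String type;
        final int valueCount;

        VectorInfo(String name, String type, int valueCount) {
            this.name = name;
            this.type = type;
            this.valueCount = valueCount;
        }
    }

    /**
     * Returns one message per vector whose value count differs from the
     * batch row count. An empty list means the batch passed this check.
     */
    static List<String> validate(int rowCount, List<VectorInfo> vectors) {
        List<String> errors = new ArrayList<>();
        for (VectorInfo v : vectors) {
            if (v.valueCount != rowCount) {
                // Message layout mirrors the log lines quoted in this issue.
                errors.add(String.format(
                        "%s - %s: Row count = %d, but value count = %d",
                        v.name, v.type, rowCount, v.valueCount));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // A batch that claims 2 rows while its vectors still hold 25 values.
        List<VectorInfo> batch = Arrays.asList(
                new VectorInfo("n_nationkey", "IntVector", 25),
                new VectorInfo("n_regionkey", "IntVector", 25));
        for (String error : validate(2, batch)) {
            System.out.println(error);
        }
    }
}
```

Returning a list rather than throwing on the first mismatch lets a caller report every inconsistent vector in a batch at once, which is how the log output above is grouped.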
>
> The problem with these errors is that they make operators very fragile: once
> we accept invalid vectors, it is very hard to detect when an operator makes
> vectors even more invalid. It is also hard to reason about the code if the
> inputs (or outputs) can be corrupt in normal operation.
>
> Suggestions:
>
> 1. Extend {{BatchValidator}} to cover the vectors not yet handled (maps and
> repeated maps).
> 2. Work step-by-step through the tests.
> 3. Identify the operators that corrupt vectors.
> 4. Fix the source of corruption and retest.
> 5. Continue until no vector corruption errors occur.
> 6. Change the {{IteratorValidatorBatchIterator}} to check vectors by default,
> and to throw a fatal error if corruption is found.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)