[ 
https://issues.apache.org/jira/browse/DRILL-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-7324:
----------------------------------

    Assignee: Paul Rogers

> Many vector-validity errors from unit tests
> -------------------------------------------
>
>                 Key: DRILL-7324
>                 URL: https://issues.apache.org/jira/browse/DRILL-7324
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Drill's value vectors contain many counts that must be maintained in sync. 
> Drill provides a utility, {{BatchValidator}}, to check (a subset of) these 
> values for consistency.
> The {{IteratorValidatorBatchIterator}} class is used in tests to validate the 
> state of each operator (AKA "record batch") as Drill runs the Volcano 
> iterator. This class can also validate vectors by setting the 
> {{VALIDATE_VECTORS}} constant to {{true}}.
> With this constant set, the unit tests were run. Many tests failed. Examples:
> {noformat}
> [INFO] Running org.apache.drill.TestUnionDistinct
> 18:44:26.742 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:26.745 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> [INFO] Running org.apache.drill.TestUnionDistinct
> 18:44:48.302 [22d4256e-c90b-847c-5104-02d6cdf5223e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:48.703 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> n_regionkey - IntVector: Row count = 2, but value count = 25
> 18:44:48.731 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 4, but value count = 25
> n_regionkey - IntVector: Row count = 4, but value count = 25
> 18:44:49.039 [22d4256f-6b39-d2ab-d145-4f2b0db315a3:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> 18:44:49.363 [22d4256e-3d91-850f-9ab4-5939219ac0d0:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> c_custkey - IntVector: Row count = 4, but value count = 1500
> 18:44:49.597 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:49.610 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> 18:44:53.029 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_nationkey - IntVector: Row count = 0, but value count = 25
> n_name - VarCharVector: Row count = 0, but value count = 25
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.033 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.331 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_name - VarCharVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.337 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.646 [22d42569-c293-ced0-c3d0-e9153cc4a70a:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> Running org.apache.drill.TestTpchSingleMode
> 18:45:01.299 [22d42563-0ed6-1501-86a1-4cb375a9cad4:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> Running org.apache.drill.TestMergeFilterPlan
> 18:45:03.738 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> o_orderkey - IntVector: Row count = 561, but value count = 15000
> o_orderdate - DateVector: Row count = 561, but value count = 15000
> o_orderpriority - VarCharVector: Row count = 561, but value count = 15000
> 18:45:03.828 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_orderkey - IntVector: Row count = 20580, but value count = 32767
> l_commitdate - DateVector: Row count = 20580, but value count = 32767
> l_receiptdate - DateVector: Row count = 20580, but value count = 32767
> 18:45:03.990 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_orderkey - IntVector: Row count = 17317, but value count = 27408
> l_commitdate - DateVector: Row count = 17317, but value count = 27408
> l_receiptdate - DateVector: Row count = 17317, but value count = 27408
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.041 
> s - in org.apache.drill.TestMergeFilterPlan
> 18:45:04.929 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> o_orderkey - IntVector: Row count = 2287, but value count = 15000
> o_custkey - IntVector: Row count = 2287, but value count = 15000
> o_orderdate - DateVector: Row count = 2287, but value count = 15000
> 18:45:04.944 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> r_name - VarCharVector: Row count = 1, but value count = 5
> [INFO] Running org.apache.drill.TestSelectWithOption
> 18:45:06.120 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_quantity - Float8Vector: Row count = 594, but value count = 32767
> l_extendedprice - Float8Vector: Row count = 594, but value count = 32767
> l_discount - Float8Vector: Row count = 594, but value count = 32767
> l_shipdate - DateVector: Row count = 594, but value count = 32767
> 18:45:06.156 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_quantity - Float8Vector: Row count = 543, but value count = 27408
> l_extendedprice - Float8Vector: Row count = 543, but value count = 27408
> l_discount - Float8Vector: Row count = 543, but value count = 27408
> l_shipdate - DateVector: Row count = 543, but value count = 27408
> {noformat}
> And many, many more. (Note that the test names might not be accurate: Maven 
> runs multiple tests in parallel and it is hard to correlate log messages with 
> tests in this output format.)
> The problem with these errors is that they make operators very fragile: once 
> we accept invalid vectors, it is very hard to detect when an operator makes 
> vectors even more invalid. It is also hard to reason about the code if the 
> inputs (or outputs) can be corrupt in normal operation.
> Suggestions:
> 1. Extend {{BatchValidator}} with the vectors not yet covered (maps, repeated 
> maps).
> 2. Work step-by-step through tests.
> 3. Identify operators that corrupt vectors.
> 4. Fix the source of corruption and retest.
> 5. Continue until no vector corruption errors occur.
> 6. Change the {{IteratorValidatorBatchIterator}} to check vectors by default, 
> and to throw a fatal error if corruption is found.
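
The errors quoted above all have the same shape: a vector's value count differs from the row count of the batch that contains it. The following is a minimal, self-contained sketch of that kind of check. It is not Drill's {{BatchValidator}}: the names VectorCountCheck, SimpleVector, and validate are hypothetical, and a real vector carries more counts than the single one modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VectorCountCheck {

    /** Hypothetical stand-in for a value vector: a name, a type label, and a value count. */
    public static final class SimpleVector {
        final String name;
        final String type;
        final int valueCount;

        public SimpleVector(String name, String type, int valueCount) {
            this.name = name;
            this.type = type;
            this.valueCount = valueCount;
        }
    }

    /**
     * Returns one message per vector whose value count differs from the
     * batch row count. An empty list means the counts are consistent.
     */
    public static List<String> validate(int rowCount, List<SimpleVector> vectors) {
        List<String> errors = new ArrayList<>();
        for (SimpleVector v : vectors) {
            if (v.valueCount != rowCount) {
                errors.add(String.format(
                        "%s - %s: Row count = %d, but value count = %d",
                        v.name, v.type, rowCount, v.valueCount));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // A batch that reports 2 rows while two of its vectors still hold 25 values.
        List<SimpleVector> batch = Arrays.asList(
                new SimpleVector("n_nationkey", "IntVector", 25),
                new SimpleVector("n_regionkey", "IntVector", 25),
                new SimpleVector("n_name", "VarCharVector", 2));
        for (String error : validate(2, batch)) {
            System.out.println(error);
        }
    }
}
```

Run as written, the sketch reports the two IntVector columns and accepts n_name, in the same message format as the log excerpt. Making such a check fatal by default, as suggestion 6 proposes, would amount to throwing when the returned list is non-empty instead of only logging it.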



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
