Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18745 )

Change subject: IMPALA-11444: Fix wrong results when reading wide rows from ORC
......................................................................

IMPALA-11444: Fix wrong results when reading wide rows from ORC

After IMPALA-9228, the ORC scanner reads rows into a scratch batch, where
conjuncts and runtime filters are evaluated. The surviving rows are then
picked up by the output row batch. This loop repeats until the output row
batch is full (1024 rows by default) or the current ORC batch (1024 rows
by default) has been fully read. Usually the loop runs only one iteration:
the scratch batch capacity is also 1024, so all rows of the current ORC
batch can be materialized into the scratch batch.

However, when reading wide rows whose tuple size is larger than 4096
bytes, the scratch batch capacity is reduced below 1024, i.e. the scratch
batch can hold fewer than 1024 rows. In this case the loop needs more
iterations. The bug is that we didn't commit rows to the output row batch
after each iteration, so the surviving rows were overwritten in the second
iteration.

This is fixed in a later optimization (IMPALA-9469) which is missing in
the 3.x branch. This patch picks only the fix from it.

Tests:
 - Add test on wide tables with 2K columns

Change-Id: I09f1c23c817ad012587355c16f37f42d5fb41bff
---
M be/src/exec/hdfs-orc-scanner.cc
M tests/query_test/test_scanners.py
2 files changed, 56 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/18745/1
--
To view, visit http://gerrit.cloudera.org:8080/18745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 3.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I09f1c23c817ad012587355c16f37f42d5fb41bff
Gerrit-Change-Number: 18745
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
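
For reviewers who want to see the failure mode in isolation, the loop described in the commit message can be modeled with a small toy sketch. This is not the code in hdfs-orc-scanner.cc: the function names, the list-based "batches", and the 4 MiB buffer limit are all assumptions made for illustration (the limit is chosen only so that the capacity drops below 1024 once the tuple size exceeds 4096 bytes, as the message states).

```python
# Toy model of the scan loop; every name here is hypothetical, not Impala's.

FIXED_LEN_BUFFER_LIMIT = 4 * 1024 * 1024  # assumed 4 MiB scratch budget
BATCH_SIZE = 1024                         # default batch size from the message


def scratch_capacity(tuple_size, batch_size=BATCH_SIZE):
    """Number of rows the scratch batch can hold for a given tuple size."""
    return max(1, min(batch_size, FIXED_LEN_BUFFER_LIMIT // tuple_size))


def scan_orc_batch(orc_rows, capacity, keep, commit_each_iteration):
    """Move rows that pass `keep` from one ORC batch into the output batch.

    Survivors are always written starting at the first uncommitted slot,
    so skipping the per-iteration commit makes later iterations overwrite
    earlier survivors.
    """
    slots = [None] * len(orc_rows)  # storage of the output row batch
    committed = 0                   # rows committed so far
    written = 0                     # survivors of the latest iteration
    for start in range(0, len(orc_rows), capacity):
        scratch = orc_rows[start:start + capacity]
        survivors = [row for row in scratch if keep(row)]
        slots[committed:committed + len(survivors)] = survivors
        written = len(survivors)
        if commit_each_iteration:   # the fix: commit after every iteration
            committed += written
            written = 0
    committed += written            # old behavior: one commit at the end
    return slots[:committed]


rows = list(range(1024))
is_even = lambda row: row % 2 == 0

wide = scratch_capacity(8192)    # wide tuples: capacity falls to 512
fixed = scan_orc_batch(rows, wide, is_even, commit_each_iteration=True)
buggy = scan_orc_batch(rows, wide, is_even, commit_each_iteration=False)
```

With a narrow tuple the capacity stays at 1024, the loop runs once, and both variants agree, which is consistent with the bug only showing up on wide rows. With the wide tuple above, `fixed` keeps all 512 even rows while `buggy` retains only the survivors of the last iteration.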
