[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
..

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16019/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16019/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-9809: A query with multi-aggregation functions on particular
nit: make the first line more concise so it fits on a single line.

--
To view, visit http://gerrit.cloudera.org:8080/16019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 01 Jun 2020 23:07:22 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
..

Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6187/ : Initial code review checks failed. See linked job for details on the failure.

--
To view, visit http://gerrit.cloudera.org:8080/16019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 01 Jun 2020 23:03:56 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
..

Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16019/1/be/src/exec/grouping-aggregator-ir.cc
File be/src/exec/grouping-aggregator-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16019/1/be/src/exec/grouping-aggregator-ir.cc@160
PS1, Line 160: outBatchStart
nit: out_batch_start

http://gerrit.cloudera.org:8080/#/c/16019/1/testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
File testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet:

PS1:
Checking in this amount of binary data is an issue for a number of reasons, not least because it bloats the repo checkout... I think it would be best if we checked in the script to generate the data and ran it during data loading.

http://gerrit.cloudera.org:8080/#/c/16019/1/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/16019/1/tests/query_test/test_aggregation.py@380
PS1, Line 380: @SkipIfDockerizedCluster.accesses_host_filesystem
We should avoid this restriction - other tests achieve similar things without this limitation. We solve this either by copying the files into the cluster with the hdfs client (see the various create_table_and_copy_files() invocations in tests/query_test/test_scanners.py) or by generating and loading the table as part of data loading.

--
To view, visit http://gerrit.cloudera.org:8080/16019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 01 Jun 2020 22:55:53 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
Yongzhi Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon
..

IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon

In streaming-aggregation-node.cc, when replicate_input_ is true and num_aggs > 1, AddBatchStreaming is called more than once, and every call uses the same out_batch. If a row is not cached, its value is saved in out_batch and out_batch's row count is increased. The row count is not reset to 0 on the next iteration of the while loop. Therefore it is possible that not every tuple in out_batch is non-null. (For example, in the rows added when agg_idx = 1, only tuple 1 is non-null; in the rows added when agg_idx = 2, only tuple 2 is non-null.) But in grouping-aggregator-ir.cc, the code that serializes the output starts from the very beginning of out_batch for each agg_idx, so it has a good chance of hitting a null tuple.

Fix the issue by serializing only the tuples added by the current function call.

Tests:
Manual tests
Unit tests

Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
---
M be/src/exec/grouping-aggregator-ir.cc
A testdata/data/local_parquet_tbl/_SUCCESS
A testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-1-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-2-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-3-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-4-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-5-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-6-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-7-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-8-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-9-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-00010-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-00011-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/workloads/functional-query/queries/QueryTest/min-multiple-distinct-aggs.test
M tests/query_test/test_aggregation.py
16 files changed, 30 insertions(+), 3 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16019/1
--
To view, visit http://gerrit.cloudera.org:8080/16019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen
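
The failure mode described in the commit message above can be illustrated with a small toy model. This is only a sketch in Python, not the Impala C++ code: the list-of-lists batch, the function names add_batch_streaming and run, the serialize_from_start flag, and the use of ValueError in place of a crash are all assumptions made for illustration. Only the idea of a shared out_batch, a per-call agg_idx, and a start offset (named out_batch_start here, following the review nit) comes from the thread. Indices are 0-based in the sketch.

```python
# Toy model of the bug in the commit message; NOT Impala code.
# All names here are hypothetical except the concepts out_batch and agg_idx.

NUM_AGGS = 2  # assumed number of aggregators sharing one output batch


def add_batch_streaming(out_batch, agg_idx, uncached_values, serialize_from_start):
    """Append pass-through rows for one aggregator, then "serialize" them.

    Each row has one tuple slot per aggregator. Rows added by this call
    populate only the slot at agg_idx; the other slots stay None.
    """
    # Rows already in out_batch were added by earlier calls (other agg_idx).
    out_batch_start = len(out_batch)

    for value in uncached_values:
        row = [None] * NUM_AGGS
        row[agg_idx] = value
        out_batch.append(row)

    # Buggy variant: walk the batch from row 0.
    # Fixed variant: walk only the rows added by this call.
    first = 0 if serialize_from_start else out_batch_start

    serialized = []
    for row in out_batch[first:]:
        tup = row[agg_idx]
        if tup is None:
            # Stand-in for dereferencing a null tuple in the real code.
            raise ValueError("null tuple for agg_idx=%d" % agg_idx)
        serialized.append(tup)
    return serialized


def run(serialize_from_start):
    """Call add_batch_streaming once per aggregator on a shared out_batch."""
    out_batch = []
    results = []
    for agg_idx in range(NUM_AGGS):
        values = [10 + agg_idx, 20 + agg_idx]
        results.append(
            add_batch_streaming(out_batch, agg_idx, values, serialize_from_start))
    return results
```

In this model, run(serialize_from_start=False) serializes each aggregator's own rows, while run(serialize_from_start=True) fails on the second call because the first row in the shared batch has no tuple for agg_idx 1.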