[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon

2020-06-01 Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on 
particular dataset crashes impala daemon
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16019/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16019/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-9809: A query with multi-aggregation functions on particular
nit: make the first line more concise so it fits on a single line.



--
To view, visit http://gerrit.cloudera.org:8080/16019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 01 Jun 2020 23:07:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon

2020-06-01 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on 
particular dataset crashes impala daemon
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6187/ : Initial code 
review checks failed. See linked job for details on the failure.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 01 Jun 2020 23:03:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon

2020-06-01 Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16019 )

Change subject: IMPALA-9809: A query with multi-aggregation functions on 
particular dataset crashes impala daemon
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16019/1/be/src/exec/grouping-aggregator-ir.cc
File be/src/exec/grouping-aggregator-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16019/1/be/src/exec/grouping-aggregator-ir.cc@160
PS1, Line 160: outBatchStart
nit: out_batch_start


http://gerrit.cloudera.org:8080/#/c/16019/1/testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
File testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet:

PS1:
Checking in this amount of binary data is an issue for a number of reasons, not 
least because it bloats the repo checkout... I think it would be best if we 
checked in the script to generate the data and ran it during data loading.


http://gerrit.cloudera.org:8080/#/c/16019/1/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/16019/1/tests/query_test/test_aggregation.py@380
PS1, Line 380:   @SkipIfDockerizedCluster.accesses_host_filesystem
We should avoid this restriction. Other tests achieve similar things without
this limitation. We can solve this either by copying the files into the cluster
with the hdfs client (see the various create_table_and_copy_files()
invocations in tests/query_test/test_scanners.py) or by generating and loading
the table as part of data loading.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 01 Jun 2020 22:55:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9809: A query with multi-aggregation functions on particular dataset crashes impala daemon

2020-06-01 Yongzhi Chen (Code Review)
Yongzhi Chen has uploaded this change for review. (
http://gerrit.cloudera.org:8080/16019 )


Change subject: IMPALA-9809: A query with multi-aggregation functions on 
particular dataset crashes impala daemon
..

IMPALA-9809: A query with multi-aggregation functions on particular
dataset crashes impala daemon

In streaming-aggregation-node.cc, when replicate_input_ is true
and num_aggs > 1, AddBatchStreaming is called several times, and
every call uses the same out_batch. If a row is not cached in the
hash table, its values are saved in out_batch and out_batch's row
count is increased. The row count is not reset to 0 between
iterations of the while loop, so out_batch can contain rows added
by earlier calls, and not all of their tuples are non-null. (For
example, in rows added when agg_idx = 1 only tuple 1 is non-null;
in rows added when agg_idx = 2 only tuple 2 is non-null.) However,
the serialization code in grouping-aggregator-ir.cc starts from the
very beginning of out_batch for every agg_idx, so it has a good
chance of hitting a null tuple.

Fix the issue by serializing only the tuples added by the current
function call.
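The failure mode and the fix described above can be illustrated with a toy model. This is a minimal sketch, not Impala code: Tuple, RowBatch, Serialize(), AddBatchStreaming() and RunsWithoutNullTuple() are hypothetical, heavily simplified stand-ins whose signatures do not match the real classes in be/src/exec. The null-tuple dereference that crashed the daemon is modeled as a thrown exception. As in the description, the model assumes that a row added for one agg_idx has only that aggregation's tuple set.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for an aggregation tuple.
struct Tuple {
  bool serialized = false;
};

// Hypothetical, simplified stand-in for a RowBatch: every row holds one tuple
// pointer per aggregation, null when that aggregation did not add the row.
struct RowBatch {
  explicit RowBatch(int num_aggs) : num_aggs(num_aggs) {}
  int num_rows() const { return static_cast<int>(rows.size()); }
  int num_aggs;
  std::vector<std::vector<Tuple*>> rows;  // rows[row][agg_idx]
};

// Stand-in for serializing one tuple. Dereferencing a null tuple is what
// crashed the daemon; this model throws instead.
void Serialize(Tuple* tuple) {
  if (tuple == nullptr) throw std::runtime_error("null tuple");
  tuple->serialized = true;
}

// Hypothetical model of one AddBatchStreaming() call for 'agg_idx': append
// 'num_rows_to_add' uncached rows to the shared 'out_batch', then serialize
// them. 'fixed' chooses between the old loop start (row 0) and the patched
// start (the first row added by this call).
void AddBatchStreaming(int agg_idx, int num_rows_to_add, bool fixed,
                       std::deque<Tuple>* storage, RowBatch* out_batch) {
  assert(agg_idx >= 0 && agg_idx < out_batch->num_aggs);
  // out_batch may already hold rows added by calls for other agg_idx values.
  const int out_batch_start = out_batch->num_rows();
  for (int i = 0; i < num_rows_to_add; ++i) {
    std::vector<Tuple*> row(out_batch->num_aggs, nullptr);
    storage->emplace_back();  // deque keeps earlier element addresses valid
    row[agg_idx] = &storage->back();
    out_batch->rows.push_back(row);
  }
  const int first_row = fixed ? out_batch_start : 0;
  for (int r = first_row; r < out_batch->num_rows(); ++r) {
    Serialize(out_batch->rows[r][agg_idx]);
  }
}

// Models replicate_input_ == true: one call per aggregation, all sharing a
// single out_batch whose row count is not reset between calls. Returns false
// if a null tuple would have been serialized.
bool RunsWithoutNullTuple(bool fixed, int num_aggs) {
  std::deque<Tuple> storage;
  RowBatch out_batch(num_aggs);
  try {
    for (int agg_idx = 0; agg_idx < num_aggs; ++agg_idx) {
      AddBatchStreaming(agg_idx, /*num_rows_to_add=*/2, fixed, &storage,
                        &out_batch);
    }
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

For example, RunsWithoutNullTuple(false, 2) returns false because the second call walks back over the rows added for agg_idx 0, whose tuple 1 is null. RunsWithoutNullTuple(true, 2) and RunsWithoutNullTuple(false, 1) both return true. The sketch uses the out_batch_start spelling suggested in the review rather than outBatchStart.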

Tests:
Manual tests
Unit tests

Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
---
M be/src/exec/grouping-aggregator-ir.cc
A testdata/data/local_parquet_tbl/_SUCCESS
A testdata/data/local_parquet_tbl/part-0-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-1-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-2-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-3-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-4-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-5-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-6-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-7-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-8-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-9-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-00010-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/data/local_parquet_tbl/part-00011-fafc2cd0-f5c8-4fbb-ac3f-717447d67af8-c000.snappy.parquet
A testdata/workloads/functional-query/queries/QueryTest/min-multiple-distinct-aggs.test
M tests/query_test/test_aggregation.py
16 files changed, 30 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16019/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
Gerrit-Change-Number: 16019
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen