[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..

IMPALA-9829: Add Write Metrics for Spilling

Currently, we have read metrics for spilling, in this patch, we add
support for write metrics. The new metrics could be useful to measure
the write operations and target performance issues when involving in
spilling to remote disks(S3) (IMPALA-9828).

The metrics added record the information includes:
1. write latency of each write operation to the disk, metric kind:
HistogramMetric, unit: nanosecond.
2. write size of each write operation to the disk, metric kind:
HistogramMetric, unit: Bytes.
3. number of write IO errors when writing to the disk, metric kind:
IntCounter.

Testing:
 * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency
 * added DiskIoMgrTest.MetricsOfWriteIoError
Ran unit test disk-io-mgr-test and pre-commit test

Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Reviewed-on: http://gerrit.cloudera.org:8080/16083
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/io/disk-io-mgr-internal.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
6 files changed, 258 insertions(+), 15 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 16
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 15: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 15
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Sat, 27 Jun 2020 03:54:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16098/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16098/10//COMMIT_MSG@27
PS10, Line 27: configrations
nit: typo


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a1164
PS10, Line 1164:
why remove this?


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1179
PS10, Line 1179: // If all partitions have good stats, return the total row 
count, contributed
   : // by all of them, as the row count for the table.
is this the intended behavior? so if a user has a table with hundreds of 
partitions, adding a single partition with corrupt stats basically causes the 
stats estimation to fall back to the table-size based estimation? wouldn't a 
more reasonable behavior be if you see a partition with corrupt stats you 
fallback to the file-size based estimation just for that partition?


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1220
PS10, Line 1220:   if ( numRows > 0 )
   : numRows = Math.min(numRows, estNumRows);
   :   else
   : numRows = estNumRows;
for any if statement thats is longer than a single line, you should add curly 
braces. so it should be:

 if (numRows > 0) {
numRows = Math.min(numRows, estNumRows);
 } else {
numRows = estNumRows;
 }

you don't need the extra space in the if condition either


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@180
PS10, Line 180: impala_9744
I know other methods do this, but I'm generally not a fan of including the JIRA 
number in a test method name. I think we can come up with a more descriptive 
name for this method.


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@194
PS10, Line 194: #
nit remove


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@199
PS10, Line 199:   CREATE TABLE {0}.{1} (
I think you could just use 'create table like parquet' rather than listing out 
all the columns. You can create the table in Impala and still modify it using 
the load data local inpath statement in Hive.


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@215
PS10, Line 215:   set hive.stats.autogather=true;
do you need this?


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@236
PS10, Line 236: #
I think its probably best to create separate test methods for the partitioned 
and unpartitioned cases



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 03:43:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9569: Show progress bar and live summary of the retried query

2020-06-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16096 )

Change subject: IMPALA-9569: Show progress bar and live_summary of the retried 
query
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16096/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16096/5/be/src/service/impala-server.cc@759
PS5, Line 759:   if (query_handle->IsRetriedQuery()) 
result->__set_error_logs(error_logs);
is there a specific reason to set the error logs at the end of the method 
rather than inside the above if statement that adds the entries to error_logs?

i guess one benefit is that if the Coordinator is null, then the error logs can 
still be exposed?

might be good to add a code comment mentioning this because its not immediately 
obvious give that this if statement condition is the same as the above if 
statement condition


http://gerrit.cloudera.org:8080/#/c/16096/5/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/16096/5/shell/impala_shell.py@1100
PS5, Line 1100:   if query_id_search and len(query_id_search.groups()) 
>= 1:
the length of the group should only be 1, right?



--
To view, visit http://gerrit.cloudera.org:8080/16096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f96919f00e0b64d589efd15b6b5ec82fb725d56
Gerrit-Change-Number: 16096
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Sat, 27 Jun 2020 02:21:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6440/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 02:11:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6439/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 02:08:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 14:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77
PS12, Line 77: will
> nit: "will go"?
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89:
> Ok, in that case, we should just abandon estimate in case of cardinality is
This is now changed to obtain full input estimate from planner.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90:
> I will look at possibility to access that average size data in the backend.
I figure out how to get more precise estimate from planner. It is close enough 
to the actual input size that I observe in my test.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92
PS12, Line 92: &
> I think that VLOG(3) is enough.
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101: t
> "the" would be better
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101:   /// 'estimated_input_size' is the total rows in bytes that 
are estimated to get added
> nit: missing "are"
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102:   /// into this sorter. This is used to decide if sorter needs 
to proactively spill for
> nit: needs
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102: ively sp
> nit: spill
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223
PS12, Line 223: do
> nit: "do an"?
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816
PS12, Line 816:   PrettyPrinter::PrintBytes(estimated_input_size),
> I think that VLOG(3) is enough here - this should happen if the cardinality
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907
PS12, Line 907: PrettyPrinter::PrintBytes(GetSortRunBytesLimit()));
> Same as line 816.
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645
PS12, Line 645:  a join
> typo: backends
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore 
sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower 
sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced 
and spilled.
 :Case 1 : 0 SpilledRuns, because all rows fit within the 
maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 SpilledRuns, because the first run hit 
reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 SpilledRuns, because sort node estimate that 
spill is inevitable.
 : So all runs are capped to 130m, including the 
first one."""
> I will look at that 'query_result.runtime_profile'.
Done



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 01:50:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#14).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 225 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/14
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#13).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 225 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/13
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9902: add rewrite of TPC-DS q38

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16118 )

Change subject: IMPALA-9902: add rewrite of TPC-DS q38
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6438/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81807683aa265a946729e15156bd2e33724103e1
Gerrit-Change-Number: 16118
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 27 Jun 2020 00:47:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6437/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 27 Jun 2020 00:30:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9902: add rewrite of TPC-DS q38

2020-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16118


Change subject: IMPALA-9902: add rewrite of TPC-DS q38
..

IMPALA-9902: add rewrite of TPC-DS q38

I generated the query with dsqgen and then
rewrote it to avoid intersect.

Testing:
Compared results to hive running the original version of the
query.

Change-Id: I81807683aa265a946729e15156bd2e33724103e1
---
A testdata/workloads/tpcds/queries/tpcds-q38.test
M testdata/workloads/tpcds/queries/tpcds-q8.test
M tests/query_test/test_tpcds_queries.py
3 files changed, 39 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16118/1
--
To view, visit http://gerrit.cloudera.org:8080/16118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I81807683aa265a946729e15156bd2e33724103e1
Gerrit-Change-Number: 16118
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 8: Code-Review+2

This looks good to me.


--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 27 Jun 2020 00:10:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..

IMPALA-9294: Support DATE for min-max runtime filter

Implemented Date min-max filter and applied it to Kudu as other
min-max runtime filters.
Added new test cases for Date min-max filters.

Testing:
Passed all core tests.

Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/runtime/date-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/protobuf/common.proto
M common/thrift/Data.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
12 files changed, 264 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16103/4
--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6060/


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 23:13:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 23:08:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 15: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 15
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 23:06:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6063/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 15
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 23:06:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9213: Add query retry info to GetLog result

2020-06-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16093 )

Change subject: IMPALA-9213: Add query retry info to GetLog result
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-beeswax-server.cc@327
PS6, Line 327: error_log_ss << Substitute(GET_LOG_QUERY_RETRY_INFO_FORMAT,
might be worth adding a DCHECK right before this line asserting that 
'!original_query_handle->query_status()'


http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-hs2-server.cc@939
PS6, Line 939: ss << Substitute(GET_LOG_QUERY_RETRY_INFO_FORMAT,
same comment as the beeswax version


http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16093/6/be/src/service/impala-server.cc@366
PS6, Line 366: \n
don't think this is necessary? it looks like it is adding an extra new line in 
the error output:

 WARNINGS: Original query failed:
 Failed due to unreachable impalad(s): stakiar-desktop:22002

 Query has been retried using query id: c84bd698dc98fac2:e79c78d1

Unless that was intentional?

It looks like Status::GetDetail --> ErrorMsg::GetFullMessageDetails which 
always has a newline at the end of the stringstream


http://gerrit.cloudera.org:8080/#/c/16093/6/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/16093/6/shell/impala_client.py@337
PS6, Line 337: get_query_link_callback
instead of piping this through all these methods as a callback, can we just 
move the 'get_query_link' method into a util class and just import it and use 
it directly here and in impala_shell.py?

you might be able to just add it as a utility method in impala_client.py and 
import it in impala_shell.py. I guess there isn't a common util class.


http://gerrit.cloudera.org:8080/#/c/16093/6/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/16093/6/shell/impala_shell.py@1072
PS6, Line 1072: _get_query_link_callback
nit: add some method code comments, rename to '_get_query_link' since it isn't 
always used as a callback (e.g. in _periodic_wait_callback)



--
To view, visit http://gerrit.cloudera.org:8080/16093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58cf94f91a0b92eb9a3088bee3894ac157a954dc
Gerrit-Change-Number: 16093
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 26 Jun 2020 22:51:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 14: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6058/


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 14
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 21:47:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 8: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15961/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15961/6//COMMIT_MSG@42
PS6, Line 42: get
> Seems right to me?
Done


http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@58
PS4, Line 58: er
> Great, I'll do it in a follow up
JIRA?



--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 21:40:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/runtime/date-value.h
File be/src/runtime/date-value.h:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/runtime/date-value.h@179
PS3, Line 179: pvalue
> Do we need to worry pvalue being nullptr?
No. The caller get pvalue by calling function 
UpdateFilterParamsPB::mutable_min_max_filter(), which will create a 
impala::MinMaxFilterPB object if the object has not been created yet.


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter-test.cc@371
PS3, Line 371: DATE_TIME_CHECK_VALS
> Maybe rename it as CHECK_NON_EMPTY_FILTER.
It's a micro for CheckTimestampVals and CheckDateVals. I try to keep the naming 
consistent.


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.h@220
PS3, Line 220:
> min_ and max_ are not inited in the cstr.
Don't need to initialize min_ and max_ in constructor since always_false_ is 
set as true which means no rows have been inserted. min_ and max_ will be 
overwrote when the first row is inserted.


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.cc@368
PS3, Line 368: in
> I wonder if In can be empty?
No. There is just one caller for this function. It's called only if 'in' has 
the value in MinMaxFilter::Or().


http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test:

http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test@167
PS3, Line 167:  Date type of column for
> Maybe "Query with date-typed join column"?
Agree


http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test@168
PS3, Line 168: functional_kudu.date_tbl a
 : join [BROADCAST] functional_kudu.date_tbl b
> It may be useful to include some additional join queries:
There is 3-way join test case for integer-typed join columns. Since the min-max 
filter generation function don't check column type, the integer-typed test case 
should be enough to cover the logic. Also I don't see 3-way join for other 
type, like timestamp, string, decimal, etc.



--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 20:34:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6056/


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 19:52:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere

2020-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16108 )

Change subject: IMPALA-9790: option to use resolved hostname everywhere
..

IMPALA-9790: option to use resolved hostname everywhere

This adds a flag --use_resolved_hostname, which replaces
--hostname with a resolved IP on startup. This is useful
for containerized environments where the hostname -> IP
mapping can be very dynamic.

This flag is used by default in the dockerized minicluster.

This also fixes a bug in the test code that incorrectly
identified command line flags. Specifically it only checked
the suffix, so it confused use_resolved_hostname and hostname.

Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Reviewed-on: http://gerrit.cloudera.org:8080/16108
Tested-by: Impala Public Jenkins 
Reviewed-by: Andrew Sherman 
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M docker/impalad_executor/Dockerfile
M docker/statestored/Dockerfile
M tests/common/impala_cluster.py
8 files changed, 23 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Andrew Sherman: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Gerrit-Change-Number: 16108
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.

2020-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16052 )

Change subject: IMPALA-9784: Non correlated subqueries in HAVING.
..


Patch Set 2:

(8 comments)

I think this makes sense, doing the nested query thing and converting it to a 
where clause does simplify it a lot.

http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@465
PS2, Line 465: // TODO: Remove this when independent subquery 
evaluation is implemented.
Can we add a simple planner test for this rewrite? With the cardinality check 
node added.


http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@466
PS2, Line 466: // TODO: IMPALA-5100 to cover all cases, we do let 
through runtime scalars with
> I relaxed some of these rules to let through subqueries such as (select cou
It might be helpful to file a separate JIRA with an example of this query, just 
so we have some record that we can run some new queries. Some simple things 
like:

  select * from functional.alltypes where id < (select count(bool_col) from 
functional.alltypes where int_col=1 group by int_col);


http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@469
PS2, Line 469: presensce
nit: presence


http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1218
PS2, Line 1218:   from functional.alltypestiny group by id
nit: remove the functional database name here and below - the test framework 
should set the current db based on the file format in the test matrix.. A lot 
of other queries here don't follow the best practice.


http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q23-1.test
File testdata/workloads/tpcds/queries/tpcds-q23-1.test:

http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q23-1.test@52
PS2, Line 52:  RESULTS
Compared to 
https://github.com/gregrahn/tpcds-kit/blob/master/answer_sets/23_NULLS_LAST.ans


http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q23-2.test
File testdata/workloads/tpcds/queries/tpcds-q23-2.test:

http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q23-2.test@59
PS2, Line 59: 'Brown','Monika',6031.52
Compared to 
https://github.com/gregrahn/tpcds-kit/blob/master/answer_sets/23_NULLS_LAST.ans


http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q24-2.test
File testdata/workloads/tpcds/queries/tpcds-q24-2.test:

http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q24-2.test@53
PS2, Line 53: 'Hamlin','Heather','able',149.65
Compared to https://github.com/gregrahn/tpcds-kit/blob/master/answer_sets/24.ans


http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q44.test
File testdata/workloads/tpcds/queries/tpcds-q44.test:

http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/tpcds/queries/tpcds-q44.test@51
PS2, Line 51: 1,'oughtantiprin st','callyeingbarcallyought'
Compared this to 
https://github.com/gregrahn/tpcds-kit/blob/master/answer_sets/44.ans



--
To view, visit http://gerrit.cloudera.org:8080/16052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Gerrit-Change-Number: 16052
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 19:35:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-9898: Plan generation and execution for grouping sets

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16115 )

Change subject: [WIP] IMPALA-9898: Plan generation and execution for grouping 
sets
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6436/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id474c5373860b0d8014ee9c844a3fb90092be968
Gerrit-Change-Number: 16115
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 19:31:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 3:

(6 comments)

Looks good to me.  Have some comments. Thanks!

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/runtime/date-value.h
File be/src/runtime/date-value.h:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/runtime/date-value.h@179
PS3, Line 179: pvalue
Do we need to worry pvalue being nullptr?


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter-test.cc@371
PS3, Line 371: DATE_TIME_CHECK_VALS
Maybe rename it as CHECK_NON_EMPTY_FILTER.


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.h@220
PS3, Line 220:
min_ and max_ are not inited in the cstr.


http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/16103/3/be/src/util/min-max-filter.cc@368
PS3, Line 368: in
I wonder if In can be empty?


http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test:

http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test@167
PS3, Line 167:  Date type of column for
Maybe "Query with date-typed join column"?


http://gerrit.cloudera.org:8080/#/c/16103/3/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test@168
PS3, Line 168: functional_kudu.date_tbl a
 : join [BROADCAST] functional_kudu.date_tbl b
It may be useful to include some additional join queries:
1. 3-way join on date-typed join columns;
2. 3-way join, one join column on date, and another join column on a non-date, 
such as int etc.



--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 19:24:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-9898: Plan generation and execution for grouping sets

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16115 )

Change subject: [WIP] IMPALA-9898: Plan generation and execution for grouping 
sets
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@355
PS1, Line 355: int outputSlotIdx = groupingExprs_.size() + 1; // add 1 to 
account for the tuple id column
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@357
PS1, Line 357:   outputSmap_.put(aggExprs_.get(i).clone(), new 
SlotRef(outputSlots.get(outputSlotIdx)));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@452
PS1, Line 452:* Note that all classes have the same aggregate output exprs 
but different grouping exprs.
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@523
PS1, Line 523:   CaseExpr caseExpr = new CaseExpr(new 
ValidTupleIdExpr(aggTids), caseWhenClauses, null);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@551
PS1, Line 551:   CaseExpr caseExpr = new CaseExpr(new 
ValidTupleIdExpr(aggTids), caseWhenClauses, null);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@564
PS1, Line 564: CaseExpr caseExprForTids = new CaseExpr(new 
ValidTupleIdExpr(aggTids), caseWhenClauses, null);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@595
PS1, Line 595:   CaseExpr caseExpr = new CaseExpr(new 
ValidTupleIdExpr(aggTids), caseWhenClauses, null);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16115/1/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@597
PS1, Line 597:   // wrap this in an aggif expr because an AggregateInfo 
only allows either aggregate exprs
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id474c5373860b0d8014ee9c844a3fb90092be968
Gerrit-Change-Number: 16115
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:54:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere

2020-06-26 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16108 )

Change subject: IMPALA-9790: option to use resolved hostname everywhere
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Gerrit-Change-Number: 16108
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:54:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-9898: Plan generation and execution for grouping sets

2020-06-26 Thread Aman Sinha (Code Review)
Aman Sinha has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16115


Change subject: [WIP] IMPALA-9898: Plan generation and execution for grouping 
sets
..

[WIP] IMPALA-9898: Plan generation and execution for grouping sets

This patch enhances the MultiAggregateInfo to handle grouping sets
and rollup (which is converted to grouping sets). This patch does
not itself do parsing/validation of grouping sets syntax but rather
provides the following supporting functionality:
  - A separate analyze method that accepts aggregation classes and
aggregation info that have been created separately (TODO: this
part will be added in another patch).
  - A modified Transpose phase that uses combination of aggif(),
valid_tid() functions and CASE exprs to choose exactly which
slots from the underlying aggregate classes need to be output
based on the tuple id.
  - Modified materialization step where all aggregate slots and
grouping slots are materialized in case of grouping sets.
  - Creates grouping_id value for grouping sets. The grouping_id
function in SQL describes which expression is grouped-by in a
particular row of a query with grouping sets.

Testing:
  This patch is not individually testable but will be tested
  as part of the overall grouping set support.

Change-Id: Id474c5373860b0d8014ee9c844a3fb90092be968
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
1 file changed, 260 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/16115/1
--
To view, visit http://gerrit.cloudera.org:8080/16115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id474c5373860b0d8014ee9c844a3fb90092be968
Gerrit-Change-Number: 16115
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha 


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6435/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:45:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..

IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses the current limitation in computing the total row
count for a Hive table. The row count can be incorrectly computed as 0,
even though there exists data in partitions of the Hive table.

In the fix, as long as no partition in a Hive table exhibits any stats
corruptions including the kind described above, the total row count for
the table is computed from the row counts in all partitions. Otherwise,
Impala estimates the total row count from the total size of the
partitions involved and the row width, if feasible.

With the fix and in the explain output for some scan node, the
cardinality for a table with corrupted stats will be reported as a
positive value (the estimated row count), instead of 'unavailable'.
The table is still listed in the WARNING section for potentially corrupt
table statistics, at the beginning of the explain output.

Testing:
1. Ran unit tests with queries documented in the case against Hive
   tables with the following configrations:
   a. No stats corruption in any partitions
   b. Stats corruption in some partitions
   c. Stats corruption in all partitions
2. Added a new test in test_compute_stats.py:
 test_corrupted_stats_impala_9744
3. Fixed failures in corrupt-stats.test
4. Ran "core" test

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
3 files changed, 146 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/10
--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6434/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:12:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6060/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:05:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6059/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 18:02:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6058/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 14
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:57:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 14
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:57:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6433/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:51:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..

IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses the current limitation in computing the total row
count for a Hive table. The row count can be incorrectly computed as 0,
even though there exists data in partitions of the Hive table.

In the fix, as long as no partition in a Hive table exhibits any stats
corruptions including the kind described above, the total row count for
the table is computed from the row counts in all partitions. Otherwise,
Impala estimates the total row count from the total size of the
partitions involved and the row width, if feasible.

With the fix and in the explain output for some scan node, the
cardinality for a table with corrupted stats will be reported as a
positive value (the estimated row count), instead of 'unavailable'.
The table is still listed in the WARNING section for potentially corrupt
table statistics, at the beginning of the explain output.

Testing:
1. Ran unit tests with queries documented in the case against Hive
   tables with the following configrations:
   a. No stats corruption in any partitions
   b. Stats corruption in some partitions
   c. Stats corruption in all partitions
2. Added a new test in test_compute_stats.py:
 test_corrupted_stats_impala_9744
3. Fixed failures in corrupt-stats.test
4. Ran "core" test

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
3 files changed, 153 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/9
--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16098/9/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/16098/9/tests/metadata/test_compute_stats.py@198
PS9, Line 198:
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16098/9/tests/metadata/test_compute_stats.py@198
PS9, Line 198: # autogathering. The partition stats is corrupted: row 
count=0 and row
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16098/9/tests/metadata/test_compute_stats.py@240
PS9, Line 240: #
flake8: E303 too many blank lines (2)



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:50:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-26 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:46:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6432/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:31:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15961

to look at the new patch set (#8).

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..

IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

In order to support the new admission control service, we need to be
able to return the results of an admission attempt, i.e. the query
schedule, to the coordinator.

To enable this, this patch moves all parts of the QuerySchedule class
and related classes that are required by the coordinator into a new
message QuerySchedulePB. The main admission control interface,
SubmitForAdmission(), now returns a QuerySchedulePB.

Some notable changes:
- Previously, QuerySchedule was used by Coordinator as a way to pass
  around references to parts of the TExecRequest to places like
  Coordinator::ExecSummary and Coordinator::BackendState. This has
  been replaced with the ExecParams class, which is a container for
  references to the TExecRequest and QuerySchedulePB along with
  convenience functions for accessing them.
- Similarly, FragmentExecParams, which is part of QuerySchedule,
  contains references to the associated TPlanFragment, owned by the
  TExecRequest, which were used by the Coordinator when iterating over
  the schedule to initiate the query. Since FragmentExecParamsPB can't
  contain these references, they were replaced by a map between
  fragment idx and TPlanFragment in ExecParams.
- In order to keep payloads reasonable for the eventual RPC interface,
  AdmissionController::ReleaseQuery() and ReleaseQueryBackend() now
  take a query id as a parameter instead of a QuerySchedule. To
  facilitate this, AdmissionController now maintains a map from query
  ids of running queries to the resources that were allocated for them
  so that it can look the resources up when releasing them. This map
  will be necessary when implementing the admission control service to
  facilitate proper accounting of resouces in cases like coordinator
  failures.
- As scheduling is currently organized, we first construct the
  FragmentExecParams with the FInstanceExecParams as their children,
  then we construct the BackendExecParams which get references to
  their FInstanceExecParams. Since we can't send references like these
  through an rpc, we now instead Swap() the FInstanceExecParamsPB
  into the BackendExecParamsPB.

Testing:
- Updated related tests.
- Passed a full run of existing tests.

Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
---
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator-backend-resource-state.cc
M be/src/runtime/coordinator-backend-state-test.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-driver.cc
A be/src/runtime/query-exec-params.cc
A be/src/runtime/query-exec-params.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M be/src/scheduling/cluster-membership-test-util.cc
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/debug-util.h
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
A common/protobuf/admission_control_service.proto
34 files changed, 1,082 insertions(+), 776 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/15961/8
--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-26 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..

IMPALA-9294: Support DATE for min-max runtime filter

Implemented Date min-max filter and applied it to Kudu as other
min-max runtime filters.
Added new test cases for Date min-max filters.

Testing:
Passed all core tests.

Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/runtime/date-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/protobuf/common.proto
M common/thrift/Data.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
12 files changed, 264 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16103/3
--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6431/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 16:07:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15961 )

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15961/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15961/6//COMMIT_MSG@31
PS6, Line 31: eventua
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/15961/6//COMMIT_MSG@42
PS6, Line 42: get
> nit typo
Seems right to me?


http://gerrit.cloudera.org:8080/#/c/15961/6/be/src/runtime/exec-params.h
File be/src/runtime/exec-params.h:

http://gerrit.cloudera.org:8080/#/c/15961/6/be/src/runtime/exec-params.h@37
PS6, Line 37:
> we might actually want to rename this to something like QueryExecParams, ot
Done


http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.h
File be/src/runtime/exec-params.h:

http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.h@73
PS4, Line 73:
> What I had in mind is if you know the total number of indices, then you can
Done


http://gerrit.cloudera.org:8080/#/c/15961/6/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/15961/6/be/src/scheduling/admission-controller.h@724
PS6, Line 724: memory
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/15961/6/be/src/scheduling/admission-controller.h@745
PS6, Line 745:
> mention this is guarded by admission_ctrl_lock_?
Done


http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/cluster-membership-mgr.cc@352
PS4, Line 352: INFO
> makes sense, perhaps a better INFO message would be "Did not blacklist ...
Done


http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@58
PS4, Line 58: er
> I think that makes sense, I like the 'ScheduleState' approach. yeah, its up
Great, I'll do it in a follow up


http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/scheduler.cc@300
PS4, Line 300: }
 :   } else if (ContainsNode(fragment.plan, 
TPlanNodeType::UNION_NODE)
> okay - might be worth re-factoring into a static helper method to make it a
Done


http://gerrit.cloudera.org:8080/#/c/15961/6/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:

http://gerrit.cloudera.org:8080/#/c/15961/6/common/protobuf/admission_control_service.proto@39
PS6, Line 39:   // 0-based ordinal number of this particular instance. This is 
within its fragment, not
:   // query-wide, so eg. there will be one instance '0' for each 
fragment.
> still a bit confused about this represents, but I think this is basically t
Right



--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 26 Jun 2020 15:45:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

2020-06-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15961

to look at the new patch set (#7).

Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf
..

IMPALA-9692 (part 3): Model QuerySchedule as a protobuf

In order to support the new admission control service, we need to be
able to return the results of an admission attempt, i.e. the query
schedule, to the coordinator.

To enable this, this patch moves all parts of the QuerySchedule class
and related classes that are required by the coordinator into a new
message QuerySchedulePB. The main admission control interface,
SubmitForAdmission(), now returns a QuerySchedulePB.

Some notable changes:
- Previously, QuerySchedule was used by Coordinator as a way to pass
  around references to parts of the TExecRequest to places like
  Coordinator::ExecSummary and Coordinator::BackendState. This has
  been replaced with the ExecParams class, which is a container for
  references to the TExecRequest and QuerySchedulePB along with
  convenience functions for accessing them.
- Similarly, FragmentExecParams, which is part of QuerySchedule,
  contains references to the associated TPlanFragment, owned by the
  TExecRequest, which were used by the Coordinator when iterating over
  the schedule to initiate the query. Since FragmentExecParamsPB can't
  contain these references, they were replaced by a map between
  fragment idx and TPlanFragment in ExecParams.
- In order to keep payloads reasonable for the eventual RPC interface,
  AdmissionController::ReleaseQuery() and ReleaseQueryBackend() now
  take a query id as a parameter instead of a QuerySchedule. To
  facilitate this, AdmissionController now maintains a map from query
  ids of running queries to the resources that were allocated for them
  so that it can look the resources up when releasing them. This map
  will be necessary when implementing the admission control service to
  facilitate proper accounting of resouces in cases like coordinator
  failures.
- As scheduling is currently organized, we first construct the
  FragmentExecParams with the FInstanceExecParams as their children,
  then we construct the BackendExecParams which get references to
  their FInstanceExecParams. Since we can't send references like these
  through an rpc, we now instead Swap() the FInstanceExecParamsPB
  into the BackendExecParamsPB.

Testing:
- Updated related tests.
- Passed a full run of existing tests.

Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
---
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator-backend-resource-state.cc
M be/src/runtime/coordinator-backend-state-test.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-driver.cc
A be/src/runtime/query-exec-params.cc
A be/src/runtime/query-exec-params.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M be/src/scheduling/cluster-membership-test-util.cc
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/debug-util.h
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
A common/protobuf/admission_control_service.proto
34 files changed, 1,082 insertions(+), 776 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/15961/7
--
To view, visit http://gerrit.cloudera.org:8080/15961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9
Gerrit-Change-Number: 15961
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6056/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 14:52:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 12: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 14:52:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 14:52:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9531: Dropped support for dateless timestamps

2020-06-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15866 )

Change subject: IMPALA-9531: Dropped support for dateless timestamps
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15866/6/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/15866/6/be/src/exprs/timestamp-functions.cc@154
PS6, Line 154: void UnixAndFromUnixPrepare(FunctionContext* context,
This function is not included in the header because you didn't want to include 
the whole datatime-common for CastDirection, right? Could you mention it here?


http://gerrit.cloudera.org:8080/#/c/15866/6/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/15866/6/be/src/runtime/datetime-simple-date-format-parser.h@87
PS6, Line 87: CastDirection cast_mode = FORMAT
This still has a default value. (I saw that you actually added the parameters 
on the callsites.)



--
To view, visit http://gerrit.cloudera.org:8080/15866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48c49bf027cc4b917849b3d58518facba372b322
Gerrit-Change-Number: 15866
Gerrit-PatchSet: 6
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 13:58:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16000 )

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6430/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 12:55:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16000 )

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..


Patch Set 10:

PS10 is rebase with master and conflict resolution.


--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 12:27:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16000

to look at the new patch set (#10).

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..

IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()

These functions can be used to get cardinality estimates of data
using HLL algorithm from Apache DataSketches. ds_hll_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized HLL sketch in string format. This can be written to a
table or be fed directly to ds_hll_estimate() that returns the
cardinality estimate for that sketch.

Comparing to ndv() these functions bring more flexibility as once we
fed data to the sketch it can be written to a table and next time we
can save scanning through the dataset and simply return the estimate
using the sketch. This doesn't come for free, however, as perfomance
measurements show that ndv() is 2x-3.5x faster than sketching. On the
other hand if we query the estimate from an existing sketch then the
runtime is negligible.
Another flexibility with these sketches is that they can be merged
together so e.g. if we had saved a sketch for each of the partitions
of a table then they can be combined with each other based on the
query without touching the actual data.
DataSketches HLL is sensitive for the order of the data fed to the
sketch and as a result running these algorithms in Impala gets
non-deterministic results within the error bounds of the algorithm.
In terms of correctness DataSketches HLL is most of the time in 2%
range from the correct result but there are occasional spikes where
the difference is bigger but never goes out of the range of 5%.
Even though the DataSketches HLL algorithm could be parameterized
currently this implementation hard-codes these parameters and use
HLL_4 and lg_k=12.

For more details about Apache DataSketches' HLL implementation see:
https://datasketches.apache.org/docs/HLL/HLL.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on TPCH25.lineitem to compare perfomance with
   ndv(). Depending on data characteristics ndv() appears 2x-3.5x
   faster. The lower the cardinality of the dataset the bigger the
   difference between the 2 algorithms is.
 - Ran manual tests on TPCH25.lineitem and
   functional_parquet.alltypes to compare correctness with ndv(). See
   results above.

Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-functions-ir.cc
A be/src/exprs/datasketches-functions.h
M be/src/exprs/datasketches-test.cc
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
A testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
A tests/query_test/test_datasketches.py
14 files changed, 477 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16000/10
--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16000 )

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..


Patch Set 9: Code-Review+1

(1 comment)

Thanks for the changes!

http://gerrit.cloudera.org:8080/#/c/16000/9/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
File fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java:

http://gerrit.cloudera.org:8080/#/c/16000/9/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@79
PS9, Line 79:   private boolean isUnupported_;
For me it seems more logical to move this to base class Function, even if only 
aggregates use it right now.



--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 11:13:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16001/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test
File 
testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test:

http://gerrit.cloudera.org:8080/#/c/16001/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test@235
PS11, Line 235: invalidate metadata;
Ouch, I should have noticed this :/



--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 10:38:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6429/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jun 2020 10:21:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 10:20:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of the scratch direcotry. The lower the numeric value, the
higher is the priority. If priority is not specified then default
priority with value numeric_limits::max() is used.

Valid formats for specifying the priority are:
- ::
- ::
Following formats use default priority:
- 
- :
- ::

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is same as
the priority of the related directory. A target file is selected by
always searching in the ordered list starting from the file with highest
priority. If multiple files have same priority, then the target file is
selected in a round robin manner.

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Reviewed-on: http://gerrit.cloudera.org:8080/16091
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
4 files changed, 370 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16000 )

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6428/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 10:10:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-26 Thread Zoltan Borok-Nagy (Code Review)
Hello Norbert Luksa, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16001

to look at the new patch set (#12).

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..

IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

"Original files" are files that don't have full ACID schema. We can see
such files if we upgrade a non-ACID table to full ACID. Also, the LOAD
DATA statement can load non-ACID files into full ACID tables. So such
files don't store special ACID columns, that means we need
to auto-generate their values. These are (operation,
originalTransaction, bucket, rowid, and currentTransaction).

With the exception of 'rowid', all of them can be calculated based on
the file path, so I add their values to the scanner's template tuple.

'rowid' is the ordinal number of the row inside a bucket inside a
directory. For now Impala only allows one file per bucket per
directory. Therefore we can generate row ids for each file
independently.

Multiple files in a single bucket in a directory can only be present if
the table was non-transactional earlier and we upgraded it to full ACID
table. After the first compaction we should only see one original file
per bucket per directory.

In HdfsOrcScanner we calculate the first row id for our split then
the OrcStructReader fills the rowid slot with the proper values.

Testing:
 * added e2e tests to check if the generated values are correct
 * added e2e test to reject tables that have multiple files per bucket
 * added unit tests to the new auxiliary functions

Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
---
M be/src/exec/acid-metadata-utils-test.cc
M be/src/exec/acid-metadata-utils.cc
M be/src/exec/acid-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
A testdata/data/alltypes_non_acid.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A 
testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test
M tests/query_test/test_acid.py
16 files changed, 715 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/16001/12
--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9632: Implement ds hll sketch() and ds hll estimate()

2020-06-26 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16000

to look at the new patch set (#9).

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
..

IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()

These functions can be used to get cardinality estimates of data
using HLL algorithm from Apache DataSketches. ds_hll_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized HLL sketch in string format. This can be written to a
table or be fed directly to ds_hll_estimate() that returns the
cardinality estimate for that sketch.

Comparing to ndv() these functions bring more flexibility as once we
fed data to the sketch it can be written to a table and next time we
can save scanning through the dataset and simply return the estimate
using the sketch. This doesn't come for free, however, as perfomance
measurements show that ndv() is 2x-3.5x faster than sketching. On the
other hand if we query the estimate from an existing sketch then the
runtime is negligible.
Another flexibility with these sketches is that they can be merged
together so e.g. if we had saved a sketch for each of the partitions
of a table then they can be combined with each other based on the
query without touching the actual data.
DataSketches HLL is sensitive for the order of the data fed to the
sketch and as a result running these algorithms in Impala gets
non-deterministic results within the error bounds of the algorithm.
In terms of correctness DataSketches HLL is most of the time in 2%
range from the correct result but there are occasional spikes where
the difference is bigger but never goes out of the range of 5%.
Even though the DataSketches HLL algorithm could be parameterized
currently this implementation hard-codes these parameters and use
HLL_4 and lg_k=12.

For more details about Apache DataSketches' HLL implementation see:
https://datasketches.apache.org/docs/HLL/HLL.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on TPCH25.lineitem to compare perfomance with
   ndv(). Depending on data characteristics ndv() appears 2x-3.5x
   faster. The lower the cardinality of the dataset the bigger the
   difference between the 2 algorithms is.
 - Ran manual tests on TPCH25.lineitem and
   functional_parquet.alltypes to compare correctness with ndv(). See
   results above.

Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-functions-ir.cc
A be/src/exprs/datasketches-functions.h
M be/src/exprs/datasketches-test.cc
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
A testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
A tests/query_test/test_datasketches.py
14 files changed, 477 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16000/9
--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@163
PS2, Line 163:  QUERY
Add a test where the input data set contains valid sketches and nulls. 
Expectation is that nulls are ignored.



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 07:04:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 06:56:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..

IMPALA-9691: Support Kudu Timestamp and Date bloom filter

Impala save timestamp as 12 bytes of structure TimestampValue with
time in nano seconds. Kudu store timestamp as 8 bytes of Unix Time
microseconds. To avoid the data truncation issue in the bloom filter,
add FunctionCallExpr with 'utc_to_unix_micros' as the root of source
expression of bloom filter to convert timestamp values to microseconds
when building timestamp bloom filter for Kudu.
Generated functional date_tbl table in Kudu format for unit-test.
Added new test cases for Kudu Timestamp and Date bloom filters.

Testing:
Passed all core tests.

Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Reviewed-on: http://gerrit.cloudera.org:8080/16094
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
7 files changed, 311 insertions(+), 30 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou