[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5947/


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jun 2020 05:16:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6219/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:33:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] Filter out "Checksum validation failed" messages during the maven build

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15775 )

Change subject: Filter out "Checksum validation failed" messages during the 
maven build
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5948/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19afbd157533e52ef3157730c7ec5159241749bc
Gerrit-Change-Number: 15775
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:21:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6218/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:16:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,128 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/8
--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5947/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:14:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 00:40:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..

IMPALA-9782: fix Kudu DML with mt_dop

In general, ScalarExpr subclasses should not have
mutable state, i.e. state that is modified during
query execution, and definitely need to be thread-safe.
KuduPartitionExpr violated that, since KuduPartitioner
and KuduPartialRow are both mutated as part of expr
evaluation.

This patch moves those objects into thread-local state
for KuduPartitionExpr.

Testing:
* Add a regression test that does a large insert. Before this
  patch it reliably crashed Impala with mt_dop=4.
* Run more Kudu DML with mt_dop to improve test coverage.
* Make test_kudu.py use the correct test dimension (fixing a
  cosmetic issue).

Perf:
This changes adds some indirection to the expression evaluation,
so I did some manual benchmarking with this query, which
should somewhat stress the partitioning:

set mt_dop=1;
insert into orders_key_only
select o_orderkey from tpch_kudu.orders where o_orderkey % 10 = 0

The timing was in the same range before and after - between 6 and
8 seconds - but the results were very unstable so inconclusive.
The Kudu tservers were using an order-of-magnitude more CPU than
the impalads, so it seems safe to conclude that these partitioning
exprs are *not* a bottleneck for DML workloads.

Perf record showed impala::KuduPartitionExpr::GetIntValInterpreted()
taking up 0.09% of the impalad CPU, for additional evidence that
this doesn't make a difference.

Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Reviewed-on: http://gerrit.cloudera.org:8080/16029
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/exprs/scalar-expr-evaluator.h
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
6 files changed, 129 insertions(+), 44 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
> I think it is implied by line 872?
Ahh, Sorter::enable_spilling_ is a bit misleading - that actually is whether 
it's in the partial sort mode or not. You have Sorter::enable_spilling_ = true 
for full sorts, even if spilling is disabled at the query level.

Just as an example of what I mean, if you run a sort query that requires 500mb 
with, say, scratch_limit=0, then currently that will succeed so long as it can 
get the memory.

But I think if you set sort_bytes_limit=100mb, then it would try to spill and 
fail, even thought there's memory available.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 00:38:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 11:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@87
PS11, Line 87: [2, 9, 96, 988, 980, 1000, 944, 1030, 1020, 990, 1010, 957, 
1030, 1027, 9845, 9898],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@88
PS11, Line 88: [2, 9, 97, 957, 1008, 1016, 1005, 963, 994, 993, 1018, 1004, 
963, 1014, 10210, 10280],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@89
PS11, Line 89: [2, 9, 98, 977, 1024, 1020, 975, 977, 1002, 991, 994, 1006, 977, 
999, 10118, 9923],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@90
PS11, Line 90: [2, 9, 99, 986, 1009, 1011, 994, 980, 997, 994, 1002, 997, 980, 
988, 10148, 9987],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@91
PS11, Line 91: [2, 9, 99, 995, 996, 1000, 998, 988, 995, 999, 997, 999, 988, 
979, 9974, 9960],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@92
PS11, Line 92: [2, 9, 99, 998, 1005, 999, 1003, 994, 1000, 993, 999, 998, 994, 
992, 9899, 9941],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@93
PS11, Line 93: [2, 9, 99, 993, 1001, 1007, 1000, 998, 1002, 997, 999, 998, 998, 
999, 9923, 9931],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@94
PS11, Line 94: [2, 9, 99, 994, 998, 1002, 1002, 999, 998, 999, 997, 1000, 999, 
997, 9937, 9973],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@95
PS11, Line 95: [2, 9, 99, 995, 997, 998, 1001, 999, 1001, 996, 997, 1000, 999, 
998, 9989, 9981],
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@96
PS11, Line 96: [2, 9, 99, 998, 998, 997, 999, 998, 1000,  998, 1000, 998, 998, 
1000, 1, 10003]
flake8: E122 continuation line missing indentation or outdented


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@96
PS11, Line 96:  
flake8: E241 multiple spaces after ','


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@315
PS11, Line 315: 
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@315
PS11, Line 315:   # Verify that each ndv() value (one per column for a 
total of 11) is identical
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@316
PS11, Line 316:
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@316
PS11, Line 316:   # to the corresponding known value. Since NDV() invokes 
Hash64() hash function
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@319
PS11, Line 319: -
flake8: E226 missing whitespace around arithmetic operator



--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 05 Jun 2020 00:30:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-04 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
..

[WIP] IMPALA-2658: Extend the NDV function to accept a precision

This work addresses the current limitation in NDV function by
extending the function to take the 2nd integer-typed argument,
which must be an abstract value in the range of 1 to 10. This
abstract value specifies a real precision value used in the HLL
algorithm for the function.

Front end work:
1. Add a new template ndv function in builtin db that takes two
   arguments.
2. Verify that the 2nd argument of a NDV() is an integer literal in
   [1,10];
3. A new method to implement the mapping of the abstract value to the
   hll precision (the real work is TBD);
4. The length of the intermediate data type is computed based on the
   actual hll precision. When the 2nd argument is missing, the length
   is 1024 as in the current implementation;
5. The 2nd argument, if present, will be carried over all the way to
   the BE.

Back end work:
1. Remove the hardcoded precision (10) from these functions:
 AggregateFunctions::HllInit(),
 AggregateFunctions::HllUpdate(),
 AggregateFunctions::HllMerge(),
 AggregateFunctions::HllFinalEstimate(),
 AggregateFunctions::HllFinalize(),
 HllEstimateBias();
2. Instead, the actual precision is computed from the
   length of the intermediate data type as log2(hll_len);
3. Verify that the length of the intermediate data type is
   correct according to the 2nd argument (if present).

Work TDB:
1. Add unit tests;
2. Determine the final mapping of the 10 abstract values to
   10 possible Hll precisions.

Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
---
M be/src/common/logging.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M tests/query_test/test_aggregation.py
6 files changed, 302 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/11
--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9673: Add external warehouse dir variable in E2E test

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15990 )

Change subject: IMPALA-9673: Add external warehouse dir variable in E2E test
..


Patch Set 11:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5943/


--
To view, visit http://gerrit.cloudera.org:8080/15990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f
Gerrit-Change-Number: 15990
Gerrit-PatchSet: 11
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Xiaomeng Zhang 
Gerrit-Comment-Date: Fri, 05 Jun 2020 00:28:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> I'm not necessarily opposed to this approach (enforcing limit only after sp
>From a quick look at the planner estimate code it seems we're estimating only 
>what needs to be kept in memory assuming an external 2-phase sort (so assuming 
>we'll spill).  So maybe we should just look at the full input size and enforce 
>the sort_bytes_limit from the outset if the full input size is > the memory 
>limit.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:50:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40:   is varied between unspecified (not limited) vs 512 MB. The
> Did you do any experiments with lower values? Wondering if that helps furth
I later did 256 MB limit and it achieved a little faster query time. Lowering 
to 128 MB make the query time worse.
In David's large scale experiment though, there is not much difference between 
256 MB vs 512 MB.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
> I have some concern with how this will interact with spilling being disable
I think it is implied by line 872?

  DCHECK(enable_spilling_);


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
> intermediate sort isn't that clear - I guess "intermediate sort runs"
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_BYTES_LIMIT = 103
> I think we might actually want to make this SORT_RUN_BYTES_LIMIT or somethi
Ack



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 23:00:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(4 comments)

I'm thinking through the high level of this. I think the biggest obstacle to 
using this (or turning on by default) is that it can make queries spill that 
didn't spill before, or spill more than before, and that can cause issues.

I guess if the sort is already spilling the different is smaller.

It might be more of a clear win if it achieved the more incremental sorting 
behaviour without the extra spilling, e.g. by having sorted in-memory runs (as 
opposed to sorted spilled runs and unsorted in-memory runs). I don't know that 
I necessarily want to expand the scope of this commit, but would be interested 
on your thoughts on that.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40:   is varied between unspecified (not limited) vs 512 MB. The
Did you do any experiments with lower values? Wondering if that helps further.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
I have some concern with how this will interact with spilling being disabled 
(scratch_limit=0 or disable_unsafe_spills=true, with some missing stats). In 
those cases StartSpilling() returns an error.

Other operators only spill when the alternative is failing the query with OOM, 
so it's fine to just fail the query. But here we could hit that even if there's 
plenty of memory. I think it would make sense to also check if spilling is 
enabled.


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
intermediate sort isn't that clear - I guess "intermediate sort runs"


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_BYTES_LIMIT = 103
I think we might actually want to make this SORT_RUN_BYTES_LIMIT or something 
along those lines. This name kinda boxes us into it being an upper bound on the 
total memory used by the sort node, whereas we might want to use more memory to 
optimise this further.

One way I can see this evolving is that, if we had some free memory to play 
with when hitting this limit, we could defer spilling and keep some of the runs 
in memory. Maybe even incrementally do the quicksort work on previous runs in 
AddBatch() to reduce the blocking time further.

If we did that, SORT_BYTES_LIMIT would be pretty misleading.

Another case I thought about was the interaction with spilling being disabled - 
i left a comment about that in sorter.cc.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:37:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> Great idea! I will try to implement it that way.
I'm not necessarily opposed to this approach (enforcing limit only after 
spilling starts) but if we had confidence in the memory estimate it seems like 
we could enforce the limit from the start if the estimate is > the memory limit 
(we're very likely to spill).  Unfortunately in some of the queries I'm looking 
at our estimates are lower than the actual peak consumed (with no limit) by an 
order of magnitude even though the queries end up spilling heavily.
So we'll have to look into why those estimates are so bad, but for now maybe we 
should file a follow up JIRA to go back and make the application of the limit 
consider the estimate once we've improved the estimates (and consider a TODO in 
the code).



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:25:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6217/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 04 Jun 2020 21:53:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,122 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/7
--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6215/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 20:15:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6216/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 04 Jun 2020 20:12:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] Filter out "Checksum validation failed" messages during the maven build

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15775 )

Change subject: Filter out "Checksum validation failed" messages during the 
maven build
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5942/


--
To view, visit http://gerrit.cloudera.org:8080/15775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19afbd157533e52ef3157730c7ec5159241749bc
Gerrit-Change-Number: 15775
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:33:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS5, Line 102:   Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(), validTxnList,
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@189
PS5, Line 189:   List partitionRefs) throws CatalogException, 
MetaException, TException {
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:31:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
19 files changed, 1,108 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/5
--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5944/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:27:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:27:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 3: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:26:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16029

to look at the new patch set (#3).

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..

IMPALA-9782: fix Kudu DML with mt_dop

In general, ScalarExpr subclasses should not have
mutable state, i.e. state that is modified during
query execution, and definitely need to be thread-safe.
KuduPartitionExpr violated that, since KuduPartitioner
and KuduPartialRow are both mutated as part of expr
evaluation.

This patch moves those objects into thread-local state
for KuduPartitionExpr.

Testing:
* Add a regression test that does a large insert. Before this
  patch it reliably crashed Impala with mt_dop=4.
* Run more Kudu DML with mt_dop to improve test coverage.
* Make test_kudu.py use the correct test dimension (fixing a
  cosmetic issue).

Perf:
This changes adds some indirection to the expression evaluation,
so I did some manual benchmarking with this query, which
should somewhat stress the partitioning:

set mt_dop=1;
insert into orders_key_only
select o_orderkey from tpch_kudu.orders where o_orderkey % 10 = 0

The timing was in the same range before and after - between 6 and
8 seconds - but the results were very unstable so inconclusive.
The Kudu tservers were using an order-of-magnitude more CPU than
the impalads, so it seems safe to conclude that these partitioning
exprs are *not* a bottleneck for DML workloads.

Perf record showed impala::KuduPartitionExpr::GetIntValInterpreted()
taking up 0.09% of the impalad CPU, for additional evidence that
this doesn't make a difference.

Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
---
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/exprs/scalar-expr-evaluator.h
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
6 files changed, 129 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/16029/3
--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16029/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16029/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-9782: fix Kudu DML with mt_dop
Need to elaborate on what we did


http://gerrit.cloudera.org:8080/#/c/16029/2/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

http://gerrit.cloudera.org:8080/#/c/16029/2/be/src/exprs/kudu-partition-expr.cc@74
PS2, Line 74: unique_ptr ctx = 
make_unique();
> I don't think this is used anywhere?
Done


http://gerrit.cloudera.org:8080/#/c/16029/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/16029/2/tests/query_test/test_kudu.py@63
PS2, Line 63: # The kudu_read_mode of READ_LATEST does not provide high enough 
consistency for
: # these tests.
: # The default read mode of READ_LATEST does not provide high 
enough consistency for
: # these tests.
> typo
Done



--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:26:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9673: Add external warehouse dir variable in E2E test

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15990 )

Change subject: IMPALA-9673: Add external warehouse dir variable in E2E test
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5943/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f
Gerrit-Change-Number: 15990
Gerrit-PatchSet: 11
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Xiaomeng Zhang 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:25:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] Filter out "Checksum validation failed" messages during the maven build

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15775 )

Change subject: Filter out "Checksum validation failed" messages during the 
maven build
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5942/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19afbd157533e52ef3157730c7ec5159241749bc
Gerrit-Change-Number: 15775
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:10:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16029/2/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

http://gerrit.cloudera.org:8080/#/c/16029/2/be/src/exprs/kudu-partition-expr.cc@74
PS2, Line 74: unique_ptr ctx = 
make_unique();
I don't think this is used anywhere?


http://gerrit.cloudera.org:8080/#/c/16029/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/16029/2/tests/query_test/test_kudu.py@63
PS2, Line 63: # The kudu_read_mode of READ_LATEST does not provide high enough 
consistency for
: # these tests.
: # The default read mode of READ_LATEST does not provide high 
enough consistency for
: # these tests.
typo



--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:01:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9824: MetastoreClientPool should be singleton

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16030 )

Change subject: IMPALA-9824: MetastoreClientPool should be singleton
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6214/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5477cba68b9053dfae8e7ed785f4d7519c12cd0f
Gerrit-Change-Number: 16030
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Jun 2020 18:14:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6213/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 04 Jun 2020 17:57:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 24:

Could you do a pass over this with clang-format-diff? It's ready to be +2ed 
except for a few formatting issues.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 04 Jun 2020 17:37:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9824: MetastoreClientPool should be singleton

2020-06-04 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16030


Change subject: IMPALA-9824: MetastoreClientPool should be singleton
..

IMPALA-9824: MetastoreClientPool should be singleton

This patch is a refactor-only patch. It changes the MetastoreClientPool
into process wide singleton object. Currently, it is possible to
instantiate multiple MetastoreClientPool although it doesn't seem to
be a problem since all the places which instantiate it happen to run
in separate processes. It would be better to enforce this in code.
Making MetastoreClientPool a singleton also makes it easier to access
from anywhere in the code without the need to explicitly pass its
reference around which complicates the code flow unnecessarily.

Note that one side-effect of this patch is that number of initial
clients that the pool instantiates is controlled using a configuration
num_metadata_loading_threads which defaults to 16 instead of current
default size of 10 in Catalog. Alternatively, we can introduce a
separate config, but given that any metadata load operation needs a
HMS client, we might be better off creating the clients same as
num_metadata_loading_threads.

Testing:
1. No new tests were added since this was mostly a refactor-only patch.
2. [WIP] Ran existing core tests.

Change-Id: I5477cba68b9053dfae8e7ed785f4d7519c12cd0f
---
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
8 files changed, 43 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/16030/1
--
To view, visit http://gerrit.cloudera.org:8080/16030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5477cba68b9053dfae8e7ed785f4d7519c12cd0f
Gerrit-Change-Number: 16030
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-9782: fix Kudu DML with mt dop

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16029 )

Change subject: IMPALA-9782: fix Kudu DML with mt_dop
..

IMPALA-9782: fix Kudu DML with mt_dop

In general, ScalarExpr subclasses should not have
mutable state, i.e. state that is modified during
query execution, and definitely need to be thread-safe.
KuduPartitionExpr violated that, since KuduPartitioner
and KuduPartialRow are both mutated as part of expr
evaluation.

This patch moves those objects into thread-local state
for KuduPartitionExpr.

Testing:
* Add a regression test that does a large insert. Before this
  patch it reliably crashed Impala with mt_dop=4.
* Run more Kudu DML with mt_dop to improve test coverage.
* Make test_kudu.py use the correct test dimension (fixing a
  cosmetic issue).

Perf:
This changes adds some indirection to the expression evaluation,
so I did some manual benchmarking with this query, which
should somewhat stress the partitioning:

set mt_dop=1;
insert into orders_key_only
select o_orderkey from tpch_kudu.orders where o_orderkey % 10 = 0

The timing was in the same range before and after - between 6 and
8 seconds - but the results were very unstable so inconclusive.
The Kudu tservers were using an order-of-magnitude more CPU than
the impalads, so it seems safe to conclude that these partitioning
exprs are *not* a bottleneck for DML workloads.

Perf record showed impala::KuduPartitionExpr::GetIntValInterpreted()
taking up 0.09% of the impalad CPU, for additional evidence that
this doesn't make a difference.

Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
---
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/exprs/scalar-expr-evaluator.h
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
6 files changed, 132 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/16029/2
--
To view, visit http://gerrit.cloudera.org:8080/16029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7e86894da9dbcb3b69f7db236c841ecc08dbf1a
Gerrit-Change-Number: 16029
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
> Should this solve the issue in general, or this is just one step in that di
This is one step towards solving the problem.  The back pressure problem still 
exist in some degree, but should be decreased using this method.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> I was thinking about ways to minimize the regression this can cause in quer
Great idea! I will try to implement it that way.
I think there are two points that I need to pay attention in regards to this 
idea.

First, the size of the first sort run, where sort_bytes_limit is not enforced 
yet, can be couple times bigger than the rest of other runs. I'm not sure how 
the merge phase will behave in that case. I suppose it may be work just fine, 
since in many case the very last run will have different size (smaller) as well.

Second, when sort_bytes_limit is going to be enforced, I need to make sure to 
lower memory reservation for that SORT NODE to at least sort_bytes_limit. The 
reservation reduction might need to be done gradually as the sorted run pages 
are unpinned.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
> Does this mean the maximum time of a single AddBatchTime call, or the total
This is the maximum time of a single AddBatchTime call.
AddBatchTime is implemented as SummaryStatsCounter showing min, max, average, 
and sample count.

In regards to your proposed idea earlier, I probably need to rethink about this 
counter as well.
The purpose of this counter was to measure the effect of selected 
sort_bytes_limit value towards the time a SORT NODE blocked doing quicksort.
If we not enforcing sort_bytes_limit in the first run, the Max AddBatchTime 
will then always associated with the first quicksort.

Maybe I need to refocus this counter to just measure the quicksort time. Rename 
'AddBatchTime' to 'QuickSortTime'.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0
 :&& buffer_pool_client_->GetUsedReservation()
 :>= state_->query_options().sort_bytes_limit)
> nit: indentation looks a bit messy, but I am not sure what is the rule here
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
> I don't know if it matters, but this means that the counter won't be increa
Ack

Will rethink a better way to wrap the timer.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
_bytes_limit));
> I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79: query, exec_option, table_format=table_format)
> nit: +2 indentation
Ack



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 16:13:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6212/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Jun 2020 15:11:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(7 comments)

The change looks good to me in general, I only have some nits and improvement 
suggestions.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
Should this solve the issue in general, or this is just one step in that 
direction? The 3-6 second AddBatchTime still seems enough to fill the buffers 
and cause back pressure.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
I was thinking about ways to minimize the regression this can cause in queries 
that spill more because of the sort_bytes_limit, e.g. where the memory need of 
the sort is between sort_bytes_limit and mem limit.

A compromise could be to start using sort_bytes_limit only after the sorter had 
to spill at least once. This would mean no spilling if it is not needed at all, 
and the "big sort + spilling" could only occur once for an instance, so the 
penalty would not scale with the size of the data set.

This may potentially allow us to set a safe default for sort_bytes_limit that 
rarely leads to regression but still helps in huge queries.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
Does this mean the maximum time of a single AddBatchTime call, or the total 
time of fragment instance that spent to most time in the function?


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0
 :&& buffer_pool_client_->GetUsedReservation()
 :>= state_->query_options().sort_bytes_limit)
nit: indentation looks a bit messy, but I am not sure what is the rule here

Moving the check to a function like bool ReachedSortBytesLimit() could help.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
I don't know if it matters, but this means that the counter won't be increased 
if the function returns with an error, while it is possible that a huge amount 
of time was spent here.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
_bytes_limit));
I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem 
with small values is that it would greatly increase the number of runs, which 
can lead to doing the merge phase of the sort in multiple levels, as all runs 
during a merge needs a buffer, see Sorter::MaxRunsInNextMerge(). It seems also 
ok to enforce this in Sorter without notifying the user.


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79: query, exec_option, table_format=table_format)
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 14:31:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-04 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16001

to look at the new patch set (#4).

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original 
files"
..

IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

"Original files" are files that don't have full ACID schema. We can see
such files if we upgrade a non-ACID table to full ACID. Also, the LOAD
DATA statement can load non-ACID files into full ACID tables. So such
files don't store special ACID columns, that means we need
to auto-generate their values. These are (operation,
originalTransaction, bucket, rowid, and currentTransaction).

With the exception of 'rowid', all of them can be calculated based on
the file path, so I add their values to the scanner's template tuple.

'rowid' is the ordinal number of the row inside a bucket inside a
directory. For now Impala only allows one file per bucket per
directory. Therefore we can generate row ids for each file
independently.

Multiple files in a single bucket in a directory can only be present if
the table was non-transactional earlier and we upgraded it to full ACID
table. After the first compaction we should only see one original file
per bucket per directory.

In HdfsOrcScanner we calculate the first row id for our split then
the OrcStructReader fills the rowid slot with the proper values.

Testing:
 * added e2e tests to check if the generated values are correct
 * added e2e test to reject tables that have multiple files per bucket
 * added unit tests to the new auxiliary functions

Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
---
M be/src/exec/acid-metadata-utils-test.cc
M be/src/exec/acid-metadata-utils.cc
M be/src/exec/acid-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
A testdata/data/alltypes_non_acid.orc
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A 
testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test
M tests/query_test/test_acid.py
14 files changed, 699 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/16001/4
--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6211/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 04 Jun 2020 08:54:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-452 Add support for string concatenation operator using || construct Separated "||" and "OR" into different tokens. -OR (KW OR) remains the same. (it creates CompoundPredicate a

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15877 )

Change subject: IMPALA-452 Add support for string concatenation operator using 
|| construct Separated "||" and "OR" into different tokens. -OR (KW_OR) remains 
the same. (it creates CompoundPredicate and expects two BOOLEAN operands) -|| 
(KW_LOGICAL_OR) creates CompoundVe
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6210/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f990d56ecb1e18d1b2737e8c5eab0d524edfaf
Gerrit-Change-Number: 15877
Gerrit-PatchSet: 9
Gerrit-Owner: Martin Zink 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Martin Zink 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 08:23:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
16 files changed, 1,074 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/4
--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

2020-06-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/4/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/4/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS4, Line 102:   Utils.shouldRecursivelyListPartitions(table), oldFds, 
table.getHostIndex(), validTxnList,
line too long (99 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 04 Jun 2020 08:09:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-452 Add support for string concatenation operator using || construct Separated "||" and "OR" into different tokens. -OR (KW OR) remains the same. (it creates CompoundPredicate a

2020-06-04 Thread Martin Zink (Code Review)
Martin Zink has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/15877 )

Change subject: IMPALA-452 Add support for string concatenation operator using 
|| construct Separated "||" and "OR" into different tokens. -OR (KW_OR) remains 
the same. (it creates CompoundPredicate and expects two BOOLEAN operands) -|| 
(KW_LOGICAL_OR) creates CompoundVe
..

IMPALA-452 Add support for string concatenation operator using || construct 
Separated "||" and "OR" into different tokens.
-OR (KW_OR) remains the same. (it creates CompoundPredicate and expects two 
BOOLEAN operands)
-|| (KW_LOGICAL_OR) creates CompoundVerticalBarExpr which expects two BOOLEAN 
operands or two STRING operands

CompoundVerticalBarExpr creates either a CompoundPredicate or a 
FunctionCallExpr member variable based on the type of the left operand during 
analyze.
CompoundVerticalBarExpr similarly to BetweenPredicate cannot be executed 
directly
so its needs to be replaced by its member variable by 
ExtractCompoundVerticalBarExprRule.

Change-Id: Ie3f990d56ecb1e18d1b2737e8c5eab0d524edfaf
---
M be/src/exprs/expr-test.cc
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/CompoundVerticalBarExpr.java
A 
fe/src/main/java/org/apache/impala/rewrite/ExtractCompoundVerticalBarExprRule.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
11 files changed, 244 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/15877/9
--
To view, visit http://gerrit.cloudera.org:8080/15877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3f990d56ecb1e18d1b2737e8c5eab0d524edfaf
Gerrit-Change-Number: 15877
Gerrit-PatchSet: 9
Gerrit-Owner: Martin Zink 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Martin Zink 
Gerrit-Reviewer: Tim Armstrong