[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16182 )

Change subject: IMPALA-9974: Join elimination based on referential integrity.
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6523/


--
To view, visit http://gerrit.cloudera.org:8080/16182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074
Gerrit-Change-Number: 16182
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 07 Oct 2020 06:09:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16182 )

Change subject: IMPALA-9974: Join elimination based on referential integrity.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7375/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074
Gerrit-Change-Number: 16182
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 07 Oct 2020 04:22:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16182 )

Change subject: IMPALA-9974: Join elimination based on referential integrity.
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@292
PS4, Line 292:   EqJoinConjunctScanSlots slots = 
EqJoinConjunctScanSlots.create(eqJoinConjunct, analyzer);
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@478
PS4, Line 478:   return new EqJoinConjunctScanSlots(eqJoinConjunct, 
lhsScanSlot, rhsScanSlot, analyzer);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@523
PS4, Line 523: FeFsTable p2 = (FeFsTable) 
catalog_.getOrLoadTable("functional", "parent_table_2", "test", null);
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@532
PS4, Line 532: FeFsTable child = (FeFsTable) 
catalog_.getOrLoadTable("functional", "child_table", "test", null);
line too long (101 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074
Gerrit-Change-Number: 16182
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 07 Oct 2020 04:02:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.

2020-10-06 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16182


Change subject: IMPALA-9974: Join elimination based on referential integrity.
..

IMPALA-9974: Join elimination based on referential integrity.

Use defined referential dependency constraints during planning.

  1. For join ndv calculations involving a single column primary key we
 now use the table's row count instead of the column's ndv stats
 which are approximate and incorrect at times.
  2. Convert LEFT SEMI JOIN to INNER JOIN if the inner table's primary
 key is used as the join key.
  3. Convert LEFT OUTER JOIN to INNER JOIN if the join key uses a
 fk/pk relationship, the FK is not nullable.
  4. Prune INNER JOINS on fk/pk conditions where the join doesn't affect
 the output cardinality and there are no predicates or non-key
 columns in use on the inner side.
  5. For a confirmed fk/pk join we include a IS NOT NULL predicate on
 the foreign key columns.

Plan optimizations can be disabled with the query option 
DISABLE_INFORMATIONAL_CONSTRAINTS

Testing:
  - New constraint rewrite planner test
  - TPC-DS functional tests
  - TPC-DS Planner test changes

Some TPC-DS 30TB Performance Improvements:
  - Q72 74.3s -> 24.8s
  - Q82 13.5s -> 2.2s
  - Q84 19.33s -> 6.07s
  - Q85 29s -> 6.33s

Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
A fe/src/main/java/org/apache/impala/analysis/TableConstraintRewriter.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/SqlConstraints.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/bin/compute-table-stats.sh
M testdata/data/child_table.txt
M testdata/data/parent_table.txt
M testdata/data/parent_table_2.txt
M testdata/datasets/functional/functional_schema_template.sql
A 
testdata/workloads/functional-planner/queries/PlannerTest/constraint-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functi

[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 07 Oct 2020 00:34:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..

IMPALA-10189: don't reload partitions for drop stats

This patch avoids reloading partition metadata from the
HMS or filesystems when partition stats are changed,
by a DROP STATS command.

bulkAlterPartitions() is modified to allow callers to
control whether partitions are marked dirty (and therefore
reloaded).

DDLs that do not mark partitions dirty need to correctly update
the in-memory state of the Partition object so that it will
be the same as it would be after a reload from the HMS.

In this case, numRows_ was not previously updated, but now
needs to be updated.

Testing:
* Ran exhaustive tests
* Manually tested some scenarios with modifications to partitions
  do in Hive outside of Impala. The previous behaviour is preserved
  (i.e. cached partition state in Impala overwrites the Hive
  modifictions).
* Manually tested dropping and computing regular and incremental
  stats and inspected the results in both Impala and Hive. We
  already have automated test coverage for these scenarios as well.

Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Reviewed-on: http://gerrit.cloudera.org:8080/16516
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 79 insertions(+), 37 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16548 )

Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10
PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS
> I meant "...in the catalog service"
A pointer for debug actions - this is where they are actually implemented:
https://github.com/apache/impala/blob/master/be/src/util/debug-util.cc#L354
SLEEP is good for what we want here - the others could be implemented later 
when they are actually needed.


http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py
File tests/custom_cluster/test_topic_update_frequency.py:

http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24
PS2, Line 24: CustomClusterTestSuite
> This change was done to simulate certain conditions for development of patc
It doesn't have to be a custom cluster test for debug actions, as debug_actions 
is a query option. The test should be marked with @pytest.mark.execute_serially 
though to avoid interfering with parallel tests by locking catalog for a long 
time.

It can stay as a custom_cluster test if it makes merging with 
https://gerrit.cloudera.org/#/c/16549/ easier, and we can speed up the test 
later by making it non custom cluster if it turns out to be possible.



--
To view, visit http://gerrit.cloudera.org:8080/16548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Gerrit-Change-Number: 16548
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 23:23:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 5: Code-Review+2

Thanks for the changes!


--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 5
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 23:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16538 )

Change subject: IMPALA-10188: Remove unused WebDAV functions
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53
Gerrit-Change-Number: 16538
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 06 Oct 2020 21:47:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16538 )

Change subject: IMPALA-10188: Remove unused WebDAV functions
..

IMPALA-10188: Remove unused WebDAV functions

Included the latest squeasel/master from upstream:
- fadbf61f9e5dfd08b2d6aad785bb521821c870c8:
  The function "sq_get_bound_addresses" returns the full IPv6 addresses.
- 764cd7dab35e5dabc7bfa3a6d4d5b187ace6b913:
  Removed PROPFIND and MKCOL methods from squeasel.

Testing:
- Ran webserver-test
- Validated that the impalad debug UI came up fine and made sure
  that all the existing links work as expected.

Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53
Reviewed-on: http://gerrit.cloudera.org:8080/16538
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/thirdparty/squeasel/squeasel.c
M be/src/util/webserver-test.cc
2 files changed, 6 insertions(+), 111 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53
Gerrit-Change-Number: 16538
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7374/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 5
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 21:42:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16506 )

Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of 
ExecuterBlacklist
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1
Gerrit-Change-Number: 16506
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 06 Oct 2020 21:25:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16506 )

Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of 
ExecuterBlacklist
..

IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist

In current ExecuterBlacklist class, we've keyed maps on a
TNetworkAddress of a backend. To simply the logic for the class,
changes it to key off of the UniqueIdPB backend-id, eg. refactor
'executor_list_' to no longer be a map list>
and instead makes it a map.
Also fixes a minor bug with the calculation of elapsed time when
a backend that was on probation is re-blacklisted.

Testing:
 - Passed test_blacklist.py and test_query_retries.py.
 - Passed exhaustive tests.

Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1
Reviewed-on: http://gerrit.cloudera.org:8080/16506
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/scheduling/executor-blacklist.cc
M be/src/scheduling/executor-blacklist.h
2 files changed, 69 insertions(+), 103 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1
Gerrit-Change-Number: 16506
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..

IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

We found that Ranger would generate an AuthzAuditEvent as long as
there exists a column masking policy corresponding to the column
even though the policy does not apply to the requesting user. This
resulted in an IllegalStateException if a user "A" submits a SELECT
query against a table that has a column specified in a column masking
policy when the policy does not apply to "A", i.e., the field of
'Select User' for this policy in the Ranger web UI does not contain "A".
For such an AuthzAuditEvent, its field of 'accessType' will not be one
of the supported mask types since its corresponding
accessResult.isMaskEnabled() would evaluates to false, indicating that
there is no matching column masking policy associated with the user "A"
and thus the AuthzAuditEvent will not be post-processed by Impala in
RangerAuthorizationCheker#createColumnMask(). But since we did not
filter out such an AuthzAuditEvent when it was generated and returned
from RangerBasePlugin#evalDataMaskPolicies(), we failed the check that
requires every AuthzAuditEvent be column masking-related in
RangerAuthorizationContext#stashAuditEvents().

To address this issue, in this patch we filter out such an
AuthzAuditEvent after each call to
RangerBasePlugin#evalDataMaskPolicies() so that no redundant
AuthzAuditEvent is generated.

Testing:
 - Added a new column masking policy associated with a non-matching user
   in RangerAuditLogTest#testAuditsForColumnMasking() to verify that
   the redundant AuthzAuditEvent is removed.
 - Verified that the patch passes the exhaustive tests in the DEBUG
   build.

Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
---
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
M 
fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
2 files changed, 35 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/16524/5
--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 5
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile 
path hash
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7373/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 21:15:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7372/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 21:03:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash

2020-10-06 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile 
path hash
..

IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash

Current code generates data path hash in MD5 for Iceberg Table. But
MD5 is one of forbidden algorithms for FIPS. Even for non-security
purposes, like hash map, we still cannot use MD5 algorithm.
This patch replaces MD5 with non-cryptographic hash function
murmur3_128, which generates hash value with same length as MD5.

Testing:
 - Passed core tests.

Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
6 files changed, 22 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/16534/4
--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1647
PS8, Line 1647:   // Remove the ROW_COUNT parameter if it has been set and 
set numRows to reflect
change "set and set" to "set and update".

Maybe better to move the numRows comment down further so it's more clear what 
code that belongs with.


http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4765
PS8, Line 4765: FileChecksum checkSum = fs.getFileChecksum(filePath);
getFileSystem was changed to return null and is being used unchecked here.



--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:52:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16547 )

Change subject: IMPALA-9664: Fix typo in test_event_processing.py
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Gerrit-Change-Number: 16547
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:52:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16547 )

Change subject: IMPALA-9664: Fix typo in test_event_processing.py
..

IMPALA-9664: Fix typo in test_event_processing.py

The test tries to add src_db within a database
object which would fail when the database
managed location is present. This test doesn't
fail currently in ASF master since the database
doesn't have managed location yet. It will start
failing once the managed location for databases is
available in the toolchain build of Hive.

Testing:
1. The test was working before the patch since the
managed db location was probably not set and modified
line was not getting executed. I made sure the test works
with the patch as well.
2. I applied the patch in an environment where
managed db location is available and the error disappears.
(Although the test fails for another unrelated reason
HIVE-23995) so we should be aware of this when the toolchain
hive build is bumped up.

Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Reviewed-on: http://gerrit.cloudera.org:8080/16547
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/custom_cluster/test_event_processing.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Gerrit-Change-Number: 16547
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG@9
PS1, Line 9: We found that Ranger would generate an AuthzAuditEvent as long as
   : there exists a column masking policy corresponding to the column
   : even though the policy does not apply to the requesting user
> Thanks Csaba! I have consulted Abhay Kulkarni on the Ranger project about t
Thanks for investigating this!
If this is an expected behavior than I agree that it is the best to just remove 
the extra event as the patch does.


http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG@30
PS1, Line 30: Furthermore, we also revise all the checks
: for the generated AuthzAuditEvent's due to the evaluation of 
column
: masking policies so that a failed check would also result in an 
entry in
: the error log.
> Thanks Csaba for the feedback! I will add back the removed checks according
I see that you kept both the checks and the logs - do you think that the logs 
are necessary? The failed preconditions are always added to the error log AFAIK.



--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:51:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash

2020-10-06 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile 
path hash
..


Patch Set 3:

Right, we have to construct the absolute paths in loadFileDescFromThrift(). It 
causes memory overhead on the coordinator and coordinator is more memory 
intensive.
Will keep Murmur hash.


--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:50:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 3:

> Patch Set 3:
>
> Build Failed
>
> https://jenkins.impala.io/job/gerrit-code-review-checks/7371/ : Initial code 
> review checks failed. See linked job for details on the failure.

The error message provided in the following indicates this failure does not 
seem to be related to my patch. So I re-triggered the build again.

19:48:12 [ERROR] Failed to execute goal on project impala-minimal-s3a-aws-sdk: 
Could not resolve dependencies for project 
org.apache.impala:impala-minimal-s3a-aws-sdk:jar:0.1-SNAPSHOT: Failed to 
collect dependencies at org.apache.hadoop:hadoop-aws:jar:3.1.1.7.2.1.0-287: 
Failed to read artifact descriptor for 
org.apache.hadoop:hadoop-aws:jar:3.1.1.7.2.1.0-287: Could not transfer artifact 
org.apache.hadoop:hadoop-main:pom:3.1.1.7.2.1.0-287 from/to impala.cdp.repo 
(https://native-toolchain.s3.amazonaws.com/build/cdp_components/4493826/maven): 
Connect to native-toolchain.s3.amazonaws.com:443 
[native-toolchain.s3.amazonaws.com/52.219.112.106] failed: Connection timed out 
(Connection timed out) -> [Help 1]


--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:44:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16385 )

Change subject: IMPALA-9792: Add ability to split kudu scan ranges
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6518/


--
To view, visit http://gerrit.cloudera.org:8080/16385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Gerrit-Change-Number: 16385
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:44:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay

2020-10-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16548 )

Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10
PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS
> Thanks for this suggestion. I was not aware of this and will look into this
I meant "...in the catalog service"



--
To view, visit http://gerrit.cloudera.org:8080/16548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Gerrit-Change-Number: 16548
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:39:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay

2020-10-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16548 )

Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10
PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS
> Couldn't we use DEBUG_ACTION in the frontend similarly to delay/failure inj
Thanks for this suggestion. I was not aware of this and will look into this. 
Definitely looks like a good match which can be enhanced to other RPC delays 
(HMS, Ranger etc) and the catalog service.


http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc@89
PS2, Line 89: DECLARE_int64(catalog_simulate_hdfs_listing_delay_ms);
> What is the relation of the query option and the flag? I ran through the co
yes, originally, I had created this as catalog service configuration which was 
being picked in the code during initialization of CatalogServiceCatalog. I 
think this can be removed. I will update the patch.


http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py
File tests/custom_cluster/test_topic_update_frequency.py:

http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24
PS2, Line 24: CustomClusterTestSuite
> Does this has to be a custom cluster test? I agree that this should be a se
This change was done to simulate certain conditions for development of patch 
for IMPALA-6671 (https://gerrit.cloudera.org/#/c/16549/) The test which uses 
this option in that patch is a custom cluster test. It looks like with debug 
actions though, we may have to keep this as a custom cluster test, right?



--
To view, visit http://gerrit.cloudera.org:8080/16548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Gerrit-Change-Number: 16548
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 20:38:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7371/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:50:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6671: [WIP] Skip locked tables from topic updates

2020-10-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16549 )

Change subject: IMPALA-6671: [WIP] Skip locked tables from topic updates
..


Patch Set 1:

(3 comments)

I think if this approach works out it would meaningfully improve metadata for 
production workloads - seems like it should reduce the number of hiccups that 
non-long-running-ddl queries experience, although obviously with the approach 
they can eventually get stuck behind a DDL.

We'd expect that potentially, this could have some impact on tables where there 
are multiple long-running DDLs, right? E.g. that table might skip topic updates 
and delay queries waiting for that particular table. I think that seems fine, 
but just wanted to be sure.

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9
PS1, Line 9: This change adds a mechanism for topic-update thread
question, that maybe will be answered as I read the patch - if an update is 
skipped, does the topic update collection thread get woken up when the lock is 
released?


http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23
PS1, Line 23: This solution unblocks topic-updates for a while until a
Maybe this is the best we can do at the moment, but it does seem to have 
similar problems to the IMPALA-5058 solution where it does help with latency 
when it's blocked for shorter periods of time but still gets blocked when a 
lock is held for a long time.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392
PS1, Line 1392: if (tblVersion > ctx.toVersion &&
I've seen some cases in production workloads where the topic update thread gets 
stuck acquiring the table lock, but then has to skip the update anywhere here. 
Is there any way we can move this check outside of the lock acquisition so that 
it can exit early in that case?



--
To view, visit http://gerrit.cloudera.org:8080/16549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655
Gerrit-Change-Number: 16549
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:50:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16548 )

Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10
PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS
Couldn't we use DEBUG_ACTION in the frontend similarly to delay/failure 
injection to the backend? E.g. CATALOGD_HDFS_LISTING:SLEEP@1000

This may be more work now, but similar delays could be implemented with much 
less plumbing in the future.
I am ok with the current solution if we need to merge this quickly, but I think 
it worth a TODO, as in the long term DEBUG_ACTION seems better to me.


http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc@89
PS2, Line 89: DECLARE_int64(catalog_simulate_hdfs_listing_delay_ms);
What is the relation of the query option and the flag? I ran through the code 
and tests and didn't see any usage of the flag. The commit message also doesn't 
mention it.


http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py
File tests/custom_cluster/test_topic_update_frequency.py:

http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24
PS2, Line 24: CustomClusterTestSuite
Does this has to be a custom cluster test? I agree that this should be a 
serialized test, but I don't see why would we need to restart the cluster.



--
To view, visit http://gerrit.cloudera.org:8080/16548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Gerrit-Change-Number: 16548
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:37:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7370/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:27:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

2020-10-06 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16524 )

Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column 
masking
..

IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking

We found that Ranger would generate an AuthzAuditEvent as long as
there exists a column masking policy corresponding to the column
even though the policy does not apply to the requesting user. This
resulted in an IllegalStateException if a user "A" submits a SELECT
query against a table that has a column specified in a column masking
policy when the policy does not apply to "A", i.e., the field of
'Select User' for this policy in the Ranger web UI does not contain "A".
For such an AuthzAuditEvent, its field of 'accessType' will not be one
of the supported mask types since its corresponding
accessResult.isMaskEnabled() would evaluates to false, indicating that
there is no matching column masking policy associated with the user "A"
and thus the AuthzAuditEvent will not be post-processed by Impala in
RangerAuthorizationCheker#createColumnMask(). But since we did not
filter out such an AuthzAuditEvent when it was generated and returned
from RangerBasePlugin#evalDataMaskPolicies(), we failed the check that
requires every AuthzAuditEvent be column masking-related in
RangerAuthorizationContext#stashAuditEvents().

To address this issue, in this patch we filter out such an
AuthzAuditEvent after each call to
RangerBasePlugin#evalDataMaskPolicies() so that no redundant
AuthzAuditEvent is generated. Furthermore, we also revise all the checks
for the generated AuthzAuditEvent's due to the evaluation of column
masking policies so that a failed check would also result in an entry in
the error log.

Testing:
 - Added a new column masking policy associated with a non-matching user
   in RangerAuditLogTest#testAuditsForColumnMasking() to verify that
   the redundant AuthzAuditEvent is removed.
 - Verified that the patch passes the exhaustive tests in the DEBUG
   build.

Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
---
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java
M 
fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
3 files changed, 53 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/16524/3
--
To view, visit http://gerrit.cloudera.org:8080/16524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b
Gerrit-Change-Number: 16524
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6522/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 7: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Tim Armstrong (Code Review)
Hello Quanlong Huang, Kurt Deschler, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16516

to look at the new patch set (#6).

Change subject: IMPALA-10189: don't reload partitions for drop stats
..

IMPALA-10189: don't reload partitions for drop stats

This patch avoids reloading partition metadata from the
HMS or filesystems when partition stats are changed,
by a DROP STATS command.

bulkAlterPartitions() is modified to allow callers to
control whether partitions are marked dirty (and therefore
reloaded).

DDLs that do not mark partitions dirty need to correctly update
the in-memory state of the Partition object so that it will
be the same as it would be after a reload from the HMS.

In this case, numRows_ was not previously updated, but now
needs to be updated.

Testing:
* Ran exhaustive tests
* Manually tested some scenarios with modifications to partitions
  do in Hive outside of Impala. The previous behaviour is preserved
  (i.e. cached partition state in Impala overwrites the Hive
  modifictions).
* Manually tested dropping and computing regular and incremental
  stats and inspected the results in both Impala and Hive. We
  already have automated test coverage for these scenarios as well.

Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 79 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/16516/6
--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats

2020-10-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16516 )

Change subject: IMPALA-10189: don't reload partitions for drop stats
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@199
PS5, Line 199: import com.codahale.metrics.Timer;
> nit, unnecessary change by IDE? May be move them back to original place.
Done


http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4115
PS5, Line 4115: ..
> nit, can remove .
Done



--
To view, visit http://gerrit.cloudera.org:8080/16516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d
Gerrit-Change-Number: 16516
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 18:44:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10210: Skip Authentication for connection from a trusted domain

2020-10-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16542 )

Change subject: IMPALA-10210: Skip Authentication for connection from a trusted 
domain
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/rpc/authentication-util.cc
File be/src/rpc/authentication-util.cc:

http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/rpc/authentication-util.cc@181
PS1, Line 181:   s = sock_addr.LookupHostname(&host_name);
Is there any performance problems with doing a reverse dns lookup on every 
request?


http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp@195
PS1, Line 195: transport_->getOrigin()
> for connections originating from the localhost, this returns "localhost" wh
Does this ever return an IP address? From looking at the code for TSocket, it 
seems like this will always return a hostname, in which case of course we 
should probably handle all hostnames, not just localhost.


http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp@206
PS1, Line 206: // Try authenticating with cookies first.
This is outdated now


http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/util/webserver.cc
File be/src/util/webserver.cc:

http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/util/webserver.cc@599
PS1, Line 599: // Try authenticating with a cookie first, if enabled.
outdated



--
To view, visit http://gerrit.cloudera.org:8080/16542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I09234078e2314dbc3177d0e869ae028e216ca699
Gerrit-Change-Number: 16542
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 06 Oct 2020 17:59:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10198: Unify Java code under the $IMPALA HOME/java directory

2020-10-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16500 )

Change subject: IMPALA-10198: Unify Java code under the $IMPALA_HOME/java 
directory
..


Patch Set 3:

(1 comment)

Missed this review comment somehow. I included random other build fixes in this 
upload, and I'm going to split it up into a couple changes to make the pieces 
clear. I'll let you know when this is ready to review.

http://gerrit.cloudera.org:8080/#/c/16500/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16500/3//COMMIT_MSG@18
PS3, Line 18: This moves all the Java projects other than fe into
> We'll probably have to update some of the wiki pages to reflect this right,
The nice thing about this structure is that the maven commands all work on the 
submodules. Since fe didn't move, the same commands still work from the same 
directory. I will double check that to be sure.



--
To view, visit http://gerrit.cloudera.org:8080/16500
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Gerrit-Change-Number: 16500
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Oct 2020 17:49:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 16:58:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..

IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg 
table property

We provide several new table properties in IMPALA-10164, such as
'iceberg.catalog', in order to keep consist of these properties, we
rename 'iceberg_file_format' to 'iceberg.file_format'. When we creating
Iceberg table, we should use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.file_format'='parquet',
'iceberg.catalog'='hadoop.tables')

Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Reviewed-on: http://gerrit.cloudera.org:8080/16550
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
6 files changed, 22 insertions(+), 22 deletions(-)

Approvals:
  Zoltan Borok-Nagy: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16538 )

Change subject: IMPALA-10188: Remove unused WebDAV functions
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53
Gerrit-Change-Number: 16538
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 06 Oct 2020 16:18:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16538 )

Change subject: IMPALA-10188: Remove unused WebDAV functions
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6521/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53
Gerrit-Change-Number: 16538
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 06 Oct 2020 16:19:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16506 )

Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of 
ExecuterBlacklist
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6520/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1
Gerrit-Change-Number: 16506
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 06 Oct 2020 16:10:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16506 )

Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of 
ExecuterBlacklist
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1
Gerrit-Change-Number: 16506
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 06 Oct 2020 16:10:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16543 )

Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7369/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Gerrit-Change-Number: 16543
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:50:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository

2020-10-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16543 )

Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository
..

IMPALA-10218: Remove impala.cdh.repo Maven repository

This removes the impala.cdh.repo Maven repository (i.e.
the repository for the CDH_BUILD_NUMBER). It removes
the associated code for CDH_BUILD_NUMBER.

The only remaining dependency for the CDH_BUILD_NUMBER
repository was Apache Kite in some of our test code.
This transitions that code to use the public version
of Apache Kite.

The testdata/TableFlattener Java project is intended
to be used manually and is not used for any tests.
It has bitrotted and currently does not build. I verified
that it now builds (which it currently did not), but I did
not verify functionality.

Testing:
 - Ran a core job
 - Built testdata/TableFlattener Java project

Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
---
M bin/impala-config.sh
M buildall.sh
M impala-parent/pom.xml
M testdata/TableFlattener/pom.xml
M 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/Main.java
M testdata/pom.xml
6 files changed, 55 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16543/3
--
To view, visit http://gerrit.cloudera.org:8080/16543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Gerrit-Change-Number: 16543
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository

2020-10-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16543 )

Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository
..


Patch Set 3:

Thanks for the review!


--
To view, visit http://gerrit.cloudera.org:8080/16543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Gerrit-Change-Number: 16543
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:30:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository

2020-10-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16543 )

Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository
..

IMPALA-10218: Remove impala.cdh.repo Maven repository

This removes the impala.cdh.repo Maven repository (i.e.
the repository for the CDH_BUILD_NUMBER). It removes
the associated code for CDH_BUILD_NUMBER.

The only remaining dependency for the CDH_BUILD_NUMBER
repository was Apache Kite in some of our test code.
This transitions that code to use the public version
of Apache Kite.

The testdata/TableFlattener Java project is intended
to be used manually and is not used for any tests.
It has bitrotted and currently does not build. I verified
that it now builds (which it currently did not), but I did
not verify functionality.

Testing:
 - Ran a core job
 - Built testdata/TableFlattener Java project

Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Reviewed-on: http://gerrit.cloudera.org:8080/16543
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
---
M bin/impala-config.sh
M buildall.sh
M impala-parent/pom.xml
M testdata/TableFlattener/pom.xml
M 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/Main.java
M testdata/pom.xml
6 files changed, 55 insertions(+), 116 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Csaba Ringhofer: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Gerrit-Change-Number: 16543
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16547 )

Change subject: IMPALA-9664: Fix typo in test_event_processing.py
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Gerrit-Change-Number: 16547
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:26:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16547 )

Change subject: IMPALA-9664: Fix typo in test_event_processing.py
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6519/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Gerrit-Change-Number: 16547
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:26:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16385 )

Change subject: IMPALA-9792: Add ability to split kudu scan ranges
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Gerrit-Change-Number: 16385
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:24:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16385 )

Change subject: IMPALA-9792: Add ability to split kudu scan ranges
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6518/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Gerrit-Change-Number: 16385
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:24:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7368/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:21:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@824
PS2, Line 824: if (!ComputeCandidatePages(page_locations, 
candidate_ranges_, row_group.num_rows,
> this is not needed, as we set it to false in ResetPageFiltering() anyway
Done


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172
PS1, Line 172: while (next_page_idx < page_locations.size() &&
 :page_locations[next_page_idx].compressed_page_size == 
0) {
 :   ++next_page_idx;
 : }
> I don't think that copying a vector with the size of the number of pages wo
Yeah, for now I'd rather leave it that way.



--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 15:03:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16503

to look at the new patch set (#3).

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..

IMPALA-9952: Fix page index filtering for empty pages

As IMPALA-4371 and IMPALA-10186 points out, Impala might write
empty data pages. It usually does that when it has to write a bigger
page than the current page size. If we really need to write empty data
pages is a different question, but we need to handle them correctly
as there are already such files out there.

The corresponding Parquet offset index entries to empty data pages
are invalid PageLocation objects with 'compressed_page_size' = 0.
Before this commit Impala didn't ignore the empty page locations, but
generated a warning. Since invalid page index doesn't fail a scan
by default, Impala continued scanning the file with semi-initialized
page filtering. This resulted in 'Top level rows aren't in sync'
error, or a crash in DEBUG builds.

With this commit Impala ignores empty data pages and still able to
filter the rest of the pages. Also, if the page index is corrupt
for some other reason, Impala correctly resets the page filtering
logic and falls back to regular scanning.

Testing:
* Added unit test for empty data pages
* Added e2e test for empty data pages
* Added e2e test for invalid page index

Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common-test.cc
M be/src/exec/parquet/parquet-common.cc
M testdata/data/README
A testdata/data/alltypes_empty_pages.parquet
A testdata/data/alltypes_invalid_pages.parquet
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
9 files changed, 195 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16503/3
--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16552 )

Change subject: IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7367/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69
Gerrit-Change-Number: 16552
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 06 Oct 2020 14:50:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@824
PS2, Line 824:   filter_pages_ = false;
this is not needed, as we set it to false in ResetPageFiltering() anyway


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172
PS1, Line 172: int next_page_idx = page_idx + 1;
 : while (next_page_idx < page_locations.size() &&
 :page_locations[next_page_idx].compressed_page_size == 
0) {
 :
> Yeah it'd also work, but I didn't want to dynamically allocate a new vector
I don't think that copying a vector with the size of the number of pages would 
matter, but I am ok with the current solution if you prefer it.



--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 14:46:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.

2020-10-06 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16552


Change subject: IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
..

IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.

Total RPC time was incorrectly computed using
resp_.receiver_latency_ns() in function EndDataStreamCompleteCb(). This
patch fix the bug by replacing it with eos_rsp_.receiver_latency_ns().
This patch also fix logging mistakes in LogSlowRpc() to use its 'resp'
parameter instead of 'resp_' field member.

Testing:
- Manually run data loading query that exhibit the bug for several times
  and verify that the Min value of RpcNetworkTime counter is always
  positive after the patch. The query used in testing is insert query to
  TPC-DS fact table store_sales of scale 10GB in single machine mini
  cluster.
- Add DCHECK to verify that total rpc time is greater than or equal to
  receiver_latency_ns.
- Run and pass core tests.

Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69
---
M be/src/runtime/krpc-data-stream-sender.cc
1 file changed, 8 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16552/1
--
To view, visit http://gerrit.cloudera.org:8080/16552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69
Gerrit-Change-Number: 16552
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7366/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 13:59:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16503 )

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..


Patch Set 2:

(5 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16503/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16503/1//COMMIT_MSG@11
PS1, Line 11: data
> typo: data
Done


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@637
PS1, Line 637:   if (filter_pages_ && candidate_ranges_.empty()) {
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@724
PS1, Line 724: ETURN_IF_ERR
> Maybe make filter_pages a member, and reset it in ResetPageFiltering?
Done


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@142
PS1, Line 142: auto& last_valid_page = page_locations[last_valid_idx];
 :
> can you add some comment in this block? e.g. first_row_index must have prog
Done


http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172
PS1, Line 172: int next_page_idx = page_idx + 1;
 : while (next_page_idx < page_locations.size() &&
 :page_locations[next_page_idx].compressed_page_size == 
0) {
 :
> I may have missed something, but wouldn't it be simpler to create a copy of
Yeah it'd also work, but I didn't want to dynamically allocate a new vector and 
copy integers. Though I didn't measure what would be the actual overhead.



--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 13:40:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16503

to look at the new patch set (#2).

Change subject: IMPALA-9952: Fix page index filtering for empty pages
..

IMPALA-9952: Fix page index filtering for empty pages

As IMPALA-4371 and IMPALA-10186 points out, Impala might write
empty data pages. It usually does that when it has to write a bigger
page than the current page size. If we really need to write empty data
pages is a different question, but we need to handle them correctly
as there are already such files out there.

The corresponding Parquet offset index entries to empty data pages
are invalid PageLocation objects with 'compressed_page_size' = 0.
Before this commit Impala didn't ignore the empty page locations, but
generated a warning. Since invalid page index doesn't fail a scan
by default, Impala continued scanning the file with semi-initialized
page filtering. This resulted in 'Top level rows aren't in sync'
error, or a crash in DEBUG builds.

With this commit Impala ignores empty data pages and still able to
filter the rest of the pages. Also, if the page index is corrupt
for some other reason, Impala correctly resets the page filtering
logic and falls back to regular scanning.

Testing:
* Added unit test for empty data pages
* Added e2e test for empty data pages
* Added e2e test for invalid page index

Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common-test.cc
M be/src/exec/parquet/parquet-common.cc
M testdata/data/README
A testdata/data/alltypes_empty_pages.parquet
A testdata/data/alltypes_invalid_pages.parquet
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
9 files changed, 196 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16503/2
--
To view, visit http://gerrit.cloudera.org:8080/16503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Gerrit-Change-Number: 16503
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16551 )

Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7365/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 06 Oct 2020 12:37:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16551 )

Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java:

http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@49
PS1, Line 49:   throw new AnalysisException("The parameter of a 
partition transform should " +
> line too long (91 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 06 Oct 2020 12:17:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16551

to look at the new patch set (#2).

Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..

IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg 
tables

This patch adds support for Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: number of buckets and width
respectively.

Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
  p1 BUCKET 10,
  p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');

Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
A fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
12 files changed, 340 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/16551/2
--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..


Patch Set 2: Code-Review+2

Thanks for this fix, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 11:28:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6517/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 11:28:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] MPALA-10075: Reuse unchanged partition instances

2020-10-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16392 )

Change subject: MPALA-10075: Reuse unchanged partition instances
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@7
PS5, Line 7: MPALA
typo: IMPALA


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@11
PS5, Line 11: redudant
typo: redundant


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@13
PS5, Line 13: changes
typo: changed


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@293
PS5, Line 293:   if (oldFd == null || oldFd.getFileLength() != 
fd.getFileLength()
 :   || oldFd.getModificationTime() != 
fd.getModificationTime()
Are you sure that we shouldn't check other properties too? Or the modification 
time will change for all other changes?

An example that comes to mind is HDFS rebalancing - my guess is that if the 
locations of some blocks change for a file, the modification time won't change, 
so we won't propagate the change, which can lead to suboptimal plans.

One more thing: I would prefer to move this logic to FileDescriptor, e.g. in a 
function like equalsTo(), or isUnchanged().


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: filesChanged_
I think that relying on filesChanged_ is counter intuitive in equalsTo. 
Changing the name to something like "equalsToOriginal" would reflect the 
semantics better, as we assume here that the partition is compared to the same 
partition before the update.

Checking the files here again is also a solution - it would waste some CPU but 
the semantics would be clear.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: oldInstance == oldInstance_
Is it possible for these two to be different?


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1545
PS5, Line 1545: oldInstance_
We are using oldInstance_ here and in the last line while using oldInstance in 
all other places. This is correct as we have checked that the two are equal, 
but I would prefer more consistency.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@933
PS5, Line 933:   // TODO: trace
Do you plan to change the logging below?


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py
File tests/metadata/test_reuse_partitions.py:

http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@49
PS5, Line 49: test_reuse_partitions
An idea to extend this test is to run the same queries for an insert_only 
transactional table. Some operations are not supported there (ALTER TABLE), but 
I think that other operations should lead to the same result. What may affect 
this is writeId - if it is changed while the partition actually doesn't change, 
it could lead to less reuse than in the the non-transactional case. It would be 
good to verify that this doesn't happen.


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@55
PS5, Line 55:   "create table %s.%s (id int) partitioned by (p int) stored 
as textfile"
:   % (unique_database, tbl_name))
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/16392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2dd645c260d271291021e52fdac4b74924df1170
Gerrit-Change-Number: 16392
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 06 Oct 2020 10:44:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

2020-10-06 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16545 )

Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg 
tables (Parquet)
..


Patch Set 1:

(3 comments)

Thanks for this new feature, some small nits

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/output-partition.h
File be/src/exec/output-partition.h:

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/output-partition.h@25
PS1, Line 25: needed
Needed?


http://gerrit.cloudera.org:8080/#/c/16545/1/common/thrift/Descriptors.thrift
File common/thrift/Descriptors.thrift:

http://gerrit.cloudera.org:8080/#/c/16545/1/common/thrift/Descriptors.thrift@54
PS1, Line 54: struct TIcebergColumnDescriptor {
:   1: required i32 fieldId
: }
Maybe some comment here? If this struct just contains fieldId, why not just use 
i32 in TColumnDescriptor?


http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@213
PS1, Line 213: String tableLoc = feIcebergTable.getLocation();
 : HadoopTables tables = IcebergUtil.getHadoopTables();
These two variables not been used in this method.



--
To view, visit http://gerrit.cloudera.org:8080/16545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Gerrit-Change-Number: 16545
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 09:45:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread wangsheng (Code Review)
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16550

to look at the new patch set (#2).

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..

IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg 
table property

We provide several new table properties in IMPALA-10164, such as
'iceberg.catalog', in order to keep consist of these properties, we
rename 'iceberg_file_format' to 'iceberg.file_format'. When we creating
Iceberg table, we should use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.file_format'='parquet',
'iceberg.catalog'='hadoop.tables')

Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
6 files changed, 22 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16550/2
--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..


Patch Set 2:

(1 comment)

Thanks for review!

http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG@9
PS1, Line 9:
> nit: in commit message body please keep the lines under 72 chars width.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 09:30:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16550 )

Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 
'iceberg.file_format' as Iceberg table property
..


Patch Set 1: Code-Review+1

(1 comment)

Only a small nit about the commit message, otherwise LGTM!

http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG@9
PS1, Line 9: g.catalog',
nit: in commit message body please keep the lines under 72 chars width.



--
To view, visit http://gerrit.cloudera.org:8080/16550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Gerrit-Change-Number: 16550
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Oct 2020 09:27:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16551 )

Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7364/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 06 Oct 2020 09:25:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16551 )

Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java:

http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@49
PS1, Line 49:   throw new AnalysisException("The parameter of a 
partition transform should be " +
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 06 Oct 2020 09:06:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables

2020-10-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16551


Change subject: IMPALA-10165: Implement Bucket and Truncate partition 
transforms for Iceberg tables
..

IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg 
tables

This patch adds support for Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: number of buckets and width
respectively.

Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
  p1 BUCKET 10,
  p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');

Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
A fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
12 files changed, 340 insertions(+), 65 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/16551/1
--
To view, visit http://gerrit.cloudera.org:8080/16551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Gerrit-Change-Number: 16551
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash

2020-10-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile 
path hash
..


Patch Set 3:

Looking at THdfsFileDesc it might be not possible to avoid memory overhead at 
the coordinator-side.

The file descriptors only store the relative path, but we need the absolute 
path to filter Iceberg data files, therefore we need to construct the paths 
from  + .

So maybe we should just stick with Murmur3?

I've spotted another issue with our FileDescriptor class. For getRelativePath() 
it uses the underlying flat buffer to return a value:

https://github.com/apache/impala/blob/d453d52aadcbd158147b906813b22eb2944ac90b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L231

Since the flat buffer doesn't contain a Java String object, whenever we issue 
fileDesc.getRelativePath() it returns a newly constructed java String each time.


--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 06 Oct 2020 08:23:46 +
Gerrit-HasComments: No