[Impala-ASF-CR] IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18261 )

Change subject: IMPALA-11135: Deflake LEFT ANTI JOIN test case in 
test_spilling.py
..

IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

TestSpillingDebugActionDimensions.test_spilling has been flaky because a
test case from IMPALA-9725 sometimes does not spill its hash join
partition. This patch lowers the buffer_pool_limit of this test from
110MB to 105MB, just slightly above its Max Per-Host Resource
Reservation (104.61MB), to ensure consistent spilling behavior.
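
To make the margins concrete, the headroom between the buffer pool limit
and the reservation can be worked out from the figures quoted above. A
small Python sketch (the variable names are illustrative only and do not
appear in the patch):

```python
# Figures quoted in the commit message, in MB.
max_per_host_reservation_mb = 104.61
old_buffer_pool_limit_mb = 110
new_buffer_pool_limit_mb = 105

# Headroom = buffer pool limit minus the max per-host reservation.
old_headroom_mb = round(old_buffer_pool_limit_mb - max_per_host_reservation_mb, 2)
new_headroom_mb = round(new_buffer_pool_limit_mb - max_per_host_reservation_mb, 2)

print(old_headroom_mb, new_headroom_mb)  # 5.39 0.39
```

The new limit leaves well under 1MB above the reservation, compared with
more than 5MB before.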

Testing:
After lowering the buffer pool limit, I looped the test 1000 times, and
every run spilled in fragment "HASH_JOIN_NODE (id=14)".
Specifically, these are the numbers of SpilledPartitions for the first
instance (ending with "000d") of the "Hash Join Builder (join_node_id=14)"
fragment across the 1000 query runs:

+--------------------+----------+
| #SpilledPartitions | #Queries |
+--------------------+----------+
|                  2 |       30 |
|                  3 |       96 |
|                  4 |      674 |
|                  5 |       52 |
|                  6 |      146 |
|                  7 |        2 |
+--------------------+----------+
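
As a sanity check on the table above, the per-bucket query counts should
add up to the 1000 looped runs, and no run should fall into a zero-spill
bucket. A minimal Python sketch with the counts copied from the table
(the variable names are illustrative, not part of the patch):

```python
# Distribution copied from the table above:
# key = number of spilled partitions, value = number of query runs.
spilled_partitions_histogram = {2: 30, 3: 96, 4: 674, 5: 52, 6: 146, 7: 2}

total_runs = sum(spilled_partitions_histogram.values())
min_spilled = min(spilled_partitions_histogram)
most_common = max(spilled_partitions_histogram,
                  key=spilled_partitions_histogram.get)

print(total_runs, min_spilled, most_common)  # 1000 2 4
```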

Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Reviewed-on: http://gerrit.cloudera.org:8080/18261
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Gerrit-Change-Number: 18261
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18261 )

Change subject: IMPALA-11135: Deflake LEFT ANTI JOIN test case in 
test_spilling.py
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Gerrit-Change-Number: 18261
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Feb 2022 07:06:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Feb 2022 06:24:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..

IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

This patch addresses the potential row count over-estimation against
HBase tables by capping the estimate at the row count from HMS when
that row count is available.
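
The capping rule described above can be sketched as follows. This is a
Python model of the idea only, not the Java code in HBaseScanNode.java;
the function name and the convention that a missing or negative HMS
value means "unknown" are assumptions of this sketch.

```python
from typing import Optional


def cap_row_estimate(estimated_rows: int, hms_row_count: Optional[int]) -> int:
    """Cap an estimated row count by the HMS row count when one is available.

    Assumption of this sketch: None or a negative value means the HMS
    row count is unknown, in which case the estimate is returned as-is.
    """
    if hms_row_count is None or hms_row_count < 0:
        return estimated_rows
    return min(estimated_rows, hms_row_count)


print(cap_row_estimate(1000, 100))   # over-estimate is capped
print(cap_row_estimate(1000, None))  # no HMS stats, estimate kept
```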

Testing:
  1. ran core test successfully

Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Reviewed-on: http://gerrit.cloudera.org:8080/18250
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
1 file changed, 6 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..

IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition 
columns

Impala crashes on a Parquet file that contains the partition columns.
Data files usually don't contain the partition columns, so Impala doesn't
expect to find such columns in the data files. Unfortunately min/max
filtering generates a SEGFAULT when the partition column is present in
the data files.

It happens when FindSkipRangesForPagesWithMinMaxFilters() tries to
retrieve the Parquet schema element for a given slot descriptor. When
the slot descriptor refers to a partition column, we usually don't find
a schema element so we don't try to skip pages.

But when the partition column is present in the data file, the code
tries to calculate the filtered pages for the column. It uses the column
reader object corresponding to the column, but this is NULL for
partition columns, hence we get a SEGFAULT.

The code shouldn't do anything at the page-level for partition columns,
as the data in such columns are the same for the whole file and it is
already filtered at a higher level.
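
The guard described above can be modelled roughly as below. This is a
Python illustration of the control flow, not the C++ change in
hdfs-parquet-scanner.cc; SlotDescriptor, its fields, and the function
name are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SlotDescriptor:
    """Hypothetical stand-in for a slot descriptor."""
    column_name: str
    is_partition_column: bool


def slots_eligible_for_page_filtering(
        slots: List[SlotDescriptor],
        column_readers: Dict[str, Optional[object]]) -> List[str]:
    """Return the columns for which page-level min/max filtering may run."""
    eligible = []
    for slot in slots:
        # Partition columns are constant per file and are already filtered
        # at a higher level, so never touch them at the page level.
        if slot.is_partition_column:
            continue
        # Defensive check: without a column reader there is nothing to filter.
        if column_readers.get(slot.column_name) is None:
            continue
        eligible.append(slot.column_name)
    return eligible
```

In this model a partition column is skipped even when the data file
happens to contain a column of the same name.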

Testing:
 * added e2e test

Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Reviewed-on: http://gerrit.cloudera.org:8080/18265
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/data/README
A testdata/data/partition_col_in_parquet.parquet
M tests/query_test/test_runtime_filters.py
4 files changed, 35 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Feb 2022 04:24:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10219/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Feb 2022 03:25:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 16:

(9 comments)

Thanks for your feedback, Qifan!

Addressed the comments and added tests for
* empty string
* large string that exceeds the limit
* DATE type

Also added some logs that are useful for debugging.

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20
PS14, Line 20: Currently they
 : are generated for the build side of a broadcast join. They will only be
 : applied on ORC
> nit. They are generated for the build side of a broadcast join.
Done


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24
PS14, Line 24: ld side exceeds a threshold (default to 1024), we se
> nit. # of distinct values of the build side exceeds a threshold (default to
Done


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39
PS14, Line 39: IN-list filters are disabled by default
> It seems a formal performance test against TPCDs can help decide the defaul
Yeah, hopefully we can turn it on if all the regressions are resolved.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380
PS14, Line 380: DCHECK(query_state_->query_options().__isset.runtime_in_list_filter_entry_limit);
  : int entry_limit = query_state_->query_options().runtime_in_list_filter_entry_limit;
  : in_list_filter = InListFilter::Create(params.in_list_filter(),
  : fs->consumed_filter->type(), entry_limit, _pool_);
  : fs->in_list_filters.push_back(in_list_filter);
  : total_in_list_filter_items_->Add(params.in_list_filter().value_size());
  : details = Substitute(" with $0 items", params.
> Duplicated code with a portion of implementation of RuntimeFilterBank::Allo
The codes are much simpler now.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444
PS14, Line 444:
> I wonder if we need this, since runtime_in_list_filter_entry_limit is defau
Yeah, I think thrift will make sure the default value is set. Removed this.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61
PS14, Line 61: str_va
> should set always_true_ to true here.
Oops, thanks for catching this!


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725
PS14, Line 725:   // Maximum number of entries in a runtime in-list filter.
> nit. missing comment.
Done


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578
PS14, Line 578:   // See comment in ImpalaService.thrift
> nit. missing comment.
Done


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127
PS13, Line 127: select STRAIGHT_JOIN count(*) from date_tbl a
> Okay.
Added the DATE tests.



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Feb 2022 03:04:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-23 Thread Quanlong Huang (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18141

to look at the new patch set (#16).

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
..

IMPALA-10898: Add runtime IN-list filters for ORC tables

ORC files have optional bloom filter indexes for each column. Since
ORC-1.7.0, the C++ reader supports pushing down predicates to skip
unrelated RowGroups. The pushed down predicates will be evaluated on
file indexes (i.e. statistics and bloom filter indexes). Note that only
EQUALS and IN-list predicates can leverage bloom filter indexes.

Currently Impala has two kinds of runtime filters: bloom filter and
min-max filter. Unfortunately they can't be converted into EQUALS or
IN-list predicates. So they can't leverage the file level bloom filter
indexes.

This patch adds runtime IN-list filters for this purpose. Currently they
are generated for the build side of a broadcast join. They will only be
applied on ORC tables and be pushed down to the ORC reader (i.e. ORC
lib). To avoid exploding the IN-list, if the number of distinct values
of the build side exceeds a threshold (defaults to 1024), we set the
filter to ALWAYS_TRUE and clear its entries. The threshold can be
configured by a new query option, RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT.
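
The threshold behaviour described above can be sketched as follows. This
is a simplified Python model, not the C++ InListFilter from the patch;
the class and method names are hypothetical, and only the "exceeds the
limit, so become ALWAYS_TRUE and drop the entries" rule is taken from
the description.

```python
class InListFilterSketch:
    """Toy model of a runtime IN-list filter with an entry limit."""

    def __init__(self, entry_limit: int = 1024):
        self.entry_limit = entry_limit
        self.values = set()
        self.always_true = False

    def insert(self, value) -> None:
        # Once the filter is ALWAYS_TRUE it no longer tracks values.
        if self.always_true:
            return
        self.values.add(value)
        # Exceeding the limit degrades the filter to ALWAYS_TRUE.
        if len(self.values) > self.entry_limit:
            self.always_true = True
            self.values.clear()

    def may_contain(self, value) -> bool:
        # An ALWAYS_TRUE filter cannot reject any row.
        return self.always_true or value in self.values


f = InListFilterSketch(entry_limit=3)
for v in (1, 2, 3):
    f.insert(v)
print(f.always_true, f.may_contain(9))  # False False
f.insert(4)
print(f.always_true, f.may_contain(9))  # True True
```

Degrading to ALWAYS_TRUE keeps the filter correct (it never drops a
matching row) while bounding the size of the pushed-down IN-list.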

Evaluating runtime IN-list filters is much slower than evaluating
runtime bloom filters due to the current simple implementation (i.e.
std::unordered_set) and the lack of codegen. So we disable it at row
level.

For visibility, this patch adds two counters in the HdfsScanNode:
 - NumPushedDownPredicates
 - NumPushedDownRuntimeFilters
They reflect the predicates and runtime filters that are pushed down to
the ORC reader.

Currently, runtime IN-list filters are disabled by default. This patch
extends the query option, ENABLED_RUNTIME_FILTER_TYPES, to support a
comma-separated list of filter types. It defaults to "BLOOM,MIN_MAX".
Add "IN_LIST" to it to enable runtime IN-list filters.
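
To illustrate the comma-separated option value, here is a hedged Python
sketch of how such a list could be parsed. It is not Impala's option
parser: the function name, the case-insensitive matching, and the error
handling are assumptions, and only the three type names mentioned in
this message are modelled.

```python
# Filter type names mentioned in this commit message.
KNOWN_FILTER_TYPES = {"BLOOM", "MIN_MAX", "IN_LIST"}


def parse_enabled_filter_types(option_value: str = "BLOOM,MIN_MAX") -> set:
    """Parse a comma-separated list of runtime filter types into a set."""
    types = {t.strip().upper() for t in option_value.split(",") if t.strip()}
    unknown = types - KNOWN_FILTER_TYPES
    if unknown:
        raise ValueError("unknown filter types: " + ", ".join(sorted(unknown)))
    return types


print(sorted(parse_enabled_filter_types()))
print("IN_LIST" in parse_enabled_filter_types("BLOOM,MIN_MAX,IN_LIST"))
```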

Ran perf tests on a 3-instance cluster on my desktop using TPC-DS with
scale factor 20. They show significant improvements in some queries:

+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload  | Query       | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(20) | TPCDS-Q67A  | orc / snap / block | 35.07  | 44.01       | I -20.32%  |   0.38%    |   1.38%        | 10    | I -25.69%      | -3.58   | -45.33 |
| TPCDS(20) | TPCDS-Q37   | orc / snap / block | 1.08   | 1.45        | I -25.23%  |   7.14%    |   3.09%        | 10    | I -34.09%      | -3.58   | -12.94 |
| TPCDS(20) | TPCDS-Q70A  | orc / snap / block | 6.30   | 8.60        | I -26.81%  |   5.24%    |   4.21%        | 10    | I -36.67%      | -3.58   | -14.88 |
| TPCDS(20) | TPCDS-Q16   | orc / snap / block | 1.33   | 1.85        | I -28.28%  |   4.98%    |   5.92%        | 10    | I -39.38%      | -3.58   | -12.93 |
| TPCDS(20) | TPCDS-Q18A  | orc / snap / block | 5.70   | 8.06        | I -29.25%  |   3.00%    |   4.12%        | 10    | I -40.30%      | -3.58   | -19.95 |
| TPCDS(20) | TPCDS-Q22A  | orc / snap / block | 2.01   | 2.97        | I -32.21%  |   6.12%    |   5.94%        | 10    | I -47.68%      | -3.58   | -14.05 |
| TPCDS(20) | TPCDS-Q77A  | orc / snap / block | 8.49   | 12.44       | I -31.75%  |   6.44%    |   3.96%        | 10    | I -49.71%      | -3.58   | -16.97 |
| TPCDS(20) | TPCDS-Q75   | orc / snap / block | 7.76   | 12.27       | I -36.76%  |   5.01%    |   3.87%        | 10    | I -59.56%      | -3.58   | -23.26 |
| TPCDS(20) | TPCDS-Q21   | orc / snap / block | 0.71   | 1.27        | I -44.26%  |   4.56%    |   4.24%        | 10    | I -77.31%      | -3.58   | -28.31 |
| TPCDS(20) | TPCDS-Q80A  | orc / snap / block | 9.24   | 20.42       | I -54.77%  |   4.03%    |   3.82%        | 10    | I -123.12%     | -3.58   | -40.90 |
| TPCDS(20) | TPCDS-Q39-1 | orc / snap / block | 1.07   | 2.26        | I -52.74%  | * 23.83% * |   2.60%        | 10    | I -149.68%     | -3.58   | -14.43 |
| TPCDS(20) | TPCDS-Q39-2 | orc / snap / block | 1.00   | 2.33        | I -56.95%  | * 19.53% * |   2.07%        | 10    | I -151.89%     | -3.58   | -20.81 |
+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
"Base Avg" is the avg of the original time. "Avg" is the current time.

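As a reading aid for the Delta(Avg) column, the sketch below assumes it
is the relative change of "Avg" against "Base Avg". That formula is an
assumption, not something stated in this message, and the published
values appear to come from unrounded timings, so recomputing from the
rounded table figures only matches approximately.

```python
def delta_avg_percent(avg_s: float, base_avg_s: float) -> float:
    """Relative change of the current average against the base, in percent."""
    return (avg_s - base_avg_s) / base_avg_s * 100.0


# TPCDS-Q67A from the table: Avg(s) = 35.07, Base Avg(s) = 44.01.
print(round(delta_avg_percent(35.07, 44.01), 2))
```

A negative value means the patched build ran faster than the base.
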
However, we also see some regressions due to the suboptimal
implementation. The follow-up JIRAs will focus 

[Impala-ASF-CR] IMPALA-10049: Include RPC call_id in slow RPC logs

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7871/


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Feb 2022 01:18:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11133 (Addendum): Encode a string in utf8 before printing it

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18270 )

Change subject: IMPALA-11133 (Addendum): Encode a string in utf8 before 
printing it
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10218/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
Gerrit-Change-Number: 18270
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:48:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11008: fix incorrect to propagate inferred predicates

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18234 )

Change subject: IMPALA-11008: fix incorrect to propagate inferred predicates
..


Patch Set 7: Code-Review+1

> Patch Set 7: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7868/

The failure is due to IMPALA-11150.


--
To view, visit http://gerrit.cloudera.org:8080/18234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e64230f6d0c2b9ef1560186ceba349a5920ccdf
Gerrit-Change-Number: 18234
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:30:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11133 (Addendum): Encode a string in utf8 before printing it

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18270 )

Change subject: IMPALA-11133 (Addendum): Encode a string in utf8 before 
printing it
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
Gerrit-Change-Number: 18270
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:30:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11133 (Addendum): Encode a string in utf8 before printing it

2022-02-23 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18270


Change subject: IMPALA-11133 (Addendum): Encode a string in utf8 before 
printing it
..

IMPALA-11133 (Addendum): Encode a string in utf8 before printing it

In the first part of this patch, we decoded a string with 'utf8' in
order to print it (on the command line) since the author field of a
commit could contain non-ASCII characters.

However, we did not take into consideration that in some scenarios,
we would like to redirect the output to another file. If this is the
case, then we may encounter a UnicodeEncodeError due to
sys.stdout.encoding being None. To resolve the issue, we encode the
formatted string in 'utf8'.
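
The failure mode and the fix can be sketched as below. The sketch uses
Python 3 syntax, so it models the encoding step rather than reproducing
the behaviour of the original script; the function name, the output
format, and the sample author are hypothetical.

```python
def format_commit_line(commit_hash: str, msg: str, date: str, author: str) -> bytes:
    """Format one output line and encode it explicitly as UTF-8 bytes."""
    line = "{0} {1} ({2}) - {3}".format(commit_hash, msg, date, author)
    return line.encode("utf8")


author = "J\u00fcrgen"  # hypothetical author name containing U+00FC

# An ASCII codec cannot represent U+00FC, which mirrors the reported error.
try:
    author.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

encoded = format_commit_line("abc123", "Fix a bug", "2022-02-23", author)
print(ascii_ok, encoded)
```

Because the bytes are produced explicitly, the result no longer depends
on the encoding of the output stream; in Python 3 they could be written
with sys.stdout.buffer.write(encoded + b"\n").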

Testing:
 - Manually verified that we won't get a UnicodeEncodeError if we
   redirect the output to another file.

Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
---
M bin/compare_branches.py
1 file changed, 2 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/18270/1
--
To view, visit http://gerrit.cloudera.org:8080/18270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
Gerrit-Change-Number: 18270
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18261 )

Change subject: IMPALA-11135: Deflake LEFT ANTI JOIN test case in 
test_spilling.py
..


Patch Set 1: Code-Review+2

Thanks for spending time on this!


--
To view, visit http://gerrit.cloudera.org:8080/18261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Gerrit-Change-Number: 18261
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:21:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18261 )

Change subject: IMPALA-11135: Deflake LEFT ANTI JOIN test case in 
test_spilling.py
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7874/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Gerrit-Change-Number: 18261
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:21:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11135: Deflake LEFT ANTI JOIN test case in test_spilling.py

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18261 )

Change subject: IMPALA-11135: Deflake LEFT ANTI JOIN test case in 
test_spilling.py
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idad9fc6ec6a0ba7fc70e0701e567da7165e40e83
Gerrit-Change-Number: 18261
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Feb 2022 00:21:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11133: Decode author of a commit with utf8 before printing it

2022-02-23 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18256 )

Change subject: IMPALA-11133: Decode author of a commit with utf8 before 
printing it
..


Patch Set 6:

Hi Quanlong and Laszlo, I just realized the issue is a bit more complicated 
than expected.

Running "$IMPALA_HOME/bin/compare_branches.py --source_remote_name="" 
--source_branch apache-ref-master --target_remote_name="" --target_branch 
cdw-master-staging" on the command line is okay.

However, if we try to redirect the result to another file like 
"$IMPALA_HOME/bin/compare_branches.py --source_remote_name="" --source_branch 
apache-ref-master --target_remote_name="" --target_branch cdw-master-staging > 
out.txt", we will see the following error.

Traceback (most recent call last):
  File "/home/fangyurao/Impala_for_FE/bin/compare_branches.py", line 290, in <module>
    main()
  File "/home/fangyurao/Impala_for_FE/bin/compare_branches.py", line 271, in main
    .format(commit_hash, msg.decode('utf8'), date, author.decode('utf8'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 156: ordinal not in range(128)

I have verified that appending ".encode()" to ".format()" could fix the 
problem. I will thus push a follow-up patch.


--
To view, visit http://gerrit.cloudera.org:8080/18256
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb03b0937a994db2bf08e4199574d04f7fb99f5d
Gerrit-Change-Number: 18256
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 23:56:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7873/ 
DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 23:44:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 23:44:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 23:44:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18268 )

Change subject: IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10217/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2ac954b0494b7413ce0ec405718fcc354dba9e0
Gerrit-Change-Number: 18268
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 23 Feb 2022 23:12:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..

IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

These tests became flaky after IMPALA-10961 as it led to smaller and
varying size for the table. This is a short term solution to make
builds green as fixing the tests properly may take some time.

Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Reviewed-on: http://gerrit.cloudera.org:8080/18267
Reviewed-by: Fang-Yu Rao 
Reviewed-by: Csaba Ringhofer 
Tested-by: Csaba Ringhofer 
---
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
1 file changed, 0 insertions(+), 263 deletions(-)

Approvals:
  Fang-Yu Rao: Looks good to me, but someone else must approve
  Csaba Ringhofer: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1:

Merging this without test run to unblock tests quickly


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 22:54:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 22:54:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 22:53:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load

2022-02-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18268


Change subject: IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load
..

IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load

This patch addresses the flakiness by allowing the RUNNING state to be a
legitimate exec state returned from execute_query_async_using_client() in
Python. This call submits the load query to the Impala backend.

The corresponding Impala backend code for the beeswax protocol is
ImpalaServer::query(), which uses a wait thread executing
ClientRequestState::Wait() to set the exec state from RUNNING to
FINISHED. Sometimes, when this wait thread does not run fast enough to
do so, the returned state can be RUNNING.

The fix is purely a modification to the test itself.
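
The test-side idea described above can be sketched roughly as follows. This is
a minimal, self-contained illustration and not the actual patch to
tests/metadata/test_load.py: the ExecState enum, ACCEPTED_AFTER_SUBMIT and
check_async_submit_state() are hypothetical names introduced only for this
sketch.

```python
from enum import Enum


class ExecState(Enum):
    """Hypothetical stand-in for the exec states a test might observe."""
    PENDING = 1
    RUNNING = 2
    FINISHED = 3
    ERROR = 4


# States treated as a legitimate result right after an async submission.
# RUNNING is accepted because the backend wait thread may not yet have
# advanced the state by the time the submit call returns.
ACCEPTED_AFTER_SUBMIT = frozenset({ExecState.RUNNING, ExecState.FINISHED})


def check_async_submit_state(state):
    """Return True if `state` is acceptable right after an async submit."""
    return state in ACCEPTED_AFTER_SUBMIT
```

Checking membership in a set of accepted states, rather than comparing against
a single expected state, keeps the assertion independent of how quickly the
wait thread runs.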

Testing:
  1. ran core test successfully

Change-Id: Ic2ac954b0494b7413ce0ec405718fcc354dba9e0
---
M tests/metadata/test_load.py
1 file changed, 11 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/18268/1
--
To view, visit http://gerrit.cloudera.org:8080/18268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic2ac954b0494b7413ce0ec405718fcc354dba9e0
Gerrit-Change-Number: 18268
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7872/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 21:48:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 21:48:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7870/


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 21:41:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 20:57:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1:

Thanks very much for the help Csaba! I do not have any additional comment on 
this patch.


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 20:56:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18267 )

Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10216/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 23 Feb 2022 20:41:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11008: fix incorrect to propagate inferred predicates

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18234 )

Change subject: IMPALA-11008: fix incorrect to propagate inferred predicates
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7868/


--
To view, visit http://gerrit.cloudera.org:8080/18234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e64230f6d0c2b9ef1560186ceba349a5920ccdf
Gerrit-Change-Number: 18234
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Wed, 23 Feb 2022 20:24:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18267


Change subject: IMPALA-11150: Remove resource-requirements tests on 
functional_parquet.alltypes
..

IMPALA-11150: Remove resource-requirements tests on functional_parquet.alltypes

These tests became flaky after IMPALA-10961 as it led to smaller and
varying size for the table. This is a short term solution to make
builds green as fixing the tests properly may take some time.

Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
---
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
1 file changed, 0 insertions(+), 263 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/18267/1
--
To view, visit http://gerrit.cloudera.org:8080/18267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5bf0f963d3e053345aec27e834974eeead4190ac
Gerrit-Change-Number: 18267
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 20:09:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..

IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

Since HIVE-22589, Hive still uses Julian Calendar for writing dates
before 1582-10-15, whereas Impala uses proleptic Gregorian Calendar.
This affects the results Impala gets when querying tables written by
Hive. Currently, the Avro and ORC formats of date_tbl suffer from this
issue.

This patch enables proleptic Gregorian Calendar for Hive by default.
It also reverts the two commits of IMPALA-9555, which modified the tests
to accommodate the inconsistent results.
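
To illustrate why the two calendars give different results for old dates, the
sketch below converts a date expressed in the Julian calendar to the
equivalent proleptic Gregorian date. It is only an illustration of the
calendar arithmetic, not code from Hive or Impala; the function names are
made up for this example. It relies on Python's datetime.date, which uses the
proleptic Gregorian calendar.

```python
from datetime import date


def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year, month, day):
    """Same physical day, expressed in the proleptic Gregorian calendar."""
    # date.toordinal() is 1 for 0001-01-01, whose Julian Day Number is 1721426.
    return date.fromordinal(julian_calendar_to_jdn(year, month, day) - 1721425)


# Julian 1582-10-05 is the same day as Gregorian 1582-10-15.
print(julian_to_proleptic_gregorian(1582, 10, 5))
# In January 1400 the two calendars were 8 days apart.
print(julian_to_proleptic_gregorian(1400, 1, 1))
```

A writer and a reader that disagree on the calendar will therefore see dates
before 1582-10-15 shifted by several days, with the exact offset depending on
the century.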

Tests:
 - Ran CORE tests

Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Reviewed-on: http://gerrit.cloudera.org:8080/18262
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/test/resources/hive-site.xml.py
M testdata/workloads/functional-query/queries/QueryTest/avro_date.test
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
M tests/query_test/test_date_queries.py
4 files changed, 36 insertions(+), 38 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-23 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 21:

(8 comments)

> (12 comments)
 >
 > > > > (8 comments)
 > > > >
 > > > > Thank you for taking this on. There is a lot of history here.
 > > > > Originally, the file handle cache used a generic structure like
 > > > > this:
 > > > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h
 > > > >
 > > > > In my rewrite, I switched it to remove the generic structure. This
 > > > > heads back in the other direction.
 > > > >
 > > > > I like that you have backend tests for the generic data structure,
 > > > > which is definitely one advantage of that approach. One question I
 > > > > have about moving back to a generic structure is whether we would
 > > > > be able to add new customization to the file handle cache case. I
 > > > > had been thinking about adding a file structure that could contain
 > > > > additional per-file data and/or stats. Is that possible with the
 > > > > new generic structure?
 > > >
 > > > "had been thinking about adding a file structure that could contain
 > > > additional per-file data and/or stat"
 > > >
 > > > I was thinking about similar things (e.g. caching processed
 > > > Parquet/ORC headers), but this seems a somewhat different feature
 > > > to me - while we want to cache more than one file handle per file
 > > > and apply LRU logic per handle, we want to cache data for a file
 > > > only once and apply LRU logic per file.
 > >
 > > Yeah, it's unclear whether we would ever want to extend the file
 > > handle cache to deal with other things. Separate data structures may
 > > be cleaner even if it means duplicating filename strings or other
 > > things. The file handle cache is pretty unusual in structure and
 > > historically we haven't extended it. I don't have any strong
 > > objection to a generic structure. I just wanted to think through
 > > whether there are any extensions that would end up getting more
 > > complicated.
 > >
 > > For more ordinary caches that don't need duplication, we should be
 > > using the cache implementations in be/src/util/cache, because that
 > > also gets us different cache eviction policies.
 >
 > Moving the caching to a separate class definitely helps with testing.
 >
 > I think making it generic helps with encapsulation too, as the
 > caching algorithm (even if it is somewhat specialized to a use
 > case) has nothing to do with the stored data. Moreover, making it
 > generic helped with unit testing too, as I didn't have to juggle
 > around file handles to test out the caching feature.
 >
 > As I understand, the requirements were simple enough to fit it in a
 > generic structure. This would help in the future to decide - if we
 > ever want to use something similar to this - whether it's easier
 > to extend with some configurability or not. We can specialize this
 > more towards file handle caching, although the arguments for
 > templated key/value are still applicable in that case, maybe we can
 > change the name and place of the code for something less generic.
 >
 > Regarding per-file data/stats, we could squeeze it in (e.g.
 > unordered_map> but I don't think
 > that is a good direction, it breaks the encapsulation of the cache.
 > I would rather put a container in FileHandleCache class next to the
 > cache and do handle based operations/metrics during creation
 > (GetFileHandle() new entry branch) and do access based
 > operation/metrics in FileHandleCache::Accessor

This generic design is good, and we don't need to overthink this.
My feeling is that the odds of us reusing the generic structure for
something else are pretty low. (We'll find out! Maybe I'm wrong.)
If you believe that, then I think an intermediate point is a
non-generic structure that allows mocking the FileHandle. In other
words, it allows for writing backend tests without creating real
file handles, but it also isn't dealing with arbitrary types.
Either way, I think if we need to extend this, then we will find
a way to make it work. I'm not concerned.
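
For readers following the discussion, the kind of structure being debated (a
cache that can hold several values per key, applies LRU logic per value, and
keeps in-use values out of reach of other callers) can be sketched as below.
This is a toy illustration written for this thread, not the code in
be/src/util/lru-multi-cache.h; the class name, method names and eviction
details are all assumptions of the sketch. Plain strings stand in for file
handles, which mirrors the point about testing without real handles.

```python
from collections import OrderedDict
from itertools import count


class LruMultiCacheSketch:
    """Toy cache holding several values per key, with LRU eviction per value."""

    def __init__(self, capacity):
        self._capacity = capacity
        # Values not currently in use, oldest first: entry_id -> (key, value).
        self._available = OrderedDict()
        # Values handed out to callers: entry_id -> (key, value).
        self._in_use = {}
        self._ids = count()

    def acquire(self, key, factory):
        """Reuse the most recently released value for `key`, or create one."""
        match = next((eid for eid in reversed(self._available)
                      if self._available[eid][0] == key), None)
        if match is None:
            match = next(self._ids)
            entry = (key, factory(key))
        else:
            entry = self._available.pop(match)
        self._in_use[match] = entry
        return match, entry[1]

    def release(self, entry_id):
        """Make a value available again; evict the oldest unused ones."""
        self._available[entry_id] = self._in_use.pop(entry_id)
        evicted = []
        while (len(self._available) + len(self._in_use) > self._capacity
               and self._available):
            _, entry = self._available.popitem(last=False)
            evicted.append(entry)
        return evicted
```

Because the values are opaque to the cache, a test can pass any factory it
likes, for example one returning strings or mock objects, and still exercise
reuse and eviction.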

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167:   // Opening a file handle requires talking to the NameNode so it can take some time.
              :   RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Yes, it will be in_use, does not lock the cache and not available for other
Ok, great!


http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h
File be/src/util/lru-multi-cache.h:


[Impala-ASF-CR] IMPALA-10049: Include RPC call_id in slow RPC logs

2022-02-23 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 3:

Hit by flaky test_async_load (IMPALA-10999)


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 23 Feb 2022 17:57:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call_id in slow RPC logs

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7871/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 23 Feb 2022 17:58:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call_id in slow RPC logs

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 23 Feb 2022 17:58:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-23 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 14:

(10 comments)

Great. Thanks a lot for the rework.

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20
PS14, Line 20: Currently they
 : are generated only for small build side (e.g. #rows <= 1024) of a
 : broadcast join.
nit. They are generated for the build side of a broadcast join.

Suggest not to mention small build side until a few sentences later.


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24
PS14, Line 24: #rows of the build side exceeds the threshold (1024)
nit. # of distinct values of the build side exceeds a threshold (default to 
1024).


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39
PS14, Line 39: IN-list filters are disabled by default
It seems a formal performance test against TPC-DS can help decide the default 
setting. Maybe we do this once IMPALA-11140, IMPALA-11141 and IMPALA-11142 are 
resolved.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380
PS14, Line 380:     uint32_t entry_limit = InListFilter::DEFAULT_ENTRY_LIMIT;
              :     if (query_state_->query_options().__isset.runtime_in_list_filter_entry_limit) {
              :       entry_limit = query_state_->query_options().runtime_in_list_filter_entry_limit;
              :     }
              :     in_list_filter = InListFilter::Create(params.in_list_filter(),
              :         fs->consumed_filter->type(), entry_limit, _pool_);
              :     fs->in_list_filters.push_back(in_list_filter);
This duplicates part of the implementation of 
RuntimeFilterBank::AllocateScratchInListFilter().
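
One way to picture the de-duplication suggested here is to move the "use the
query option if it is set, otherwise the default" logic into a single helper
that both call sites share. The sketch below shows the shape of that idea in
Python only; resolve_entry_limit() and the dict-based query options are
hypothetical and do not correspond to any function in the patch.

```python
# Assumed default, matching the value of 1024 mentioned in this review.
DEFAULT_ENTRY_LIMIT = 1024


def resolve_entry_limit(query_options):
    """Return the configured IN-list entry limit, or the default if unset."""
    limit = query_options.get("runtime_in_list_filter_entry_limit")
    return DEFAULT_ENTRY_LIMIT if limit is None else limit
```

Both the code that creates a filter from received parameters and the code
that allocates a scratch filter could then call the same helper instead of
repeating the check.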


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444
PS14, Line 444: :DEFAULT_ENTRY_LIMIT;
I wonder if we need this, since runtime_in_list_filter_entry_limit defaults 
to 1024 and covers the non-negative domain.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55
PS13, Line 55: if (UNLIKELY(s->ptr == nullptr)) {
 : contains_null_ = true
> The default constructor of StringValue creates a null 'ptr'. I think we'd b
Done


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61
PS14, Line 61: return
should set always_true_ to true here.
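
As a rough picture of the behaviour discussed in these comments, the sketch
below models an IN-list filter as a bounded set of values plus two flags. It
is a simplified illustration, not the code in in-list-filter-ir.cc: the class
and method names are invented, and the choices to fall back to "always true"
once the entry limit is exceeded and to answer NULL probes from the
contains_null flag are assumptions of this sketch.

```python
class InListFilterSketch:
    """Toy IN-list filter: a bounded set of values plus two flags."""

    def __init__(self, entry_limit=1024):
        self.entry_limit = entry_limit
        self.values = set()
        self.contains_null = False
        self.always_true = False

    def insert(self, value):
        """Record a build-side value."""
        if self.always_true:
            return
        if value is None:
            self.contains_null = True
            return
        self.values.add(value)
        if len(self.values) > self.entry_limit:
            # Too many distinct values: stop filtering instead of growing.
            self.always_true = True
            self.values.clear()

    def find(self, value):
        """Return True if a probe-side row with `value` may have a match."""
        if self.always_true:
            return True
        if value is None:
            # Assumption of this sketch: NULL probes consult contains_null.
            return self.contains_null
        return value in self.values
```

The important property is that a filter which has given up (always_true) must
never reject a row, which is why the early return on overflow needs to set the
flag before returning.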


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725
PS14, Line 725:   RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT = 142;
nit. missing comment.


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578
PS14, Line 578:   143: optional i32 runtime_in_list_filter_entry_limit = 1024;
nit. missing comment.


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127
PS13, Line 127:
> Sure. The ORC date_tbl is corrupted and need to wait for https://gerrit.clo
Okay.



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 14
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 16:10:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10215/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:29:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10214/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:24:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10213/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:08:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7870/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:02:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 3: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:01:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18265/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/18265/2/tests/query_test/test_runtime_filters.py@328
PS2, Line 328: =
flake8: E225 missing whitespace around operator



--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 15:01:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18265

to look at the new patch set (#3).

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..

IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition 
columns

Impala crashes on a Parquet file that contains the partition columns.
Data files usually don't contain the partition columns, so Impala doesn't
expect to find such columns in the data files. Unfortunately min/max
filtering generates a SEGFAULT when the partition column is present in
the data files.

It happens when FindSkipRangesForPagesWithMinMaxFilters() tries to
retrieve the Parquet schema element for a given slot descriptor. When
the slot descriptor refers to a partition column, we usually don't find
a schema element so we don't try to skip pages.

But when the partition column is present in the data file, the code
tries to calculate the filtered pages for the column. It uses the column
reader object corresponding to the column, but this is NULL for
partition columns, hence we get a SEGFAULT.

The code shouldn't do anything at the page-level for partition columns,
as the data in such columns are the same for the whole file and it is
already filtered at a higher level.
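
A minimal sketch of the guard described above, written in Python rather than
the C++ of hdfs-parquet-scanner.cc. Every name in it (SlotDesc,
is_partition_col, columns_to_page_filter, the column_readers dict) is
hypothetical. It only illustrates the idea of doing no page-level work for
partition columns and is not the actual patch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SlotDesc:
    """Hypothetical stand-in for a slot descriptor."""
    name: str
    is_partition_col: bool


def columns_to_page_filter(
    slots: List[SlotDesc],
    file_schema: Dict[str, str],
    column_readers: Dict[str, Optional[object]],
) -> List[str]:
    """Return the columns for which page-level min/max filtering is attempted."""
    selected = []
    for slot in slots:
        # Partition columns hold one value per file and are already filtered
        # at a higher level, so do no page-level work for them even if the
        # data file happens to contain the column.
        if slot.is_partition_col:
            continue
        # No schema element in the file: there are no pages to skip.
        if slot.name not in file_schema:
            continue
        # Defensive check: never dereference a missing column reader.
        if column_readers.get(slot.name) is None:
            continue
        selected.append(slot.name)
    return selected
```

With a file whose schema unexpectedly contains the partition column "year",
only the regular column "id" would be selected for page filtering.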

Testing:
 * added e2e test

Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/data/README
A testdata/data/partition_col_in_parquet.parquet
M tests/query_test/test_runtime_filters.py
4 files changed, 35 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/18265/3
--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18265

to look at the new patch set (#2).

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..

IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition 
columns

Impala crashes on a Parquet file that contains the partition columns.
Data files usually don't contain the partition columns, so Impala doesn't
expect to find such columns in the data files. Unfortunately min/max
filtering generates a SEGFAULT when the partition column is present in
the data files.

It happens when FindSkipRangesForPagesWithMinMaxFilters() tries to
retrieve the Parquet schema element for a given slot descriptor. When
the slot descriptor refers to a partition column, we usually don't find
a schema element so we don't try to skip pages.

But when the partition column is present in the data file, the code
tries to calculate the filtered pages for the column. It uses the column
reader object corresponding to the column, but this is NULL for
partition columns, hence we get a SEGFAULT.

The code shouldn't do anything at the page-level for partition columns,
as the data in such columns are the same for the whole file and it is
already filtered at a higher level.

Testing:
 * added e2e test

Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/data/README
A testdata/data/partition_col_in_parquet.parquet
M tests/query_test/test_runtime_filters.py
4 files changed, 35 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/18265/2
--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 1:

Thanks for the review!


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 14:57:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7869/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 23 Feb 2022 14:57:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition columns

2022-02-23 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18265 )

Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
..


Patch Set 1: Code-Review+2

Thanks a lot Zoltan for fixing this bug!


--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 23 Feb 2022 14:50:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..


Patch Set 5:

(1 comment)

Thanks Quanlong for the quick review!

http://gerrit.cloudera.org:8080/#/c/18250/4/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18250/4/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@322
PS4, Line 322: rowsFromHms =
> nit: rowsFromHms. We use CamelCase in FE codes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 14:44:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

2022-02-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18250 )

Change subject: IMPALA-11132 Front-end test 
PlannerTest.testResourceRequirements can fail
..

IMPALA-11132 Front-end test PlannerTest.testResourceRequirements can fail

This patch addresses the potential row count over-estimation against
HBase tables by capping the estimation by the row count when available
from HMS.
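
The capping rule can be sketched as follows, in Python rather than the Java of
HBaseScanNode.java. The function name and the convention that a negative
rows_from_hms means "not available" are assumptions made for illustration;
only the name rowsFromHms comes from the review.

```python
def cap_row_estimate(estimated_rows: int, rows_from_hms: int) -> int:
    """Cap a row-count estimate by the HMS row count when one is known.

    Assumption: a negative rows_from_hms means HMS has no row count.
    """
    if rows_from_hms >= 0:
        # HMS knows the row count, so the estimate must not exceed it.
        return min(estimated_rows, rows_from_hms)
    # No HMS statistics: keep the estimate unchanged.
    return estimated_rows
```

An over-estimate of 10000 rows against an HMS count of 50 is reduced to 50,
while an estimate below the HMS count is left alone.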

Testing:
  1. ran core test successfully

Change-Id: I11c51f76212e1337a7e726097931890c2edab182
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/18250/5
--
To view, visit http://gerrit.cloudera.org:8080/18250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11c51f76212e1337a7e726097931890c2edab182
Gerrit-Change-Number: 18250
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11112: Impala can't resolve json tables created by Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18263 )

Change subject: IMPALA-11112: Impala can't resolve json tables created by Hive
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7866/


--
To view, visit http://gerrit.cloudera.org:8080/18263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cac55b58dca88d900db3256ceaa25c17d7864d5
Gerrit-Change-Number: 18263
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 14:31:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11008: fix incorrect to propagate inferred predicates

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18234 )

Change subject: IMPALA-11008: fix incorrect to propagate inferred predicates
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7868/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e64230f6d0c2b9ef1560186ceba349a5920ccdf
Gerrit-Change-Number: 18234
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Wed, 23 Feb 2022 13:44:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 13:40:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..


Patch Set 1:

Thanks, Csaba!


--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 13:39:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7867/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 13:40:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11008: fix incorrect to propagate inferred predicates

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18234 )

Change subject: IMPALA-11008: fix incorrect to propagate inferred predicates
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7865/


--
To view, visit http://gerrit.cloudera.org:8080/18234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e64230f6d0c2b9ef1560186ceba349a5920ccdf
Gerrit-Change-Number: 18234
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Wed, 23 Feb 2022 12:52:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10737: Optimize the number of Iceberg API Metadata requests

2022-02-23 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18226 )

Change subject: IMPALA-10737: Optimize the number of Iceberg API Metadata 
requests
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18226/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/18226/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1036
PS1, Line 1036: TGetPartialCatalogObjectResponse resp = sendRequest(req);
> Currently we always send the request for every query. Can we add caching? S
Updated this part with loadWithCaching().
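
The general load-through-cache pattern the reviewer asked for can be sketched
as below. This is Python and does not mirror the signature of the Java
loadWithCaching() method. The dict-based cache, the key shape and the loader
callback are all assumptions, and the sketch ignores invalidation entirely.

```python
from typing import Callable, Dict, Hashable, TypeVar

V = TypeVar("V")


def load_with_caching(cache: Dict[Hashable, V], key: Hashable,
                      loader: Callable[[], V]) -> V:
    """Return the cached value for key, calling loader only on a miss."""
    if key not in cache:
        # Miss: issue the (expensive) request once and remember the result.
        cache[key] = loader()
    return cache[key]
```

Calling it twice with the same key sends the request only once; a real cache
would also need a policy for dropping stale entries.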



--
To view, visit http://gerrit.cloudera.org:8080/18226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e62a1fb9753ea1b022c7763047d9ccfd1d27d62
Gerrit-Change-Number: 18226
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Feb 2022 12:18:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11137: Enable proleptic Gregorian Calendar for Hive

2022-02-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18262 )

Change subject: IMPALA-11137: Enable proleptic Gregorian Calendar for Hive
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6be9c9720dd352d6821cdaa6c64d35ba20473bc0
Gerrit-Change-Number: 18262
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 10:43:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11008: fix incorrect to propagate inferred predicates

2022-02-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18234 )

Change subject: IMPALA-11008: fix incorrect to propagate inferred predicates
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e64230f6d0c2b9ef1560186ceba349a5920ccdf
Gerrit-Change-Number: 18234
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Wed, 23 Feb 2022 10:38:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-23 Thread Gergely Fürnstáhl (Code Review)
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 21:

(12 comments)

 > > > (8 comments)
 > > >
 > > > Thank you for taking this on. There is a lot of history here.
 > > > Originally, the file handle cache used a generic structure like
 > > > this:
 > > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h
 > > >
 > > > In my rewrite, I switched it to remove the generic structure.
 > > > This heads back in the other direction.
 > > >
 > > > I like that you have backend tests for the generic data
 > > > structure, which is definitely one advantage of that approach.
 > > > One question I have about moving back to a generic structure is
 > > > whether we would be able to add new customization to the file
 > > > handle cache case. I had been thinking about adding a file
 > > > structure that could contain additional per-file data and/or
 > > > stats. Is that possible with the new generic structure?
 > >
 > > "had been thinking about adding a file structure that could
 > > contain additional per-file data and/or stat"
 > >
 > > I was thinking about similar things (e.g. caching processed
 > > Parquet/ORC headers), but this seems a somewhat different feature
 > > to me - while we want to cache more than one file handle per file
 > > and apply LRU logic per handle, we want to cache data for a file
 > > only once and apply LRU logic per file.
 >
 > Yeah, it's unclear whether we would ever want to extend the file
 > handle cache to deal with other things. Separate data structures
 > may be cleaner even if it means duplicating filename strings or
 > other things. The file handle cache is pretty unusual in structure
 > and historically we haven't extended it. I don't have any strong
 > objection to a generic structure. I just wanted to think through
 > whether there are any extensions that would end up getting more
 > complicated.
 >
 > For more ordinary caches that don't need duplication, we should be
 > using the cache implementations in be/src/util/cache, because that
 > also gets us different cache eviction policies.

Moving the caching to a separate class definitely helps with testing.

I think making it generic helps with encapsulation too, as the caching 
algorithm (even if it is somewhat specialized to a use case) has nothing to do 
with the stored data. Moreover, making it generic helped with unit testing too, 
as I didn't have to juggle around file handles to test out the caching feature.

As I understand it, the requirements were simple enough to fit in a generic 
structure. If we ever want to use something similar, this would help us decide 
whether it is easier to extend it with some configurability or not. We can 
specialize this more towards file handle caching, although the arguments for a 
templated key/value still apply in that case; maybe we can change the name and 
location of the code to something less generic.

Regarding per-file data/stats, we could squeeze it in (e.g. 
unordered_map>), but I don't think that is a good direction, as it breaks the 
encapsulation of the cache. I would rather put a container in the 
FileHandleCache class next to the cache, do handle-based operations/metrics 
during creation (the new-entry branch of GetFileHandle()), and do access-based 
operations/metrics in FileHandleCache::Accessor.
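
To make the layering concrete, here is a rough Python sketch (not the C++ in
handle-cache.h). LruMultiCache, its take()/release() methods, the stats dict
and the open_file callback are hypothetical names chosen for illustration.
The point is only that the generic cache knows nothing about files, while the
per-file bookkeeping lives next to it in the wrapper.

```python
from collections import OrderedDict, defaultdict
from itertools import count
from typing import Callable, Dict, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LruMultiCache(Generic[K, V]):
    """Generic LRU cache that may hold several idle values per key."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._ids = count()
        # Idle entries in LRU order (oldest first): entry id -> (key, value).
        self._idle = OrderedDict()

    def take(self, key: K) -> Optional[V]:
        """Remove and return the most recently released value for key."""
        # Linear scan for brevity; a real cache would index entries by key.
        for entry_id in reversed(self._idle):
            if self._idle[entry_id][0] == key:
                return self._idle.pop(entry_id)[1]
        return None

    def release(self, key: K, value: V) -> None:
        """Put a value back, evicting the oldest idle entries if over capacity."""
        self._idle[next(self._ids)] = (key, value)
        while len(self._idle) > self._capacity:
            self._idle.popitem(last=False)


class FileHandleCache:
    """Wrapper that keeps per-file stats next to the cache, not inside it."""

    def __init__(self, capacity: int, open_file: Callable[[str], object]) -> None:
        self._cache = LruMultiCache(capacity)
        self._open_file = open_file
        self.stats: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"opened": 0, "accessed": 0})

    def get_file_handle(self, path: str) -> object:
        handle = self._cache.take(path)
        if handle is None:
            # New-entry branch: handle-based metrics are updated here.
            handle = self._open_file(path)
            self.stats[path]["opened"] += 1
        # Access-based metrics are updated on every hand-out.
        self.stats[path]["accessed"] += 1
        return handle

    def release_file_handle(self, path: str, handle: object) -> None:
        self._cache.release(path, handle)
```

Because the cache is generic, it can be unit tested with plain objects, and
the stats container can change without touching the eviction logic.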

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h
File be/src/runtime/io/handle-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h@118
PS21, Line 118: return file_handle.mtime() == mtime;
> The semantics of mtime  is unclear to me - why do we handle it separately a
Very good point, I think I got stuck on an earlier design :)


http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167:   // Opening a file handle requires talking to the NameNode so 
it can take some time.
  :   RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Let me double-check my understanding of the threading here:
Yes, it will be in_use; this does not lock the cache, and the entry is not 
available to other threads.
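
A rough Python sketch of that threading behaviour, not the C++ in
handle-cache.inline.h. The Entry and HandleCache classes and the init_handle
callback are hypothetical, and destroying the entry when initialization fails
is an assumed policy rather than what patch set 21 does.

```python
import threading
from typing import Callable, Dict, List


class Entry:
    def __init__(self, key: str) -> None:
        self.key = key
        self.in_use = True   # Created already checked out by the caller.
        self.handle = None


class HandleCache:
    def __init__(self, init_handle: Callable[[str], object]) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, List[Entry]] = {}
        self._init_handle = init_handle

    def get(self, key: str) -> Entry:
        with self._lock:
            # Only bookkeeping happens under the lock.
            for entry in self._entries.get(key, []):
                if not entry.in_use:
                    entry.in_use = True
                    return entry
            entry = Entry(key)
            self._entries.setdefault(key, []).append(entry)
        # The slow open runs without the lock. The entry is in_use, so no
        # other thread can be handed it while it is still uninitialized.
        try:
            entry.handle = self._init_handle(key)
        except Exception:
            # Assumed policy: remove the entry instead of leaving an
            # unusable handle in the cache.
            self.destroy(entry)
            raise
        return entry

    def release(self, entry: Entry) -> None:
        with self._lock:
            entry.in_use = False

    def destroy(self, entry: Entry) -> None:
        with self._lock:
            self._entries[entry.key].remove(entry)

    def size(self, key: str) -> int:
        with self._lock:
            return len(self._entries.get(key, []))
```

Two concurrent requests for the same file get two distinct entries, and a
failed open leaves nothing behind for that key.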


http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167:   // Opening a file handle requires talking to the NameNode so 
it can take some time.
  :   RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Another question: How does this work if this call fails? Does the entry get
Very good point, now it will be released back to the cache, I will add a destroy



[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7864/


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 23 Feb 2022 09:33:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11112: Impala can't resolve json tables created by Hive

2022-02-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18263 )

Change subject: IMPALA-11112: Impala can't resolve json tables created by Hive
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10212/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cac55b58dca88d900db3256ceaa25c17d7864d5
Gerrit-Change-Number: 18263
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Feb 2022 08:14:34 +
Gerrit-HasComments: No