[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-24 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21441 )

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..


Patch Set 5:

> Patch Set 4:
>
> > Patch Set 4: Verified-1
> >
> > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10665/
>
> Tests show some real failures: 
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/2665/testReport/

Related tests have been updated in PS5. TestCast used boost::lexical_cast to 
cast double/float to string, which also loses precision, so I now use string 
values directly for comparison.


--
To view, visit http://gerrit.cloudera.org:8080/21441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 24 May 2024 14:59:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-24 Thread Yifan Zhang (Code Review)
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21441

to look at the new patch set (#5).

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..

IMPALA-12562: Cast double and float to string with exact presicion

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes this by switching to the
existing methods DoubleToBuffer and FloatToBuffer, which are simple and
fast implementations that print only the necessary digits.

Testing:
  - Add end-to-end tests to verify the fixes
  - Add benchmarks for modified functions
  - Update tests in expr-test

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
4 files changed, 173 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/5
--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-23 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21441 )

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21441/3/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21441/3/be/src/exprs/cast-functions-ir.cc@352
PS3, Line 352: strlen
> strlen could be avoid by saving the original pointer and comparing the diff
It seems that the returned pointer doesn't point to the terminating null byte, 
so it can't be compared with the original pointer to get the length.

I also find this strlen redundant, since snprintf already returns the number 
of characters written. We may have to keep strlen, though, if we don't want to 
change the implementations in gutil/strings.



--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 23 May 2024 08:33:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-22 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21441 )

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21441/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21441/2//COMMIT_MSG@12
PS2, Line 12: le a
> Can you provide some benchmarks? I expect it to be an improvement, but it w
I added some benchmarks. I guess this could bring a slight performance 
degradation, since it does more work to print more accurate results.

According to comments in gutil/strings/numbers.cc, these implementations are 
about as fast as other strategies and do not need to introduce a new library.
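
To make the idea of printing only the necessary digits concrete, here is a 
minimal sketch of one common strategy: print with DBL_DIG (15) significant 
digits, parse the result back, and fall back to 17 digits only when the value 
does not round-trip. The helper name RoundTripDouble is hypothetical, and this 
illustrates the general technique rather than reproducing the code in 
gutil/strings/numbers.cc.

```cpp
#include <cfloat>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns a decimal string that parses back to exactly
// the same double, preferring the shorter 15-digit form when it is sufficient.
std::string RoundTripDouble(double value) {
  char buf[32];  // Assumed size; enough for 17 significant digits plus exponent.
  // First attempt: DBL_DIG (15) significant digits.
  int len = std::snprintf(buf, sizeof(buf), "%.*g", DBL_DIG, value);
  // If parsing the text does not give back the same value, retry with
  // DBL_DIG + 2 (17) digits, which always round-trips an IEEE 754 double.
  if (len >= 0 && std::strtod(buf, nullptr) != value) {
    len = std::snprintf(buf, sizeof(buf), "%.*g", DBL_DIG + 2, value);
  }
  if (len < 0 || static_cast<std::size_t>(len) >= sizeof(buf)) {
    return std::string();
  }
  return std::string(buf, static_cast<std::size_t>(len));
}
```

With this approach a value such as 1.33 is printed as "1.33", while a value 
that really needs all 17 digits still gets them.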


http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc@344
PS2, Line 344: if (val.is_null) return StringVal::null(); \
> At this point no allocation happened yet so this check is not useful.
Done


http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc@355
PS2, Line 355: sary(ctx->
> I would prefer using DoubleToBuffer / FloatToBuffer as it avoids the extra
Done



--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 22 May 2024 09:56:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-22 Thread Yifan Zhang (Code Review)
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21441

to look at the new patch set (#3).

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..

IMPALA-12562: Cast double and float to string with exact presicion

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes this by switching to the
existing methods DoubleToBuffer and FloatToBuffer, which are simple and
fast implementations that print only the necessary digits.

Testing:
  - Add end-to-end tests to verify the fixes
  - Add benchmarks for modified functions

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
3 files changed, 135 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/3
--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-17 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21441 )

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..


Patch Set 2:

(1 comment)

> Patch Set 1: Code-Review+1
>
> (1 comment)
>
> Thanks for fixing this!

Thanks for your review!

http://gerrit.cloudera.org:8080/#/c/21441/1/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/21441/1/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3300
PS1, Line 3300: select cast(round(cast(1.33 as double), 2) as string);
> A few more test cases around the limits of large/small number of decimals w
Done



--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sat, 18 May 2024 01:14:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-17 Thread Yifan Zhang (Code Review)
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21441

to look at the new patch set (#2).

Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..

IMPALA-12562: Cast double and float to string with exact presicion

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes this by switching to the
existing methods SimpleDtoa and SimpleFtoa, which are simple and fast
implementations that print only the necessary digits.

Testing:
  - Add end-to-end tests to query_test/test_exprs.py

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
---
M be/src/exprs/cast-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
2 files changed, 104 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/2
--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion

2024-05-17 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21441 )


Change subject: IMPALA-12562: Cast double and float to string with exact 
presicion
..

IMPALA-12562: Cast double and float to string with exact presicion

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes this by switching to the
existing methods SimpleDtoa and SimpleFtoa, which are simple and fast
implementations that print only the necessary digits.

Testing:
  - Add end-to-end tests to query_test/test_exprs.py

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
---
M be/src/exprs/cast-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
2 files changed, 43 insertions(+), 25 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/1
--
To view, visit http://gerrit.cloudera.org:8080/21441

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Gerrit-Change-Number: 21441
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR](branch-3.4.2) IMPALA-12288: Add BUILD WITH NO TESTS option to remove test targets

2024-04-09 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21262 )

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21262

Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-3.4.2
Gerrit-MessageType: comment
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 21262
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 09 Apr 2024 08:19:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent

2024-03-27 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21090 )

Change subject: IMPALA-12852: Make Kudu service start and stop independent
..


Patch Set 2:

> Patch Set 2:
>
> Please rename the new scripts to start-kudu and stop-kudu

Considering that the other scripts in the folder 'testdata/bin/' used for 
starting and stopping cluster services are named 'run-xxx.sh' and 
'kill-xxx.sh', I'd like to keep the new scripts consistent in naming.


--
To view, visit http://gerrit.cloudera.org:8080/21090

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
Gerrit-Change-Number: 21090
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 27 Mar 2024 09:26:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent

2024-03-26 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21090

to look at the new patch set (#2).

Change subject: IMPALA-12852: Make Kudu service start and stop independent
..

IMPALA-12852: Make Kudu service start and stop independent

This patch decouples run-kudu.sh and kill-kudu.sh from run-mini-dfs.sh
and kill-mini-dfs.sh. These scripts can be useful for setting up test
environments that require either no Kudu service or only the Kudu
service.

Testing:
  - Ran the modified and new scripts and checked they worked as expected.

Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
---
A testdata/bin/kill-kudu.sh
M testdata/bin/run-all.sh
A testdata/bin/run-kudu.sh
M testdata/cluster/admin
4 files changed, 123 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21090/2
--
To view, visit http://gerrit.cloudera.org:8080/21090

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
Gerrit-Change-Number: 21090
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-29 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21063 )

Change subject: IMPALA-12834: Add number of concurrent queries to profile
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21063/3/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/21063/3/be/src/scheduling/admission-controller.cc@189
PS3, Line 189: Number of running queries in designated executor group when ad
> I think this description is somewhat misleading, its actual meaning seems t
The description is updated.

Collecting the query load during the execution of a query is a good idea, but 
it's hard to define and calculate an average value.



--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 29 Feb 2024 10:14:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-29 Thread Yifan Zhang (Code Review)
Hello Zihao Ye, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21063

to look at the new patch set (#4).

Change subject: IMPALA-12834: Add number of concurrent queries to profile
..

IMPALA-12834: Add number of concurrent queries to profile

This patch adds a profile info string for the number of currently
running queries of the executor group on which the query is scheduled,
to help diagnose potential performance issues due to resource limits.

Testing:
- Add an e2e test to verify the information appears in profile

Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 43 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/4
--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent

2024-02-29 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21090 )


Change subject: IMPALA-12852: Make Kudu service start and stop independent
..

IMPALA-12852: Make Kudu service start and stop independent

This patch decouples run-kudu.sh and kill-kudu.sh from run-mini-dfs.sh
and kill-mini-dfs.sh. These scripts can be useful for setting up test
environments that require either no Kudu service or only the Kudu
service.

Testing:
  - Ran the modified and new scripts and checked they worked as expected.

Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
---
A testdata/bin/kill-kudu.sh
A testdata/bin/run-kudu.sh
M testdata/cluster/admin
3 files changed, 118 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21090/1
--
To view, visit http://gerrit.cloudera.org:8080/21090

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
Gerrit-Change-Number: 21090
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-26 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21063 )

Change subject: IMPALA-12834: Add number of concurrent queries to profile
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.h@1171
PS2, Line 1171: for the given executor group.
> nit: for the given executor group
Done


http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.cc@189
PS2, Line 189: designate
> nit: designated ?
Done


http://gerrit.cloudera.org:8080/#/c/21063/2/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/21063/2/tests/custom_cluster/test_admission_controller.py@928
PS2, Line 928: "Admission result: Admitted im
> nit: indent spaces
Done



--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 27 Feb 2024 03:05:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-26 Thread Yifan Zhang (Code Review)
Hello Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21063

to look at the new patch set (#3).

Change subject: IMPALA-12834: Add number of concurrent queries to profile
..

IMPALA-12834: Add number of concurrent queries to profile

This patch adds a profile info string for the number of currently
running queries of the executor group on which the query is scheduled,
to help diagnose potential performance issues due to resource limits.

Testing:
- Add an e2e test to verify the information appears in profile

Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 43 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/3
--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-26 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21063

to look at the new patch set (#2).

Change subject: IMPALA-12834: Add number of concurrent queries to profile
..

IMPALA-12834: Add number of concurrent queries to profile

This patch adds a profile info string for the number of currently
running queries of the executor group on which the query is scheduled,
to help diagnose potential performance issues due to resource limits.

Testing:
- Add an e2e test to verify the information appears in profile

Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 43 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/2
--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile

2024-02-23 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21063 )


Change subject: IMPALA-12834: Add number of concurrent queries to profile
..

IMPALA-12834: Add number of concurrent queries to profile

This patch adds a profile info string for the number of currently
running queries of the executor group on which the query is scheduled,
to help diagnose potential performance issues due to resource limits.

Testing:
- add an e2e test to verify the information appears in profile

Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 42 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/1
--
To view, visit http://gerrit.cloudera.org:8080/21063

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658
Gerrit-Change-Number: 21063
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12801: Increase query log default size and bound its memory.

2024-02-19 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21020 )

Change subject: IMPALA-12801: Increase query_log_ default size and bound its 
memory.
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21020

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Gerrit-Change-Number: 21020
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 20 Feb 2024 03:09:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12426: Adds the Impala built-in functions prettyprint duration and prettyprint memory.

2024-02-18 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21038 )

Change subject: IMPALA-12426: Adds the Impala built-in functions 
prettyprint_duration and prettyprint_memory.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml@1183
PS4, Line 1183: PRETTYPRINT_MEMORY
nit: Maybe we can rename it to 'prettyprint_size' or 'prettyprint_bytes'?



--
To view, visit http://gerrit.cloudera.org:8080/21038

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Gerrit-Change-Number: 21038
Gerrit-PatchSet: 4
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sun, 18 Feb 2024 10:14:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12801: Increase query log size from 100 to 200.

2024-02-18 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21020 )

Change subject: IMPALA-12801: Increase query_log_size from 100 to 200.
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21020

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Gerrit-Change-Number: 21020
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sun, 18 Feb 2024 10:13:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12801: Increase query log size from 100 to 200.

2024-02-17 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21020 )

Change subject: IMPALA-12801: Increase query_log_size from 100 to 200.
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21020/4/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21020/4/www/queries.tmpl@30
PS4, Line 30: The size of that archive is controlled with the
: --query_log_size command line parameter.
nit: This should be updated.



--
To view, visit http://gerrit.cloudera.org:8080/21020

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Gerrit-Change-Number: 21020
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sun, 18 Feb 2024 04:01:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-02-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20804/15/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/15/be/src/exec/parquet/hdfs-parquet-scanner.cc@444
PS15, Line 444: if (file_metadata_.num_rows > 0) {
Add this for backward compatibility: See L893 in this file.
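
A minimal sketch of the kind of check quoted above, under stated assumptions: 
the structs below are simplified stand-ins for Parquet footer metadata, not 
Impala's or Parquet's real types, and the fallback of summing per-row-group 
counts when the file-level num_rows is not positive is an assumption about what 
backward compatibility means here. The name CountStarFromFooter is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-ins for footer metadata.
struct RowGroupMeta {
  int64_t num_rows;
};

struct FileMeta {
  int64_t num_rows;  // File-level row count stored in the footer.
  std::vector<RowGroupMeta> row_groups;
};

// Returns the row count for count(*) using footer metadata only.
int64_t CountStarFromFooter(const FileMeta& meta) {
  // Fast path: trust the file-level count when it is positive.
  if (meta.num_rows > 0) return meta.num_rows;
  // Assumed fallback: add up the per-row-group counts instead.
  int64_t total = 0;
  for (const RowGroupMeta& rg : meta.row_groups) total += rg.num_rows;
  return total;
}
```

Either path reads only metadata, so no column data has to be scanned to answer 
the query.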



--
To view, visit http://gerrit.cloudera.org:8080/20804

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 15
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 05 Feb 2024 04:24:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-02-04 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#15).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Before this patch, the frontend generates multiple scan ranges for a
parquet file when the count star optimization is enabled. The backend
function HdfsParquetScanner::GetNextInternal() also calls NextRowGroup()
multiple times to find row groups and sum up RowGroup.num_rows. This
is inefficient because only the file metadata needs to be read to
compute count star. This patch optimizes it by creating only one
scan range, containing the file footer, for each parquet file.
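
As a rough illustration of the idea above (not Impala's actual code), the
sketch below models a Parquet footer as plain Python objects and computes
count(*) by summing the per-row-group row counts stored there, without
reading any data pages. The names RowGroupMeta, FileMetaData and
count_star_from_footer are hypothetical and only mirror the field names
used in this commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupMeta:
    # Hypothetical stand-in for one Parquet RowGroup metadata entry.
    num_rows: int


@dataclass
class FileMetaData:
    # Hypothetical stand-in for the Parquet footer (file metadata).
    num_rows: int
    row_groups: List[RowGroupMeta]


def count_star_from_footer(footer: FileMetaData) -> int:
    """Return the row count using only metadata stored in the footer."""
    return sum(rg.num_rows for rg in footer.row_groups)


footer = FileMetaData(num_rows=300,
                      row_groups=[RowGroupMeta(100), RowGroupMeta(200)])
total = count_star_from_footer(footer)
```

Because every value needed is in the footer, a single scan range covering
the footer is enough per file under these assumptions.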

The following table shows a performance comparison before and after
the patch. primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

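+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+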

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/util/parse_util.py
11 files changed, 138 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/15
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 15
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12381: Set JDBC related properties in JDBC data source

2024-02-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20941 )

Change subject: IMPALA-12381: Set JDBC related properties in JDBC data source
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0a0b5d9d7b06825842828c3722c2bcdb4ea8
Gerrit-Change-Number: 20941
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Sun, 04 Feb 2024 09:59:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-02-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has abandoned this change. ( http://gerrit.cloudera.org:8080/20992 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/20992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5
Gerrit-Change-Number: 20992
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-02-04 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#14).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Before this patch, the frontend generates multiple scan ranges for a
parquet file when the count star optimization is enabled. The backend
function HdfsParquetScanner::GetNextInternal() also calls NextRowGroup()
multiple times to find row groups and sum up RowGroup.num_rows. This
is inefficient because only the file metadata needs to be read to
compute count star. This patch optimizes it by creating only one
scan range, containing the file footer, for each parquet file.

The following table shows a performance comparison before and after
the patch. primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

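+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+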

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/util/parse_util.py
11 files changed, 144 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/14
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 14
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-02-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20992


Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Before this patch, the frontend generates multiple scan ranges for a
parquet file when the count star optimization is enabled. The backend
function HdfsParquetScanner::GetNextInternal() also calls NextRowGroup()
multiple times to find row groups and sum up RowGroup.num_rows. This
is inefficient because only the file metadata needs to be read to
compute count star. This patch optimizes it by creating only one
scan range, containing the file footer, for each parquet file.

The following table shows a performance comparison before and after
the patch. primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

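+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+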

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd

rm query option

Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/util/parse_util.py
11 files changed, 144 insertions(+), 71 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/20992/1
--
To view, visit http://gerrit.cloudera.org:8080/20992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5
Gerrit-Change-Number: 20992
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12381: Set JDBC related properties in JDBC data source

2024-02-02 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20941 )

Change subject: IMPALA-12381: Set JDBC related properties in JDBC data source
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20941/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
File fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:

http://gerrit.cloudera.org:8080/#/c/20941/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java@324
PS2, Line 324: if (Strings.isNullOrEmpty(tblInitString)) {
             :   if (dsPropertyMap != null) {
             :     // Change keys to lower case.
             :     for (Map.Entry entry : dsPropertyMap.entrySet()) {
             :       combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :     }
             :   }
             : } else {
             :   // Append additional properties of DataSource to initString.
             :   try {
             :     Map tblPropertyMap =
             :         JsonUtil.convertJSonToPropertyMap(tblInitString);
             :     if (tblPropertyMap != null) {
             :       // Change keys to lower case.
             :       for (Map.Entry entry : tblPropertyMap.entrySet()) {
             :         combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :       }
             :     }
             :   } catch (ImpalaRuntimeException e) {
             :     // Return initString which is set in the table creation statement if it's
             :     // invalid JSON string. This could happen for non JDBC data source.
             :     return tblInitString;
             :   }
             :   if (dsPropertyMap != null) {
             :     for (Map.Entry entry : dsPropertyMap.entrySet()) {
             :       if (!combinedPropertyMap.containsKey(entry.getKey().toLowerCase())) {
             :         combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :       }
             :     }
             :   }
             : }
Can we reorganize this to:

if (dsPropertyMap != null) {
  // Change keys to lower case.
  for (Map.Entry entry : dsPropertyMap.entrySet()) {
    combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
  }
}
if (!Strings.isNullOrEmpty(tblInitString)) {
  Map tblPropertyMap =
      JsonUtil.convertJSonToPropertyMap(tblInitString);
  ...
}

That would be more readable.
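
To make the intended ordering concrete, here is a minimal Python sketch of
the same merge, not the Java code under review. It assumes the init string
is a JSON object and that table-level properties should win over data
source properties, as in the quoted patch. The function name
combine_properties is hypothetical, and the sketch omits the fallback that
returns the raw init string when it is not valid JSON.

```python
import json
from typing import Dict, Optional


def combine_properties(ds_props: Optional[Dict[str, str]],
                       tbl_init_string: Optional[str]) -> Dict[str, str]:
    """Merge data source and table properties under lower-cased keys."""
    combined: Dict[str, str] = {}
    # Start with the data source properties.
    if ds_props:
        for key, value in ds_props.items():
            combined[key.lower()] = value
    # Table properties are applied second, so they override on conflict.
    if tbl_init_string:
        for key, value in json.loads(tbl_init_string).items():
            combined[key.lower()] = value
    return combined


merged = combine_properties({"Driver.URL": "a", "USER": "ds"},
                            '{"user": "tbl"}')
```

Writing the data source properties first and the table properties second
removes the need for the containsKey check in the original else branch.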



--
To view, visit http://gerrit.cloudera.org:8080/20941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0a0b5d9d7b06825842828c3722c2bcdb4ea8
Gerrit-Change-Number: 20941
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Fri, 02 Feb 2024 11:53:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-30 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20965 )

Change subject: IMPALA-12762: Fix cmake error in package building
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh
File bin/jenkins/build-all-flag-combinations.sh:

http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@42
PS1, Line 42: notests -
> Let's change this to "notests" so new changes won't break the build again.
Done


http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@50
PS1, Line 50: notests -
> Let's change this to "notests" as well.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Jan 2024 08:31:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-30 Thread Yifan Zhang (Code Review)
Hello Quanlong Huang, Zihao Ye, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20965

to look at the new patch set (#2).

Change subject: IMPALA-12762: Fix cmake error in package building
..

IMPALA-12762: Fix cmake error in package building

This patch adds extra processing of option 'BUILD_WITH_NO_TESTS' in
be/src/exec/json/CMakeLists.txt, so test targets will not be generated
by the CMake when building Impala with -package and -notests.

Testing:
  - Run './buildall.sh -noclean -notests -package' with no error

Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
---
M be/src/exec/json/CMakeLists.txt
M bin/jenkins/build-all-flag-combinations.sh
2 files changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/20965/2
--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-29 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20965


Change subject: IMPALA-12762: Fix cmake error in package building
..

IMPALA-12762: Fix cmake error in package building

This patch adds extra processing of option 'BUILD_WITH_NO_TESTS' in
be/src/exec/json/CMakeLists.txt, so test targets will not be generated
by the CMake when building Impala with -package and -notests.

Testing:
  - Run './buildall.sh -noclean -notests -package' with no error

Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
---
M be/src/exec/json/CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/20965/1
--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-22 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20804/12//COMMIT_MSG@16
PS12, Line 16: A new query option parquet_count_star_use_file_metadata is added for
             : forward compatibility. Its default value is true, if any inconsistency
             : between FileMetaData.num_rows and RowGroup.num_rows is found, we can
             : set it to false to get same results as old versions.
> Probably that would be a corrupt Parquet file. But if we are afraid of inco
Yeah. I adjusted it to sum RowGroup.num_rows in PS13 and got the same 
performance improvement by running the single node perf test.

Then I think we do not need to introduce this new query option since no 
behavior changes are made. What do you think?


http://gerrit.cloudera.org:8080/#/c/20804/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1452
PS12, Line 1452:   if (isFooterOnly) {
               :     // Only generate one scan range for footer only scans.
               :     currentOffset += remainingLength - currentLength;
               :     remainingLength = currentLength;
               :   }
> Why do we need to this now? We didn't do that for partition key scans.
A count star optimization scan is not a zero-slot scan: it has one slot for
the num rows statistic. A partition key scan, by contrast, is a zero-slot
scan. In HdfsScanner::IssueFooterRanges(), we create a footer range for every
scan range unless the scan is a zero-slot scan.
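
The effect of the isFooterOnly branch quoted above can be sketched in
Python as follows. This is a simplified model, not the planner code: it
treats the whole file as a single block, and the function name
generate_scan_ranges and its parameters are hypothetical.

```python
from typing import List, Tuple


def generate_scan_ranges(length: int, max_range_length: int,
                         footer_only: bool = False) -> List[Tuple[int, int]]:
    """Split [0, length) into (offset, length) scan ranges."""
    ranges: List[Tuple[int, int]] = []
    current_offset = 0
    remaining = length
    while remaining > 0:
        current_length = min(remaining, max_range_length)
        if footer_only:
            # Skip ahead so the only range emitted covers the file tail.
            current_offset += remaining - current_length
            remaining = current_length
        ranges.append((current_offset, current_length))
        remaining -= current_length
        current_offset += current_length
    return ranges


all_ranges = generate_scan_ranges(10, 4)
footer_range = generate_scan_ranges(10, 4, footer_only=True)
```

In this model a 10-byte file with a 4-byte maximum range length yields three
ranges normally, but a single range over the last 4 bytes when footer_only
is set, so only one footer range needs to be processed.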


http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py@275
PS12, Line 275:
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py@277
PS12, Line 277:
> flake8: E501 line too long (91 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 13
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jan 2024 09:02:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-22 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#13).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, so
it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata instead.
It avoids NextRowGroup() function calls and generates and processes only
one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.
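
A small Python sketch of how such a switch could behave, under the
assumption that the footer is available as a dictionary with a file-level
'num_rows' and a list of row groups. The function count_star and its
use_file_metadata parameter are hypothetical names that stand in for the
query option; this is not the backend implementation.

```python
from typing import Any, Dict


def count_star(footer: Dict[str, Any], use_file_metadata: bool = True) -> int:
    """Pick the row count source based on the (hypothetical) option."""
    if use_file_metadata:
        # New behaviour: trust the file-level row count.
        return footer["num_rows"]
    # Old behaviour: sum the row counts of the individual row groups.
    return sum(rg["num_rows"] for rg in footer["row_groups"])


# A deliberately inconsistent footer to show where the two paths differ.
inconsistent = {"num_rows": 250,
                "row_groups": [{"num_rows": 100}, {"num_rows": 200}]}
new_result = count_star(inconsistent)
old_result = count_star(inconsistent, use_file_metadata=False)
```

For a well-formed file both paths return the same value. They differ only
when the footer is inconsistent, which is the case the option is meant to
cover.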

The following table shows a performance comparison before and after
the patch. primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

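+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+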

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
16 files changed, 177 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/13
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 13
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-18 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@445
PS11, Line 445: _file_metadata) {
> What is the reason behind this check?
Done.

When max_scan_range_length is set to a small value, we may generate more than
one scan range per block. We should handle those cases as well. The related
checks are updated in the frontend.


http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@447
PS11, Line 447: capacity = 1;
> Add this check before assignment:
Done


http://gerrit.cloudera.org:8080/#/c/20804/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1260
PS11, Line 1260: If parquet_count_star_use_file_metadata is enabled, we only need the
               :   // 'num_ro
> Mention the difference between enabling/disabling parquet_count_star_use_fi
Done


http://gerrit.cloudera.org:8080/#/c/20804/11/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20804/11/tests/query_test/test_aggregation.py@271
PS11, Line 271: vector.get_value('exec_option')['batch_size'] = 1
              : self.run_test_case('QueryTest/parquet-stats-agg', vector, unique_database)
              :
              : vector.get_value('exec_option')['parquet_count_star_
> If parquet_count_star_use_file_metadata = true becomes default, I'd prefer
Done



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Jan 2024 09:25:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-18 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#12).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, so
it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata instead.
It avoids NextRowGroup() function calls and generates and processes only
one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.

The following table shows a performance comparison before and after
the patch. primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

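+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+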

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
16 files changed, 172 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/12
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12054: Lazily check Kudu flags in tests

2024-01-15 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20904 )

Change subject: IMPALA-12054: Lazily check Kudu flags in tests
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7a8282b59d72322085c21c70a5019c51b586a52
Gerrit-Change-Number: 20904
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 16 Jan 2024 03:17:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table

2024-01-15 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20837 )

Change subject: IMPALA-12642: Support query options for Impala external JDBC 
table
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20837
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Gerrit-Change-Number: 20837
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 15 Jan 2024 12:36:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-15 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#11).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, but
it still needs to find the row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and generates and
processes only one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, it can be
set to false to get the same results as older versions.
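
The two ways of computing count star described above can be sketched as
follows. This is a simplified Python model, not the scanner code: the
dataclasses only mirror the num_rows and row_groups fields of the Parquet
footer, and count_star and its use_file_metadata parameter are hypothetical
stand-ins for the behaviour controlled by the new query option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    num_rows: int


@dataclass
class FileMetaData:
    num_rows: int
    row_groups: List[RowGroup]


def count_star(meta: FileMetaData, use_file_metadata: bool = True) -> int:
    """Return the row count of one file from its footer metadata."""
    if use_file_metadata:
        # New path: a single field read, no iteration over row groups.
        return meta.num_rows
    # Old path: visit every row group and sum its row count.
    return sum(rg.num_rows for rg in meta.row_groups)


consistent = FileMetaData(num_rows=30, row_groups=[RowGroup(10), RowGroup(20)])
print(count_star(consistent), count_star(consistent, use_file_metadata=False))
```

For a consistent file both paths agree. If a writer produced a footer whose
file-level num_rows disagrees with the per-row-group values, the two paths
return different counts, which is the case the option is meant to cover.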

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
13 files changed, 331 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/11
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 11
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table

2024-01-12 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20837 )

Change subject: IMPALA-12642: Support query options for Impala external JDBC 
table
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG@11
PS1, Line 11: comma-delimited key=value string,
> A few query options have string type of value like request_pool and client_
I found that the query option 'ENABLED_RUNTIME_FILTER_TYPES' has a set type, so 
it can sometimes be set to multiple values separated by commas.

Also, can the string value in 'jdbc.options' be used as settings for MySQL or 
other JDBC tables, not just external Impala tables? I'm not sure whether some 
of those settings may contain commas.
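
To make the concern concrete, here is a minimal sketch of a naive 
comma-delimited key=value parser. It is only an illustration and is not how 
'jdbc.options' is parsed in the patch; the function name naive_parse is 
hypothetical.

```python
def naive_parse(options):
    """Split a comma-delimited key=value string without any quoting rules."""
    result = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            # A value that itself contains a comma leaves a fragment
            # with no '=' in it, e.g. 'MIN_MAX'.
            raise ValueError("fragment without '=': %r" % item)
        result[key.strip()] = value.strip()
    return result


print(naive_parse("request_pool=default"))
try:
    naive_parse("enabled_runtime_filter_types=BLOOM,MIN_MAX")
except ValueError as err:
    print("cannot parse:", err)
```

A set-valued option such as ENABLED_RUNTIME_FILTER_TYPES would need quoting, 
escaping, or a different delimiter to survive this kind of split.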



--
To view, visit http://gerrit.cloudera.org:8080/20837
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Gerrit-Change-Number: 20837
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 12 Jan 2024 13:23:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-11 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473
PS8, Line 473:
> The following query return different output with and without PARQUET_COUNT_
It turned out that all 'dst_row's point to the same 'dst_tuple', so we can't 
reuse the same 'tuple_buf' to hold different values. I have updated this and 
added new test cases.
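
The aliasing problem can be illustrated with a small Python analogy. This is 
not the C++ code in hdfs-parquet-scanner.cc: a dict stands in for a tuple, a 
list stands in for the row batch, and both function names are hypothetical.

```python
def fill_rows_shared(counts):
    """Every row references the same buffer, so later writes overwrite earlier ones."""
    shared_tuple = {"count": None}
    rows = []
    for c in counts:
        shared_tuple["count"] = c
        rows.append(shared_tuple)
    return [row["count"] for row in rows]


def fill_rows_separate(counts):
    """Each row gets its own buffer, so every value is preserved."""
    rows = []
    for c in counts:
        rows.append({"count": c})
    return [row["count"] for row in rows]


print(fill_rows_shared([10, 20, 30]))    # [30, 30, 30]
print(fill_rows_separate([10, 20, 30]))  # [10, 20, 30]
```

With a single file per scan range the shared variant looks correct, which may 
be why the issue only shows up when several values are written into one batch.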



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 10
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Jan 2024 13:09:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-11 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#10).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, but
it still needs to find the row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and generates and
processes only one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, it can be
set to false to get the same results as older versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 327 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/10
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 10
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473
PS8, Line 473: dst_tuple
> I still think there's an issue here. Only tuple_buf is updated in the loop,
There is already a test case covering this; see 'QueryTest/parquet-stats-agg', 
which queries multiblock tables including 
'functional_parquet.lineitem_multiblock' and 
'functional_parquet.lineitem_sixblocks'. I have tested it with this patch and 
got the correct results.

I didn't quite understand what you meant, but 'So, in each iteration, data is 
written to the first Tuple' is not true. We create a new 'dst_row' in the 
loop (see L470).



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 9
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 05 Jan 2024 07:07:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473
PS8, Line 473: dst_tuple
> Is it okay if we don't update dst_tuple inside the loop? If there are multi
Yes, it is. Since 'dst_slot' points to the slot of the 'dst_tuple', we only 
need to update 'dst_slot' in the loop.


http://gerrit.cloudera.org:8080/#/c/20804/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1274
PS8, Line 1274: isFooterOnly
> nit: This parameter seems unnecessary. Can't we do the same check (countSta
This is to address the review feedback in 
PS7:https://gerrit.cloudera.org/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#1397.

I think wrapping it into a boolean variable makes the code in 
transformBlocksToScanRanges() more readable.



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 8
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 05 Jan 2024 05:09:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2024-01-04 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Zihao Ye, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#9).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, but
it still needs to find the row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and generates and
processes only one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, it can be
set to false to get the same results as older versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 316 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/9
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 9
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table

2024-01-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20837 )

Change subject: IMPALA-12642: Support query options for Impala external JDBC 
table
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20837
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Gerrit-Change-Number: 20837
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 05 Jan 2024 02:47:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table

2024-01-02 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20837 )

Change subject: IMPALA-12642: Support query options for Impala external JDBC 
table
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20837
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Gerrit-Change-Number: 20837
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 03 Jan 2024 02:22:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table

2024-01-02 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20837 )

Change subject: IMPALA-12642: Support query options for Impala external JDBC 
table
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG@11
PS1, Line 11: comma-delimited key=value string,
nit: Just curious whether there is a case where the value of the option 
contains commas.



--
To view, visit http://gerrit.cloudera.org:8080/20837
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Gerrit-Change-Number: 20837
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 02 Jan 2024 09:26:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12665: Update ScratchMicroBatch length to new scratch_batch_->capacity after ScratchTupleBatch::Reset

2023-12-26 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20832 )

Change subject: IMPALA-12665: Update ScratchMicroBatch length to new 
scratch_batch_->capacity after ScratchTupleBatch::Reset
..


Patch Set 2:

(2 comments)

I think tests are needed to verify that the problem is fixed, and we should 
also check whether the ORC scanner has the same problem.

http://gerrit.cloudera.org:8080/#/c/20832/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20832/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-12665: Update ScratchMicroBatch length to new 
scratch_batch_->capacity after ScratchTupleBatch::Reset
You need to add more information in the commit message, which should include 
what the problem was, and how it was fixed.


http://gerrit.cloudera.org:8080/#/c/20832/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20832/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@2504
PS2, Line 2504:   // Update length to new scratch_batch_->capacity after 
ScratchTupleBatch::Reset
Can we update ScratchMicroBatch just after calling ScratchTupleBatch::Reset()?



--
To view, visit http://gerrit.cloudera.org:8080/20832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75e19c4e7a566441510a1172ea01537046c5c885
Gerrit-Change-Number: 20832
Gerrit-PatchSet: 2
Gerrit-Owner: Zinway 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 26 Dec 2023 12:28:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-25 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/7/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@480
PS7, Line 480: // There are no materialized slots and we are not optimizing 
coun
> Is it possible to unify the code here?
Done


http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@426
PS7, Line 426: // compute scan range locations with optional sampling
 : computeScanRangeLocations(analyzer);
 :
 : if (hasParquet(fileFormats_)) {
 :   // Compute min-max conjuncts only if the 
PARQUET_READ_STATISTICS query option is
 :   // set to true.
 :   if (analyzer.getQueryOptions().parquet_read_statistics) {
 : computeStatsTupleAndConjuncts(analyzer);
 :   }
 :   // Compute dictionary conjuncts only if the 
PARQUET_DICTIONARY_FILTERING query
 :
> Is it OK to move these into computeScanRangeLocations?
Done


http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1396
PS7, Line 1396:
  :* Given a fileDesc of partition, transforms the blocks into 
TScanRanges. Eac
> Contain this into a boolean variable. Possibly at caller of transformBlocks
Done


http://gerrit.cloudera.org:8080/#/c/20804/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/20804/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test@10
PS7, Line 10:  RUNTIME_PROFILE
: aggregation(SUM, NumRowGroups): 24
: aggregation(SUM, NumFileMetadataRead): 24
: aggregation(SUM, RowsRead): 0
> For modified tests here, can you also add the same test with parquet_count_
Done



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 8
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 25 Dec 2023 11:28:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-25 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#8).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star, but
it still needs to find the row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and generates and
processes only one footer range per file.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, it can be
set to false to get the same results as older versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 316 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/8
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 8
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP not working correctly with non-const pattern

2023-12-22 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20785 )

Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP not working 
correctly with non-const pattern
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712
Gerrit-Change-Number: 20785
Gerrit-PatchSet: 4
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 22 Dec 2023 08:43:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20804/5//COMMIT_MSG@30
PS5, Line 30: TPCDS(10) | TPCDS-Q_COUNT_OPTIMIZED | parquet / 
none / none | 0.17   | 0.16
> Any idea why this benchmark does not improve? I thought this patch should i
I checked profiles without this patch and found that 'NumFileMetadataRead' is 
the same as 'NumRowGroups'. In that case the patch does not save any 
computation.


http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@464
PS5, Line 464: int capacity = min(
 : static_cast(file_metadata_.row_groups.size()), 
row_batch->capacity());
 : 
RETURN_IF_ERROR(RowBatch::ResizeAndAllocateTupleBuffer(state_,
 : row_batch->tuple_data_pool(), 
row_batch->row_desc()->GetRowSize(),
 : , _buf_size, _buf));
> Unnecessary changes?
Done


http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/service/query-options.cc@1198
PS5, Line 1198:   case TImpalaQueryOptions::PARQUET_COUNT_STAR_USE_FILE_METADATA: {
              :     query_options->__set_parquet_count_star_use_file_metadata(IsTrue(value));
              :     break;
              :   }
> Option parsing does not look right to me.
Fixed


http://gerrit.cloudera.org:8080/#/c/20804/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a411
PS5, Line 411:
> I think canApplyCountStarOptimization should not change, especially this ch
Done



--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 22 Dec 2023 06:12:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#7).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and means only one
footer range per file is generated and processed.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.
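
The two counting strategies described above can be sketched as a toy
model. This is not Impala code: the classes and the count_star()
helper are hypothetical names used only for illustration, and the
use_file_metadata argument merely mirrors the role of the
parquet_count_star_use_file_metadata option.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowGroup:
    num_rows: int


@dataclass
class FileMetaData:
    # File-level row count stored in the Parquet footer.
    num_rows: int
    row_groups: List[RowGroup] = field(default_factory=list)


def count_star(files: List[FileMetaData], use_file_metadata: bool = True) -> int:
    """Return count(*) over the given footers (toy model, not Impala code)."""
    total = 0
    for meta in files:
        if use_file_metadata:
            # New path: one number per file, no row-group iteration.
            total += meta.num_rows
        else:
            # Old path: visit every row group and sum its row count.
            total += sum(rg.num_rows for rg in meta.row_groups)
    return total


# A well-formed file and a deliberately inconsistent one.
consistent = FileMetaData(num_rows=30, row_groups=[RowGroup(10), RowGroup(20)])
mismatched = FileMetaData(num_rows=0, row_groups=[RowGroup(5)])
```

For well-formed files both paths agree. They diverge only when a
footer's file-level count disagrees with its row groups, which is the
case the option is meant to cover.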

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

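+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+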

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 119 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/7
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#6).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and means only one
footer range per file is generated and processed.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

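+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+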

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 119 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/6
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#5).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and means only one
footer range per file is generated and processed.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

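+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+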

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 124 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/5
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 5:

I chose to add a query option instead of a backend flag to control whether 
this optimization is enabled. The reasons are:
- This patch also contains changes to the frontend.
- Different configurations for backends in a cluster can lead to incorrect 
query results.
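
One way the second point could play out is sketched below. It is a
toy model under stated assumptions, not Impala's scheduler: FILES, the
backend ids, and both run_* helpers are hypothetical, and the second
file is deliberately given a file-level count that disagrees with its
row groups.

```python
from typing import Dict, List, Tuple

# (file-level num_rows, per-row-group num_rows) for each file.
# The second file is deliberately inconsistent.
FILES: List[Tuple[int, List[int]]] = [(30, [10, 20]), (0, [5])]


def scan(file_meta: Tuple[int, List[int]], use_file_metadata: bool) -> int:
    file_rows, row_groups = file_meta
    return file_rows if use_file_metadata else sum(row_groups)


def run_with_query_option(assignment: List[int], use_file_metadata: bool) -> int:
    # The option travels with the query, so every backend applies the
    # same setting regardless of which backend scans which file.
    return sum(scan(f, use_file_metadata) for f in FILES)


def run_with_backend_flags(assignment: List[int],
                           backend_flags: Dict[int, bool]) -> int:
    # assignment[i] is the backend that scans file i; each backend
    # applies its own process-level flag.
    return sum(scan(f, backend_flags[b]) for f, b in zip(FILES, assignment))


mixed_flags = {0: True, 1: False}
result_a = run_with_backend_flags([0, 1], mixed_flags)
result_b = run_with_backend_flags([1, 0], mixed_flags)
```

With mixed per-backend flags, the total in this model depends on which
backend happens to scan which file, whereas a query option gives the
same answer for any assignment.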


--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 21 Dec 2023 12:35:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-21 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#4).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls and means only one
footer range per file is generated and processed.

A new query option parquet_count_star_use_file_metadata is added for
forward compatibility. Its default value is true. If any inconsistency
between FileMetaData.num_rows and RowGroup.num_rows is found, we can
set it to false to get the same results as old versions.

The following table shows a performance comparison before and after
the patch. The primitive_count_star_multiblock query is a modified
primitive_count_star query that targets a multi-block
tpch10_parquet.lineitem table. The files of the table are generated
by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

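+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+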

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 124 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/4
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-19 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 3:

> Since this will be a behavior change, do you mind adding a backend flag to 
> control this? Default to counting with FileMetaData.num_rows, but fall back 
> to RowGroups.num_rows when the flag is disabled. This way, users can revert 
> to the old behavior if they do hit an inaccurate FileMetaData.num_rows issue.
>
> A basic performance benchmark is also desirable to ensure no regression 
> happens like IMPALA-11123. Maybe you can steal TPCDS-Q_COUNT_OPTIMIZED, 
> TPCDS-Q_COUNT_UNOPTIMIZED, and TPCDS-Q_COUNT_ZERO_SLOT from 
> https://gerrit.cloudera.org/c/19927 and run single_node_perf_run.py such as:
>
> ./bin/single_node_perf_run.py --num_impalads=3 \
> --workloads=tpcds --iterations=9 --table_formats=parquet/none/none \
> --query_names=TPCDS-Q_COUNT_OPTIMIZED,TPCDS-Q_COUNT_UNOPTIMIZED,TPCDS-Q_COUNT_ZERO_SLOT \
> asf-master 
> 
> Even better if you can do it with larger scale TPC-DS like 10GB:
>
> ./bin/single_node_perf_run.py --num_impalads=3 --load --scale=10 \
> --workloads=tpcds --iterations=9 --table_formats=parquet/none/none \
> --query_names=TPCDS-Q_COUNT_OPTIMIZED,TPCDS-Q_COUNT_UNOPTIMIZED,TPCDS-Q_COUNT_ZERO_SLOT \
> asf-master 
>
> Using tpch_parquet.lineitem should be fine as well.

Thanks for the guidance! I'll try to add a backend flag and do some performance 
tests.


--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 19 Dec 2023 09:33:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-18 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..


Patch Set 3:

> Patch Set 3:
>
> I think there is a strong reason why Impala trusts RowGroups.num_rows more 
> than FileMetaData.num_rows. Maybe there are still invalid parquet files out 
> there that were written with mismatched FileMetaData.num_rows.
>
> See:
> https://issues.apache.org/jira/browse/IMPALA-3943
> https://issues.apache.org/jira/browse/IMPALA-2230

Thanks for your reply, Riza!
I investigated the issues, especially IMPALA-3943. For parquet scans, Impala 
now treats a file with FileMetaData.num_rows=0 as an empty file, see: 
https://github.com/apache/impala/blob/4114fe8db6ec80b2e1679e946555f91ab7043f2e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L895C1-L898
I also found other places that use FileMetaData.num_rows to generate 
query results for metadata-only queries, see: 
https://github.com/apache/impala/blob/4114fe8db6ec80b2e1679e946555f91ab7043f2e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L477-L481
Besides, IMPALA-3943 also added warning logs for inconsistency between 
FileMetaData.num_rows and RowGroups.num_rows.

So I think using FileMetaData.num_rows for count star optimizations should be 
acceptable.
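
The kind of consistency check mentioned above might look roughly like
the following. This is a minimal sketch, not the code IMPALA-3943
added: row_counts_consistent(), the logger name, and the message
wording are all hypothetical.

```python
import logging
from typing import Sequence

logger = logging.getLogger("parquet_footer_check")


def row_counts_consistent(file_num_rows: int,
                          row_group_num_rows: Sequence[int],
                          filename: str = "<unknown>") -> bool:
    """Compare the file-level row count with the sum over row groups.

    Logs a warning and returns False when the two disagree.
    """
    total = sum(row_group_num_rows)
    if total != file_num_rows:
        logger.warning(
            "Parquet file %s: file-level num_rows=%d but row groups sum to %d",
            filename, file_num_rows, total)
        return False
    return True
```

A caller could use the return value to decide whether the file-level
count is safe to use for a given file.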


-- 
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 18 Dec 2023 09:56:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-16 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#3).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet 'RowGroup.num_rows' field to compute count star,
so it still needs to find row groups and sum all 'RowGroup.num_rows'.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls, so each file only
needs to be processed once. The planner is also modified to generate
only one scan range per file.

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 70 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/3
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-15 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#2).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls, so each file only
needs to be processed once. The planner is also modified to generate
only one scan range per file.

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 70 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/2
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans

2023-12-15 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20804 )


Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still needs to find row groups and sum all RowGroup.num_rows.
This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids NextRowGroup() function calls, so each file only
needs to be processed once. The planner is also modified to generate
only one scan range per file.

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 68 insertions(+), 35 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/1
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG@11
PS2, Line 11: prevent users from del
> nit: prevent users from deleting
Done


http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2683
PS2, Line 2683: dropTablesFromKudu(db, kudu_table_reserve_seconds)
> What's behavior of Kudu engine for soft table deletion when the database is
The managed Kudu tables will be in the 'soft-deleted' state for the reservation 
period. During this time the tables can be recovered by calling Kudu's 'recall 
table' API.
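
The soft-delete lifecycle described above can be modelled with a small
state machine. This is a toy sketch, not the Kudu client API:
SoftDeleteCatalog and its methods are hypothetical, and treating a
reservation of 0 seconds as an immediate, unrecoverable delete is an
assumption.

```python
import time
from typing import Callable, Dict, Set


class SoftDeleteCatalog:
    """Toy model of soft deletion; not the Kudu client API."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._live: Set[str] = set()
        # Soft-deleted table name -> time at which the reservation expires.
        self._soft_deleted: Dict[str, float] = {}

    def create(self, name: str) -> None:
        self._live.add(name)

    def delete(self, name: str, reserve_seconds: int = 0) -> None:
        self._live.remove(name)
        if reserve_seconds > 0:
            # Keep the table recoverable for the reservation period.
            self._soft_deleted[name] = self._clock() + reserve_seconds

    def recall(self, name: str) -> bool:
        """Restore a soft-deleted table if its reservation has not expired."""
        expiry = self._soft_deleted.pop(name, None)
        if expiry is None or self._clock() >= expiry:
            return False
        self._live.add(name)
        return True

    def is_live(self, name: str) -> bool:
        return name in self._live
```

Injecting the clock keeps the model easy to exercise without sleeping
through a real reservation period.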



--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 07:29:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Hello Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20773

to look at the new patch set (#4).

Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added e2e tests.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 112 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/4
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Hello Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20773

to look at the new patch set (#3).

Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added e2e tests.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 107 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/3
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20773

to look at the new patch set (#2).

Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added an e2e test.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 72 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/2
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20773 )


Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added an e2e test.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 73 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/1
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12535: Fix misleading metric keys for the threadz page

2023-11-14 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20658 )

Change subject: IMPALA-12535: Fix misleading metric keys for the threadz page
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20658
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15a8cf0a318bc7122d1f5df29f18d8e467249ef7
Gerrit-Change-Number: 20658
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 14 Nov 2023 12:44:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12544: Add additional query progress reporting for the shell

2023-11-08 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20672 )

Change subject: IMPALA-12544: Add additional query progress reporting for the 
shell
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20672/5/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/20672/5/shell/impala_shell.py@1317
PS5, Line 1317:
nit: Remove this blank?



--
To view, visit http://gerrit.cloudera.org:8080/20672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11a704885505442b7499a026fcee3b86696cd064
Gerrit-Change-Number: 20672
Gerrit-PatchSet: 5
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 Nov 2023 07:28:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12377: Improve count(*) performance for jdbc external table

2023-11-08 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20653 )

Change subject: IMPALA-12377: Improve count(*) performance for jdbc external 
table
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20653
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6
Gerrit-Change-Number: 20653
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 09 Nov 2023 07:19:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12377: Improve count(*) performance for jdbc external table

2023-11-05 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20653 )

Change subject: IMPALA-12377: Improve count(*) performance for jdbc external 
table
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20653
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6
Gerrit-Change-Number: 20653
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sun, 05 Nov 2023 06:46:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12376: DataSourceScanNode drop some returned rows

2023-11-02 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20636 )

Change subject: IMPALA-12376: DataSourceScanNode drop some returned rows
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I978d0a65faa63a47ec86a0127c0bee8dfb79530b
Gerrit-Change-Number: 20636
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 02 Nov 2023 06:26:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-10-11 Thread Yifan Zhang (Code Review)
Yifan Zhang has abandoned this change. ( http://gerrit.cloudera.org:8080/20273 )

Change subject: IMPALA-12312: Using correct executor group set info for planning
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-26 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20273 )

Change subject: IMPALA-12312: Using correct executor group set info for planning
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc@671
PS6, Line 671: // Add a default exec group set if no expected group sets were specified.
             : exec_group_sets.emplace_back();
             : exec_group_sets.back().__set_expected_num_executors(FLAGS_num_expected_executors);
> Agree with Riza. We only add 'default' EG if expected EGs are not set.
With or without this change, if some backends/executors in a cluster are
configured with '--executor_groups=default', or without setting the
'executor_groups' flag (an empty string in this flag means 'default'), the
coordinator can find the 'default' group in 'all_groups'. In fact, all
executors and executor groups in the cluster can be found in the cluster
membership snapshot.

Currently (without this change), Impala sends only the 'default' EG to the
frontend whenever a 'default' group can be found in the cluster membership
snapshot, regardless of whether the 'expected_exec_group_sets' flag is set.

IIUC, 'cluster in multiple executor group set mode' means that multiple EG
sets are configured in the startup flag 'expected_exec_group_sets', and Impala
should assume that the cluster has only one 'default' EG if that flag is not
set. Is this right? If so, it seems we need to check that the EGs in
'expected_exec_group_sets' and the EGs in the cluster membership snapshot are
consistent, and that the default group does not coexist with non-default
groups. Without such a check, the frontend will see different cluster members
than the backends and can't make good query plans.
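
The consistency check suggested above could look roughly like the following
Python sketch. It is only an illustration: check_group_consistency, its
arguments, and the assumption that a group belongs to a group set when its
name starts with the set's prefix are invented for this example and do not
reflect the actual ClusterMembershipMgr code.

```python
DEFAULT_GROUP = "default"


def check_group_consistency(expected_prefixes, snapshot_groups):
    """Return a list of problems; an empty list means the two views agree.

    expected_prefixes: group set prefixes from the startup flag (hypothetical).
    snapshot_groups: executor group names seen in the membership snapshot.
    """
    problems = []
    groups = set(snapshot_groups)

    # No expected sets configured: only the 'default' group should exist.
    if not expected_prefixes:
        extra = sorted(groups - {DEFAULT_GROUP})
        if extra:
            problems.append(
                "no expected sets configured, but found groups: " + ", ".join(extra))
        return problems

    # Expected sets configured: 'default' must not coexist with them.
    if DEFAULT_GROUP in groups:
        problems.append("'default' group coexists with expected group sets")

    # Every remaining group must belong to some expected set.
    for name in sorted(groups - {DEFAULT_GROUP}):
        if not any(name.startswith(prefix) for prefix in expected_prefixes):
            problems.append("group '%s' matches no expected group set" % name)
    return problems
```

Returning a list of problems rather than raising on the first one makes it
easier to log every inconsistency in a single pass.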



--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 26 Sep 2023 08:56:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-22 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20273 )

Change subject: IMPALA-12312: Using correct executor group set info for planning
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20273/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/20273/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1974
PS6, Line 1974: isDefaultExecGroupSet(e)) {
  : result.add(new TExecutorGroupSet(e));
> Wrap this into a function?
Done



--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 22 Sep 2023 12:51:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-22 Thread Yifan Zhang (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20273

to look at the new patch set (#7).

Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
there was a default group in an Impala cluster. When a client set a
non-default request pool, the planner still assumed that the query ran on
the default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so planner can generate a
plan based on correct executor group set info. Besides, if no matched
executor group is found, planner falls back to using the default group
for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
5 files changed, 166 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/7
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-21 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20273 )

Change subject: IMPALA-12312: Using correct executor group set info for planning
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc@671
PS6, Line 671: // Add a default exec group set if no expected group sets were specified.
             : exec_group_sets.emplace_back();
             : exec_group_sets.back().__set_expected_num_executors(FLAGS_num_expected_executors);
> Can we double check what is Impala rules on executor group sets configurati
AFAIK, Impala does allow users to configure a 'default' EG and other EGs in a 
cluster.

Yes, the 'default' EG here is just a fallback. If the REQUEST_POOL query option
is set to a non-existent pool (due to misconfiguration, or because some EG sets
have been destroyed for auto-scaling), the coordinator should schedule this
query on the 'default' group:
https://github.com/apache/impala/blob/4d15558b5eaa69e872917c8bbf69dc1dc2146bc5/be/src/scheduling/admission-controller.cc#L2603-L2608.
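
The fallback described above can be sketched in Python as follows. This is a
simplified illustration, not the code behind the link:
pick_executor_groups and its arguments are hypothetical, and matching a
request pool to groups by name prefix is an assumption made for the example.

```python
DEFAULT_GROUP = "default"


def pick_executor_groups(request_pool, available_groups):
    """Choose the groups a query may run on, falling back to 'default'.

    request_pool: value of the REQUEST_POOL query option ("" if unset).
    available_groups: group names from the cluster membership snapshot.
    """
    # Prefer groups that belong to the requested pool (assumed prefix match).
    matches = [g for g in available_groups
               if request_pool and g.startswith(request_pool)]
    if matches:
        return matches
    # No match: fall back to the 'default' group if the cluster has one.
    if DEFAULT_GROUP in available_groups:
        return [DEFAULT_GROUP]
    return []
```

For example, with groups ['default', 'root.large-g1'], a query that sets
REQUEST_POOL to a pool that no longer exists would be scheduled on
['default'].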



--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 21 Sep 2023 13:36:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-19 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20273

to look at the new patch set (#6).

Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
there was a default group in an Impala cluster. When a client set a
non-default request pool, the planner still assumed that the query ran on
the default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so planner can generate a
plan based on correct executor group set info. Besides, if no matched
executor group is found, planner falls back to using the default group
for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
5 files changed, 165 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/6
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-19 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20273

to look at the new patch set (#5).

Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
there was a default group in an Impala cluster. When a client set a
non-default request pool, the planner still assumed that the query ran on
the default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so planner can generate a
plan based on correct executor group set info. Besides, if no matched
executor group is found, planner falls back to using the default group
for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
5 files changed, 163 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/5
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-09-19 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20273

to look at the new patch set (#4).

Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
there was a default group in an Impala cluster. When a client set a
non-default request pool, the planner still assumed that the query ran on
the default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so planner can generate a
plan based on correct executor group set info. Besides, if no matched
executor group is found, planner falls back to using the default group
for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M be/src/service/impala-server.cc
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
7 files changed, 161 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/4
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-10 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20294 )

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20294/5/testdata/bin/copy-udfs-udas.sh
File testdata/bin/copy-udfs-udas.sh:

http://gerrit.cloudera.org:8080/#/c/20294/5/testdata/bin/copy-udfs-udas.sh@58
PS5, Line 58:   cd "${IMPALA_HOME}/java/test-hive-udfs"
> optional: Can we skip this when we are building Impala without the 'notests
It's indeed a bit tricky to set the CMake option again like this.

I updated 'buildall.sh' in the newest patch set: 'BUILD_WITH_NO_TESTS' is now
set to ON only when the '-notests' and '-package' flags are used together, so
the previous test workflow is not impacted.
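
As a rough illustration of that rule, the flag handling can be modeled in
Python. This is not the actual 'buildall.sh' logic: cmake_test_option is a
hypothetical helper, and the only facts taken from this thread are the flag
names and the 'BUILD_WITH_NO_TESTS' option ('-D<name>=<value>' is the
standard way to pass a cache option to CMake).

```python
def cmake_test_option(args):
    """Map buildall.sh-style flags to the BUILD_WITH_NO_TESTS CMake argument."""
    no_tests = "-notests" in args
    packaging = "-package" in args
    # Test targets are dropped only when both flags are given together.
    enabled = no_tests and packaging
    return "-DBUILD_WITH_NO_TESTS=" + ("ON" if enabled else "OFF")
```

With this rule, a plain '-notests' build still generates test targets, which
is what keeps existing workflows such as copy-udfs-udas working.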



--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 10 Aug 2023 12:31:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-10 Thread Yifan Zhang (Code Review)
Hello Quanlong Huang, Michael Smith, Joe McDonnell, Impala Public Jenkins, 
Xiang Yang,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#7).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not
to generate test targets. In order to be consistent with the previous
test workflow, this option is only set ON when building Impala using
the 'buildall.sh' script with the '-notests' and '-package' flags. This
is useful for a packaging build that does not need to build all test
binaries.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
24 files changed, 167 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/7
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-10 Thread Yifan Zhang (Code Review)
Hello Quanlong Huang, Michael Smith, Joe McDonnell, Impala Public Jenkins, 
Xiang Yang,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#6).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not
to generate test targets. In order to be consistent with the previous
test workflow, this option is only set ON when building Impala using
the 'buildall.sh' script with the '-notests' and '-package' flags. This
is useful for a packaging build that does not need to build all test
binaries.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
24 files changed, 167 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/6
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-07 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20294 )

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt
File be/src/catalog/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt@29
PS5, Line 29: if (BUILD_WITH_NO_TESTS)
> Well, for example, if someone add new release target at the bottom of this
Yeah, that makes sense. But I think this kind of error can be easily detected 
by regression tests.



--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 07 Aug 2023 13:21:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-07 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20294 )

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt
File be/src/catalog/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt@29
PS5, Line 29: if (BUILD_WITH_NO_TESTS)
> Hi yifan, I think it'd better to wrap the following code block within the '
Well, I think we can get the same result by returning early without modifying 
too much code. Could you elaborate on why wrapping the code in an if block is 
recommended?



--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 07 Aug 2023 06:38:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-03 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20294 )

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20294/3/bin/bootstrap_build.sh
File bin/bootstrap_build.sh:

http://gerrit.cloudera.org:8080/#/c/20294/3/bin/bootstrap_build.sh@64
PS3, Line 64: ./buildall.sh -notests -so
> Ah, create-load-data calls copy-udfs-udas, which makes a few specific targe
This change was intended to fix the failure in copy-udfs-udas. But it turns out 
that 'bootstrap_build.sh' is called in 'jenkins/build-only.sh', which is not 
used to run all tests.

It seems that we run tests using 'jenkins/dockerized-impala-run-tests.sh', 
which calls './buildall.sh -format -testdata -notests' to build Impala and load 
data. Considering that building all backend tests is not necessary, I fixed the 
issue in copy-udfs-udas by manually running cmake again before building tests.
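
The workaround described above might look roughly like the following. This is 
a dry-run sketch that only prints commands: the function name, the 
IMPALA_HOME fallback path, and the target name 'example-udf-target' are 
placeholders, not the actual contents of copy-udfs-udas.sh. Only the option 
name 'BUILD_WITH_NO_TESTS' comes from this change.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the commands that would re-enable test targets
# and then build only the ones that are needed. All names are placeholders.
plan_test_target_build() {
  local src_dir="$1"; shift
  # Re-run CMake with the option switched off so test targets exist again.
  echo "cmake -DBUILD_WITH_NO_TESTS=OFF ${src_dir}"
  # Build only the requested targets instead of every backend test.
  local target
  for target in "$@"; do
    echo "make ${target}"
  done
}

plan_test_target_build "${IMPALA_HOME:-/path/to/impala}" example-udf-target
```

Printing the commands rather than running them keeps the sketch safe to try 
outside an Impala checkout.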



--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Fri, 04 Aug 2023 04:53:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-03 Thread Yifan Zhang (Code Review)
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#5).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to
generate test targets. The option is set ON when building impala using
the command 'buildall.sh -notests'. This should be useful for a packing
build because cpack built all targets prior to this change.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.
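
As a rough illustration of the flag-to-option mapping described above, the
sketch below shows how a wrapper script might translate '-notests' into a
CMake cache entry. The function name build_cmake_args and its structure are
hypothetical and do not reproduce buildall.sh; only the '-notests' flag and
the 'BUILD_WITH_NO_TESTS' option name come from this change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps wrapper-script flags to a CMake argument.
# It does not reproduce the real buildall.sh logic.
build_cmake_args() {
  local no_tests=OFF
  local arg
  for arg in "$@"; do
    case "$arg" in
      -notests) no_tests=ON ;;
    esac
  done
  echo "-DBUILD_WITH_NO_TESTS=${no_tests}"
}

# Print the argument that would be passed to cmake for each invocation.
build_cmake_args -release -package
build_cmake_args -release -package -notests
```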

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
M testdata/bin/copy-udfs-udas.sh
25 files changed, 174 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/5
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-03 Thread Yifan Zhang (Code Review)
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#4).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to
generate test targets. The option is set ON when building impala using
the command 'buildall.sh -notests'. This should be useful for a packing
build because cpack built all targets prior to this change.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
M testdata/bin/copy-udfs-udas.sh
25 files changed, 174 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/4
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-03 Thread Yifan Zhang (Code Review)
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#3).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to
generate test targets. The option is set ON when building impala using
the command 'buildall.sh -notests'. This should be useful for a packing
build because cpack built all targets prior to this change.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M bin/bootstrap_build.sh
M buildall.sh
25 files changed, 168 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/3
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-01 Thread Yifan Zhang (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20294

to look at the new patch set (#2).

Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to
generate test targets. The option is set ON when building impala using
the command 'buildall.sh -notests'. This should be useful for a packing
build because cpack built all targets prior to this change.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
24 files changed, 167 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/2
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

2023-08-01 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20294 )


Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test 
targets
..

IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets

This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to
generate test targets. The option is set ON when building impala using
the command 'buildall.sh -notests'. This should be useful for a packing
build because cpack built all targets prior to this change.

Testing:
  - Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.

Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
---
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/avro/CMakeLists.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/gutil/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/io/CMakeLists.txt
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/statestore/CMakeLists.txt
M be/src/testutil/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/cache/CMakeLists.txt
M buildall.sh
24 files changed, 164 insertions(+), 63 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/1
--
To view, visit http://gerrit.cloudera.org:8080/20294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Gerrit-Change-Number: 20294
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-07-27 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20273

to look at the new patch set (#2).

Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
one existed in an Impala cluster. When a client set a non-default
request pool, the planner still assumed the query would run on the
default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so the planner can generate
a plan based on the correct executor group set info. In addition, if no
matching executor group is found, the planner falls back to using the
default group for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.
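
The fallback rule described above can be sketched as follows. This is a
simplified illustration, not the planner's code: exact-name matching between
the request pool and a group, the function name select_executor_group, and
the group names are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Simplified sketch of the fallback rule. Exact-name matching and the
# group names are assumptions made for illustration only.
select_executor_group() {
  local requested="$1"; shift
  local default_group="$1"; shift
  local group
  for group in "$@"; do
    if [[ "$group" == "$requested" ]]; then
      echo "$group"
      return 0
    fi
  done
  # No match: fall back to the default group.
  echo "$default_group"
}

select_executor_group "pool-large" "default" "default" "pool-small" "pool-large"
select_executor_group "pool-missing" "default" "default" "pool-small"
```

The first call finds a matching group; the second has no match and returns
the default group instead.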

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
5 files changed, 149 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/2
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning

2023-07-27 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20273 )


Change subject: IMPALA-12312: Using correct executor group set info for planning
..

IMPALA-12312: Using correct executor group set info for planning

Prior to this patch, the planner always selected the default group if
one existed in an Impala cluster. When a client set a non-default
request pool, the planner still assumed the query would run on the
default group and calculated the wrong number of nodes and instances.

This patch fixes it by including both default and non-default groups
in the update message sent from BE to FE, so the planner can generate
a plan based on the correct executor group set info. In addition, if no
matching executor group is found, the planner falls back to using the
default group for planning, which is consistent with BE's behavior in
GetExecutorGroupsForQuery.

Tests:
  - Add new test cases to ClusterMembershipMgrUnitTest.
  - Add e2e test to verify the new behavior of planner.

Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
---
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
5 files changed, 149 insertions(+), 43 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/1
--
To view, visit http://gerrit.cloudera.org:8080/20273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80
Gerrit-Change-Number: 20273
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support

2023-07-14 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18939 )

Change subject: IMPALA-10262: RPM/DEB Packaging Support
..


Patch Set 10:

> Patch Set 10: Code-Review+2
>
> I filed IMPALA-12288 to track having a mode that avoids building the tests 
> when packaging.
>
> I'm bumping up to +2, because I think we can address any issues we find in 
> subsequent changes.

That makes sense to me. Thanks for filing the JIRA.


--
To view, visit http://gerrit.cloudera.org:8080/18939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Gerrit-Change-Number: 18939
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sat, 15 Jul 2023 01:19:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support

2023-07-14 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18939 )

Change subject: IMPALA-10262: RPM/DEB Packaging Support
..


Patch Set 10: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Gerrit-Change-Number: 18939
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sat, 15 Jul 2023 01:16:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support

2023-07-13 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18939 )

Change subject: IMPALA-10262: RPM/DEB Packaging Support
..


Patch Set 10:

I ran the command './buildall.sh -noclean -notests -release -package' and found 
that the ctests were still built even with the '-notests' option.


--
To view, visit http://gerrit.cloudera.org:8080/18939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Gerrit-Change-Number: 18939
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 13 Jul 2023 09:26:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12188: Avoid unnecessary output from sourcing bin/impala-config.sh

2023-07-10 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20098 )

Change subject: IMPALA-12188: Avoid unnecessary output from sourcing 
bin/impala-config.sh
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4e39f50c7efb8c42a6d3597be0e18c4c79457c5
Gerrit-Change-Number: 20098
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 10 Jul 2023 08:16:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12249: Fix the unexpected word wrap of 'Progress' in WebUI queries page

2023-06-28 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20130 )

Change subject: IMPALA-12249: Fix the unexpected word wrap of 'Progress' in 
WebUI queries page
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I894ada826282d33c3f2395231db1ddf97bc82367
Gerrit-Change-Number: 20130
Gerrit-PatchSet: 1
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 28 Jun 2023 09:00:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support

2023-06-26 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18939 )

Change subject: IMPALA-10262: RPM/DEB Packaging Support
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18939/8/package/bin/impala-env.sh
File package/bin/impala-env.sh:

http://gerrit.cloudera.org:8080/#/c/18939/8/package/bin/impala-env.sh@32
PS8, Line 32: export LIBHDFS_OPTS="${LIBHDFS_OPTS:-} -Djava.library.path=${HADOOP_LIB_DIR}/native/"
:   echo "Using hadoop native libs in ${HADOOP_LIB_DIR}/native/"
: else
:   echo "WARNING: HDFS short-circuit reads are not enabled due to HADOOP_HOME not set."
nit: Could we also pack hadoop native libs into the final package?



--
To view, visit http://gerrit.cloudera.org:8080/18939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Gerrit-Change-Number: 18939
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Mon, 26 Jun 2023 07:04:21 +
Gerrit-HasComments: Yes

