[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 13:

(4 comments)

Hi All,
David has run several benchmarks, and the perf numbers seem to improve with 
this async IO prototype.
I will proceed with cleaning up the code and adding a proper commit message. 
Here are some items that I plan to address next.

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.h@123
PS13, Line 123:  // ExecEnv::GetInstance()->disk_io_mgr()->max_buffer_size();
Can be removed?


http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc@300
PS13, Line 300: // stream_->ReleaseCompletedResources(true);
  : stream_->ReleaseCompletedResources(false);
Calling 'ReleaseCompletedResources(true)' seems to be OK here?


http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-scan-node-base.cc@821
PS13, Line 821:   // DCHECK_LE(offset + len, 
GetFileDesc(metadata->partition_id, file)->file_length)
  :   //<< "Scan range beyond end of file (offset=" << offset 
<< ", len=" << len << ")";
Can be removed?


http://gerrit.cloudera.org:8080/#/c/15370/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15370/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2134
PS13, Line 2134: for (SlotDescriptor slot: desc_.getSlots()) {
Just a note for ourselves: we found a corner case here for "select count(*)" 
queries over ORC.
Somehow, desc_.getSlots() is empty in this corner case, but 
HdfsOrcScanner::StartColumnReading actually sees a couple of streams that are 
eligible for async read.

Patch set 12 already adds a workaround within 
HdfsOrcScanner::StartColumnReading that calls TryIncreaseReservation for 8KB 
(min_buffer_size) for each eligible stream. If the reservation can't be 
increased, the remaining streams will be read synchronously. I will file a 
follow-up JIRA to document this situation.
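
A minimal sketch of the fallback described above, assuming the workaround tries to reserve min_buffer_size per eligible stream and switches every remaining stream to synchronous reads once one increase fails. ReservationSketch, ReadMode and PlanStreamReads are hypothetical names for illustration only; they are not Impala's classes, and the real logic in HdfsOrcScanner::StartColumnReading may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory reservation with a fixed upper limit.
class ReservationSketch {
 public:
  explicit ReservationSketch(int64_t limit) : limit_(limit) {}

  // Grows the reservation by 'bytes' and returns true if it still fits under
  // the limit; otherwise leaves the reservation unchanged and returns false.
  bool TryIncrease(int64_t bytes) {
    if (used_ + bytes > limit_) return false;
    used_ += bytes;
    return true;
  }

  int64_t used() const { return used_; }

 private:
  int64_t limit_;
  int64_t used_ = 0;
};

enum class ReadMode { ASYNC, SYNC };

// Assigns a read mode to each eligible stream. A stream is read asynchronously
// only if 'min_buffer_size' bytes could be reserved for it. After the first
// failed increase, all remaining streams fall back to synchronous reads.
std::vector<ReadMode> PlanStreamReads(
    ReservationSketch* reservation, int num_streams, int64_t min_buffer_size) {
  assert(reservation != nullptr);
  assert(min_buffer_size > 0);
  std::vector<ReadMode> modes;
  bool exhausted = false;
  for (int i = 0; i < num_streams; ++i) {
    if (!exhausted && reservation->TryIncrease(min_buffer_size)) {
      modes.push_back(ReadMode::ASYNC);
    } else {
      exhausted = true;
      modes.push_back(ReadMode::SYNC);
    }
  }
  return modes;
}
```

With a 20KB limit and three streams at 8KB each, this sketch would mark the first two streams as async and the third as sync.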



--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:17:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9804/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Nov 2021 17:22:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-columnar-scanner.cc
File be/src/exec/hdfs-columnar-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-columnar-scanner.cc@239
PS13, Line 239:   
columnar_scanner_actual_reservation_counter_->UpdateCounter(context_->total_reservation());
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc@208
PS13, Line 208: 
columnar_scanner_actual_reservation_counter_->UpdateCounter(context_->total_reservation());
line too long (95 > 90)
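
For reference, a rough sketch of the kind of per-line style checks whose messages appear in this thread ("line too long", "tab used for whitespace", "line has trailing whitespace"). LintLine is a hypothetical function written for illustration; the actual Jenkins job may use different rules or count line length differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical re-creation of the three style messages seen in this thread.
// Line length is counted in bytes here, which is an assumption.
std::vector<std::string> LintLine(const std::string& line,
                                  std::size_t max_len = 90) {
  std::vector<std::string> issues;
  if (line.size() > max_len) {
    issues.push_back("line too long (" + std::to_string(line.size()) + " > " +
                     std::to_string(max_len) + ")");
  }
  if (line.find('\t') != std::string::npos) {
    issues.push_back("tab used for whitespace");
  }
  if (!line.empty() && (line.back() == ' ' || line.back() == '\t')) {
    issues.push_back("line has trailing whitespace");
  }
  return issues;
}
```

A long call such as the UpdateCounter line flagged above would typically be fixed by breaking it after the opening parenthesis or by storing the argument in a local variable first.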



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Nov 2021 17:02:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-18 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded a new patch set (#13) to the change originally 
created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
14 files changed, 475 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/13
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9799/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 22:12:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 12:

Patch set 12 has the following changes:
1. Remove several debugging logs.
2. Adjust resource-requirements.test (from 
PlannerTest.testResourceRequirements).
3. Add a workaround to increase memory reservation for certain select count cases.

The following queries failed without the workaround from point 3:
select count(*) from complextypes_partitioned.int_array;
select count(*) from complextypes_partitioned.nested_struct.c.d.item 
inner_array;


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 21:54:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded a new patch set (#12) to the change originally 
created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
14 files changed, 475 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/12
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9794/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 16:44:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9793/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 16:34:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 11:

Patch set 11 fixes a stream positioning bug and resolves some e2e test failures.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 16:24:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded a new patch set (#11) to the change originally 
created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
13 files changed, 517 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/11
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 10:

Patch set 10 is a rebase of patch set 9 onto the recent master branch.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 17 Nov 2021 16:14:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded a new patch set (#10) to the change originally 
created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
12 files changed, 494 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/10
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7464/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Sep 2021 04:50:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9446/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 22:56:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 9:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc
File be/src/exec/hdfs-columnar-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@153
PS9, Line 153:   //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << "reserved: " << min_buffer_size * 
col_range_lengths.size();
line too long (138 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@158
PS9, Line 158:LOG(INFO) << "col_range_lengths: " << col_range_lengths[i];
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@160
PS9, Line 160:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@172
PS9, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@172
PS9, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@203
PS9, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
line too long (115 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@203
PS9, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-columnar-scanner.cc@209
PS9, Line 209:LOG(INFO) << "column reservation: " << tmp_reservation.second;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@112
PS9, Line 112:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@113
PS9, Line 113:LOG(INFO) << "Read random from orc. offset: " << offset << " 
length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@124
PS9, Line 124:LOG(INFO) << "Read async orc. offset: " << offset << " 
length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@147
PS9, Line 147:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@203
PS9, Line 203: unique_ptr stream = 
stripe.getStreamInformation(stream_id);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@278
PS9, Line 278:  DCHECK(false);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@289
PS9, Line 289:return status;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@290
PS9, Line 290:}
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@291
PS9, Line 291://LOG(INFO) << "HdfsOrcScanner::ColumnRange::read skipping: " 
<< (offset - position_);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-orc-scanner.cc@304
PS9, Line 304:  //LOG(INFO) << "HdfsOrcScanner::ColumnRange::read stream 
finished: ";
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15370/9/be/src/exec/hdfs-scan-node-base.cc@823
PS9, Line 823:   if (offset + len > GetFileDesc(metadata->partition_id, 
file)->file_length) return nullptr;
line too long (92 > 90)
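
As an aside on the check itself, here is a small sketch of the range test that the flagged line performs, written so that the addition cannot overflow. RangeWithinFile and DebugCheckRange are hypothetical helpers for illustration and are not part of Impala; the debug-only variant only loosely mirrors the commented-out DCHECK_LE quoted in the patch set 13 comments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if the byte range [offset, offset + len)
// lies entirely within a file of 'file_length' bytes.
bool RangeWithinFile(int64_t offset, int64_t len, int64_t file_length) {
  if (offset < 0 || len < 0 || file_length < 0) return false;
  // Compare via subtraction so that 'offset + len' cannot overflow int64_t.
  return offset <= file_length && len <= file_length - offset;
}

// Debug-only form: aborts in builds where assert() is enabled. A runtime form
// would instead return an error (or nullptr) to the caller.
void DebugCheckRange(int64_t offset, int64_t len, int64_t file_length) {
  assert(RangeWithinFile(offset, len, file_length));
  // Avoid unused-parameter warnings when assert() is compiled out.
  (void)offset;
  (void)len;
  (void)file_length;
}
```

Splitting the condition into a named helper like this would also keep the calling line under the 90-column limit.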



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 22:04:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7464/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 22:04:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#9).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
12 files changed, 496 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/9
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7462/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 20:28:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9444/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 19:08:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7462/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 18:50:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7461/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 18:47:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 8:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc
File be/src/exec/hdfs-columnar-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@153
PS8, Line 153:   //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << "reserved: " << min_buffer_size * 
col_range_lengths.size();
line too long (138 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@158
PS8, Line 158:LOG(INFO) << "col_range_lengths: " << col_range_lengths[i];
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@160
PS8, Line 160:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@172
PS8, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@172
PS8, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@203
PS8, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
line too long (115 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@203
PS8, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-columnar-scanner.cc@209
PS8, Line 209:LOG(INFO) << "column reservation: " << tmp_reservation.second;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@112
PS8, Line 112:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@113
PS8, Line 113://LOG(INFO) << "Read random from orc. offset: " << offset << 
" length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@124
PS8, Line 124://LOG(INFO) << "Read async orc. offset: " << offset << " 
length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@145
PS8, Line 145:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@198
PS8, Line 198: unique_ptr stream = 
stripe.getStreamInformation(stream_id);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@269
PS8, Line 269:  DCHECK(false);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@280
PS8, Line 280:return status;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@281
PS8, Line 281:}
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@282
PS8, Line 282://LOG(INFO) << "HdfsOrcScanner::ColumnRange::read skipping: " 
<< (offset - position_);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/8/be/src/exec/hdfs-orc-scanner.cc@295
PS8, Line 295:  //LOG(INFO) << "HdfsOrcScanner::ColumnRange::read stream 
finished: ";
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 18:45:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#8).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 483 insertions(+), 191 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/8
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9441/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 13:03:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7461/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 12:42:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc
File be/src/exec/hdfs-columnar-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@153
PS7, Line 153:   //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << "reserved: " << min_buffer_size * 
col_range_lengths.size();
line too long (138 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@158
PS7, Line 158:LOG(INFO) << "col_range_lengths: " << col_range_lengths[i];
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@160
PS7, Line 160:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@172
PS7, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@172
PS7, Line 172:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute <<
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@203
PS7, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
line too long (115 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@203
PS7, Line 203:  //LOG(INFO) << "reservation_to_distribute: " << 
reservation_to_distribute << " bytes to add " << bytes_to_add;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-columnar-scanner.cc@209
PS7, Line 209:LOG(INFO) << "column reservation: " << tmp_reservation.second;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@112
PS7, Line 112://LOG(INFO) << "Read random from orc. offset: " << offset << 
" length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@123
PS7, Line 123://LOG(INFO) << "Read async orc. offset: " << offset << " 
length: " << length;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@191
PS7, Line 191: unique_ptr stream = 
stripe.getStreamInformation(stream_id);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@262
PS7, Line 262:  DCHECK(false);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@273
PS7, Line 273:return status;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@274
PS7, Line 274:}
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@275
PS7, Line 275://LOG(INFO) << "HdfsOrcScanner::ColumnRange::read skipping: " 
<< (offset - position_);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/15370/7/be/src/exec/hdfs-orc-scanner.cc@288
PS7, Line 288:  //LOG(INFO) << "HdfsOrcScanner::ColumnRange::read stream 
finished: ";
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Sep 2021 12:40:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-09 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#7).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 476 insertions(+), 191 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/7
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7455/


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Sep 2021 17:00:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9425/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Sep 2021 11:49:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7455/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Sep 2021 11:27:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-07 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#6).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 395 insertions(+), 184 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/6
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7454/


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Sep 2021 16:53:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9423/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Sep 2021 13:09:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7454/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Sep 2021 12:47:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-09-06 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#5).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 395 insertions(+), 181 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/5
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:15:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9411/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:41:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7447/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#3).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 395 insertions(+), 181 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/3
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9257/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 09 Aug 2021 14:58:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 2:

Note that this was quite a hacky implementation. The problem is that when the 
ORC lib reads from the file, it only gives us an offset and a length, and we do 
not know which column (or stream) it is trying to read. So we build a map of 
ranges beforehand (HdfsOrcScanner::StartColumnReading), try to guess which 
range to advance on every individual read call, and fall back to sync IO if 
the read is not what we expected (HdfsOrcScanner::ScanRangeInputStream::read).

This seems to work, but changes in the ORC lib could easily end up "disabling" 
async scanning by reading in unexpected patterns. The best solution would be to 
move most of the logic into ORC, so that it would return the ranges to us and 
identify the given range in every read call.
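
To make the range-matching idea concrete, here is a minimal, self-contained 
sketch. It is not the code from the patch: RangeTracker, ColumnRange (as defined 
here), ReadPath and the exact matching rule (the read must stay inside one 
registered range and must not seek backwards) are assumptions made for 
illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one column stream range registered up front.
struct ColumnRange {
  uint64_t offset;    // start of the stream in the file
  uint64_t length;    // total number of bytes in the stream
  uint64_t position;  // next file offset we expect the library to read
};

// Which IO path served a read in this sketch.
enum class ReadPath { kAsync, kSync };

// Hypothetical tracker: guesses which pre-registered range a read belongs to.
class RangeTracker {
 public:
  // Registers a range before reading starts (the role that
  // HdfsOrcScanner::StartColumnReading plays in the description above).
  void AddRange(uint64_t offset, uint64_t length) {
    assert(length > 0);
    ranges_[offset] = ColumnRange{offset, length, offset};
  }

  // Called for every read the library issues. Only offset and length are
  // known, so the owning range has to be inferred from them.
  ReadPath Read(uint64_t offset, uint64_t length) {
    // Find the last range that starts at or before 'offset'.
    auto it = ranges_.upper_bound(offset);
    if (it == ranges_.begin()) return Fallback();
    --it;
    ColumnRange& range = it->second;
    const uint64_t end = range.offset + range.length;
    // Assumed rule: use the async range only if the read does not seek
    // backwards and does not run past the end of the range.
    if (offset < range.position || offset + length > end) return Fallback();
    // Skip any gap before 'offset', then consume 'length' bytes.
    range.position = offset + length;
    ++async_reads_;
    return ReadPath::kAsync;
  }

  int async_reads() const { return async_reads_; }
  int sync_reads() const { return sync_reads_; }

 private:
  // Unexpected read pattern: count it and report the sync IO path.
  ReadPath Fallback() {
    ++sync_reads_;
    return ReadPath::kSync;
  }

  std::map<uint64_t, ColumnRange> ranges_;  // keyed by start offset
  int async_reads_ = 0;
  int sync_reads_ = 0;
};
```

With ranges registered at [100, 150) and [200, 230), a read of (100, 20) 
followed by (130, 20) stays on the async path, while a backwards read such as 
(110, 5) afterwards takes the sync path. The sketch also shows why the approach 
is fragile: any change in the library's read pattern silently shifts reads to 
the sync path.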


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 09 Aug 2021 14:44:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15370 )


Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 386 insertions(+), 188 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/2
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer