Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................


Patch Set 22:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc@484
PS21, Line 484:
> I believe it is unique_ptr here because orc::createReader first parameter a
Ah, yeah, that's a problem. Then we need to make sure 'input_stream_' isn't 
used after the corresponding 'reader_' is destroyed. I don't see any other uses 
of 'input_stream_' outside this method. Can we just make it a local variable?
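
Roughly what I have in mind, as a minimal sketch rather than the actual patch. It assumes the reader takes ownership of the stream through a unique_ptr; FakeStream, FakeReader, CreateReader and OpenFile are hypothetical stand-ins, not the real ORC or Impala classes:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the scanner's input stream.
struct FakeStream {
  std::string name;
};

// Hypothetical stand-in for the ORC reader.
class FakeReader {
 public:
  explicit FakeReader(std::unique_ptr<FakeStream> stream)
      : stream_(std::move(stream)) {}
  const std::string& stream_name() const { return stream_->name; }

 private:
  // The reader owns the stream, so the stream is destroyed with the reader.
  std::unique_ptr<FakeStream> stream_;
};

// Stand-in for a factory that takes ownership of the stream.
std::unique_ptr<FakeReader> CreateReader(std::unique_ptr<FakeStream> stream) {
  return std::make_unique<FakeReader>(std::move(stream));
}

std::unique_ptr<FakeReader> OpenFile(const std::string& filename) {
  // Local variable instead of a member: after the move below, nothing outside
  // the reader holds a pointer that could dangle once the reader is destroyed.
  auto input_stream = std::make_unique<FakeStream>(FakeStream{filename});
  return CreateReader(std::move(input_stream));
}
```

With this shape the only way to reach the stream is through the reader, so there is no separate member to keep in sync with the lifetime of 'reader_'.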


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@114
PS22, Line 114: VLOG_FILE
Could you change this to a warning or use VLOG_QUERY? Otherwise it's hard to 
notice in practice.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@128
PS22, Line 128:     string msg = Substitute("Invalid read len on ORC file $0.", filename_);
Can we report the offset and length here?
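
Something along these lines, as a sketch only. InvalidReadMsg is a hypothetical helper and uses std::ostringstream to stay self-contained; the real code would presumably keep using Substitute() with extra arguments:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: builds the error text with the offending offset and
// length included, so the failing read can be identified from the log alone.
std::string InvalidReadMsg(const std::string& filename, uint64_t offset,
                           uint64_t length) {
  std::ostringstream ss;
  ss << "Invalid read len on ORC file " << filename << ". offset: " << offset
     << " length: " << length;
  return ss.str();
}
```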


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@137
PS22, Line 137:   // Set expected_local to false to avoid cache on stale data (IMPALA-6830)
              :   bool expected_local = false;
Not related to this patch, but I think we need to revisit this as IMPALA-6830 
is not an issue.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@238
PS22, Line 238: range.length_
Shouldn't this be "range.offset_ + range.length_"?
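
To illustrate the distinction, a small sketch under my own assumptions: Range, RangeEnd and Contains are hypothetical stand-ins, and I'm assuming offset_ is an absolute file offset, so the end of the range is offset_ + length_ rather than length_ alone:

```cpp
#include <cstdint>

// Hypothetical stand-in for the range struct used in the patch.
struct Range {
  uint64_t offset_;
  uint64_t length_;
};

// Exclusive end of the range in file coordinates.
uint64_t RangeEnd(const Range& range) { return range.offset_ + range.length_; }

// True if [offset, offset + length) lies fully inside 'range'.
bool Contains(const Range& range, uint64_t offset, uint64_t length) {
  return offset >= range.offset_ && offset + length <= RangeEnd(range);
}
```

Comparing against length_ alone would only be right for a range that starts at offset 0.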


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@261
PS22, Line 261:   if (offset < current_position_) {
              :     DCHECK(false);
              :     string msg = Substitute(
              :         "ORC read request to already read range. offset: $0 length: $1 pos: $2 $3",
              :         offset, length, current_position_, debug());
              :     return Status(msg);
              :   }
I think we can change this to DCHECK(offset >= current_position_) now, since 
we've moved the check to line 113.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@287
PS22, Line 287: // TODO: extend Orc interface to avoid the copy
Could you create a JIRA for this?



--
To view, visit http://gerrit.cloudera.org:8080/15370

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 22
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Fri, 21 Jan 2022 03:06:40 +0000
Gerrit-HasComments: Yes
