Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 )
Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................


Patch Set 22:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc@484
PS21, Line 484:
> I believe it is unique_ptr here because orc::createReader first parameter a
Ah, yeah, that's a problem. Then we need to make sure 'input_stream_' won't be used if the corresponding 'reader_' is destroyed. I don't see any other usages of 'input_stream_' outside this method. Can we just make it a local variable?


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@114
PS22, Line 114: VLOG_FILE
Could you change this to a warning or use VLOG_QUERY? Otherwise it's hard to notice in practice.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@128
PS22, Line 128: string msg = Substitute("Invalid read len on ORC file $0.", filename_);
Can we report the offset and length here?


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@137
PS22, Line 137: // Set expected_local to false to avoid cache on stale data (IMPALA-6830)
              : bool expected_local = false;
Not related to this patch, but I think we need to revisit this as IMPALA-6830 is not an issue.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@238
PS22, Line 238: range.length_
Shouldn't this be "range.offset_ + range.length_"?


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@261
PS22, Line 261: if (offset < current_position_) {
              :   DCHECK(false);
              :   string msg = Substitute(
              :       "ORC read request to already read range. offset: $0 length: $1 pos: $2 $3",
              :       offset, length, current_position_, debug());
              :   return Status(msg);
              : }
I think we can change this to DCHECK(offset >= current_position_) now, since we've moved the check to line 113.


http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@287
PS22, Line 287: // TODO: extend Orc interface to avoid the copy
Could you create a JIRA for this?


-- 
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 22
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Fri, 21 Jan 2022 03:06:40 +0000
Gerrit-HasComments: Yes
