Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14406 )

Change subject: IMPALA-9033: log on slow HDFS I/Os
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14406/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/14406/1/be/src/runtime/io/hdfs-file-reader.cc@183
PS1, Line 183:         LOG(INFO) << "Slow FS I/O operation on " << 
*scan_range_->file_string() << " for "
I forgot to log the time taken, fixed that.


http://gerrit.cloudera.org:8080/#/c/14406/1/be/src/runtime/io/hdfs-file-reader.cc@178
PS1, Line 178:       // Log diagnostics
             :       int64_t elapsed_time = 
req_context_read_timer.ElapsedTime();
             :       bool is_slow_read = elapsed_time
             :           > FLAGS_fs_slow_read_log_threshold_ms * 
NANOS_PER_MICRO * MICROS_PER_MILLI;
             :       if (is_slow_read) {
             :         LOG(INFO) << "Slow FS I/O operation on " << 
*scan_range_->file_string() << " for "
             :                   << "instance " << 
PrintId(scan_range_->reader_->instance_id())
             :                   << " of query " << 
PrintId(scan_range_->reader_->query_id()) << ". "
             :                   << "Last read returned "
             :                   << 
PrettyPrinter::PrintBytes(current_bytes_read) << ". "
             :                   << "This thread has read " << 
PrettyPrinter::PrintBytes(*bytes_read)
             :                   << "/" << 
PrettyPrinter::PrintBytes(bytes_to_read)
             :                   << " starting at offset " << file_offset << " 
in this I/O scheduling "
             :                   << "quantum.";
             :       }
> One thought: If an IO took a long time and then failed, it would be nice to
That's a good point, we really want to know in that case too.

I had thought about the multiple calls problem as well and my primary reason to 
do it this way was to minimise the invasiveness. It does also have the 
advantage that it'll catch issues where the individual reads add up enough to 
block the I/O thread for a significant time. In those cases once you pass the 
threshold, you'll get multiple logs so should be able to see if each additional 
call added a significant amount of time.



--
To view, visit http://gerrit.cloudera.org:8080/14406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1929921495706b482d91d91cffe27bee4478f5c4
Gerrit-Change-Number: 14406
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 10 Oct 2019 19:10:54 +0000
Gerrit-HasComments: Yes

Reply via email to