Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors ......................................................................
Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/8011/3//COMMIT_MSG Commit Message: Line 19: - Then during error handling, the BaseSequenceScanner calls SkipToSync() I wonder if the same bug can happen for text files. I think parquet may be OK since it gives up on the row group when it encounters an error. http://gerrit.cloudera.org:8080/#/c/8011/3/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS3, Line 187: Abort I'm wondering whether we should abort the whole query or abort scanning this file only by setting eos_. The second seems more in the spirit of abort_on_error=0. http://gerrit.cloudera.org:8080/#/c/8011/3/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: Line 156: Status status(scan_range_->GetNext(&io_buffer_)); nit: Status status = scan_range_->GetNext(...) I guess it's equivalent but I had to read it twice since it's different from the normal idiom. Line 193: return Status(TErrorCode::GENERAL_IO_ERROR, Substitute("Failed to receive buffer " This would be really confusing if a user hit this. I think we should make it clear that it's an internal error from the message. We should also file a JIRA to remove this, since we don't want this workaround hanging around indefinitely. Line 364: if (status.ok() || status.code() == TErrorCode::GENERAL_IO_ERROR) return status; Can we file a follow-on JIRA to rework this - e.g. ensure that all HDFS read errors coming out of the I/O manager are identifiable. Line 365: Status wrapped_status(TErrorCode::GENERAL_IO_ERROR); Does the wrapped error show up in the error log and query profile ok? I know there's some weirdness with merged errors getting dropped in some cases: IMPALA-4697 http://gerrit.cloudera.org:8080/#/c/8011/3/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 337: ("GENERAL_IO_ERROR", 110, "General I/O error."), Can we call this something like SCANNER_CONTEXT_WRAPPED_IO_ERROR for now? It's a bit confusing since it sounds like it's more general than it is. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
