Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 )
Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS ...................................................................... Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@584 PS1, Line 584: isPathOnFileSystem > This patch adds requirement for checking existence. So we can only used it It also returns true if the parameter is a qualified path (which is always the case currently), and its URI/scheme belongs to the filesystem ('path.equals(qp))'), but does not exist. So in that sense the name 'isPathExistsOnFileSystem' could be misleading. Currently what this method does is: * For qualified paths: return /*fast-path=*/ path.equals(qualifiedPath) || /*slow-path=*/ fs.exists(path) * For unqualified paths: return /*slow-path=*/ fs.exists(path) http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588 PS1, Line 588: return fs.exists(path); > I think it'd be good if we have unit tests in FileSystemUtilTest on this me Good idea. We already have the necessary configuration in the dev environment to test the different file system implementations, so it wasn't too hard to add an extra unit test to FileSystemUtilTest. http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588 PS1, Line 588: return fs.exists(path); > I wonder if there is a need to call fs.exists(path) here, since a relative This would be similar to the 'base' version. It wouldn't work because AzureBlobFileSystem doesn't throw an exception when it sees a different qualified path than fs. E.g. it happily substitutes 'hdfs://' to 'abfs://'. Currently we only use this function in relocateFile() to decide whether the source file is on the destination file system. So it seemed rational to me to check existence, but only if 'path.equals(qp)' returns false. If we can assume that path is always qualified, then the following code should be enough: try { Path qp = fs.makeQualified(path); return path.equals(qp)); } Maybe I could add a precondition like this to ensure that we are dealing with a qualified path: Precondition.checkState(!path.equals(Path.getPathWithoutSchemeAndAuthority(path)); http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@590 PS1, Line 590: // Path is not on fs. > Maybe it's worth some logs here. So we know what path (whether qualified or Added a debug log. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 20 Apr 2021 17:02:55 +0000 Gerrit-HasComments: Yes
