[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966638#comment-13966638 ] Daryn Sharp commented on HDFS-6143: --- bq. [...] It may not work since the seek offset is missing in open. There is no way to calculate the redirect. I have an internal 0.23 patch that I'm reworking for 2.x that will actually use the block locations for opens - HDFS-6221. The current incarnation fetches block locations for just the offset when the http connection stream is opened, but it could easily be changed to fetch all the block locations when the webhdfs open is called - which will elicit a FNF - and then use the locations for a given offset when the http connection stream is opened. bq. Short term, looking at the ByteRangeInputStream, it's inefficient in that for even a single byte forward seek (seek(getPos()+1), it closes the connection and re-opens it [...] read-ahead for short range seeks, which is a lot more efficient Yes, I've already tinkered with fixing this very problem. Internally we found that a fraction of jobs actually perform seeks after the split offset seek, and those that did seek would only do so maybe 1-2 times so it was deemed a low priority fix. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963976#comment-13963976 ] Steve Loughran commented on HDFS-6143: -- Daryn, Having spent time looking at traces of swift FS operations, the combination of Open+seek is ubiquitous, and it is expensive over long-distance links, especially with HTTP in the story. But: we do expect {{open(path)}} to fail if its not there -changing that is a major change in expectations. What would make sense -long term- is for a new operation {{openAt(Path, offset)}}. For any of the HTTP filesystems, this would do a GET from the offset at open time; Short term, looking at the {{ByteRangeInputStream}}, it's inefficient in that for even a single byte forward seek (seek(getPos()+1), it closes the connection and re-opens it, adds the cost of setting up the connection and resets all flow control data on the channel. If you look a {{SwiftNativeInputStream}} you can see how it does read-ahead for short range seeks, which is a lot more efficient for any code that is reading and skipping ahead. Someone should think about doing that as it would reduce the performance of those seeks. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962773#comment-13962773 ] Hudson commented on HDFS-6143: -- FAILURE: Integrated in Hadoop-Yarn-trunk #533 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/533/]) HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths. Contributed by Gera Shegalov. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962963#comment-13962963 ] Hudson commented on HDFS-6143: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1751 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1751/]) HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths. Contributed by Gera Shegalov. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962974#comment-13962974 ] Hudson commented on HDFS-6143: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1725/]) HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths. Contributed by Gera Shegalov. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962999#comment-13962999 ] Daryn Sharp commented on HDFS-6143: --- Unfortunately this appears to introduce a performance penalty for jobs using webhdfs. A file used as job input will open the file and immediately seek to the split location. Currently, the lazy open will only begin streaming from the seek offset. Doesn't this patch cause every map to begin streaming from offset 0, followed by the seek which closes the stream, and then re-opening and streaming from the new offset? If yes this adds unnecessary load, additional latency for an unnecessary connection that will be closed, is wasteful of bandwidth for data that will be ignored, etc. I think the patch should be reverted. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963005#comment-13963005 ] Daryn Sharp commented on HDFS-6143: --- If I understand the intent of this jira correctly, I believe you want to do something similar to the manual redirect resolution ala the two-step write: upon creating the data stream, issue the open to the NN with follow redirects disabled which will elicit an immediate 404. Then upon read use the resolved url. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963139#comment-13963139 ] Gera Shegalov commented on HDFS-6143: - [~daryn], thanks for review. I agree that there is a performance cost associated with streaming too early if we are going to seek right away, e.g, to a split offset. What do think of just invoking {{getFileStatus}} within as the first thing in {{WebHdfsFileSystem.open}} WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963356#comment-13963356 ] Daryn Sharp commented on HDFS-6143: --- A {{getFileStatus}} call will add more load to the NN, and more importantly I don't see how it will address the issue. The NN shouldn't be redirecting you if the file doesn't exist. If it is, that's the bug that must be fixed. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963359#comment-13963359 ] Daryn Sharp commented on HDFS-6143: --- I'm sorry, my brain is elsewhere, ignore previous comment. I think the client should do the two step redirect resolve so the client will get FNF immediately, but the datanode stream will still lazy open. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963419#comment-13963419 ] Tsz Wo Nicholas Sze commented on HDFS-6143: --- ... believe you want to do something similar to the manual redirect resolution ala the two-step write: upon creating the data stream, issue the open to the NN with follow redirects disabled which will elicit an immediate 404. Then upon read use the resolved url. It may not work since the seek offset is missing in open. There is no way to calculate the redirect. [~jira.shegalov], could you explain how this breaks LzoInputFormat.getSplits? I am also concerned about the performance penalty. I wonder if there are other solutions. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963576#comment-13963576 ] Gera Shegalov commented on HDFS-6143: - [~daryn], I think putting calling {{getFileStatus}} in open to get FNF, and keep the old lazy open will have the same effect. {code} diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ index bdf744a..bad1534 100644 --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java @@ -898,6 +898,7 @@ public boolean delete(Path f, boolean recursive) throws IOException { @Override public FSDataInputStream open(final Path f, final int buffersize ) throws IOException { +getFileStatus(f); statistics.incrementReadOps(1); final HttpOpParam.Op op = GetOpParam.Op.OPEN; final URL url = toUrl(op, f, new BufferSizeParam(buffersize)); {code} WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963595#comment-13963595 ] Gera Shegalov commented on HDFS-6143: - [~szetszwo], it's easy to understand the issue if your review the unit test modifications in the patch. As for our production use case, that works fine on the LocalFileSystem and HDFS, please review splittable Lzo in elephant bird. Hadoop-lzo's {{LzoIndex.readIndex}} reads the index file and exploits the fact the FS should [error out opening a non-existing file|https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/compression/lzo/LzoIndex.java#L176], instead of using an RPC to get the file status directly or via {{exists}}. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961652#comment-13961652 ] Hadoop QA commented on HDFS-6143: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638935/HDFS-6143-trunk-after-HDFS-5570.v01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6597//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6597//console This message is automatically generated. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961714#comment-13961714 ] Hadoop QA commented on HDFS-6143: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638943/HDFS-6143-branch-2.4.0.v01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6600//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6600//console This message is automatically generated. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961906#comment-13961906 ] Arun C Murthy commented on HDFS-6143: - One concern: what is the impact on compatibility with 2.2 etc? WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962042#comment-13962042 ] Haohui Mai commented on HDFS-6143: -- {code} +} else if (e instanceof UnresolvedLinkException) { + s = Response.Status.GONE; {code} Should this be 404 instead of 410? In UNIX, opening a unresolved symlink return {{ENOENT}}. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962224#comment-13962224 ] Gera Shegalov commented on HDFS-6143: - Hi [~wheat9], bq. Should this be 404 instead of 410? In UNIX, opening a unresolved symlink return ENOENT. That's server-side. On the client side, {{HttpUrlConnection}} translates {{GONE}} to {{FileNotFoundException}} as well. On another note, if you look at the API for {{AbstractFileSystem}} a potential future WebHdfs implementation will be able to surface the real cause as {{UnresolvedLinkException}} WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962313#comment-13962313 ] Haohui Mai commented on HDFS-6143: -- bq. That's server-side. On the client side, HttpUrlConnection translates GONE to FileNotFoundException as well... It looks to me that it depends on the fact that the server return invalid json rather than relying on the status code. Since this is marked as a blocker, it might be good to keep the changes minimal. Is it okay to remove this changes from this jira? We can address it into a separate jira though. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962372#comment-13962372 ] Haohui Mai commented on HDFS-6143: -- +1 pending jenkins. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962494#comment-13962494 ] Hadoop QA commented on HDFS-6143: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639083/HDFS-6143-trunk-after-HDFS-5570.v02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.hdfs.TestFileAppend4 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6609//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6609//console This message is automatically generated. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962505#comment-13962505 ] Haohui Mai commented on HDFS-6143: -- Test failures are unrelated. I'll commit it shortly. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962512#comment-13962512 ] Hudson commented on HDFS-6143: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5467 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5467/]) HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths. Contributed by Gera Shegalov. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961512#comment-13961512 ] Haohui Mai commented on HDFS-6143: -- {code} + public ByteRangeInputStream(URLOpener o, URLOpener r, boolean connect) + throws IOException { +this(o, r); +if (connect) { + getInputStream(); +} + } {code} {{WebHdfsFileSystem}} is only user of {{ByteRangeInputStream}}. I think it is safe to change the original constructor instead of adding a new one. {code} hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java ... {code} This should be unnecessary, as based on my understanding the code is only invoked at the server side. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961515#comment-13961515 ] Haohui Mai commented on HDFS-6143: -- bq. The v05 patch did not apply because HDFS-5570 removed TestByteRangeInputStream. Was it intentional? Thanks for catching it. The old test only covers the usage from {{HftpFileSystem}}. I'll file a jira to create new tests to test {{ByteRangeInputStream}}, and we don't need to address it in this jira. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961306#comment-13961306 ] Hadoop QA commented on HDFS-6143: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638892/HDFS-6143.v05.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6594//console This message is automatically generated. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961334#comment-13961334 ] Hadoop QA commented on HDFS-6143: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638895/HDFS-6143.v06.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSafeMode {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6595//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6595//console This message is automatically generated. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961339#comment-13961339 ] Gera Shegalov commented on HDFS-6143: - Test failure seems unrelated and reported earlier in HDFS-6160. Rerun org.apache.hadoop.hdfs.TestSafeMode on my laptop succeeded. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. - 'open', does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound, it's deferred until next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-exitsing paths -- This message was sent by Atlassian JIRA (v6.2#6252)