[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-11 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966638#comment-13966638
 ] 

Daryn Sharp commented on HDFS-6143:
---

bq. [...] It may not work since the seek offset is missing in open. There is no 
way to calculate the redirect.

I have an internal 0.23 patch that I'm reworking for 2.x that will actually use 
the block locations for opens - HDFS-6221.  The current incarnation fetches 
block locations for just the offset when the http connection stream is opened, 
but it could easily be changed to fetch all the block locations when the 
webhdfs open is called - which will elicit a FNF - and then use the locations 
for a given offset when the http connection stream is opened.

bq. Short term, looking at the ByteRangeInputStream, it's inefficient in that 
for even a single byte forward seek (seek(getPos()+1), it closes the connection 
and re-opens it [...] read-ahead for short range seeks, which is a lot more 
efficient

Yes, I've already tinkered with fixing this very problem.  Internally we found 
that a fraction of jobs actually perform seeks after the split offset seek, and 
those that did seek would only do so maybe 1-2 times so it was deemed a low 
priority fix.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963976#comment-13963976
 ] 

Steve Loughran commented on HDFS-6143:
--

Daryn,

Having spent time looking at traces of swift FS operations, the combination of 
Open+seek is ubiquitous, and it is expensive over long-distance links, 
especially with HTTP in the story.

But: we do expect {{open(path)}} to fail if its not there -changing that is a 
major change in expectations.

What would make sense -long term- is for a new operation  {{openAt(Path, 
offset)}}. For any of the HTTP filesystems, this would do a GET from the offset 
at open time; 

Short term, looking at the {{ByteRangeInputStream}}, it's inefficient in that 
for even a single byte forward seek (seek(getPos()+1), it closes the connection 
and re-opens it, adds the cost of setting up the connection and resets all flow 
control data on the channel. If you look a {{SwiftNativeInputStream}} you can 
see how it does read-ahead for short range seeks, which is a lot more efficient 
for any code that is reading and skipping ahead. Someone should think about 
doing that as it would reduce the performance of those seeks.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962773#comment-13962773
 ] 

Hudson commented on HDFS-6143:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #533 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/533/])
HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for 
non-existing paths. Contributed by Gera Shegalov. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962963#comment-13962963
 ] 

Hudson commented on HDFS-6143:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1751 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1751/])
HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for 
non-existing paths. Contributed by Gera Shegalov. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962974#comment-13962974
 ] 

Hudson commented on HDFS-6143:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1725/])
HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for 
non-existing paths. Contributed by Gera Shegalov. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962999#comment-13962999
 ] 

Daryn Sharp commented on HDFS-6143:
---

Unfortunately this appears to introduce a performance penalty for jobs using 
webhdfs.

A file used as job input will open the file and immediately seek to the split 
location.  Currently, the lazy open will only begin streaming from the seek 
offset.  Doesn't this patch cause every map to begin streaming from offset 0, 
followed by the seek which closes the stream, and then re-opening and streaming 
from the new offset?

If yes this adds unnecessary load, additional latency for an unnecessary 
connection that will be closed, is wasteful of bandwidth for data that will be 
ignored, etc.  I think the patch should be reverted.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963005#comment-13963005
 ] 

Daryn Sharp commented on HDFS-6143:
---

If I understand the intent of this jira correctly, I believe you want to do 
something similar to the manual redirect resolution ala the two-step write: 
upon creating the data stream, issue the open to the NN with follow redirects 
disabled which will elicit an immediate 404.  Then upon read use the resolved 
url.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963139#comment-13963139
 ] 

Gera Shegalov commented on HDFS-6143:
-

[~daryn], thanks for review. I agree that there is a performance cost 
associated with streaming too early if we are going to seek right away, e.g, to 
a split offset. What do think of just invoking {{getFileStatus}} within as the 
first thing in {{WebHdfsFileSystem.open}} 

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963356#comment-13963356
 ] 

Daryn Sharp commented on HDFS-6143:
---

A {{getFileStatus}} call will add more load to the NN, and more importantly I 
don't see how it will address the issue.  The NN shouldn't be redirecting you 
if the file doesn't exist.  If it is, that's the bug that must be fixed.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963359#comment-13963359
 ] 

Daryn Sharp commented on HDFS-6143:
---

I'm sorry, my brain is elsewhere, ignore previous comment.  I think the client 
should do the two step redirect resolve so the client will get FNF immediately, 
but the datanode stream will still lazy open.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963419#comment-13963419
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6143:
---

 ... believe you want to do something similar to the manual redirect 
 resolution ala the two-step write: upon creating the data stream, issue the 
 open to the NN with follow redirects disabled which will elicit an immediate 
 404. Then upon read use the resolved url.

It may not work since the seek offset is missing in open.  There is no way to 
calculate the redirect.

[~jira.shegalov], could you explain how this breaks LzoInputFormat.getSplits?  
I am also concerned about the performance penalty.  I wonder if there are other 
solutions.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963576#comment-13963576
 ] 

Gera Shegalov commented on HDFS-6143:
-

[~daryn], I think putting calling {{getFileStatus}} in open to get FNF, and 
keep the old lazy open will have the same effect.

{code}
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
 b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/
index bdf744a..bad1534 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
@@ -898,6 +898,7 @@ public boolean delete(Path f, boolean recursive) throws 
IOException {
   @Override
   public FSDataInputStream open(final Path f, final int buffersize
   ) throws IOException {
+getFileStatus(f);
 statistics.incrementReadOps(1);
 final HttpOpParam.Op op = GetOpParam.Op.OPEN;
 final URL url = toUrl(op, f, new BufferSizeParam(buffersize));
{code}


 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963595#comment-13963595
 ] 

Gera Shegalov commented on HDFS-6143:
-

[~szetszwo], it's easy to understand the issue if your review the unit test 
modifications in the patch. As for our production use case, that works fine on 
the LocalFileSystem and HDFS, please review  splittable Lzo in elephant bird. 
Hadoop-lzo's {{LzoIndex.readIndex}} reads the index file and exploits the fact 
the FS should [error out opening a non-existing 
file|https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/compression/lzo/LzoIndex.java#L176],
 instead of using an RPC to get the file status  directly or via {{exists}}.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961652#comment-13961652
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12638935/HDFS-6143-trunk-after-HDFS-5570.v01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6597//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6597//console

This message is automatically generated.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961714#comment-13961714
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12638943/HDFS-6143-branch-2.4.0.v01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6600//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6600//console

This message is automatically generated.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961906#comment-13961906
 ] 

Arun C Murthy commented on HDFS-6143:
-

One concern: what is the impact on compatibility with 2.2 etc?

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962042#comment-13962042
 ] 

Haohui Mai commented on HDFS-6143:
--

{code}
+} else if (e instanceof UnresolvedLinkException) {
+  s = Response.Status.GONE;
{code}

Should this be 404 instead of 410? In UNIX, opening a unresolved symlink return 
{{ENOENT}}.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962224#comment-13962224
 ] 

Gera Shegalov commented on HDFS-6143:
-

Hi [~wheat9],

bq. Should this be 404 instead of 410? In UNIX, opening a unresolved symlink 
return ENOENT.

That's server-side. On the client side, {{HttpUrlConnection}} translates 
{{GONE}} to {{FileNotFoundException}} as well. 

On another note, if you look at the API for {{AbstractFileSystem}}  a potential 
future WebHdfs implementation will be able to surface the real cause as 
{{UnresolvedLinkException}} 

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962313#comment-13962313
 ] 

Haohui Mai commented on HDFS-6143:
--

bq. That's server-side. On the client side, HttpUrlConnection translates GONE 
to FileNotFoundException as well...

It looks to me that it depends on the fact that the server return invalid json 
rather than relying on the status code. Since this is marked as a blocker, it 
might be good to keep the changes minimal. Is it okay to remove this changes 
from this jira? We can address it into a separate jira though.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962372#comment-13962372
 ] 

Haohui Mai commented on HDFS-6143:
--

+1 pending jenkins.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962494#comment-13962494
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12639083/HDFS-6143-trunk-after-HDFS-5570.v02.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestSafeMode
  org.apache.hadoop.hdfs.TestFileAppend4

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6609//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6609//console

This message is automatically generated.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962505#comment-13962505
 ] 

Haohui Mai commented on HDFS-6143:
--

Test failures are unrelated. I'll commit it shortly.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962512#comment-13962512
 ] 

Hudson commented on HDFS-6143:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5467 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5467/])
HDFS-6143. WebHdfsFileSystem open should throw FileNotFoundException for 
non-existing paths. Contributed by Gera Shegalov. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585639)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961512#comment-13961512
 ] 

Haohui Mai commented on HDFS-6143:
--

{code}
+  public ByteRangeInputStream(URLOpener o, URLOpener r, boolean connect)
+  throws IOException {
+this(o, r);
+if (connect) {
+  getInputStream();
+}
+  }
{code}

{{WebHdfsFileSystem}} is only user of {{ByteRangeInputStream}}. I think it is 
safe to change the original constructor instead of adding a new one.

{code}
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java
...
{code}

This should be unnecessary, as based on my understanding the code is only 
invoked at the server side.




 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, 
 HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961515#comment-13961515
 ] 

Haohui Mai commented on HDFS-6143:
--

bq. The v05 patch did not apply because HDFS-5570 removed 
TestByteRangeInputStream. Was it intentional?

Thanks for catching it. The old test only covers the usage from 
{{HftpFileSystem}}. I'll file a jira to create new tests to test 
{{ByteRangeInputStream}}, and we don't need to address it in this jira.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, 
 HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961306#comment-13961306
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638892/HDFS-6143.v05.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6594//console

This message is automatically generated.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, 
 HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v05.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961334#comment-13961334
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638895/HDFS-6143.v06.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestSafeMode

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6595//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6595//console

This message is automatically generated.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, 
 HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-05 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961339#comment-13961339
 ] 

Gera Shegalov commented on HDFS-6143:
-

Test failure seems unrelated and reported earlier in HDFS-6160. Rerun 
org.apache.hadoop.hdfs.TestSafeMode on my laptop succeeded.

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch, 
 HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles 
 non-existing paths. 
 - 'open', does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound, it's deferred until next 
 read. It's counterintuitive and not how local FS or HDFS work. In POSIX you 
 get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of the code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-exitsing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)