[jira] [Comment Edited] (HADOOP-14691) Shell command "hadoop fs -put" multiple close problem

2017-08-08 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118303#comment-16118303
 ] 

Andras Bokor edited comment on HADOOP-14691 at 8/8/17 2:02 PM:
---

It's a very good catch, but the proposed solution seems to make things more 
complicated, and it is not backward-compatible since it changes the signature 
of a method.
Instead, I suggest fixing HADOOP-5943. Using try-with-resources at the same 
place where the resource is created is the better practice and is widely used 
in the Java world.
We can introduce new methods without the closing ability and keep the old ones 
as deprecated to preserve compatibility. I am happy to send a patch for 
HADOOP-5943.
Thoughts?
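
For illustration, a minimal sketch of the shape I have in mind (the helper 
class and method are hypothetical; the three-argument {{IOUtils.copyBytes}} 
overload already exists and does not close its arguments):

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CopySketch {
  // Hypothetical helper: copies src to dst, closing each stream exactly
  // once, in the same scope where it was created.
  static void copy(FileSystem fs, Path src, Path dst, Configuration conf)
      throws IOException {
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    try (InputStream in = fs.open(src);
         FSDataOutputStream out = fs.create(dst)) {
      // This overload only copies; it never closes its arguments, so
      // try-with-resources is the single owner of both streams.
      IOUtils.copyBytes(in, out, bufferSize);
    }
  }
}
{code}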

P.s.: I am removing the linked issue since it is not related to the exception 
in HDFS-10429.


> Shell command "hadoop fs -put" multiple close problem
> -----------------------------------------------------
>
> Key: HADOOP-14691
> URL: https://issues.apache.org/jira/browse/HADOOP-14691
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.7.3
> Environment: CentOS 7.0
> JDK 1.8.0_121
> Hadoop 2.7.3
> Reporter: Eric Lei
> Assignee: Eric Lei
> Labels: close, filesystem, hadoop, multi
> Attachments: CommandWithDestination.patch, 
> hadoop_common_unit_test_result_after_modification.docx, 
> hadoop_common_unit_test_result_before_modification.docx, IOUtils.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> 1. Bug description
> The shell command "hadoop fs -put" is a write operation. In this process, an 
> FSDataOutputStream is created and finally closed. FSDataOutputStream.close() 
> calls the close method in HDFS, which ends the communication between the 
> client and the server for this write.
> With the command "hadoop fs -put", FSDataOutputStream.close() is called twice 
> for each created FSDataOutputStream object, which means the close method of 
> the underlying distributed file system is also called twice. This is an 
> error, because the underlying communication channel, for example a socket, 
> may be shut down repeatedly. If the socket has no protection against a second 
> close, that second close can fail. Further, we think a correct upper-layer 
> file system design should follow the one-time-close principle: each created 
> object of the underlying distributed file system should be closed exactly 
> once.
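> To make the double close concrete, here is a simplified sketch of the 
> pattern on the write path (a sketch only, not the exact Hadoop source):
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IOUtils;
>
> class WriteSketch {
>   // Simplified shape of writeStreamToFile; names and details differ
>   // from the real source.
>   static void writeStreamToFile(FileSystem fs, InputStream in, Path target,
>       Configuration conf) throws IOException {
>     FSDataOutputStream out = null;
>     try {
>       out = fs.create(target);
>       // First close: the 'true' flag asks copyBytes to close both
>       // streams when the copy finishes or fails.
>       IOUtils.copyBytes(in, out, conf, true);
>     } finally {
>       // Second close: the already-closed stream is closed again here.
>       IOUtils.closeStream(out);
>     }
>   }
> }
> {code}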
> For the command "hadoop fs -put", the two close calls occur as follows:
> a. The first close process:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:466)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:391)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
> at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
> at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at

[jira] [Comment Edited] (HADOOP-14691) Shell command "hadoop fs -put" multiple close problem

2017-07-27 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103829#comment-16103829
 ] 

Wei-Chiu Chuang edited comment on HADOOP-14691 at 7/27/17 8:11 PM:
---

Hi [~Eric88], thanks for the detailed report!
It seems what you discovered is similar to HDFS-10429.

I have not yet reviewed the patch in depth, but it looks like it contains 
irrelevant changes. Could you remove those and rebase against trunk? 
Thanks.


> Shell command "hadoop fs -put" multiple close problem
> -----------------------------------------------------
>
> Key: HADOOP-14691
> URL: https://issues.apache.org/jira/browse/HADOOP-14691
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.7.3
> Environment: CentOS 7.0
> JDK 1.8.0_121
> Hadoop 2.7.3
> Reporter: Eric Lei
> Labels: close, filesystem, hadoop, multi
> Attachments: hadoop-2.7.3-src.patch, 
> hadoop_common_unit_test_result_after_modification.docx, 
> hadoop_common_unit_test_result_before_modification.docx
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> 1. Bug description
> The shell command "hadoop fs -put" is a write operation. In this process, an 
> FSDataOutputStream is created and finally closed. FSDataOutputStream.close() 
> calls the close method in HDFS, which ends the communication between the 
> client and the server for this write.
> With the command "hadoop fs -put", FSDataOutputStream.close() is called twice 
> for each created FSDataOutputStream object, which means the close method of 
> the underlying distributed file system is also called twice. This is an 
> error, because the underlying communication channel, for example a socket, 
> may be shut down repeatedly. If the socket has no protection against a second 
> close, that second close can fail. Further, we think a correct upper-layer 
> file system design should follow the one-time-close principle: each created 
> object of the underlying distributed file system should be closed exactly 
> once.
> For the command "hadoop fs -put", the two close calls occur as follows:
> a. The first close process:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:466)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:391)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
> at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
> at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> b. The second close process:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:261)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:468)
> at
>