[jira] [Comment Edited] (HADOOP-14691) Shell command "hadoop fs -put" multiple close problem
[ https://issues.apache.org/jira/browse/HADOOP-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118303#comment-16118303 ] Andras Bokor edited comment on HADOOP-14691 at 8/8/17 2:02 PM:
---
It's a very good catch, but the proposed solution seems to make things more complicated, and it is not backward-compatible since it changes the signature of a method. Instead, I suggest fixing HADOOP-5943. Using try-with-resources at the same place where the resource is created is better practice and widely used in the Java world. We can introduce new methods without the closing behavior and keep the old ones as deprecated to preserve compatibility. I am happy to send a patch for HADOOP-5943. Thoughts?
P.S.: I am removing the linked issue since it is not related to the exception in HDFS-10429.
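The try-with-resources approach suggested above can be sketched roughly as follows. This is a hypothetical simplification, not the actual Hadoop code: `copyBytes` here stands in for a copy helper that never closes its arguments (the HADOOP-5943 idea), and the caller owns the stream lifetimes in one place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyDemo {
    // Copy-only helper: it never closes the streams, leaving lifetime
    // management entirely to the caller.
    static void copyBytes(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
    }

    // The caller creates and closes the streams in one place with
    // try-with-resources, so close() runs exactly once per stream.
    static byte[] copy(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = sink) {
            copyBytes(in, out, 4096);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = copy("hello".getBytes());
        System.out.println(new String(result)); // prints "hello"
    }
}
```

With this split, no second close can come out of the copy helper, because the helper never closes anything in the first place.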
> Shell command "hadoop fs -put" multiple close problem
> -
>
> Key: HADOOP-14691
> URL: https://issues.apache.org/jira/browse/HADOOP-14691
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.7.3
> Environment: CentOS 7.0, JDK 1.8.0_121, Hadoop 2.7.3
> Reporter: Eric Lei
> Assignee: Eric Lei
> Labels: close, filesystem, hadoop, multi
> Attachments: CommandWithDestination.patch, hadoop_common_unit_test_result_after_modification.docx, hadoop_common_unit_test_result_before_modification.docx, IOUtils.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> 1. Bug description
> The shell command "hadoop fs -put" is a write operation. During this operation an FSDataOutputStream is created and closed at the end. FSDataOutputStream.close() then calls the close method of the underlying HDFS stream to end the write-path communication between client and server.
> With "hadoop fs -put", FSDataOutputStream.close() is called twice for each created FSDataOutputStream object, which means the close method of the underlying distributed file system is also called twice. This is a bug, because the communication channel, for example a socket, may be shut down repeatedly.
> Unfortunately, if the socket has no protection against errors, the second close may fail.
> Furthermore, we think a correct upper-layer file system design should follow a one-time-close principle: each created object of the underlying distributed file system should be closed exactly once.
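The failure mode the report describes can be illustrated with a minimal sketch. The class below is hypothetical (it is not one of the Hadoop classes): it models a stream whose close() is not idempotent, standing in for a socket-backed transport with no reclose protection, which is exactly the case where a second close raises an error.

```java
import java.io.IOException;
import java.io.OutputStream;

public class DoubleCloseDemo {
    // A stream whose close() is not idempotent: the second close throws,
    // modeling a transport with no protection against being reclosed.
    static class FragileStream extends OutputStream {
        int closeCount = 0;
        @Override public void write(int b) {}
        @Override public void close() throws IOException {
            closeCount++;
            if (closeCount > 1) {
                throw new IOException("stream already closed");
            }
        }
    }

    // Closes the stream twice, as the -put code path effectively does.
    // Returns the message of the exception thrown by the second close,
    // or null if the second close succeeded.
    static String closeTwice() {
        FragileStream out = new FragileStream();
        try {
            out.close();          // first close: inside the copy helper
        } catch (IOException e) {
            return "first close failed";
        }
        try {
            out.close();          // second close: caller's cleanup path
        } catch (IOException e) {
            return e.getMessage();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("second close: " + closeTwice());
    }
}
```

A stream with an idempotent close() (close-once guard) would make the second call a no-op instead, which is why the one-time-close principle, or an idempotent close, avoids the error.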
> For the command "hadoop fs -put", the double close happens as follows:
> a. The first close:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:466)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:391)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
> at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
> at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at
[jira] [Comment Edited] (HADOOP-14691) Shell command "hadoop fs -put" multiple close problem
[ https://issues.apache.org/jira/browse/HADOOP-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103829#comment-16103829 ] Wei-Chiu Chuang edited comment on HADOOP-14691 at 7/27/17 8:11 PM:
---
Hi [~Eric88], thanks for the detailed report! It seems what you discovered is similar to HDFS-10429. I have not yet reviewed the patch in depth, but it looks like your patch contains irrelevant changes. Could you remove them and rebase against trunk? Thanks.
> Shell command "hadoop fs -put" multiple close problem
> -
>
> Key: HADOOP-14691
> URL: https://issues.apache.org/jira/browse/HADOOP-14691
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.7.3
> Environment: CentOS 7.0, JDK 1.8.0_121, Hadoop 2.7.3
> Reporter: Eric Lei
> Labels: close, filesystem, hadoop, multi
> Attachments: hadoop-2.7.3-src.patch, hadoop_common_unit_test_result_after_modification.docx, hadoop_common_unit_test_result_before_modification.docx
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> 1. Bug description
> The shell command "hadoop fs -put" is a write operation. During this operation an FSDataOutputStream is created and closed at the end. FSDataOutputStream.close() then calls the close method of the underlying HDFS stream to end the write-path communication between client and server.
> With "hadoop fs -put", FSDataOutputStream.close() is called twice for each created FSDataOutputStream object, which means the close method of the underlying distributed file system is also called twice. This is a bug, because the communication channel, for example a socket, may be shut down repeatedly.
> Unfortunately, if the socket has no protection against errors, the second close may fail.
> Furthermore, we think a correct upper-layer file system design should follow a one-time-close principle: each created object of the underlying distributed file system should be closed exactly once.
> For the command "hadoop fs -put", the double close happens as follows:
> a. The first close:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:466)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:391)
> at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
> at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
> at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
> at
org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> b. The second close:
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:261)
> at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:468)
> at