[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886671#comment-16886671 ] K S commented on HADOOP-16378: -- Hey [~ste...@apache.org]. Sorry, but I have unfortunately not been able to reproduce it. I'm not sure I can even post a stack trace, as it is part of my company's IP. I have already posted the reproduction steps in the description of this bug. Hope that helps.

> RawLocalFileStatus throws exception if a file is created and deleted quickly
>
> Key: HADOOP-16378
> URL: https://issues.apache.org/jira/browse/HADOOP-16378
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.3.0
> Environment: Ubuntu 18.04, Hadoop 2.7.3 (though this problem exists on later versions of Hadoop as well), Java 8 (and Java 11)
> Reporter: K S
> Priority: Critical
>
> The bug occurs when NFS creates temporary ".nfs*" files as part of file moves and accesses. If such a file is deleted very quickly after being created, a RuntimeException is thrown. The root cause is the loadPermissionInfo method in org.apache.hadoop.fs.RawLocalFileSystem. To get the permission info, it first runs
>
> {code:java}
> ls -ld
> {code}
>
> and then attempts to get permission info about each file. If a file disappears between these two steps, an exception is thrown.
>
> *Reproduction Steps:*
> An isolated way to reproduce the bug is to run FileInputFormat.listStatus over and over on the same directory in which the temp files are being created. On Ubuntu or any other Linux-based system, this should fail intermittently.
>
> *Fix:*
> One way in which we managed to fix this was to ignore the exception thrown in loadPermissionInfo() if the exit code is 1 or 2. Alternatively, it's possible that turning "useDeprecatedFileStatus" off in RawLocalFileSystem would fix this issue, though we never tested this, and that flag was implemented to fix -HADOOP-9652-. It could also be fixed in conjunction with HADOOP-8772.
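The workaround described in the *Fix:* section above can be sketched as follows. This is a simplified standalone model of the race, not the actual RawLocalFileSystem code; the class and method names are illustrative only:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Simplified model of the loadPermissionInfo race: permissions are fetched
// by shelling out to `ls -ld <path>`, and the file may vanish between the
// directory listing and this call. The workaround treats exit codes 1 and 2
// (used by ls for missing/inaccessible arguments) as "file disappeared" and
// returns null instead of propagating a RuntimeException.
public class PermissionProbe {
    public static String probe(String path) throws Exception {
        Process p = new ProcessBuilder("ls", "-ld", path).start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            line = r.readLine();
        }
        int exit = p.waitFor();
        if (exit == 1 || exit == 2) {
            return null; // file vanished between listing and stat: ignore
        }
        if (exit != 0) {
            throw new RuntimeException("ls failed with exit code " + exit);
        }
        return line.split("\\s+")[0]; // permission string, e.g. "drwxr-xr-x"
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe("/tmp"));
        System.out.println(probe("/no/such/file/exists")); // null, not an exception
    }
}
```

With the guard in place, a concurrently deleted ".nfs*" file yields a null status that the caller can skip, rather than aborting the whole listStatus call.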
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875103#comment-16875103 ] K S commented on HADOOP-16378: -- Sorry, I forgot to try to reproduce it. I will do it over the weekend.
[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871768#comment-16871768 ] K S commented on HADOOP-16378: -- Eh, it'll be a little difficult to reproduce. We discovered the error by accident while running company software, and managed to reproduce it by running a set of programs alongside a bash script that quickly generates and deletes files starting with ".". I will try to reproduce it tomorrow evening.
[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871248#comment-16871248 ] Steve Loughran commented on HADOOP-16378: - Let's not worry about that; the main thing is not to suffer here. What's the full stack trace?
[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868812#comment-16868812 ] K S commented on HADOOP-16378: -- [~ste...@apache.org] I don't see that happening. The loadPermissionInfo method only uses the shell, unless you're looking somewhere else. Moving off the shell entirely would be a good idea, though I'm not familiar enough with this codebase to give any sort of advice. It would be good to have other developers weigh in here.
[jira] [Commented] (HADOOP-16378) RawLocalFileStatus throws exception if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1689#comment-1689 ] Steve Loughran commented on HADOOP-16378: - I'd prefer moving off the shell entirely and into the fs APIs, either Java or Hadoop native. Doesn't it already drop to some native lib if it's available?
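For reference, the shell-free approach suggested here could use the java.nio.file API, which reads permissions in a single call and signals a vanished file with a typed NoSuchFileException instead of a parsed `ls` exit code. A minimal sketch on a POSIX filesystem (illustrative only, not the actual Hadoop code):

```java
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Shell-free permission lookup: no fork/exec, no exit-code parsing, and a
// file deleted between the directory listing and this call surfaces as a
// catchable NoSuchFileException rather than a RuntimeException.
public class NioPermissionProbe {
    public static String probe(String path) throws Exception {
        try {
            Set<PosixFilePermission> perms =
                Files.getPosixFilePermissions(Paths.get(path));
            return PosixFilePermissions.toString(perms); // e.g. "rwxr-xr-x"
        } catch (NoSuchFileException vanished) {
            return null; // file deleted in the race window: skip it
        }
    }
}
```

Note that Files.getPosixFilePermissions throws UnsupportedOperationException on non-POSIX filesystems, which is one reason a native-library fallback would still matter on Windows.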