[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275613#comment-16275613 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

This is great to hear! Finally, this gets in. Thanks [~chris.douglas]!

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the
> permissions
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-14600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14600
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.3
>         Environment: file:// in a dir with many files
>            Reporter: Steve Loughran
>            Assignee: Ping Liu
>             Fix For: 3.1.0
>
>         Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch,
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch,
> HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch,
> HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java,
> command_line_test_result__linux.txt, command_line_test_result__windows.txt
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls
> against the local FS, because the {{FileStatus.getPermission}} call forces
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI
> values.
> That is: what is a field lookup or even a no-op for every other FS is a
> process exec/spawn on the local FS, with all the costs. This gets expensive
> if you have many files.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
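To make the "field lookup" side of the issue description concrete, here is a minimal sketch. It assumes the permission bits are already available in-process as a POSIX-style mode integer (for example, from a native stat call), so no external process is needed. The class and method names are hypothetical and are not Hadoop APIs.

```java
/** Hypothetical helper, not part of Hadoop: decodes POSIX mode bits in-process. */
final class ModeBits {
    private static final char[] FLAGS = {'r', 'w', 'x'};

    /** Converts the low nine permission bits of a mode into "rwxr-xr-x" form. */
    static String toSymbolic(int mode) {
        StringBuilder sb = new StringBuilder(9);
        // Walk from the owner-read bit (bit 8) down to the other-execute bit (bit 0).
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? FLAGS[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    /** Returns true if the sticky bit (octal 01000) is set in the mode. */
    static boolean isSticky(int mode) {
        return (mode & 01000) != 0;
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic(0755));
        System.out.println(isSticky(01777));
    }
}
```

Decoding the mode this way is plain bit arithmetic, which is roughly the cost other file systems pay per file, whereas the behaviour reported in the issue pays for a process exec/spawn per file.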
[jira] [Comment Edited] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273970#comment-16273970 ]

Ping Liu edited comment on HADOOP-14600 at 12/1/17 5:47 AM:
------------------------------------------------------------

Just verified. There is no error! I had missed {{-Pnative}} in the Maven build; that profile is required to generate the JNI native code. Now, after building with {{-Pnative}}, things look good.

I tried the patch in IntelliJ on both Windows and Linux and confirmed that execution flows into the test cases. I also tested from the command-line console. I am attaching the command-line test results from both Windows and Linux (see attachments: {{command_line_test_result__linux.txt}}, {{command_line_test_result__windows.txt}}).

cc: [~chris.douglas], [~steve_l]

was (Author: myapachejira):
Just verified. There is no error! I had missed {{-Pnative}} in the Maven build; that profile is required to generate the JNI native code. Now things look good.

I tried the patch in IntelliJ on both Windows and Linux and confirmed that execution flows into the test cases. I also tested from the command-line console. I am attaching the command-line test results from both Windows and Linux (see attachments: {{command_line_test_result__linux.txt}}, {{command_line_test_result__windows.txt}}).

cc: [~chris.douglas], [~steve_l]
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273970#comment-16273970 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Just verified. There is no error! I had missed {{-Pnative}} in the Maven build; that profile is required to generate the JNI native code. Now things look good.

I tried the patch in IntelliJ on both Windows and Linux and confirmed that execution flows into the test cases. I also tested from the command-line console. I am attaching the command-line test results from both Windows and Linux (see attachments: {{command_line_test_result__linux.txt}}, {{command_line_test_result__windows.txt}}).

cc: [~chris.douglas], [~steve_l]
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment: command_line_test_result__linux.txt
                command_line_test_result__windows.txt
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268098#comment-16268098 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Yes, Chris. I am verifying the patch.

I found an issue tonight in my Linux environment. In TestRawLocalFileSystemContract.testPermission(), the native call failed with {{java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;}}.

I'll look into it further. I guess it is due to my last change. I'll come back with an update.
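For readers unfamiliar with this error: the JVM throws {{java.lang.UnsatisfiedLinkError}} when a method declared {{native}} is invoked but no native library providing it has been loaded. The sketch below reproduces that situation in isolation and shows one way a caller could fall back to a non-native path. All names and the fallback value are hypothetical; this is not how Hadoop's NativeIO is implemented.

```java
/** Hypothetical illustration; the names below are not Hadoop APIs. */
final class NativeFallbackDemo {
    // Declared native but never linked: no System.loadLibrary call is made,
    // so invoking this method throws java.lang.UnsatisfiedLinkError.
    private static native int nativeMode(String path);

    /** Stand-in for a slower non-native lookup; the returned value is an assumption. */
    static int fallbackMode(String path) {
        return 0644;
    }

    /** Tries the native path first and falls back when the library is absent. */
    static int mode(String path) {
        try {
            return nativeMode(path);
        } catch (UnsatisfiedLinkError e) {
            return fallbackMode(path);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toOctalString(mode("/tmp/example")));
    }
}
```

In a test run, the same error usually means the native library was not built or not found on the library path, rather than a defect in the Java code.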
[jira] [Comment Edited] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ]

Ping Liu edited comment on HADOOP-14600 at 11/23/17 5:01 AM:
-------------------------------------------------------------

[~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I will try to work on it this weekend.

was (Author: myapachejira):
[~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply" :)
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

[~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply" :)
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257912#comment-16257912 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

[~chris.douglas] excellent catch! Your correction is perfect. Yes, {{recursive}} is a boolean that tells {{Shell.getSetPermissionCommand()}} whether to build the "set permission" command in recursive or non-recursive form. Currently, only non-recursive mode is used, but recursive mode can be used in the future when needed.

The other changes look good. Thanks for the detailed changes!

The only question I have is about the number of spaces for indentation. I notice you are using two spaces in {{StatUtils}}. I used two spaces before, as I think this saves space, but was told that four spaces should be used for readability because two-space indentation looks busy. Oh, I just read the Oracle/Sun code conventions, which say indentation should be four spaces.

Other than indentation, everything else looks good. Thanks Chris!
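The role of the {{recursive}} flag discussed above can be sketched as follows. This is a hypothetical standalone builder, not the body or signature of {{Shell.getSetPermissionCommand()}}; it only illustrates how a boolean can switch a chmod-style command between recursive and non-recursive form. The class name, the {{target}} parameter, and the Unix-only command are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Hadoop's Shell class. */
final class ChmodCommand {
    /**
     * Builds a Unix chmod command line.
     *
     * @param perm      permission string such as "755"
     * @param recursive whether to include the -R option
     * @param target    path the command applies to
     */
    static String[] build(String perm, boolean recursive, String target) {
        List<String> cmd = new ArrayList<>();
        cmd.add("chmod");
        if (recursive) {
            cmd.add("-R"); // apply to the whole directory tree
        }
        cmd.add(perm);
        cmd.add(target);
        return cmd.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build("755", false, "/tmp/demo")));
        System.out.println(String.join(" ", build("755", true, "/tmp/demo")));
    }
}
```

Keeping the flag in the signature even when only the non-recursive form is used today means callers would not need to change if recursive mode is needed later.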
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239865#comment-16239865 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

This time the unit test run fails on a different test case (TestZKFailoverController.testGracefulFailover). But again, it is unrelated to the patch.

cc: [~chris.douglas], [~ste...@apache.org]
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment: HADOOP-14600.006.patch
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment:     (was: HADOOP-14600.006.patch)
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239347#comment-16239347 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

[~chris.douglas] You are right. {{path}} is not connected to the return value; it should be released regardless of the value of {{ret}}. I just added an updated patch.
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment: HADOOP-14600.006.patch
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235082#comment-16235082 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Can someone have a look at this? As I said before, the unit test failure is unrelated to this fix.

[~ste...@apache.org]

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the
> permissions
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-14600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14600
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.3
>         Environment: file:// in a dir with many files
>            Reporter: Steve Loughran
>            Assignee: Ping Liu
>            Priority: Major
>         Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch,
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch,
> TestRawLocalFileSystemContract.java
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls
> against the local FS, because the {{FileStatus.getPermission}} call forces
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI
> values.
> That is: what is a field lookup or even a no-op for every other FS is a
> process exec/spawn on the local FS, with all the costs. This gets expensive
> if you have many files.
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment: HADOOP-14600.005.patch

* Moved the deprecation suppression annotation in TestRawLocalFileSystemContract up to class level; hopefully this will clear the javac warning.
* Again, the JUnit test failures with KDiag.java and TestRaceWhenRelogin.java are unrelated to HADOOP-14600.
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
    Attachment: HADOOP-14600.004.patch

=> Fixed license issue - added the ASF license header to the newly added Helper.java
=> Fixed deprecation issue - added a suppression annotation in TestRawLocalFileSystemContract.testPermission()
=> Fixed leftover test directory and file, which should have been cleaned up - changed to use the test base directory (getTestBaseDir()) in TestRawLocalFileSystemContract.testPermission(); this directory is automatically removed in tearDown()

Please note that there are three test failures in total, as follows.

{quote}
Tests in error:
  TestKDiag.testKeytabAndPrincipal:162->kdiag:119 ? KerberosAuth Login failure f...
  TestKDiag.testFileOutput:186->kdiag:119 ? KerberosAuth Login failure for user:...
  TestKDiag.testLoadResource:196->kdiag:119 ? KerberosAuth Login failure for use...

Tests run: 3927, Failures: 0, Errors: 3, Skipped: 206
{quote}

The failures are all from TestKDiag, which is not related to HADOOP-14600.
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186875#comment-16186875 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

[~ste...@apache.org] Thanks for your detailed code review! I really appreciate it.

I just finished making the recommended changes and attached *HADOOP-14600.003.patch*. Details follow.

{{Helper.java (new class):}}
To better support unit testing, I added some more testing mechanisms and moved the logic into a new class, Helper.java. Now I can not only check permissions but also change them. I didn't find an existing place for such utilities, so I added this one. If anyone wants to add other common utility methods, the Helper class can be the place.

{{TestRawLocalFileSystemContract.java:}}
With this addition, I can improve the test by adding testPermission() to TestRawLocalFileSystemContract, where both loadPermissionInfoByNativeIO() and loadPermissionInfoByNonNativeIO() can now be tested directly, as you suggested. I can test it on Linux. On Windows, however, the sticky bit change doesn't take effect; I guess Windows probably doesn't have a sticky bit feature.

{{TestNativeIO.java:}}
Similarly, doStatTest() was simplified by calling the Helper method. I also improved testStatOnError() by using LambdaTestUtils, improved testMultiThreadedStat() by using ExecutorService and Future, and added testMultiThreadedStatOnError().

{{RawLocalFileSystem.java:}}
A minor issue found in loadPermissionInfo() is that on Windows the group is returned with its domain, so we need to remove the domain. This is the same as removing the domain from domain/user in the existing code.

Lastly, for {{NativeIO.c}}, FindFileOwnerAndPermission is not an MSDN function. I found it defined at Line 811 in hadoop-common/src/main/winutils/libwinutils.c.
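The domain removal mentioned for {{RawLocalFileSystem.java}} can be sketched as below. This is a minimal illustration, assuming that Windows reports names in the form DOMAIN\name with a backslash separator; the class and method names are hypothetical and the code is not taken from the patch.

```java
/** Hypothetical sketch; not the code in RawLocalFileSystem. */
final class WindowsNames {
    /**
     * Strips a leading "DOMAIN\" prefix from a user or group name.
     * Names without a backslash are returned unchanged.
     */
    static String stripDomain(String name) {
        int i = name.indexOf('\\'); // assumed separator between domain and name
        return i == -1 ? name : name.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(stripDomain("CORP\\hadoop"));
        System.out.println(stripDomain("staff"));
    }
}
```

Applying the same helper to both the owner and the group keeps the two fields consistent, which is the point made in the comment above.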
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
Attachment: HADOOP-14600.003.patch
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160168#comment-16160168 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Changes have been made. *HADOOP-14600.002.patch* is attached. I also added unit tests.

CC: [~ste...@apache.org]
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
Attachment: HADOOP-14600.002.patch
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157223#comment-16157223 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Excellent comments. I'm going to make the suggested changes.

None of the tests that timed out use the file permission; they test something else. I should have clarified that.

Thanks Steve!
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154192#comment-16154192 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

I couldn't successfully set up a local environment to run test-patch, so I went to the test results at https://builds.apache.org/job/PreCommit-HADOOP-Build/13153/testReport/ linked in the table above from [~hadoopqa]. I manually ran each of the following tests:

* TestSFTPFileSystem.testStatFile
* TestDNS.testDefaultDnsServer
* TestRaceWhenRelogin.test
* TestKDiag.testKeytabAndPrincipal
* TestKDiag.testFileOutput
* TestKDiag.testLoadResource

But none of the tests hits the new method *loadPermissionInfoByNativeIO()* in *RawLocalFileSystem*. *loadPermissionInfoByNativeIO()* is the new code that replaces the original *_loadPermissionInfo()_* and is the only change from the previous version.

Additionally, I ran "mvn test -Pnative -Dtest=allNative" in my local environment and found 3 failures and 5 errors, but most of them were timeouts. After I allowed more time, the majority of the tests passed. TestRPCWaitForProxy.testInterruptedWaitForProxy is the only one that still produces an error after the timeout was increased. However, running it manually didn't hit the breakpoint in *loadPermissionInfoByNativeIO()* either.

In summary, I didn't find any failing test case for the new target method, *loadPermissionInfoByNativeIO()*. Please let me know whether this is enough for verification, or whether there are more tests to run and how to run them.

CC: [~hadoopqa]
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151289#comment-16151289 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Yeah, it must be automatically included with MinGW, Visual Studio, or some other installation. Thanks for telling me that! It's good to know all of this, especially when using the command line.
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151255#comment-16151255 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Thanks [~aw]! I have both Cygwin and Git, but neither has /usr/bin. When I checked Program Files, I found another Git installation, just as you mentioned! I can try that one.

But you are right: getting mvn test -Dtest=foo running on Windows cost me plenty of cuts and blood. I'll try to run test-patch on Linux and then come back to test the same scenario on Windows, probably with mvn test -Dtest=foo one test at a time as a workaround.

Thanks!
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151090#comment-16151090 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

I am now trying to run the patch test on my Windows machine. It looks like dev-support/bin/test-patch is a Bash script and cannot be run on Windows. Is there a guide on how to run it on Windows, or is the patch test not expected to be run on Windows?

CC: [~aw]
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150822#comment-16150822 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Hi [~jzhuge], thanks for your help! The patch and the test file are now attached.
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ping Liu updated HADOOP-14600:
------------------------------
Attachment: HADOOP-14600.001.patch
            TestRawLocalFileSystemContract.java
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150254#comment-16150254 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

Oops, it looks like I don't have permission to attach files. I'll see if I can request the permission from the mailing list.
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150245#comment-16150245 ]

Ping Liu commented on HADOOP-14600:
-----------------------------------

I just followed [~steve_l]'s idea to add a native stat() implementation. Yes, it is similar to fstat(), but it doesn't need to open the file because it doesn't require a file descriptor. Now there is no longer any need to spawn an extra process to gather the permission info.

I did some manual testing on both Windows 10 and Linux (Ubuntu on VirtualBox). It looks like there is a dramatic improvement on both systems.

{noformat}
Windows
number of files    time (ms)    time (ms) with native IO
100                14274        1234
150                19002        1782
200                21865        2250
500                timed out    5125
1000               timed out    9735
2000               timed out    18875

Linux
number of files    time (ms)    time (ms) with native IO
100                4539         1632
150                6137         2031
200                7139         2764
500                15566        5292
1000               timed out    7490
2000               timed out    14040
{noformat}

The test is primitive but sufficiently shows the improvement.

Attached is the patch file: *HADOOP-14600__Patch__20170901.txt*.

When doing the test, I added testListStatusForPerformance() to TestRawLocalFileSystem.java. It is also attached above.
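To make the idea in the comment above concrete, here is a minimal sketch of reading owner, group, and permissions in-process for every entry of a directory, with no child process per file. It is an assumption-laden illustration rather than the patch itself: it uses the standard java.nio.file API instead of Hadoop's JNI-based NativeIO stat(), and the class and method names (InProcessStatSketch, describe, statAll, makeFiles) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical illustration; this is not the code from the HADOOP-14600 patch.
class InProcessStatSketch {

  /** Reads owner, group and permissions of one path without spawning a process. */
  static String describe(Path p) throws IOException {
    if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
      PosixFileAttributes a = Files.readAttributes(p, PosixFileAttributes.class);
      return a.owner().getName() + ":" + a.group().getName() + " "
          + PosixFilePermissions.toString(a.permissions());
    }
    // Fallback for file systems without the POSIX view: owner name only.
    return Files.getOwner(p).getName();
  }

  /** Describes every entry of dir and returns how many entries were visited. */
  static int statAll(Path dir) throws IOException {
    int count = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        describe(entry);
        count++;
      }
    }
    return count;
  }

  /** Creates n empty files in a fresh temporary directory. */
  static Path makeFiles(int n) throws IOException {
    Path dir = Files.createTempDirectory("stat-sketch");
    for (int i = 0; i < n; i++) {
      Files.createFile(dir.resolve("file-" + i));
    }
    return dir;
  }

  public static void main(String[] args) throws IOException {
    Path dir = makeFiles(200);
    long start = System.nanoTime();
    int n = statAll(dir);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(n + " entries described in " + elapsedMs + " ms");
  }
}
```

The timing printed by main is only indicative and is not comparable to the figures in the table, which were measured through {{FileSystem.listStatus}} on the local file system.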