[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690473#comment-17690473 ]

ASF GitHub Bot commented on HDFS-13616:
---------------------------------------

steveloughran commented on PR #1725:
URL: https://github.com/apache/hadoop/pull/1725#issuecomment-1434869803

@fanlinqian best to file an HDFS issue on the apache jira server.

> Batch listing of multiple directories
> -------------------------------------
>
>                 Key: HDFS-13616
>                 URL: https://issues.apache.org/jira/browse/HDFS-13616
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 3.2.0
>            Reporter: Andrew Wang
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.0
>
>         Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of partition directories. This can end up being bottlenecked on round-trip time (RTT) when partition directories contain a small number of files. This is fairly common, since fine-grained partitioning is used for partition pruning by the query engines.
>
> A batched listing API that takes multiple paths amortizes the RTT cost. Initial benchmarks show a 10-20x improvement in metadata loading performance.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642978#comment-17642978 ]

ASF GitHub Bot commented on HDFS-13616:
---------------------------------------

fanlinqian commented on PR #1725:
URL: https://github.com/apache/hadoop/pull/1725#issuecomment-1336375758

Hello, I encountered a bug when using the batch method: when I pass in a directory containing more than 1000 files, with 2 replicas for each file's blocks, only the first 500 files of the directory are returned and then the listing stops. I think the fix belongs in the getBatchedListing() method of hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, as follows:

```java
for (; srcsIndex < srcs.length; srcsIndex++) {
  String src = srcs[srcsIndex];
  HdfsPartialListing listing;
  try {
    DirectoryListing dirListing =
        getListingInt(dir, pc, src, indexStartAfter, needLocation);
    if (dirListing == null) {
      throw new FileNotFoundException("Path " + src + " does not exist");
    }
    listing = new HdfsPartialListing(srcsIndex,
        Lists.newArrayList(dirListing.getPartialListing()));
    numEntries += listing.getPartialListing().size();
    lastListing = dirListing;
  } catch (Exception e) {
    if (e instanceof AccessControlException) {
      logAuditEvent(false, operationName, src);
    }
    listing = new HdfsPartialListing(srcsIndex,
        new RemoteException(e.getClass().getCanonicalName(), e.getMessage()));
    lastListing = null;
    LOG.info("Exception listing src {}", src, e);
  }
  listings.put(srcsIndex, listing);

  // My modification: stop as soon as the current source still has
  // remaining entries, so the next call resumes inside this directory.
  if (lastListing != null && lastListing.getRemainingEntries() != 0) {
    break;
  }

  if (indexStartAfter.length != 0) {
    indexStartAfter = new byte[0];
  }
  // Terminate if we've reached the maximum listing size
  if (numEntries >= dir.getListLimit()) {
    break;
  }
}
```

The root cause of this bug is that the result returned by getListingInt(dir, pc, src, indexStartAfter, needLocation) is capped not only by the number of files in the directory but also by the total number of blocks and replica locations being returned, while getBatchedListing() only exits the loop once the number of returned entries reaches 1000. Looking forward to your reply.
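To make the termination issue concrete, below is a self-contained, plain-Java sketch of that pagination loop (all names are illustrative; no Hadoop types are used). Each call collects at most a fixed number of entries across the source directories and must stop at a directory that still has remaining entries, recording a resume point, rather than advancing to the next source:

```java
import java.util.*;

/** Simplified model of a getBatchedListing-style pagination loop
 *  (hypothetical, plain Java; not the actual NameNode code). */
public class BatchedListingSketch {
    // Per-call cap, analogous to the NameNode's listing limit.
    static final int LIST_LIMIT = 1000;

    /**
     * One call's worth of work: walk the source directories in order and
     * collect entries until the cap is hit. cursor[0] is the directory
     * index to resume from, cursor[1] the offset within that directory.
     */
    static List<String> listOneBatch(List<List<String>> dirs, int[] cursor) {
        List<String> out = new ArrayList<>();
        int dirIndex = cursor[0], offset = cursor[1];
        for (; dirIndex < dirs.size(); dirIndex++) {
            List<String> dir = dirs.get(dirIndex);
            int take = Math.min(dir.size() - offset, LIST_LIMIT - out.size());
            out.addAll(dir.subList(offset, offset + take));
            if (offset + take < dir.size()) {
                // Directory not exhausted: record the resume point and stop,
                // the check the comment above argues is missing.
                cursor[0] = dirIndex;
                cursor[1] = offset + take;
                return out;
            }
            offset = 0;
            if (out.size() >= LIST_LIMIT) { dirIndex++; break; }
        }
        cursor[0] = dirIndex;
        cursor[1] = 0;
        return out;
    }

    public static void main(String[] args) {
        // One directory with 2500 files plus one with 2: three batches,
        // and every entry is eventually returned.
        List<String> big = new ArrayList<>();
        for (int i = 0; i < 2500; i++) big.add("f" + i);
        List<List<String>> dirs = Arrays.asList(big, Arrays.asList("g0", "g1"));
        int[] cursor = {0, 0};
        int total = 0;
        while (cursor[0] < dirs.size()) {
            total += listOneBatch(dirs, cursor).size();
        }
        System.out.println(total); // 2502
    }
}
```

Without the "directory not exhausted" early return, a caller that only checks the entry count can skip past the unread tail of a large directory, which matches the 500-of-1000 symptom described above.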
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016452#comment-17016452 ]

Hudson commented on HDFS-13616:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17868 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17868/])
HDFS-13616. Batch listing of multiple directories (#1725) (weichiu: rev d7c4f8ab21c56a52afcfbd0a56d9120e61376d0c)

* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/protocol/TestReadOnly.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml
* (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/BatchedDirectoryListing.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/ListingBenchmark.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBatchedListDirectories.java
* (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsPartialListing.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFilterFileSystem.java
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PartialListing.java
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990172#comment-16990172 ]

Hadoop QA commented on HDFS-13616:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 12s | HDFS-13616 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13616 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28477/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898411#comment-16898411 ]

Wei-Chiu Chuang commented on HDFS-13616:
----------------------------------------

[~csun] please help! Thanks!
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897314#comment-16897314 ]

Hadoop QA commented on HDFS-13616:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HDFS-13616 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13616 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27355/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897311#comment-16897311 ]

Chao Sun commented on HDFS-13616:
---------------------------------

This seems like a great feature. [~andrew.wang] do you still plan to finish this? I'd be happy to help move this forward.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791908#comment-16791908 ]

Todd Lipcon commented on HDFS-13616:
------------------------------------

Another data point here: this would be very useful for the Hive ACID table layout as well. Currently, querying a Hive ACID table requires Hive to do one listStatus per partition, and then, within each partition, one listStatus per uncompacted transaction range (minimum one for a fully compacted table). So for a fully compacted table with relatively fine-grained partitions, the ratio of returned files to listStatus calls can be quite small. If we assume that a large portion of the load on a NN comes from a Hive workload, implementing RPC batching could reduce the RPC rate by an order of magnitude or more.
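The "order of magnitude" claim above is simple ceiling arithmetic over RPC counts. A tiny sketch makes it explicit (the 100-paths-per-RPC batch size is a hypothetical figure for illustration, not an HDFS constant):

```java
/** Back-of-envelope RPC counts for listing many partition directories,
 *  with and without batching. Batch size is illustrative only. */
public class RpcAmortization {
    /** RPCs needed to list `partitions` directories when each call may
     *  carry up to `pathsPerCall` paths (ceiling division). */
    static long rpcCalls(long partitions, long pathsPerCall) {
        return (partitions + pathsPerCall - 1) / pathsPerCall;
    }

    public static void main(String[] args) {
        long unbatched = rpcCalls(10_000, 1);   // one listStatus per partition
        long batched = rpcCalls(10_000, 100);   // hypothetical 100 paths/RPC
        System.out.println(unbatched + " vs " + batched); // 10000 vs 100
    }
}
```

With fine-grained partitions each RPC returns few entries either way, so the round-trip count, not the payload, dominates; batching attacks exactly that term.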
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791920#comment-16791920 ]

Hadoop QA commented on HDFS-13616:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HDFS-13616 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13616 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26469/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517070#comment-16517070 ]

lpstudy commented on HDFS-13616:
--------------------------------

I am working on erasure coding and have a question I could not find an answer to; I do not know where to post it. Question: Hadoop 3.0 supports striped-layout erasure coding, which requires writing data for one file through multiple output streams. However, to my knowledge, Hadoop does not support writing to one file from multiple writers simultaneously. So my question is: how is this achieved?
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499793#comment-16499793 ]

Tsz Wo Nicholas Sze commented on HDFS-13616:
--------------------------------------------

> ... I'm not sure how to apply it to stream-oriented operations like reading and writing files, ...

It seems to me that we could just return a generic result class and then get the input/output stream from the result. I guess this will help a lot on reading/writing a lot of small files.

> ... So if we're okay with a limited set of operations initially (say, just listing and delete), then I could look into it.

Sure, it is a good start. Thank you.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497016#comment-16497016 ] Andrew Wang commented on HDFS-13616: Would we be okay with making this a DistributedFileSystem-only change, with no changes to FileSystem? Right now non-DFS will just throw UnsupportedOperationException anyway, so it's specific to HDFS as it is. The batch feed API seems reasonable but it does add a lot of scope. I'm not sure how to apply it to stream-oriented operations like reading and writing files, so it may only be usable for metadata operations. So if we're okay with a limited set of operations initially (say, just listing and delete), then I could look into it.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494381#comment-16494381 ] Tsz Wo Nicholas Sze commented on HDFS-13616: We may have something similar to the [Google Data batch feed|https://developers.google.com/gdata/docs/batch].
{code}
// HDFS batch mode example API
FileSystemFeed f = fs.createFeed();
f.list(p1);
f.delete(p2);
...
Iterable r = f.execute();
{code}
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494327#comment-16494327 ] Todd Lipcon commented on HDFS-13616: bq. In batch mode, user can submit any file system calls. All these calls will be sent in batch, possibly by multiple calls. Can you clarify how that would work with an API example? For example, if you are the Hive planner and you have a List partitionDirs, and your end goal is to come up with a Map> partitionToFiles, how would you use the API you're proposing?
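For concreteness, a caller-side sketch of what the planner workflow above could look like against a grouped-listing result. The PartialListing-style shape is modeled here with plain collections; the names (Partial, listedPath, toPartitionMap) are assumptions for illustration, not the committed API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchListingSketch {
    // Hypothetical stand-in for one PartialListing: the path that was
    // requested, plus the entries returned for it in this batch.
    static class Partial {
        final String listedPath;
        final List<String> entries;
        Partial(String listedPath, List<String> entries) {
            this.listedPath = listedPath;
            this.entries = entries;
        }
    }

    // Fold the stream of partial listings back into partitionDir -> files,
    // keyed directly by the requested directory.
    static Map<String, List<String>> toPartitionMap(Iterable<Partial> batches) {
        Map<String, List<String>> out = new HashMap<>();
        for (Partial p : batches) {
            out.computeIfAbsent(p.listedPath, k -> new ArrayList<>())
               .addAll(p.entries);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two requested partition dirs; /t/p=1 comes back split across batches.
        List<Partial> batches = List.of(
            new Partial("/t/p=1", List.of("a.parquet")),
            new Partial("/t/p=2", List.of("b.parquet")),
            new Partial("/t/p=1", List.of("c.parquet")));
        Map<String, List<String>> files = toPartitionMap(batches);
        System.out.println(files.get("/t/p=1")); // [a.parquet, c.parquet]
    }
}
```

Note that a directory whose listing spans multiple batches simply contributes to the same map entry more than once, so the caller needs no special handling for large partitions.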
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494293#comment-16494293 ] Tsz Wo Nicholas Sze commented on HDFS-13616:
{quote}
Hi Nicholas, thanks for taking a look. Currently we don't see a need for API support beyond listing. The workload we're looking at is metadata loading for applications like Hive and Impala.
{quote}
Batch delete definitely is very useful.
{quote}
This batched listing API could also be combined with an async API (or a thread pool), so it's not an "either or" situation.
{quote}
You are right that it is not "either or", although batch with async is natural. The batchedListStatusIterator APIs in the patch are too restrictive and have problems such as the List paths parameter only having a limited size (it is not a remote iterator). How about we support a batch mode? In batch mode, the user can submit any file system calls. All these calls will be sent in batch, possibly over multiple calls.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493863#comment-16493863 ] Todd Lipcon commented on HDFS-13616: bq. maybe getSourcePath? getListedPath? Yea, I think getListedPath or getRequestedPath would be good. Should clarify in the docs also whether the path returned there is guaranteed to be equal to the path that was passed in, or whether it could have been canonicalized, etc. I would hope that it's identical, since it makes "matching up" back to the original requested paths much easier.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491400#comment-16491400 ] Andrew Wang commented on HDFS-13616: bq. if I pass a file (instead of a directory), will I get back a standalone PartialListing that only includes the FileStatus for that file? Good question! I have a unit test that covers this too: it returns a PartialListing with just the FileStatus of the file, and getParent will return the file's path. This is the same behavior as listLocatedStatus. This makes me realize, though, that "getParent" is not the best name, since it won't always be the parent; maybe getSourcePath? getListedPath? Happy to take suggestions here, and yea, I can beef up the documentation around this too.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491009#comment-16491009 ] Todd Lipcon commented on HDFS-13616: bq. Why not just RemoteIterator? In an earlier prototype of this patch, Andrew had it this way, but I requested the change to the new API, which groups together all of the results for a given parent directory. This makes it much easier to associate back the returned FileStatus to the directories being requested. Certainly it would be possible to iterate over each FileStatus, get the Path, and then get the parent for each one to prefix match back, but that wastes a bit of CPU in URI parsing, etc. One question about the API for Andrew, actually: if I pass a file (instead of a directory), will I get back a standalone PartialListing that only includes the FileStatus for that file? If so, is that indicated differently in any way? Will 'batch.getParent' end up returning the file or its parent? It's probably worth documenting the semantics there. bq. Does that even fit the use case? (I'm guessing no, only some of the partition dirs in the parent need listing–but we need to justify any new FileSystem surface area). Right, it doesn't fit the use case. A table might have thousands of partitions, but with a query like 'select * where time = 123' it's likely to be able to prune most of them and only need to list the files in the remaining few. Additionally, it's permitted for users to specify custom partition locations, so a given table may have partitions located in different spots on the file system, even though the common case is that they all share a parent directory. 
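The association cost described above can be seen in a small model: with a flat iterator of statuses, the client must re-derive each entry's parent directory to match it back to a requested path, per-entry parsing that a grouped PartialListing avoids by carrying the requested path as the key. Paths are modeled as plain strings here; the names are illustrative, not the patch's API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlatVsGrouped {
    // Flat result shape: bare file paths. The client pays a parse per entry
    // to recover the parent dir and group results by requested directory.
    static Map<String, List<String>> groupFlat(List<String> filePaths) {
        Map<String, List<String>> out = new HashMap<>();
        for (String p : filePaths) {
            String parent = p.substring(0, p.lastIndexOf('/')); // per-entry parsing
            out.computeIfAbsent(parent, k -> new ArrayList<>()).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = groupFlat(
            List.of("/t/p=1/a", "/t/p=1/b", "/t/p=2/c"));
        System.out.println(m.get("/t/p=1").size()); // 2
    }
}
```

With real Path objects the per-entry cost is URI parsing and object allocation rather than a substring, which is the overhead the grouped API sidesteps.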
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491002#comment-16491002 ] Andrew Wang commented on HDFS-13616: Thanks for taking a look, Xiao and Aaron! bq. We currently FNFE on the first error. Is it possible a partition is deleted while another thread is listing halfway for Hive/Impala? What's the expected behavior from them if so? (I'm lacking the knowledge of this so no strong preference either way, but curious...) This case is somewhat addressed by the unit test listSomeDoNotExist: you'll see that the get() method throws if there was an exception, but you can still get results from other listing batches returned by the iterator. If you're talking about listing a single large directory and the directory gets deleted during the listing, then yea, this API will throw an FNFE like the existing RemoteIterator API. Paged listings aren't atomic. bq. If caller added some subdirs to srcs, should we list the subdir twice, or throw, or 'smartly' list everything at most once? This is addressed by the unit test listSamePaths: it lists them multiple times. I didn't see it as the role of the filesystem to coalesce these paths; semantically I wanted it to behave like the existing RemoteIterator API called in a for loop. Aaron, I'll hit your review comments in a new patch rev. Precommit is getting pretty close, so I'm hoping to coalesce review comments from others before posting the next one. bq. Why not just RemoteIterator? We need an entry point to throw an exception for a single path that doesn't kill the entire listing. From a client POV, it's also nice to have the same path passed in provided back, since HDFS returns absolute, qualified paths. It also makes it easier to understand the empty-directory case. I attached the benchmark I ran for further examination.
I think you correctly answered the use-case question yourself, but to confirm: the Hive/Impala client already has a list of leaf directories to list, so it'd require some contortions to use a recursive API like listFiles instead. I imagine a server-side listFiles (like what S3 has) would be a nice speedup, though.
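The per-path error semantics discussed above (get() throws for the failed path while other batches remain usable) can be modeled with a small self-contained sketch. PartialResult is a hypothetical stand-in for the patch's listing class, not its actual implementation:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

// Model of the deferred-error pattern: a listing batch holds either
// results or an exception, and the exception surfaces only on get().
class PartialResult {
    private final List<String> statuses;  // null if the path failed
    private final IOException error;      // null on success

    PartialResult(List<String> statuses, IOException error) {
        this.statuses = statuses;
        this.error = error;
    }

    List<String> get() throws IOException {
        if (error != null) throw error;   // e.g. FNFE for a deleted path
        return statuses;
    }
}

public class PerPathErrors {
    public static void main(String[] args) {
        PartialResult ok = new PartialResult(List.of("part-0"), null);
        PartialResult gone = new PartialResult(null,
            new FileNotFoundException("Path /t/p=9 does not exist"));
        try {
            System.out.println(ok.get().size());  // 1: the good batch is usable
            gone.get();                            // throws only for the bad path
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This keeps the iteration alive across a mix of existing and missing paths, which matches the listSomeDoNotExist behavior described in the comment.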
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490305#comment-16490305 ] Aaron Fabbri commented on HDFS-13616: Cool stuff [~andrew.wang]. I like batching. :) Didn't have time for full review yet but a couple of quick/stupid questions.
{noformat}
+ * Batched listing API that returns {@link PartialListing}s for the
+ * passed Paths.
+ *
+ * @param paths List of paths to list.
+ * @return RemoteIterator that returns corresponding PartialListings.
+ * @throws IOException
+ */
+ public RemoteIterator<PartialListing> batchedListStatusIterator(
+     final List<Path> paths) throws IOException {
{noformat}
Are paths listed recursively or not? We might as well specify that here. Why not just RemoteIterator?
{noformat}
+ * partial listing, multiple ListingBatches may need to be combined to obtain
{noformat}
ListingBatches? Did you mean PartialListing? Other thought: the test code looks DFS-specific. Do we want to test {{FileSystem}} instead and make this a filesystem contract test? For the benchmarking, what was the comparison code? A recursive listLocatedStatus() loop? I'm curious what the delta would be against an optimized listFiles(recursive=true) on a parent dir instead. Does that even fit the use case? (I'm guessing no, only some of the partition dirs in the parent need listing–but we need to justify any new FileSystem surface area.)
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490247#comment-16490247 ] genericqa commented on HDFS-13616: (x) -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| Prechecks || || ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| trunk Compile Tests || || ||
| 0 | mvndep | 1m 44s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 27s | trunk passed |
| +1 | compile | 27m 4s | trunk passed |
| +1 | checkstyle | 2m 52s | trunk passed |
| -1 | mvnsite | 2m 28s | hadoop-common in trunk failed. |
| +1 | shadedclient | 17m 19s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 43s | trunk passed |
| +1 | javadoc | 3m 7s | trunk passed |
|| Patch Compile Tests || || ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 42s | the patch passed |
| +1 | compile | 25m 56s | the patch passed |
| +1 | cc | 25m 56s | the patch passed |
| +1 | javac | 25m 56s | the patch passed |
| -0 | checkstyle | 3m 5s | root: The patch generated 15 new + 1184 unchanged - 0 fixed = 1199 total (was 1184) |
| +1 | mvnsite | 3m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 9m 33s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 9s | the patch passed |
| +1 | javadoc | 3m 6s | the patch passed |
|| Other Tests || || ||
| +1 | unit | 8m 10s | hadoop-common in the patch passed. |
| +1 | unit | 1m 32s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 82m 56s | hadoop-hdfs in the patch failed. |
| +1 | unit | 15m 0s | hadoop-hdfs-rbf in the patch passed. |
| -1 | asflicense | 0m 42s | The patch generated 1 ASF License warnings. |
| | | 244m 33s | |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
| | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | |
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490167#comment-16490167 ] Xiao Chen commented on HDFS-13616: Thanks for the work here Andrew, and others for the comments! I had fun reading through. :) Some semantic questions: - We currently FNFE on the first error. Is it possible a partition is deleted while another thread is listing halfway for Hive/Impala? What's the expected behavior from them if so? (I'm lacking the knowledge of this so no strong preference either way, but curious...) - If caller added some subdirs to srcs, should we list the subdir twice, or throw, or 'smartly' list everything at most once?
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490068#comment-16490068 ] Andrew Wang commented on HDFS-13616: Latest patch addresses some precommit issues. As stated earlier, non-HDFS filesystems are going to throw UnsupportedOperationException. One correction to my earlier comment too: the default listing limit is 1000, not 100. 100 is the current default limit on the number of paths that can be listed per batched listing call. Hi Nicholas, thanks for taking a look. Currently we don't see a need for API support beyond listing. The workload we're looking at is metadata loading for applications like Hive and Impala. Regarding an async API, Todd's benchmarking shows that the batched API is more CPU efficient than processing individual listing calls. It beats the 5-thread case for sparse directories in CPU time and wall time. My benchmarking additionally shows that the batched API generates significantly less garbage. This batched listing API could also be combined with an async API (or a thread pool), so it's not an "either or" situation.
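Given the 100-path-per-call cap mentioned above, a client holding more partition directories than the limit has to split its request across several batched calls. A minimal generic sketch of that client-side chunking (the helper name and shape are illustrative, not part of the patch):

```java
import java.util.ArrayList;
import java.util.List;

public class PathBatcher {
    // Split a large path list into chunks no bigger than the server-side
    // per-call limit (100 per batched listing call in the patch under discussion).
    static <T> List<List<T>> chunks(List<T> paths, int limit) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += limit) {
            out.add(paths.subList(i, Math.min(i + limit, paths.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> paths = new ArrayList<>();
        for (int i = 0; i < 250; i++) paths.add(i);
        // 250 paths at a limit of 100 -> 3 calls (100, 100, 50).
        System.out.println(chunks(paths, 100).size()); // 3
    }
}
```

Each chunk would then be handed to one batched listing invocation, so the RTT amortization still applies within every chunk.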
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490032#comment-16490032 ] Andrew Wang commented on HDFS-13616: Hi Zhe, thanks for taking a look! This API respects the existing lsLimit setting of 100, and also limits the number of paths that can be listed in a single batch call. This means that the per-call overhead is very similar to the existing RemoteIterator calls when returning 100-item partial listings. Todd saw ~7ms RPC handling times for 100-item batches on a cluster, which feels like the right granularity for holding a read lock. To answer Todd's question about benchmarking, I wrote a little unit test that invokes NameNodeRpcServer directly and times with System.nanoTime(). I made a synthetic directory structure with 30,000 directories, each with one file, which makes it a best-case scenario for the batched listing API. Precautions were taken to allow JVM warmup: I let the benchmarks run for about 30s before recording with JFR/JMC. I was able to list 8.4x more LocatedFileStatuses/second with the batched listing. JMC showed a TLAB allocation rate of 5x. Non-TLAB allocation was trivial. This means we're much more CPU efficient per-FileStatus, and also doing less allocation. Since this did not include RTT time or lock contention from concurrent threads, a more realistic benchmark might do even better. I think this explains the 10-20x that Todd saw when benchmarking on a real cluster.
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490012#comment-16490012 ] Tsz Wo Nicholas Sze commented on HDFS-13616: Thanks for filing the JIRA. I have a concern that the proposed batchedListStatusIterator is too restrictive since it only supports batch ls. Other operations such as batch delete are also very useful. It seems better to do this via non-blocking APIs; see HDFS-9924. We may support batched non-blocking calls. Thoughts?
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489981#comment-16489981 ] Todd Lipcon commented on HDFS-13616: Actually, collecting that was easier than I thought. I found a table with 28509 partitions and only 73400 files (5500 of the partitions are even empty). With the batched approach, average NN CPU consumption is 2.58 sec of CPU. With the 5-threaded threadpool approach, it's 5.78 sec of CPU (a 2.24x improvement). For this table it also reduces the number of round trips enough that the wall-time of fetching the partitions to Impala went from 15.5 sec down to 8.0 sec. In my experience neither type of table is uncommon: we see some tables with lots of partitions, each of which is large, and some tables with lots of partitions, each containing a very small handful of files. I just grabbed a few random tables from a customer workload and found both types. The benefit is much larger for tables like the latter, but this shouldn't be detrimental for the former either.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489969#comment-16489969 ] Todd Lipcon commented on HDFS-13616: [~zhz] My feeling on these sorts of APIs is that a user who wants to list a bunch of directories is just as likely to do so whether provided with a 'batchListDirectories(List)' API or with an equivalent for loop. In particular, applications like MR, Hive, Impala, Presto, etc. end up needing this workflow in order to collect all the input paths from a list of partition directories, so they will do this whether we provide a specific API or not. Our belief is that with a batch API we have a better chance of optimizing this common pattern vs a bunch of separate API calls, for example via the various amortization benefits mentioned above. If we eventually add compression of RPC responses, we also get a benefit from having larger responses with repeated substrings vs a bunch of smaller responses. I just collected some numbers comparing three options for Impala fetching partition directory contents in order to plan a 'select *' from a large table. The table has 2181 partitions containing a total of 321,008 files. I'm testing against a 2.x branch build with this patch applied, and measuring CPU consumption of the NN for the total of fetching all file block locations from these 2181 directories. No other work is targeting this NN, and the NN is about 2ms away from the host doing the planning.

||Method||User CPU (sec)||System CPU (sec)||Total CPU (sec)||
|Non-batched (1 thread)|5.95|0.30|6.25|
|Non-batched (5 threads)|6.25|0.32|6.57|
|Batched (1 thread)|5.93|0.21|6.14|

The end-to-end planning time of the batched approach is not as good as the 5-thread non-batched, but it is noticeably faster than the single-threaded non-batched. And the total CPU consumption is a few percent lower (especially system CPU).
Note that this particular table isn't the optimal case for batching, since the average partition has 147 files and thus each round trip can only fetch a few partitions' worth of info. I'll try to gather some data on a table where the average partition doesn't have so many files as well, where we'd expect the benefits to be larger.
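As a back-of-envelope illustration of the point above: if each listing RPC response is capped at a fixed number of entries (HDFS caps listings via dfs.ls.limit, default 1000; treating the batched call as filling one response up to that cap is my assumption for this sketch, not something taken from the patch), then partitions averaging 147 files mean each batched round trip covers only ~6 partitions.

```python
import math

def round_trips(num_dirs, files_per_dir, per_rpc_limit=1000, batched=True):
    """Estimate client round trips to list num_dirs directories.

    Toy model: assumes each RPC response holds at most per_rpc_limit
    entries (mirroring HDFS's dfs.ls.limit default of 1000).
    """
    if batched:
        # Batched call packs entries from many directories per response.
        total_entries = num_dirs * files_per_dir
        return math.ceil(total_entries / per_rpc_limit)
    # Non-batched: one listing RPC per directory (more if it overflows).
    return num_dirs * math.ceil(files_per_dir / per_rpc_limit)

# Todd's table: 2181 partitions, ~147 files each.
print(round_trips(2181, 147, batched=False))  # 2181 round trips
print(round_trips(2181, 147, batched=True))   # 321 round trips
```

With only ~6 partitions per response, batching still cuts round trips ~7x here, but a table of near-empty partitions would see closer to a 1000x reduction.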
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489913#comment-16489913 ] genericqa commented on HDFS-13616:

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 2m 2s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 38s | trunk passed |
| +1 | compile | 27m 22s | trunk passed |
| +1 | checkstyle | 2m 40s | trunk passed |
| +1 | mvnsite | 2m 41s | trunk passed |
| +1 | shadedclient | 14m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 37s | trunk passed |
| +1 | javadoc | 2m 10s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 10s | the patch passed |
| -1 | compile | 16m 52s | root in the patch failed. |
| -1 | cc | 16m 52s | root in the patch failed. |
| -1 | javac | 16m 52s | root in the patch failed. |
| -0 | checkstyle | 2m 45s | root: The patch generated 13 new + 997 unchanged - 0 fixed = 1010 total (was 997) |
| +1 | mvnsite | 2m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 48s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 37s | hadoop-hdfs-project/hadoop-hdfs-client generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) |
| -1 | findbugs | 1m 52s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 20s | hadoop-common in the patch failed. |
| +1 | unit | 1m 33s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 106m 30s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 235m 25s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
| | org.apache.hadoop.hdfs.protocol.BatchedDirectoryListing.getListings() may expose internal representation by returning BatchedDirectoryListing.listings At BatchedDirectoryListing.java:by returning
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489882#comment-16489882 ] Zhe Zhang commented on HDFS-13616: Interesting idea! Any thoughts on the potential for DDoSing the NameNode? Does this patch make it easier for an abusive application to saturate the NN?
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489568#comment-16489568 ] Andrew Wang commented on HDFS-13616: Thanks for providing the additional context, Todd; agreed on all points. I'm happy to build a simple benchmark, let me start on that. I'm thinking of invoking the NN methods directly, since external measurement seems a little tricky, and I don't know how easy it is to hook in something like JMH. Getting a realistic measurement would require doing something multi-threaded on a server-class machine, but let's see how far I get on my laptop.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489530#comment-16489530 ] Todd Lipcon commented on HDFS-13616: One other performance point: in addition to reducing "wait time" due to network RTT, this should also yield a net reduction in load on the NN vs separate calls. That's because we amortize fixed RPC costs like context switches to and from the IPC threads, should get much better CPU cache locality (both instruction and data caches), and amortize lock acquisition overhead on the FSN lock. I think the batched API also offers some future optimizations, like amortizing the path traversal cost in the common case that all of the arguments share a common prefix path. This is exceedingly common in applications like Hive, where the planner must fetch file lists for /user/hive/warehouse/dbname/tablename/{...100 partitions...}. Again, this should be a net reduction in NN CPU usage as well as an improvement in client-visible wall-clock time. Andrew, any chance you've done a simple benchmark of the CPU time spent namenode-side, e.g. 1000 "listdir" calls vs 1 batched call for the same set of directories? I can help with that if you haven't already set something up.
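The fixed-cost amortization argument can be sketched with a toy model: every RPC pays a fixed overhead (IPC handoff, lock acquisition, cold caches) plus a per-entry cost, so fewer, larger calls shift total cost toward the unavoidable per-entry work. All constants below are illustrative assumptions, not measurements.

```python
def server_cpu_us(num_calls, entries_per_call, fixed_us=50.0, per_entry_us=2.0):
    """Toy model of NameNode CPU for a listing workload: each RPC pays a
    fixed cost plus a cost per returned entry. Constants are made up."""
    return num_calls * (fixed_us + entries_per_call * per_entry_us)

# Same 20,000 entries, delivered as 1000 small RPCs vs 20 batched RPCs:
separate = server_cpu_us(1000, 20)  # fixed cost paid 1000 times -> 90000.0
batched = server_cpu_us(20, 1000)   # fixed cost paid 20 times   -> 41000.0
print(separate, batched)
```

The per-entry work (40,000 us in both cases) is identical; only the fixed overhead shrinks, which is why the win grows as directories get smaller.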
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489518#comment-16489518 ] Andrew Wang commented on HDFS-13616: Patch attached to get some initial feedback on the API and a precommit run. I can add a naive default FileSystem implementation if desired; I don't have any other immediate TODOs in mind.
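For illustration, a naive default implementation of batched listing would just loop over the requested paths and issue one listing call per path, surfacing per-path failures rather than aborting the whole batch. The sketch below is Python pseudocode of that idea, not the actual FileSystem API; naive_batched_list and fake_list are hypothetical names.

```python
def naive_batched_list(list_status, paths):
    """Hypothetical fallback for filesystems without a native batched
    listing: one listing call per path, yielding (path, entries).
    list_status stands in for a FileSystem listStatus-style function.
    A failure on one path is reported for that path only, loosely
    mirroring the per-path error reporting of HdfsPartialListing."""
    for path in paths:
        try:
            yield path, list(list_status(path))
        except OSError as e:
            yield path, e  # report the per-path failure, keep going

# Toy in-memory "filesystem" to demonstrate the behaviour:
tree = {"/warehouse/t/p1": ["a", "b"], "/warehouse/t/p2": ["c"]}

def fake_list(path):
    if path not in tree:
        raise FileNotFoundError(path)
    return tree[path]

for p, result in naive_batched_list(fake_list, ["/warehouse/t/p1", "/missing"]):
    print(p, result)
```

A default like this gains none of the RTT or server-side amortization benefits, but it lets code written against the batch API run on any FileSystem.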