[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_partitions_parallel_load(default=5) -max_nonhdfs_partitions_parallel_load(default=20) We use different thread pool sizes for HDFS and non-HDFS tables since non-HDFS supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that data structure as a shared state between multiple partition metadata loding threads. Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speedup on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Reviewed-on: http://gerrit.cloudera.org:8080/8235 Reviewed-by: Bharath VissapragadaTested-by: Impala Public Jenkins --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 462 insertions(+), 242 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 12 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 21:56:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1397/ -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 18:08:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Code-Review+2 Rebased, carrying +2. Thanks Dimitris. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 18:07:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 10: Code-Review+2 Nice work! -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 10 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 22:22:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8235 to look at the new patch set (#10). Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_partitions_parallel_load(default=5) -max_nonhdfs_partitions_parallel_load(default=20) We use different thread pool sizes for HDFS and non-HDFS tables since non-HDFS supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that data structure as a shared state between multiple partition metadata loding threads. Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speedup on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 462 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/10 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 10 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 9: (16 comments) http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@42 PS9, Line 42: datastructure > nit: data structure Done http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@50 PS9, Line 50: speed up > speedup I see conflicting results. As per webster it is "speedup", but as per Oxford it is "speed-up". Now I don't know who to trust, but I'll go with your suggestion :) https://en.oxforddictionaries.com/definition/speed-up https://www.merriam-webster.com/dictionary/speedup http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS9, Line 122: Limits the > "Maximum" Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@223 PS9, Line 223: public FileMetadataLoadStats(Path path) { hdfsPath = path; } > nit: move the ctor below the class variable members. Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@228 PS9, Line 228: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@232 PS9, Line 232: the > remove Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS9, Line 248: runnable > callable Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@266 PS9, Line 266: // Computes the file metadata from scratch (by calling resetAndLoadFileMetadata()) : // when reuseFileMd_ is false, else refreshes the existing file metadata for : // modified files using refreshFileMetadata(). > nit: you can actually remove the comment as it simply describes the code be Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@465 PS9, Line 465: hasFileChanged > I think you need to put back the caching check. Alex made a good point that Alex to the rescue, good point. Made the change. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS9, Line 773: loadParitionFileMetadataFromScratch > "load from scratch" and "reset and load" essentially mean the same thing. M Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 > non-HDFS (S3/ADLS) Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 > "the latter" Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@844 PS9, Line 844: if (LOG.isTraceEnabled()) { : LOG.trace(loadStats.debugString()); : } > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java File fe/src/main/java/org/apache/impala/service/BackendConfig.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java@71 PS9, Line 71: maxS3PartsParallelLoad > Why "maxS3..."? Everywhere else we use "maxNonHdfs..". Yep, forgot to change here. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@43 PS9, Line 43: getList() { return list_; } > Hm, why is list_ exposed to the world? You may want to check who wants this - I checked the callers and all of them just readers and either iterate / sets it inside another thrift object (which is when I think a copy is made). - Looks like ImmutableList.copyOf() doesn't always make a copy (although subject to change), using it here. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@76 PS9, Line 76: synchronized (this) { > this is equivalent to public synchronized void populate... oops, yea, updated. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 9: (16 comments) http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@42 PS9, Line 42: datastructure nit: data structure http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@50 PS9, Line 50: speed up speedup http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS9, Line 122: Limits the "Maximum" http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@223 PS9, Line 223: public FileMetadataLoadStats(Path path) { hdfsPath = path; } nit: move the ctor below the class variable members. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@228 PS9, Line 228: nit: extra space http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@232 PS9, Line 232: the remove http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS9, Line 248: runnable callable http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@266 PS9, Line 266: // Computes the file metadata from scratch (by calling resetAndLoadFileMetadata()) : // when reuseFileMd_ is false, else refreshes the existing file metadata for : // modified files using refreshFileMetadata(). nit: you can actually remove the comment as it simply describes the code below. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@465 PS9, Line 465: hasFileChanged I think you need to put back the caching check. Alex made a good point that it is used for updating the cached bytes size. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS9, Line 773: loadParitionFileMetadataFromScratch "load from scratch" and "reset and load" essentially mean the same thing. Maybe rename this to resetAndLoadPartitionFileMetadata()? http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 non-HDFS (S3/ADLS) http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 "the latter" http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@844 PS9, Line 844: if (LOG.isTraceEnabled()) { : LOG.trace(loadStats.debugString()); : } nit: single line http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java File fe/src/main/java/org/apache/impala/service/BackendConfig.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java@71 PS9, Line 71: maxS3PartsParallelLoad Why "maxS3..."? Everywhere else we use "maxNonHdfs..". http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@43 PS9, Line 43: getList() { return list_; } Hm, why is list_ exposed to the world? You may want to check who wants this and what do they do with it (i.e modify or simply iterate). If it's the latter you can pass an ImmutableList(). If it is the former, see if we need to expand the public api of this class instead. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@76 PS9, Line 76: synchronized (this) { this is equivalent to public synchronized void populate... -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 19:24:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 9: Code-Review+1 (1 comment) Thanks Alex, carrying +1. Rebased for the final review. http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS8, Line 122: // Limits the number of errors logged when loading partitioned tables. > remove "huge" Done -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 17:46:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8235 to look at the new patch set (#9). Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_partitions_parallel_load(default=5) -max_nonhdfs_partitions_parallel_load(default=20) We use different thread pool sizes for HDFS and non-HDFS tables since non-HDFS supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that datastructure as a shared state between multiple partition metadata loding threads. Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speed up on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 461 insertions(+), 245 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/9 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 8: Code-Review+1 (1 comment) Dimitris should +2 http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS8, Line 122: // Limits the number of errors logged when loading huge partitioned tables. remove "huge" -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 23 Oct 2017 16:47:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8235 to look at the new patch set (#8). Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_partitions_parallel_load(default=5) -max_nonhdfs_partitions_parallel_load(default=20) We use different thread pool sizes for HDFS and non-HDFS tables since non-HDFS supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that datastructure as a shared state between multiple partition metadata loding threads. Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speed up on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 461 insertions(+), 245 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/8 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS7, Line 221: // are excluded because they are considered hidden from Impala's perspecitve > Shrink: Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226 PS7, Line 226: // Number of files skipped while computeing the block metadata information. > Shrink: Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS7, Line 248: // If set to true, reloads the block metadata only when the files in this path > reloads the file metadata (let's try to use the new "file metadata" termino Done. Changed at other places too. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249 PS7, Line 249: // has changed since last load (more details in hasFileChanged()). > have changed Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785 PS7, Line 785:* much higher throughput of RPC calls for listStatus/listFiles. Based on our > ... RPC calls for listStatus/listFiles. For simplicity, the filesystem type Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791 PS7, Line 791:* This method is not optimized for tables with mixed partition types (partitions mapped > I don'd think we need this much detail, how about folding a simple sentence Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794 PS7, Line 794:* This may not work well with the mixed partition types and needs to fixed. > Don't think this is needed. Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819 PS7, Line 819: // For tables without partitions, we have no block metadata to load. > remove "," Done http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841 PS7, Line 841: LOG.error("Encountered an error loading block metadata for table: " + > Could this flood the log when HDFS is under pressure or there is an intermi Very good point. This can likely flood the log. IMO we should have some stack traces of errors for supportability. Added some code to limit the number of errors to log (100) and logged the total failed tasks. Please let me know if you prefer some other approach. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sun, 22 Oct 2017 20:05:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS7, Line 221: // are excluded because they are considered hidden from Impala's perspecitve Shrink: // Number of hidden files excluded from file metadata loading . More details at isValidDataFile(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226 PS7, Line 226: // Number of files skipped while computeing the block metadata information. Shrink: // Number of files skipped from file metadata loading because the the files have not changed since the last load. More details at hasFileChanged(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS7, Line 248: // If set to true, reloads the block metadata only when the files in this path reloads the file metadata (let's try to use the new "file metadata" terminology consistently) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249 PS7, Line 249: // has changed since last load (more details in hasFileChanged()). have changed http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785 PS7, Line 785:* much higher throughput of RPC calls for listStatus/listFiles. Based on our ... RPC calls for listStatus/listFiles. For simplicity, the filesystem type is determined based on the table's root path and not for each partition individually. Based on our experiments, S3 showed ... http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791 PS7, Line 791:* This method is not optimized for tables with mixed partition types (partitions mapped I don'd think we need this much detail, how about folding a simple sentence into the existing paragraph above, see suggestion above. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794 PS7, Line 794:* This may not work well with the mixed partition types and needs to fixed. Don't think this is needed. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819 PS7, Line 819: // For tables without partitions, we have no block metadata to load. remove "," http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841 PS7, Line 841: LOG.error("Encountered an error loading block metadata for table: " + Could this flood the log when HDFS is under pressure or there is an intermittent Kerberos issue? I'm thinking we should aggregate these into a single log message. Do you think having every single path helps with supportability? My feeling is it's more important to note how many tasks failed, but I'll defer to you if you think having the paths is useful. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 21 Oct 2017 00:12:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@38 PS6, Line 38: DEFINE_int32(max_hdfs_parts_parallel_load, 5, > Let's spell out "parts" into "partitions" to avoid confusion. Done http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@40 PS6, Line 40: "tables. Due to HDFS architectural limitations, it is unlikely to get a linear " > In the sentence "Due to HDFS architectural..." what does the "it" after the Done http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@42 PS6, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 20, > Does ADLS fall into this or the HDFS bucket? What about other filesystems? Yes, although I haven't tested it specifically with ADLS. I mentioned that in the commit message but it looks like a good thing to reflect that in the config name too. I changed it too, let me know if you think we should use a better name here. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@216 PS6, Line 216: private class PathStorageMdLoadingStats { > Naming seems a little weird. I think "Storage" is too generic and "Block" i Wfm. Changed it. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@240 PS6, Line 240: // If set to true, reloads the block metadata only when the underlying file > Please clarify "the underlying file" Done http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@413 PS6, Line 413: ++loadStats.ignoredFiles; > I think we should distinguish the hidden file and already-up-to-date file c Done http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS6, Line 773:* Returns the thread pool size to load the block metadata of this table. > Command on how we distinguish between HDFS, S3 and other filesystems (using Done http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@793 PS6, Line 793: return Math.max(Math.min(numPaths, threadPoolSize), 1); > For my understanding, why is the max() needed here? Is it not be a precondi For empty tables, numPaths is 0. But yes this should be a preconditions check. So I moved it to the caller and added a preconditions check in this method. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@800 PS6, Line 800:* metadata is reloaded, else it is loaded from scratch. > Difference between "reloaded" and "loaded from scratch" is not very clear. Sounds better. Done. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@804 PS6, Line 804: int numPathsToLoad = partsByPath.keySet().size(); > partsByPath.size()? Done http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@812 PS6, Line 812: for (Path p: partsByPath.keySet()) { > The tasks are ordered randomly. I wonder if submitting the tasks in a clust Will do. I have two other jiras (based on others' comments) to create, so will add this to the list. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@821 PS6, Line 821: } catch (ExecutionException | InterruptedException e) { > We still consider the load successful if one of these fails. What do you th Oops, I forgot to wrap it as a TLE and throw. Thanks for pointing it out. My opinion is that a partially loaded table is ok, but we should let the user know about it. What do you think? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@822 PS6, Line 822: LOG.error("Encountered an error loading block metadata for table: " + > Let's also dump the path ExecutionException (ex, IOException in this case) includes the path by default and LOG.error() logs the full trace? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@898 PS6, Line 898:* the newly created HdfsPartitions in parallel. > remove "the" Done http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java@38 PS6, Line 38: private
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@38 PS6, Line 38: DEFINE_int32(max_hdfs_parts_parallel_load, 5, Let's spell out "parts" into "partitions" to avoid confusion. http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@40 PS6, Line 40: "tables. Due to HDFS architectural limitations, it is unlikely to get a linear " In the sentence "Due to HDFS architectural..." what does the "it" after the comma refer to? Better to spell it out, e.g., "the response time of block metadata loading is unlikely to improve beyond 5 threads." http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@42 PS6, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 20, Does ADLS fall into this or the HDFS bucket? What about other filesystems? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@216 PS6, Line 216: private class PathStorageMdLoadingStats { Naming seems a little weird. I think "Storage" is too generic and "Block" is too specific. How about we settle on "FileMetadata", so we'd have: FileMetadataLoadStats FileMetadataLoadRequest http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@240 PS6, Line 240: // If set to true, reloads the block metadata only when the underlying file Please clarify "the underlying file" http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@413 PS6, Line 413: ++loadStats.ignoredFiles; I think we should distinguish the hidden file and already-up-to-date file cases, e.g. split ignoredFiles into 'hiddenFiles' and 'skippedFreshFiles' or similar http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS6, Line 773:* Returns the thread pool size to load the block metadata of this table. Command on how we distinguish between HDFS, S3 and other filesystems (using the table root path?), and what the behavior is for mixed-filesystem tables. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@793 PS6, Line 793: return Math.max(Math.min(numPaths, threadPoolSize), 1); For my understanding, why is the max() needed here? Is it not be a precondition that numPaths > 0 and threadPoolSize > 0? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@800 PS6, Line 800:* metadata is reloaded, else it is loaded from scratch. Difference between "reloaded" and "loaded from scratch" is not very clear. Maybe say "incrementally refreshed" vs. "reloaded from scratch"? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@804 PS6, Line 804: int numPathsToLoad = partsByPath.keySet().size(); partsByPath.size()? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@812 PS6, Line 812: for (Path p: partsByPath.keySet()) { The tasks are ordered randomly. I wonder if submitting the tasks in a clustered fashion would be better or worse. Not relevant to this patch, but might be interesting as a follow-on investigation. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@821 PS6, Line 821: } catch (ExecutionException | InterruptedException e) { We still consider the load successful if one of these fails. What do you think the right behavior should be? Is a partially loaded table an acceptable outcome? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@822 PS6, Line 822: LOG.error("Encountered an error loading block metadata for table: " + Let's also dump the path http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@898 PS6, Line 898:* the newly created HdfsPartitions in parallel. remove "the" http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java@38 PS6, Line 38: private List list_ = Collections.synchronizedList(new ArrayList()); The CC requirements and behavior are not clear to me. Why is it not sufficient to make all methods synchronized? -- To
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39 PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned " : "tables. Due to HDFS architectural limitations, it is unlikely to get a linear " : "speed up beyond 5 threads. > When multiple tables are loaded, should I think about the total number of t Unfortunately not. With the current design, this queue is actually unbounded. num_metadata_loading_threads only applies to the loads happening via TableLoadingMgr class and there would be other loads, that happen via DDLs (CatalogOpExecutor) like REFRESHES/ADD PARTITIONS etc. The original plan was to use the TableLoadingMgr to queue all these loads but we didn't end up doing it since it shows up as a regression to the end users (since the DDLs can potentially wait in the queue much longer than before). Ideally we could do it and increase num_metadata_loading_threads to a large value to mimic the present behavior. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783 PS5, Line 783: numPaths) throws Ca > CONF is the default configuration and its loaded once upfront for the lifet getFileSystem() by default uses a cache underneath unless we disable it via CONF.setBoolean("fs.hdfs.impl.disable.cache", true); But I'm not totally sure if there is a way to optimize beyond that point for each partition. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801 PS5, Line 801: > yes, that answers it. might be useful to try a workload that has the same n That'd be a good experiment. We are definitely as fast as the slowest partition load. Given we are using a thread pool, smaller partitions give up the executing thread much quicker and that would be used by the queued partitions_to_load. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818 PS6, Line 818: for (Future task: pendingMdLoadTasks) > just for my own info-- since this work is triggered by an end-user, how is Currently we don't support query cancellation for planning queries (IMPALA-915). That is a bigger query life-cycle change and needs to done separately. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 16 Oct 2017 22:28:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39 PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned " : "tables. Due to HDFS architectural limitations, it is unlikely to get a linear " : "speed up beyond 5 threads. When multiple tables are loaded, should I think about the total number of threads as num_metadata_loading_threads * max_hdfs_parts_parallel_load? If so, is the scaling limitation of 5 with regards to total threads hitting the namenode or 5 * 16 (per default settings)? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783 PS5, Line 783: numPaths) throws Ca > Correct. This is one of the overheads as noticed in the perf runs and unfor CONF is the default configuration and its loaded once upfront for the lifetime of this class (L201). I suspect few filesystems are specified-- perhaps we may get lucky and there is only one. Potentially, there's a way to make this method cheaper for such cases? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801 PS5, Line 801: > Yes, each partition can have its own no. of files, so the work definitely v yes, that answers it. might be useful to try a workload that has the same number of blocks as your current workload, but distributed non-uniformly across partitions and files. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818 PS6, Line 818: for (Future task: pendingMdLoadTasks) just for my own info-- since this work is triggered by an end-user, how is cancellation dealt with? -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 16 Oct 2017 20:35:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8235 to look at the new patch set (#6). Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_parts_parallel_load(default=5) -max_s3_parts_parallel_load(default=10) We use different thread pool sizes for HDFS and S3 based tables since S3 supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that datastructure as a shared state between multiple partition metadata loding threads. - While the configuration param max_s3_parts_parallel_load says s3, it applies for any FileSystem implementation that doesn't support storage IDs (like ADLS). Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speed up on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 411 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/6 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (36 comments) http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59 PS4, Line 59: > I see. So are these literally as simple as, to pick the first one, a single That is correct. http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@35 PS5, Line 35: DEFINE_int32(num_metadata_loading_threads, 16, : "(Advanced) The number of metadata loading threads (degree of parallelism) to use " : "when loading catalog metadata."); > I'm confused by the commit message which talks about not loading from hms u Right, this flag controls the number parallel *table* loads. And each table loading involves - Loading HMS metadata (including schema details, partition information etc.) - Loading/synthesizing block metadata So this flag is about inter-table parallelism and the commit message talks about intra-table parallel loading. Does that make sense? (I clarified the content in the commit message a little to reflect this, incase that helps). http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@42 PS5, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 10, > I would be more aggressive with this parameter and put it at 20. Done http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@215 PS5, Line 215: // Block metadata loading stats for a single HDFS path. > nit: File/block (since we're also loading/refreshing files). Also, you may Done http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217 PS5, Line 217: loadedFiles_ > You may want to add a comment here. What is loaded vs refreshed? Is the one Done http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@218 PS5, Line 218: _ > I believe the convention is that we don't use '_' for public members. Done. Was not aware of this. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217 PS5, Line 217: public int loadedFiles_ = 0; : public int refreshedFiles_ = 0; : public int ignoredFiles_ = 0; > add comments for these-- see the question regarding refreshedFiles below, f Yep, the current variables are confusing for sure, redid them and added comments. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@231 PS5, Line 231: Runnable > I don't know how this is used later on, but alternatively you can make Path I settled for Runnable because it looked more cleaner to me with this class exposing the debugString() method and only those callers who need the return value can access that method. I'm fine either way. I switched to Callable. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@247 PS5, Line 247: Blocks on the loadBlockMetadata() call. > Not following this comment. run() either calls refreshBlockMetadata() or lo The comment was stale. Updated it. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@333 PS5, Line 333: loadBlockMetadata > I know I am guilty for some of these names but maybe rename this to "resetA Done http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@363 PS5, Line 363: numUnknownDiskIds > Are you overriding or incrementing the value of numUnknownDiskIds in the cr Incrementing and will use the final value in L369. Tracking the create call which eventually calls HdfsPartition.createDiskIds(), it only increments and doesn't overwrite. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: for (HdfsPartition partition: partitions) partition.setFileDescriptors( > Am I misreading this or does each partition get set to the same list of new That is right. All the HdfsPartitions in 'partitions' map to the same partDir. So we update the fileDescs for each of them. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: newFileDescs > Hm, that doesn't seem particularly safe (i.e. using the same list for every Like we discussed, there is a subtle bug here (even without the patch). We can see the inconsistency with the following steps - create table test(a
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: for (HdfsPartition partition: partitions) partition.setFileDescriptors( > Am I misreading this or does each partition get set to the same list of new Just clarified this so I'll post my misunderstanding here. The comment states that "partitions in 'partitions' that correspond to the path 'partDir'". "correspond" is open to interpretation so I understood this as: /.../partDir/partition1_dir, partition2_dir, etc. The intended interpretation is that multiple partitions map (or refer) to the same fs path ('pathDir') (usecase: "latest" is an alias for the most recently populated partition). -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 13 Oct 2017 18:36:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Mostafa Mokhtar has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@42 PS5, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 10, I would be more aggressive with this parameter and put it at 20. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@787 PS5, Line 787: int threadPoolSize = FileSystemUtil.supportsStorageIds(tableFs) ? What is the expected behavior for tables with mixed FSs? As a mix of S3 and HDFS partitions. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801 PS5, Line 801: getLoadingThreadPoolSize > can different partitions have different number of files? if so, work across Parallelization of metadata loading is done on per partition granularity. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 12 Oct 2017 04:51:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (22 comments) http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@215 PS5, Line 215: // Block metadata loading stats for a single HDFS path. nit: File/block (since we're also loading/refreshing files). Also, you may want to change the name of the private class to reflect that, e.g. PathMetadataLoadingStats or something like that. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217 PS5, Line 217: loadedFiles_ You may want to add a comment here. What is loaded vs refreshed? Is the one superset of the other. Good to clarify to avoid confusion. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@218 PS5, Line 218: _ I believe the convention is that we don't use '_' for public members. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@231 PS5, Line 231: Runnable I don't know how this is used later on, but alternatively you can make PathBlockMetadataLoadRequest implement Callable, hence returning the stats when calling call(). Now you seem to access the stats only through the debugString() function. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@247 PS5, Line 247: Blocks on the loadBlockMetadata() call. Not following this comment. run() either calls refreshBlockMetadata() or loadBlockMetadata(), so I can't really interpret what this comment is saying. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@333 PS5, Line 333: loadBlockMetadata I know I am guilty for some of these names but maybe rename this to "resetAndLoadBlockMetadata"? Then it is more clear what the differences are between this function and rerfreshBlockMetadata(). http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@363 PS5, Line 363: numUnknownDiskIds Are you overriding or incrementing the value of numUnknownDiskIds in the create() function? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: newFileDescs Hm, that doesn't seem particularly safe (i.e. using the same list for every partition). Are we certain that any other partition modification operation (e.g. alter partition set location) will not try to override the file descriptors, thereby affecting all the other partitions? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@380 PS5, Line 380: changed :* mtime It's not just the changed mtime that we're looking for in order to determiner modification, so you may want to either remove this or mention all the criteria we use. I prefer the former. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@383 PS5, Line 383: The initial table load still uses the listFiles() :* on the data directory that fetches both the FileStatus as well as BlockLocations in :* a single call. remove http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@398 PS5, Line 398: get(0) Comment why we take the file descriptors of one partition and that it doesn't really matter which one we choose. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@399 PS5, Line 399: public String apply(FileDescriptor desc) { : return desc.getFileName(); : } Add @Override and make it a single line (if it fits) http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@448 PS5, Line 448: (fd == null) || (fd.getFileLength() != status.getLen()) || : (fd.getModificationTime() != status.getModificationTime()); I think we were also checking if the partition was cached. Isn't this check needed anymore? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@455 PS5, Line 455: refreshFileMetadata Maybe call it refreshPartitionStorageMetadata? Overall, it may make sense to replace "File/Block" with "Storage" whenever it makes sense. Thoughts? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@694 PS5, Line 694: Exception Why this change?
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@35 PS5, Line 35: DEFINE_int32(num_metadata_loading_threads, 16, : "(Advanced) The number of metadata loading threads (degree of parallelism) to use " : "when loading catalog metadata."); I'm confused by the commit message which talks about not loading from hms using multiple threads and this flag which indicates that hms is loaded using multiple threads. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217 PS5, Line 217: public int loadedFiles_ = 0; : public int refreshedFiles_ = 0; : public int ignoredFiles_ = 0; add comments for these-- see the question regarding refreshedFiles below, for example. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: for (HdfsPartition partition: partitions) partition.setFileDescriptors( Am I misreading this or does each partition get set to the same list of newly found descriptors? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@426 PS5, Line 426: new Reference(Long.valueOf(0) why not use numUnknownDiskIds here? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@431 PS5, Line 431: ++loadStats.refreshedFiles_; does refreshedFiles mean "file blocks reloaded" or "file checked for reload and possibly reloaded"? would be good to track how many times the if-block on L418 was entered since this method is intended to be used when few changes are present. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@433 PS5, Line 433: for (HdfsPartition partition: partitions) partition.setFileDescriptors(n same question as in the load method. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS5, Line 773: HDFS and S3 just to clarify, HdfsTable covers both hdfs table metadata as well as metadata needed for s3? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783 PS5, Line 783: getFileSystem(CONF) I noticed that this is called in many places in this class-- is it bc a given table can be stored on multiple filesystems? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801 PS5, Line 801: getLoadingThreadPoolSize can different partitions have different number of files? if so, work across threads may vary. what's costly here: per file call, per partition call, or number of blocks per file? -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 11 Oct 2017 18:51:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59 PS4, Line 59: 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% > Thanks for pointing this out. I should've added more context here. Mostafa I see. So are these literally as simple as, to pick the first one, a single refresh call? -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Wed, 11 Oct 2017 01:14:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8235 to look at the new patch set (#5). Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup) -max_hdfs_parts_parallel_load(default=5) -max_s3_parts_parallel_load(default=10) We use different thread pool sizes for HDFS and S3 based tables since S3 supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that datastructure as a shared state between multiple partition metadata loding threads. - While the configuration param max_s3_parts_parallel_load says s3, it applies for any FileSystem implementation that doesn't support storage IDs (like ADLS). Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speed up on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 386 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/5 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59 PS4, Line 59: 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% > What are these benchmarks? Thanks for pointing this out. I should've added more context here. Mostafa worked on creating a metadata benchmark by synthesizing large partitioned tables and then running some operations that stress the Catalog. Looks like the code/scripts are not yet ready for consumption by everyone. I borrowed those from him to run these tests. I'll add that information to the commit message. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Wed, 11 Oct 2017 00:24:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59 PS4, Line 59: 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% What are these benchmarks? -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Tue, 10 Oct 2017 03:48:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8235 Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup) -max_hdfs_parts_parallel_load(default=5) -max_s3_parts_parallel_load(default=10) We use different thread pool sizes for HDFS and S3 based tables since S3 supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that datastructure as a shared state between multiple partition metadata loding threads. - While the configuration param max_s3_parts_parallel_load says s3, it applies for any FileSystem implementation that doesn't support storage IDs (like ADLS). Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speed up on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Benchmark improvements on a 16 node cluster (I = Improvement) 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 100K-PARTITIONS-1M-FILES-CUSTOM-11-DROP-PARTITIONI -18.53% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-01-DROP I -34.82% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-01-DROP I -70.38% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 386 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/4 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada