[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 10 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 15 Feb 2018 05:04:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMap, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMap maps, a new function ShardedQueryMap::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Reviewed-on: http://gerrit.cloudera.org:8080/8363 Reviewed-by: Sailesh Mukil Tested-by: Impala Public Jenkins --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 379 insertions(+), 59 deletions(-) Approvals: Sailesh Mukil: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 11 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1940/ -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 10 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 15 Feb 2018 01:26:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 10: Code-Review+2 Fixed clang-tidy issues. Rebase, carry +2. -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 10 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 15 Feb 2018 01:26:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#10). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMap, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMap maps, a new function ShardedQueryMap::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 379 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/10 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 10 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1925/ -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 9 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 13 Feb 2018 04:32:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1925/ -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 9 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 13 Feb 2018 00:47:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 9: Code-Review+2 (3 comments) Thanks for the review. Carry +2. http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h@28 PS8, Line 28: > I don't see that used explicitly in this file. Can it be removed? Done http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h@44 PS8, Line 44: #include "util/thread-pool > same Done http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h@95 PS8, Line 95: int qs_m > generally we just use the un-sized primitive type (which for impala codebas Done -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 9 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 13 Feb 2018 00:46:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#9). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMap, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMap maps, a new function ShardedQueryMap::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 376 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/9 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 9 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 8: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h@28 PS8, Line 28: #include "util/spinlock.h" I don't see that used explicitly in this file. Can it be removed? http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h@44 PS8, Line 44: #include "util/spinlock.h" same http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h@95 PS8, Line 95: uint64_t generally we just use the un-sized primitive type (which for impala codebase we default to 'int'). -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 8 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 13 Feb 2018 00:10:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@67 PS7, Line 67: struct MapShard { : std::unordered_map map_; : SpinLock map_lock_; : }; > you should run a benchmark to verify, but you may want to explicitly align I ran the benchmarks both with and without inheriting CacheLineAligned, and there wasn't a noticeable delta on my machine. But I inherited CacheLineAligned to be more explicit. http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@94 PS7, Line 94: uint8_t > see comment below about uint8_t Done http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@122 PS7, Line 122: uint8_t > why uint8_t? Since it doesn't live in memory, doesn't seem necessary to di I thought it wouldn't make sense to have more than 255 buckets, but better not to make my own assumptions. I changed it to uint64_t. Do let me know if you had something else in mind. http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@129 PS7, Line 129: std::unordered_map* map_; : SpinLock* map_lock_; > This could be just a ShardedQueryMap::MapShard*, but up to you. Done -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 7 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 12 Feb 2018 23:56:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#8). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMap, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMap maps, a new function ShardedQueryMap::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 379 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/8 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 8 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@67 PS7, Line 67: struct MapShard { : std::unordered_map map_; : SpinLock map_lock_; : }; you should run a benchmark to verify, but you may want to explicitly align that to a cache line (by inheriting CacheLineAligned). http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@94 PS7, Line 94: uint8_t see comment below about uint8_t http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@122 PS7, Line 122: uint8_t why uint8_t? Since it doesn't live in memory, doesn't seem necessary to dictate the size. http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@129 PS7, Line 129: std::unordered_map* map_; : SpinLock* map_lock_; This could be just a ShardedQueryMap::MapShard*, but up to you. -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 7 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 12 Feb 2018 22:41:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#7). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMap, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMap maps, a new function ShardedQueryMap::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 380 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/7 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 7 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 7: (11 comments) http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@29 PS6, Line 29: > can you make this a static constexpr inside the Template class to limit the Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@35 PS6, Line 35: /// Usage pattern: > Update - it's not a single spinlock, but a spinlock per shard. Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@43 PS6, Line 43: > I don't think we usually include the word Template since it's obvious by th Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@62 PS6, Line 62: RY_B > maps_ (plural) would help to clarify I moved this into a struct of arrays, so I kept the name as map_. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@63 PS6, Line 63: > map_locks_ Same as above. I moved this into a struct of arrays, so I kept the name as map_lock_. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@64 PS6, Line 64: // We group the map and its corresponding lock together to avoid false sharing. Since > you may end up with false sharing with this layout since locks might share Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@66 PS6, Line 66: // they can be allocated on the same cache line. : struct MapShard { > this needs updating. that thing is not an array, and it's not even possible Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@92 PS6, Line 92: const TUniqueId& query_id, class ShardedQueryMap* sharded_map) { > consider adding: map_lock_->DCheckLocked() Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@96 PS6, Line 96: ap_lock_ = &sharded_map->shards_[qs_map_bucket].ma > let's not explain public methods in terms of private fields. do something l Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@98 PS6, Line 98: // Lock the corresponding shard. > consider adding: map_lock_->DCheckLocked() Done http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@117 PS6, Line 117: } > why do we need that? Forgot to get rid of this. Done. -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 7 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Sat, 10 Feb 2018 00:57:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@29 PS6, Line 29: #define NUM_QUERY_BUCKETS 4 can you make this a static constexpr inside the Template class to limit the scope? http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@35 PS6, Line 35: /// access to each shard of the map. Update - it's not a single spinlock, but a spinlock per shard. Add: The underlying shard is locked and accessed by instantiating a ScopedShardedMapRef. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@43 PS6, Line 43: Template I don't think we usually include the word Template since it's obvious by the <> in the use of the type. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@62 PS6, Line 62: map_ maps_ (plural) would help to clarify http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@63 PS6, Line 63: map_lock_ map_locks_ http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@64 PS6, Line 64: }; you may end up with false sharing with this layout since locks might share cache lines. you'd probably be better off grouping each { map, lock} and using that as an element to the array (and making sure each shard is on its own cacheline). http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@66 PS6, Line 66: /// Use this class to obtain a lock on a ShardedQueryMapTemplate[] for the duration of a : /// function/block, rather than doing so manually. this needs updating. that thing is not an array, and it's not even possible to do it manually, and it would be good to explain what it's giving you a locked reference to. i.e. also say something about how this class gives you a reference to the underlying map that represents the shard. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@92 PS6, Line 92: ~ScopedShardedMapRef() { consider adding: map_lock_->DCheckLocked() http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@96 PS6, Line 96: Returns the map corresponding to 'qs_map_bucket_'. let's not explain public methods in terms of private fields. do something like: Returns the shard (map) for the 'query_id' passed to the constructor. http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@98 PS6, Line 98: std::unordered_map* get() { consider adding: map_lock_->DCheckLocked() http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@117 PS6, Line 117: uint8_t qs_map_bucket_; why do we need that? -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 6 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 09 Feb 2018 23:26:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@36 PS5, Line 36: > doesn't that have to be NUM_QUERY_BUCKETS? otherwise, QueryIdToBucket() doe Yes, I've made the change to have everything encapsulated within these classes. http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61 PS5, Line 61: > This comment would become moot once you encode the number of shards. then t Done http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61 PS5, Line 61: > how about 'container_array' to make it clear this must be an array? Done http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@63 PS5, Line 63: NUM_QUERY_BUCKE > like above, given this is a function of the number of shards, it should be Done -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 6 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 09 Feb 2018 22:18:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#6). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMapTemplate, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks, thereby abstracting away the access to the maps locks. For operations that need to happen on every entry in the ShardedQueryMapTemplate maps, a new function ShardedQueryMapTemplate::DoFuncForAllEntries() is introduced which takes a user supplied lambda and passes it every individual map entry and executes it. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h 8 files changed, 367 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/6 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 6 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61 PS5, Line 61: map_container > how about 'container_array' to make it clear this must be an array? This comment would become moot once you encode the number of shards. then this parameter is just 'sharded_map' or similar (and contains all shards). -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 5 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 06 Feb 2018 20:14:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@36 PS5, Line 36: 4 doesn't that have to be NUM_QUERY_BUCKETS? otherwise, QueryIdToBucket() doesn't work. But, I think it'd be better to have everything encapsulated in these classes, i.e. the number of shards should be either encoded in here. http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61 PS5, Line 61: map_container how about 'container_array' to make it clear this must be an array? http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@63 PS5, Line 63: QueryIdToBucket like above, given this is a function of the number of shards, it should be part of this class. -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 5 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 06 Feb 2018 20:12:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Hello Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8363 to look at the new patch set (#5). Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ The following 2 locks have shown to be frequent points of contention on recent perf runs: - qs_map_lock_ - client_request_state_map_lock_ Since these are process wide locks, any threads waiting on these locks potentially slow down the runtime of a query. I tried to address this previously by converting the client_request_state_map_lock_ to a reader-writer lock. This showed great perf improvements in the general case, however, there were edge cases with big regressions as well. In the general case, strict readers of the map got through so quickly that we were able to see a reduction in the number of client connections created, since this lock was contended for in the context of Thrift threads too. The bad case is when writers were starved trying to register a new query since there were so many readers. Changing the starve option resulted in worse read performance all round. Another approach which is relatively simpler is to shard the locks, which proves to be very effective with no regressions. The maps and locks are sharded to a default of 4 buckets initally. Query IDs are created by using boost::uuids::random_generator. We use the high bits of a query ID to assign queries to buckets. I verified that the distribution of the high bits of a query ID are even across buckets on my local machine: For 10,000 queries sharded across 4 buckets, the distribution was: bucket[0]: 2500 bucket[1]: 2489 bucket[2]: 2566 bucket[3]: 2445 A micro-benchmark is added to measure the improvement in performance. This benchmark creates multiple threads each of which creates a QueryState and accesses it multiple times. We can see improvements in the range 2x - 3.5x. BEFORE: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 1ms Total Time (#Queries: 50 #Accesses: 100) : 8ms Total Time (#Queries: 50 #Accesses: 1000) : 54ms Total Time (#Queries: 500 #Accesses: 100) : 76ms Total Time (#Queries: 500 #Accesses: 1000) : 543ms AFTER: --Benchmark 1: Create and access Query States. Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles Total Time (#Queries: 50 #Accesses: 100) : 4ms Total Time (#Queries: 50 #Accesses: 1000) : 15ms Total Time (#Queries: 500 #Accesses: 100) : 46ms Total Time (#Queries: 500 #Accesses: 1000) : 151ms This change introduces a ShardedQueryMapTemplate, which is used to replace the QueryExecMgr::qs_map_ and the ImpalaServer::client_request_state_map_, and their corresponding locks. NOTE: This microbenchmark has shown that SpinLock has better performance than boost::mutex for the qs_map_lock_'s, so that change has been made too. TODO: Add benchmark for client_request_state_map_lock_ too. The APIs around that are more complicated, so this patch only includes the benchmarking of qs_map_lock_. TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap. Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/process-wide-locks-benchmark.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A be/src/util/sharded-query-map-util.h M be/src/util/uid-util.h 9 files changed, 353 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/5 -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 5 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@73 PS4, Line 73: : QueryState* QueryExecMgr::CreateQueryState(cons > just call get() to make it obvious this is the same Done http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@78 PS4, Line 78: return qs; : } > would probably make the above code simpler to instead just store direct poi Done http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h@830 PS4, Line 830: typedef boost::unordered_map QueryLogIndex; > same comment as in the other scoped class. Done. I made a ScopedShardedMapRef class in be/src/util/sharded-query-map-util.h -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 5 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 06 Feb 2018 20:00:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@73 PS4, Line 73: exec_mgr_->qs_map_lock_[qs_map_bucket_].DCheckLocked(); : return &exec_mgr_->qs_map_[qs_map_bucket_]; just call get() to make it obvious this is the same http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@78 PS4, Line 78: QueryExecMgr* exec_mgr_; : uint8_t qs_map_bucket_; would probably make the above code simpler to instead just store direct pointers to the lock and the map. http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h@830 PS4, Line 830: uint8_t client_map_bucket_; same comment as in the other scoped class. (consider maybe even creating a template ScopedShardedMapRef so you can share the pattern explicitly and just make these scoped classes be typedefs). -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 4 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 01:40:22 + Gerrit-HasComments: Yes