[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 15 Feb 2018 05:04:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445
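
A minimal sketch of the bucketing described above, for illustration only
(this is not the code from the patch; the UniqueIdSketch struct, constant,
and helper names are placeholders):

  #include <cstdint>

  // Stand-in for Impala's Thrift-generated TUniqueId (two 64-bit halves).
  struct UniqueIdSketch {
    int64_t hi;
    int64_t lo;
  };

  constexpr int kNumShards = 4;

  // Because query IDs come from boost::uuids::random_generator, the high
  // bits are uniformly distributed, so a simple modulo spreads queries
  // evenly across the shards.
  inline int QueryIdToShard(const UniqueIdSketch& query_id) {
    return static_cast<int>(static_cast<uint64_t>(query_id.hi) % kNumShards);
  }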

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMap, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMap, a new
function ShardedQueryMap::DoFuncForAllEntries() is introduced, which takes
a user-supplied lambda and executes it on every individual map entry.
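
A self-contained sketch of this sharding pattern, for illustration (not the
actual ShardedQueryMap from be/src/util/sharded-query-map-util.h):
std::mutex stands in for Impala's SpinLock, and hashing the key stands in
for the query-ID high-bit bucketing described above.

  #include <array>
  #include <functional>
  #include <mutex>
  #include <unordered_map>

  template <typename K, typename V>
  class ShardedMapSketch {
   public:
    static constexpr int kNumShards = 4;

    // Insert or overwrite an entry, locking only the owning shard.
    void Put(const K& key, const V& value) {
      Shard& shard = GetShard(key);
      std::lock_guard<std::mutex> l(shard.lock);
      shard.map[key] = value;
    }

    // Runs 'fn' on every entry of every shard, holding one shard lock at a
    // time. This mirrors the intent of DoFuncForAllEntries().
    void DoFuncForAllEntries(const std::function<void(const V&)>& fn) {
      for (Shard& shard : shards_) {
        std::lock_guard<std::mutex> l(shard.lock);
        for (const auto& entry : shard.map) fn(entry.second);
      }
    }

   private:
    // Each shard pairs a map with its own lock, so unrelated queries only
    // contend within their shard.
    struct Shard {
      std::unordered_map<K, V> map;
      std::mutex lock;
    };

    Shard& GetShard(const K& key) {
      return shards_[std::hash<K>()(key) % kNumShards];
    }

    std::array<Shard, kNumShards> shards_;
  };

Keeping each critical section short and shard-local is what removes the
contention without the writer-starvation risk of a single reader-writer lock.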

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Reviewed-on: http://gerrit.cloudera.org:8080/8363
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 379 insertions(+), 59 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 11
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1940/


--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 15 Feb 2018 01:26:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 10: Code-Review+2

Fixed clang-tidy issues.

Rebase, carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 15 Feb 2018 01:26:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-14 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#10).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMap, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMap, a new
function ShardedQueryMap::DoFuncForAllEntries() is introduced, which takes
a user-supplied lambda and executes it on every individual map entry.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 379 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/10
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1925/


--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 13 Feb 2018 04:32:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1925/


--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 13 Feb 2018 00:47:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 9: Code-Review+2

(3 comments)

Thanks for the review.

Carry +2.

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h@28
PS8, Line 28:
> I don't see that used explicitly in this file. Can it be removed?
Done


http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h@44
PS8, Line 44: #include "util/thread-pool
> same
Done


http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h@95
PS8, Line 95: int qs_m
> generally we just use the un-sized primitive type (which for impala codebas
Done



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 13 Feb 2018 00:46:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#9).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMap, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMap, a new
function ShardedQueryMap::DoFuncForAllEntries() is introduced, which takes
a user-supplied lambda and executes it on every individual map entry.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 376 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/9
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 8: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/runtime/query-exec-mgr.h@28
PS8, Line 28: #include "util/spinlock.h"
I don't see that used explicitly in this file. Can it be removed?


http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/service/impala-server.h@44
PS8, Line 44: #include "util/spinlock.h"
same


http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/8/be/src/util/sharded-query-map-util.h@95
PS8, Line 95: uint64_t
generally we just use the un-sized primitive type (which for the Impala
codebase we default to 'int').



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 8
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 13 Feb 2018 00:10:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@67
PS7, Line 67: struct MapShard {
: std::unordered_map map_;
: SpinLock map_lock_;
:   };
> you should run a benchmark to verify, but you may want to explicitly align
I ran the benchmarks both with and without inheriting CacheLineAligned, and 
there wasn't a noticeable delta on my machine. But I inherited CacheLineAligned 
to be more explicit.
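
For illustration, a minimal sketch of what putting each shard on its own
cache line can look like; alignas(64) plays the role of inheriting
CacheLineAligned here, and the map/key types are placeholders rather than
the patch's code:

  #include <mutex>
  #include <string>
  #include <unordered_map>

  // Aligning each shard to a 64-byte cache line ensures the lock of one
  // shard never shares a line with the lock of a neighboring shard, which
  // avoids false sharing between threads touching different shards.
  struct alignas(64) MapShardSketch {
    std::unordered_map<std::string, int> map;
    std::mutex lock;
  };

  static_assert(alignof(MapShardSketch) == 64,
                "each shard starts on its own cache line");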


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@94
PS7, Line 94: uint8_t
> see comment below about uint8_t
Done


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@122
PS7, Line 122: uint8_t
> why uint8_t?  Since it doesn't live in memory, doesn't seem necessary to di
I thought it wouldn't make sense to have more than 255 buckets, but better not 
to make my own assumptions. I changed it to uint64_t. Do let me know if you had 
something else in mind.


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@129
PS7, Line 129:   std::unordered_map* map_;
 :   SpinLock* map_lock_;
> This could be just a ShardedQueryMap::MapShard*, but up to you.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 12 Feb 2018 23:56:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#8).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMap, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMap, a new
function ShardedQueryMap::DoFuncForAllEntries() is introduced, which takes
a user-supplied lambda and executes it on every individual map entry.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 379 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/8
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 8
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@67
PS7, Line 67: struct MapShard {
: std::unordered_map map_;
: SpinLock map_lock_;
:   };
you should run a benchmark to verify, but you may want to explicitly align that 
to a cache line (by inheriting CacheLineAligned).


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@94
PS7, Line 94: uint8_t
see comment below about uint8_t


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@122
PS7, Line 122: uint8_t
why uint8_t?  Since it doesn't live in memory, doesn't seem necessary to 
dictate the size.


http://gerrit.cloudera.org:8080/#/c/8363/7/be/src/util/sharded-query-map-util.h@129
PS7, Line 129:   std::unordered_map* map_;
 :   SpinLock* map_lock_;
This could be just a ShardedQueryMap::MapShard*, but up to you.



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 12 Feb 2018 22:41:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-09 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#7).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMap, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMap, a new
function ShardedQueryMap::DoFuncForAllEntries() is introduced, which takes
a user-supplied lambda and executes it on every individual map entry.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 380 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/7
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-09 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@29
PS6, Line 29:
> can you make this a static constexpr inside the Template class to limit the
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@35
PS6, Line 35: /// Usage pattern:
> Update - it's not a single spinlock, but a spinlock per shard.
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@43
PS6, Line 43:
> I don't think we usually include the word Template since it's obvious by th
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@62
PS6, Line 62: RY_B
> maps_ (plural) would help to clarify
I moved this into a struct of arrays, so I kept the name as map_.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@63
PS6, Line 63:
> map_locks_
Same as above. I moved this into a struct of arrays, so I kept the name as 
map_lock_.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@64
PS6, Line 64:   // We group the map and its corresponding lock together to 
avoid false sharing. Since
> you may end up with false sharing with this layout since locks might share
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@66
PS6, Line 66:   // they can be allocated on the same cache line.
:   struct MapShard {
> this needs updating. that thing is not an array, and it's not even possible
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@92
PS6, Line 92:   const TUniqueId& query_id, class ShardedQueryMap* 
sharded_map) {
> consider adding: map_lock_->DCheckLocked()
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@96
PS6, Line 96: ap_lock_ = &sharded_map->shards_[qs_map_bucket].ma
> let's not explain public methods in terms of private fields. do something l
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@98
PS6, Line 98: // Lock the corresponding shard.
> consider adding: map_lock_->DCheckLocked()
Done


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@117
PS6, Line 117:   }
> why do we need that?
Forgot to get rid of this. Done.



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Sat, 10 Feb 2018 00:57:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-09 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@29
PS6, Line 29: #define NUM_QUERY_BUCKETS 4
can you make this a static constexpr inside the Template class to limit the 
scope?


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@35
PS6, Line 35: /// access to each shard of the map.
Update - it's not a single spinlock, but a spinlock per shard.

Add:
The underlying shard is locked and accessed by instantiating a 
ScopedShardedMapRef.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@43
PS6, Line 43: Template
I don't think we usually include the word Template since it's obvious by the <> 
in the use of the type.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@62
PS6, Line 62: map_
maps_ (plural) would help to clarify


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@63
PS6, Line 63: map_lock_
map_locks_


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@64
PS6, Line 64: };
you may end up with false sharing with this layout since locks might share 
cache lines. you'd probably be better off grouping each { map, lock} and using 
that as an element to the array (and making sure each shard is on its own 
cacheline).


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@66
PS6, Line 66: /// Use this class to obtain a lock on a 
ShardedQueryMapTemplate[] for the duration of a
: /// function/block, rather than doing so manually.
this needs updating. that thing is not an array, and it's not even possible to 
do it manually, and it would be good to explain what it's giving you a locked 
reference to. i.e. also say something about how this class gives you a 
reference to the underlying map that represents the shard.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@92
PS6, Line 92:   ~ScopedShardedMapRef() {
consider adding: map_lock_->DCheckLocked()


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@96
PS6, Line 96: Returns the map corresponding to 'qs_map_bucket_'.
let's not explain public methods in terms of private fields. do something like:
Returns the shard (map) for the 'query_id' passed to the constructor.


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@98
PS6, Line 98:   std::unordered_map* get() {
consider adding: map_lock_->DCheckLocked()


http://gerrit.cloudera.org:8080/#/c/8363/6/be/src/util/sharded-query-map-util.h@117
PS6, Line 117:   uint8_t qs_map_bucket_;
why do we need that?



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 09 Feb 2018 23:26:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-09 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@36
PS5, Line 36:
> doesn't that have to be NUM_QUERY_BUCKETS? otherwise, QueryIdToBucket() doe
Yes, I've made the change to have everything encapsulated within these classes.


http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61
PS5, Line 61:
> This comment would become moot once you encode the number of shards. then t
Done


http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61
PS5, Line 61:
> how about 'container_array' to make it clear this must be an array?
Done


http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@63
PS5, Line 63: NUM_QUERY_BUCKE
> like above, given this is a function of the number of shards, it should be
Done



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 09 Feb 2018 22:18:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-09 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#6).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMapTemplate, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks, thereby abstracting away access to the maps
and their locks.

For operations that need to run on every entry of a ShardedQueryMapTemplate,
a new function ShardedQueryMapTemplate::DoFuncForAllEntries() is introduced,
which takes a user-supplied lambda and executes it on every individual map
entry.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
8 files changed, 367 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/6
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-06 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61
PS5, Line 61: map_container
> how about 'container_array' to make it clear this must be an array?
This comment would become moot once you encode the number of shards. then this 
parameter is just 'sharded_map' or similar (and contains all shards).



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 06 Feb 2018 20:14:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-06 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@36
PS5, Line 36: 4
doesn't that have to be NUM_QUERY_BUCKETS? otherwise, QueryIdToBucket() doesn't 
work.

But I think it'd be better to have everything encapsulated in these classes,
i.e. the number of shards should be encoded in here.


http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@61
PS5, Line 61: map_container
how about 'container_array' to make it clear this must be an array?


http://gerrit.cloudera.org:8080/#/c/8363/5/be/src/util/sharded-query-map-util.h@63
PS5, Line 63: QueryIdToBucket
like above, given this is a function of the number of shards, it should be part 
of this class.



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 06 Feb 2018 20:12:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-06 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8363

to look at the new patch set (#5).

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..

IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_

The following two locks have been shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process-wide locks, any thread waiting on them can slow down
the runtime of a query.

I previously tried to address this by converting the
client_request_state_map_lock_ to a reader-writer lock. This showed great
perf improvements in the general case; however, there were edge cases with
big regressions as well. In the general case, strict readers of the map got
through so quickly that we saw a reduction in the number of client
connections created, since this lock was also contended for in the context
of Thrift threads. The bad case was when writers were starved trying to
register a new query because there were so many readers. Changing the
starve option resulted in worse read performance all round.

A relatively simpler approach is to shard the locks, which proved to be
very effective with no regressions. The maps and locks are sharded into a
default of 4 buckets initially.

Query IDs are created using boost::uuids::random_generator. We use the high
bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID is even across buckets on my
local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads, each of which creates a QueryState and
accesses it multiple times. We see improvements in the range of 2x - 3.5x.

BEFORE:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

AFTER:
--Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

This change introduces a ShardedQueryMapTemplate, which replaces
QueryExecMgr::qs_map_ and ImpalaServer::client_request_state_map_, along
with their corresponding locks.

NOTE: The microbenchmark has shown that SpinLock performs better than
boost::mutex for the qs_map_lock_ shards, so that change has been made too.

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

TODO 2: Consider adopting the ShardedQueryMapTemplate for the SessionStateMap.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A be/src/util/sharded-query-map-util.h
M be/src/util/uid-util.h
9 files changed, 353 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/8363/5
--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2018-02-06 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@73
PS4, Line 73:
: QueryState* QueryExecMgr::CreateQueryState(cons
> just call get() to make it obvious this is the same
Done


http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@78
PS4, Line 78:   return qs;
: }
> would probably make the above code simpler to instead just store direct poi
Done


http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h@830
PS4, Line 830:   typedef boost::unordered_map 
QueryLogIndex;
> same comment as in the other scoped class.
Done. I made a ScopedShardedMapRef class in be/src/util/sharded-query-map-util.h



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 06 Feb 2018 20:00:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2017-11-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@73
PS4, Line 73: exec_mgr_->qs_map_lock_[qs_map_bucket_].DCheckLocked();
: return &exec_mgr_->qs_map_[qs_map_bucket_];
just call get() to make it obvious this is the same


http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/runtime/query-exec-mgr.cc@78
PS4, Line 78:   QueryExecMgr* exec_mgr_;
:   uint8_t qs_map_bucket_;
would probably make the above code simpler to instead just store direct 
pointers to the lock and the map.


http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8363/4/be/src/service/impala-server.h@830
PS4, Line 830: uint8_t client_map_bucket_;
same comment as in the other scoped class.
(consider maybe even creating a template ScopedShardedMapRef 
so you can share the pattern explicitly and just make these scoped classes be 
typedefs).
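
A rough, self-contained sketch of the suggested template pattern; the names
and types below are placeholders (std::mutex stands in for SpinLock), not
the class that eventually landed in be/src/util/sharded-query-map-util.h:

  #include <array>
  #include <cstdint>
  #include <mutex>
  #include <unordered_map>

  struct UniqueIdSketch { int64_t hi; int64_t lo; };

  template <typename T>
  class ShardedQueryMapSketch {
   public:
    static constexpr int kNumShards = 4;
    struct Shard {
      // Keyed on the low bits of the ID to keep the sketch simple.
      std::unordered_map<int64_t, T> map;
      std::mutex lock;
    };
    // Picks the shard that owns 'query_id' based on its high bits.
    Shard& GetShard(const UniqueIdSketch& query_id) {
      return shards_[static_cast<uint64_t>(query_id.hi) % kNumShards];
    }
   private:
    std::array<Shard, kNumShards> shards_;
  };

  // RAII reference to one shard: locks the shard owning 'query_id' on
  // construction, unlocks on destruction, and exposes that shard's map.
  template <typename T>
  class ScopedShardedMapRefSketch {
   public:
    ScopedShardedMapRefSketch(const UniqueIdSketch& query_id,
                              ShardedQueryMapSketch<T>* sharded_map)
        : shard_(&sharded_map->GetShard(query_id)) {
      shard_->lock.lock();
    }
    ~ScopedShardedMapRefSketch() { shard_->lock.unlock(); }

    std::unordered_map<int64_t, T>* get() { return &shard_->map; }

   private:
    typename ShardedQueryMapSketch<T>::Shard* shard_;
  };

The scoped classes used for the query-state map and the client-request-state
map could then be typedefs of such a template, as suggested above.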



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:40:22 +
Gerrit-HasComments: Yes