[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Reviewed-on: http://gerrit.cloudera.org:8080/6707
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 164 insertions(+), 54 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Bharath Vissapragada: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/611/

-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 11: Code-Review+2

Carrying +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Bharath Vissapragada (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6707

to look at the new patch set (#11).

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..

IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 164 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/10/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS10, Line 83: result = 
impalad.service.read_debug_webpage("query_profile_encoded?query_id="\
 : + inflight_query_ids[0]['query_id'])
 : assert result.startswith("Could not obtain runtime profile")
> This should be moved to after line 90 (after the original test case). other
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/10/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS10, Line 83: result = 
impalad.service.read_debug_webpage("query_profile_encoded?query_id="\
 : + inflight_query_ids[0]['query_id'])
 : assert result.startswith("Could not obtain runtime profile")
This should be moved to after line 90 (after the original test case). 
otherwise, in the case without the fix, hitting this webpage causes the test to 
block until the planning delay is over and allows the test to pass


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-23 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS9, Line 75: 3
> I don't think it waits 15s everytime. It waits for 15s overall. So it still
What I'm saying is that with only 30s the test can still pass even on a build 
that contains the bug, because the second timeout expires at after the delay in 
impalad is over.  So, the test doesn't work as a regression test with such a 
low timeout.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#10).

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..

IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 164 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 607:   if (request_state->query_state() == 
beeswax::QueryState::CREATED) {
> as we discussed in person, here we'll return something like the caller does
Done


Line 607:   if (request_state->query_state() == 
beeswax::QueryState::CREATED) {
> Additionally, let's exercise this case in the new test.
Done


http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS9, Line 75: 3
> that doesn't seem large enough. check_registered_queries() is allowed to wa
I don't think it waits 15s everytime. It waits for 15s overall. So it still 
returns before pause_injection time is hit (30s). Not sure I follow you. Did it 
fail for you locally? In any case, I made the pause 100s, we don't wait on it 
anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 607:   if (request_state->query_state() == 
beeswax::QueryState::CREATED) {
> as we discussed in person, here we'll return something like the caller does
Additionally, let's exercise this case in the new test.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 607:   if (request_state->query_state() == 
beeswax::QueryState::CREATED) {
as we discussed in person, here we'll return something like the caller does. 
Actually, maybe we should just return an error Status with the message, and 
then let the caller put that string in the profile?


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS9, Line 75: 3
that doesn't seem large enough. check_registered_queries() is allowed to wait 
15 seconds each time, so the test can still pass  even if the bug exists, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-19 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#9).

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..

IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 162 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-19 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#8).

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..

IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 164 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-19 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 7:

(8 comments)

The new test fails deterministically without the patch when run locally.

http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

PS7, Line 296: NULL
> nit: change that one too to at least keep functions consistent
Done


http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

PS7, Line 721: just return
> that's not what the code does (it also sets plan_metadata_unavailable), ple
Done


PS7, Line 730: adopt_lock_t
> shouldn't that be deleted?
Oops, missed that during rebase.


http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 836: #endif
> actually, how about moving this to before UpdateQueryStatus() since that mo
Done


http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS7, Line 32: The intention here is to check contention on the 
query_exec_state_map_lock_
> This is talking about how the old code worked, which won't make sense to pe
Rephrased.


PS7, Line 54: This creates lock contention on the coordinator by
: calling QuerySummaryHandler() method
> This is no longer true with your fix. How about saying:
Yea it doesn't make sense to someone reading it with the fix. . IMO, the test 
class comment conveys the intention already and this specific comment looks 
redundant. Removed it, please let me know if you disagree.


PS7, Line 74: time.sleep(2)
> I'm worried that this will be flaky, especially with ASAN.  Instead of this
Yep good idea to reduce flakiness. But I don't think passing a large value to 
get_inflight_queries() would achieve that since the /inflight_queries end point 
never times out. Instead we should repeatedly poll that page in loop with 
timeout. Redid the logic accordingly.


PS7, Line 83: time.sleep(2)
> this delay is a bit harder to eliminate.  How about we increase --stress_me
Same comment as above. I redid the logic, please let me know if that makes 
sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-19 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

PS7, Line 296: NULL
nit: change that one too to at least keep functions consistent


http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

PS7, Line 721: just return
that's not what the code does (it also sets plan_metadata_unavailable), please 
rephrase.  Could rephrase the whole comment as: 

If the query plan isn't generated, avoid waiting for the lock, which could take 
a while if catalog metadata is being loaded.


PS7, Line 730: adopt_lock_t
shouldn't that be deleted?


http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS7, Line 32: The intention here is to check contention on the 
query_exec_state_map_lock_
This is talking about how the old code worked, which won't make sense to people 
reading the current code (after this change). It should say something like:

The intention is to check that the webserver does not hold any global locks or 
otherwise prevent impalad from servicing new requests.


PS7, Line 54: This creates lock contention on the coordinator by
: calling QuerySummaryHandler() method
This is no longer true with your fix. How about saying:

This is to verify that QuerySummaryHandler() does not hold any global locks 
that would, for example, prevent another query from starting.


PS7, Line 74: time.sleep(2)
I'm worried that this will be flaky, especially with ASAN.  Instead of this 
delay, couldn't we just wait for in_flight_queries to become 1?  And you could 
use the parameter to get_in_flight_queries() to do that by passing some largish 
value. That has the advantage that we'll wait only as long as necessary for the 
value to change to 1, so we can have a relatively long timeout (rather than 
delay).


PS7, Line 83: time.sleep(2)
this delay is a bit harder to eliminate.  How about we increase 
--stress_metadata_loading_pause_injection_ms to something really large, say 
1000 seconds (which doesn't matter -- we don't actually need the queries to 
finish planning to end the test, right?). 

And then we can use a larger timeout here, but we don't need to delay for it. 
We can just do:

inflight_query_ids = impalad.service.get_in_flight_queries(30)

which will poll the webui once per second and give up after 30 seconds.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-18 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6707/6//COMMIT_MSG
Commit Message:

PS6, Line 16: While the actual fix for this is to not hold the QES::lock_ during
: metadata load,
> I disagree. what you are doing is the actual fix for the lock acquisition o
Done


PS6, Line 20: We make sure that QES::lock_ doesn't block 
query_exec_state_map_lock_
:   in hot paths.
> After this change, isn't it the case that we'll never try to acquire anothe
This was stale. Updated.


http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 192:   DEFINE_int64(metadata_loading_pause_injection_ms, 0, "Simulates 
metadata loading for a "
> how about prefixing the flag with "stress_", like we do for the one in free
Done


PS6, Line 822: // Note: this acquires the exec_state lock *before* the
 : // query_exec_state_map_ lock. This is the opposite of
 : // GetQueryExecState(..., true), and therefore looks like a
 : // candidate for deadlock. The reason this works here is that
 : // GetQueryExecState cannot find exec_state (under the exec 
state
 : // map lock) and take it's lock until RegisterQuery has
 : // finished. By that point, the exec state map lock will 
have been
 : // given up, so the classic deadlock interleaving is not 
possible.
> this comment is outdated now.
Removed.


http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS6, Line 85: /// 4. query_exec_state_map_lock_
> i think this should be updated. with this change, do we ever hold this lock
No. Checked the usage of in the callers. Updated the comment.


http://gerrit.cloudera.org:8080/#/c/6707/6/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS6, Line 55: GetRuntimeProfileStr
> This doesn't make sense. GetRuntimeProfileStr() doesn't hold the map lock w
Ah this comment is wrong. My intention was exactly to call 
QuerySummaryHandler() which is place that calls with the interleaving locking 
that causes the hang. Corrected it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention

2017-05-18 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#7).

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ 
contention
..

IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 151 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson