[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Reviewed-on: http://gerrit.cloudera.org:8080/6707 Reviewed-by: Bharath VissapragadaTested-by: Impala Public Jenkins --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 164 insertions(+), 54 deletions(-) Approvals: Impala Public Jenkins: Verified Bharath Vissapragada: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/611/ -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 11: Code-Review+2 Carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6707 to look at the new patch set (#11). Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 164 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/11 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/10/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS10, Line 83: result = impalad.service.read_debug_webpage("query_profile_encoded?query_id="\ : + inflight_query_ids[0]['query_id']) : assert result.startswith("Could not obtain runtime profile") > This should be moved to after line 90 (after the original test case). other Done -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 10: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/10/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS10, Line 83: result = impalad.service.read_debug_webpage("query_profile_encoded?query_id="\ : + inflight_query_ids[0]['query_id']) : assert result.startswith("Could not obtain runtime profile") This should be moved to after line 90 (after the original test case). otherwise, in the case without the fix, hitting this webpage causes the test to block until the planning delay is over and allows the test to pass -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS9, Line 75: 3 > I don't think it waits 15s everytime. It waits for 15s overall. So it still What I'm saying is that with only 30s the test can still pass even on a build that contains the bug, because the second timeout expires at after the delay in impalad is over. So, the test doesn't work as a regression test with such a low timeout. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has uploaded a new patch set (#10). Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 164 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/10 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 607: if (request_state->query_state() == beeswax::QueryState::CREATED) { > as we discussed in person, here we'll return something like the caller does Done Line 607: if (request_state->query_state() == beeswax::QueryState::CREATED) { > Additionally, let's exercise this case in the new test. Done http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS9, Line 75: 3 > that doesn't seem large enough. check_registered_queries() is allowed to wa I don't think it waits 15s everytime. It waits for 15s overall. So it still returns before pause_injection time is hit (30s). Not sure I follow you. Did it fail for you locally? In any case, I made the pause 100s, we don't wait on it anyway. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 607: if (request_state->query_state() == beeswax::QueryState::CREATED) { > as we discussed in person, here we'll return something like the caller does Additionally, let's exercise this case in the new test. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 607: if (request_state->query_state() == beeswax::QueryState::CREATED) { as we discussed in person, here we'll return something like the caller does. Actually, maybe we should just return an error Status with the message, and then let the caller put that string in the profile? -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/9/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS9, Line 75: 3 that doesn't seem large enough. check_registered_queries() is allowed to wait 15 seconds each time, so the test can still pass even if the bug exists, right? -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has uploaded a new patch set (#9). Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 162 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/9 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has uploaded a new patch set (#8). Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 164 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/8 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 7: (8 comments) The new test fails deterministically without the patch when run locally. http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: PS7, Line 296: NULL > nit: change that one too to at least keep functions consistent Done http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: PS7, Line 721: just return > that's not what the code does (it also sets plan_metadata_unavailable), ple Done PS7, Line 730: adopt_lock_t > shouldn't that be deleted? Oops, missed that during rebase. http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 836: #endif > actually, how about moving this to before UpdateQueryStatus() since that mo Done http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS7, Line 32: The intention here is to check contention on the query_exec_state_map_lock_ > This is talking about how the old code worked, which won't make sense to pe Rephrased. PS7, Line 54: This creates lock contention on the coordinator by : calling QuerySummaryHandler() method > This is no longer true with your fix. How about saying: Yea it doesn't make sense to someone reading it with the fix. . IMO, the test class comment conveys the intention already and this specific comment looks redundant. Removed it, please let me know if you disagree. PS7, Line 74: time.sleep(2) > I'm worried that this will be flaky, especially with ASAN. Instead of this Yep good idea to reduce flakiness. But I don't think passing a large value to get_inflight_queries() would achieve that since the /inflight_queries end point never times out. Instead we should repeatedly poll that page in loop with timeout. Redid the logic accordingly. PS7, Line 83: time.sleep(2) > this delay is a bit harder to eliminate. How about we increase --stress_me Same comment as above. I redid the logic, please let me know if that makes sense. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: PS7, Line 296: NULL nit: change that one too to at least keep functions consistent http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: PS7, Line 721: just return that's not what the code does (it also sets plan_metadata_unavailable), please rephrase. Could rephrase the whole comment as: If the query plan isn't generated, avoid waiting for the lock, which could take a while if catalog metadata is being loaded. PS7, Line 730: adopt_lock_t shouldn't that be deleted? http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS7, Line 32: The intention here is to check contention on the query_exec_state_map_lock_ This is talking about how the old code worked, which won't make sense to people reading the current code (after this change). It should say something like: The intention is to check that the webserver does not hold any global locks or otherwise prevent impalad from servicing new requests. PS7, Line 54: This creates lock contention on the coordinator by : calling QuerySummaryHandler() method This is no longer true with your fix. How about saying: This is to verify that QuerySummaryHandler() does not hold any global locks that would, for example, prevent another query from starting. PS7, Line 74: time.sleep(2) I'm worried that this will be flaky, especially with ASAN. Instead of this delay, couldn't we just wait for in_flight_queries to become 1? And you could use the parameter to get_in_flight_queries() to do that by passing some largish value. That has the advantage that we'll wait only as long as necessary for the value to change to 1, so we can have a relatively long timeout (rather than delay). PS7, Line 83: time.sleep(2) this delay is a bit harder to eliminate. How about we increase --stress_metadata_loading_pause_injection_ms to something really large, say 1000 seconds (which doesn't matter -- we don't actually need the queries to finish planning to end the test, right?). And then we can use a larger timeout here, but we don't need to delay for it. We can just do: inflight_query_ids = impalad.service.get_in_flight_queries(30) which will poll the webui once per second and give up after 30 seconds. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/6707/6//COMMIT_MSG Commit Message: PS6, Line 16: While the actual fix for this is to not hold the QES::lock_ during : metadata load, > I disagree. what you are doing is the actual fix for the lock acquisition o Done PS6, Line 20: We make sure that QES::lock_ doesn't block query_exec_state_map_lock_ : in hot paths. > After this change, isn't it the case that we'll never try to acquire anothe This was stale. Updated. http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 192: DEFINE_int64(metadata_loading_pause_injection_ms, 0, "Simulates metadata loading for a " > how about prefixing the flag with "stress_", like we do for the one in free Done PS6, Line 822: // Note: this acquires the exec_state lock *before* the : // query_exec_state_map_ lock. This is the opposite of : // GetQueryExecState(..., true), and therefore looks like a : // candidate for deadlock. The reason this works here is that : // GetQueryExecState cannot find exec_state (under the exec state : // map lock) and take it's lock until RegisterQuery has : // finished. By that point, the exec state map lock will have been : // given up, so the classic deadlock interleaving is not possible. > this comment is outdated now. Removed. http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.h File be/src/service/impala-server.h: PS6, Line 85: /// 4. query_exec_state_map_lock_ > i think this should be updated. with this change, do we ever hold this lock No. Checked the usage of in the callers. Updated the comment. http://gerrit.cloudera.org:8080/#/c/6707/6/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS6, Line 55: GetRuntimeProfileStr > This doesn't make sense. GetRuntimeProfileStr() doesn't hold the map lock w Ah this comment is wrong. My intention was exactly to call QuerySummaryHandler() which is place that calls with the interleaving locking that causes the hang. Corrected it. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Bharath Vissapragada has uploaded a new patch set (#7). Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention .. IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 151 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/7 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson