Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/9038 )
Change subject: IMPALA-2642: Fix a potential deadlock in statestore ...................................................................... Patch Set 2: (3 comments) Working on the test case. http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG@17 PS2, Line 17: Testing: The problem is easily reproduced by lowering the value of : STATESTORE_MAX_SUBSCRIBERS to 3, and then launching a mini cluster : with 3 impalads. With the fix, we can see the following entry in the : statestore log: > Is it possible to add a test for this? Doing it. The approach I am taking requires changing STATESTORE_MAX_SUBSCRIBERS to FLAGS_statestore_max_subscribers, so that we can start statestored with a low max value. I think this is good to do anyway, as that allows us to test other boundary behavior. http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353 PS2, Line 353: // Assumes that the caller has taken subscribers_lock_ > I would add this comment in the header comment in the .h file. Done http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353 PS2, Line 353: // Assumes that the caller has taken subscribers_lock_ > Its kind of weird IMO that OfferUpdate() expects the caller to take subscri Thought of that, ie, moving the call to OfferUpdate from DoSubscriberUpdate to another scope that does not have subscribers_lock_. Then I saw that UnregisterSubscriber() also expects the caller to take subscribers_lock_. I thought it made sense to follow the same pattern here. I think there is scope for cleaning up the way locking in the statestore code. For instance, RegisterSubscriber() takes subscribers_lock_, instead of its caller. I am just not sure if it makes sense to do that as part of this jira. Another thing: BlockingQueue::BlockingPut() does not release the mutex it acquires in the shut down path. I know we're shutting down anyway, but if ever we implement clean shut down that does some work, a deadlock might happen here. -- To view, visit http://gerrit.cloudera.org:8080/9038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d49dede221ce1f50ec299643b5532c61f93f0c6 Gerrit-Change-Number: 9038 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Zoram Thanga <[email protected]> Gerrit-Comment-Date: Wed, 17 Jan 2018 19:22:30 +0000 Gerrit-HasComments: Yes
