Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9038 )

Change subject: IMPALA-2642: Fix a potential deadlock in statestore
......................................................................


Patch Set 2:

(3 comments)

Working on the test case.

http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG@17
PS2, Line 17: Testing: The problem is easily reproduced by lowering the value of
            : STATESTORE_MAX_SUBSCRIBERS to 3, and then launching a mini cluster
            : with 3 impalads. With the fix, we can see the following entry in 
the
            : statestore log:
> Is it possible to add a test for this?
Doing it. The approach I am taking requires changing STATESTORE_MAX_SUBSCRIBERS 
to FLAGS_statestore_max_subscribers, so that we can start statestored with a 
low max value. I think this is good to do anyway, as that allows us to test 
other boundary behavior.


http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353
PS2, Line 353: // Assumes that the caller has taken subscribers_lock_
> I would add this comment in the header comment in the .h file.
Done


http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353
PS2, Line 353: // Assumes that the caller has taken subscribers_lock_
> Its kind of weird IMO that OfferUpdate() expects the caller to take subscri
Thought of that, ie, moving the call to OfferUpdate from DoSubscriberUpdate to 
another scope that does not have subscribers_lock_. Then I saw that 
UnregisterSubscriber() also expects the caller to take subscribers_lock_. I 
thought it made sense to follow the same pattern here.

I think there is scope for cleaning up the way locking in the statestore code. 
For instance, RegisterSubscriber() takes subscribers_lock_, instead of its 
caller. I am just not sure if it makes sense to do that as part of this jira.

Another thing: BlockingQueue::BlockingPut() does not release the mutex it 
acquires in the shut down path. I know we're shutting down anyway, but if ever 
we implement clean shut down that does some work, a deadlock might happen here.



--
To view, visit http://gerrit.cloudera.org:8080/9038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d49dede221ce1f50ec299643b5532c61f93f0c6
Gerrit-Change-Number: 9038
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Zoram Thanga <[email protected]>
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:22:30 +0000
Gerrit-HasComments: Yes

Reply via email to