GitHub user nabarunnag opened a pull request:
https://github.com/apache/geode/pull/732
GEODE-3276: Managing race conditions while the senders are stopped
* When a connection is initialized, a readAckThread may still be alive from a
previous incarnation.
* This ack thread will be blocked on a socket read with no timeout, because
nothing was dispatched.
* While it is blocked on the read, it holds the connection lifecycle read
lock.
* Initializing the connection requires the connection lifecycle write lock,
but the read lock is still held by the ack thread.
* This results in a deadlock and eventually a hang.
* A second problem is that the isStopped flag is set for the event
processor before the dispatcher and ack thread are actually shut down.
* A gateway proxy stomper thread can run between these two steps, after the
flag is set but before the dispatcher and ackThread are shut down.
* The stomper thread checks the isStopped flag, sees that it is true, and
proceeds to destroy the connection pool while the dispatcher and ackThread
are still running.
* This results in an out of heap memory exception when the ack thread reads
from the socket after the connection pool has been destroyed.
* To solve this issue, the stomper thread checks whether the event processor
and dispatcher exist; if they do, it closes the input streams before
destroying the connection pool.
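The first scenario and the stream-closing remedy can be reproduced outside Geode with a plain socket and a `ReentrantReadWriteLock`. The following is a minimal sketch, not the code in this pull request: the class name `AckReaderDeadlockSketch`, the method `writeLockAcquired`, and the variable names are hypothetical, and a loopback socket stands in for the gateway connection. It assumes only standard JDK behaviour: a thread blocked in a socket read receives an exception when the socket's input stream is closed from another thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the ack reader / connection lifecycle lock interaction.
public class AckReaderDeadlockSketch {

  /**
   * Starts a reader that blocks on a socket read while holding the read lock,
   * then tries to take the write lock. Returns true if the write lock was acquired.
   */
  static boolean writeLockAcquired(boolean closeStreamFirst) {
    ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
    CountDownLatch readerHoldsLock = new CountDownLatch(1);

    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
        Socket peer = server.accept()) {

      InputStream in = client.getInputStream();

      Thread ackReader = new Thread(() -> {
        lifeCycleLock.readLock().lock();
        try {
          readerHoldsLock.countDown();
          in.read(); // blocks: the peer never sends and no read timeout is set
        } catch (IOException expected) {
          // reached when another thread closes the stream under the reader
        } finally {
          lifeCycleLock.readLock().unlock();
        }
      }, "ack-reader");
      ackReader.setDaemon(true);
      ackReader.start();
      readerHoldsLock.await();

      if (closeStreamFirst) {
        in.close(); // also closes the socket, so the blocked read fails
      }

      // The "initialize connection" step: needs the write lock.
      boolean acquired = lifeCycleLock.writeLock().tryLock(2, TimeUnit.SECONDS);
      if (acquired) {
        lifeCycleLock.writeLock().unlock();
      }
      return acquired;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("stream closed first: " + writeLockAcquired(true));
    System.out.println("stream left open:    " + writeLockAcquired(false));
  }
}
```

The sketch uses a timed `tryLock` only so that the demonstration terminates; with an untimed `lock()` and the stream left open, the initializing thread would wait indefinitely, which is the hang described above.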
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nabarunnag/incubator-geode feature/GEODE-3276
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/geode/pull/732.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #732
commit 9618f69d2620f5a3d3a6a8576631906a61b512f9
Author: nabarun
Date: 2017-08-17T17:51:15Z
GEODE-3276: Managing race conditions while the senders are stopped