[
https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706639#action_12706639
]
Rush Manbert commented on THRIFT-488:
-------------------------------------
Okay, that doesn't work either because in order to access the object pointed to
by the weak_ptr, you need to convert it to a shared_ptr. We do that, then the
Worker finds that he should be idle, so he waits forever on the monitor_ mutex.
Because the worker holds a shared_ptr, the manager's use count never goes to
zero and his destructor never gets called.
I haven't used weak_ptr before, and there is a chance that I messed up, so I am
going to add a patch that converts an unmodified ThreadManager.cpp to my
version that uses weak_ptr. As always, you need to apply THRIFT-487 to the test
code in order to use it.
I had another idea, but it sort of smells bad. If all of the shared resources
were in an object (a structure) allocated on the heap and all the players had a
shared_ptr to it, then none of the mutex variables would disappear. We would
also need the manager to set a flag when he goes away, so we won't leak Workers
and tasks. And we would need to protect access to the shared structure. I hate
the idea of adding another mutex.
Thoughts?
> ThreadManager crashing bugs
> ---------------------------
>
> Key: THRIFT-488
> URL: https://issues.apache.org/jira/browse/THRIFT-488
> Project: Thrift
> Issue Type: Bug
> Components: Library (C++)
> Affects Versions: 0.1
> Environment: Mac OS X 10.5.6, Xcode
> Reporter: Rush Manbert
> Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together
> to execute client threads. There are race conditions between the two classes
> that can cause Bus Error exceptions in certain cases. In general, these occur
> when a ThreadManager instance is destructed, so might not be seen in long
> running programs. They happen frequently enough, though, that looped
> repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test
> program) fail quite often.
> These errors are generally not seen with the current version of
> ThreadManagerTests::blockTest() due to errors in the test itself that cause
> failures at a far higher frequency. In order to see them, you need to apply
> the patches that are attached to THRIFT-487
> (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and
> observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer
> (generally) and usually fails with an assert in Monitor.cpp. This failure is
> because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original
> ThreadManager::Worker::run() method where active never becomes true. In
> practice, exercising this code path is difficuly. The way that I exercised it
> was to edit line 322 in the patched version of ThreadManager.cpp. I changed
> the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That
> extra worker caused quite a bit of trouble until I moved his handling up to
> the top of the run() method. I don't understand how this situation could
> occur in real life, but this code appears to handle it correctly.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.