[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Rush Manbert (JIRA) Wed, 06 May 2009 15:37:53 -0700

    [ 
https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706639#action_12706639
 ]


Rush Manbert commented on THRIFT-488:
-------------------------------------

Okay, that doesn't work either because in order to access the object pointed to 
by the weak_ptr, you need to convert it to a shared_ptr. We do that, then the 
Worker finds that he should be idle, so he waits forever on the monitor_ mutex. 
Because the worker holds a shared_ptr, the manager's use count never goes to 
zero and his destructor never gets called.

I haven't used weak_ptr before, and there is a chance that I messed up, so I am 
going to add a patch that converts an unmodified ThreadManager.cpp to my 
version that uses weak_ptr. As always, you need to apply THRIFT-487 to the test 
code in order to use it.

I had another idea, but it sort of smells bad. If all of the shared resources 
were in an object (a structure) allocated on the heap and all the players had a 
shared_ptr to it, then none of the mutex variables would disappear. We would 
also need the manager to set a flag when he goes away, so we won't leak Workers 
and tasks. And we would need to protect access to the shared structure. I hate 
the idea of adding another mutex.

Thoughts?

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together 
> to execute client threads. There are race conditions between the two classes 
> that can cause Bus Error exceptions in certain cases. In general, these occur 
> when a ThreadManager instance is destructed, so might not be seen in long 
> running programs. They happen frequently enough, though, that looped 
> repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test 
> program) fail quite often.
> These errors are generally not seen with the current version of 
> ThreadManagerTests::blockTest() due to errors in the test itself that cause 
> failures at a far higher frequency. In order to see them, you need to apply 
> the patches that are attached to THRIFT-487 
> (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and 
> observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer 
> (generally) and usually fails with an assert in Monitor.cpp. This failure is 
> because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original 
> ThreadManager::Worker::run() method where active never becomes true. In 
> practice, exercising this code path is difficuly. The way that I exercised it 
> was to edit line 322 in the patched version of ThreadManager.cpp. I changed 
> the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That 
> extra worker caused quite a bit of trouble until I moved his handling up to 
> the top of the run() method. I don't understand how this situation could 
> occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Reply via email to