David,
Thread #5 is running daemon-level code that works with the handler list.
Two locks are involved there: one for the handler list and one for the
handler itself. These locks are acquired only by calling one of the
daemon's oh_xxx() functions; a plug-in usually doesn't touch them at all.
As I recall, ipmidirect is quite capable of deadlocking itself without
any help from other threads. It often calls writelock() on a lock it has
already acquired with readlock(), or releases a lock that was never
acquired (I suspect the latter is your case). POSIX says both operations
lead to undefined behavior (UB).
Anton Pak
On Mon, 26 Sep 2011 06:22:32 +0400, David McKinley wrote:
> Anton,
>
> First, I pulled the latest trunk code from svn and confirmed that the
> problem still exists.
>
> As you suggested, I ran it under gdb and took backtraces of all threads,
> using the current trunk code. There does seem to be a deadlock between
> two threads, though I have not yet been able to trace it back to its
> root cause. I ran the test twice, and both times these two threads were
> waiting for locks in exactly the same places, so it seems pretty clear
> that they are deadlocked.
>
> In the attached traces, the "interesting" threads are #5 and #3. Thread
> #3 is the one whose progress I can follow in the ipmidirect log file,
> and sure enough, it is blocked waiting for a write lock on the domain
> object immediately after reading the SEL entries.
>
> I'm guessing that it cannot get that lock because the lock is held by
> Thread #5, but I have not been able to verify this. Thread #5 is
> waiting on a mutex for the handler. The simplest case would be if
> Thread #5 grabbed the domain lock first and then tried to get the
> handler lock, while Thread #3 held the handler lock and then went for
> the domain lock. Whether it is this simple, I don't know, but I would
> be surprised if the deadlock isn't somehow between these two threads.
>
> Anyway, I'm learning a lot as I dig into it. I'm hoping, though, that
> you or someone else more familiar with the theory of operation here
> will spot the problem faster than I am likely to.
>
> Regards,
>
> David
>
>
>> -----Original Message-----
>> From: Anton Pak
>> Sent: Sunday, September 25, 2011 4:11 AM
>> To: David McKinley
>> Subject: Re: [Openhpi-devel] Hang/Deadlock in 2.17 release
>>
>> I suggest running it under gdb and, when it hangs, printing a stack
>> trace for each thread (gdb's "thread apply all bt" command does this).
>>
>> Anton Pak
>>
>> On Sun, 25 Sep 2011 07:43:04 +0400, David McKinley wrote:
>>
>> > Hello,
>> >
>> > On my platform, a Sun Netra using the ipmidirect plugin, things seem
>> > to work fine with the 2.12, 2.14, and 2.16 releases, but with the 2.17
>> > release it hangs during discovery. According to the log file written
>> > by the ipmidirect plugin, discovery proceeds up to the point where it
>> > reads the SEL, but nothing is logged after that; in particular, the
>> > message "BMC Discovery Done" never appears. Meanwhile, calls to
>> > saHpiDiscover() in clients hang.
>> >
>> > Backing out all of the ipmidirect plugin changes between 2.16 and 2.17
>> > made no difference (there were very few, and they looked trivial), so
>> > the problem seems to have been introduced elsewhere. I looked through
>> > the tracker and did not see a similar problem reported.
>> >
>> > Since I'm still very much a newbie in this codebase, I doubt that I'll
>> > be able to track this down very quickly, and if the plugin works on
>> > other platforms, others should judge how much importance to attach to
>> > this issue. But I did want to mention it, as it looks like some sort of
>> > regression, at least on this platform.
>> >
>> > David
_______________________________________________
Openhpi-devel mailing list
https://lists.sourceforge.net/lists/listinfo/openhpi-devel