Re: log4j bug causes deadlock?!? (was: which sort of application bugs hang wotaskd?)

2016-10-25 Thread Chuck Hill
Well, the fix seems easy if that works for you.

From: <webobjects-dev-bounces+chill=gevityinc@lists.apple.com> on behalf of 
"o...@ocs.cz" <o...@ocs.cz>
Date: Tuesday, October 25, 2016 at 2:56 PM
To: WebObjects-Dev <webobjects-dev@lists.apple.com>
Subject: Re: log4j bug causes deadlock?!? (was: which sort of application bugs 
hang wotaskd?)

Hmmm, looks like we are not the only one who got bit by that thing in tender 
parts: http://stackoverflow.com/a/7397857

On 25. 10. 2016, at 10:59 PM, OC <o...@ocs.cz<mailto:o...@ocs.cz>> wrote:
Chuck,
On 24. 10. 2016, at 18:23, Chuck Hill 
<ch...@gevityinc.com<mailto:ch...@gevityinc.com>> wrote:
Running “sudo jstack –F ” should dump a trace of all threads.  
Should…
Luckily, it did; just it is sort of weird that it identifies threads by name in 
the deadlock report, but by some unknown ID (perhaps hash of the Thread 
instance, looks like that) in the traces, which makes for sort of difficult 
parsing. I guess I'm not the first one to swear :)
... As for the hang, the code in the logging does seem like a likely culprit.
Actually it looks like the culprit is log4j itself?!? What the heck! Anyroad, 
here's the deadlock report:
===
"model.ReportPDFTask@5f6ce9a5":
  waiting to lock Monitor@0x00012c0008a8 (Object@0x0003f7efb980, a 
org/apache/log4j/spi/RootLogger),
  which is held by "WorkerThread6"
"WorkerThread6":
waiting for ownable synchronizer 0x0003f9538960, (a 
java/util/concurrent/locks/ReentrantLock$NonfairSync),
which is held by "model.ReportPDFTask@5f6ce9a5"
===
the WorkerThread6 problem happens in the “log.info("$tch")” part, it looks like 
this (and although changesFromCommittedSnapshot happens to be part of the 
culprit, I believe the problem is actually caused by the logger itself): 
instead of rendering its argument immediately, as soon as it is determined that 
we should log, it does that self-evidently under a lock:
=== Worker6 (cleaned up considerably) ===
- er.extensions.eof.ERXEC.lockObjectStore() // the lock W6 [2] hangs on, caused 
by...
- er.extensions.eof.ERXGenericRecord.changesFromCommittedSnapshot() // ... 
changesFromCommittedSnapshot ...
- cz.ocs.model.OCSEnterpriseObject.toString() // ... which yours truly dumbly 
uses in toString, alas!
- org.apache.log4j.or.DefaultRenderer.doRender(java.lang.Object) // ... this 
thing happens INSIDE of log4j code
... lot of frames here; self-evidently, some of them acquires and holds 
Monitor@0x00012c0008a8 [1] ...
- org.apache.log4j.Category.info(java.lang.Object) // this is the 
log.info("$tch") of mine
... the loop and the other standard stuff up to dispatchRequest and 
WOWorkerThread.run() ...
===
The "model.ReportPDFTask" actually does not do anything wrong (far as I can 
say) -- it just fetches. A fetch presumably (and understandably) would lock the 
object store. With locked object store, it... logs! (I do not think the stack 
trace is important here; anyway, I have attached it at the end of this message 
for reference.)
Nothing wrong there I believe, it feels right to be able to log with a locked 
object store; but it causes deadlock, for the log tries to acquire its 
Monitor@0x00012c0008a8 -- which is held by Worker6.
Now, although in my case the culprit happens to be unneeded (and generally 
dangerous) changesFromCommittedSnapshot, I believe that
(a) whatever which locks object store in the log would cause deadlock just as 
well
(b) which can be e.g., a fault fired, or lots of other perfectly valid things 
to log.
I do understand why log4j methods accept generic Object and render the result 
inside -- it is reasonable not to render anything unless we want to log with 
the current setting.
What seems to me to be a grave bug though is that this rendering happens under 
a lock, which causes a deadlock if two threads log concurrently and it so 
happens that
(i) one of them logs under object store lock (which I believe is valid and 
correct)
(ii) the other's log contents causes an object store lock (which again I 
believe is valid and correct, see (a) and (b) above).
So, well, is there indeed a grave bug inside of the ubiquitous log4j, or am I 
overlooking something of importance?
And if there is a bug, is there anything better one can do to work around it 
than using toStrings for all non-trivial logs, like 
“log.info("non-trivial-contents".toString())”, which would affect efficiency 
pretty bad?
Thanks and all the best,
OC
=== "model.ReportPDFTask@5f6ce9a5" slightly cleaned up ===
- org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) 
@bci=12, line=204 (Compiled frame) // hangs on Monitor@0x00012c0008a8, 
which is held by [1] above
- org.apache.log4j.Category.forcedLog(java.lang.String, 
org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, 
line=391 (Compiled frame)

Re: log4j bug causes deadlock?!? (was: which sort of application bugs hang wotaskd?)

2016-10-25 Thread o...@ocs.cz
Hmmm, looks like we are not the only one who got bit by that thing in tender 
parts: http://stackoverflow.com/a/7397857

> On 25. 10. 2016, at 10:59 PM, OC  wrote:
> 
> Chuck,
> 
> On 24. 10. 2016, at 18:23, Chuck Hill  wrote:
> 
>> Running “sudo jstack –F ” should dump a trace of all threads.  
>> Should… 
> 
> Luckily, it did; just it is sort of weird that it identifies threads by name 
> in the deadlock report, but by some unknown ID (perhaps hash of the Thread 
> instance, looks like that) in the traces, which makes for sort of difficult 
> parsing. I guess I'm not the first one to swear :)
> 
>> ... As for the hang, the code in the logging does seem like a likely culprit.
> 
> Actually it looks like the culprit is log4j itself?!? What the heck! Anyroad, 
> here's the deadlock report:
> 
> ===
> "model.ReportPDFTask@5f6ce9a5":
>  waiting to lock Monitor@0x00012c0008a8 (Object@0x0003f7efb980, a 
> org/apache/log4j/spi/RootLogger),
>  which is held by "WorkerThread6"
> "WorkerThread6":
> waiting for ownable synchronizer 0x0003f9538960, (a 
> java/util/concurrent/locks/ReentrantLock$NonfairSync),
> which is held by "model.ReportPDFTask@5f6ce9a5"
> ===
> 
> the WorkerThread6 problem happens in the “log.info("$tch")” part, it looks 
> like this (and although changesFromCommittedSnapshot happens to be part of 
> the culprit, I believe the problem is actually caused by the logger itself): 
> instead of rendering its argument immediately, as soon as it is determined 
> that we should log, it does that self-evidently under a lock:
> 
> === Worker6 (cleaned up considerably) ===
> - er.extensions.eof.ERXEC.lockObjectStore() // the lock W6 [2] hangs on, 
> caused by...
> - er.extensions.eof.ERXGenericRecord.changesFromCommittedSnapshot() // ... 
> changesFromCommittedSnapshot ...
> - cz.ocs.model.OCSEnterpriseObject.toString() // ... which yours truly dumbly 
> uses in toString, alas!
> - org.apache.log4j.or.DefaultRenderer.doRender(java.lang.Object) // ... this 
> thing happens INSIDE of log4j code
> ... lot of frames here; self-evidently, some of them acquires and holds 
> Monitor@0x00012c0008a8 [1] ...
> - org.apache.log4j.Category.info(java.lang.Object) // this is the 
> log.info("$tch") of mine
> ... the loop and the other standard stuff up to dispatchRequest and 
> WOWorkerThread.run() ...
> ===
> 
> The "model.ReportPDFTask" actually does not do anything wrong (far as I can 
> say) -- it just fetches. A fetch presumably (and understandably) would lock 
> the object store. With locked object store, it... logs! (I do not think the 
> stack trace is important here; anyway, I have attached it at the end of this 
> message for reference.)
> 
> Nothing wrong there I believe, it feels right to be able to log with a locked 
> object store; but it causes deadlock, for the log tries to acquire its 
> Monitor@0x00012c0008a8 -- which is held by Worker6.
> 
> Now, although in my case the culprit happens to be unneeded (and generally 
> dangerous) changesFromCommittedSnapshot, I believe that
> 
> (a) whatever which locks object store in the log would cause deadlock just as 
> well
> (b) which can be e.g., a fault fired, or lots of other perfectly valid things 
> to log.
> 
> I do understand why log4j methods accept generic Object and render the result 
> inside -- it is reasonable not to render anything unless we want to log with 
> the current setting.
> 
> What seems to me to be a grave bug though is that this rendering happens 
> under a lock, which causes a deadlock if two threads log concurrently and it 
> so happens that
> 
> (i) one of them logs under object store lock (which I believe is valid and 
> correct)
> (ii) the other's log contents causes an object store lock (which again I 
> believe is valid and correct, see (a) and (b) above).
> 
> So, well, is there indeed a grave bug inside of the ubiquitous log4j, or am I 
> overlooking something of importance?
> 
> And if there is a bug, is there anything better one can do to work around it 
> than using toStrings for all non-trivial logs, like 
> “log.info("non-trivial-contents".toString())”, which would affect efficiency 
> pretty bad?
> 
> Thanks and all the best,
> OC
> 
> === "model.ReportPDFTask@5f6ce9a5" slightly cleaned up ===
> - org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) 
> @bci=12, line=204 (Compiled frame) // hangs on Monitor@0x00012c0008a8, 
> which is held by [1] above
> - org.apache.log4j.Category.forcedLog(java.lang.String, 
> org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, 
> line=391 (Compiled frame)
> - org.apache.log4j.Category.info(java.lang.Object) @bci=38, line=666 
> (Compiled frame)
> - 
> er.extensions.eof.ERXEOAccessUtilities.logExpression(com.webobjects.eoaccess.EOAdaptorChannel,
>  com.webobjects.eoaccess.EOSQLExpression, long) @bci=406, line=1307 (Compiled 
> frame)
> - 
> 

log4j bug causes deadlock?!? (was: which sort of application bugs hang wotaskd?)

2016-10-25 Thread OC
Chuck,

On 24. 10. 2016, at 18:23, Chuck Hill  wrote:

> Running “sudo jstack –F ” should dump a trace of all threads.  
> Should… 

Luckily, it did; just it is sort of weird that it identifies threads by name in 
the deadlock report, but by some unknown ID (perhaps hash of the Thread 
instance, looks like that) in the traces, which makes for sort of difficult 
parsing. I guess I'm not the first one to swear :)

> ... As for the hang, the code in the logging does seem like a likely culprit.

Actually it looks like the culprit is log4j itself?!? What the heck! Anyroad, 
here's the deadlock report:

===
"model.ReportPDFTask@5f6ce9a5":
  waiting to lock Monitor@0x00012c0008a8 (Object@0x0003f7efb980, a 
org/apache/log4j/spi/RootLogger),
  which is held by "WorkerThread6"
"WorkerThread6":
 waiting for ownable synchronizer 0x0003f9538960, (a 
java/util/concurrent/locks/ReentrantLock$NonfairSync),
 which is held by "model.ReportPDFTask@5f6ce9a5"
===

the WorkerThread6 problem happens in the “log.info("$tch")” part, it looks like 
this (and although changesFromCommittedSnapshot happens to be part of the 
culprit, I believe the problem is actually caused by the logger itself): 
instead of rendering its argument immediately, as soon as it is determined that 
we should log, it does that self-evidently under a lock:

=== Worker6 (cleaned up considerably) ===
 - er.extensions.eof.ERXEC.lockObjectStore() // the lock W6 [2] hangs on, 
caused by...
 - er.extensions.eof.ERXGenericRecord.changesFromCommittedSnapshot() // ... 
changesFromCommittedSnapshot ...
 - cz.ocs.model.OCSEnterpriseObject.toString() // ... which yours truly dumbly 
uses in toString, alas!
 - org.apache.log4j.or.DefaultRenderer.doRender(java.lang.Object) // ... this 
thing happens INSIDE of log4j code
... lot of frames here; self-evidently, some of them acquires and holds 
Monitor@0x00012c0008a8 [1] ...
 - org.apache.log4j.Category.info(java.lang.Object) // this is the 
log.info("$tch") of mine
... the loop and the other standard stuff up to dispatchRequest and 
WOWorkerThread.run() ...
===

The "model.ReportPDFTask" actually does not do anything wrong (far as I can 
say) -- it just fetches. A fetch presumably (and understandably) would lock the 
object store. With locked object store, it... logs! (I do not think the stack 
trace is important here; anyway, I have attached it at the end of this message 
for reference.)

Nothing wrong there I believe, it feels right to be able to log with a locked 
object store; but it causes deadlock, for the log tries to acquire its 
Monitor@0x00012c0008a8 -- which is held by Worker6.

Now, although in my case the culprit happens to be unneeded (and generally 
dangerous) changesFromCommittedSnapshot, I believe that

(a) whatever which locks object store in the log would cause deadlock just as 
well
(b) which can be e.g., a fault fired, or lots of other perfectly valid things 
to log.

I do understand why log4j methods accept generic Object and render the result 
inside -- it is reasonable not to render anything unless we want to log with 
the current setting.

What seems to me to be a grave bug though is that this rendering happens under 
a lock, which causes a deadlock if two threads log concurrently and it so 
happens that

(i) one of them logs under object store lock (which I believe is valid and 
correct)
(ii) the other's log contents causes an object store lock (which again I 
believe is valid and correct, see (a) and (b) above).

So, well, is there indeed a grave bug inside of the ubiquitous log4j, or am I 
overlooking something of importance?

And if there is a bug, is there anything better one can do to work around it 
than using toStrings for all non-trivial logs, like 
“log.info("non-trivial-contents".toString())”, which would affect efficiency 
pretty bad?

Thanks and all the best,
OC

=== "model.ReportPDFTask@5f6ce9a5" slightly cleaned up ===
 - org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) 
@bci=12, line=204 (Compiled frame) // hangs on Monitor@0x00012c0008a8, 
which is held by [1] above
 - org.apache.log4j.Category.forcedLog(java.lang.String, 
org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, 
line=391 (Compiled frame)
 - org.apache.log4j.Category.info(java.lang.Object) @bci=38, line=666 (Compiled 
frame)
 - 
er.extensions.eof.ERXEOAccessUtilities.logExpression(com.webobjects.eoaccess.EOAdaptorChannel,
 com.webobjects.eoaccess.EOSQLExpression, long) @bci=406, line=1307 (Compiled 
frame)
 - 
er.extensions.eof.ERXAdaptorChannelDelegate.adaptorChannelDidEvaluateExpression(com.webobjects.eoaccess.EOAdaptorChannel,
 com.webobjects.eoaccess.EOSQLExpression) @bci=76, line=81 (Compiled frame)
 - com.webobjects.foundation._NSDelegate.perform(java.lang.String, 
java.lang.Object, java.lang.Object) @bci=14, line=163 (Compiled frame)
 -