Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
I've just sent out an RFR for 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java. The proposed fix incorporates the change suggested by Per and discussed in this thread: moving the pending reference list management entirely into the VM.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Kim,

On 03/23/2016 09:40 PM, Kim Barrett wrote:
> I don't think there's any throughput penalty for a long timeout. The proper response to waitForCleanups returning false (assuming the epoch was obtained early and passed as an argument) is OOME. I really doubt the latency for reporting OOME is of critical importance.

The above assumption is not entirely correct. The correct response to waitForCleanups returning false should be at least one attempt to trigger GC reference discovery first; only after that should it be OOME. Suppose a program tries to allocate direct memory above the limit. Waiting for cleanups to happen might take a very long time if there's no heap memory pressure, even though there might already be lots of unreachable direct buffers on the heap. So guessing the right timeout before attempting to trigger GC is not trivial. If you make it too small, excessive GCs will be triggered and throughput will suffer. If you make it too long, throughput will suffer again.

Nevertheless, I managed to create a variant that self-adjusts the timeout based on the last successful wait time. At least with DirectBufferAllocTest using 16 or 32 allocating threads (on a 4-core CPU), the throughput is comparable to before and, importantly, the test passes:

java -XX:MaxDirectMemorySize=128m -cp out DirectBufferAllocTest -r 600 -t 16 -p 5000
Allocating direct ByteBuffers with capacity 1048576 bytes, using 16 threads for 600 seconds, printing the average per-thread latency of 5000 consecutive allocations...
Thread 11: 1.94 ms/allocation
Thread 6: 1.97 ms/allocation
Thread 12: 2.05 ms/allocation
Thread 0: 2.10 ms/allocation
Thread 7: 2.15 ms/allocation
Thread 3: 2.16 ms/allocation
Thread 1: 2.26 ms/allocation
Thread 5: 2.32 ms/allocation
Thread 2: 2.33 ms/allocation
Thread 4: 2.34 ms/allocation
Thread 13: 2.36 ms/allocation
Thread 9: 2.38 ms/allocation
Thread 14: 2.40 ms/allocation
Thread 10: 2.40 ms/allocation
Thread 8: 2.42 ms/allocation
Thread 15: 2.44 ms/allocation
Thread 6: 1.72 ms/allocation
Thread 11: 1.75 ms/allocation
Thread 12: 1.86 ms/allocation
Thread 0: 1.86 ms/allocation
Thread 3: 1.94 ms/allocation
Thread 7: 2.07 ms/allocation
Thread 1: 2.08 ms/allocation
Thread 2: 2.12 ms/allocation
Thread 4: 2.14 ms/allocation
Thread 5: 2.16 ms/allocation
Thread 9: 2.13 ms/allocation

Here's the webrev:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.10.part2/

So what do you think?

Regards, Peter

> That is, the caller looks something like (not even pretending to write Java)
>
>   alloc = tryAllocation(allocSize)
>   if alloc != NULL return alloc endif
>   // Maybe add a retry+wait with a short timeout here,
>   // to allow existing cleanups to run before requesting
>   // another gc. Not clear that's really worthwhile, as
>   // it only comes up when we get here just after a gc
>   // and the resulting cleanups are not yet all processed.
>   System.gc()
>   while true
>     epoch = getEpoch()
>     alloc = tryAllocation(allocSize)
>     if alloc != NULL return alloc
>     elif !waitForCleanup(epoch)
>       throw OOME   // No cleanup progress for a while
>     endif
>   end
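Peter's refinement (trigger one more GC round when waiting goes silent, and only then throw OOME, with a self-adjusting wait) can be sketched as below. This is a hypothetical reconstruction, not the webrev.10 code: the class and method names are invented, memory accounting is reduced to one counter with a fixed limit, and the adaptation policy (allow twice the last successful wait, clamped to a range) is an assumption, since the message does not say how the timeout is adjusted.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and the timeout policy are assumptions.
final class AdaptiveReserver {
    private final long limit;
    private final Runnable gc;                  // System::gc in real use; injectable for testing
    private final long minWaitNanos;
    private final long maxWaitNanos;
    private final AtomicLong reserved = new AtomicLong();
    private final Object monitor = new Object();
    private long cleanups;                      // guarded by monitor
    private long waitNanos;                     // guarded by monitor; current adaptive timeout

    AdaptiveReserver(long limit, long minWaitMillis, long maxWaitMillis, Runnable gc) {
        this.limit = limit;
        this.gc = gc;
        this.minWaitNanos = TimeUnit.MILLISECONDS.toNanos(minWaitMillis);
        this.maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        this.waitNanos = minWaitNanos;
    }

    private boolean tryReserve(long size) {
        while (true) {
            long cur = reserved.get();
            if (cur + size > limit) return false;
            if (reserved.compareAndSet(cur, cur + size)) return true;
        }
    }

    /** Cleanup side: free the memory, then publish progress to waiting allocators. */
    void unreserve(long size) {
        reserved.addAndGet(-size);
        synchronized (monitor) {
            cleanups++;
            monitor.notifyAll();
        }
    }

    long reserved() {
        return reserved.get();
    }

    private long epoch() {
        synchronized (monitor) {
            return cleanups;
        }
    }

    /** Waits for progress since epoch; on success, adapts the next timeout. */
    private boolean waitForCleanup(long epoch) throws InterruptedException {
        synchronized (monitor) {
            long start = System.nanoTime();
            long deadline = start + waitNanos;
            while (cleanups == epoch) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) return false;
                TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
            }
            // Assumed policy: next time allow twice as long as this wait took.
            long waited = System.nanoTime() - start;
            waitNanos = Math.max(minWaitNanos, Math.min(maxWaitNanos, 2 * waited));
            return true;
        }
    }

    void reserve(long size) throws InterruptedException {
        boolean gcSinceProgress = false;
        while (true) {
            long epoch = epoch();               // taken before the attempt
            if (tryReserve(size)) return;
            if (waitForCleanup(epoch)) {
                gcSinceProgress = false;        // cleanups are still arriving
            } else if (!gcSinceProgress) {
                gc.run();                       // silence: trigger discovery once more
                gcSinceProgress = true;
            } else {
                throw new OutOfMemoryError("cannot reserve " + size + " bytes");
            }
        }
    }
}
```

The design choice to note is that a timeout no longer means "give up"; it means "ask for another discovery round", so a too-short timeout costs an extra GC rather than a spurious OOME.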
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
> On Mar 23, 2016, at 4:42 PM, Peter Levart wrote:
>
> Hi Kim,
>
> Thinking more about your approach. Basically your idea is to detect that
> there are no more unprocessed but pending or enqueued Cleanables by timing
> out on waiting for next Cleanable to be processed. In that case the timeout
> should be reset when each Cleanable is detected to be processed so that when
> there's a "silence" detected for at least the whole timeout period, we can
> claim with enough probability that there are no more unprocessed Cleanables
> either pending or enqueued and that we can give up with OOME.

Exactly, and much better stated than I did.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 03/23/2016 09:40 PM, Kim Barrett wrote:
> I don't think there's any throughput penalty for a long timeout. The proper response to waitForCleanups returning false (assuming the epoch was obtained early and passed as an argument) is OOME. I really doubt the latency for reporting OOME is of critical importance.
>
> That is, the caller looks something like (not even pretending to write Java)
>
>   alloc = tryAllocation(allocSize)
>   if alloc != NULL return alloc endif
>   // Maybe add a retry+wait with a short timeout here,
>   // to allow existing cleanups to run before requesting
>   // another gc. Not clear that's really worthwhile, as
>   // it only comes up when we get here just after a gc
>   // and the resulting cleanups are not yet all processed.
>   System.gc()
>   while true
>     epoch = getEpoch()
>     alloc = tryAllocation(allocSize)
>     if alloc != NULL return alloc
>     elif !waitForCleanup(epoch)
>       throw OOME   // No cleanup progress for a while
>     endif
>   end

Right, this is easier to understand. I had already figured out what you meant the first time. I'll try to prepare a prototype along these lines tomorrow.

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Kim,

Thinking more about your approach: basically, your idea is to detect that there are no more unprocessed (pending or enqueued) Cleanables by timing out while waiting for the next Cleanable to be processed. In that case, the timeout should be reset each time a Cleanable is detected to have been processed, so that when "silence" is detected for at least the whole timeout period, we can claim with high enough probability that there are no more unprocessed Cleanables, either pending or enqueued, and that we can give up with OOME.

Let me try to see with a prototype whether this approach leads to success...

Regards, Peter

On 03/23/2016 08:33 PM, Peter Levart wrote:
> There's no need to maintain a special cleanup counter, as java.nio.Bits already maintains the amount of currently allocated direct memory (in bytes). What your suggestion leads to is similar to one of the previous versions of java.nio.Bits, which waited for some 'timeout' time after invoking System.gc() and then retried the reservation, failing if it didn't succeed. The problem with such an "asynchronous" approach is that there's no right value of 'timeout' for all situations. If you wait too short a time, you might get OOME even though there are plenty of unreachable but still uncleaned direct buffers. If you wait too long, your throughput will suffer. There has to be some "feedback" from reference processing to know when it is still beneficial to wait and when there's no point in waiting any more.
>
> Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
> On Mar 23, 2016, at 3:33 PM, Peter Levart wrote:
>
> There's no need to maintain a special cleanup counter as java.nio.Bits
> already maintains the amount of currently allocated direct memory (in bytes).
> What your suggestion leads to is similar to one of the previous versions of
> java.nio.Bits, which waited for some 'timeout' time after invoking System.gc()
> and then retried the reservation, failing if it didn't succeed. The problem with
> such an "asynchronous" approach is that there's no right value of 'timeout' for
> all situations. If you wait too short a time, you might get OOME even though
> there are plenty of unreachable but still uncleaned direct buffers. If you wait
> too long, your throughput will suffer. There has to be some "feedback"
> from reference processing to know when it is still beneficial to wait and
> when there's no point in waiting any more.
>
> Regards, Peter

I don't think there's any throughput penalty for a long timeout. The proper response to waitForCleanups returning false (assuming the epoch was obtained early and passed as an argument) is OOME. I really doubt the latency for reporting OOME is of critical importance.

That is, the caller looks something like (not even pretending to write Java)

  alloc = tryAllocation(allocSize)
  if alloc != NULL return alloc endif
  // Maybe add a retry+wait with a short timeout here,
  // to allow existing cleanups to run before requesting
  // another gc. Not clear that's really worthwhile, as
  // it only comes up when we get here just after a gc
  // and the resulting cleanups are not yet all processed.
  System.gc()
  while true
    epoch = getEpoch()
    alloc = tryAllocation(allocSize)
    if alloc != NULL return alloc
    elif !waitForCleanup(epoch)
      throw OOME   // No cleanup progress for a while
    endif
  end
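Kim's caller loop can be made runnable under simplified assumptions. In this sketch, `EpochReserver`, `tryAllocation`, `unreserve` and the injectable `gc` hook are hypothetical names, and direct-memory accounting is reduced to a single counter with a fixed limit; it is not the java.nio.Bits code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of Kim's caller loop; not java.nio.Bits.
final class EpochReserver {
    private final long limit;
    private final long timeoutMillis;
    private final Runnable gc;                 // System::gc in real use; injectable for testing
    private final AtomicLong reserved = new AtomicLong();
    private final Object monitor = new Object();
    private long cleanups;                     // guarded by monitor

    EpochReserver(long limit, long timeoutMillis, Runnable gc) {
        this.limit = limit;
        this.timeoutMillis = timeoutMillis;
        this.gc = gc;
    }

    private boolean tryAllocation(long size) {
        while (true) {
            long cur = reserved.get();
            if (cur + size > limit) return false;
            if (reserved.compareAndSet(cur, cur + size)) return true;
        }
    }

    /** Cleanup side: free first, then publish progress. */
    void unreserve(long size) {
        reserved.addAndGet(-size);
        synchronized (monitor) {
            cleanups++;
            monitor.notifyAll();
        }
    }

    long reserved() {
        return reserved.get();
    }

    private long getEpoch() {
        synchronized (monitor) {
            return cleanups;
        }
    }

    private boolean waitForCleanup(long epoch) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (monitor) {
            while (cleanups == epoch) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) return false;
                TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
            }
            return true;
        }
    }

    void allocate(long size) throws InterruptedException {
        if (tryAllocation(size)) return;
        gc.run();
        while (true) {
            long epoch = getEpoch();           // read before the attempt, as Kim suggests
            if (tryAllocation(size)) return;
            if (!waitForCleanup(epoch)) {
                // No cleanup progress for a whole timeout period.
                throw new OutOfMemoryError("cannot reserve " + size + " bytes");
            }
        }
    }
}
```

Each successful waitForCleanup starts a fresh timeout on the next iteration, so OOME is thrown only after a full period with no cleanup at all, which is the "silence" criterion discussed in this thread.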
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Kim,

On 03/23/2016 07:55 PM, Kim Barrett wrote:
> I don't think the Reference.awaitEnqueuePhaseStart thing is needed.
>
> Rather, I think the Direct-X-Buffer allocation should conspire with the Direct-X-Buffer cleanups directly to manage that sort of thing, and not add anything to Reference and the reference processing thread. E.g. the phase and signal/wait are purely part of Direct-X-Buffer. (I also think something like that could/should have been done instead of providing Direct-X-Buffer with access to Reference.tryHandlePending, but that's likely water under the bridge now.)
>
> Something very roughly like this:
>
>   allocating thread, after allocation failed
>
>   boolean waitForCleanups() {
>     int epoch = DXB.getCleanupCounter();
>     long start = startTime();
>     long timeout = calcTimeout(start);
>     synchronized (DXB.getCleanupMonitor()) {
>       while (epoch == DXB.getCleanupCounter()) {
>         DXB.getCleanupMonitor().wait(timeout);
>         timeout = calcTimeout(start);
>         if (timeout <= 0) break;
>       }
>       return epoch != DXB.getCleanupCounter();
>     }
>   }
>
>   cleanup function, after freeing memory
>
>   synchronized (DXB.getCleanupMonitor()) {
>     DXB.incCleanupCounter();
>     DXB.getCleanupMonitor().notifyAll();
>   }
>
> Actually, epoch should probably have been obtained *before* the failed allocation attempt, and should be an argument to waitForCleanups.
>
> That's all quite sketchy, but I need to do other things today.
>
> Peter, care to try filling this in?

There's no need to maintain a special cleanup counter, as java.nio.Bits already maintains the amount of currently allocated direct memory (in bytes). What your suggestion leads to is similar to one of the previous versions of java.nio.Bits, which waited for some 'timeout' time after invoking System.gc() and then retried the reservation, failing if it didn't succeed. The problem with such an "asynchronous" approach is that there's no right value of 'timeout' for all situations. If you wait too short a time, you might get OOME even though there are plenty of unreachable but still uncleaned direct buffers. If you wait too long, your throughput will suffer. There has to be some "feedback" from reference processing to know when it is still beneficial to wait and when there's no point in waiting any more.

Regards, Peter
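To make the trade-off Peter describes concrete, here is a sketch of a fixed-timeout scheme: request a GC, sleep for a guessed interval, retry once, otherwise fail. It is a hypothetical illustration, not the historical java.nio.Bits source; the class name, the single retry and the injectable `gc` hook are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical fixed-timeout scheme: no feedback from reference processing.
final class FixedSleepReserver {
    private final long limit;
    private final long sleepMillis;            // the guessed 'timeout'
    private final Runnable gc;                 // System::gc in real use; injectable for testing
    private final AtomicLong reserved = new AtomicLong();

    FixedSleepReserver(long limit, long sleepMillis, Runnable gc) {
        this.limit = limit;
        this.sleepMillis = sleepMillis;
        this.gc = gc;
    }

    private boolean tryReserve(long size) {
        while (true) {
            long cur = reserved.get();
            if (cur + size > limit) return false;
            if (reserved.compareAndSet(cur, cur + size)) return true;
        }
    }

    void unreserve(long size) {
        reserved.addAndGet(-size);
    }

    long reserved() {
        return reserved.get();
    }

    void reserve(long size) {
        if (tryReserve(size)) return;
        gc.run();
        try {
            Thread.sleep(sleepMillis);         // just hope cleanups have run by now
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!tryReserve(size)) {
            throw new OutOfMemoryError("cannot reserve " + size + " bytes");
        }
    }
}
```

The weakness is visible in the structure: the retry happens after `sleepMillis` whether or not any cleanup has completed, so a cleanup that finishes just after the sleep still yields OOME, while a longer sleep delays every reservation that hits the limit.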
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
> On Mar 23, 2016, at 10:02 AM, Peter Levart wrote:
> ...so I checked what would be needed if there were such a
> getPendingReferences() native method. It turns out that a single native
> method would not be enough to support precise direct ByteBuffer
> allocation. Here's a refactored webrev that introduces a
> getPendingReferences() method which could be turned into a native equivalent
> one day. There is another native method needed, int awaitEnqueuePhaseStart():
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.09.part2/

I don't think the Reference.awaitEnqueuePhaseStart thing is needed.

Rather, I think the Direct-X-Buffer allocation should conspire with the Direct-X-Buffer cleanups directly to manage that sort of thing, and not add anything to Reference and the reference processing thread. E.g. the phase and signal/wait are purely part of Direct-X-Buffer. (I also think something like that could/should have been done instead of providing Direct-X-Buffer with access to Reference.tryHandlePending, but that's likely water under the bridge now.)

Something very roughly like this:

  allocating thread, after allocation failed

  boolean waitForCleanups() {
    int epoch = DXB.getCleanupCounter();
    long start = startTime();
    long timeout = calcTimeout(start);
    synchronized (DXB.getCleanupMonitor()) {
      while (epoch == DXB.getCleanupCounter()) {
        DXB.getCleanupMonitor().wait(timeout);
        timeout = calcTimeout(start);
        if (timeout <= 0) break;
      }
      return epoch != DXB.getCleanupCounter();
    }
  }

  cleanup function, after freeing memory

  synchronized (DXB.getCleanupMonitor()) {
    DXB.incCleanupCounter();
    DXB.getCleanupMonitor().notifyAll();
  }

Actually, epoch should probably have been obtained *before* the failed allocation attempt, and should be an argument to waitForCleanups.

That's all quite sketchy, but I need to do other things today.

Peter, care to try filling this in?
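Filling in Kim's sketch, with the epoch passed in as he suggests, could look like the following. `CleanupTracker` and its methods are hypothetical stand-ins for the `DXB.getCleanupCounter()` / `getCleanupMonitor()` bookkeeping; the deadline handling via `TimeUnit.timedWait` replaces the unspecified `startTime()` / `calcTimeout()` helpers.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the DXB cleanup counter and monitor in Kim's sketch.
final class CleanupTracker {
    private final Object monitor = new Object();
    private long cleanupCounter; // guarded by monitor

    /** Snapshot to take BEFORE the allocation attempt that may fail. */
    long epoch() {
        synchronized (monitor) {
            return cleanupCounter;
        }
    }

    /** Cleanup side: call after the native memory has been freed. */
    void cleanupDone() {
        synchronized (monitor) {
            cleanupCounter++;
            monitor.notifyAll();
        }
    }

    /**
     * Allocating side: returns true as soon as at least one cleanup has
     * completed since {@code epoch}, or false if none completes within
     * {@code timeoutMillis}.
     */
    boolean waitForCleanups(long epoch, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (monitor) {
            // Re-check after every wakeup: Object.wait can return spuriously.
            while (cleanupCounter == epoch) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
            }
            return true;
        }
    }
}
```

Because the epoch is an argument, a cleanup that completes between the failed allocation and the call to waitForCleanups is still observed: the method returns true immediately instead of sleeping through a notification it already missed.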
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter,

On 2016-03-23 15:02, Peter Levart wrote:
> Hi Per, Kim,
>
> ...so I checked what would be needed if there were such a getPendingReferences() native method. It turns out that a single native method would not be enough to support precise direct ByteBuffer allocation. Here's a refactored webrev that introduces a getPendingReferences() method which could be turned into a native equivalent one day. There is another native method needed, int awaitEnqueuePhaseStart():
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.09.part2/
>
> The need for this additional method arises when one wants to combine reference discovery with enqueueing of the discovered references into one synchronous operation (discoverAndEnqueueReferences()). A direct ByteBuffer allocating thread wants to trigger reference discovery (System.gc()) and wait for the discovered references to be enqueued before continuing with direct memory reservation retries.
>
> An alternative to what I have done in the above webrev would be to maintain a single enqueuePhase counter on the Java side, used roughly as:
>
>   discoverAndEnqueueReferences() {
>       int phase = Reference.getEnqueuePhase();
>       System.gc();
>       Reference.awaitEnqueuePhaseGreaterThan(phase);
>   }
>
> But in that case, System.gc() would have to guarantee that, after a GC round that discovers no new references, a blocked getPendingReferences() would still return with an empty list of References (null), just to keep the DBB-allocating thread alive. I have tried this variant, and unfortunately it can't be performed reliably with the current protocol, as getPendingReferences() can only be programmed to return non-empty Reference lists without ambiguity. I created a DirectBufferAllocOOMETest to exercise situations where no new Reference(s) are discovered in a GC round.
>
> So what do you think? Which would be easier to support:
>
> a) getPendingReferences() returns an empty Reference list (null) after a GC round that discovers no new pending references
>
> b) getPendingReferences() returns when new Reference(s) are discovered, and there is an additional int awaitEnqueuePhaseStart() as defined in the above webrev.

I've prototyped the VM side. I've ignored the "await" issue for now, as I first just wanted the basic structure up. I'm running out of time for today (and I'll be away the rest of the week), but let's continue the discussion next week and figure out the "await" details/alternatives.

Webrevs for jdk9/hs-rt:

http://cr.openjdk.java.net/~pliden/reference_pending_list/webrev.0-jdk
http://cr.openjdk.java.net/~pliden/reference_pending_list/webrev.0-hotspot

It passes jdk/test/java/lang/ref/* and our VM tests for reference processing.

cheers, Per
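Option (a), combined with the enqueuePhase counter, can be modelled in plain Java to show why the empty batch matters: the waiter only wakes if the handler advances the phase after every GC round, including rounds that discover nothing. This is a hypothetical model, not JDK code; strings stand in for References, a `BlockingQueue` stands in for the blocking `getPendingReferences()` downcall, and `simulateGc` stands in for `System.gc()` plus the VM publishing its result. A timeout is added so a caller cannot hang forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of option (a); all names are illustrative.
final class EnqueuePhaseModel {
    private final BlockingQueue<List<String>> vmBatches = new LinkedBlockingQueue<>();
    private final List<String> enqueued = new ArrayList<>(); // guarded by this
    private long phase;                                       // guarded by this

    synchronized long getEnqueuePhase() {
        return phase;
    }

    /** Stand-in for System.gc() followed by the VM publishing what it discovered. */
    void simulateGc(List<String> discovered) {
        vmBatches.add(discovered);
    }

    /** One iteration of the reference handler thread. */
    void handleOneBatch() throws InterruptedException {
        List<String> batch = vmBatches.take();   // blocking getPendingReferences() stand-in
        synchronized (this) {
            enqueued.addAll(batch);
            phase++;                             // advances even for an empty batch
            notifyAll();
        }
    }

    synchronized boolean awaitEnqueuePhaseGreaterThan(long p, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (phase <= p) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }

    /** The discoverAndEnqueueReferences() shape from the message, plus a timeout. */
    boolean discoverAndEnqueueReferences(List<String> discovered, long timeoutMillis)
            throws InterruptedException {
        long p = getEnqueuePhase();
        simulateGc(discovered);
        return awaitEnqueuePhaseGreaterThan(p, timeoutMillis);
    }

    synchronized int enqueuedCount() {
        return enqueued.size();
    }
}
```

If `simulateGc` were skipped for rounds with nothing discovered (option (b)'s "return only when non-empty"), `awaitEnqueuePhaseGreaterThan` would never see the phase move. The model also glosses over one subtlety: a batch left over from an earlier GC round can satisfy the wait before the caller's own round is processed.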
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Per, Kim,

On 03/22/2016 10:24 AM, Per Liden wrote:
> So, I imagine the ReferenceHandler could do something like this:
>
>   while (true) {
>       // getPendingReferences() is a downcall to the VM which
>       // blocks until the pending list becomes non-empty and
>       // returns the whole list, transferring it from VM-land
>       // to Java-land in a safe and robust way.
>       Reference pending = getPendingReferences();
>
>       // Enqueue the references
>       while (pending != null) {
>           Reference r = pending;
>           pending = r.discovered;
>           r.discovered = null;
>           ReferenceQueue q = r.queue;
>           if (q != ReferenceQueue.NULL) {
>               q.enqueue(r);
>           }
>       }
>   }

...so I checked what would be needed if there were such a getPendingReferences() native method. It turns out that a single native method would not be enough to support precise direct ByteBuffer allocation. Here's a refactored webrev that introduces a getPendingReferences() method which could be turned into a native equivalent one day. There is another native method needed, int awaitEnqueuePhaseStart():

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.09.part2/

The need for this additional method arises when one wants to combine reference discovery with enqueueing of the discovered references into one synchronous operation (discoverAndEnqueueReferences()). A direct ByteBuffer allocating thread wants to trigger reference discovery (System.gc()) and wait for the discovered references to be enqueued before continuing with direct memory reservation retries.

An alternative to what I have done in the above webrev would be to maintain a single enqueuePhase counter on the Java side, used roughly as:

  discoverAndEnqueueReferences() {
      int phase = Reference.getEnqueuePhase();
      System.gc();
      Reference.awaitEnqueuePhaseGreaterThan(phase);
  }

But in that case, System.gc() would have to guarantee that, after a GC round that discovers no new references, a blocked getPendingReferences() would still return with an empty list of References (null), just to keep the DBB-allocating thread alive. I have tried this variant, and unfortunately it can't be performed reliably with the current protocol, as getPendingReferences() can only be programmed to return non-empty Reference lists without ambiguity. I created a DirectBufferAllocOOMETest to exercise situations where no new Reference(s) are discovered in a GC round.

So what do you think? Which would be easier to support:

a) getPendingReferences() returns an empty Reference list (null) after a GC round that discovers no new pending references

b) getPendingReferences() returns when new Reference(s) are discovered, and there is an additional int awaitEnqueuePhaseStart() as defined in the above webrev.

Regards, Peter
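The inner loop of Per's ReferenceHandler sketch can be exercised on its own with stand-in types. The real `Reference.discovered` and `Reference.queue` fields are private to java.lang.ref, so `SimRef`, `SimQueue` and `ReferenceHandlerModel` below are hypothetical classes that mimic just those fields; `SimQueue.NULL` plays the role of `ReferenceQueue.NULL`, and a `BlockingQueue` stands in for the blocking VM downcall.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-ins for ReferenceQueue and Reference.
final class SimQueue {
    static final SimQueue NULL = new SimQueue();
    private final Deque<String> names = new ArrayDeque<>();

    synchronized void enqueue(SimRef r) {
        names.addLast(r.name);
    }

    synchronized int size() {
        return names.size();
    }
}

final class SimRef {
    final String name;
    final SimQueue queue;
    SimRef discovered; // link to the next element of the pending list

    SimRef(String name, SimQueue queue) {
        this.name = name;
        this.queue = queue;
    }
}

final class ReferenceHandlerModel {
    /** Enqueues and unlinks every element of one pending list; returns how many were enqueued. */
    static int enqueuePending(SimRef pending) {
        int enqueued = 0;
        while (pending != null) {
            SimRef r = pending;
            pending = r.discovered;
            r.discovered = null;            // unlink before handing it on
            SimQueue q = r.queue;
            if (q != SimQueue.NULL) {       // created without a queue: nothing to enqueue
                q.enqueue(r);
                enqueued++;
            }
        }
        return enqueued;
    }

    /** Handler loop; the BlockingQueue stands in for the blocking VM downcall. */
    static void run(BlockingQueue<SimRef> vmPendingLists) throws InterruptedException {
        while (true) {
            enqueuePending(vmPendingLists.take());
        }
    }
}
```

Only the handler thread touches the `discovered` links here, which is the property Per and Kim both point to as the simplification: the VM hands over a whole list and Java-side policy takes it from there.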
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi,

On 2016-03-23 08:13, Peter Levart wrote:
> If the only blocking/waiting of the ReferenceHandler thread was performed by native code, could it simply ignore Java thread interrupts? If this is possible, then the problems of InterruptedException allocation and consequent OutOfMemoryError(s) just disappear.

Yes, blocking in the VM here would ignore thread interrupts and not throw InterruptedException.

cheers, Per
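For contrast, this is roughly what "ignoring interrupts" looks like when the blocking happens at the Java level: retry the wait on every InterruptedException and re-assert the interrupt status at the end. The class name `Uninterruptible` is hypothetical. The point of the sketch is its limitation: by the time the `catch` runs, the InterruptedException object has already been allocated, which is exactly the allocation Peter wants the native, interrupt-ignoring wait to avoid.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical helper: a Java-level wait that ignores interrupts.
final class Uninterruptible {
    /** Takes the next element, retrying across interrupts, then restores the interrupt flag. */
    static <T> T take(BlockingQueue<T> queue) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return queue.take();
                } catch (InterruptedException e) {
                    // 'e' has already been allocated at this point; a native
                    // wait that ignores interrupts avoids this allocation entirely.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```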
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 03/22/2016 10:28 PM, Kim Barrett wrote:
>> On Mar 22, 2016, at 5:24 AM, Per Liden wrote:
>>
>> One thing I like about this approach is that it's only the ReferenceHandler thread that pops elements off the pending list and enqueues them. That simplifies things a lot.
>
> I like that too. And hopefully we really can get rid of sun.misc.Cleaner (under whatever name).
>
>> From a GC perspective I would however like to get away from the shared pending list and the pending list lock entirely and instead provide a VM downcall to get the pending list. The goal would of course be to have a more robust way of transferring the pending list to Java land, instead of today's secret handshake, which is easy to get wrong. Also, not requiring the pending list lock (which is a Java monitor) to be held during a GC would simplify things a lot on the GC side. E.g. the ReferencePendingListLockerThread could be removed completely.
>
> I've been thinking along the same lines. I think having the pending list (and associated locking and notification) in Java is just making life difficult for ourselves, and that things could be much simpler if that whole protocol was owned by the VM. Once the reference handler thread has obtained the latest list, if it then wants to publish that list for other Java threads to help process, that's a policy choice that can be explored on the Java side, with no impact on the VM (including the GC).

If the only blocking/waiting of the ReferenceHandler thread was performed by native code, could it simply ignore Java thread interrupts? If this is possible, then the problems of InterruptedException allocation and consequent OutOfMemoryError(s) just disappear.

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
> On Mar 22, 2016, at 5:24 AM, Per Liden wrote:
>
> One thing I like about this approach is that it's only the ReferenceHandler
> thread that pops elements off the pending list and enqueues them. That
> simplifies things a lot.

I like that too. And hopefully we really can get rid of sun.misc.Cleaner (under whatever name).

> From a GC perspective I would however like to get away from the shared
> pending list and the pending list lock entirely and instead provide a VM
> downcall to get the pending list. The goal would of course be to have a more
> robust way of transferring the pending list to Java land, instead of today's
> secret handshake, which is easy to get wrong. Also, not requiring the pending
> list lock (which is a Java monitor) to be held during a GC would
> simplify things a lot on the GC side. E.g. the
> ReferencePendingListLockerThread could be removed completely.

I've been thinking along the same lines. I think having the pending list (and associated locking and notification) in Java is just making life difficult for ourselves, and that things could be much simpler if that whole protocol was owned by the VM. Once the reference handler thread has obtained the latest list, if it then wants to publish that list for other Java threads to help process, that's a policy choice that can be explored on the Java side, with no impact on the VM (including the GC).
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter,

On 2016-03-21 16:30, Peter Levart wrote:
> Hi Per,
>
> May I point you to my proposed change in Reference(Handler) for JDK 9, being discussed in the thread about JDK-8149925. It will hopefully remove the special-casing of sun.misc.Cleaner, and change the way pending references are enqueued by the ReferenceHandler thread and how other thread(s) can synchronize with it. Since you seem to have great knowledge of the VM side of things, I would very much like to hear what you think of that change.
>
> Here's the latest webrev:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.08.part2/
>
> (see Reference.java and Bits.java for an example of how this synchronization with the ReferenceHandler thread is to be used)

One thing I like about this approach is that it's only the ReferenceHandler thread that pops off elements from the pending list and enqueues them. That simplifies things a lot.

From a GC perspective I would however like to get away from the shared pending list and the pending list lock entirely and instead provide a VM downcall to get the pending list. The goal would of course be to have a more robust way of transferring the pending list to Java land, instead of today's secret handshake which is easy to get wrong. Also, not requiring the pending list lock (which is a Java monitor) to be held during a GC would also simplify things a lot on the GC side. E.g. the ReferencePendingListLockerThread could be removed completely.

So, I imagine the ReferenceHandler could do something like this:

    while (true) {
        // getPendingReferences() is a downcall to the VM which
        // blocks until the pending list becomes non-empty and
        // returns the whole list, transferring it from VM-land
        // to Java-land in a safe and robust way.
        Reference pending = getPendingReferences();

        // Enqueue the references
        while (pending != null) {
            Reference r = pending;
            pending = r.discovered;
            r.discovered = null;
            ReferenceQueue q = r.queue;
            if (q != ReferenceQueue.NULL) {
                q.enqueue(r);
            }
        }
    }

I haven't thought through the details when it comes to having additional Java threads helping out with Cleaners. The ReferenceHandler would be free to use whatever lists/locks it wants to handle this and the GC wouldn't know anything about it. But, with the above approach at least the interface between the ReferenceHandler and the VM would be pretty clear and hard(er) to misuse.

cheers,
Per
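To make the proposed hand-off concrete, here is a small single-JVM toy model, assuming a hypothetical `SketchVm.getPendingReferences()` that stands in for the VM downcall (it is not a real JDK or HotSpot API). The point it illustrates: once the whole list is handed over and the VM-side head is reset atomically, the handler unlinks entries on a list nobody else can see, so no lock shared with the "GC" is held while it works.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: SketchRef, SketchVm and getPendingReferences() are hypothetical
// stand-ins, not java.lang.ref.Reference or any real HotSpot interface.
final class SketchRef {
    final String name;
    SketchRef discovered;            // next link in the pending list
    final List<SketchRef> queue;     // stand-in for a ReferenceQueue; null = none

    SketchRef(String name, List<SketchRef> queue) {
        this.name = name;
        this.queue = queue;
    }
}

final class SketchVm {
    private SketchRef pending;       // only the "VM" ever touches this field

    // Simulated GC: prepend a newly discovered reference and wake the handler.
    synchronized void discover(SketchRef r) {
        r.discovered = pending;
        pending = r;
        notifyAll();
    }

    // Hypothetical downcall: block until the list is non-empty, then hand the
    // whole list to the caller and reset the head in one atomic step.
    synchronized SketchRef getPendingReferences() throws InterruptedException {
        while (pending == null) {
            wait();
        }
        SketchRef list = pending;
        pending = null;
        return list;
    }
}

final class SketchHandler {
    // One iteration of the handler loop. After the hand-off the list is
    // private to this thread, so unlinking needs no lock shared with the GC.
    static void processOnce(SketchVm vm) throws InterruptedException {
        SketchRef pending = vm.getPendingReferences();
        while (pending != null) {
            SketchRef r = pending;
            pending = r.discovered;
            r.discovered = null;
            if (r.queue != null) {
                r.queue.add(r);
            }
        }
    }

    // Usage: three references are discovered, two of them have a queue.
    static List<SketchRef> demo() {
        List<SketchRef> queue = new ArrayList<>();
        SketchVm vm = new SketchVm();
        vm.discover(new SketchRef("a", queue));
        vm.discover(new SketchRef("b", queue));
        vm.discover(new SketchRef("c", null));   // no queue registered
        try {
            processOnce(vm);                     // list is non-empty: no blocking
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queue;
    }
}
```

Because the simulated GC prepends, `demo()` enqueues `b` before `a`; any further policy (such as letting other threads help) would live entirely on the Java side of this boundary.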
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 2016-03-21 18:32, Kim Barrett wrote:
> Per - I think you are raising the same issue as discussed in https://bugs.openjdk.java.net/browse/JDK-8055232.

Ah, thanks Kim for pointing that out.

cheers,
Per
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter,

On 2016-03-21 16:13, Peter Levart wrote:
> I think that for JDK 8u we could revert the code and do (instanceof Cleaner) checks outside the synchronized block and in addition, find a way to handle the OOME thrown from Cleaner.clean(). What do you think?

That sounds good to me. With the addition of the try/catch around Cleaner.clean() catching not just OOME, but all Throwables, right?

cheers,
Per
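A minimal sketch of the "catch all Throwables" suggestion, assuming cleanup actions are modeled as plain `Runnable`s standing in for `Cleaner.clean()` (`CleanupRunner` is a hypothetical name, not JDK code). Catching `Throwable` rather than only `OutOfMemoryError` means a cleanup action that fails for any reason is recorded, and the thread running the actions carries on with the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Runnables stand in for Cleaner.clean(); nothing here is the
// actual ReferenceHandler code.
final class CleanupRunner {
    // Runs every action; any Throwable (not just OutOfMemoryError) is caught
    // so one failing action cannot kill the thread that runs them all.
    static List<Throwable> runAll(List<Runnable> actions) {
        List<Throwable> failures = new ArrayList<>();
        for (Runnable action : actions) {
            try {
                action.run();
            } catch (Throwable t) {
                failures.add(t);     // record and keep going
            }
        }
        return failures;
    }
}
```

What to do with a recorded failure (ignore, log, or rethrow later) is a separate policy decision that the sketch deliberately leaves open.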
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 22/03/2016 3:32 AM, Kim Barrett wrote:
> Per - I think you are raising the same issue as discussed in https://bugs.openjdk.java.net/browse/JDK-8055232.

That bug somehow escaped my notice as well. :(

Thanks,
David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 21/03/2016 11:41 PM, Per Liden wrote:
> Hi David,
>
> On 2016-03-21 13:49, David Holmes wrote:
>> Then the code was completely broken because it was obviously capable of allocating whilst holding the lock. There is nothing in the Java code to indicate allocation should not happen and no way that Java code can directly control that! We were only fixing the problem of the exception killing the thread, not trying to address an undisclosed illegal allocation problem!
>
> JDK-8022321 did indeed fix a real issue. It might also have unintentionally introduced a new one. Prior to JDK-8022321 we knew that the ReferenceHandler couldn't provoke a GC while manipulating the pending list, since the code was:
>
>     synchronized (lock) {
>         if (pending != null) {
>             r = pending;
>             pending = r.discovered;
>             r.discovered = null;
>         } else {
>         }
>     }

Except that it actually could if the wait() in the else part was interrupted. But yes, the move of instanceof did add another potential allocation point (as follow-up bugs showed), but the pre-loading does seem to have addressed that (though perhaps not with 100% certainty).

> The manipulation of the pending list is built on some secret/ugly rules and handshakes between the GC and the ReferenceHandler, which only works because we control both.

Unfortunately implicit allocation was not given enough consideration. That really makes me concerned about the possibility of this code being JIT-compiled by a Java compiler under JVMCI!

>> How would a GC thread update pending if the ReferenceHandlerThread holds the lock?
>
> The pending list lock is grabbed by the Java thread issuing the VM operation, on behalf of the GC, to allow the GC to manipulate the pending list. If the thread issuing the VM operation is the ReferenceHandler, then the monitor is taken recursively, which is OK as long as the ReferenceHandler isn't in the middle of unlinking an element.

Ah, I see.

Thanks,
David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
> On Mar 21, 2016, at 8:20 AM, Per Liden wrote:
>
> Hi Peter & David,
>
> (Resurrecting an old thread here...)
>
> On 2014-01-22 03:19, David Holmes wrote:
>> Hi Peter,
>>
>> On 22/01/2014 12:00 AM, Peter Levart wrote:
>>> Hi, David, Kalyan,
>>>
>>> Summing up the discussion, I propose the following patch for ReferenceHandler:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/
>>
>> I can live with it, though it may be that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :)
>
> While investigating a Reference pending list issue on the GC side of things I looked at the ReferenceHandler thread and noticed something which made me uneasy. The fix for JDK-8022321 added pre-loading of the Cleaner class to avoid OOME, but also moved the "instanceof Cleaner" inside the try/catch with a comment that it "sometimes" can throw an OOME. I understand this was done because we're not 100% sure if an OOME can still happen here, despite the pre-loading.
>
> However, if it can throw an OOME that means it's allocating, which in turn means it can provoke a GC. If that happens, it looks to me like we have a bug here. The ReferenceHandler thread is not allowed to provoke a GC while it's holding on to the pending list lock, since the pending list might be updated during a GC and "pending = r.discovered" will then overwrite something other than "r", silently dropping any newly discovered References, which will never be discovered by the GC again.
>
> On the other hand, if an OOME can never happen (i.e. no GC) here then we're good and the comment is just incorrect. The instanceof check could be moved out of the try/catch block again, like it was prior to this change, just to make it obvious that we will not be able to cause new allocations inside the critical section. Or at a minimum, the comment saying OOME can still happen should be adjusted.
>
> Thoughts?
>
> thanks,
> Per
>
> Btw, to the best of my knowledge, the pre-loading of Cleaner should avoid any GC activity from instanceof, but I can't say that I am 100% sure either.

Per - I think you are raising the same issue as discussed in https://bugs.openjdk.java.net/browse/JDK-8055232.
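The lost update Per describes can be reproduced deterministically in a toy model. Everything here is hypothetical: `LostUpdateDemo`, its `Node` type, and the `gcDuringUnlink` hook, which stands in for a GC provoked by an allocation between reading `pending` and assigning `pending = r.discovered`. The model assumes the GC prepends newly discovered references to the head of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic toy model of the race; not the real Reference code.
final class LostUpdateDemo {
    static final class Node {
        final String name;
        Node discovered;
        Node(String name, Node next) { this.name = name; this.discovered = next; }
    }

    static Node pending;

    // Simulated GC: prepend newly discovered references to the list.
    static void gcDiscovers(String... names) {
        for (String n : names) {
            pending = new Node(n, pending);
        }
    }

    // Unsafe unlink: if a GC runs after 'r = pending' was read, the
    // assignment 'pending = r.discovered' discards whatever the GC prepended.
    static Node unlinkHead(Runnable gcDuringUnlink) {
        Node r = pending;
        gcDuringUnlink.run();        // e.g. an allocation that triggers a GC
        pending = r.discovered;
        r.discovered = null;
        return r;
    }

    static List<String> remaining() {
        List<String> names = new ArrayList<>();
        for (Node n = pending; n != null; n = n.discovered) {
            names.add(n.name);
        }
        return names;
    }
}
```

Starting from `b -> a` and letting the hook discover `c`, the handler returns `b` and leaves only `a` on the list; `c` is no longer reachable from `pending`, which is exactly the "silently dropping" effect.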
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 03/21/2016 04:13 PM, Peter Levart wrote:
> I think that for JDK 8u we could revert the code and do (instanceof Cleaner) checks outside the synchronized block and in addition, find a way to handle the OOME thrown from Cleaner.clean(). What do you think?

OTOH, if you are not 100% sure about instanceof doing allocation, then a simple fix would be to re-check the 'pending' field to see whether it still points to the same object as before the instanceof check:

    synchronized (lock) {
        while ((r = pending) != null) {
            // 'instanceof' might throw OutOfMemoryError sometimes
            // so do this before un-linking 'r' from the 'pending' chain...
            c = r instanceof Cleaner ? (Cleaner) r : null;
            // unlink 'r' from 'pending' chain if it is still the same as before
            // 'instanceof' check which might have triggered GC and GC might
            // have discovered some more references and hooked them on
            // the pending list...
            if (pending == r) {
                pending = r.discovered;
                r.discovered = null;
                break;
            }
        }
        if (r == null) {
            // The waiting on the lock may cause an OutOfMemoryError
            // because it may try to allocate exception objects.
            if (waitForNotify) {
                lock.wait();
            }
            // retry if waited
            return waitForNotify;
        }
    }

Regards,
Peter
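The re-check can be exercised in the same kind of toy model. `RecheckDemo`, its `Node` type and the `mayTriggerGc` hook are hypothetical; the hook stands in for the `instanceof` check that might allocate. The model assumes the GC only ever prepends newly discovered references to the head of the pending list, which is what makes "is `r` still the head?" a sufficient test.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the re-check-and-retry idea; not the real Reference code.
final class RecheckDemo {
    static final class Node {
        final String name;
        Node discovered;
        Node(String name, Node next) { this.name = name; this.discovered = next; }
    }

    static Node pending;

    // Simulated GC: prepend newly discovered references to the list.
    static void gcDiscovers(String... names) {
        for (String n : names) {
            pending = new Node(n, pending);
        }
    }

    // Re-read 'pending' after the step that might trigger a GC and only
    // unlink 'r' if it is still the head; otherwise retry with the new head.
    static Node unlinkHeadRechecking(Runnable mayTriggerGc) {
        Node r;
        while ((r = pending) != null) {
            mayTriggerGc.run();      // stands in for the instanceof check
            if (pending == r) {
                pending = r.discovered;
                r.discovered = null;
                break;
            }
        }
        return r;
    }

    static List<String> remaining() {
        List<String> names = new ArrayList<>();
        for (Node n = pending; n != null; n = n.discovered) {
            names.add(n.name);
        }
        return names;
    }
}
```

With `b -> a` on the list and a one-shot hook that discovers `c`, the first attempt sees the head change and retries, unlinks `c`, and leaves `b -> a` intact, so nothing is dropped.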
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Per,

May I point you to my proposed change in Reference(Handler) for JDK 9, being discussed in the thread about JDK-8149925. It will hopefully remove the special-casing of sun.misc.Cleaner, and change the way pending references are enqueued by the ReferenceHandler thread and how other thread(s) can synchronize with it. Since you seem to have great knowledge of the VM side of things, I would very much like to hear what you think of that change.

Here's the latest webrev:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.08.part2/

(see Reference.java and Bits.java for an example of how this synchronization with the ReferenceHandler thread is to be used)

Regards,
Peter
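As an illustration of what "other threads synchronizing with the ReferenceHandler" could look like, here is a sketch built on a hypothetical epoch counter; it is not the mechanism in the webrev, and `CleanupEpoch`, `advance()` and `awaitAdvance()` are invented names. The handler bumps the epoch after each batch of cleanups; an allocating thread records the epoch, retries its allocation, and if that fails waits for the epoch to move, treating a timeout as "no cleanup progress".

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the API from the webrev: a counter the handler
// bumps after each batch of cleanups, which allocating threads can wait on.
final class CleanupEpoch {
    private long epoch;

    synchronized long current() {
        return epoch;
    }

    // Called by the handler thread after it has processed a batch.
    synchronized void advance() {
        epoch++;
        notifyAll();
    }

    // Waits until the epoch moves past 'seen' or the timeout elapses.
    // Returns false on timeout, i.e. no cleanup progress was observed.
    synchronized boolean awaitAdvance(long seen, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (epoch <= seen) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Releases this monitor while waiting; the loop handles spurious wakeups.
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }
}
```

Reading the epoch before the allocation attempt, rather than after, avoids missing a batch that completes in between; how long to wait before falling back to a GC or an OOME is left as a tuning question.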
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Per, David, As things stand, there is a very good chance that sun.misc.Cleaner will go away in JDK9, so all this speculation about the source of OOME(s) can be put to rest. But for JDK 8u, I agree that this should be sorted out. My feeling is that (instanceof Cleaner) can not result in allocation and therefore can not trigger OOME if the Cleaner class is already loaded at that time. I think that we were chasing the wrong rabbit. As I have found later, there is a much more probable cause for ReferenceHandler thread dying with OOME after the fix to catch OOME from lock.wait(). It is triggered by the invocation of Cleaner.clean() later down in the code. I even created a reproducer for it. See my last two comments of the following issue: https://bugs.openjdk.java.net/browse/JDK-8066859 (but don't look at the proposed fix since it is not very good) I think that for JDK 8u we could revert the code and do (instanceof Cleaner) checks outside the synchronized block and in addition, find a way to handle the OOME thrown from Cleaner.clean(). What do you think? Regards, Peter On 03/21/2016 02:41 PM, Per Liden wrote: Hi David, On 2016-03-21 13:49, David Holmes wrote: Hi Per, On 21/03/2016 10:20 PM, Per Liden wrote: Hi Peter & David, (Resurrecting an old thread here...) On 2014-01-22 03:19, David Holmes wrote: Hi Peter, On 22/01/2014 12:00 AM, Peter Levart wrote: Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ I can live with it, though it maybe that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) While investigating a Reference pending list issue on the GC side of things I looked at the ReferenceHandler thread and noticed something which made me uneasy. 
The fix for JDK-8022321 added pre-loading of the Cleaer class to avoid OMME, but also moved the "instanceof Cleaner" inside the try/catch with a comment that it "sometimes" can throw an OOME. I understand this was done because we're not 100% sure if a OOME can still happen here, despite the pre-loading. However, if it can throw an OOME that means it's allocating, which in turn means it can provoke a GC. If that happens, it looks to me like we have a bug here. The ReferenceHandler thread is not allowed to provoke a GC while it's holding on to the pending list lock, since the pending list might be updated during a GC and "pending = r.discovered" will than overwrite something other than "r", silently dropping any newly discovered References which will never be discovered by the the GC again. Then the code was completely broken because it was obviously capable of allocating whilst holding the lock. There is nothing in the Java code to indicate allocation should not happen and no way that Java code can directly control that! We were only fixing the problem of the exception killing the thread, not trying to address an undisclosed illegal allocation problem! JDK-8022321 did indeed fix a real issue. It might also have unintentionally introduced a new one. Prior to JDK-8022321 we knew that the ReferenceHandler couldn't provoke a GC while manipulating the pending list, since the code was: synchronized (lock) { if (pending != null) { r = pending; pending = r.discovered; r.discovered = null; } else { } } The manipulation of the pending list is built on some secret/ugly rules and handshakes between the GC and the ReferenceHandler, which only works because we control of both. How would a GC thread update pending if the ReferenceHandlerThread holds the lock? The pending list lock is grabbed by the Java thread issuing the VM operation, on behalf of the GC to allow the GC the manipulate the pending list. 
If the thread issuing the VM operation is the ReferenceHandler, then the monitor is taken recursively, which is OK as long as the ReferenceHandler isn't in the middle of unlinking an element. On the other hand, if an OOME can never happen (i.e. no GC) here, then we're good and the comment is just incorrect. The instanceof check could be moved out of the try/catch block again, like it was prior to this change, just to make it obvious that we will not be able to cause new allocations inside the critical section. Or, at a minimum, the comment saying OOME can still happen should be adjusted. I found it very difficult to determine with 100% certainty whether or not the instanceof could ever trigger an allocation and hence potentially an OOME. I agree, it's not obvious. cheers, Per
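Peter's suggestion for 8u (only unlink under the lock, then do the instanceof check and the clean() call outside it, and survive an OOME from clean()) can be sketched roughly as follows. This is a hypothetical, single-threaded model: PendingRef, CleanerLike and the static pending list are stand-ins for java.lang.ref internals that user code cannot reach, and the retry loop assumes clean() can simply be re-invoked after a failure, which a real sun.misc.Cleaner may not allow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: PendingRef, CleanerLike and the static 'pending'
// list stand in for java.lang.ref internals; they are not JDK classes.
public class HandlerSketch {

    static class PendingRef {
        final String name;
        PendingRef discovered;                   // link to the next pending reference
        PendingRef(String name) { this.name = name; }
    }

    // Stand-in for a cleaner whose clean() fails with OOME a few times (assumption).
    static class CleanerLike extends PendingRef {
        int failuresLeft;
        boolean cleaned;
        CleanerLike(String name, int failuresLeft) {
            super(name);
            this.failuresLeft = failuresLeft;
        }
        void clean() {
            if (failuresLeft-- > 0) {
                throw new OutOfMemoryError("simulated");
            }
            cleaned = true;
        }
    }

    static final Object lock = new Object();
    static PendingRef pending;                   // head of the pending list
    static final List<String> enqueued = new ArrayList<>();

    // Handles one pending reference; returns false when the list is empty.
    static boolean tryHandlePending() {
        PendingRef r;
        synchronized (lock) {
            // The critical section only unlinks: no type checks, no allocation.
            if (pending == null) {
                return false;
            }
            r = pending;
            pending = r.discovered;
            r.discovered = null;
        }
        // The instanceof check and the cleanup run after the lock is released.
        if (r instanceof CleanerLike) {
            CleanerLike c = (CleanerLike) r;
            while (true) {
                try {
                    c.clean();
                    break;
                } catch (OutOfMemoryError e) {
                    Thread.yield();              // let others free memory, then retry
                }
            }
        } else {
            enqueued.add(r.name);                // allocation happens outside the lock
        }
        return true;
    }

    public static void main(String[] args) {
        CleanerLike c = new CleanerLike("cleaner", 2);
        PendingRef w = new PendingRef("weak");
        w.discovered = c;
        pending = w;
        while (tryHandlePending()) { }
        System.out.println(enqueued + " cleaned=" + c.cleaned);
    }
}
```

Running main prints `[weak] cleaned=true`: the plain reference is enqueued, and the cleaner's two simulated OOMEs are absorbed by the retry loop instead of killing the handler thread.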
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David, On 2016-03-21 13:49, David Holmes wrote: Hi Per, On 21/03/2016 10:20 PM, Per Liden wrote: Hi Peter & David, (Resurrecting an old thread here...) On 2014-01-22 03:19, David Holmes wrote: Hi Peter, On 22/01/2014 12:00 AM, Peter Levart wrote: Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ I can live with it, though it may be that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) While investigating a Reference pending list issue on the GC side of things I looked at the ReferenceHandler thread and noticed something which made me uneasy. The fix for JDK-8022321 added pre-loading of the Cleaner class to avoid OOME, but also moved the "instanceof Cleaner" inside the try/catch with a comment that it "sometimes" can throw an OOME. I understand this was done because we're not 100% sure if an OOME can still happen here, despite the pre-loading. However, if it can throw an OOME that means it's allocating, which in turn means it can provoke a GC. If that happens, it looks to me like we have a bug here. The ReferenceHandler thread is not allowed to provoke a GC while it's holding on to the pending list lock, since the pending list might be updated during a GC and "pending = r.discovered" will then overwrite something other than "r", silently dropping any newly discovered References, which will never be discovered by the GC again. Then the code was completely broken because it was obviously capable of allocating whilst holding the lock. There is nothing in the Java code to indicate allocation should not happen and no way that Java code can directly control that! We were only fixing the problem of the exception killing the thread, not trying to address an undisclosed illegal allocation problem!
JDK-8022321 did indeed fix a real issue. It might also have unintentionally introduced a new one. Prior to JDK-8022321 we knew that the ReferenceHandler couldn't provoke a GC while manipulating the pending list, since the code was: synchronized (lock) { if (pending != null) { r = pending; pending = r.discovered; r.discovered = null; } else { } } The manipulation of the pending list is built on some secret/ugly rules and handshakes between the GC and the ReferenceHandler, which only work because we control both. How would a GC thread update pending if the ReferenceHandlerThread holds the lock? The pending list lock is grabbed by the Java thread issuing the VM operation, on behalf of the GC, to allow the GC to manipulate the pending list. If the thread issuing the VM operation is the ReferenceHandler, then the monitor is taken recursively, which is OK as long as the ReferenceHandler isn't in the middle of unlinking an element. On the other hand, if an OOME can never happen (i.e. no GC) here, then we're good and the comment is just incorrect. The instanceof check could be moved out of the try/catch block again, like it was prior to this change, just to make it obvious that we will not be able to cause new allocations inside the critical section. Or, at a minimum, the comment saying OOME can still happen should be adjusted. I found it very difficult to determine with 100% certainty whether or not the instanceof could ever trigger an allocation and hence potentially an OOME. I agree, it's not obvious. cheers, Per
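The hazard Per describes (a GC provoked from inside the critical section updating the list between the read of pending and the write of r.discovered) can be modelled deterministically. In this hypothetical sketch, simulatedGc stands in for the VM operation that re-acquires the reentrant monitor on the handler's behalf, and it assumes the GC prepends newly discovered references to the head of the list; Ref, pending and simulatedGc are illustrative names, not JDK or HotSpot internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded model; Ref, pending and simulatedGc are
// illustrative stand-ins, not java.lang.ref or HotSpot internals.
public class LostUpdateSketch {

    static class Ref {
        final String name;
        Ref discovered;                 // link to the next pending reference
        Ref(String name) { this.name = name; }
    }

    static final Object lock = new Object();
    static Ref pending;                 // head of the pending list

    // Stand-in for the VM operation run on the current thread's behalf: it
    // re-acquires the same monitor (Java monitors are reentrant, so no
    // deadlock) and prepends a newly discovered reference (an assumption).
    static void simulatedGc(String newlyDiscovered) {
        synchronized (lock) {
            Ref n = new Ref(newlyDiscovered);
            n.discovered = pending;
            pending = n;
        }
    }

    // Unlinks the head. When gcInsideCriticalSection is true, a "GC" runs
    // between reading pending and writing it back, as an allocation could cause.
    static Ref unlinkHead(boolean gcInsideCriticalSection) {
        synchronized (lock) {
            Ref r = pending;
            if (r == null) {
                return null;
            }
            if (gcInsideCriticalSection) {
                simulatedGc("late");
            }
            pending = r.discovered;     // overwrites whatever the GC prepended
            r.discovered = null;
            return r;
        }
    }

    // Returns the names on the pending list without modifying it.
    static List<String> listNames() {
        List<String> names = new ArrayList<>();
        synchronized (lock) {
            for (Ref r = pending; r != null; r = r.discovered) {
                names.add(r.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        pending = new Ref("a");
        pending.discovered = new Ref("b");
        Ref taken = unlinkHead(true);
        System.out.println(taken.name + " " + listNames());
    }
}
```

Running main prints `a [b]`: the reference named late was prepended by the simulated GC and then silently dropped by `pending = r.discovered`. Running the same simulated GC after the synchronized block instead leaves both late and b on the list.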
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Per, On 21/03/2016 10:20 PM, Per Liden wrote: Hi Peter & David, (Resurrecting an old thread here...) On 2014-01-22 03:19, David Holmes wrote: Hi Peter, On 22/01/2014 12:00 AM, Peter Levart wrote: Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ I can live with it, though it may be that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) While investigating a Reference pending list issue on the GC side of things I looked at the ReferenceHandler thread and noticed something which made me uneasy. The fix for JDK-8022321 added pre-loading of the Cleaner class to avoid OOME, but also moved the "instanceof Cleaner" inside the try/catch with a comment that it "sometimes" can throw an OOME. I understand this was done because we're not 100% sure if an OOME can still happen here, despite the pre-loading. However, if it can throw an OOME that means it's allocating, which in turn means it can provoke a GC. If that happens, it looks to me like we have a bug here. The ReferenceHandler thread is not allowed to provoke a GC while it's holding on to the pending list lock, since the pending list might be updated during a GC and "pending = r.discovered" will then overwrite something other than "r", silently dropping any newly discovered References, which will never be discovered by the GC again. Then the code was completely broken because it was obviously capable of allocating whilst holding the lock. There is nothing in the Java code to indicate allocation should not happen and no way that Java code can directly control that! We were only fixing the problem of the exception killing the thread, not trying to address an undisclosed illegal allocation problem!
How would a GC thread update pending if the ReferenceHandlerThread holds the lock? On the other hand, if an OOME can never happen (i.e. no GC) here, then we're good and the comment is just incorrect. The instanceof check could be moved out of the try/catch block again, like it was prior to this change, just to make it obvious that we will not be able to cause new allocations inside the critical section. Or, at a minimum, the comment saying OOME can still happen should be adjusted. I found it very difficult to determine with 100% certainty whether or not the instanceof could ever trigger an allocation and hence potentially an OOME. With JVMCI it is now easier to imagine that compilation of this code by a JVMCI compiler might lead to allocation while the lock is held! Cheers, David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter & David, (Resurrecting an old thread here...) On 2014-01-22 03:19, David Holmes wrote: Hi Peter, On 22/01/2014 12:00 AM, Peter Levart wrote: Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ I can live with it, though it may be that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) While investigating a Reference pending list issue on the GC side of things I looked at the ReferenceHandler thread and noticed something which made me uneasy. The fix for JDK-8022321 added pre-loading of the Cleaner class to avoid OOME, but also moved the "instanceof Cleaner" inside the try/catch with a comment that it "sometimes" can throw an OOME. I understand this was done because we're not 100% sure if an OOME can still happen here, despite the pre-loading. However, if it can throw an OOME that means it's allocating, which in turn means it can provoke a GC. If that happens, it looks to me like we have a bug here. The ReferenceHandler thread is not allowed to provoke a GC while it's holding on to the pending list lock, since the pending list might be updated during a GC and "pending = r.discovered" will then overwrite something other than "r", silently dropping any newly discovered References, which will never be discovered by the GC again. On the other hand, if an OOME can never happen (i.e. no GC) here, then we're good and the comment is just incorrect. The instanceof check could be moved out of the try/catch block again, like it was prior to this change, just to make it obvious that we will not be able to cause new allocations inside the critical section. Or, at a minimum, the comment saying OOME can still happen should be adjusted. Thoughts?
thanks, Per Btw, to the best of my knowledge, the pre-loading of Cleaner should avoid any GC activity from instanceof, but I can't say that I am 100% sure either. Any specific reason to use Unsafe to do the preload rather than Class.forName? Does this force Unsafe to be loaded earlier than it otherwise would? Thanks, David all 10 java/lang/ref tests pass on my PC (including OOMEInReferenceHandler). I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test with this code and report any failure. Thanks, Peter On 01/21/2014 08:57 AM, David Holmes wrote: On 21/01/2014 4:54 PM, Peter Levart wrote: On 01/21/2014 03:22 AM, David Holmes wrote: Hi Peter, I do not see Cleaner being loaded prior to the main class on either Windows or Linux. Which platform are you on? Did you see it loaded before the main class or as part of executing it? Before. The main class is empty: public class Test { public static void main(String... a) {} } Here are the last few lines of -verbose:class: [Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* Curious. I wonder what the controlling factor is?? I'm on Linux, 64-bit, using the official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So perhaps it would be good to trigger Cleaner loading and initialization as part of ReferenceHandler initialization to play it safe. If we do that for Cleaner we may as well do it for InterruptedException too.
Also, it is not that I think ReferenceHandler is responsible for reporting OOME, but that it is responsible for reporting that it was unable to perform a clean or enqueue because of OOME. This would be necessary if we skipped a Reference because of OOME, but if we just re-try until we eventually succeed, nothing is lost and there is nothing to report (apart from a slow response)... Agreed - just trying to clarify things. Your suggested approach seems okay, though I'm not sure why we shouldn't help things along by calling System.gc() ourselves rather than just yielding and hoping things will get cleaned up elsewhere. But for the present purposes your approach will suffice, I think. Maybe my understanding is wrong, but isn't the fact that OOME is raised a consequence of the VM having already attempted to clear things up (executing a GC round synchronously) without succeeding in making enough free space to satisfy the allocation request? If this is only how some collectors/allocators are implemented and not a general rule, then we
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 29/01/2014 19:10, Mandy Chung wrote: On 1/29/2014 5:09 AM, Peter Levart wrote: Since I don't know what the correct behaviour of javac should be, I can leave the Reference.java changes as proposed since it compiles in both cases. Or should I revert the change to the declaration of local variable 'q'? I slightly prefer to revert the change to ReferenceQueue<? super Object> for now as there is no supertype for Object and this looks a little odd. We can clean this up as a separate fix after we get clarification from compiler-dev. I see Peter has posted a question to compiler-dev on this and it can always be re-visited once it is clear why it compiles when both Reference and ReferenceQueue are in the same compilation unit. -Alan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/30/2014 03:46 PM, Alan Bateman wrote: On 29/01/2014 19:10, Mandy Chung wrote: On 1/29/2014 5:09 AM, Peter Levart wrote: Since I don't know what the correct behaviour of javac should be, I can leave the Reference.java changes as proposed since it compiles in both cases. Or should I revert the change to the declaration of local variable 'q'? I slightly prefer to revert the change to ReferenceQueue<? super Object> for now as there is no supertype for Object and this looks a little odd. We can clean this up as a separate fix after we get clarification from compiler-dev. I see Peter has posted a question to compiler-dev on this and it can always be re-visited once it is clear why it compiles when both Reference and ReferenceQueue are in the same compilation unit. -Alan I just committed the version with no change to the ReferenceQueue<Object> line to jdk9/dev. If there is a bug in javac and the code would not compile as is, the change to this line should be committed as part of the javac fix, right? Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 30/01/2014 14:51, Peter Levart wrote: I just committed the version with no change to the ReferenceQueue<Object> line to jdk9/dev. If there is a bug in javac and the code would not compile as is, the change to this line should be committed as part of the javac fix, right? It's good to get this change in. If javac were to be changed to reject this code, then the code would need to be changed at the same time (but I guess we wait to see if this is the case, as it's just not obvious yet). -Alan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/28/2014 04:46 PM, Alan Bateman wrote: On 28/01/2014 08:44, Peter Levart wrote: Yes, I tried that too and it results in even more unsafe casts. It's odd, yes, since the compile-time error is not present when building via the OpenJDK build system makefiles (using make images in the top directory, for example) but only if I compile the class from the command line (using javac directly) or from IDEA. I use JDK 8 ea-b121 in all cases as a build JDK. Are there any special options passed to javac for compiling those classes in the JDK build system that allow such code? jdk/make/Setup.gmk has the -Xlint options that are used in the build, but I suspect it's more that all the classes in java/lang/ref are compiled together. -Alan That's right. If I add the source for ReferenceQueue.java into the directory where Reference.java resides and then compile with: javac -d /tmp Reference.java ...then Reference as well as ReferenceQueue gets compiled and there's no error. If Reference.java is alone in the directory, a compile-time error is emitted. I checked the source of ReferenceQueue.java in JDK 8 ea-b121 (the JDK used for compiling) and it only differs in copyright year from the source in jdk9-dev. So there seems to be an inconsistency in javac's handling of types that are read from .class vs. .java files. I'll try to create a reproducer example and post it to compiler-dev. Since I don't know what the correct behaviour of javac should be, I can leave the Reference.java changes as proposed since it compiles in both cases. Or should I revert the change to the declaration of local variable 'q'? Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 1/29/2014 5:09 AM, Peter Levart wrote: Since I don't know what the correct behaviour of javac should be, I can leave the Reference.java changes as proposed since it compiles in both cases. Or should I revert the change to the declaration of local variable 'q'? I slightly prefer to revert the change to ReferenceQueue<? super Object> for now as there is no supertype for Object and this looks a little odd. We can clean this up as a separate fix after we get clarification from compiler-dev. Mandy
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/28/2014 03:17 AM, David Holmes wrote: On 27/01/2014 5:07 AM, Peter Levart wrote: On 01/25/2014 05:35 AM, srikalyan chandrashekar wrote: Hi Peter, if you are a committer, would you like to take this further (OR) perhaps David could sponsor this change. Hi, Here's a new webrev that takes into account Kalyan's and David's review comments: cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.02/ I switched to using Class.forName() instead of Unsafe for class preloading and initialization, just to be on the safe side regarding unwanted premature initialization of the Unsafe class. I also took the liberty of removing an unneeded semicolon (line 114) and fixing a JDK 8 compile-time error in generics (line 189): incompatible types: java.lang.ref.ReferenceQueue<capture#1 of ? super java.lang.Object> cannot be converted to java.lang.ref.ReferenceQueue<java.lang.Object> Seems somewhat odd given there is no supertype for Object, but it is consistent with the field declaration: ReferenceQueue<? super T> queue; The generics here are a little odd, as we don't really know the type of T; we just play fast-and-loose by declaring: Reference<Object> r; Which only works because of erasure. I guess it wouldn't work to try and use a simple wildcard '?' for both 'r' and 'q', as they would be different captures to javac. Yes, I tried that too and it results in even more unsafe casts. It's odd, yes, since the compile-time error is not present when building via the OpenJDK build system makefiles (using make images in the top directory, for example) but only if I compile the class from the command line (using javac directly) or from IDEA. I use JDK 8 ea-b121 in all cases as a build JDK. Are there any special options passed to javac for compiling those classes in the JDK build system that allow such code? Regards, Peter I re-ran the java/lang/ref tests and they pass. Can I count you as a reviewer, Kalyan? If I also get a go from David, I'll commit this to jdk9/dev... 
I can be counted as the Reviewer.
Kalyan can be listed as a reviewer. Thanks Peter. David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 28/01/2014 08:44, Peter Levart wrote: Yes, I tried that too and it results in even more unsafe casts. It's odd, yes, since the compile-time error is not present when building via the OpenJDK build system makefiles (using make images in the top directory, for example) but only if I compile the class from the command line (using javac directly) or from IDEA. I use JDK 8 ea-b121 in all cases as a build JDK. Are there any special options passed to javac for compiling those classes in the JDK build system that allow such code? jdk/make/Setup.gmk has the -Xlint options that are used in the build, but I suspect it's more that all the classes in java/lang/ref are compiled together. -Alan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 27/01/2014 5:07 AM, Peter Levart wrote: On 01/25/2014 05:35 AM, srikalyan chandrashekar wrote: Hi Peter, if you are a committer, would you like to take this further (OR) perhaps David could sponsor this change. Hi, Here's a new webrev that takes into account Kalyan's and David's review comments: cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.02/ I switched to using Class.forName() instead of Unsafe for class preloading and initialization, just to be on the safe side regarding unwanted premature initialization of the Unsafe class. I also took the liberty of removing an unneeded semicolon (line 114) and fixing a JDK 8 compile-time error in generics (line 189): incompatible types: java.lang.ref.ReferenceQueue<capture#1 of ? super java.lang.Object> cannot be converted to java.lang.ref.ReferenceQueue<java.lang.Object> Seems somewhat odd given there is no supertype for Object, but it is consistent with the field declaration: ReferenceQueue<? super T> queue; The generics here are a little odd, as we don't really know the type of T; we just play fast-and-loose by declaring: Reference<Object> r; Which only works because of erasure. I guess it wouldn't work to try and use a simple wildcard '?' for both 'r' and 'q', as they would be different captures to javac. I re-ran the java/lang/ref tests and they pass. Can I count you as a reviewer, Kalyan? If I also get a go from David, I'll commit this to jdk9/dev... I can be counted as the Reviewer. Kalyan can be listed as a reviewer. Thanks Peter. David
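David's observation that the local can simply mirror the field's declared type can be illustrated with hypothetical stand-ins: RefQueue plays ReferenceQueue and Ref plays Reference (the real queue field is not accessible from user code). In this sketch, declaring the local exactly as `RefQueue<? super Object>` compiles, and enqueueing works because any Object satisfies the `? super Object` lower bound. It deliberately avoids the `RefQueue<Object> q = r.queue;` form, since the thread reports javac accepting or rejecting that shape depending on whether ReferenceQueue was compiled from source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: RefQueue plays ReferenceQueue and Ref plays
// Reference, including a field declared with '? super T'.
public class WildcardSketch {

    static class RefQueue<T> {
        final List<T> items = new ArrayList<>();
        void enqueue(T item) { items.add(item); }
    }

    static class Ref<T> {
        final RefQueue<? super T> queue;     // same shape as the field discussed above
        Ref(RefQueue<? super T> queue) { this.queue = queue; }
    }

    // Mirrors the 'Reference<Object> r' trick: the real element type is
    // unknown, so every reference is treated as Ref<Object>.
    static void enqueue(Ref<Object> r) {
        RefQueue<? super Object> q = r.queue; // matches the field's declared type
        q.enqueue(r);                         // any Object fits '? super Object'
    }

    public static void main(String[] args) {
        RefQueue<Object> queue = new RefQueue<>();
        Ref<Object> r = new Ref<>(queue);
        enqueue(r);
        System.out.println(queue.items.size());
    }
}
```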
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 1/26/2014 11:07 AM, Peter Levart wrote: On 01/25/2014 05:35 AM, srikalyan chandrashekar wrote: Hi Peter, if you are a committer, would you like to take this further (OR) perhaps David could sponsor this change. Hi, Here's a new webrev that takes into account Kalyan's and David's review comments: cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.02/ This looks good to me. Sorry I have been behind in following the discussion of this thread. It's good to see this problem diagnosed and fixed (thank you all). I also prefer using Class.forName to do the preloading and initialization. Mandy
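The Class.forName()-based preloading endorsed here can be sketched as below. Helper is a hypothetical stand-in for a class such as Cleaner or InterruptedException, and ensureClassInitialized is an illustrative helper name, not necessarily what the committed patch uses. The detail that matters is the three-argument `Class.forName(name, true, loader)` overload, whose `true` argument requests initialization; merely naming `Helper.class` loads nothing eagerly and does not run the static initializer.

```java
// Hypothetical sketch: Helper stands in for a class (such as Cleaner or
// InterruptedException) that a critical loop should not load lazily.
public class PreloadSketch {

    static boolean helperInitialized;

    static class Helper {
        static {
            helperInitialized = true;   // runs once, when Helper is initialized
        }
    }

    // Illustrative helper: loads AND initializes the class up front, so a later
    // first use cannot fail with an OOME during lazy loading/initialization.
    static void ensureClassInitialized(Class<?> clazz) {
        try {
            Class.forName(clazz.getName(), true, clazz.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw (Error) new NoClassDefFoundError(e.getMessage()).initCause(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + helperInitialized);
        ensureClassInitialized(Helper.class);   // a class literal alone does not initialize
        System.out.println("after: " + helperInitialized);
    }
}
```

Running main prints `before: false` followed by `after: true`.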
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/25/2014 05:35 AM, srikalyan chandrashekar wrote: Hi Peter, if you are a committer, would you like to take this further (OR) perhaps David could sponsor this change. Hi, Here's a new webrev that takes into account Kalyan's and David's review comments: cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.02/ I switched to using Class.forName() instead of Unsafe for class preloading and initialization, just to be on the safe side regarding unwanted premature initialization of the Unsafe class. I also took the liberty of removing an unneeded semicolon (line 114) and fixing a JDK 8 compile-time error in generics (line 189): incompatible types: java.lang.ref.ReferenceQueue<capture#1 of ? super java.lang.Object> cannot be converted to java.lang.ref.ReferenceQueue<java.lang.Object> I re-ran the java/lang/ref tests and they pass. Can I count you as a reviewer, Kalyan? If I also get a go from David, I'll commit this to jdk9/dev... Regards, Peter -- Thanks kalyan On 1/24/14 4:05 PM, Peter Levart wrote: On 01/24/2014 02:53 AM, srikalyan chandrashekar wrote: Hi David, yes, that's right; the only benefit I see is that we can avoid the assignment to 'r' if pending is null. Hi Kalyan, Good to hear that the test runs without failures so far. Regarding the assignment of 'r': what I tried to accomplish with the change was to eliminate the double read of the 'pending' field. I have a mental model of a local variable being a register and a field being a memory location. This may be important if the field is volatile, but for normal fields, I guess the optimizer knows how to compile such code optimally in either case. The old (your) version is better from a logical perspective, since it guarantees that dereferencing 'r', wherever it is possible, will never throw NPE (dereferencing where 'r' is not assigned is not possible because of the definite assignment rules). So I support going back to your version...
Regards, Peter -- Thanks kalyan On 1/23/14 4:33 PM, David Holmes wrote: On 24/01/2014 6:10 AM, srikalyan wrote: Hi Peter, i have modified your code from r = pending; if (r != null) { .. TO if (pending != null) { r = pending; This is because the r is used later in the code and must not be assigned pending unless it is not null(this was as is earlier). If r is null, because pending is null then you perform the wait() and then continue - back to the top of the loop. There is no bug in Peter's code. The new webrev is posted here http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev-V2/ . I ran a 1000 run and no failures so far, however i would like to run a couple more 1000 runs to assert the fix. PS: The description section of JEP-122 (http://openjdk.java.net/jeps/122) says meta-data would be in native memory(not heap). The class_mirror is a Java object not meta-data. David -- Thanks kalyan Ph: (408)-585-8040 On 1/21/14, 2:31 PM, Peter Levart wrote: On 01/21/2014 07:17 PM, srikalyan wrote: Hi Peter/David, catching up after long weekend. Why would there be an OOME in object heap due to class loading in perm gen space ? The perm gen is not a problem her (JDK 8 does not have it and we see OOME on JDK8 too). Each time a class is loaded, new java.lang.Class object is allocated on heap. Regards, Peter Please correct if i am missing something here. Meanwhile i will give the version of Reference Handler you both agreed on a try. -- Thanks kalyan Ph: (408)-585-8040 On 1/21/14, 7:24 AM, Peter Levart wrote: On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... 
But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb Regards, Peter
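The JDK 8 generics error Peter mentions for webrev.02 comes from wildcard capture: a field declared as `ReferenceQueue<? super T>` reads as a captured type, which javac will not convert to `ReferenceQueue<Object>` even when `T` is `Object`. The sketch below reproduces the situation with hypothetical `Queue` and `Ref` classes (not `java.lang.ref` itself) and shows one way to make it compile, by keeping the wildcard in the local variable's type. It is not necessarily how the webrev fixed line 189.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Reference/ReferenceQueue, used only to show
// the wildcard-capture compile error and one way around it.
class CaptureDemo {
    static final class Queue<E> {
        final List<E> items = new ArrayList<>();
        void add(E e) { items.add(e); }
    }

    static final class Ref<T> {
        final T referent;
        final Queue<? super T> queue;   // wildcard-typed, like Reference.queue
        Ref(T referent, Queue<? super T> queue) {
            this.referent = referent;
            this.queue = queue;
        }
    }

    static int enqueue(Ref<Object> r) {
        // Rejected by javac: r.queue has type Queue<capture of ? super Object>,
        // which is not convertible to Queue<Object>.
        //     Queue<Object> q = r.queue;
        Queue<? super Object> q = r.queue;   // keeping the wildcard compiles
        q.add(r.referent);                   // an Object is a valid argument for "? super Object"
        return q.items.size();
    }

    public static void main(String[] args) {
        Queue<Object> queue = new Queue<>();
        Ref<Object> ref = new Ref<>("referent", queue);
        System.out.println("queued: " + enqueue(ref));
    }
}
```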
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 1/26/14 11:07 AM, Peter Levart wrote:
I re-ran the java/lang/ref tests and they pass. Can I count you as a reviewer, Kalyan? If I also get a go from David, I'll commit this to jdk9/dev...

Hi Peter, I do not have review rights, so it has to be someone else from core-libs-dev.

-- Thanks kalyan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/24/2014 02:53 AM, srikalyan chandrashekar wrote:
Hi David, yes, that's right; the only benefit I see is that we can avoid the assignment to 'r' if pending is null.

Hi Kalyan,

Good to hear that the test runs without failures so far.

Regarding the assignment of 'r': what I tried to accomplish with the change was to eliminate the double read of the 'pending' field. My mental model is that a local variable is a register and a field is a memory location. This may matter if the field is volatile, but for normal fields I guess the optimizer knows how to compile such code optimally in either case. The old (your) version is better from a logical perspective, since it guarantees that dereferencing 'r', wherever that is possible, will never throw an NPE (dereferencing where 'r' is not assigned is impossible because of the definite assignment rules). So I support going back to your version...

Regards, Peter
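The two loop-head variants discussed here can be compared in a simplified, single-threaded model. `PendingReadDemo`, its `Ref` class and the static `pending` field are hypothetical stand-ins for `Reference.pending`; the real handler waits and continues on the empty path instead of returning `null`.

```java
// Hypothetical, simplified model of the two variants: a pending list
// headed by a static field, guarded by a lock.
class PendingReadDemo {
    static final class Ref {
        final String name;
        Ref next;                 // link to the next pending reference
        Ref(String name) { this.name = name; }
    }

    static final Object lock = new Object();
    static Ref pending;           // head of the pending list, guarded by lock

    // Peter's variant: read the field once into a local, then test the local.
    // On the empty path r is null (the real loop would wait() and continue).
    static Ref takeReadOnce() {
        synchronized (lock) {
            Ref r = pending;
            if (r != null) {
                pending = r.next;
                r.next = null;
            }
            return r;
        }
    }

    // Kalyan's variant: test the field, assign r only when it is non-null.
    static Ref takeTestFirst() {
        Ref r;
        synchronized (lock) {
            if (pending != null) {
                r = pending;
                pending = r.next;
                r.next = null;
            } else {
                return null;      // the real loop would wait() and continue here
            }
        }
        return r;                 // definitely assigned on every path reaching here
    }

    public static void main(String[] args) {
        Ref a = new Ref("a");
        a.next = new Ref("b");
        synchronized (lock) { pending = a; }
        System.out.println(takeReadOnce().name + " " + takeTestFirst().name);
        System.out.println(takeReadOnce() == null && takeTestFirst() == null);
    }
}
```

In the second variant, the compiler's definite-assignment check is what guarantees `r` is only dereferenced after it has been assigned, which is the property cited in favor of that version.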
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/22/2014 03:19 AM, David Holmes wrote:
Hi Peter,

On 22/01/2014 12:00 AM, Peter Levart wrote:
Hi, David, Kalyan,

Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/

I can live with it, though it may be that once Cleaner has been preloaded, instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) Any specific reason to use Unsafe to do the preload rather than Class.forName? Does this force Unsafe to be loaded earlier than it otherwise would?

Good question. In systemDictionary.hpp they are both on the preloaded list, in this order:

  do_klass(Reference_klass, java_lang_ref_Reference, Pre ) \
  ...
  do_klass(misc_Unsafe_klass, sun_misc_Unsafe, Pre ) \

So when Reference is initialized, Unsafe is already loaded, but I don't know whether it is already initialized. This should be studied. I'll try to find out what the case is and get back to you.

Regards, Peter

Thanks, David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter, if you are a committer would you like to take this further, or perhaps David could sponsor this change?

-- Thanks kalyan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter, I have modified your code from

  r = pending;
  if (r != null) { ..

to

  if (pending != null) {
      r = pending;

because r is used later in the code and must not be assigned pending unless pending is non-null (as it was before). The new webrev is posted here: http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev-V2/ . I have completed 1000 runs with no failures so far; however, I would like to run a couple more batches of 1000 to confirm the fix.

PS: The description section of JEP-122 (http://openjdk.java.net/jeps/122) says metadata would be in native memory (not the heap).

-- Thanks kalyan Ph: (408)-585-8040

On 1/21/14, 2:31 PM, Peter Levart wrote:
On 01/21/2014 07:17 PM, srikalyan wrote:
Hi Peter/David, catching up after the long weekend. Why would there be an OOME in the object heap due to class loading in perm gen space?

The perm gen is not the problem here (JDK 8 does not have it, and we see the OOME on JDK 8 too). Each time a class is loaded, a new java.lang.Class object is allocated on the heap.

Regards, Peter

Please correct me if I am missing something here. Meanwhile I will give the version of Reference Handler you both agreed on a try.

-- Thanks kalyan

On 1/21/14, 7:24 AM, Peter Levart wrote:
On 01/21/2014 07:54 AM, Peter Levart wrote:
*[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
[Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
...
I'm on Linux, 64-bit, and using the official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it.

So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter/David, we have had 2000 runs without a single failure.

-- Thanks kalyan
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 24/01/2014 6:10 AM, srikalyan wrote:
Hi Peter, I have modified your code from

  r = pending;
  if (r != null) { ..

to

  if (pending != null) {
      r = pending;

because r is used later in the code and must not be assigned pending unless pending is non-null (as it was before).

If r is null because pending is null, then you perform the wait() and then continue, back to the top of the loop. There is no bug in Peter's code.

The new webrev is posted here: http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev-V2/ . I have completed 1000 runs with no failures so far; however, I would like to run a couple more batches of 1000 to confirm the fix. PS: The description section of JEP-122 (http://openjdk.java.net/jeps/122) says metadata would be in native memory (not the heap).

The class_mirror is a Java object, not metadata.

David
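The point that a null `r` simply leads to `wait()` and a jump back to the top of the loop can be sketched as a small producer/consumer pair. Everything here (`HandlerLoopDemo`, `publish`, `handle`, the `count` bound) is hypothetical and much simpler than the real ReferenceHandler, which loops forever and is fed by the VM rather than by a `publish` method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a handler loop: when nothing is pending it waits,
// then continues back to the top of the loop to re-check the condition.
class HandlerLoopDemo {
    static final class Ref {
        final int id;
        Ref next;
        Ref(int id) { this.id = id; }
    }

    private final Object lock = new Object();
    private Ref pending;                                // guarded by lock
    final List<Integer> handled = new ArrayList<>();    // touched only by the handler thread

    // Producer side: stands in for the GC publishing discovered references.
    void publish(Ref r) {
        synchronized (lock) {
            r.next = pending;
            pending = r;
            lock.notifyAll();
        }
    }

    // Handler side: processes 'count' references, then returns.
    void handle(int count) throws InterruptedException {
        while (handled.size() < count) {
            Ref r;
            synchronized (lock) {
                if (pending == null) {
                    lock.wait();     // releases the lock while waiting
                    continue;        // back to the top: re-check pending
                }
                r = pending;
                pending = r.next;
                r.next = null;
            }
            handled.add(r.id);       // process outside the lock
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HandlerLoopDemo demo = new HandlerLoopDemo();
        Thread handler = new Thread(() -> {
            try {
                demo.handle(3);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        for (int i = 0; i < 3; i++) {
            demo.publish(new Ref(i));
        }
        handler.join();
        System.out.println("handled " + demo.handled.size() + " references");
    }
}
```

Because `publish` notifies while holding the lock and `handle` re-checks `pending` under the same lock before waiting, a notification cannot be lost, and a spurious wakeup just goes round the loop again.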
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David, yes, that's right; the only benefit I see is that we can avoid the assignment to 'r' if pending is null.

-- Thanks kalyan

On 1/23/14 4:33 PM, David Holmes wrote:
If r is null because pending is null, then you perform the wait() and then continue, back to the top of the loop. There is no bug in Peter's code.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 24/01/2014 11:53 AM, srikalyan chandrashekar wrote:
Hi David, yes, that's right; the only benefit I see is that we can avoid the assignment to 'r' if pending is null.

I'm okay with either version.

David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 21/01/2014 4:54 PM, Peter Levart wrote:
On 01/21/2014 03:22 AM, David Holmes wrote:
Hi Peter, I do not see Cleaner being loaded prior to the main class on either Windows or Linux. Which platform are you on? Did you see it loaded before the main class or as part of executing it?

Before. The main class is empty:

  public class Test { public static void main(String... a) {} }

Here are the last few lines of -verbose:class:

[Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
[Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
*[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*

Curious. I wonder what the controlling factor is?

I'm on Linux, 64-bit, and using the official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So perhaps it would be good to trigger Cleaner loading and initialization as part of ReferenceHandler initialization, to play it safe.

If we do that for Cleaner, we may as well do it for InterruptedException too.

Also, it is not that I think ReferenceHandler is responsible for reporting OOME, but that it is responsible for reporting that it was unable to perform a clean or enqueue because of OOME.

This would be necessary if we skipped a Reference because of OOME, but if we just retry until we eventually succeed, nothing is lost and there is nothing to report (apart from a slow response)...

Agreed, just trying to clarify things.

Your suggested approach seems okay, though I'm not sure why we shouldn't help things along by calling System.gc() ourselves rather than just yielding and hoping things will get cleaned up elsewhere. But for the present purposes your approach will suffice, I think.

Maybe my understanding is wrong, but isn't the fact that OOME is raised a consequence of the VM having already attempted to clear things up (executing a GC round synchronously) without succeeding in making enough free space to satisfy the allocation request? If this is only how some collectors/allocators are implemented and not a general rule, then we should put a System.gc() in place of Thread.yield(). Should we also combine that with Thread.yield()? I'm concerned about the possibility that we spin and consume too much CPU (the ReferenceHandler thread has MAX priority), so that other threads don't get enough CPU time to proceed and clean things up (we hope other threads will also get OOME and release things as their stacks unwind...).

You are probably right about the System.gc(): OOME should be thrown after GC fails to create space, so it really needs some other thread to drop live references to allow further space to be reclaimed. But note that Thread.yield() can behave badly on some Linux systems too, so spinning is still a possibility; either way, though, this would only be really bad on a uniprocessor system, where yield() is unlikely to misbehave.

David

Regards, Peter

Thanks, David

On 20/01/2014 6:42 PM, Peter Levart wrote:
On 01/20/2014 09:00 AM, Peter Levart wrote:
On 01/20/2014 02:51 AM, David Holmes wrote:
Hi Peter,
On 17/01/2014 11:24 PM, Peter Levart wrote:
On 01/17/2014 02:13 PM, Peter Levart wrote:

  // Fast path for cleaners
  boolean isCleaner = false;
  try {
      isCleaner = r instanceof Cleaner;
  } catch (OutOfMemoryError oome) {
      continue;
  }
  if (isCleaner) {
      ((Cleaner)r).clean();
      continue;
  }

Hi David, Kalyan,

I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then also throw some error in ((Cleaner)r), the checkcast, since the Cleaner class would not have been successfully initialized?

Well, no. The above code would just skip Cleaner processing in this situation, and would never do it again after the heap is freed... So it might be good to load and initialize the Cleaner class as part of ReferenceHandler initialization to ensure correct operation...

Well, yes and no. Let me try once more: the above code will skip Cleaner processing if, the first time instanceof Cleaner is executed, OOME is thrown as a consequence of a full heap while loading and initializing the Cleaner class.

Yes, I was assuming that this would not fail the very first time, and so the Cleaner class would already be loaded. Failing to load the Cleaner class was one of the potential issues flagged earlier with this problem. I was actually assuming that Cleaner would already be loaded due to some actual Cleaner subclasses being used, but this does not happen as part of the default initialization. :( The irony being that if the Cleaner class is not
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ All 10 java/lang/ref tests pass on my PC (including OOMEInReferenceHandler). I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test with this code and report any failure. Thanks, Peter On 01/21/2014 08:57 AM, David Holmes wrote: On 21/01/2014 4:54 PM, Peter Levart wrote: On 01/21/2014 03:22 AM, David Holmes wrote: Hi Peter, I do not see Cleaner being loaded prior to the main class on either Windows or Linux. Which platform are you on? Did you see it loaded before the main class or as part of executing it? Before. The main class is empty: public class Test { public static void main(String... a) {} } Here are the last few lines of -verbose:class: [Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* Curious. I wonder what the controlling factor is ?? I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So perhaps it would be good to trigger Cleaner loading and initialization as part of ReferenceHandler initialization to play things safe. If we do that for Cleaner we may as well do it for InterruptedException too.
Also, it is not that I think ReferenceHandler is responsible for reporting OOME, but that it is responsible for reporting that it was unable to perform a clean or enqueue because of OOME. This would be necessary if we skipped a Reference because of OOME, but if we just re-try until we eventually succeed, nothing is lost, nothing to report (but a slow response)... Agreed - just trying to clarify things. Your suggested approach seems okay though I'm not sure why we shouldn't help things along by calling System.gc() ourselves rather than just yielding and hoping things will get cleaned up elsewhere. But for the present purposes your approach will suffice I think. Maybe my understanding is wrong but isn't the fact that OOME is raised a consequence of the VM having already attempted to clear things up (executing a GC round synchronously) but not succeeding in making enough free space to satisfy the allocation request? If this is only how some collectors/allocators are implemented and not a general rule, then we should put a System.gc() in place of Thread.yield(). Should we also combine that with Thread.yield()? I'm concerned about the possibility that we spin and consume too much CPU (the ReferenceHandler thread has MAX priority) so that other threads don't get enough CPU time to proceed and clean things up (we hope other threads will also get OOME and release things as their stacks unwind...). You are probably right about the System.gc() - OOME should be thrown after GC fails to create space, so it really needs some other thread to drop live references to allow further space to be reclaimed. But note that Thread.yield() can behave badly on some linux systems too, so spinning is still a possibility - but either way this would only be really bad on a uniprocessor system where yield() is unlikely to misbehave.
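The two back-off options weighed here (yield only, or request a GC and then yield) can be compared in a small standalone sketch. `OomeRetry`, `retryOnOome`, `YIELD` and `GC_THEN_YIELD` are hypothetical names, not JDK code; the OutOfMemoryError is simulated rather than produced by a real allocation failure, and `System.gc()` is only a hint to the VM.

```java
import java.util.function.Supplier;

public class OomeRetry {
    // Back-off used in the approach discussed here: just let other threads run.
    static final Runnable YIELD = Thread::yield;
    // Alternative raised in the thread: also request a GC (only a hint to the VM).
    static final Runnable GC_THEN_YIELD = () -> {
        System.gc();
        Thread.yield();
    };

    /** Re-runs action until it completes without an OutOfMemoryError. */
    static <T> T retryOnOome(Supplier<T> action, Runnable backoff) {
        while (true) {
            try {
                return action.get();
            } catch (OutOfMemoryError oome) {
                // Nothing is skipped: the same work is attempted again after backing off.
                backoff.run();
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated check that fails twice with OOME before succeeding.
        Supplier<Boolean> flaky = () -> {
            if (++attempts[0] <= 2) {
                throw new OutOfMemoryError("simulated");
            }
            return Boolean.TRUE;
        };
        boolean ok = retryOnOome(flaky, GC_THEN_YIELD);
        System.out.println(ok + " after " + attempts[0] + " attempts"); // true after 3 attempts
    }
}
```

An unbounded retry like this never drops a Reference, but, as noted above, it can spin at MAX priority if memory never frees up; a real handler might bound the retries or report after a while.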
David - Regards, Peter Thanks, David On 20/01/2014 6:42 PM, Peter Levart wrote: On 01/20/2014 09:00 AM, Peter Levart wrote: On 01/20/2014 02:51 AM, David Holmes wrote: Hi Peter, On 17/01/2014 11:24 PM, Peter Levart wrote: On 01/17/2014 02:13 PM, Peter Levart wrote: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation. And will never be doing it again after the heap is freed... So it might be good to load and initialize Cleaner class as part of ReferenceHandler initialization to ensure correct operation... Well, yes and no. Let me try once more: Above code will skip Cleaner processing if the 1st time instanceof Cleaner is executed, OOME is thrown as a consequence of full heap while loading and initializing the Cleaner class. Yes - I was assuming
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/21/2014 08:57 AM, David Holmes wrote: [Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* Curious. I wonder what the controlling factor is ?? The Cleaner is usually loaded by ReferenceHandler in JDK8 in the 1st execution of its loop. It looks like JDK8 system initialization produces at least one XXXReference that is cleared before the main() method is entered (debugging, I found it's a Finalizer for a FileInputStream - perhaps of the stream that loads the TimeZone data), so the ReferenceHandler thread is woken up, executes the instanceof Cleaner check and this loads the class. I put the following printfs in the original ReferenceHandler: System.out.println("Before using Cleaner..."); // Fast path for cleaners if (r instanceof Cleaner) { ((Cleaner)r).clean(); continue; } System.out.println("After using Cleaner..."); ...and the empty main() test with -verbose:class prints: ...
[Loaded java.io.DataInput from /home/peter/work/hg/jdk8-tl/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/work/hg/jdk8-tl/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/rt.jar] *Before using Cleaner...* *[Loaded sun.misc.Cleaner from out/production/jdk]* *After using Cleaner...* [Loaded java.io.ByteArrayInputStream from /home/peter/work/hg/jdk8-tl/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/work/hg/jdk8-tl/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/rt.jar] ... But sometimes, it seems, the VM is not so quick in clearing the early XXXReferences and/or the ReferenceHandler start-up is delayed, so the 1st iteration of the loop is executed after the OOMEInReferenceHandler test has already filled the heap, and consequently loading of the Cleaner class throws OOME in the instanceof check... My proposed fix is very aggressive. It pre-loads classes, initializes them and watches for OOMEs thrown on all occasions. It might be that pre-loading the Cleaner class in ReferenceHandler initialization would be sufficient to fix this intermittent failure. Or do you think the instanceof check could throw OOME for some other reason besides loading of the class? Regards, Peter
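Pre-loading a class while memory is still plentiful can be sketched with `Class.forName(name, true, loader)`, which loads and initializes the named class. `Worker` stands in for `sun.misc.Cleaner` and `Handler` for the ReferenceHandler; both are hypothetical, and the actual patch may preload differently (David asks in this thread why it uses Unsafe rather than Class.forName).

```java
public class PreloadDemo {
    static volatile boolean workerInitialized;

    // Hypothetical stand-in for sun.misc.Cleaner: records when its static initializer runs.
    static class Worker {
        static {
            PreloadDemo.workerInitialized = true;
        }
    }

    // Hypothetical stand-in for ReferenceHandler. Its static initializer makes sure
    // Worker is loaded and initialized up front, so a later instanceof/checkcast in
    // the handler loop is not the first thing that needs the class (and its Class mirror).
    static final class Handler {
        static {
            try {
                // Worker.class already loads the class; forName(..., true, ...) also initializes it.
                Class.forName(Worker.class.getName(), true, Handler.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        static void start() {
            // A real handler would start its thread here; invoking any static
            // method is enough to run Handler's static initializer.
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + workerInitialized); // before: false
        Handler.start();
        System.out.println("after: " + workerInitialized);  // after: true
    }
}
```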
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter/David, catching up after long weekend. Why would there be an OOME in the object heap due to class loading in perm gen space? Please correct me if I am missing something here. Meanwhile I will give the version of Reference Handler you both agreed on a try. -- Thanks kalyan Ph: (408)-585-8040 On 1/21/14, 7:24 AM, Peter Levart wrote: On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/21/2014 07:17 PM, srikalyan wrote: Hi Peter/David, catching up after long weekend. Why would there be an OOME in the object heap due to class loading in perm gen space? The perm gen is not a problem here (JDK 8 does not have it and we see OOME on JDK8 too). Each time a class is loaded, a new java.lang.Class object is allocated on the heap. Regards, Peter Please correct me if I am missing something here. Meanwhile I will give the version of Reference Handler you both agreed on a try. -- Thanks kalyan Ph: (408)-585-8040 On 1/21/14, 7:24 AM, Peter Levart wrote: On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 22/01/2014 1:24 AM, Peter Levart wrote: On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb I suspect it also depends on your TZ environment too as I do not see this on my systems. David Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 22/01/2014 8:31 AM, Peter Levart wrote: On 01/21/2014 07:17 PM, srikalyan wrote: Hi Peter/David, catching up after long weekend. Why would there be an OOME in the object heap due to class loading in perm gen space? The perm gen is not a problem here (JDK 8 does not have it and we see OOME on JDK8 too). Each time a class is loaded, a new java.lang.Class object is allocated on the heap. For the bootloader classes I thought, but could easily be wrong, that the Class mirror did indeed go into the PermGen. But still this is not relevant on JDK8 where there is no PermGen. It may be that this changed as part of the early PermGen removal prep work that did go into 7u. David Regards, Peter Please correct me if I am missing something here. Meanwhile I will give the version of Reference Handler you both agreed on a try. -- Thanks kalyan Ph: (408)-585-8040 On 1/21/14, 7:24 AM, Peter Levart wrote: On 01/21/2014 07:54 AM, Peter Levart wrote: *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] ... I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So what changed between JDK 7 and JDK 8? I suspect the following: 8007572: Replace existing jdk timezone data at java.home/lib/zi with JSR310's tzdb Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter, On 22/01/2014 12:00 AM, Peter Levart wrote: Hi, David, Kalyan, Summing up the discussion, I propose the following patch for ReferenceHandler: http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ I can live with it, though it may be that once Cleaner has been preloaded instanceof can no longer throw OOME. Can't be 100% sure. And there's some duplication/verbosity in the commentary that could be trimmed down :) Any specific reason to use Unsafe to do the preload rather than Class.forName? Does this force Unsafe to be loaded earlier than it otherwise would? Thanks, David All 10 java/lang/ref tests pass on my PC (including OOMEInReferenceHandler). I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test with this code and report any failure. Thanks, Peter On 01/21/2014 08:57 AM, David Holmes wrote: On 21/01/2014 4:54 PM, Peter Levart wrote: On 01/21/2014 03:22 AM, David Holmes wrote: Hi Peter, I do not see Cleaner being loaded prior to the main class on either Windows or Linux. Which platform are you on? Did you see it loaded before the main class or as part of executing it? Before. The main class is empty: public class Test { public static void main(String... a) {} } Here are the last few lines of -verbose:class: [Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* Curious. I wonder what the controlling factor is ??
I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So perhaps it would be good to trigger Cleaner loading and initialization as part of ReferenceHandler initialization to play things safe. If we do that for Cleaner we may as well do it for InterruptedException too. Also, it is not that I think ReferenceHandler is responsible for reporting OOME, but that it is responsible for reporting that it was unable to perform a clean or enqueue because of OOME. This would be necessary if we skipped a Reference because of OOME, but if we just re-try until we eventually succeed, nothing is lost, nothing to report (but a slow response)... Agreed - just trying to clarify things. Your suggested approach seems okay though I'm not sure why we shouldn't help things along by calling System.gc() ourselves rather than just yielding and hoping things will get cleaned up elsewhere. But for the present purposes your approach will suffice I think. Maybe my understanding is wrong but isn't the fact that OOME is raised a consequence of the VM having already attempted to clear things up (executing a GC round synchronously) but not succeeding in making enough free space to satisfy the allocation request? If this is only how some collectors/allocators are implemented and not a general rule, then we should put a System.gc() in place of Thread.yield(). Should we also combine that with Thread.yield()? I'm concerned about the possibility that we spin and consume too much CPU (the ReferenceHandler thread has MAX priority) so that other threads don't get enough CPU time to proceed and clean things up (we hope other threads will also get OOME and release things as their stacks unwind...). You are probably right about the System.gc() - OOME should be thrown after GC fails to create space, so it really needs some other thread to drop live references to allow further space to be reclaimed.
But note that Thread.yield() can behave badly on some linux systems too, so spinning is still a possibility - but either way this would only be really bad on a uniprocessor system where yield() is unlikely to misbehave. David - Regards, Peter Thanks, David On 20/01/2014 6:42 PM, Peter Levart wrote: On 01/20/2014 09:00 AM, Peter Levart wrote: On 01/20/2014 02:51 AM, David Holmes wrote: Hi Peter, On 17/01/2014 11:24 PM, Peter Levart wrote: On 01/17/2014 02:13 PM, Peter Levart wrote: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation. And will
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/20/2014 02:51 AM, David Holmes wrote: Hi Peter, On 17/01/2014 11:24 PM, Peter Levart wrote: On 01/17/2014 02:13 PM, Peter Levart wrote: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation. And will never be doing it again after the heap is freed... So it might be good to load and initialize Cleaner class as part of ReferenceHandler initialization to ensure correct operation... Well, yes and no. Let me try once more: Above code will skip Cleaner processing if the 1st time instanceof Cleaner is executed, OOME is thrown as a consequence of full heap while loading and initializing the Cleaner class. Yes - I was assuming that this would not fail the very first time and so the Cleaner class would already be loaded. Failing to be able to load the Cleaner class was one of the potential issues flagged earlier with this problem. I was actually assuming that Cleaner would be loaded already due to some actual Cleaner subclasses being used, but this does not happen as part of the default initialization. :( The irony being that if the Cleaner class is not loaded then r cannot be an instance of Cleaner and so we would fail to load the class in a case where we didn't need it anyway. What I wanted to focus on here was an OOME from the instanceof itself, but as you say that might trigger classloading of Cleaner (which is not what I was interested in).
The 2nd time the instanceof Cleaner is executed after such an OOME, the same line would throw NoClassDefFoundError as a consequence of referencing a class that failed initialization. Am I right? instanceof is not one of the class initialization triggers, so we should not see an OOME generated due to a class initialization exception and so the class will not be put into the Erroneous state and so subsequent attempts to use the class will not automatically trigger NoClassDefFoundError. If OOME occurs during actual loading/linking of the class Cleaner it is unclear what would happen on subsequent attempts. OOME is not a LinkageError that must be rethrown on subsequent attempts, and it is potentially a transient condition, so I would expect a re-load attempt to be allowed. However we are now deep into the details of the VM and it may well depend on the exact place from which the OOME originates. The bottom line with the current problem is that there are multiple non-obvious paths by which the ReferenceHandler can encounter an OOME. In such cases we do not want the ReferenceHandler to terminate - which implies catching the OOME and continuing. However we also do not want to silently skip Cleaner processing or reference queue processing - as that would lead to hard to diagnose bugs. But trying to report the problem may not be possible due to being out-of-memory. It may be that we need to break things up into multiple try/catch blocks, where each catch does a System.gc() and then reports that the OOME occurred. Of course the reporting must still be in a try/catch for the OOME. Though at some point letting the ReferenceHandler die may be the only way to report a major memory problem. David Hm... If I give -verbose:class option to run a simple test program: public class Test { public static void main(String... a) {} } I see Cleaner class being loaded before Test class.
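The claim that `instanceof` is not an initialization trigger can be checked with a small program: JLS 12.4.1 lists instance creation, static method invocation, and assignment or use of a non-constant static field as triggers, and says a class is not initialized under any other circumstance. `InstanceofInitDemo` and `Lazy` are hypothetical names; the sketch observes initialization only, since when the class is actually loaded is up to the VM.

```java
public class InstanceofInitDemo {
    static volatile boolean lazyInitialized;

    // Hypothetical class whose static initializer records that it has run.
    static class Lazy {
        static {
            InstanceofInitDemo.lazyInitialized = true;
        }
    }

    // A pure type test: may load/resolve Lazy, but does not initialize it.
    static boolean isLazy(Object o) {
        return o instanceof Lazy;
    }

    public static void main(String[] args) {
        boolean result = isLazy("not a Lazy");
        System.out.println(result + " " + lazyInitialized); // false false
        new Lazy();                                          // instance creation is a trigger
        System.out.println(lazyInitialized);                 // true
    }
}
```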
I don't see by which thread or if it might get loaded after main() starts, but I suspect that loading of Cleaner is not a problem here. Initialization of Cleaner class is not performed by ReferenceHandler thread as you pointed out. The instanceof does not trigger it and if it returns true then Cleaner has already been initialized. So there must be some other cause for instanceof throwing OOME... What do you say about this variant of ReferenceHandler.run() method: public void run() { for (;;) { Reference r; Cleaner c; synchronized (lock) { r = pending; if (r != null) { // instanceof operator might throw OOME sometimes. Just retry after // yielding - might have better luck next time... try { c = r instanceof Cleaner ? (Cleaner) r : null; } catch (OutOfMemoryError x) { Thread.yield(); continue; } pending = r.discovered;
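A simplified, single-threaded model of the loop above: take the head of the pending list under the lock, do the type test inside a try/catch for OutOfMemoryError (yielding and retrying the same element on failure), unlink it via `discovered`, then process it outside the lock. `Ref`, `CleanerRef`, `pending` and `drain()` are hypothetical stand-ins; in the JDK the list is linked by the GC, the loop waits on the lock instead of returning when the list is empty, and processing outside the lock is an assumption about the truncated code.

```java
import java.util.ArrayList;
import java.util.List;

public class PendingListSketch {
    // Hypothetical stand-ins for Reference/Cleaner; in the JDK the GC links the list.
    static class Ref {
        final String name;
        Ref discovered;                   // next element of the pending list
        Ref(String name) { this.name = name; }
    }

    static class CleanerRef extends Ref {
        CleanerRef(String name) { super(name); }
    }

    static final Object lock = new Object();
    static Ref pending;                   // list head, guarded by lock

    /** Drains the pending list once and returns what was done, in order. */
    static List<String> drain() {
        List<String> done = new ArrayList<>();
        for (;;) {
            Ref r;
            CleanerRef c;
            synchronized (lock) {
                r = pending;
                if (r == null) {
                    return done;          // the real loop would lock.wait() here
                }
                try {
                    // The type test is the step that failed with OOME in this bug.
                    c = r instanceof CleanerRef ? (CleanerRef) r : null;
                } catch (OutOfMemoryError x) {
                    Thread.yield();       // retry the same head; nothing is skipped
                    continue;
                }
                pending = r.discovered;   // unlink the head
                r.discovered = null;
            }
            // Process outside the lock.
            done.add(c != null ? "clean " + c.name : "enqueue " + r.name);
        }
    }

    public static void main(String[] args) {
        Ref a = new Ref("a"), b = new CleanerRef("b"), d = new Ref("d");
        a.discovered = b;
        b.discovered = d;
        pending = a;
        System.out.println(drain());      // [enqueue a, clean b, enqueue d]
    }
}
```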
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/21/2014 03:22 AM, David Holmes wrote: Hi Peter, I do not see Cleaner being loaded prior to the main class on either Windows or Linux. Which platform are you on? Did you see it loaded before the main class or as part of executing it? Before. The main class is empty: public class Test { public static void main(String... a) {} } Here are the last few lines of -verbose:class: [Loaded java.util.TimeZone from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfo from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInput from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.DataInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded sun.misc.Cleaner from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]* [Loaded java.io.ByteArrayInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$ZoneOffsetTransitionRule from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.zip.Checksum from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.zip.CRC32 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.ZoneInfoFile$Checksum from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.TimeZone$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.CalendarDate from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.BaseCalendar$Date from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.Gregorian$Date from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.util.calendar.CalendarUtils from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded
java.util.jar.JarEntry from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.jar.JarFile$JarFileEntry from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.zip.ZipFile$ZipFileInputStream from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.AbstractSequentialList from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.LinkedList from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.util.LinkedList$Node from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.security.PrivilegedActionException from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.misc.URLClassPath$FileLoader from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.misc.Resource from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.misc.URLClassPath$FileLoader$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.nio.ByteBuffered from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.security.PermissionCollection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.security.Permissions from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.net.URLConnection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.net.www.URLConnection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.net.www.protocol.file.FileURLConnection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded sun.net.www.MessageHeader from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.FilePermission from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.FilePermission$1 from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.io.FilePermissionCollection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.security.AllPermission from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded 
java.security.UnresolvedPermission from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.security.BasicPermissionCollection from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] *[Loaded Test from file:/tmp/]* [Loaded sun.launcher.LauncherHelper$FXHelper from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.lang.Shutdown from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] [Loaded java.lang.Shutdown$Lock from /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar] I'm on linux, 64bit and using official EA build 121 of JDK 8... But if I try with JDK 7u45, I don't see it. So perhaps it would be good to trigger Cleaner loading and initialization as part of ReferenceHandler initialization to play things safe. Also, it is not that I think ReferenceHandler is responsible for reporting OOME, but that it is responsible for reporting that it was unable to perform a clean or enqueue because of OOME. This would be necessary if we
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter, On 17/01/2014 11:24 PM, Peter Levart wrote: On 01/17/2014 02:13 PM, Peter Levart wrote: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation. And will never be doing it again after the heap is freed... So it might be good to load and initialize Cleaner class as part of ReferenceHandler initialization to ensure correct operation... Well, yes and no. Let me try once more: Above code will skip Cleaner processing if the 1st time instanceof Cleaner is executed, OOME is thrown as a consequence of full heap while loading and initializing the Cleaner class. Yes - I was assuming that this would not fail the very first time and so the Cleaner class would already be loaded. Failing to be able to load the Cleaner class was one of the potential issues flagged earlier with this problem. I was actually assuming that Cleaner would be loaded already due to some actual Cleaner subclasses being used, but this does not happen as part of the default initialization. :( The irony being that if the Cleaner class is not loaded then r cannot be an instance of Cleaner and so we would fail to load the class in a case where we didn't need it anyway. What I wanted to focus on here was an OOME from the instanceof itself, but as you say that might trigger classloading of Cleaner (which is not what I was interested in). The 2nd time the instanceof Cleaner is executed after such an OOME, the same line would throw NoClassDefFoundError as a consequence of referencing a class that failed initialization.
Am I right? instanceof is not one of the class initialization triggers, so we should not see an OOME generated due to a class initialization exception and so the class will not be put into the Erroneous state and so subsequent attempts to use the class will not automatically trigger NoClassDefFoundError. If OOME occurs during actual loading/linking of the class Cleaner it is unclear what would happen on subsequent attempts. OOME is not a LinkageError that must be rethrown on subsequent attempts, and it is potentially a transient condition, so I would expect a re-load attempt to be allowed. However we are now deep into the details of the VM and it may well depend on the exact place from which the OOME originates. The bottom line with the current problem is that there are multiple non-obvious paths by which the ReferenceHandler can encounter an OOME. In such cases we do not want the ReferenceHandler to terminate - which implies catching the OOME and continuing. However we also do not want to silently skip Cleaner processing or reference queue processing - as that would lead to hard to diagnose bugs. But trying to report the problem may not be possible due to being out-of-memory. It may be that we need to break things up into multiple try/catch blocks, where each catch does a System.gc() and then reports that the OOME occurred. Of course the reporting must still be in a try/catch for the OOME. Though at some point letting the ReferenceHandler die may be the only way to report a major memory problem. David David Regards, Peter
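The suggestion of separate try/catch blocks, each of which requests a GC and then reports, with the report itself guarded against a second OutOfMemoryError, might look roughly like the sketch below. `GuardedSteps`, `runStep` and the `reports` list are hypothetical; the OOME is simulated, and a real handler would still have to decide whether to retry a failed step rather than skip it.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedSteps {
    // Hypothetical sink for reports; stands in for whatever logging a real handler would use.
    static final List<String> reports = new ArrayList<>();

    /**
     * Runs one processing step in its own try/catch. On OutOfMemoryError it requests
     * a GC, then tries to report which step failed; the report may itself run out of
     * memory, so it is guarded separately. Returns true if the step completed.
     */
    static boolean runStep(String stepName, Runnable step) {
        try {
            step.run();
            return true;
        } catch (OutOfMemoryError oome) {
            System.gc();                  // only a hint; may free nothing
            try {
                reports.add("OOME during " + stepName);
            } catch (OutOfMemoryError ignored) {
                // Could not even report; keep the handler thread alive regardless.
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical steps standing in for "clean a Cleaner" and "enqueue a Reference".
        boolean cleaned = runStep("clean", () -> { throw new OutOfMemoryError("simulated"); });
        boolean enqueued = runStep("enqueue", () -> { });
        System.out.println(cleaned + " " + enqueued + " " + reports); // false true [OOME during clean]
    }
}
```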
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/17/2014 05:38 AM, David Holmes wrote: On 17/01/2014 1:31 PM, srikalyan chandrashekar wrote: Hi David, the disassembled code is also attached to the bug. Per my analysis the exception was thrown when the Reference Handler was on line 143, as stated in the earlier email. Sorry, missed that. But if the numbers in the disassembly match the BCI then 65 shows: 65: instanceof #11 // class sun/misc/Cleaner which makes more sense: the runtime instanceof check might encounter an OOME condition. I wish there were some easy way to trace into the full call chain, as TraceExceptions doesn't show you any runtime frames :( Still, it is easy enough to check: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Perhaps we should pre-load and initialize the Cleaner class as part of ReferenceHandler initialization... Regards, Peter Thanks, David -- Thanks kalyan On 1/16/14 6:16 PM, David Holmes wrote: On 17/01/2014 4:48 AM, srikalyan wrote: Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log.
Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion? -- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood.
What you are seeing there is an OOME escaping the run() method, which will cause the uncaughtExceptionHandler to be run, which then triggers a second OOME (likely as it tries to report information about the first OOME). The first exception occurred in runImpl at BCI 65. Can you disassemble (javap -c) the class you used so we can see what is at BCI 65? Thanks, David Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications.
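The outer guard proposed in the suggested fix, with run() delegating to a private runImpl(), might look roughly like this. GuardedWorker and its task queue are hypothetical names used only for illustration; the point is that an OOME escaping runImpl() is caught in run() and never reaches the uncaught-exception handler, which is where the second OOME in the log was raised.

```java
import java.util.Queue;

// Hypothetical sketch (not JDK code): run() is an outer guard around runImpl().
// An OOME escaping runImpl() is caught here, so it never reaches
// dispatchUncaughtException, whose reporting may itself need heap.
class GuardedWorker implements Runnable {
    private final Queue<Runnable> work;
    int oomeCount = 0;

    GuardedWorker(Queue<Runnable> work) { this.work = work; }

    @Override
    public void run() {
        while (true) {
            try {
                runImpl();
                return;                 // runImpl() finished normally
            } catch (OutOfMemoryError oome) {
                oomeCount++;            // survive and re-enter runImpl()
            }
        }
    }

    // Unlike the real ReferenceHandler loop, this one returns when the queue
    // is empty so that the sketch can be exercised deterministically.
    private void runImpl() {
        Runnable task;
        while ((task = work.poll()) != null) {
            task.run();
        }
    }
}
```

As written, a persistent OOME makes run() loop straight back into runImpl(); a real handler would probably want to call System.gc() or back off before re-entering, which is an assumption this sketch does not explore.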
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/17/2014 02:00 PM, Peter Levart wrote: On 01/17/2014 05:38 AM, David Holmes wrote: On 17/01/2014 1:31 PM, srikalyan chandrashekar wrote: Hi David, the disassembled code is also attached to the bug. Per my analysis the exception was thrown when the Reference Handler was on line 143, as stated in the earlier email. Sorry, missed that. But if the numbers in the disassembly match the BCI then 65 shows: 65: instanceof #11 // class sun/misc/Cleaner which makes more sense: the runtime instanceof check might encounter an OOME condition. I wish there were some easy way to trace into the full call chain, as TraceExceptions doesn't show you any runtime frames :( Still, it is easy enough to check: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation, and would never do it again after the heap is freed... So it might be good to load and initialize the Cleaner class as part of ReferenceHandler initialization to ensure correct operation... Peter Perhaps we should pre-load and initialize the Cleaner class as part of ReferenceHandler initialization... Regards, Peter Thanks, David -- Thanks kalyan On 1/16/14 6:16 PM, David Holmes wrote: On 17/01/2014 4:48 AM, srikalyan wrote: Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()).
The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log. Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion? -- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood. What you are seeing there is an OOME escaping the run() method, which will cause the uncaughtExceptionHandler to be run, which then triggers a second OOME (likely as it tries to report information about the first OOME). The first exception occurred in runImpl at BCI 65. Can you disassemble (javap -c) the class you used so we can see what is at BCI 65? Thanks, David
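Peter's idea of pre-loading the Cleaner class turns on the difference between loading a class and initializing it. The following sketch uses hypothetical classes (LazyCleaner, InitLog, PreloadDemo) rather than sun.misc.Cleaner. It shows that an instanceof check does not run a class's static initializer, while Class.forName(name, true, loader) loads, links and initializes the class, which is roughly what doing that work eagerly at handler start-up would amount to.

```java
// Hypothetical classes for illustration only; not JDK code.
class InitLog {
    static boolean lazyInitialized = false;
}

class LazyCleaner {
    static { InitLog.lazyInitialized = true; }  // runs only on initialization
}

class PreloadDemo {
    // instanceof needs the class resolved, but it is not one of the
    // triggers for running the static initializer.
    static boolean isLazyCleaner(Object o) {
        return o instanceof LazyCleaner;
    }

    // One way to "pre-load and initialize" at start-up: with initialize=true,
    // Class.forName loads, links and initializes the named class.
    static void preload() throws ClassNotFoundException {
        Class.forName(LazyCleaner.class.getName(), true,
                      PreloadDemo.class.getClassLoader());
    }
}
```

The class literal LazyCleaner.class used to obtain the name does not initialize the class either; only the forName call with initialize=true does.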
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/17/2014 02:13 PM, Peter Levart wrote: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Hi David, Kalyan, I've caught up now. Just thinking: is instanceof Cleaner throwing OOME as a result of loading the Cleaner class? Wouldn't the above code then throw some error also in ((Cleaner)r) - the checkcast, since Cleaner class would not be successfully initialized? Well, no. The above code would just skip Cleaner processing in this situation, and would never do it again after the heap is freed... So it might be good to load and initialize the Cleaner class as part of ReferenceHandler initialization to ensure correct operation... Well, yes and no. Let me try once more: the above code will skip Cleaner processing if, the 1st time instanceof Cleaner is executed, an OOME is thrown as a consequence of a full heap while loading and initializing the Cleaner class. The 2nd time the instanceof Cleaner is executed after such an OOME, the same line would throw NoClassDefFoundError as a consequence of referencing a class that failed initialization. Am I right? Regards, Peter
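The behaviour Peter describes, where a class whose initialization failed later produces NoClassDefFoundError, can be reproduced with a static initializer that throws. In this hypothetical sketch the initializer of FailsToInit throws a simulated OutOfMemoryError. Because it is an Error, the first triggering call (here an invokestatic) rethrows it unchanged and the class is marked erroneous; every later use fails with NoClassDefFoundError. Note that this needs an initialization trigger such as invokestatic; an instanceof check alone does not run the initializer.

```java
// Hypothetical class whose static initializer fails with an OOME, standing in
// for a class whose initialization was aborted by a full heap.
class FailsToInit {
    static final int VALUE = fail();

    private static int fail() {
        throw new OutOfMemoryError("simulated OOME during static initialization");
    }

    static void touch() { }
}

class InitFailureDemo {
    // Calling a static method IS an initialization trigger (unlike instanceof).
    static String attempt() {
        try {
            FailsToInit.touch();
            return "no error";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }
}
```

A checked or runtime exception from the initializer would instead surface first as ExceptionInInitializerError; Errors such as OutOfMemoryError are propagated as-is.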
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log. Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion?
-- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood. Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications. - Apart from the above changes, Peter's suggestion to create and call a private runImpl() from run() in ReferenceHandler makes sense to me. Why would we need this?
David --- Thanks kalyan On 01/13/2014 03:57 PM, srikalyan wrote: On 1/11/14, 6:15 AM, Peter Levart wrote: On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote: Hi Peter, the version you provided ran indefinitely (I put a 10-minute timeout) and the program got interrupted (no error). Did you run it with or without fastdebug and -XX:+TraceExceptions? If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour. With fastdebug and -XX:+TraceExceptions. I will try combinations of the possible options (i.e. without -XX:+TraceExceptions on a debug build, etc.) soon. Even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier). ...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap. You can verify that it works by commenting out the try { and corresponding } catch (OOME x) {} exception handler... Since there is a GC call prior to printing the string, I will give that a shot with a non-debug build. - The test is running in interpreter mode; what I am watching for is one failure with a trace. Without a fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs). It might be interesting to try with a fastdebug build but without the -XX:+TraceExceptions option to see what has an effect on it.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David, the disassembled code is also attached to the bug. Per my analysis the exception was thrown when the Reference Handler was on line 143, as stated in the earlier email. -- Thanks kalyan On 1/16/14 6:16 PM, David Holmes wrote: On 17/01/2014 4:48 AM, srikalyan wrote: Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log. Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion?
-- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood. What you are seeing there is an OOME escaping the run() method, which will cause the uncaughtExceptionHandler to be run, which then triggers a second OOME (likely as it tries to report information about the first OOME). The first exception occurred in runImpl at BCI 65. Can you disassemble (javap -c) the class you used so we can see what is at BCI 65?
Thanks, David Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications. - Apart from the above changes, Peter's suggestion to create and call a private runImpl() from run() in ReferenceHandler makes sense to me. Why would we need this? David --- Thanks kalyan On 01/13/2014 03:57 PM, srikalyan wrote: On 1/11/14, 6:15 AM, Peter Levart wrote: On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote: Hi Peter, the version you provided ran indefinitely (I put a 10-minute timeout) and the program got interrupted (no error). Did you run it with or without fastdebug and -XX:+TraceExceptions? If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour. With fastdebug and -XX:+TraceExceptions. I will try combinations of the possible options (i.e. without -XX:+TraceExceptions on a debug build, etc.) soon. Even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier). ...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 17/01/2014 1:31 PM, srikalyan chandrashekar wrote: Hi David, the disassembled code is also attached to the bug. Per my analysis the exception was thrown when the Reference Handler was on line 143, as stated in the earlier email. Sorry, missed that. But if the numbers in the disassembly match the BCI then 65 shows: 65: instanceof #11 // class sun/misc/Cleaner which makes more sense: the runtime instanceof check might encounter an OOME condition. I wish there were some easy way to trace into the full call chain, as TraceExceptions doesn't show you any runtime frames :( Still, it is easy enough to check: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } Thanks, David -- Thanks kalyan On 1/16/14 6:16 PM, David Holmes wrote: On 17/01/2014 4:48 AM, srikalyan wrote: Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log. Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die.
Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion? -- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood. What you are seeing there is an OOME escaping the run() method, which will cause the uncaughtExceptionHandler to be run, which then triggers a second OOME (likely as it tries to report information about the first OOME). The first exception occurred in runImpl at BCI 65. Can you disassemble (javap -c) the class you used so we can see what is at BCI 65?
Thanks, David Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications. - Apart from the above changes, Peter's suggestion to create and call a private runImpl() from run() in ReferenceHandler makes sense to me. Why would we need this? David --- Thanks kalyan On 01/13/2014 03:57 PM, srikalyan wrote: On 1/11/14, 6:15 AM, Peter Levart wrote: On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote: Hi Peter, the version you provided ran indefinitely (I put a 10-minute timeout) and the program got interrupted (no error).
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 1/16/14 8:38 PM, David Holmes wrote: On 17/01/2014 1:31 PM, srikalyan chandrashekar wrote: Hi David, the disassembled code is also attached to the bug. Per my analysis the exception was thrown when the Reference Handler was on line 143, as stated in the earlier email. Sorry, missed that. But if the numbers in the disassembly match the BCI then 65 shows: 65: instanceof #11 // class sun/misc/Cleaner which makes more sense: the runtime instanceof check might encounter an OOME condition. I wish there were some easy way to trace into the full call chain, as TraceExceptions doesn't show you any runtime frames :( Still, it is easy enough to check: // Fast path for cleaners boolean isCleaner = false; try { isCleaner = r instanceof Cleaner; } catch (OutOfMemoryError oome) { continue; } if (isCleaner) { ((Cleaner)r).clean(); continue; } I will get this into a build and give it a shot soon. In the log, bci 6 and bci 48 are where the dispatch and uncaught exceptions are raised (please correct me if I am wrong); I assumed they come from the ReferenceHandler thread, as the log shows the same thread ID 0x7feed80cf800. Thanks, David -- Thanks kalyan On 1/16/14 6:16 PM, David Holmes wrote: On 17/01/2014 4:48 AM, srikalyan wrote: Hi David, On 1/15/14, 9:04 PM, David Holmes wrote: On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote: Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler?? Please search for ReferenceHandler in the log.
Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion? -- Log Excerpt begins -- Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 168] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c600} 'runImpl' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 65 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff7808e8) thrown in interpreter method {method} {0x7feeddd3c478} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at bci 1 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddcaaf90} 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in ' at bci 48 for thread 0x7feed80cf800 Exception a 'java/lang/OutOfMemoryError' (0xff780868) thrown in interpreter method {method} {0x7feeddca7298} 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/ at bci 6 for thread 0x7feed80cf800 -- Log Excerpt ends -- Sorry if I have misunderstood.
What you are seeing there is an OOME escaping the run() method, which will cause the uncaughtExceptionHandler to be run, which then triggers a second OOME (likely as it tries to report information about the first OOME). The first exception occurred in runImpl at BCI 65. Can you disassemble (javap -c) the class you used so we can see what is at BCI 65? Thanks, David Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications. - Apart from the above changes, Peter's suggestion to create and call a private runImpl() from run() in ReferenceHandler makes sense to me.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter/David, we could finally get a trace of the exception with a fastdebug build and ReferenceHandler modified (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments. Observations from the log: Root Cause: 1) UncaughtException is being dispatched from Reference.java:143 141 Reference<Object> r; 142 synchronized (lock) { 143 if (pending != null) { 144 r = pending; 145 pending = r.discovered; 146 r.discovered = null; The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the Reference Handler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die. Suggested fix: - As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If ReferenceHandler is considered part of the GC subsystem then it should be alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be ignorant of vital implications. - Apart from the above changes, Peter's suggestion to create and call a private runImpl() from run() in ReferenceHandler makes sense to me. --- Thanks kalyan On 01/13/2014 03:57 PM, srikalyan wrote: On 1/11/14, 6:15 AM, Peter Levart wrote: On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote: Hi Peter, the version you provided ran indefinitely (I put a 10-minute timeout) and the program got interrupted (no error). Did you run it with or without fastdebug and -XX:+TraceExceptions? If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour. With fastdebug and -XX:+TraceExceptions. I will try combinations of the possible options (i.e. without -XX:+TraceExceptions on a debug build, etc.) soon.
Even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier). ...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap. You can verify that it works by commenting out the try { and corresponding } catch (OOME x) {} exception handler... Since there is a GC call prior to printing the string, I will give that a shot with a non-debug build. - The test is running in interpreter mode; what I am watching for is one failure with a trace. Without a fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs). It might be interesting to try with a fastdebug build but without the -XX:+TraceExceptions option to see what has an effect on it. It might also be interesting to try the modified ReferenceHandler (the one with the private runImpl() method called from run()) with a normal non-fastdebug JDK. This info might be useful when one starts to inspect the exception handling code in the interpreter... Regards, Peter -- Thanks kalyan Ph: (408)-585-8040 --- Thanks kalyan On 01/10/2014 02:57 AM, Peter Levart wrote: On 01/10/2014 09:31 AM, Peter Levart wrote: Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate ReferenceHandler in many aspects, but doesn't need to be the ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies. Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates? I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint Regards, Peter
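Peter's approach of printing only after the GC has cleared the heap, rather than from inside the uncaught-exception handler while the heap is still full, can be sketched as a handler that merely records the Throwable. DeferredReport is a hypothetical name and this is not the code of OOMECatchingTest.java; the report string is built by another thread after join().

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the uncaught-exception handler only records the failure
// (a single compareAndSet, no string building). Another thread formats the
// report later, e.g. after join() and after System.gc() has freed the heap.
class DeferredReport {
    static final AtomicReference<Throwable> failure = new AtomicReference<>();

    static Thread start(Runnable body) {
        Thread t = new Thread(body, "worker");
        t.setUncaughtExceptionHandler((thread, ex) -> failure.compareAndSet(null, ex));
        t.start();
        return t;
    }

    static String report() {
        Throwable ex = failure.get();
        return ex == null ? "worker completed normally" : "worker died with " + ex;
    }
}
```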
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote:
Hi Peter/David, we finally got a trace of the exception with a fastdebug build and a modified ReferenceHandler (with runImpl() added and called from run()). The logs and disassembled code are available in JIRA https://bugs.openjdk.java.net/browse/JDK-8022321 as attachments.

All I can see is the log for the OOMECatchingTest program, not one for the actual ReferenceHandler??

Observations from the log:
Root cause:
1) The uncaught exception is being dispatched from Reference.java:143

141 Reference<Object> r;
142 synchronized (lock) {
143     if (pending != null) {
144         r = pending;
145         pending = r.discovered;
146         r.discovered = null;

The pending field in Reference is touched and updated by the collector, so at line 143, when the execution context is in the ReferenceHandler, there might have been an exception pending due to an allocation done by the collector, which causes the ReferenceHandler thread to die.

Sorry, but the GC does not trigger asynchronous exceptions, so this explanation does not make any sense to me. What part of the log led you to this conclusion?

Suggested fix:
- As proposed earlier, putting an outer guard (try-catch on OOME) in the ReferenceHandler will fix the issue. If the ReferenceHandler is considered part of the GC subsystem, then it should stay alive even in the midst of an OOME, so I feel that the additional guard should be allowed; however, I might still be unaware of vital implications.
- Apart from the above changes, Peter's suggestion to create a private runImpl() and call it from run() in the ReferenceHandler makes sense to me.

Why would we need this?

David

---
Thanks
kalyan

On 01/13/2014 03:57 PM, srikalyan wrote:
On 1/11/14, 6:15 AM, Peter Levart wrote:
On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote:
Hi Peter, the version you provided ran indefinitely (I put a 10 minute timeout) and the program got interrupted (no error),

Did you run it with or without fastdebug -XX:+TraceExceptions?
If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour.

With fastdebug -XX:+TraceExceptions. I will try combinations of the possible options (i.e. without -XX:+TraceExceptions on the debug build, etc.) soon.

even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier).

...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap. You can check that it works by commenting out the try { and the corresponding } catch (OOME x) {} exception handler...

Since there is a GC call prior to printing the string, I will give that a shot with a non-debug build.

- The test is running in interpreter mode; what I am watching for is one error with a trace. Without the fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs).

It might be interesting to try with a fastdebug build but without the -XX:+TraceExceptions option to see what has an effect on it. It might also be interesting to try the modified ReferenceHandler (the one with a private runImpl() method called from run()) with a normal non-fastdebug JDK. This info might be useful when one starts to inspect the exception handling code in the interpreter...

Regards, Peter

--
Thanks
kalyan
Ph: (408)-585-8040

---
Thanks
kalyan

On 01/10/2014 02:57 AM, Peter Levart wrote:
On 01/10/2014 09:31 AM, Peter Levart wrote:
Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies.
Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 1/11/14, 6:15 AM, Peter Levart wrote:
On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote:
Hi Peter, the version you provided ran indefinitely (I put a 10 minute timeout) and the program got interrupted (no error),

Did you run it with or without fastdebug -XX:+TraceExceptions? If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour.

With fastdebug -XX:+TraceExceptions. I will try combinations of the possible options (i.e. without -XX:+TraceExceptions on the debug build, etc.) soon.

even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier).

...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap. You can check that it works by commenting out the try { and the corresponding } catch (OOME x) {} exception handler...

Since there is a GC call prior to printing the string, I will give that a shot with a non-debug build.

- The test is running in interpreter mode; what I am watching for is one error with a trace. Without the fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs).

It might be interesting to try with a fastdebug build but without the -XX:+TraceExceptions option to see what has an effect on it. It might also be interesting to try the modified ReferenceHandler (the one with a private runImpl() method called from run()) with a normal non-fastdebug JDK. This info might be useful when one starts to inspect the exception handling code in the interpreter...
Regards, Peter

--
Thanks
kalyan
Ph: (408)-585-8040

---
Thanks
kalyan

On 01/10/2014 02:57 AM, Peter Levart wrote:
On 01/10/2014 09:31 AM, Peter Levart wrote:
Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies. Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint

Regards, Peter
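Peter's point is that formatting a diagnostic string while the heap is still full can itself fail with OOME. A minimal sketch of the alternative ordering, assuming the memory pressure comes from a single action whose allocations become unreachable once it throws: the catch block only stores a reference to the error, and the report string is built afterwards. runAndReport() and fillHeap() are hypothetical helpers, and System.gc() is only a request to the collector, not a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

public class PrintAfterGc {
    /**
     * Runs an allocation-heavy action. If it fails with OOME, the handler
     * only keeps a reference to the error (no new objects), and the report
     * string is built after the action's allocations are unreachable.
     */
    static String runAndReport(Runnable memoryHungryAction) {
        OutOfMemoryError failure = null;
        try {
            memoryHungryAction.run();
        } catch (OutOfMemoryError oome) {
            failure = oome; // no allocation here; the heap may still be full
        }
        if (failure == null) {
            return "completed";
        }
        System.gc(); // only a hint; the action's ballast is unreachable either way
        return "caught " + failure.getClass().getName()
                + " in thread " + Thread.currentThread().getName();
    }

    /** Example action (hypothetical): fills the heap until allocation fails. */
    static void fillHeap() {
        List<byte[]> ballast = new ArrayList<>();
        for (;;) {
            ballast.add(new byte[64 * 1024]);
        }
    }

    public static void main(String[] args) {
        // Intended to be run with a small heap, e.g. -Xmx24M
        System.out.println(runAndReport(PrintAfterGc::fillHeap));
    }
}
```

Running main() with -Xmx24M exercises the real path; the OOME may surface either at the byte[] allocation or while the ArrayList grows its backing array, and both cases are handled the same way.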
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote:
Hi Peter, the version you provided ran indefinitely (I put a 10 minute timeout) and the program got interrupted (no error),

Did you run it with or without fastdebug -XX:+TraceExceptions? If with, it might be that fastdebug and/or -XX:+TraceExceptions changes the execution a bit so that we can no longer reproduce the wrong behaviour.

even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier).

...it has been attempted to print toString in the uncaught exception handler. At that time, the heap is still full. I'm printing it after the GC has cleared the heap. You can check that it works by commenting out the try { and the corresponding } catch (OOME x) {} exception handler...

- The test is running in interpreter mode; what I am watching for is one error with a trace. Without the fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs).

It might be interesting to try with a fastdebug build but without the -XX:+TraceExceptions option to see what has an effect on it. It might also be interesting to try the modified ReferenceHandler (the one with a private runImpl() method called from run()) with a normal non-fastdebug JDK. This info might be useful when one starts to inspect the exception handling code in the interpreter...

Regards, Peter

---
Thanks
kalyan

On 01/10/2014 02:57 AM, Peter Levart wrote:
On 01/10/2014 09:31 AM, Peter Levart wrote:
Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies.
Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/10/2014 12:59 AM, srikalyan chandrashekar wrote:
David/Peter, you are right, the trace logs came from a passed run. I am trying to simulate the failure and get the logs for failed runs (2000+ runs done and still no failure); I will get back to you once I have the data from a failed run. Sorry for the confusion.

I doubt the logs will be any different. A simple test that throws an exception inside Thread.run() without catching it shows that TraceExceptions doesn't report the fact that Thread.run() terminates abruptly (as David pointed out, a pending exception is reported after every bytecode executed, and there's no bytecode that invoked Thread.run()). While you're at it, testing, could you also test the modified ReferenceHandler (the one that calls a private runImpl() from its run() method) so that we get proof of the incorrect behaviour?

Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies. Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

Regards, Peter

---
Thanks
kalyan

On 01/08/2014 11:22 PM, David Holmes wrote:
Thanks Peter. Kalyan: Can you confirm, as Peter asked, that the TraceExceptions output came from a failed run? AFAICS the Trace info is printed after each bytecode where there is a pending exception, though I'm not 100% sure about the printing within the VM runtime. Based on that, I think we see the Trace output in run() at the point where wait() returns, so it may well be caught after that, in which case this was not a failing run.
I also can't reproduce the problem :(
David

On 8/01/2014 10:34 PM, Peter Levart wrote:
On 01/08/2014 07:30 AM, David Holmes wrote:
On 8/01/2014 4:19 PM, David Holmes wrote:
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced some nice trace http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException. However, we already have a catch block around the wait(), so how is this OOME getting through?

A bug in exception handling in the interpreter??

Might be. And it may have something to do with the fact that the Thread.run() method is the first call frame on the thread's stack (seems like a corner case).
The last few meaningful TraceExceptions records are:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ca8} 'wait' '()V' in 'java/lang/Object' at *bci 2* for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b48d2250} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at *bci 36* for thread 0x7f78c40d2800

Here's the relevant bytecode:

public class java.lang.Object
  public final void wait() throws java.lang.InterruptedException;
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: lconst_0
        *2: invokevirtual #73 // Method wait:(J)V*
         5: return
      LineNumberTable:
        line 502: 0
        line 503: 5
    Exceptions:
      throws java.lang.InterruptedException

class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
  public void run();
    descriptor: ()V
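Peter's observation that -XX:+TraceExceptions says nothing once Thread.run() completes abruptly can be complemented at the Java level. A minimal sketch, assuming only the standard Thread.UncaughtExceptionHandler API: the handler is invoked on the terminating thread after run() exits with an exception, so a test can record what escaped without relying on VM tracing. UncaughtInRun and runAndCapture() are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

public class UncaughtInRun {
    /**
     * Starts a thread running body and returns the throwable that escaped
     * its run() method, or null if run() completed normally.
     */
    static Throwable runAndCapture(Runnable body) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread t = new Thread(body, "victim");
        // Invoked on the terminating thread after run() completes abruptly.
        // This gives a Java-level record of the thread's death that does not
        // depend on -XX:+TraceExceptions output.
        t.setUncaughtExceptionHandler((thread, ex) -> seen.set(ex));
        t.start();
        t.join(); // the handler has finished by the time join() returns
        return seen.get();
    }
}
```

Note that a handler doing its own allocation (for example building a message string) can fail in the same out-of-memory situation; here it only stores a reference.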
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/10/2014 09:31 AM, Peter Levart wrote:
Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies. Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint

Regards, Peter
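OOMECatchingTest.java itself is not reproduced in this thread, so the following is only a hypothetical approximation of the idea: a thread that waits on a lock and swallows both OutOfMemoryError and InterruptedException, as the ReferenceHandler does, while the main thread repeatedly fills the heap and interrupts it. If exception handling works as specified, the waiter never dies. The Waiter class, the one-minute deadline and the Object[1024] ballast size are assumptions; run it with the suggested -Xmx24M -XX:-UseTLAB -Xint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class OomeCatchingSketch {
    static final Object lock = new Object();

    /** Mimics the ReferenceHandler wait loop; it should never terminate. */
    static final class Waiter extends Thread {
        final AtomicLong wakeups = new AtomicLong();

        Waiter() {
            super("Waiter");
            setDaemon(true);
        }

        @Override
        public void run() {
            for (;;) {
                synchronized (lock) {
                    try {
                        try {
                            lock.wait();
                        } catch (OutOfMemoryError x) { } // inner: OOME while waiting
                    } catch (InterruptedException x) { } // outer: normal interrupt
                }
                wakeups.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        Waiter waiter = new Waiter();
        waiter.start();
        long deadline = System.nanoTime() + 60_000_000_000L; // about one minute
        while (waiter.isAlive() && System.nanoTime() < deadline) {
            List<Object> ballast = new ArrayList<>();
            try {
                for (;;) {
                    ballast.add(new Object[1024]);
                    waiter.interrupt(); // wake wait() while the heap is nearly full
                }
            } catch (OutOfMemoryError expected) {
                ballast = null; // release the heap before the next round
            }
        }
        System.out.println(waiter.isAlive() ? "waiter survived" : "waiter DIED");
    }
}
```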
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Peter, the version you provided ran indefinitely (I put a 10 minute timeout) and the program got interrupted (no error); even if there were to be an error, you cannot print the string of the thread to the console (this has been attempted earlier).
- The test is running in interpreter mode; what I am watching for is one error with a trace. Without the fastdebug build and -XX:+TraceExceptions I am able to reproduce at least 5 failures out of 1000 runs, but with fastdebug+Trace no luck yet (already past a few thousand runs).

---
Thanks
kalyan

On 01/10/2014 02:57 AM, Peter Levart wrote:
On 01/10/2014 09:31 AM, Peter Levart wrote:
Since we suspect there's something wrong with exception handling in the interpreter, I devised a hypothetical reproducer that tries to simulate the ReferenceHandler in many aspects, but doesn't need to be a ReferenceHandler: http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java This is designed to run indefinitely and only terminate if/when the thread dies. Could you run this program in the environment that causes the OOMEInReferenceHandler test to fail and see if it terminates?

I forgot to mention that in order for this long-running program to exhibit interpreter behaviour, it should be run with the -Xint option. So I suggest: -Xmx24M -XX:-UseTLAB -Xint

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
David/Peter, you are right, the trace logs came from a passed run. I am trying to simulate the failure and get the logs for failed runs (2000+ runs done and still no failure); I will get back to you once I have the data from a failed run. Sorry for the confusion.

---
Thanks
kalyan

On 01/08/2014 11:22 PM, David Holmes wrote:
Thanks Peter. Kalyan: Can you confirm, as Peter asked, that the TraceExceptions output came from a failed run? AFAICS the Trace info is printed after each bytecode where there is a pending exception, though I'm not 100% sure about the printing within the VM runtime. Based on that, I think we see the Trace output in run() at the point where wait() returns, so it may well be caught after that, in which case this was not a failing run. I also can't reproduce the problem :(
David

On 8/01/2014 10:34 PM, Peter Levart wrote:
On 01/08/2014 07:30 AM, David Holmes wrote:
On 8/01/2014 4:19 PM, David Holmes wrote:
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced some nice trace http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException. However, we already have a catch block around the wait(), so how is this OOME getting through?

A bug in exception handling in the interpreter??

Might be.
And it may have something to do with the fact that the Thread.run() method is the first call frame on the thread's stack (seems like a corner case). The last few meaningful TraceExceptions records are:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ca8} 'wait' '()V' in 'java/lang/Object' at *bci 2* for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b48d2250} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at *bci 36* for thread 0x7f78c40d2800

Here's the relevant bytecode:

public class java.lang.Object
  public final void wait() throws java.lang.InterruptedException;
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: lconst_0
        *2: invokevirtual #73 // Method wait:(J)V*
         5: return
      LineNumberTable:
        line 502: 0
        line 503: 5
    Exceptions:
      throws java.lang.InterruptedException

class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
  public void run();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=5, args_size=1
         0: invokestatic #62 // Method java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
         3: dup
         4: astore_2
         5: monitorenter
         6: invokestatic #61 // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
         9: ifnull 33
        12: invokestatic #61 // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
        15: astore_1
        16: aload_1
        17: invokestatic #64 // Method java/lang/ref/Reference.access$300:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        20: invokestatic #63 // Method java/lang/ref/Reference.access$202:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        23: pop
        24: aload_1
        25: aconst_null
        26: invokestatic #65 // Method java/lang/ref/Reference.access$302:(Ljava/lang/ref/Reference;Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        29: pop
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/08/2014 07:30 AM, David Holmes wrote:
On 8/01/2014 4:19 PM, David Holmes wrote:
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced some nice trace http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException. However, we already have a catch block around the wait(), so how is this OOME getting through?

A bug in exception handling in the interpreter??

Might be. And it may have something to do with the fact that the Thread.run() method is the first call frame on the thread's stack (seems like a corner case).
The last few meaningful TraceExceptions records are:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ca8} 'wait' '()V' in 'java/lang/Object' at *bci 2* for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b48d2250} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at *bci 36* for thread 0x7f78c40d2800

Here's the relevant bytecode:

public class java.lang.Object
  public final void wait() throws java.lang.InterruptedException;
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: lconst_0
        *2: invokevirtual #73 // Method wait:(J)V*
         5: return
      LineNumberTable:
        line 502: 0
        line 503: 5
    Exceptions:
      throws java.lang.InterruptedException

class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
  public void run();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=5, args_size=1
         0: invokestatic #62 // Method java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
         3: dup
         4: astore_2
         5: monitorenter
         6: invokestatic #61 // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
         9: ifnull 33
        12: invokestatic #61 // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
        15: astore_1
        16: aload_1
        17: invokestatic #64 // Method java/lang/ref/Reference.access$300:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        20: invokestatic #63 // Method java/lang/ref/Reference.access$202:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        23: pop
        24: aload_1
        25: aconst_null
        26: invokestatic #65 // Method java/lang/ref/Reference.access$302:(Ljava/lang/ref/Reference;Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        29: pop
        30: goto 52
        33: invokestatic #62 // Method java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
       *36: invokevirtual #59 // Method java/lang/Object.wait:()V*
        39: goto 43
        42: astore_3
        43: goto 47
        46: astore_3
        47: aload_2
        48: monitorexit
        49: goto 0
        52: aload_2
        53: monitorexit
        54: goto 64
        57: astore 4
        59: aload_2
        60: monitorexit
        61: aload 4
        63: athrow
        64: aload_1
        65: instanceof #38 // class sun/misc/Cleaner
        68: ifeq 81
        71: aload_1
        72: checkcast #38 // class sun/misc/Cleaner
        75: invokevirtual #67 // Method
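To read the listing above, it helps to have Java source with the same shape. The following is a hedged reconstruction for illustration, not the actual JDK source: it assumes that the handler at bci 42 catches OutOfMemoryError and the one at bci 46 catches InterruptedException (the exception table is not shown in the listing), which corresponds to a nested try around wait(). Ref, pending, discovered and handled are hypothetical stand-ins, step() performs one iteration of the loop instead of looping forever, and the cleaning/enqueuing from bci 64 onward is replaced by recording a name.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerLoopShape {
    // Hypothetical stand-in for a Reference with its 'discovered' link.
    static final class Ref {
        Ref discovered;
        final String name;
        Ref(String name) { this.name = name; }
    }

    static final Object lock = new Object();
    static Ref pending;                                // guarded by lock
    static final List<String> handled = new ArrayList<>();

    /** One loop iteration; returns false if it only waited (the 'continue' path). */
    static boolean step() {
        Ref r;
        synchronized (lock) {                          // roughly bci 0-5
            if (pending != null) {                     // roughly bci 6-9 (ifnull 33)
                r = pending;                           // roughly bci 12-15
                pending = r.discovered;                // roughly bci 16-23
                r.discovered = null;                   // roughly bci 24-29
            } else {
                try {
                    try {
                        lock.wait();                   // bci 36 in the listing
                    } catch (OutOfMemoryError x) { }   // assumed handler at bci 42
                } catch (InterruptedException x) { }   // assumed handler at bci 46
                return false;                          // bci 47-49: monitorexit, goto 0
            }
        }                                              // bci 52-53: monitorexit
        handled.add(r.name);                           // real code: clean() or enqueue
        return true;
    }
}
```

With this shape, an OOME raised while the VM allocates the InterruptedException inside wait() should be caught by the inner handler before the outer one is consulted, which is why the trace showing it propagate to run() at bci 36 looked suspicious.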
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Kal, Can you give Peter access to the machine where you ran this test? Please send the details to him privately. Thanks, Sandeep

On Jan 8, 2014, at 12:08 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote:
Hi Peter, the jtreg test configuration is @run main/othervm -Xmx24M -XX:-UseTLAB OOMEInReferenceHandler. With these options you still have to run the test many times (around 1000 runs) to capture one or more failures. The platform may not have an effect; however, I used a 64-bit Ubuntu 12.04 LTS, 8GB, 2-core workstation and any JDK (7/8).
---
Thanks
kalyan

On 01/08/2014 05:53 AM, Peter Levart wrote:
Hi Kalyan, What hardware/OS/JVM and what JVM options are you using to reproduce this failure? I would really like to reproduce this myself, but all attempts on my PC have so far been unsuccessful. I might be able to get access to a machine that is similar to yours... Regards, Peter

On 01/07/2014 09:55 PM, srikalyan chandrashekar wrote:
Peter, getting state info out (to the console or otherwise) from within the ReferenceHandler's exception handlers has been unsuccessful. However, David's suggestion produced some useful trace with a fastdebug build and we could get some information; see the log here http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log .
---
Thanks
kalyan

On 01/07/2014 12:42 AM, Peter Levart wrote:
On 01/07/2014 03:15 AM, srikalyan chandrashekar wrote:
Sure David, will give that a try. We have so far attempted to
1. Print state data (as per the test creator peter.levart's inputs),

Hi Kalyan, Have you been able to reproduce the OOME in that set-up? What was the result? Regards, Peter

2. Use a UEH (uncaught exception handler, per Mandy's inputs)
--
Thanks
kalyan

On 1/6/14 4:40 PM, David Holmes wrote:
Back from vacation ...
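Because a failure shows up in only a handful of roughly 1000 runs, a small driver that relaunches the test with the same VM options and counts non-zero exit codes can help when comparing builds or flags. A minimal sketch, assuming the compiled OOMEInReferenceHandler class is on a classpath passed as the first argument and that a non-zero exit status marks a failed run; jtreg itself is not involved, and the code needs Java 9+ for ProcessBuilder.Redirect.DISCARD and List.of. RepeatRunner is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class RepeatRunner {
    /** Path of the java launcher belonging to the JVM running this code. */
    static String javaLauncher() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    /** Runs the command 'runs' times and returns how many runs exited non-zero. */
    static int countFailures(List<String> command, int runs) throws IOException, InterruptedException {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // and drop it (Java 9+)
                    .start();
            if (p.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // Same VM options as the jtreg @run line; args[0] is the test classpath.
        List<String> cmd = List.of(javaLauncher(), "-Xmx24M", "-XX:-UseTLAB",
                "-cp", args[0], "OOMEInReferenceHandler");
        int runs = 1000;
        System.out.println(countFailures(cmd, runs) + " failures out of " + runs + " runs");
    }
}
```

Adding -Xint, or swapping in a fastdebug launcher with -XX:+TraceExceptions, only requires changing the command list.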
On 20/12/2013 4:49 PM, David Holmes wrote:
On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
Hi David, Thanks for your comments. The unguarded part (clean and enqueue) in the ReferenceHandler thread does not seem to create any new objects, so it is the application (the test in this case) which is adding objects to the heap and causing the ReferenceHandler to die with OOME.

The ReferenceHandler thread can only get OOME if it allocates (directly or indirectly), so there has to be something in the unguarded part that causes this. Again, it may be an implicit action in the VM, similar to the class load issue for InterruptedException. Run a debug VM with -XX:+TraceExceptions to see where the OOME is triggered.
David

David, I am still unsure about the side effects of the code change and agree with your thoughts (on the reliability of memory exhaustion tests). PS: hotspot dev alias removed from CC.
--
Thanks
kalyan

On 12/19/13 5:08 PM, David Holmes wrote:
Hi Kalyan, This is not a hotspot issue so I'm moving this to core-libs; please drop hotspot from any replies.

On 20/12/2013 6:26 AM, srikalyan wrote:
Hi all, I have been working on the bug JDK-8022321 https://bugs.openjdk.java.net/browse/JDK-8022321 ; this is a sporadic failure and the webrev is available here http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/

I'm really not sure what to make of this. We have a test that triggers an out-of-memory condition, but the OOME can actually turn up in the ReferenceHandler thread, causing it to terminate and the test to fail. We previously accounted for the non-obvious occurrences of OOME due to Object.wait and the possible need to load the InterruptedException class, but still the OOME can appear where we don't want it. So finally you have just placed the whole for(;;) loop in a try/catch(OOME) that ignores the OOME. I'm certain that makes the test happy, but I'm not sure it is really what we want for the ReferenceHandler thread.
If the OOME occurs while cleaning or enqueuing, then we will fail to clean and/or enqueue, but there would be no indication that this has occurred, and I think that is a bigger problem than this test failing. There may be no way to make this test 100% reliable. In fact, I'd suggest that no memory exhaustion test can be 100% reliable.
David

Root Cause: Still not known
2 places where there is a possibility for OOME:
1) Cleaner.clean()
2) ReferenceQueue.enqueue()

1) The cleanup code in turn has 2 places where there is potential for throwing OOME:
a) the thunk which is run from the clean() method. This Runnable is passed to Cleaner and appears in the following classes:
java/nio/DirectByteBuffer.java
sun/misc/Perf.java
sun/nio/fs/NativeBuffer.java
sun/nio/ch/IOVecWrapper.java
sun/misc/Cleaner/ExitOnThrow.java
However, none of the above overridden implementations ever create an object in the clean() code.
b) the new PrivilegedAction created in the try/catch Exception block of the clean() method, but
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Thanks Peter. Kalyan: Can you confirm, as Peter asked, that the TraceExceptions output came from a failed run? AFAICS the Trace info is printed after each bytecode where there is a pending exception, though I'm not 100% sure about the printing within the VM runtime. Based on that, I think we see the Trace output in run() at the point where wait() returns, so it may well be caught after that, in which case this was not a failing run. I also can't reproduce the problem :(
David

On 8/01/2014 10:34 PM, Peter Levart wrote:
On 01/08/2014 07:30 AM, David Holmes wrote:
On 8/01/2014 4:19 PM, David Holmes wrote:
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced some nice trace http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException. However, we already have a catch block around the wait(), so how is this OOME getting through?

A bug in exception handling in the interpreter??

Might be. And it may have something to do with the fact that the Thread.run() method is the first call frame on the thread's stack (seems like a corner case).
The last few meaningful TraceExceptions records are:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ca8} 'wait' '()V' in 'java/lang/Object' at *bci 2* for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b48d2250} 'run' '()V' in 'java/lang/ref/Reference$ReferenceHandler' at *bci 36* for thread 0x7f78c40d2800

Here's the relevant bytecode:

public class java.lang.Object
  public final void wait() throws java.lang.InterruptedException;
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: lconst_0
        *2: invokevirtual #73  // Method wait:(J)V*
         5: return
      LineNumberTable:
        line 502: 0
        line 503: 5
    Exceptions:
      throws java.lang.InterruptedException

class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
  public void run();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=5, args_size=1
         0: invokestatic  #62  // Method java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
         3: dup
         4: astore_2
         5: monitorenter
         6: invokestatic  #61  // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
         9: ifnull        33
        12: invokestatic  #61  // Method java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
        15: astore_1
        16: aload_1
        17: invokestatic  #64  // Method java/lang/ref/Reference.access$300:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        20: invokestatic  #63  // Method java/lang/ref/Reference.access$202:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        23: pop
        24: aload_1
        25: aconst_null
        26: invokestatic  #65  // Method java/lang/ref/Reference.access$302:(Ljava/lang/ref/Reference;Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference;
        29: pop
        30: goto          52
        33: invokestatic  #62  // Method java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
       *36: invokevirtual #59  // Method java/lang/Object.wait:()V*
        39: goto          43
        42: astore_3
        43: goto          47
        46: astore_3
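The two handlers at bci 42 and bci 46 are consistent with lock.wait() being wrapped in an inner try that catches OutOfMemoryError and an outer try that catches InterruptedException (an assumption about the JDK 8 source; the listing alone does not say which handler catches which type). If the interpreter dispatches to those handlers correctly, an OOME thrown by wait() should never leave run(), which is why the trace record at bci 36 looked suspicious. The sketch below is a hypothetical, self-contained stand-in for that loop shape: HandlerLoopSketch, post() and run(int) are invented names, and a plain Deque replaces the VM-managed pending list. It is not the JDK source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the ReferenceHandler loop; names are illustrative only.
class HandlerLoopSketch {
    private final Object lock = new Object();
    private final Deque<Runnable> pending = new ArrayDeque<>();
    final AtomicInteger processed = new AtomicInteger();

    // Producer side: stands in for the VM handing over pending references.
    void post(Runnable r) {
        synchronized (lock) {
            pending.add(r);
            lock.notifyAll();
        }
    }

    // Handler side: processes 'limit' items, then returns (the real thread never returns).
    void run(int limit) {
        while (processed.get() < limit) {
            Runnable r;
            synchronized (lock) {
                r = pending.poll();
                if (r == null) {
                    // Inner handler: wait() may throw OOME if the VM cannot allocate
                    // the InterruptedException. Outer handler: a genuine interrupt.
                    // Either way, loop around and re-check the queue.
                    try {
                        try {
                            lock.wait();
                        } catch (OutOfMemoryError x) { }
                    } catch (InterruptedException x) { }
                    continue;
                }
            }
            r.run();                     // stands in for clean() / enqueue()
            processed.incrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HandlerLoopSketch h = new HandlerLoopSketch();
        Thread handler = new Thread(() -> h.run(3), "handler-sketch");
        handler.start();
        for (int i = 0; i < 3; i++) {
            h.post(() -> { });
        }
        handler.join();
        System.out.println("processed=" + h.processed.get()); // prints: processed=3
    }
}
```

Because the wait and the queue check happen under the same monitor as post(), a notification cannot be lost; spurious wakeups and swallowed errors simply send the loop back to re-check the queue.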
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 01/07/2014 03:15 AM, srikalyan chandrashekar wrote:
Sure David, will give that a try. We have so far attempted to:
1. Print state data (as per the inputs of the test's creator, peter.levart),

Hi Kalyan,

Have you been able to reproduce the OOME in that set-up? What was the result?

Regards, Peter

2. Use a UEH (uncaught exception handler, per Mandy's inputs).
--
Thanks
kalyan

On 1/6/14 4:40 PM, David Holmes wrote:
Back from vacation ...

On 20/12/2013 4:49 PM, David Holmes wrote:
On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
Hi David, thanks for your comments. The unguarded part (clean and enqueue) in the Reference Handler thread does not seem to create any new objects, so it is the application (the test in this case) that is adding objects to the heap and causing the Reference Handler to die with OOME.

The ReferenceHandler thread can only get OOME if it allocates (directly or indirectly), so there has to be something in the unguarded part that causes this. Again, it may be an implicit action in the VM, similar to the class load issue for InterruptedException.

Run a debug VM with -XX:+TraceExceptions to see where the OOME is triggered.

David

David

I am still unsure about the side effects of the code change and agree with your thoughts (on the reliability of memory exhaustion tests).

PS: hotspot dev alias removed from CC.
--
Thanks
kalyan

On 12/19/13 5:08 PM, David Holmes wrote:
Hi Kalyan,

This is not a hotspot issue so I'm moving this to core-libs; please drop hotspot from any replies.

On 20/12/2013 6:26 AM, srikalyan wrote:
Hi all, I have been working on the bug JDK-8022321 https://bugs.openjdk.java.net/browse/JDK-8022321 ; this is a sporadic failure and the webrev is available here: http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/

I'm really not sure what to make of this. We have a test that triggers an out-of-memory condition, but the OOME can actually turn up in the ReferenceHandler thread, causing it to terminate and the test to fail. We previously accounted for the non-obvious occurrences of OOME due to Object.wait and the possible need to load the InterruptedException class, but still the OOME can appear where we don't want it. So finally you have just placed the whole for(;;) loop in a try/catch(OOME) that ignores the OOME. I'm certain that makes the test happy, but I'm not sure it is really what we want for the ReferenceHandler thread. If the OOME occurs while cleaning or enqueuing, then we will fail to clean and/or enqueue, but there would be no indication that this has occurred, and I think that is a bigger problem than this test failing.

There may be no way to make this test 100% reliable. In fact, I'd suggest that no memory exhaustion test can be 100% reliable.

David

Root Cause: still not known.
There are 2 places where an OOME could possibly occur:
1) Cleaner.clean()
2) ReferenceQueue.enqueue()

1) The cleanup code in turn has 2 places with the potential to throw OOME:
a) The thunk (a Runnable), which is run from the clean() method. This Runnable is passed to Cleaner and appears in the following classes: java/nio/DirectByteBuffer.java, sun/misc/Perf.java, sun/nio/fs/NativeBuffer.java, sun/nio/ch/IOVecWrapper.java, sun/misc/Cleaner/ExitOnThrow.java. However, none of these implementations ever creates an object in the clean() code.
b) The new PrivilegedAction created in the try/catch block of the clean() method; but for this object to be created, and so be responsible for an OOME, an exception (other than OOME) has to be thrown first.
2) No new heap objects are created in the enqueue method, nor anywhere deeper in its call stack (VM.addFinalRefCount() etc.), so this cannot be a potential cause.

Experimental change to java.lang.ref.Reference:
- Put one more guard (a try/catch block for OOME) in the Reference Handler thread, which may give the Reference Handler a chance to clean up. This fixes the test failure (several thousand runs with 0 failures).
- Without the above change, the test fails at least 3-5 times in every 1000 runs.

PS: The code change is to a very critical part of the JDK and I am not fully aware of its consequences, hence I am seeking expert help here. I appreciate your time and inputs on this.
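As a minimal sketch of the trade-off discussed above, guarding each iteration with catch (OutOfMemoryError) keeps the loop alive but drops that iteration's work, and unless something records the drop, nobody ever learns that a clean or enqueue was skipped. GuardedLoopSketch, processAll, completed and swallowed are hypothetical names, and plain Runnables stand in for clean()/enqueue(); this is not the code in the webrev.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of guarding each iteration against OOME; not the webrev's code.
class GuardedLoopSketch {
    int completed;
    int swallowed;   // iterations whose work was lost because of an OOME

    void processAll(List<Runnable> tasks) {
        for (Runnable task : tasks) {
            try {
                task.run();              // stands in for clean() / enqueue()
                completed++;
            } catch (OutOfMemoryError oome) {
                // The thread survives, but this task's work is lost. Without a
                // record such as this counter, the failure would be invisible.
                swallowed++;
            }
        }
    }

    public static void main(String[] args) {
        GuardedLoopSketch loop = new GuardedLoopSketch();
        loop.processAll(Arrays.<Runnable>asList(
                () -> { },
                () -> { throw new OutOfMemoryError("simulated"); },
                () -> { }));
        // prints: completed=2 swallowed=1
        System.out.println("completed=" + loop.completed + " swallowed=" + loop.swallowed);
    }
}
```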
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Peter, attempts to get state information out (to the console or otherwise) from within the Reference Handler's exception handlers have been unsuccessful. However, David's suggestion produced a useful trace with a fastdebug build; see the log here: http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log .
---
Thanks
kalyan

On 01/07/2014 12:42 AM, Peter Levart wrote:
Hi Kalyan,

Have you been able to reproduce the OOME in that set-up? What was the result?

Regards, Peter
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David,

TraceExceptions with a fastdebug build produced a nice trace: http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157:

--- Excerpt Begins ---
147  if (!gc_overhead_limit_was_exceeded) {
148    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
149    report_java_out_of_memory("Java heap space");
150
151    if (JvmtiExport::should_post_resource_exhausted()) {
152      JvmtiExport::post_resource_exhausted(
153        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
154        "Java heap space");
155    }
156
157    THROW_OOP_0(Universe::out_of_memory_error_java_heap());
158  } else {
--- Excerpt Ends ---

It would be helpful if David or someone else on the team could explain the latent aspects / probable cause.
---
Thanks
kalyan

On 01/06/2014 04:40 PM, David Holmes wrote:
Back from vacation ...
[...]
Run a debug VM with -XX:+TraceExceptions to see where the OOME is triggered.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced a nice trace: http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157.

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException.

David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 8/01/2014 4:19 PM, David Holmes wrote:
On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
Hi David, TraceExceptions with a fastdebug build produced a nice trace: http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log . The native method wait(long) is where the OOME is being thrown; the deepest call is in src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157.

Yes, but it is the caller that is of interest:

Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, line 1649] for thread 0x7f78c40d2800
Exception a 'java/lang/OutOfMemoryError' (0xd6a01840) thrown in interpreter method {method} {0x7f78b4800ae0} 'wait' '(J)V' in 'java/lang/Object' at bci 0 for thread 0x7f78c40d2800

The ReferenceHandler thread gets the OOME trying to allocate the InterruptedException.

However, we already have a catch block around the wait(), so how is this OOME getting through? A bug in exception handling in the interpreter??

David
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Back from vacation ...

On 20/12/2013 4:49 PM, David Holmes wrote:
On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
Hi David, thanks for your comments. The unguarded part (clean and enqueue) in the Reference Handler thread does not seem to create any new objects, so it is the application (the test in this case) that is adding objects to the heap and causing the Reference Handler to die with OOME.

The ReferenceHandler thread can only get OOME if it allocates (directly or indirectly), so there has to be something in the unguarded part that causes this. Again, it may be an implicit action in the VM, similar to the class load issue for InterruptedException.

Run a debug VM with -XX:+TraceExceptions to see where the OOME is triggered.

David
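The "class load issue for InterruptedException" refers to the test pre-initializing that class so that a later allocation of one inside Object.wait() does not also need class loading under heap exhaustion. A minimal sketch of that kind of pre-initialization, assuming Class.forName is an acceptable way to force loading and initialization (PreinitSketch and preinitialize are hypothetical names; the real test does its own pre-initialization), is below. It removes only the class-loading step: allocating the exception object itself can still fail, which is what the objectMonitor.cpp trace elsewhere in this thread shows.

```java
// Hypothetical helper; not the code used by OOMEInReferenceHandler.java.
class PreinitSketch {
    // Load and initialize InterruptedException while memory is still available,
    // so that a later 'new InterruptedException()' only has to allocate the object.
    static Class<?> preinitialize() throws ClassNotFoundException {
        return Class.forName("java.lang.InterruptedException");
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> c = preinitialize();
        System.out.println("preinitialized " + c.getName()); // prints: preinitialized java.lang.InterruptedException
    }
}
```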
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Sure David, will give that a try. We have so far attempted to:
1. Print state data (as per the inputs of the test's creator, peter.levart),
2. Use a UEH (uncaught exception handler, per Mandy's inputs).
--
Thanks
kalyan

On 1/6/14 4:40 PM, David Holmes wrote:
Back from vacation ...
[...]
Run a debug VM with -XX:+TraceExceptions to see where the OOME is triggered.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Mandy, after some trials I could reproduce the failure again (now with the UEH in place). However, the UEH cannot print enough details, because it also tries to allocate memory: Thread.getName() internally creates a String object, and printStackTrace() creates a new WrappedPrintStream object. See the following trace:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Reference Handler"
ERROR: java.lang.Exception: Reference Handler thread died.
    at OOMEInReferenceHandler.main(OOMEInReferenceHandler.java:105)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
    at java.lang.Thread.run(Thread.java:744)

Meanwhile, I am looking for a way to print something useful without allocating any new memory.
---
Thanks
kalyan

On 12/20/2013 01:00 PM, srikalyan wrote:
> Hi Mandy, yes, I ran it with jtreg to reproduce the failure. I will try the UEH patch to see if it sheds some light and get back to you.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 12/23/2013 2:02 PM, srikalyan chandrashekar wrote:
> Hi Mandy, after some trials I could reproduce the failure again (now with the UEH in place). However, the UEH cannot print enough details, because it also tries to allocate memory: Thread.getName() internally creates a String object, and printStackTrace() creates a new WrappedPrintStream object.

That's what I later realized it might run into, after suggesting the UEH: no object can be allocated at this point. It's worth trying Peter's suggestion of prepending a modified, instrumented version of the Reference class to the boot class path and seeing what you get.

Mandy
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 12/21/2013 8:50 AM, Peter Levart wrote:
> Is it possible to get the test output when it fails? It can fail in two different ways. I can't look at the bug (not authorized)...

You should be able to look at it now. There isn't any other information besides the OOME error:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Reference Handler"
java.lang.Exception: Reference Handler thread died.
    at OOMEInReferenceHandler.main(OOMEInReferenceHandler.java:105)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:491)
    at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
    at java.lang.Thread.run(Thread.java:724)

Mandy
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David,

Is it possible to get the test output when it fails? It can fail in two different ways. I can't look at the bug (not authorized)...

On 12/20/2013 10:54 AM, Chris Hegarty wrote:
> On 20 Dec 2013, at 04:33, Mandy Chung mandy.ch...@oracle.com wrote:
>> Hi Srikalyan, Maybe you can add an uncaught exception handler to see if you can get any information.
> +1. With this, at least the next time we see this failure we should have a better idea where the OOM is coming from.
> -Chris.

We can try, but I think the VM already prints the stack trace of the exception by default. As far as I remember, an OOME thrown by the VM is preallocated and does not contain a stack trace, so I suspect we'll see nothing more with the suggested UEH.

Is it possible to include in the test a modified version of the Reference class that would be prepended to the boot class path? For example, one containing the following ReferenceHandler:

    private static class ReferenceHandler extends Thread {

        ReferenceHandler(ThreadGroup g, String name) {
            super(g, name);
        }

        private volatile int state;

        @Override
        public String toString() {
            return super.toString() + "[state=" + state + "]";
        }

        public void run() {
            for (;;) {
                state = 1;
                Reference<Object> r;
                state = 2;
                synchronized (lock) {
                    state = 3;
                    if (pending != null) {
                        state = 4;
                        r = pending;
                        state = 5;
                        pending = r.discovered;
                        state = 6;
                        r.discovered = null;
                        state = 7;
                    } else {
                        state = 8;
                        // The waiting on the lock may cause an OOME because it may try to allocate
                        // exception objects, so also catch OOME here to avoid silent exit of the
                        // reference handler thread.
                        //
                        // Explicitly define the order of the two exceptions we catch here
                        // when waiting for the lock.
                        //
                        // We do not want to try to potentially load the InterruptedException class
                        // (which would be done if this was its first use, and InterruptedException
                        // were checked first) in this situation.
                        //
                        // This may lead to the VM not ever trying to load the InterruptedException
                        // class again.
                        try {
                            state = 9;
                            try {
                                state = 10;
                                lock.wait();
                                state = 11;
                            } catch (OutOfMemoryError x) {
                                state = 12;
                            }
                            state = 13;
                        } catch (InterruptedException x) {
                            state = 14;
                        }
                        state = 15;
                        continue;
                    }
                    state = 16;
                }
                state = 17;
                // Fast path for cleaners
                if (r instanceof Cleaner) {
                    state = 18;
                    ((Cleaner)r).clean();
                    state = 19;
                    continue;
                }
                state = 20;
                ReferenceQueue<Object> q = (ReferenceQueue) r.queue;
                state = 21;
                if (q != ReferenceQueue.NULL) q.enqueue(r);
                state = 22;
            }
        }
    }

...then just include the toString() of the referenceHandlerThread instance as part of the exception message at the end of the test:

        ...
        // wait at most 10 seconds for success or failure
        for (int i = 0; i < 20; i++) {
            if (refQueue.poll() != null) {
                // Reference Handler thread still working - success
                return;
            }
            System.gc();
            Thread.sleep(500L); // wait a little to allow GC to do its work before allocating objects
            if (!referenceHandlerThread.isAlive()) {
                // Reference Handler thread died - failure
                throw new Exception("Reference Handler thread died. referenceHandlerThread: " +
                                    referenceHandlerThread);
            }
        }

        // no sure answer after 10 seconds
        throw new IllegalStateException("Reference Handler thread stuck. weakRef.get(): " + weakRef.get() +
                                        ", referenceHandlerThread: " + referenceHandlerThread);
    }

This might be safer than using a UEH, since at the time UEH.uncaughtException() is called the heap might still be full, which would prevent printing the message.
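The state-marker idea does not depend on Reference internals. Below is a minimal, self-contained sketch that assumes a worker thread we control. InstrumentedWorker, runToCompletion and the simulated failure are hypothetical names used only for illustration. Each step is recorded with a plain volatile int write, which allocates nothing, and toString() appends the marker so that whoever observes the dead thread can see where it stopped.

```java
/**
 * Hypothetical sketch of the state-marker technique: the worker records the
 * step it reached in a volatile int (a field write, no allocation), and
 * toString() appends that marker so a dead thread still reports where it stopped.
 */
final class InstrumentedWorker extends Thread {
    private volatile int state;
    private final int failAtStep; // demo only: the step at which to throw

    InstrumentedWorker(String name, int failAtStep) {
        super(name);
        this.failAtStep = failAtStep;
        // Demo only: suppress the default stack-trace printing when the worker dies.
        setUncaughtExceptionHandler((t, e) -> { });
    }

    @Override
    public void run() {
        for (int step = 1; step <= 5; step++) {
            state = step; // record progress before doing the step's work
            if (step == failAtStep) {
                throw new IllegalStateException("simulated failure");
            }
        }
        state = 100; // finished normally
    }

    int lastState() {
        return state;
    }

    @Override
    public String toString() {
        return super.toString() + "[state=" + state + "]";
    }

    /** Demo helper: starts a worker and waits for it to finish or die. */
    static InstrumentedWorker runToCompletion(String name, int failAtStep) {
        InstrumentedWorker w = new InstrumentedWorker(name, failAtStep);
        w.start();
        try {
            w.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return w;
    }
}
```

Building the report string does allocate, so it belongs on the observing thread, as in the test snippet above, and not on the thread that died.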
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 20 Dec 2013, at 04:33, Mandy Chung mandy.ch...@oracle.com wrote:
> Hi Srikalyan, Maybe you can add an uncaught exception handler to see if you can get any information.

+1. With this, at least the next time we see this failure we should have a better idea where the OOM is coming from.

-Chris.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Mandy, yes, I ran it with jtreg to reproduce the failure. I will try the UEH patch to see if it sheds some light and get back to you. Thanks for the direction :)
--
Thanks
kalyan
Ph: (408)-585-8040

On 12/19/13, 8:33 PM, Mandy Chung wrote:
> ... I ran it 1000 times but was not able to duplicate the failure. Did you run it with jtreg (I didn't)?
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Kalyan,

This is not a hotspot issue, so I'm moving this to core-libs. Please drop hotspot from any replies.

On 20/12/2013 6:26 AM, srikalyan wrote:
> Hi all, I have been working on the bug JDK-8022321 https://bugs.openjdk.java.net/browse/JDK-8022321 . This is a sporadic failure, and the webrev is available here: http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/

I'm really not sure what to make of this. We have a test that triggers an out-of-memory condition, but the OOME can actually turn up in the ReferenceHandler thread, causing it to terminate and the test to fail. We previously accounted for the non-obvious occurrences of OOME due to Object.wait and the possible need to load the InterruptedException class, but the OOME can still appear where we don't want it. So finally you have just placed the whole for(;;) loop in a try/catch(OOME) that ignores the OOME. I'm certain that makes the test happy, but I'm not sure it is really what we want for the ReferenceHandler thread. If the OOME occurs while cleaning or enqueuing, then we will fail to clean and/or enqueue, but there would be no indication that this has occurred, and I think that is a bigger problem than this test failing.

There may be no way to make this test 100% reliable. In fact I'd suggest that no memory exhaustion test can be 100% reliable.

David

> *Root Cause: Still not known*
> There are 2 places where an OOME could be thrown:
> 1) Cleaner.clean()
> 2) ReferenceQueue.enqueue()
>
> 1) The cleanup code in turn has 2 places with the potential to throw an OOME:
> a) The thunk, which is run from the clean() method. This Runnable is passed to the Cleaner and appears in the following classes:
>    java/nio/DirectByteBuffer.java
>    sun/misc/Perf.java
>    sun/nio/fs/NativeBuffer.java
>    sun/nio/ch/IOVecWrapper.java
>    sun/misc/Cleaner/ExitOnThrow.java
>    However, none of the above implementations ever creates an object in the cleanup code.
> b) A new PrivilegedAction is created in the catch block of the clean() method, but for this object to be created (and so be responsible for the OOME), an exception other than OOME must first be thrown.
> 2) No new heap objects are created in the enqueue method, nor anywhere deeper in its call stack (VM.addFinalRefCount() etc.), so this cannot be a potential cause.
>
> *Experimental change to java.lang.ref.Reference*:
> - Put one more guard (a try/catch block for OOME) in the Reference Handler thread, which may give the Reference Handler a chance to clean up. This fixes the test failure (several thousand runs with 0 failures).
> - Without the above change, the test fails at least 3-5 times in every 1000 runs.
>
> *PS*: The code change is to a very critical part of the JDK and I am not fully aware of its consequences, hence I am seeking expert help here. I appreciate your time and input on this.
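David's objection to the experimental guard is that a swallowed OOME leaves no trace. One possible middle ground is sketched below. It assumes a generic work loop rather than the real ReferenceHandler, and the name GuardedProcessor is hypothetical. The loop catches the OOME so that it survives, but it counts each dropped item in a counter allocated up front, so the loss can be observed. Whether anything should catch OOME in the real handler is exactly the open question in this thread.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: a loop that survives an OutOfMemoryError thrown by one
 * work item, but records the failure in a counter allocated before any memory
 * pressure, so the dropped work is not completely silent.
 */
final class GuardedProcessor {
    // Allocated up front; incrementAndGet() itself does not allocate.
    private final AtomicLong droppedItems = new AtomicLong();

    void processAll(List<Runnable> items) {
        // Indexed loop, so no Iterator is allocated while memory may be short.
        for (int i = 0; i < items.size(); i++) {
            try {
                items.get(i).run();
            } catch (OutOfMemoryError oome) {
                // Do not build a message here: that would allocate again.
                droppedItems.incrementAndGet();
            }
        }
    }

    long droppedCount() {
        return droppedItems.get();
    }
}
```

The counter only makes the loss visible. Some other component, such as a test or a monitoring hook, still has to read droppedCount(), and the dropped work itself is not recovered.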
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi David,

Thanks for your comments. The unguarded part (clean and enqueue) in the Reference Handler thread does not seem to create any new objects, so it is the application (the test in this case) that is adding objects to the heap and causing the Reference Handler to die with an OOME. I am still unsure about the side effects of the code change, and I agree with your thoughts on the reliability of memory exhaustion tests.

PS: hotspot dev alias removed from CC.
--
Thanks
kalyan

On 12/19/13 5:08 PM, David Holmes wrote:
> ... There may be no way to make this test 100% reliable. In fact I'd suggest that no memory exhaustion test can be 100% reliable.
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Hi Srikalyan, Maybe you can get add an uncaught handler to see if you can get any information. I ran it for 1000 times but not able to duplicate the failure. Did you run it with jtreg (I didn't)? Below is the patch to install a thread's uncaught handler that you can take and try. diff --git a/test/java/lang/ref/OOMEInReferenceHandler.java b/test/java/lang/ref/OOMEInReferenceHand ler.java --- a/test/java/lang/ref/OOMEInReferenceHandler.java +++ b/test/java/lang/ref/OOMEInReferenceHandler.java @@ -51,6 +51,14 @@ return first; } + static class UEH implements Thread.UncaughtExceptionHandler { + public void uncaughtException(Thread t, Throwable e) { + System.err.println(ERROR: + t.getName() + exception + + e.getMessage()); + e.printStackTrace(); + } + } + public static void main(String[] args) throws Exception { // preinitialize the InterruptedException class so that the reference handler // does not die due to OOME when loading the class if it is the first use @@ -77,6 +85,8 @@ throw new IllegalStateException(Couldn't find Reference Handler thread.); } + referenceHandlerThread.setUncaughtExceptionHandler(new UEH()); + ReferenceQueueObject refQueue = new ReferenceQueue(); Object referent = new Object(); WeakReferenceObject weakRef = new WeakReference(referent, refQueue); On 12/19/2013 6:57 PM, srikalyan chandrashekar wrote: Hi David Thanks for your comments, the unguarded part(clean and enqueue) in the Reference Handler thread does not seem to create any new objects, so it is the application(the test in this case) which is adding objects to heap and causing the Reference Handler to die with OOME. I am still unsure about the side effects of the code change and agree with your thoughts(on memory exhaustion test's reliability). PS: hotspot dev alias removed from CC. -- Thanks kalyan On 12/19/13 5:08 PM, David Holmes wrote: Hi Kalyan, This is not a hotspot issue so I'm moving this to core-libs, please drop hotspot from any replies. 
On 20/12/2013 6:26 AM, srikalyan wrote: Hi all, I have been working on the bug JDK-8022321 https://bugs.openjdk.java.net/browse/JDK-8022321 , this is a sporadic failure and the webrev is available here http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/ I'm really not sure what to make of this. We have a test that triggers an out-of-memory condition but the OOME can actually turn up in the ReferenceHandler thread causing it to terminate and the test to fail. We previously accounted for the non-obvious occurrences of OOME due to the Object.wait and the possible need to load the InterruptedException class - but still the OOME can appear where we don't want it. So finally you have just placed the whole for(;;) loop in a try/catch(OOME) that ignores the OOME. I'm certain that makes the test happy, but I'm not sure it is really what we want for the ReferenceHandler thread. If the OOME occurs while cleaning, or enqueuing then we will fail to clean and/or enqueue but there would be no indication that has occurred and I think that is a bigger problem than this test failing. There may be no way to make this test 100% reliable. In fact I'd suggest that no memory exhaustion test can be 100% reliable. David * **Root Cause:Still not known* 2 places where there is a possibility for OOME 1) Cleaner.clean() 2) ReferenceQueue.enqueue() 1) The cleanup code in turn has 2 places where there is potential for throwing OOME, a) thunk Thread which is run from clean() method. This Runnable is passed to Cleaner and appears in the following classes java/nio/DirectByteBuffer.java sun/misc/Perf.java sun/nio/fs/NativeBuffer.java sun/nio/ch/IOVecWrapper.java sun/misc/Cleaner/ExitOnThrow.java However none of the above overridden implementations ever create an object in the clean() code. 
b) The new PrivilegedAction created in the catch block of the clean() method. But for this object to be created, and so to be held responsible for the OOME, an exception (other than OOME) has to be thrown first.
2) No new heap objects are created in the enqueue method, nor anywhere deeper in its call stack (VM.addFinalRefCount(), etc.), so this cannot be a potential cause.

*Experimental change to java/lang/ref/Reference.java*:
- Put one more guard (a try/catch for OOME) in the Reference Handler thread, which may give the Reference Handler a chance to clean up. This fixes the test failure (several sets of 1000 runs with 0 failures).
- Without the above change, the test fails at least 3-5 times in every 1000 runs.

*PS*: The code change is to a very critical part of the JDK and I am not fully aware of its consequences, hence I am seeking expert help here. I appreciate your time and input on this.
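The trade-off David raises about the experimental guard can be modeled in isolation. The following is a hypothetical sketch, not the code in java/lang/ref/Reference.java: GuardedHandlerLoop, drain() and the processed/dropped counters are illustrative names, and the "pending list" is just a Deque of strings. The loop survives an OutOfMemoryError thrown while handling one item, but that item's work is lost; only the explicit counter makes the loss visible.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical model of a handler loop guarded by catch (OutOfMemoryError).
// It is NOT the Reference Handler's implementation.
public class GuardedHandlerLoop {

    int processed = 0;   // items whose action completed normally
    int dropped = 0;     // items whose action failed with an OOME

    // Drains the pending list, running the action (standing in for
    // clean/enqueue) on each item and swallowing OOME per item.
    void drain(Deque<String> pending, Consumer<String> action) {
        while (!pending.isEmpty()) {
            String item = pending.poll();
            try {
                action.accept(item);
                processed++;
            } catch (OutOfMemoryError oome) {
                // The loop keeps running, but this item was never handled.
                // Without this counter the failure would leave no trace.
                dropped++;
            }
        }
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add("a");
        pending.add("b");
        pending.add("c");
        GuardedHandlerLoop loop = new GuardedHandlerLoop();
        loop.drain(pending, item -> {
            if (item.equals("b")) {
                // Simulated failure, thrown explicitly for the demo.
                throw new OutOfMemoryError("simulated while handling " + item);
            }
        });
        System.out.println("processed=" + loop.processed + " dropped=" + loop.dropped);
        // prints "processed=2 dropped=1"
    }
}
```

Note that reporting the loss (here, incrementing a counter) must not itself allocate, or it could fail under the same memory pressure.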
Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
Hi David, thanks for your comments. The unguarded part (clean and enqueue) in the Reference Handler thread does not seem to create any new objects, so it is the application (the test, in this case) that is adding objects to the heap and causing the Reference Handler to die with OOME.

The ReferenceHandler thread can only get OOME if it allocates (directly or indirectly), so there has to be something in the unguarded part that causes this. Again, it may be an implicit action in the VM, similar to the class-loading issue for InterruptedException.

David
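One way to check the claim that a code path does or does not allocate is to measure per-thread heap allocation around it. This sketch assumes a HotSpot-based JDK, where ManagementFactory.getThreadMXBean() returns an instance of the com.sun.management.ThreadMXBean extension providing getThreadAllocatedBytes(long); AllocationProbe, bytesAllocatedBy and sink are hypothetical names. It only counts Java heap bytes attributed to the thread, so implicit VM work (such as class loading) may not be fully reflected.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe: how many heap bytes does the current thread allocate
// while running an action? Requires the HotSpot ThreadMXBean extension.
public class AllocationProbe {

    // Keeps allocations reachable so they are not optimized away.
    static volatile Object sink;

    // Returns bytes allocated by the current thread during action.run(),
    // or -1 if per-thread allocation accounting is unavailable.
    static long bytesAllocatedBy(Runnable action) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        if (!(std instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean mx = (com.sun.management.ThreadMXBean) std;
        if (!mx.isThreadAllocatedMemorySupported()) {
            return -1;
        }
        if (!mx.isThreadAllocatedMemoryEnabled()) {
            mx.setThreadAllocatedMemoryEnabled(true);
        }
        long id = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(id);
        action.run();
        long after = mx.getThreadAllocatedBytes(id);
        return after - before;
    }

    public static void main(String[] args) {
        long empty = bytesAllocatedBy(() -> { });
        long oneMiB = bytesAllocatedBy(() -> sink = new byte[1 << 20]);
        // Exact numbers vary by JDK; the empty action may still report a
        // small non-zero value from the measurement calls themselves.
        System.out.println("empty action:     " + empty + " bytes");
        System.out.println("1 MiB byte array: " + oneMiB + " bytes");
    }
}
```

Because getThreadAllocatedBytes takes a thread id, the same call could in principle be pointed at the Reference Handler thread's id from another thread, sampling before and after a GC, to see whether that thread allocates at all.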