Hi Roman, I'm okay with fix.
Thanks, Serguei On 3/26/20 17:15, serguei.spit...@oracle.com wrote:
Hi Roman, Yes. Thank you for the explanation. Thanks, Serguei On 3/26/20 01:44, Roman Kennke wrote:That was in the previous implementation: I got a condition wrong in the table lookup (as noted by Serguei), and this prevented any class-unload-events from getting out. I have fixed this, but found other problems in that implementation (deadlocks and a crash). The current implementation has none of these problems: we don't need table-lookups - we simply pass-through the signatures, and locking is much simpler and in particular we don't need a lock around the JVMTI call (SetTag) which was the cause of the deadlock. Does that answer your questions? Thanks, RomanHi Roman,It passed all my testing. I think before you push Serguei has a questionregarding an issue you brought up a while back. You mentioned that you weren't getting some events, and suddenly started seeing them. We were discussing it today and it was unclear if this was an issue you were seeing before your changes, and your changes resolved it, or it was initially caused by an earlier version of your changes, and you later fixed it. We just want to better understand what this issue was and how it was fixed. thanks, Chris On 3/25/20 3:22 PM, Roman Kennke wrote:The new job finished, its ID is: mach5-one-rkennke-JDK-8227269-2-20200325-2027-9716289 Thank you, RomanYes, please submit a new job. I'll start my testing once I see that thebuilds are done. Chris On 3/25/20 12:59 PM, Roman Kennke wrote:Hi Chris, Apparently we can get into classTrack_reset() before calling activate(),and we're seeing a null deletedSignatureBag. A simple NULL-check aroundthe cleaning routine fixes the problem for me. http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.08/ Should I post another submit-repo job with that fix? Thanks, RomanHi Roman, com/sun/jdi/JdwpAllowTest.java crashed on many runs: Stack: [0x00007fbb790f9000,0x00007fbb791fa000], sp=0x00007fbb791f8af0, free space=1022k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libjdwp.so+0xdb71] bagEnumerateOver+0x11 C [libjdwp.so+0xe365] classTrack_reset+0x25 C [libjdwp.so+0xfca1] debugInit_reset+0x71 C [libjdwp.so+0x12e0d] debugLoop_run+0x38d C [libjdwp.so+0x25700] acceptThread+0x80V [libjvm.so+0xf4b5a7] JvmtiAgentThread::call_start_function()+0x1c7V [libjvm.so+0x15215c6] JavaThread::thread_main_inner()+0x226 V [libjvm.so+0x1527736] Thread::call_run()+0xf6 V [libjvm.so+0x1250ade] thread_native_entry(Thread*)+0x10eThis happened during a test task run of open/test/jdk/:jdk_jdi. Theredoesn't seem to be anything magic on the command line that might be triggering. Pretty much I see it with all the various VM configs we test. I'm also seeing crashes in the following tests, but not as often:serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java vmTestbase/nsk/jdwp/VirtualMachine/Version/version002/TestDescription.javavmTestbase/nsk/jdwp/VirtualMachine/ReleaseEvents/releaseevents002/TestDescription.javavmTestbase/nsk/jdwp/VirtualMachine/HoldEvents/holdevents002/TestDescription.javavmTestbase/nsk/jdwp/VirtualMachine/Dispose/dispose001/TestDescription.javathanks, Chris On 3/25/20 11:37 AM, Roman Kennke wrote:Hi Chris,Regarding the new assert: 105 if (gdata && gdata->assertOn) { 106 // Check this is not already tagged. 107 jlong tag; 108 error = JVMTI_FUNC_PTR(trackingEnv, GetTag)(env, klass, &tag); 109 if (error != JVMTI_ERROR_NONE) {110 EXIT_ERROR(error, "Unable to GetTag with classtrackingEnv"); 111 } 112 JDI_ASSERT(tag == NOT_TAGGED); 113 } I think you should remove the gdata check. gdata should never be NULL when you get to this code. If it is ever NULL then there's a bug, and the check will hide the bug.Ok, will remove this.Regarding testing, after you do the submit repo testing let me knowthe jobID and I'll do additional testing on it.I did the submit repo earlier today, and it came back green: mach5-one-rkennke-JDK-8227269-2-20200325-1421-9706762 Thanks, Romanthanks, Chris On 3/25/20 6:00 AM, Roman Kennke wrote:Hi Sergei,The fix looks pretty clean now. I also like new name of the lock.:)Thank you!Just one comment below.http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.06/src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c.frames.html110 if (tag != 0l) { 111 return; // Already added 112 } It is better to use a named constant or macro instead.Also, it'd be nice to add a short comment about this value is.As I replied to Chris earlier, this whole block can be turned into an assert. I also made a constant for the value 0, which should be pretty much self-explaining. http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.07/How do you test the fix?I am using a manual test that is provided in this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1751985 "Script to compare performance of GC with and without debugger, when many classes are loaded and classes are being unloaded": https://bugzilla.redhat.com/attachment.cgi?id=1640688 I am also using this test and manually attach/detach jdb a couple oftimes in a row to check that disconnecting and reconnecting workswell (this tended to deadlock or crash with an earlier version of the patch, and is now looking good).I am also running tier1 and tier2 tests locally, and as soon as weall agree that the fix is reasonable, I will push it to the submit repo. I am not sure if any of those tests actually exercise that code, though. Let me know if you want me to run any specific tests. Thank you, RomanThanks, Serguei On 3/20/20 08:30, Roman Kennke wrote:I believe I came up with a much simpler solution that also solves the problems of the existing one, and the ones I proposed earlier. It turns out that we can take advantage of the fact that we can use *anything* as tags in JVMTI, even pointers to stuff (this is explicitely mentioned in the JVMTI spec). This means we can simply stick a pointerto the signature of a class into the tag, and pull it out againwhen we get notified that the class gets unloaded.This means we don't need an extra data-structure to keep track ofclasses and signatures, and it also makes the story around locking*much* simpler. Performance-wise this is O(1), i.e. no scanningof all classes needed (as in the current implementation) and no searching of table needed (like in my previous attempts). Please review this new revision: http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.06/(Notice that there still appears to be a performance bottleneckwith class-unloading when an actual debugger is attached. This doesn't seem to be related to the classTrack.c implementation though, but looks like a consequence of getting all those class-unload notifications over the wire. My testcase generates 1000s of them, and it's clogging up the buffers.) I am not sure why jdb needs to enable class-unload listener always. Asimple hack disables it, and performance is brilliant, even whenjdb is attached:http://cr.openjdk.java.net/~rkennke/disable-jdk-class-unload.patchBut this is not in the scope of this bug.) Roman On 3/16/20 8:05 AM, serguei.spit...@oracle.com wrote:Sorry, forgot to complete my comments at the end (see below). On 3/15/20 23:57, serguei.spit...@oracle.com wrote:Hi Roman,Thank you for the update and sorry for the latency in review.Some comments are below.http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.05/src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c.frames.html87 cbTrackingObjectFree(jvmtiEnv* jvmti_env, jlong tag) 88 { 89 debugMonitorEnter(deletedSignatureLock); 90 if (currentClassTag == -1) { 91 // Class tracking not initialized, nobody's interested 92 debugMonitorExit(deletedSignatureLock); 93 return; 94 } Just a question: Q1: Should the ObjectFree events be disabled for the jvmtiEnv that does the class tracking if class tracking has not been initialized? 70 static jlong currentClassTag; I'm thinking if the name is better to be something like: lastClassTag or highestClassTag. 99 KlassNode* klass = *klass_ptr;100 102 while (klass != NULL && klass->klass_tag != tag) { 103klass_ptr = &klass->next; 104 klass = *klass_ptr; 105 } 106 if (klass != NULL || klass->klass_tag != tag) { // klass not found - ignore. 107 debugMonitorExit(deletedSignatureLock); 108 return; 109 }It seems to me, something is wrong in the condition at L106above. Should it be? : if (klass == NULL || klass->klass_tag != tag) Otherwise, how can the second check ever work correctly as the return will always happen when (klass != NULL)?There are several places in this file with the the indent:90 if (currentClassTag == -1) { 91 // Class tracking not initialized, nobody's interested 92 debugMonitorExit(deletedSignatureLock); 93 return; 94 } ... 152 if (currentClassTag == -1) {153 // Class tracking not initialized yet, nobody's interested154 debugMonitorExit(deletedSignatureLock); 155 return; 156 } ... 161 if (error != JVMTI_ERROR_NONE) { 162 EXIT_ERROR(error, "Unable to GetTag with class trackingEnv"); 163 } 164 if (tag != 0l) { 165 debugMonitorExit(deletedSignatureLock); 166 return; // Already added 167 } ... 281 cleanDeleted(void *signatureVoid, void *arg) 282 { 283 char* sig = (char*)signatureVoid; 284 jvmtiDeallocate(sig); 285 return JNI_TRUE; 286 } ... 291 void 292 classTrack_reset(void) 293 { 294 int idx; 295 debugMonitorEnter(deletedSignatureLock); 296 297 for (idx = 0; idx < CT_SLOT_COUNT; ++idx) { 298 KlassNode* node = table[idx]; 299 while (node != NULL) { 300 KlassNode* next = node->next; 301 jvmtiDeallocate(node->signature); 302 jvmtiDeallocate(node); 303 node = next; 304 } 305 } 306 jvmtiDeallocate(table); 307308 bagEnumerateOver(deletedSignatureBag, cleanDeleted, NULL);309 bagDestroyBag(deletedSignatureBag); 310 311 currentClassTag = -1; 312 313(void)JVMTI_FUNC_PTR(trackingEnv,DisposeEnvironment)(trackingEnv);314 trackingEnv = NULL; 315 316 debugMonitorExit(deletedSignatureLock); Could you, please, fix several comments below? 63 * The JVMTI tracking env to keep track of klass tags, for class-unloads The comma is not needed. Would it better to replace: klass tags => klass_tag's ? 73 * Lock to keep table, currentClassTag and deletedSignatureBag consistentMaybe: Lock to guard ... or lock to keep integrity of ...84 * Callback when classes are freed, Finds the signature andremembers it in deletedSignatureBag. Would be better to use words like "store" or "record", "Find" should not start from capital letter:Invoke the callback when classes are freed, find and record thesignature in deletedSignatureBag. 96 // Find deleted KlassNode 133 // Class tracking not initialized,nobody's interested 153 // Class tracking not initialized yet,nobody's interested 158 /* Check this is not a duplicate */ Missed dotat the end. 106 if (klass != NULL || klass->klass_tag != tag){ //klass not found - ignore. In opposite, dot is not needed as thecomment does not start from a capital letter. 111 // At this point we have the KlassNode corresponding to the tag 112 // in klass, and the pointer to it in klass_node.The comment above can be better. Maybe, something like:" At this point, we found the KlassNode matching the klasstag(and it is linked).113 // Remember the unloaded signature.Better: Record the signature of the unloaded class and unlink it. Thanks, SergueiThanks, Serguei On 3/9/20 05:39, Roman Kennke wrote:Hello all, Can I please get reviews of this change? In the meantime, we've done more testing and also field-/torture-testing by a customer who is happy now. :-) Thanks, RomanHi Serguei, Thanks for reviewing!I updated the patch to reflect your suggestions, very good! It also includes a fix to allow re-connecting an agent afterdisconnect, namely move setup of the trackingEnv and deletedSignatureBag to_activate() to ensure have those structures after re-connect.http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.05/ Let me know what you think! RomanHi Roman, Thank you for taking care about this scalability issue! I have a couple of quick comments.http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.04/src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c.frames.html72 /* 73 * Lock to protect deletedSignatureBag 74 */ 75 static jrawMonitorID deletedSignatureLock; 76 77 /*78 * A bag containing all the deleted classes' signatures.Must be accessed under 79 * deletedTagLock, 80 */ 81 struct bag* deletedSignatureBag; The comments contradict to each other. I guess, the lock name at line 79 has to be deletedSignatureLock instead of deletedTagLock. Also, comma at the end must be replaced with dot. 101 // Tag not found? Ignore. 102 if (klass == NULL) { 103 debugMonitorExit(deletedSignatureLock); 104 return; 105 } 106 107 // Scan linked-list. 108 jlong found_tag = klass->klass_tag; 109 while (klass != NULL && found_tag != tag) { 110 klass_ptr = &klass->next; 111 klass = *klass_ptr; 112 found_tag = klass->klass_tag; 113 } 114 115 // Tag not found? Ignore. 116 if (found_tag != tag) { 117 debugMonitorExit(deletedSignatureLock); 118 return; 119 } The code above can be simplified, so that the lines 101-105 are not needed anymore. It can be something like this: // Scan linked-list. while (klass != NULL && klass->klass_tag != tag) { klass_ptr = &klass->next; klass = *klass_ptr; }if (klass == NULL || klass->klass_tag != tag) { // klass notfound - ignore. debugMonitorExit(deletedSignatureLock); return; } It will take more time when I get a chance to look at the rest. Thanks, Serguei On 12/21/19 13:24, Roman Kennke wrote:Here comes an update that resolves some races that happenwhendisconnecting an agent. In particular, we need to take thelock onbasically every operation, and also need to check whetheror notclass-tracking is active and return an appropriate result(e.g. an empty list) when we're not. Updated webrev:http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.04/Thanks, RomanSo, here comes the O(1) implementation:- Whenever a class is 'prepared', it is registered with atag, and we set-up a listener to get notified when it is unloaded.- Prepared classes are kept in a datastructure that is atable, whicheach entry being the head of a linked-list of KlassNode*.The table is indexed by tag % slot-count, and then simply prepend the new KlassNode*. This is O(1) operation. - When we get notified of unloading a class, we look up the signature ofthe reported tag in that table, and remember it in a bag.The KlassNode*is then unlinked from the table and deallocated. This is~O(1) operationtoo, depending on the depth of the table. In my testcasewhich hammered the code with class-loads and unloads, I usually see depths of like 2-3, but not usually more. It should be ok. - when processUnloads() gets called, we simply hand out that bag, and allocate a new one.- I also added cleanup-code in classTrack_reset() to avoidleaking the signatures and KlassNode* etc when debug agent gets detached and/or re-attached (was missing before). - I also added locks around data-structure-manipulation (was missing before).- Also, I only activate this whole process when an actuallistener gets registered on EI_GC_FINISH. This seems to happen right when attaching a jdb, not sure why jdb does that though. This may be something to improve in the future? In my tests, the performance of class-tracking itself looks really good. The bottleneck now is clearly actual synthesizing the class-unloadevents. I don't see how this can be helped when the debugagent asks for it? Updated webrev:http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.03/Please let me know what you think of it. Thanks, RomanAlright, the perfectionist in me got me. I am implementing the even moreefficient ~O(1) class tracking. Please hold off reviewingfor now. Thanks,Roman Hi Chris,I'll have a look at this, although it might not be for a few days. In the meantime, maybe you can describe your new implementation in classTrack.c so it's easier to look through the changes.Sure. The purpose of this class-tracking is to be able to determine thesignatures of unloaded classes when GC/class-unloadinghappened, so that we can generate the appropriate JDWP event. The current implementation does so by maintaining a table of currently prepared classes by building that table when classTrack is initialized,and then add new classes whenever a class gets loaded.When unloading occurs, that cache is rebuilt into a new table, and compared with the old table, and whatever is in the old, but not in the new table getsreturned. The problem is that when GCs happen frequentlyand/or many classes get loaded+unloaded, this amounts to O(classCount*gcCount) complexity.The new implementation keeps a linked-list of preparedclasses, and alsotracks unloads via the listener cbTrackingObjectFree().Whenever an unload/GC occurs, the list of prepared classes is scanned, and classes that are also in the deletedTagBag are unlinked (thus maintaining theprepared-classes-list) and its signature put in the listthat gets returned.The implementation is not perfect. In order to determinewhether or nota class is unloaded, it needs to scan the deletedTagBag.That process is therefore still O(unloadedClassCount). The assumption here is thatunloadedClassCount << classCount. In my experiments thisseems to be true, and also reasonable to expect.(I have some ideas how to improve the implementation to~O(1) but itwould be considerably more complex: have to maintain a(hash)table that maps tags -> KlassNode*, unlink them directly upon unload, and build the unloaded-signatures list there, but I don't currently see that it's worth the effort).In addition to all that, this process is only activatedwhen there's an actual listener registered for EI_GC_FINISH. Thanks, RomanChris On 12/18/19 5:05 AM, Roman Kennke wrote:Hello all, Issue: https://bugs.openjdk.java.net/browse/JDK-8227269 I am proposing what amounts to a rewrite of classTrack.c. It avoidsthrowing away the class cache on GC, and instead keepstrack of loaded/unloaded classes one-by-one. In addition to that, it avoids this whole dance until an agent registers interest in EI_GC_FINISH. Webrev:http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.01/Testing: manual testing of provided test scenarios andtiming. Eg with the testcase provided here: https://bugzilla.redhat.com/show_bug.cgi?id=1751985 I am getting those numbers: unpatched: no debug: 84s with debug: 225s patched: no debug: 85s with debug: 95s I also tested successfully through jdk/submit repo Can I please get a review? Thanks, Roman