Hi Serguei, Thanks for your review! A quick update on my progress:
The wrong condition was a good find! In-fact so much that it lead to the whole implementation not reporting any unloaded classes. I changed that back, and now it's so slow because of all those unloaded classes firing events, I'm trying to understand where it's loosing time. Some other findings: - I can't keep the lock while calling into JVMTI e.g. for GetTag() or SetTag(), otherwise it risks to deadlock. - The current implementation doesn't seem to report any unloaded classes either (i.e. the bag returned by classTrack_processUnloads(JNIEnv *env) is always empty or NULL), at least not in my testcase. I'm investigating why this might be the case, or maybe I did something wrong. Roman > Sorry, forgot to complete my comments at the end (see below). > > > On 3/15/20 23:57, serguei.spit...@oracle.com wrote: >> Hi Roman, >> >> Thank you for the update and sorry for the latency in review. >> >> Some comments are below. >> >> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.05/src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c.frames.html >> >> 87 cbTrackingObjectFree(jvmtiEnv* jvmti_env, jlong tag) >> 88 { >> 89 debugMonitorEnter(deletedSignatureLock); >> 90 if (currentClassTag == -1) { >> 91 // Class tracking not initialized, nobody's interested >> 92 debugMonitorExit(deletedSignatureLock); >> 93 return; >> 94 } >> Just a question: >> Q1: Should the ObjectFree events be disabled for the jvmtiEnv that does >> the class tracking if class tracking has not been initialized? >> >> 70 static jlong currentClassTag; I'm thinking if the name is better to >> be something like: lastClassTag or highestClassTag. >> >> 99 KlassNode* klass = *klass_ptr; >> 100 102 while (klass != NULL && klass->klass_tag != tag) { 103 >> klass_ptr = &klass->next; 104 klass = *klass_ptr; >> 105 } 106 if (klass != NULL || klass->klass_tag != tag) { // klass not >> found - ignore. >> 107 debugMonitorExit(deletedSignatureLock); >> 108 return; >> 109 } >> It seems to me, something is wrong in the condition at L106 above. >> Should it be? : >> if (klass == NULL || klass->klass_tag != tag) >> >> Otherwise, how can the second check ever work correctly as the return >> will always happen when (klass != NULL)? >> >> >> There are several places in this file with the the indent: >> 90 if (currentClassTag == -1) { >> 91 // Class tracking not initialized, nobody's interested >> 92 debugMonitorExit(deletedSignatureLock); >> 93 return; >> 94 } >> ... >> 152 if (currentClassTag == -1) { >> 153 // Class tracking not initialized yet, nobody's interested >> 154 debugMonitorExit(deletedSignatureLock); >> 155 return; >> 156 } >> ... >> 161 if (error != JVMTI_ERROR_NONE) { >> 162 EXIT_ERROR(error, "Unable to GetTag with class trackingEnv"); >> 163 } >> 164 if (tag != 0l) { >> 165 debugMonitorExit(deletedSignatureLock); >> 166 return; // Already added >> 167 } >> ... >> 281 cleanDeleted(void *signatureVoid, void *arg) >> 282 { >> 283 char* sig = (char*)signatureVoid; >> 284 jvmtiDeallocate(sig); >> 285 return JNI_TRUE; >> 286 } >> ... >> 291 void >> 292 classTrack_reset(void) >> 293 { >> 294 int idx; >> 295 debugMonitorEnter(deletedSignatureLock); >> 296 >> 297 for (idx = 0; idx < CT_SLOT_COUNT; ++idx) { >> 298 KlassNode* node = table[idx]; >> 299 while (node != NULL) { >> 300 KlassNode* next = node->next; >> 301 jvmtiDeallocate(node->signature); >> 302 jvmtiDeallocate(node); >> 303 node = next; >> 304 } >> 305 } >> 306 jvmtiDeallocate(table); >> 307 >> 308 bagEnumerateOver(deletedSignatureBag, cleanDeleted, NULL); >> 309 bagDestroyBag(deletedSignatureBag); >> 310 >> 311 currentClassTag = -1; >> 312 >> 313 (void)JVMTI_FUNC_PTR(trackingEnv,DisposeEnvironment)(trackingEnv); >> 314 trackingEnv = NULL; >> 315 >> 316 debugMonitorExit(deletedSignatureLock); >> >> Could you, please, fix several comments below? >> 63 * The JVMTI tracking env to keep track of klass tags, for class-unloads >> The comma is not needed. >> Would it better to replace: klass tags => klass_tag's ? >> >> >> 73 * Lock to keep table, currentClassTag and deletedSignatureBag >> consistent >> Maybe: Lock to guard ... or lock to keep integrity of ... >> >> 84 * Callback when classes are freed, Finds the signature and >> remembers it in deletedSignatureBag. Would be better to use words like >> "store" or "record", "Find" should not start from capital letter: >> Invoke the callback when classes are freed, find and record the >> signature in deletedSignatureBag. >> >> 96 // Find deleted KlassNode 133 // Class tracking not initialized, >> nobody's interested 153 // Class tracking not initialized yet, >> nobody's interested 158 /* Check this is not a duplicate */ Missed dot >> at the end. 106 if (klass != NULL || klass->klass_tag != tag) { // >> klass not found - ignore. In opposite, dot is not needed as the >> comment does not start from a capital letter. 111 // At this point we >> have the KlassNode corresponding to the tag >> 112 // in klass, and the pointer to it in klass_node. > > The comment above can be better. Maybe, something like: > " At this point, we found the KlassNode matching the klass tag(and it is > linked). > >> 113 // Remember the unloaded signature. > Better: Record the signature of the unloaded class and unlink it. > > Thanks, > Serguei > >> Thanks, >> Serguei >> >> On 3/9/20 05:39, Roman Kennke wrote: >>> Hello all, >>> >>> Can I please get reviews of this change? In the meantime, we've done >>> more testing and also field-/torture-testing by a customer who is happy >>> now. :-) >>> >>> Thanks, >>> Roman >>> >>> >>>> Hi Serguei, >>>> >>>> Thanks for reviewing! >>>> >>>> I updated the patch to reflect your suggestions, very good! >>>> It also includes a fix to allow re-connecting an agent after disconnect, >>>> namely move setup of the trackingEnv and deletedSignatureBag to >>>> _activate() to ensure have those structures after re-connect. >>>> >>>> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.05/ >>>> >>>> Let me know what you think! >>>> Roman >>>> >>>>> Hi Roman, >>>>> >>>>> Thank you for taking care about this scalability issue! >>>>> >>>>> I have a couple of quick comments. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.04/src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c.frames.html >>>>> >>>>> 72 /* >>>>> 73 * Lock to protect deletedSignatureBag >>>>> 74 */ >>>>> 75 static jrawMonitorID deletedSignatureLock; 76 77 /* >>>>> 78 * A bag containing all the deleted classes' signatures. Must be >>>>> accessed under >>>>> 79 * deletedTagLock, >>>>> 80 */ >>>>> 81 struct bag* deletedSignatureBag; >>>>> >>>>> The comments contradict to each other. >>>>> I guess, the lock name at line 79 has to be deletedSignatureLock >>>>> instead of deletedTagLock. >>>>> Also, comma at the end must be replaced with dot. >>>>> >>>>> >>>>> 101 // Tag not found? Ignore. >>>>> 102 if (klass == NULL) { >>>>> 103 debugMonitorExit(deletedSignatureLock); >>>>> 104 return; >>>>> 105 } >>>>> 106 >>>>> 107 // Scan linked-list. >>>>> 108 jlong found_tag = klass->klass_tag; >>>>> 109 while (klass != NULL && found_tag != tag) { >>>>> 110 klass_ptr = &klass->next; >>>>> 111 klass = *klass_ptr; >>>>> 112 found_tag = klass->klass_tag; >>>>> 113 } >>>>> 114 >>>>> 115 // Tag not found? Ignore. >>>>> 116 if (found_tag != tag) { >>>>> 117 debugMonitorExit(deletedSignatureLock); >>>>> 118 return; >>>>> 119 } >>>>> >>>>> >>>>> The code above can be simplified, so that the lines 101-105 are not >>>>> needed anymore. >>>>> It can be something like this: >>>>> >>>>> // Scan linked-list. >>>>> while (klass != NULL && klass->klass_tag != tag) { >>>>> klass_ptr = &klass->next; >>>>> klass = *klass_ptr; >>>>> } >>>>> if (klass == NULL || klass->klass_tag != tag) { // klass not found - >>>>> ignore. >>>>> debugMonitorExit(deletedSignatureLock); >>>>> return; >>>>> } >>>>> >>>>> It will take more time when I get a chance to look at the rest. >>>>> >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> >>>>> >>>>> On 12/21/19 13:24, Roman Kennke wrote: >>>>>> Here comes an update that resolves some races that happen when >>>>>> disconnecting an agent. In particular, we need to take the lock on >>>>>> basically every operation, and also need to check whether or not >>>>>> class-tracking is active and return an appropriate result (e.g. an empty >>>>>> list) when we're not. >>>>>> >>>>>> Updated webrev: >>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.04/ >>>>>> >>>>>> Thanks, >>>>>> Roman >>>>>> >>>>>> >>>>>>> So, here comes the O(1) implementation: >>>>>>> >>>>>>> - Whenever a class is 'prepared', it is registered with a tag, and we >>>>>>> set-up a listener to get notified when it is unloaded. >>>>>>> - Prepared classes are kept in a datastructure that is a table, which >>>>>>> each entry being the head of a linked-list of KlassNode*. The table is >>>>>>> indexed by tag % slot-count, and then simply prepend the new KlassNode*. >>>>>>> This is O(1) operation. >>>>>>> - When we get notified of unloading a class, we look up the signature of >>>>>>> the reported tag in that table, and remember it in a bag. The KlassNode* >>>>>>> is then unlinked from the table and deallocated. This is ~O(1) operation >>>>>>> too, depending on the depth of the table. In my testcase which hammered >>>>>>> the code with class-loads and unloads, I usually see depths of like 2-3, >>>>>>> but not usually more. It should be ok. >>>>>>> - when processUnloads() gets called, we simply hand out that bag, and >>>>>>> allocate a new one. >>>>>>> - I also added cleanup-code in classTrack_reset() to avoid leaking the >>>>>>> signatures and KlassNode* etc when debug agent gets detached and/or >>>>>>> re-attached (was missing before). >>>>>>> - I also added locks around data-structure-manipulation (was missing >>>>>>> before). >>>>>>> - Also, I only activate this whole process when an actual listener gets >>>>>>> registered on EI_GC_FINISH. This seems to happen right when attaching a >>>>>>> jdb, not sure why jdb does that though. This may be something to improve >>>>>>> in the future? >>>>>>> >>>>>>> In my tests, the performance of class-tracking itself looks really good. >>>>>>> The bottleneck now is clearly actual synthesizing the class-unload >>>>>>> events. I don't see how this can be helped when the debug agent asks >>>>>>> for it? >>>>>>> >>>>>>> Updated webrev: >>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.03/ >>>>>>> >>>>>>> Please let me know what you think of it. >>>>>>> >>>>>>> Thanks, >>>>>>> Roman >>>>>>> >>>>>>> >>>>>>>> Alright, the perfectionist in me got me. I am implementing the even >>>>>>>> more >>>>>>>> efficient ~O(1) class tracking. Please hold off reviewing for now. >>>>>>>> >>>>>>>> Thanks,Roman >>>>>>>> >>>>>>>> Hi Chris, >>>>>>>>>> I'll have a look at this, although it might not be for a few days. In >>>>>>>>>> the meantime, maybe you can describe your new implementation in >>>>>>>>>> classTrack.c so it's easier to look through the changes. >>>>>>>>> Sure. >>>>>>>>> >>>>>>>>> The purpose of this class-tracking is to be able to determine the >>>>>>>>> signatures of unloaded classes when GC/class-unloading happened, so >>>>>>>>> that >>>>>>>>> we can generate the appropriate JDWP event. >>>>>>>>> >>>>>>>>> The current implementation does so by maintaining a table of currently >>>>>>>>> prepared classes by building that table when classTrack is >>>>>>>>> initialized, >>>>>>>>> and then add new classes whenever a class gets loaded. When unloading >>>>>>>>> occurs, that cache is rebuilt into a new table, and compared with the >>>>>>>>> old table, and whatever is in the old, but not in the new table gets >>>>>>>>> returned. The problem is that when GCs happen frequently and/or many >>>>>>>>> classes get loaded+unloaded, this amounts to O(classCount*gcCount) >>>>>>>>> complexity. >>>>>>>>> >>>>>>>>> The new implementation keeps a linked-list of prepared classes, and >>>>>>>>> also >>>>>>>>> tracks unloads via the listener cbTrackingObjectFree(). Whenever an >>>>>>>>> unload/GC occurs, the list of prepared classes is scanned, and classes >>>>>>>>> that are also in the deletedTagBag are unlinked (thus maintaining the >>>>>>>>> prepared-classes-list) and its signature put in the list that gets >>>>>>>>> returned. >>>>>>>>> >>>>>>>>> The implementation is not perfect. In order to determine whether or >>>>>>>>> not >>>>>>>>> a class is unloaded, it needs to scan the deletedTagBag. That process >>>>>>>>> is >>>>>>>>> therefore still O(unloadedClassCount). The assumption here is that >>>>>>>>> unloadedClassCount << classCount. In my experiments this seems to be >>>>>>>>> true, and also reasonable to expect. >>>>>>>>> >>>>>>>>> (I have some ideas how to improve the implementation to ~O(1) but it >>>>>>>>> would be considerably more complex: have to maintain a (hash)table >>>>>>>>> that >>>>>>>>> maps tags -> KlassNode*, unlink them directly upon unload, and build >>>>>>>>> the >>>>>>>>> unloaded-signatures list there, but I don't currently see that it's >>>>>>>>> worth the effort). >>>>>>>>> >>>>>>>>> In addition to all that, this process is only activated when there's >>>>>>>>> an >>>>>>>>> actual listener registered for EI_GC_FINISH. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Roman >>>>>>>>> >>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> On 12/18/19 5:05 AM, Roman Kennke wrote: >>>>>>>>>>> Hello all, >>>>>>>>>>> >>>>>>>>>>> Issue: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227269 >>>>>>>>>>> >>>>>>>>>>> I am proposing what amounts to a rewrite of classTrack.c. It avoids >>>>>>>>>>> throwing away the class cache on GC, and instead keeps track of >>>>>>>>>>> loaded/unloaded classes one-by-one. >>>>>>>>>>> >>>>>>>>>>> In addition to that, it avoids this whole dance until an agent >>>>>>>>>>> registers interest in EI_GC_FINISH. >>>>>>>>>>> >>>>>>>>>>> Webrev: >>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8227269/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> Testing: manual testing of provided test scenarios and timing. >>>>>>>>>>> >>>>>>>>>>> Eg with the testcase provided here: >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1751985 >>>>>>>>>>> >>>>>>>>>>> I am getting those numbers: >>>>>>>>>>> unpatched: no debug: 84s with debug: 225s >>>>>>>>>>> patched: no debug: 85s with debug: 95s >>>>>>>>>>> >>>>>>>>>>> I also tested successfully through jdk/submit repo >>>>>>>>>>> >>>>>>>>>>> Can I please get a review? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Roman >>>>>>>>>>> >> >
signature.asc
Description: OpenPGP digital signature