On Wed, Jun 6, 2018 at 4:55 AM, David Holmes <david.hol...@oracle.com> wrote: > Hi Goetz, > > On 5/06/2018 11:07 PM, Lindenmaier, Goetz wrote: >> >> Hi >> >> this discussion touched a lot of points so far, which seem >> to lead to different conclusions. >> >> I think we should look at the values printed: >> >> 1. cpu=6300.65ms elapsed=123.28s >> Overhead >> cpu time: >> * system calls at thread dump time >> elapsed time: >> * 1 system call at thread creation time >> * 1 64-bit field per thread for the thread start time >> * 1 system call at thread dump time >> >> As I understand, JDK-8143176 would have had to get the >> time at each locking, which is much more time critical >> than doing this during thread creation. While > > > Correct. > >> the time a lock was held would be much more useful, >> the ratio here, combined with knowledge about the application, >> could lead to the conclusion that the thread is wrongly >> blocked at much lower cost. > > > Agreed. I see no issue with unconditionally adding this information as it > should address some of the concerns of 8143176 as well. > >> >> 2. pthread-id=0x109708000 >> Overhead: >> * a field access at thread dump time. Negligible I would say. > > > I'd overlooked this addition, but it is of course the chunk of ifdef code in > osThread.cpp that I objected to. The problem I have here is the convoluted > mess of "thread identifiers" that we already have. We currently print: > > st->print("tid=" INTPTR_FORMAT " ", p2i(this)); > > which is just the address of the Thread/JavaThread object. Then: > > st->print("nid=0x%x ", thread_id()); > > where 'n' is supposed to represent "native thread id", which is: > - linux: kernel thread id from syscall gettid > - solaris: thread library identity from thr_create() or thr_self() > - windows: thread id from _beginthreadex > - OS X: kernel thread id from pthread_mach_thread_np(pthread_self()); > - Other BSD: kernel (?) thread id from syscall thr_self or getthrid > - AIX: thread library identity from pthread_create() or pthread_self() > > It's telling that the code Gunter added gets the thread library id on linux, > but a "kernel thread id" on other platforms! Do we want to see a thread > library id and a kernel thread id? IIRC nid is supposed be the id you need > to see the thread in a debugger. In which case I'm unclear what role the > thread id Gunter wants to add is playing? I don't think we need to see > kernel ids in general, and pthread-id isn't useful for debugging is it? >
Yes, feels just wrong. Plus, not even sufficient, since you can have both pthread-id and kernel thread id, e.g. on AIX, and depending on your situation you may need both. But then, you may need a lot of things, and where do we draw the line? I admit I also feel uncomfortable with adding all this output to thread dump, which is noise in 99% of cases. (BTW what about tools parsing the thread dump in its current form? Wont we make a lot of scripts out there unhappy?) -- Code wise, at the very least, I would abstract this kind of coding into something like os::print_native_thread_id(outputStream* st) with the expectation of "only writes a few words, no newline". Platforms like AIX may then print both. (Note: not "get_native_thread_id()", since that would stipulates that there is only one and that is can be expressed as a common type over all architectures. Also, I would not want anyone to use it as a number.) -- As others before me (Kirk) noted, I find the information added by this patch useful. But after looking closely at the patch, I would strongly prefer a new separate jcmd for this. Lets call it "VM.thread_stats". You still would have to collect and carry the statistics by thread, but output would be separate both from the user perspective and also in coding - you would not have to force-shove it into Thread::print_on, for instance. Such a hypothetical command could be in the future extended with other statistics as we see fit. I do not see any advantage in overloading the thread dump vehicle in this way. Just my 5 cent. Best Regards, Thomas >> >> 3. allocated=242236760B defined_classes=1725 >> Overhead: >> * 1 64-bit field per thread. >> * counter increment on class creation >> >> Especially defined_classes seems to be controversial. >> As this only makes sense for Java threads, this could >> be printed in a line of it's own in Threads::print_on_error() >> (which already gets a flag to customize for jstack: >> print_concurrent_locks) by calling a dedicated function in Thread. >> No need for flag PrintExtendedThreadInfo thus. >> >> But I would propose to drop this information as it is >> too controversial. > > > Agreed. > >> This leaves the times and the pthread-id to be decided whether >> they are worth the cost. If so, I think they should be printed >> unconditional. > > > Agreed. With some refinement of the "thread id" issue. > >> If a flag is required to avoid bloating the info in the default case, >> I would include more infos, as os_prio and nid under that flag. > > > Of course those things exist for good reason. The native id is supposed to > be what aids in you matching what you see in our thread dumps with what you > see in a debugger. Priority is less of an issue these days, but there was a > time ... ;-) > >> As jstack is deprecated, changes to jstack could be skipped always >> disabling the new printouts. >> >> Finally, I would propose to move _allocated_bytes into >> threadStatisticInfo.hpp. > > > Didn't that part get dropped? (You don't need it for the time info or thread > id). That's a GC logging issue to me. > > Thanks, > David > > >> Best regards, >> Goetz. >> >> >> >> >>> -----Original Message----- >>> From: serviceability-dev [mailto:serviceability-dev- >>> boun...@openjdk.java.net] On Behalf Of David Holmes >>> Sent: Dienstag, 5. Juni 2018 04:53 >>> To: Haug, Gunter <gunter.h...@sap.com>; Chris Plummer >>> <chris.plum...@oracle.com>; serviceability-dev <serviceability- >>> d...@openjdk.java.net>; Langer, Christoph <christoph.lan...@sap.com>; >>> hotspot-runtime-...@openjdk.java.net >>> Subject: Re: RFR(S): 8200720: Print additional information in thread dump >>> (times, allocated bytes etc.) >>> >>> Hi Gunter, >>> >>> On 5/06/2018 1:27 AM, Haug, Gunter wrote: >>>> >>>> Hi David, Chris, >>>> >>>> would it be an option to unconditionally print the additional >>>> information? >>> >>> Regardless which way this is implemented it will be rather complicated >>> and it >>> only switches on/off a few field in the thread dump. >>> >>> I'm not convinced this is all suitable information for a thread stack >>> dump. I can see the time information being useful if using the dump to >>> try and diagnoze a hang or responsiveness issue. But the allocated-bytes >>> and classes-defined just doesn't seem useful in the context of a thread >>> dump to me. Anyone interested in allocation stats is going to be looking >>> at GC logs etc which is where this belongs. Ditto the class-stats belong >>> in some kind of classloading logging IMHO. >>> >>> I'm very reluctant to burden the implementation with capturing these >>> kinds of stats, just to use for diagnostic purposes that may not even be >>> asked for. I'm very much in the "you shouldn't pay for what you don't >>> use" camp in that regard. (See my comments in JDK-8143176 referenced by >>> Sean.) >>> >>> The ThreadStatisticInfo adds overhead to every thread object - and I'd >>> have to suspect there could be overlap with whatever flight recorder >>> sticks in there too. (Thread has become even more bloated in recent >>> time!). >>> >>> Cheers, >>> David >>> >>> >>>> Thanks, >>>> Gunter >>>> >>>> On 04.06.18, 01:13, "David Holmes" <david.hol...@oracle.com> wrote: >>>> >>>> Hi Gunter, Chris, >>>> >>>> Sorry just trying to catch up and this is only a partial look so >>>> far ... >>>> >>>> BTW these changes are not limited to serviceability code so adding >>>> in >>>> runtime team as well. >>>> >>>> On 2/06/2018 9:12 AM, Chris Plummer wrote: >>>> > Hi Gunter, >>>> > >>>> > On 6/1/18 3:17 AM, Haug, Gunter wrote: >>>> >> Hi Chris, >>>> >> >>>> >> thanks for looking into this! >>>> >> >>>> >> Re the synchronization: The value is stored only in a VM >>>> operation at >>>> >> a safepoint by the VM thread. I was of the opinion, that this >>>> could >>>> >> not be interrupted by a second VM operation (of the same type). >>>> Or >>> >>> am >>>> >>>> >> I missing something else? >>>> > I was really thinking more about the temporary changing of >>>> > PrintExtendedThreadInfo, not the value stored in the VMOp. You >>>> may >>> >>> be be >>>> >>>> > correct that no more than one VMOp is executing, but while it is >>>> > executing it is has changed the value of >>>> PrintExtendedThreadInfo, >>> >>> which >>>> >>>> > might have an impact on anything else that triggers printing >>>> thread info >>>> > while the VMOp is executing. >>>> >>>> Even if nothing else can possibly be running during the safepoint >>>> I find >>>> it extremely bad form to change a command-line flag in this way, >>>> particularly from a safepoint! >>>> >>>> If this flag is intended to be dynamically enabled as part of a >>>> dcmd >>>> then it should also be a Manageable flag IMHO. >>>> >>>> That said ... >>>> >>>> >> I did think about passing an argument to the various print_on >>> >>> member >>>> >>>> >> functions of the thread classes, but this would require rather >>>> >> extensive code changes for a rather tiny extension. Therefore I >>>> feel >>>> >> doing it like this is the lesser evil. >>>> >>>> ... it's obviously not truly a global setting, but one that is >>>> needed on >>>> a per-print-request basis. The flag is just the default, but if >>>> the >>>> default is off you still want to enable extended printing on a >>>> per-request basis. >>>> >>>> The current print_on mechanics is not set up for this kind of >>>> flexibility. I think this needs more thought. >>>> >>>> --- >>>> >>>> Re osThread.cpp shared code changes ... I really hate to see all >>>> the >>>> ifdefs in there. Please add a pd_print_on function to support this >>>> if >>>> you truly want platform specific information. >>>> >>>> --- >>>> >>>> threadStatisticInfo.hpp >>>> >>>> typo: getElepsedTime() -> getElapsedTime() >>>> >>>> Thanks, >>>> David >>>> >>>> >>>> >>>> >> >>>> >> Re thread_dump(): I think it's correct or, well, at least it >>>> works ;-) >>>> >> In fact jstack will transfer the "-e" and "-l" in only one >>>> string, >>>> >> i.e. op->arg(0). >>>> > So you are saying that op->arg(0) is "-e -l" when using jstack? >>>> I think >>>> > you really need to clean up the parsing. As it stands right now >>>> passing, >>>> > I get the feeling that if a user erroneously asks for help by >>>> using >>>> > "jcmd <pid> Thread.Print -help", it will end up executing with >>>> -e an -l >>>> > enabled, and no failure for the invalid "-help" option. >>>> >> Christoph has already explained my reasoning. But I agree, >>>> it's not >>>> >> nice and I would be happy to do it like Christoph suggested. >>>> > I'll respond separately to his suggestion. >>>> > >>>> > thanks, >>>> > >>>> > Chris >>>> >> >>>> >> And typo fixed, sorry. >>>> >> >>>> >> Thanks again, >>>> >> Gunter >>>> >> >>>> >> On 01.06.18, 00:03, "Chris Plummer" <chris.plum...@oracle.com> >>> >>> wrote: >>>> >>>> >> >>>> >> Hi Gunter, >>>> >> globals.hpp: fix typo "informatiuon" >>>> >> I worry a little bit about the synchronizing (if that's >>>> the right >>>> >> word) >>>> >> of PrintExtendedThreadInfo and the dcmd's -e flag. When >>>> using -e, >>>> >> you >>>> >> are temporarily enabling PrintExtendedThreadInfo if it was >>>> false. >>>> >> This >>>> >> temporarily changes the behavior of thread dumps, and >>>> could >>>> >> impact other >>>> >> uses that happen in parallel. Also, could two simultaneous >>>> uses >>>> >> of -e >>>> >> result in PrintExtendedThreadInfo not getting restored >>>> properly? >>>> >> thread_dump() doesn't look right. It looks like you are >>>> iterating >>>> >> char >>>> >> by char over the argument, and expect something like "-el" >>>> to be >>>> >> specified rather then "-e -l". The loop should be >>>> iterating over >>>> >> op->arg(i), not op->arg(0)[i]. >>>> >> The rest of the changes look fine. >>>> >> thanks, >>>> >> Chris >>>> >> On 5/30/18 8:12 AM, Haug, Gunter wrote: >>>> >> > Hi all, >>>> >> > >>>> >> > As Chris proposed, I have made an the extended output >>>> >> switchable. There is an VM flag (PrintExtendedThreadInfo), >>>> which is >>>> >> false by default. Moreover, there is an Option (-e) which can >>>> be used >>>> >> with jcmd Thread.print as well as with jstack. The -e option >>>> >> essentially sets PrintExtendedThreadInfo true just for the >>>> respective >>>> >> thread dump. >>>> >> > >>>> >> > Here is the updated webrev: >>>> >> > >>>> >> > http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v2 >>>> >> > >>>> >> > (https://bugs.openjdk.java.net/browse/JDK-8200720) >>>> >> > >>>> >> > Thanks, >>>> >> > Gunter >>>> >> > >>>> >> > >>>> >> > On 02.05.18, 17:07, "serviceability-dev on behalf of >>>> Haug, >>>> >> Gunter" <serviceability-dev-boun...@openjdk.java.net on behalf >>>> of >>>> >> gunter.h...@sap.com> wrote: >>>> >> > >>>> >> > Hi Chris, >>>> >> > >>>> >> > Thanks for looking into this. >>>> >> > You're right, there is a little more we have. We >>>> have >>>> >> implemented an IO tracing mechanism which - rather as a >>>> byproduct - >>>> >> keeps track of bytes read and written per thread. However, this >>>> of >>>> >> course requires changes not only in hotspot. We would be happy >>>> to >>>> >> contribute this as well, but this is a far bigger change and >>>> will >>>> >> probably lead to a far bigger discussion. Anyway, with the >>>> number of >>>> >> bytes read, the number of classes defined doesn't look that >>>> arbitrary >>>> >> anymore, as one can correlate IO to class loading. >>>> >> > >>>> >> > Regarding the verbose option I think that's a good >>>> idea! >>>> >> > >>>> >> > Thanks again, >>>> >> > Gunter >>>> >> > >>>> >> > On 01.05.18, 22:55, "Chris Plummer" >>>> >> <chris.plum...@oracle.com> wrote: >>>> >> > >>>> >> > Hi Gunter, >>>> >> > >>>> >> > The output you are adding is all useful. I >>>> think the >>>> >> question is (and >>>> >> > I'd like to see a few people chime in on this >>>> for this >>>> >> review) is >>>> >> > whether or not all of it is the appropriate for >>>> a >>>> >> thread's stack dump. >>>> >> > For example, defined_classes is on the fringe >>>> of what >>>> >> I would call >>>> >> > generally useful info in a stack dump. Sure, >>>> there >>>> >> might be that rare >>>> >> > case when you need it, but how often compared >>>> to other >>>> >> useful info >>>> >> > maintained on a per thread basis. How many >>>> other bits >>>> >> of useful info are >>>> >> > not being printed in favor of defined_classes? >>>> It >>>> >> seems you have more in >>>> >> > the queue. How many? I'm worried about how >>>> cluttered >>>> >> the stack dumps >>>> >> > will get. Maybe we should add some sort of >>>> verbose >>>> >> thread dumping >>>> >> > option. Just a thought. >>>> >> > >>>> >> > As for the implementation, overall it looks >>>> good, but >>>> >> I can't speak to >>>> >> > whether or not you are doing proper accounting >>>> of >>>> >> defined_classes and >>>> >> > bytes allocated. You'll need input from someone >>>> with >>>> >> more knowledge of >>>> >> > those areas. We'll also need to do some testing >>>> to >>>> >> make sure tool tests >>>> >> > are not impacted. >>>> >> > >>>> >> > thanks, >>>> >> > >>>> >> > Chris >>>> >> > >>>> >> > On 4/30/18 2:51 AM, Haug, Gunter wrote: >>>> >> > > Hi, >>>> >> > > >>>> >> > > this is an update to an RFR I posted on >>>> hotspot-dev, >>>> >> but it is probably more suitable to post it here. Can I please >>>> have a >>>> >> review and a sponsor for the following enhancement: >>>> >> > > >>>> >> > > >>>> http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v1 >>>> >> > > >>>> https://bugs.openjdk.java.net/browse/JDK-8200720 >>>> >> > > >>>> >> > > We at SAP have extended the thread dumps >>>> (obtained >>>> >> by jstack or jcmd) by several fields providing thread specific >>>> >> information. These extensions are quite popular with our >>>> support >>> >>> team. >>>> >>>> >> With some knowledge of the architecture of the application, >>>> they >>> >>> often >>>> >>>> >> allow for quick and simple diagnosis of a running system. >>>> Therefore >>> >>> we >>>> >>>> >> would like to contribute these enhancements. >>>> >> > > >>>> >> > > I took a few simple examples here, namely cpu >>>> time, >>>> >> elapsed time since thread creation, bytes allocated and classes >>>> >> defined by the thread and the pthread-id or equivalent on >>>> platforms >>>> >> where it makes sense. Provided it is known how the application >>> >>> should >>>> >>>> >> behave, a misbehaving thread can identified easily. >>>> >> > > >>>> >> > > There is no measurable overhead for this >>>> >> enhancement. However, it may be a problem that the format of >>>> the >>>> >> output is changed. Tools parsing the output may have to be >>>> changed. >>>> >> > > >>>> >> > > Here is an example of the output generated: >>>> >> > > >>>> >> > > >>>> ------------------------------------------------------ >>>> >> > > >>>> >> > > "main" #1 prio=5 os_prio=31 cpu=6300.65ms >>>> >> elapsed=123.28s allocated=242236760B defined_classes=1725 >>>> >> tid=0x00007fa13a806000 nid=0x1c03 pthread-id=0x109708000 >>>> waiting >>> >>> on >>>> >>>> >> condition [0x0000000109707000] >>>> >> > > java.lang.Thread.State: TIMED_WAITING >>>> (sleeping) >>>> >> > > JavaThread state: _thread_blocked >>>> >> > > Thread: 0x00007fa13a806000 [0x1c03] State: >>>> >> _at_safepoint _has_called_back 0 _at_poll_safepoint 0 >>>> >> > > JavaThread state: _thread_blocked >>>> >> > > at java.lang.Thread.sleep(java.base/Native >>>> Method) >>>> >> > > ... >>>> >> > > >>>> ------------------------------------------------------ >>>> >> > > >>>> >> > > As mentioned above, we have a whole bunch of >>>> other >>>> >> enhancements to the thread dump similar to this one and would >>>> be >>>> >> willing to contribute them if there is any interest. >>>> >> > > >>>> >> > > Thanks and best regards, >>>> >> > > Gunter >>>> >> > > >>>> >> > > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> >>>> > >>>> > >>>> >>>> >