Re: [dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
You may want to cross-post to a Java alias, but I've been down this road before.

Java will call into malloc() for buffers for network reads and writes that are larger than 2k bytes (the 2k is from memory, and I think it was a 1.5 JVM). A large number of malloc calls, and the resulting contention on locks in the library, are due to the application doing network writes larger than 2k.

Newer JVMs (1.6) may improve this, but I'm not sure. There's also an alternative set of classes and methods, NIO, which can also help (although I've heard tell that NIO brings other problems along with it, but I cannot speak from personal experience).

At this point, I think you need to consult with Java experts to determine what options you have for allocating network IO buffers from the Java heap, versus the current behavior of the JVM dropping back to malloc for those buffers.

The other option, of course, is determining whether the code can be changed to use buffers smaller than 2k.

Thanks,
/jim

Kleyson Rios wrote:
> OK Jonathan,
>
> I understand.
>
> So, looking in the right place now, I can see few locks and sometimes no
> locks (just Mutex Hold). But I still have many threads at 100% LCK.
>
> If I don't have a lot of locks, where is my problem?
>
> Running Rickey C. Weisner's script I get:
>
> (...)
>   25736
>       libc.so.1`_so_send+0x15
>       libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
>       libjvm.so`JVM_Send+0x32
>       libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
>       0xc3c098d3
>        10
>   25736
>       0xc3d2a33a
>        14
>   25736
>       libc.so.1`_write+0x15
>       libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>       libjvm.so`JVM_Write+0x30
>       libjava.so`0xc8f7c04b
>        16
>   25736
>       libc.so.1`stat64+0x15
>        21
>   25736
>       libc.so.1`_write+0x15
>       libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>       libjvm.so`JVM_Write+0x30
>       libjava.so`0xc8f80ce9
>        76
>
> java    25736    kernel-level lock        1
> java    25736    shuttle                  6
> java    25736    preempted                7
> java    25736    user-level lock        511
> java    25736    condition variable     748
>
> Regards,
>
> --
> Kleyson Rios.
> Technical Support Management
> Support Analyst / Team Lead
>
> -----Original Message-----
> From: Jonathan Adams [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 15:40
> To: Kleyson Rios
> Cc: dtrace-discuss@opensolaris.org
> Subject: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)
>
> On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
>
>> Hi przemol,
>>
>> Below is the output of plockstat for malloc and libumem. Both show many locks.
>> Why didn't I get fewer locks after changing to libumem?
>
> You're looking at Mutex hold statistics, which don't mean a lot
> (unless contention is caused by long hold times).
>
> The important thing for multi-threaded performance is *contention*
> (spinning and blocking). Those are the statistics you should be
> looking at.
>
> Both malloc and libumem use locks to protect their state; libumem
> just uses many locks, in order to reduce contention.
>
> Cheers,
> - jonathan

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
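As an illustration of the pattern Jim describes (small writes served from a fixed buffer, larger ones falling back to malloc()), here is a minimal C sketch. It is not the JVM's actual code: the function name write_via_scratch(), the heap_fallbacks counter, and the 2048-byte threshold (taken from the "2k, from memory" figure above) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed threshold, based on the "2k" figure quoted from memory above. */
#define STACK_BUF_LEN 2048

/* How often the malloc() path was taken (not thread-safe; demo only). */
unsigned long heap_fallbacks;

/*
 * Hypothetical stand-in for a native socket-write helper: copies `len`
 * bytes into a scratch buffer before handing them to the OS.
 * Returns 0 on success, -1 if malloc() fails.
 */
int write_via_scratch(const char *data, size_t len)
{
    char stack_buf[STACK_BUF_LEN];
    char *buf = stack_buf;

    assert(data != NULL);

    if (len > STACK_BUF_LEN) {
        /* Large write: this is the call that takes the allocator's lock. */
        buf = malloc(len);
        if (buf == NULL)
            return -1;
        heap_fallbacks++;
    }

    memcpy(buf, data, len);
    /* A real implementation would call send() or write() on buf here. */

    if (buf != stack_buf)
        free(buf);
    return 0;
}
```

With this shape, a caller that keeps every write at or below the threshold never reaches malloc(), which is the effect of the "use smaller buffers" option mentioned above.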
[dtrace-discuss] Help with some custom kernel sdt probes
I'm playing with FreeBSD's port of DTrace. We're adding some SDT probes to some kernel modules so we can time some events.

The probes load fine and they show up in dtrace -l, but I can't manage to access their arguments. Here's a snippet of the code:

    static void
    em_intr(void *arg)
    {
            struct adapter *adapter = arg;
            struct ifnet *ifp = adapter->ifp;
            uint32_t reg_icr;

            SDT_INTR_PROBE(em, interrupt_start, adapter->dev, &em_intr, ifp, adapter, 0);

If I want to access some of the struct ifnet's fields, like this:

    sdt:::interrupt_end
    {
            this->ifnet = (struct ifnet *)arg2;
            @[this->ifnet->if_xname] = quantize(timestamp - self->ts);
    }

I get the following error:

    dtrace: failed to compile script sdt-test2.d: line 13: operator -> cannot be applied to a forward declaration: no struct ifnet definition is available

Is there anything I'm missing to make the struct definition visible to a D script?

Any help/suggestions would be greatly appreciated.

Fer
[dtrace-discuss] SUSTAINABILITY OF OPEN SOURCE COMMUNITIES
Hey all,

I'm a student of international business at the University of Amsterdam (UvA). For my Master's thesis I'm conducting an investigation into the main factors that influence the sustainability of open source communities. To obtain empirical confirmation of my research, I need to conduct a survey and collect information from members of open source communities.

Could you please devote 5 minutes of your time to fill in my questionnaire? You can do it by following the link below:

http://www.thesistools.com/?qid=50932&ln=eng

Thanks a lot. I'll inform you about the results of my research.

Kind regards,
Giuseppe Vaccaro

--
This message posted from opensolaris.org
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Tue, Apr 22, 2008 at 11:34:32AM -0700, Roman Shaposhnik wrote:
> Without knowing the details of how the structure to which t_sobj_ops is
> pointing gets managed it seems to me that there's a tiny window of
> opportunity between recording the address of the structure into
> this->tmp and the structure itself being deallocated/reused for something
> else. Of course, the recorded address will still be valid, but once
> you do this->tmp->sobj_type you might get garbage.
>
> Can this happen?

No. t_sobj_ops is always assigned the address of a static structure that is never deallocated. But it's a fair point for other cases, and I don't think there's a generic solution.

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Tue, 2008-04-22 at 11:21 -0700, Adam Leventhal wrote:
> On Tue, Apr 22, 2008 at 10:57:26AM -0700, Roman Shaposhnik wrote:
> > On Tue, 2008-04-22 at 10:21 -0700, Adam Leventhal wrote:
> > > On Tue, Apr 22, 2008 at 09:37:57AM -0400, Lytvyn, Oleksandr (IT) wrote:
> > > > That's an interesting analysis. Hope we can have a fix soon.
> > >
> > > I forgot to mention that you can work around the issue by replacing
> > > curlwpsinfo->pr_stype with:
> > >
> > > (((this->tmp = curthread->t_sobj_ops) != NULL) ? this->tmp->sobj_type : 0)
> >
> > Isn't there a race-condition still (although a much less probable one)?
>
> I don't think so; can you describe the race condition you see?

Without knowing the details of how the structure to which t_sobj_ops is pointing gets managed it seems to me that there's a tiny window of opportunity between recording the address of the structure into this->tmp and the structure itself being deallocated/reused for something else. Of course, the recorded address will still be valid, but once you do this->tmp->sobj_type you might get garbage.

Can this happen?

Thanks,
Roman.
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Tue, Apr 22, 2008 at 10:57:26AM -0700, Roman Shaposhnik wrote:
> On Tue, 2008-04-22 at 10:21 -0700, Adam Leventhal wrote:
> > On Tue, Apr 22, 2008 at 09:37:57AM -0400, Lytvyn, Oleksandr (IT) wrote:
> > > That's an interesting analysis. Hope we can have a fix soon.
> >
> > I forgot to mention that you can work around the issue by replacing
> > curlwpsinfo->pr_stype with:
> >
> > (((this->tmp = curthread->t_sobj_ops) != NULL) ? this->tmp->sobj_type : 0)
>
> Isn't there a race-condition still (although a much less probable one)?

I don't think so; can you describe the race condition you see?

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Tue, 2008-04-22 at 10:21 -0700, Adam Leventhal wrote:
> On Tue, Apr 22, 2008 at 09:37:57AM -0400, Lytvyn, Oleksandr (IT) wrote:
> > That's an interesting analysis. Hope we can have a fix soon.
>
> I forgot to mention that you can work around the issue by replacing
> curlwpsinfo->pr_stype with:
>
> (((this->tmp = curthread->t_sobj_ops) != NULL) ? this->tmp->sobj_type : 0)

Isn't there a race-condition still (although a much less probable one)?

Thanks,
Roman.
[dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
OK Jonathan,

I understand.

So, looking in the right place now, I can see few locks and sometimes no locks (just Mutex Hold). But I still have many threads at 100% LCK.

If I don't have a lot of locks, where is my problem?

Running Rickey C. Weisner's script I get:

(...)
  25736
      libc.so.1`_so_send+0x15
      libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
      libjvm.so`JVM_Send+0x32
      libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
      0xc3c098d3
       10
  25736
      0xc3d2a33a
       14
  25736
      libc.so.1`_write+0x15
      libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
      libjvm.so`JVM_Write+0x30
      libjava.so`0xc8f7c04b
       16
  25736
      libc.so.1`stat64+0x15
       21
  25736
      libc.so.1`_write+0x15
      libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
      libjvm.so`JVM_Write+0x30
      libjava.so`0xc8f80ce9
       76

java    25736    kernel-level lock        1
java    25736    shuttle                  6
java    25736    preempted                7
java    25736    user-level lock        511
java    25736    condition variable     748

Regards,

--
Kleyson Rios.
Technical Support Management
Support Analyst / Team Lead

-----Original Message-----
From: Jonathan Adams [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 15:40
To: Kleyson Rios
Cc: dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)

On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:

> Hi przemol,
>
> Below is the output of plockstat for malloc and libumem. Both show many locks.
> Why didn't I get fewer locks after changing to libumem?

You're looking at Mutex hold statistics, which don't mean a lot (unless contention is caused by long hold times).

The important thing for multi-threaded performance is *contention* (spinning and blocking). Those are the statistics you should be looking at.

Both malloc and libumem use locks to protect their state; libumem just uses many locks, in order to reduce contention.

Cheers,
- jonathan
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Tue, Apr 22, 2008 at 09:37:57AM -0400, Lytvyn, Oleksandr (IT) wrote:
> That's an interesting analysis. Hope we can have a fix soon.

I forgot to mention that you can work around the issue by replacing curlwpsinfo->pr_stype with:

    (((this->tmp = curthread->t_sobj_ops) != NULL) ? this->tmp->sobj_type : 0)

> It also makes me wonder: how many races of that kind are there? And is
> it possible to have a comprehensive fix or approach to handle stuff like
> this?

There may be other problems like this, but the same fix can apply to all of them. If we go with a compiler change that will cause the DIF to only load the member once in a given probe firing, then that will clean up similarly affected situations. If we create a language construct to make a new local scope, then we'll need to vet our translators and fix them individually.

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl
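The difference between the translator's shape (test the pointer, then load it again) and the workaround above (load once into a temporary) can be sketched in C. This is a simplified model, not kernel code. The structs below are cut-down stand-ins for the real ones, the `between` hook is a hypothetical device that simulates another thread running in the middle of the expression, and the -1 return marks the spot where the real code would dereference NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures named in the thread. */
struct sobj_ops { int sobj_type; };
struct thread   { _Atomic(const struct sobj_ops *) t_sobj_ops; };

/* Static and never freed, matching what Adam says about t_sobj_ops targets. */
const struct sobj_ops demo_ops = { 7 };

/* Hypothetical hook standing in for "another thread runs here". */
typedef void (*interleave_fn)(struct thread *);

void clear_ops(struct thread *t) { atomic_store(&t->t_sobj_ops, NULL); }

/* Translator shape: the pointer is loaded twice, so it can change in between. */
int stype_double_load(struct thread *t, interleave_fn between)
{
    assert(t != NULL);
    if (atomic_load(&t->t_sobj_ops) == NULL)      /* first load: the test */
        return 0;
    if (between != NULL)
        between(t);
    const struct sobj_ops *ops = atomic_load(&t->t_sobj_ops); /* second load */
    return ops != NULL ? ops->sobj_type : -1;     /* -1: would fault on NULL */
}

/* Workaround shape: one load, then only the saved copy is used. */
int stype_single_load(struct thread *t, interleave_fn between)
{
    assert(t != NULL);
    const struct sobj_ops *tmp = atomic_load(&t->t_sobj_ops);
    if (between != NULL)
        between(t);
    return tmp != NULL ? tmp->sobj_type : 0;
}
```

Calling stype_double_load(&t, clear_ops) reaches the -1 path, while stype_single_load(&t, clear_ops) still returns the type it saw. The single-load version is only safe because the saved pointer refers to a static object. For data that can be freed, the concern Roman raises elsewhere in this thread would still apply.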
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
Adam,

That's an interesting analysis. Hope we can have a fix soon.

It also makes me wonder: how many races of that kind are there? And is it possible to have a comprehensive fix or approach to handle stuff like this?

Thanks,

Oleksandr Lytvyn
Morgan Stanley | Technology
210 Carnegie Center, 4th Floor | Princeton, NJ 08540
Phone: +1 609 936-4026
Mobile: +1 732 773-4145
[EMAIL PROTECTED]

> -----Original Message-----
> From: Adam Leventhal [mailto:[EMAIL PROTECTED]
> Sent: Monday, April 21, 2008 2:00 PM
> To: Lytvyn, Oleksandr (IT)
> Cc: dtrace-discuss@opensolaris.org
> Subject: Re: [dtrace-discuss] whatfor.d -- where's null pointer?
>
> Hi Oleksandr,
>
> This turned out to be a rather interesting problem. To investigate, I used
> this script:
>
> ---8<---
> off-cpu
> {
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
>         this->tmp = curlwpsinfo->pr_stype;
> }
>
> ERROR
> {
>         @[arg2] = count();
> }
> ---8<---
>
> Which resulted in a table like this:
>
>         9                1
>        10                1
>         5                2
>         7                2
>         1                3
>         3                3
>         2                5
>         6                5
>         4               10
>
> So curlwpsinfo->pr_stype can work and later fail. Looking at the translator
> for that field we see that it looks like this:
>
>         pr_stype = T->t_sobj_ops ? T->t_sobj_ops->sobj_type : 0;
>
> This compiles to this DIF code:
>
> OFF OPCODE      INSTRUCTION
> 00: 29010001    ldgs DT_VAR(256), %r1           ! DT_VAR(256) = "curthread"
> 01: 2502        setx DT_INTEGER[0], %r2         ! 0x0
> 02: 04010201    sll  %r1, %r2, %r1
> 03: 05010201    srl  %r1, %r2, %r1
> 04: 0e010002    mov  %r1, %r2
> 05: 25000103    setx DT_INTEGER[1], %r3         ! 0x88
> 06: 07020302    add  %r2, %r3, %r2
> 07: 22020002    ldx  [%r2], %r2
> 08: 1002        tst  %r2
> 09: 1211        be   17
> 10: 0e010002    mov  %r1, %r2
> 11: 25000103    setx DT_INTEGER[1], %r3         ! 0x88
> 12: 07020302    add  %r2, %r3, %r2
> 13: 22020002    ldx  [%r2], %r2
> 14: 1e020002    ldsw [%r2], %r2
> 15: 0e020002    mov  %r2, %r2
> 16: 1112        ba   18
> 17: 2502        setx DT_INTEGER[0], %r2         ! 0x0
> 18: 25000203    setx DT_INTEGER[2], %r3         ! 0x38
> 19: 04020302    sll  %r2, %r3, %r2
> 20: 2e020302    sra  %r2, %r3, %r2
> 21: 2302        ret  %r2
>
> We can see that we load the t_sobj_ops member once at offset 07 and then
> again at offset 13 (right before we load sobj_type at offset 14). The
> t_sobj_ops member can be set to NULL asynchronously from other threads,
> so this double load introduces a window for the failure that you're seeing.
>
> Either we need to use some temporary, probe-local variable (one that can't
> conflict with a user-defined variable), or we need to perform some element
> of optimization on the generated DIF.
>
> I've filed this bug:
>
>         6691541 curlwpsinfo->pr_stype races
>
> Adam
>
> On Fri, Apr 18, 2008 at 04:20:31PM -0400, Lytvyn, Oleksandr (IT) wrote:
> > Hi!
> >
> > Anyone seen this? This one baffles me: I run whatfor.d from
> > /usr/demo/dtrace, and here's what I get:
> >
> > dtrace: script '/usr/demo/dtrace/whatfor.d' matched 12 probes
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu):
> > invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu):
> > invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu):
> > invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu):
> > invalid address (0x0) in action #1 at DIF offset 56
> > ...
> >
> > Multitudes of those. Apparently, action #1 of probe ID 1 is:
> >
> >         self->sobj = curlwpsinfo->pr_stype;
> >
> > So, which address is invalid here? The curlwpsinfo is used in the
> > predicate, so it cannot be 0x0, because it'd complain about the
> > predicate too. And pr_stype is supposed to be char.
> >
> > What's wrong here?
> >
> > Found another error report like that on the web, BTW (in German,
> > accidentally). But no responses there, unfortunately.
> >
> > Thanks,
> >
> > Oleksandr Lytvyn
> > Morgan Stanley | Technology
> > 210 Carnegie Center, 4th Floor | Princeton, NJ 08540
> > Phone: +1 609 936-4026
> > Mobile: +1 732 773-4145
> > [EMAIL PROTECTED]