> > On Mar 9, 2010, at 3:36 PM, Robert Joly wrote:
> >
> > >>
> > >>
> > >> On Mar 9, 2010, at 1:16 PM, Robert Joly wrote:
> > >>
> > >>> Mardy wrote:
> > >>>>
> > >>>>
> > >>>> On Mar 9, 2010, at 10:07 AM, Robert Joly wrote:
> > >>>>
> > >>>>> Hi guys,
> > >>>>> I have been investigating XX-7634, which reports high CPU
> > >>>>> utilization by java processes after an ISO install of the
> > >>>>> commercial version (SCS), which uses the IBM JVM.
> > >>>>>
> > >>>>> Basically, after an ISO install I'm seeing that *all* the
> > >>>>> java-based processes chew up between 50% and 100% of one
> > >>>>> processor and stay like that for as long as I keep the box up
> > >>>>> (a few hours). The processes in question are sipXpage,
> > >>>>> sipXivr, sipXrelay, sipXconfig, sipXrest and sipXprovision.
> > >>>>> Using jconsole I was able to find that the hot thread for
> > >>>>> each of these services is called 'Attach Handler'
> > >>>>> (com.ibm.tools.attach.javaSE.AttachHandler.run()). I also
> > >>>>> found that I can eliminate the high CPU condition completely
> > >>>>> on a fresh install by hand-editing the launch command for
> > >>>>> each of these java processes to add the following property:
> > >>>>> -Dcom.ibm.tools.attach.enable=no
> > >>>>>
> > >>>>> If I add this property then all the processes are
> > >>>>> well-behaved, but I do not understand the fundamental reason
> > >>>>> why the hot thread is there in the first place. I'm therefore
> > >>>>> turning to the Java gods tuned in to this list to see if they
> > >>>>> have had previous encounters with this.
> > >>>>>
> > >>>>> Thanks in advance,
> > >>>>> bob
> > >>>>
> > >>>> I would like to know why this issue has all of a sudden shown
> > >>>> up on the radar. The Attach API and the supporting
> > >>>> AttachHandler thread were introduced, as a result of an
> > >>>> upgrade to the IBM JVM, back on 2009-11-14. Is it possible
> > >>>> that the high CPU utilization has been there since then but no
> > >>>> one noticed it until now?
> > >>>
> > >>> Refresh my memory, will you? Why are we using IBM in the
> > >>> first place?
> > >>
> > >> Initially because it was the only option for supporting one of
> > >> our customers. In addition, we have discovered that it offers
> > >> some very attractive memory optimization features not offered by
> > >> other JVMs that we may need to employ as the number of
> > >> Java-based services increases.
> > >>
> > >>>
> > >>> I do not know when the high CPU behavior started showing up,
> > >>> but it was first reported on 2010-02-10. Also, not every system
> > >>> exhibits the behavior. For example, our friends at Qantom and I
> > >>> can reproduce the problem, but Al Campbell and Chris Parfitt
> > >>> cannot on their systems. I'm using a Dell R300 while Qantom, Al
> > >>> and Chris are using Dell Optiplexes. I have not identified the
> > >>> ingredient that makes this hot thread appear and, as far as I
> > >>> can tell, IBM does not publish the source code for their attach
> > >>> implementation.
> > >>>
> > >>> The thread seems to be looping around waiting for a semaphore.
> > >>> Please see the attached screenshot for a visual of the stack
> > >>> trace.
> > >>>
> > >>
> > >> Is this actually causing a problem or is it just a red herring?
> > >> If it is in fact impacting the performance of the system, then
> > >> disable it.
> > >
> > >
> > > Here's a sample of top running on a bad system. This is running
> > > on a quad-core machine; 60% of it is spent in the kernel and 17%
> > > in user space. I'm not sure which way the priorities go, but I'm
> > > hoping that the processes with lower numbers have higher
> > > priority...
> > >
> > > Would you agree that this is a problematic case?
> >
> > Is that a view of an idle system or one that is under heavy load?
> > If the system is idle, then no conclusion can be drawn from that
> > data.
> >
> > I suggest that you take the safe route and just disable the Attach
> > API.
>
> I agree with this suggestion. I could modify the startup script for
> each java process to add a -Dcom.ibm.tools.attach.enable=no
> argument, but this would not be optimal: new java processes that get
> introduced later may forget to do this.
>
> Instead, I was toying with the idea of modifying the 'sipx-config'
> script that is used to generate the string used to launch 'java'
> (i.e. /usr/bin/java). Every startup script for java processes
> invokes it, and it appears to me that it would be a good central
> place to put my -Dcom.ibm.tools.attach.enable=no argument so that it
> gets applied to all present and future java processes.
>
> Comments?
So, I went ahead and implemented that fix and it did bring the CPU
utilization of our idle Java processes down to 0%, which is where we
want it. However, that solution is not good enough because it only
applies to the Java-based processes that the sipXecs team manages; it
does not reach the "other" external Java processes we carry, openfire
being the leading (and possibly only) example. I do not believe that
adding a -Dcom.ibm.tools.attach.enable=no argument to the
openfire-supplied launch script is a good idea because 1) of the
licensing ambiguities that this may bring and 2) it does not solve the
problem across the board.
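
For reference, the central change I had in mind in sipx-config would
look roughly like this. 'JavaCmd' is a hypothetical variable name; the
real script's variables may differ. Unknown -D system properties are
simply ignored by JVMs that do not recognize them, so applying this
unconditionally should be safe:

```shell
# Hypothetical sketch of the central change in sipx-config; the actual
# variable names in that script may differ.
JavaCmd="/usr/bin/java"
# Disable the IBM Attach API for every JVM launched through this
# script, present and future.  Harmless on JVMs that do not recognize
# the property.
JavaCmd="$JavaCmd -Dcom.ibm.tools.attach.enable=no"
```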
Given that, I went back to the drawing board trying to understand the
fundamental reason why the 'Attach' threads start spinning in the
first place. To make a long story short, after many gyrations I found
that the spinning threads try to write to a temporary directory called
/tmp/.com_ibm_tools_attach/ but do not have the necessary permissions
to do so.

More specifically, the /tmp/.com_ibm_tools_attach/ directory gets
created by the first Java process to run on the box. On a fresh
install, the first Java processes are the setup ones: sipxkeystoregen,
sipxconfig-setup and sipxopenfire-setup.sh. These three Java setup
programs are launched by the do_setup() function of the sipxecs launch
script and run as root. As a result, the /tmp/.com_ibm_tools_attach/
directory they create is owned by root:root. Since the sipXecs
Java-based processes run as sipxchange, their attempts to write to
/tmp/.com_ibm_tools_attach/ fail, and they keep retrying over and over
again, resulting in high CPU utilization.

To prove the theory, I added the following two lines to the do_setup()
function of the sipxecs launch script before launching any process,
and the CPU problem went away immediately:
mkdir /tmp/.com_ibm_tools_attach/
chown sipxchange:sipxchange /tmp/.com_ibm_tools_attach/
Do these two lines seem like a decent approach to solving the high CPU
problem, or does anybody have a cleverer way to ensure that the
/tmp/.com_ibm_tools_attach/ permissions allow our Java-based sipXecs
processes to write to it?
Comments?
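
For concreteness, a slightly more defensive variant of those two lines
might look like this. The path and the sipxchange account are as
above; the -p flag and the user-existence guard are my own additions
and have not been tested against the IBM JVM:

```shell
# Defensive version of the do_setup() fix (sketch, not the committed
# change): create the attach directory before any JVM does, then hand
# it to the service account so the per-service writes succeed.
ATTACH_DIR=/tmp/.com_ibm_tools_attach
mkdir -p "$ATTACH_DIR"    # idempotent: safe if the directory already exists
# Only chown if the service account actually exists on this box
# (e.g. skip on a dev machine without a sipxchange user).
if id sipxchange >/dev/null 2>&1; then
    chown sipxchange:sipxchange "$ATTACH_DIR"
fi
```

Note that tightening the mode (e.g. chmod 0700) would lock out any
external Java process running as a different user, openfire included,
so I left the permissions alone here.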
[BTW, I'm still puzzled by the fact that some people cannot reproduce
the problem. With the failure mechanism I just highlighted, it seems to
me that we should always get high CPU utilization on every fresh reclone
of a sipXecs. This part is still a mystery to me...]
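
For the folks who cannot reproduce it: a quick way to check a box is
to look at who owns the attach directory (a sketch; root:root
ownership here is what would predict the spinning threads for services
running as sipxchange):

```shell
# Report ownership of the IBM JVM attach working directory.
ATTACH_DIR=/tmp/.com_ibm_tools_attach
if [ -d "$ATTACH_DIR" ]; then
    ls -ld "$ATTACH_DIR"
else
    echo "$ATTACH_DIR does not exist -- has any Java process run yet?"
fi
```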
_______________________________________________
sipx-dev mailing list [email protected]
List Archive: http://list.sipfoundry.org/archive/sipx-dev
Unsubscribe: http://list.sipfoundry.org/mailman/listinfo/sipx-dev
sipXecs IP PBX -- http://www.sipfoundry.org/