On Thu, 5 Feb 2009 13:49:58 +0000, Matthew Toseland wrote:
> On Thursday 05 February 2009 00:43, Dennis Nezic wrote:
> > On Mon, 2 Feb 2009 17:26:40 -0500, Dennis Nezic wrote:
> > > On Tue, 27 Jan 2009 20:13:59 +0000, Matthew Toseland wrote:
> > > > On Tuesday 27 January 2009 20:03, Dennis Nezic wrote:
> > > > > On Tue, 27 Jan 2009 12:44:59 -0500, Dennis Nezic wrote:
> > > > > > On Wed, 21 Jan 2009 17:28:47 +0000, Matthew Toseland wrote:
> > > > > > > Give it more memory. If you can't give it more memory,
> > > > > > > throw the box out the window and buy a new one. If you
> > > > > > > can't do that wait for the db4o branch.
> > > > > > 
> > > > > > Or, more likely, throw freenet out the window :|.
> > > > > > 
> > > > > > > Seriously, EVERY time I have investigated these sorts of
> > > > > > > issues the answer has been either that it is showing
> > > > > > > constant Full GC's because it has slightly too little
> > > > > > > memory, or that there is external CPU load. Are you
> > > > > > > absolutely completely totally
> > > > > > > 100000000000000000000000000% sure that that is not the
> > > > > > > problem? AFAICS there are two posters here, and just
> > > > > > > because one of them is sure that the problem isn't memory
> > > > > > > doesn't necessarily mean that the other one's problems
> > > > > > > are not due to memory??
> > > > > > 
> > > > > > My node crashed/restarted again due to
> > > > > > MessageCore/PacketSender freezing for 3 minutes. The
> > > > > > problem appears to be with cpu usage, since my memory usage
> > > > > > is basically plateauing when the crash occurs, though I
> > > > > > suppose the two factors aren't necessarily unrelated. My cpu
> > > > > > load (i.e. as reported by uptime) would
> > > > > > sometimes rise pretty dramatically, with a 15-min load
> > > > > > number hovering between 3 and 4, which brings my system to
> > > > > > a crawl, and I guess this eventually "freezes" some threads
> > > > > > in freenet, and then triggers the shutdown.
> > > > > 
> > > > > Restarting the node "fixes" the cpu-load problem, even though
> > > > > the node is doing exactly the same stuff as before, at least
> > > > > from the user's perspective. So, clearly, the problem is not
> > > > > just "slow and obsolete" hardware as you suggest, but
> > > > > something else internal to the code, that grows out of
> > > > > control over time--over the course of dozens of hours.
> > > > 
> > > > I.e. memory usage. QED!
> > > > 
> > > > Memory usage was plateauing = memory usage was constantly at the
> > > > (low) maximum, and it was using 100% CPU in a vain attempt to
> > > > reclaim the last byte of memory. This is the most likely
> > > > explanation by far. Can you categorically rule it out by checking
> > > > freenet.loggc? You did add the wrapper.conf line I mentioned?
> > > 
> > > Hrm. Upon closer inspection of my latest loggc,
> > > http://dennisn.dyndns.org/guest/pubstuff/loggc-freezes.log.bz2
> > > 
> > > It appears that memory may in fact be an issue. But I don't think
> > > it's the memory limit itself. For this last test I set my java
> > > memory limit to 250MB, and the logs show usage never went much above
> > > 200MB. BUT, looking at the last few Full GC's, the time they took to
> > > complete increased rapidly near the end, and the last Full GC took
> > > over 3 minutes(!), which probably triggered the "freeze".
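> > >
> > > (To be clear about how I'm reading the log: I'm just pulling out the
> > > "Full GC" lines and looking at the "... secs" pause time at the end
> > > of each one, roughly
> > >
> > > # bzcat loggc-freezes.log.bz2 | grep "Full GC" | tail
> > >
> > > and those pauses climb steadily until the final 3-minute one.)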
> > > 
> > > My system only has 384MB of physical ram and 400MB of swap in a
> > > swapfile (all of which is on Raid5/LVM :b). My current theory is
> > > that maybe the terribly long Full GCs are due to long disk-io
> > > times resulting from accessing the Raid5/LVM/swapfile. "man java"
> > > shows an interesting option "-Xincgc", which seems to avoid Full
> > > GC's:
> > > 
> > > "
> > > Enable the incremental garbage collector. The incremental
> > > garbage collector, which is off by default, will reduce the
> > > occasional long garbage-collection pauses during program
> > > execution. The incremental garbage collector will at times
> > > execute concurrently with the program and during such times
> > > will reduce the processor capacity available to the program.
> > > "
> > > 
> > > I'll see if that has any effect. (Is there any way to make the jvm
> > > more forgiving, to allow it to handle longer-than-3min garbage
> > > collections?)
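> > >
> > > (For reference, since freenet runs under the wrapper, turning this
> > > on should just be a matter of adding one more JVM argument to
> > > wrapper.conf, something along these lines -- the slot number "N" is
> > > a placeholder for whatever wrapper.java.additional index is free in
> > > your config:
> > >
> > > # use the incremental collector to reduce the long stop-the-world
> > > # Full GC pauses
> > > wrapper.java.additional.N=-Xincgc
> > >
> > > and then restarting the node.)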
> > > 
> > > Here is my vmstat, in 60s samples, without freenet running. So,
> > > clearly, with < 10M of physical memory free, the swapfile will be
> > > used heavily :o.
> > > 
> > > # vmstat -S M 60
> > > ---------memory---------- ---swap-- -----io---- --system-- ----cpu----
> > >  swpd  free  buff  cache    si   so    bi    bo    in    cs  us sy id wa
> > >   101     9    37    179     0    0    33   155   471   192  19  2 74  5
> > >   101     5    37    179     0    0    79    21   442   179  20  4 66 11
> > >   101     8    37    179     0    0    71     5   441   114  68  3 25  3
> > >   101     8    37    180     0    0    59    32   465   168  19  2 73  5
> > > 
> > > Here is the same vmstat with freenet running:
> > > 
> > >  swpd  free  buff  cache    si   so    bi    bo    in    cs  us sy id wa
> > >   196     4     5     45     0    0   267   137   518   540  39  3 38 20
> > >   196     4     6     49     0    0    80   184   486   371  36  2 54  8
> > >   196     7     6     46     0    0    18    39   486   303  30  1 63  6
> > >   196    11     7     41     0    0    88   109   472   341  31  2 62  4
> > > 
> > > More swap space is used, and there is more disk I/O: bi and bo
> > > (blocks read from and written to disk) are almost doubled, us (time
> > > spent running non-kernel code) has more than doubled, and wa (time
> > > spent waiting for I/O, the last column) is somewhat higher.
> > > 
> > > My fingers are crossed with this -Xincgc option.
> > 
> > It didn't appear to have much effect. It still did a few Full GC's (3
> > in 1.8 days, so far more rarely than before), but there was no
> > significant improvement in load or memory management.
> > 
> > http://dennisn.dyndns.org/guest/pubstuff/loggc-freezes-xinc.log.bz2
> > 
> > As before, the GC's become increasingly longer and more erratic near
> > the end. This time, the last Full GC took 99s, with a bunch of long
> > 11s-40s GCs around the same time, which almost certainly
> > contributed the most to the dreaded "3 minute freeze". As usual,
> > the hour before the freeze, CPU load is higher than normal (as are
> > GC timings), then finally spikes even higher, then the freeze and
> > node shutdown.
> > 
> > I'll try lowering the memory I allocate to freenet, and try to free up
> > some more of my precious RAM on my system. (My datastore is currently
> > only 5G, so the bloom filters shouldn't be a problem.) I'll also try
> > to monitor my swapfile activity, to doubly confirm that it is the
> > issue here :|.
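> >
> > (Nothing fancy planned for the monitoring -- just periodically running
> > something like
> >
> > # vmstat -S M 60
> > # free -m
> >
> > while the node is up and watching the si/so columns and the swap
> > "used" figure, to see whether swap traffic climbs along with the GC
> > times.)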
> > 
> > (Is there no way to get rid of constant stupid java GC? Perhaps we
> > should move to C? :P)
> 
> If GC is taking more than a fraction of a second then there is
> something seriously wrong on your system, most likely part of the VM
> has been swapped out.
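>
> (A rough way to check for that is to compare the java process's
> resident size against its configured heap while the node is busy, e.g.
>
> ps -C java -o pid,rss,vsz,args
>
> -- if RSS (in kB) is far below the heap limit even though the GC log
> shows the heap nearly full, the difference is sitting in swap, and a
> Full GC then has to page the whole heap back in.)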

Reducing the memory I allocate to freenet (to 128MB) and freeing up
some more RAM from my other running processes -- all to avoid using
swap -- seems to have worked :P. It's running pretty smoothly now!
Case closed.
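
(For anyone else tuning this: the 128MB ceiling is just the heap limit
in wrapper.conf, i.e. a line along the lines of

wrapper.java.maxmemory=128

-- the point being to keep the whole Java heap small enough to stay in
physical RAM, so a Full GC never has to wait on the swapfile.)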
