On Tue, 27 Jan 2009 20:13:59 +0000, Matthew Toseland wrote:
> On Tuesday 27 January 2009 20:03, Dennis Nezic wrote:
> > On Tue, 27 Jan 2009 12:44:59 -0500, Dennis Nezic wrote:
> > > On Wed, 21 Jan 2009 17:28:47 +0000, Matthew Toseland wrote:
> > > > Give it more memory. If you can't give it more memory, throw
> > > > the box out the window and buy a new one. If you can't do that
> > > > wait for the db4o branch.
> > > 
> > > Or, more likely, throw freenet out the window :|.
> > > 
> > > > Seriously, EVERY time I have investigated these sorts of issues
> > > > the answer has been either that it is showing constant Full
> > > > GC's because it has slightly too little memory, or that there
> > > > is external CPU load. Are you absolutely completely totally
> > > > 100000000000000000000000000% sure that that is not the problem?
> > > > AFAICS there are two posters here, and just because one of them
> > > > is sure that the problem isn't memory doesn't necessarily mean
> > > > that the other one's problems are not due to memory??
> > > 
> > > My node crashed/restarted again due to MessageCore/PacketSender
> > > freezing for 3 minutes. The problem appears to be with cpu usage,
> > > since my memory usage is basically plateauing when the crash
> > > occurs, though I suppose the two factors may not be necessarily
> > > entirely unrelated. My cpu load (i.e. as reported by uptime) would
> > > sometimes rise pretty dramatically, with a 15-min load number
> > > hovering between 3 and 4, which brings my system to a crawl, and
> > > I guess this eventually "freezes" some threads in freenet, and
> > > then triggers the shutdown.
> > 
> > Restarting the node "fixes" the cpu-load problem, even though the
> > node is doing exactly the same stuff as before, at least from the
> > user's perspective. So, clearly, the problem is not just "slow and
> > obsolete" hardware as you suggest, but something else internal to
> > the code, that grows out of control over time--over the course of
> > dozens of hours.
> 
> I.e. memory usage. QED!
> 
> Memory usage was plateauing = memory usage was constantly at the
> (low) maximum, and it was using 100% CPU in a vain attempt to reclaim
> the last byte of memory. This is the most likely explanation by far:
> Can you categorically rule it out by checking freenet.loggc? You did
> add the wrapper.conf line I mentioned?:
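
(The line itself didn't make it into the quote above, but for
reference: on a node run under the Java Service Wrapper, GC logging is
usually enabled with something along these lines in wrapper.conf --
the slot numbers are just examples, any free indices will do:

wrapper.java.additional.5=-verbose:gc
wrapper.java.additional.6=-Xloggc:freenet.loggc
wrapper.java.additional.7=-XX:+PrintGCDetails
wrapper.java.additional.8=-XX:+PrintGCTimeStamps

All four flags are standard on Sun JVMs of this vintage.)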

Hrm. Upon closer inspection of my latest loggc,
http://dennisn.dyndns.org/guest/pubstuff/loggc-freezes.log.bz2

It appears that memory may in fact be an issue, but I don't think it's
the memory limit itself. For this last test I set my java memory limit
to 250MB, and the logs show usage never went much above 200MB. BUT,
looking at the last few Full GCs, the time they took to complete
increased rapidly near the end, and the last one took over 3
minutes(!), which probably triggered the "freeze".
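
(Assuming the usual Sun GC log format, where each Full GC entry ends
in something like ", 185.1234567 secs]", the pause times are easy to
pull out:

bzcat loggc-freezes.log.bz2 | grep "Full GC" | grep -o "[0-9.]* secs"

That makes the upward trend visible without reading the whole log.)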

My system only has 384MB of physical RAM and 400MB of swap in a
swapfile (all of which is on Raid5/LVM :b). My current theory is that
the terribly long Full GCs are due to long disk I/O times from
accessing the Raid5/LVM swapfile. "man java" shows an interesting
option, "-Xincgc", which seems to avoid Full GCs:

"
Enable the incremental garbage collector. The incremental
garbage collector, which is off by default, will reduce the
occasional long garbage-collection pauses during program
execution. The incremental garbage collector will at times
execute concurrently with the program and during such times
will reduce the processor capacity available to the program.
"

I'll see if that has any effect. (Is there any way to make the JVM
more forgiving, so that it can survive garbage collections longer than
3 minutes?)
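
(One guess: if the restart is actually the service wrapper's ping
timeout firing, rather than freenet's own watchdog, then raising that
timeout in wrapper.conf might let the JVM ride out a long collection:

wrapper.ping.timeout=300

The value is in seconds; whether that's the mechanism firing here, I
don't know.)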

Here is my vmstat output, in 60s samples, without freenet running.
Clearly, with < 10M of physical memory free, the swapfile will be used
heavily :o.

# vmstat -S M 60
---------memory---------- ---swap-- -----io---- --system-- ----cpu----
swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 101      9     37    179    0    0    33   155  471   192 19  2 74  5
 101      5     37    179    0    0    79    21  442   179 20  4 66 11
 101      8     37    179    0    0    71     5  441   114 68  3 25  3
 101      8     37    180    0    0    59    32  465   168 19  2 73  5

Here is the same vmstat with freenet running:

 196      4      5     45    0    0   267   137  518   540 39  3 38 20
 196      4      6     49    0    0    80   184  486   371 36  2 54  8
 196      7      6     46    0    0    18    39  486   303 30  1 63  6
 196     11      7     41    0    0    88   109  472   341 31  2 62  4

More swap space is in use, and there is more disk I/O: bi and bo
(blocks read from and written to disk per second) have roughly
doubled, cpu us (time spent running non-kernel code) has more than
doubled, and cpu wa (time spent waiting for I/O, the last column) is
somewhat higher.
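
(A direct way to test the swap theory: leave a timestamped vmstat
running and check whether si/so spike exactly while a long Full GC is
in progress -- the strftime call below needs gawk:

vmstat -S M 5 | gawk '{ print strftime("%H:%M:%S"), $0 }'

The timestamps can then be lined up against the GC log.)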

My fingers are crossed with this -Xincgc option.
