On Sat, 2007-02-17 at 11:21 +0000, Justin Mason wrote:
> Raul Dias writes:
> > On Sat, 2007-02-17 at 02:07 +0100, Mark Martinec wrote:
> > > On Saturday February 17 2007 01:49, Matthew Wilson wrote:
> > > > I was/am primarily concerned with RAM usage for high-concurrency
> > > > situations.
> > > 
> > > Ok. Still, in my experience about 30 (maybe 50) SA processes can
> > > fully utilize today's CPU & I/O, and it's probably no big deal
> > > to provide about 2 GB of memory to cater for such system.
> > > Also, and unfortunately, multithreading in Perl is rather
> > > cumbersome and not significantly less expensive than fully
> > > individual processes.
> > 
> > Some time ago, experimenting with sa-blacklist.cf and 45 processes
> > brought my system to its knees at 3.5GB (out of memory).
> > 
> > I agree about the thread model.
> > 
> > But sticking to an async I/O model is a valid point.  If implemented
> > correctly it will save a lot of memory and even improve performance a
> > little.
> > 
> > Having separate processes saves the need to check for garbage left
> > over after filtering a message, which would require the code to be
> > rechecked.
> > 
> > However, for uniprocessor systems, having multiple processes running
> > is actually more expensive than an async I/O one.  For multiprocessor
> > systems, just keep one process per CPU, or fewer.
> > 
> > In the past I have played a lot with perl-loop (any loopers around?)
> > which was the only way to go.  It is too low level for most people, but
> > perhaps POE is the way to go today (which can use perl-loop as its
> > base).
> 
> I'm dubious about the benefits for SpamAssassin...
> 
> An async model works very well for network-bound and I/O-bound servers;
> however, SpamAssassin is mainly CPU-bound, since the network and I/O parts
> are already mostly run async during the scan operation.
> 
> Also, the multiple spamd processes share quite a lot of RAM with each
> other -- there's a bug in how Linux reports "shared" memory which makes it
> appear much worse than it is. Read the FAQ for more details.

yep, but ...


01:01:37 kernel: Out of Memory: Killed process 10024 (spamd).
01:01:51 kernel: Out of Memory: Killed process 10044 (spamd).
01:02:05 kernel: Out of Memory: Killed process 10612 (spamd).
01:02:19 kernel: Out of Memory: Killed process 10038 (spamd).
01:02:32 kernel: Out of Memory: Killed process 10602 (spamd).
01:02:45 kernel: Out of Memory: Killed process 10398 (spamd).
01:03:04 kernel: Out of Memory: Killed process 10020 (spamd).
01:03:29 kernel: Out of Memory: Killed process 10015 (spamd).
01:03:42 kernel: Out of Memory: Killed process 10237 (spamd).
01:04:00 kernel: Out of Memory: Killed process 11037 (spamd).
01:04:18 kernel: Out of Memory: Killed process 10478 (spamd).
01:04:34 kernel: Out of Memory: Killed process 11065 (spamd).
01:04:40 kernel: Out of Memory: Killed process 10405 (spamd).
...and so it goes on...

If I remember correctly, spamd was using somewhere between 2 and 5% of
memory as reported by top (45 processes max).

If it were really shared, it would not have collapsed.
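
A rough back-of-the-envelope check of those figures, assuming the 2 to 5%
that top reported was per process and that none of it was actually shared
(both are assumptions, and the function name is only illustrative):

```python
def worst_case_memory_percent(processes, percent_per_process):
    """Total share of RAM if every process's reported memory is private."""
    return processes * percent_per_process


# 45 spamd children at the low and high ends of what top showed.
low = worst_case_memory_percent(45, 2)
high = worst_case_memory_percent(45, 5)
print(f"45 children: {low}% to {high}% of RAM if nothing is shared")
```

Under those assumptions the children alone would want close to, or well
over, all of the RAM, which would fit the OOM kills above.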

My bet is that the model used on Linux is copy-on-write: after a fork,
when a child spamd changes a value, the kernel gives that child its own
copy of the memory (please correct me if I am wrong).  To make it worse,
a Perl script is (AFAIK) data and not code, which makes it harder to
share (especially with evals around).

Even if sharing does happen it is not enough.

OTOH, with an async I/O model, the total memory used would be:
 - the perl interpreter and libraries (this is truly shared in a fork
   model).
 - the compiled perl code and perl libraries.
 - one copy of the parsed rules, the compiled regular expressions and
   any other data not tied to a message or scanner.
 - one M::SA::PerMsgStatus object for each message being scanned
   simultaneously (this is a place to put a limit on).
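
The list above could be sketched roughly like this. It is Python rather
than Perl, just to show the shape; SHARED_RULES, MAX_CONCURRENT_SCANS and
this PerMsgStatus class are made-up stand-ins, not SpamAssassin's real
data structures or API:

```python
import asyncio

# Hypothetical stand-in for the parsed rules and compiled regexes:
# loaded once and shared by every scan running in this process.
SHARED_RULES = {"VIAGRA": "viagra", "FREE_MONEY": "free money"}

# Assumed cap on how many per-message objects may exist at once.
MAX_CONCURRENT_SCANS = 10


class PerMsgStatus:
    """Illustrative per-message state; not the real Mail::SpamAssassin class."""

    def __init__(self, message):
        self.message = message
        self.hits = []


async def scan(message, limiter):
    # The semaphore limits how many PerMsgStatus objects are alive at once.
    async with limiter:
        status = PerMsgStatus(message)
        for name, needle in SHARED_RULES.items():
            if needle in message.lower():
                status.hits.append(name)
            # Yield to the event loop, as a non-blocking DNS or SQL
            # lookup would while waiting for its answer.
            await asyncio.sleep(0)
        return status.hits


async def scan_all(messages):
    limiter = asyncio.Semaphore(MAX_CONCURRENT_SCANS)
    return await asyncio.gather(*(scan(m, limiter) for m in messages))


if __name__ == "__main__":
    print(asyncio.run(scan_all(["Free money now", "hello", "cheap VIAGRA"])))
```

Only the small per-message object is multiplied by the number of
concurrent scans; the rules exist once, and the semaphore is the place
where the limit goes.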

> Still, if someone tries it and can demo increased efficiency...
> go for it ;)

This might require some internal changes to SA.  Every synchronous call
would have to be changed to an asynchronous (NON-BLOCKING) one.  This
might include SQL calls, DNS calls, exec'ing external apps and even
file I/O.
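
As a stopgap for calls that cannot easily be rewritten, a blocking call
can be pushed to a worker thread so the event loop keeps running. This is
not true non-blocking I/O, only a way to keep the loop responsive. Again
a Python sketch; blocking_query is a hypothetical stand-in for a
synchronous SQL or DNS call:

```python
import asyncio
import time


def blocking_query(key):
    # Hypothetical synchronous call: the calling thread waits here.
    time.sleep(0.05)
    return f"result for {key}"


async def nonblocking_query(key):
    # Run the blocking call in the default thread pool so that other
    # messages can make progress while this one waits.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_query, key)


async def run_queries(keys):
    # The queries overlap instead of running one after another.
    return await asyncio.gather(*(nonblocking_query(k) for k in keys))


if __name__ == "__main__":
    print(asyncio.run(run_queries(["a", "b", "c"])))
```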

-Raul Dias


> --j.
