On Tue, Sep 18, 2007 at 09:52:33PM +0200, Thierry wrote:
> Basically, my load moved from 0 to 1 (tcpdump using 80% of a cpu) with,
> if I remember right when I looked, 80k requests being processed (ntpd was
> using 0.2% of the cpu at this time).
> I did not find such overhead with iptables yet.
I ran some monitoring overnight and, probably because of the new DNS stuff, I only got up to 93 packets per second (when averaged over a minute). I've got 9 hours of data so far and I'm not seeing any correlation between userspace CPU usage (vmstat with 2-second intervals) and the packets that tcpdump was handling.

The tcpdump process currently has about eight seconds of CPU time associated with it. Tcpdump recorded 286178 NTP packets going out, which would imply that the user side of tcpdump spends about 35 microseconds per packet. I've got no reasonable method of determining the CPU time the kernel spends running the pcap code.

If your tcpdump process was indeed seeing 80000 packets, then your CPU usage is probably about right: multiplying my 35 microseconds by 80000 packets gives a CPU time of about 2.8 seconds. I would guess that your box has a newer version of my chip, with a clock speed almost 4 times as high, so that 2.8 seconds could easily account for the 80% CPU usage you saw.

In real life, I don't think you'll ever see 80K NTP packets per second (that's several megabytes of data!) from the Internet, so running tcpdump with suitable filtering shouldn't add any noticeable CPU load. I run a mail server on the same box and I regularly get people trying to break in through SSH, which tends to cane the CPU; I would attribute my visible CPU usage to those processes.

> Then I have a concern with iptables regarding IO, it's generating some
> IO, and as the only other option we have with ulogd is DB (I have few
> experiences that tell me mysql is "less usable" with > 20 million
> entries, but maybe pgsql would be better. The problem is that I would
> like to avoid inserting something into the actual "generate stats
> process").

I use Postgres at work, generally with pretty small datasets (tables up to 100 thousand rows), but some of the stuff gets kind of big (200 million+ rows).
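As a sanity check on the tcpdump arithmetic earlier in this mail, here is a trivial sketch; both figures are the ones quoted above, not fresh measurements:

```python
# Scale the quoted per-packet user CPU cost of tcpdump (~35 microseconds)
# up to the ~80000 packets Thierry reported seeing.
per_packet_seconds = 35e-6   # figure quoted above, measured on my (slower) box
packets = 80_000             # Thierry's reported packet count

cpu_time = per_packet_seconds * packets
print(f"estimated CPU time: {cpu_time:.1f} seconds")  # about 2.8 seconds
```

The per-packet cost will differ between CPUs and pcap versions, so treat this as order-of-magnitude only; the point is that the cost scales linearly with packet count.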
Postgres is better for concurrent access to the data, and its query planner (i.e. the bit of code that rewrites your SQL query into something reasonably efficient) was, the last time I looked, much better than MySQL's. MySQL always tended to be rule-based, which behaved badly when your data didn't look like what the rules were designed for, whereas Postgres generates statistical summaries of your data which it then uses to pick good ways to run your query. Neither way is perfect, but in my experience the statistical way is *much* better. MySQL may have improved since I last used it though.

> I think I'm going to gather all the infos I need and look around. I
> already saw pycap (python module), perhaps writing something in C (I
> agree with you, I think we could have something fast).
>
> I just would like to try to have something fast that can handle a
> possible high load.

There are lots of bindings for Postgres. If you're really interested in speed then don't use a database; they're good for ensuring your data is safe and for ad-hoc queries, not raw throughput. If this is your first time doing something like this you'd probably be better off with a database though, because most of the time you're trying to figure out what you want to know, and writing little bits of SQL is *much* easier than writing lots of anything else.

Sam
_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers
