Re: find -exec surprisingly slow
--On Monday, August 16, 2004 21:37:13 +0930 "Paul A. Hoadley" <[EMAIL PROTECTED]> wrote:

> On Sun, Aug 15, 2004 at 02:22:02PM -0700, Pat Lashley wrote:
> > Just FYI, Exim, with the ExiScan patches, can reject at SMTP time;
> > and also has a 'fakereject' capability which tells the sender that
> > the message has been rejected; but actually delivers it.
>
> Thanks for the info. I have been thinking of changing MTAs for a while.

I've been using Exim for years now, and in several widely varying installations. I can heartily recommend it as solid, flexible, and capable. And the config file is actually pretty easy to read even in complex or highly customized configurations. (Unlike a certain ancient but still inexplicably popular MTA...)

The FreeBSD port automatically includes the semi-official ExiScan patchset, which adds the ability to do SpamAssassin and anti-virus scanning while the SMTP connection is still open.

The Exim mailing list has a pretty high signal-to-noise ratio; and the folks on it tend to be friendly and helpful. And there's very good on-line documentation at http://www.exim.org/

-Pat
___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: find -exec surprisingly slow
Hello, On Sun, Aug 15, 2004 at 11:56:10AM +0100, Scott Mitchell wrote: > I don't know how committed to qmail you are, but Exim will do this > out of the box. I'm pretty sure it's part of the default config > file. With the exim+exiscan patches (available from ports) you can > get even more creative and integrate virus scanning, SpamAssassin, > etc. with very little effort. Thanks. I have been thinking of changing MTAs for a while. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 02:22:02PM -0700, Pat Lashley wrote: > Could you create a user to get them; and give that user a procmail > (or similar) delivery-time script to file them into subdirs based on > some arbitrary characteristic? Sounds feasible. The sheer volume has overwhelmed me, though, and now I'm just throwing them out. > Just FYI, Exim, with the ExiScan patches, can reject at SMTP time; > and also has a 'fakereject' capability which tells the sender that > the message has been rejected; but actually delivers it. Thanks for the info. I have been thinking of changing MTAs for a while. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
--On Sunday, August 15, 2004 12:30:01 +0930 "Paul A. Hoadley" <[EMAIL PROTECTED]> wrote:

> Good question---without context, my claim that I can do nothing else
> seems wrong. What I should have said is "given I have an interest in
> collecting all the spams to non-existent addresses, I don't think I
> can make qmail do anything other than deliver it to the new/ subdir of
> a Maildir."

Could you create a user to get them; and give that user a procmail (or similar) delivery-time script to file them into subdirs based on some arbitrary characteristic?

> IMHO, these messages should be _rejected_ at the SMTP session, though
> (AFAICS) qmail won't do this (without being patched). (I am sure I
> once read a "security" justification for this behaviour, though I
> can't seem to find any justification for it at all now. I am willing
> to be convinced otherwise, but IMHO, accepting these messages is bogus
> behaviour.) Anyway, I was about to embark on tracking down a patch to
> do SMTP-level rejection, when I decided I would just funnel them into
> a Maildir and use them later to train Bogofilter, or whatever.

Just FYI, Exim, with the ExiScan patches, can reject at SMTP time; and also has a 'fakereject' capability which tells the sender that the message has been rejected; but actually delivers it.

-Pat
Re: find -exec surprisingly slow
On Sun, 15 Aug 2004 11:56:10 +0100, Scott Mitchell <[EMAIL PROTECTED]> wrote:
> On Sun, Aug 15, 2004 at 12:30:01PM +0930, Paul A. Hoadley wrote:
> > Hello,
> >
> > On Sat, Aug 14, 2004 at 09:13:32PM -0500, Gary wrote:
> > > There are several techniques just to block them at SMTP negotiation
> > > all together, so they don't even enter your system...
> >
> > Techniques for qmail? Without patching it? I thought I had RTFMd
> > pretty thoroughly, but I am willing to be enlightened.
>
> Hi Paul,
>
> I don't know how committed to qmail you are, but Exim will do this
> out of the box. I'm pretty sure it's part of the default config
> file. With the exim+exiscan patches (available from ports) you can
> get even more creative and integrate virus scanning, SpamAssassin,
> etc. with very little effort.
>
> Cheers, Scott

I have a "howto" to do this with postfix at http://rapier.digital-euphoria.net/~lordofla/stuff/postfix/howto/

The web based control panel relevant to the howto is in http://rapier.digital-euphoria.net/~lordofla/stuff/postfix/

HTH

-- 
Mark Napper
Owner, digitalEuphoria
http://www.digital-euphoria.net/ - [EMAIL PROTECTED]
0044 7980 992 619
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 12:30:01PM +0930, Paul A. Hoadley wrote: > Hello, > > On Sat, Aug 14, 2004 at 09:13:32PM -0500, Gary wrote: > > > There are several techniques just to block them at SMTP negotiation > > all together, so they don't even enter your system... > > Techniques for qmail? Without patching it? I thought I had RTFMd > pretty thoroughly, but I am willing to be enlightened. Hi Paul, I don't know how committed to qmail you are, but Exim will do this out of the box. I'm pretty sure it's part of the default config file. With the exim+exiscan patches (available from ports) you can get even more creative and integrate virus scanning, SpamAssassin, etc. with very little effort. Cheers, Scott -- === Scott Mitchell | PGP Key ID | "Eagles may soar, but weasels Cambridge, England | 0x54B171B9 | don't get sucked into jet engines" scott at fishballoon.org | 0xAA775B8B | -- Anon
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 01:06:42PM +0930 or thereabouts, Paul A. Hoadley wrote: > On Sat, Aug 14, 2004 at 10:25:46PM -0500, Gary wrote: > > http://lifewithqmail.org/lwq.html#smtp-reject > > > > which will lead you here.. > > > > http://netdevice.com/qmail/rcptck/ > > Thanks. I was fairly sure it couldn't be done without patching. Yes, for the 55x at SMTP level, but there are several other methods so that your queue will not fill with waiting junk from non-existent senders. One is a selective spamassassin setup by .qmail file, called ifspamh, IIRC, so you can drop this into any .qmail file you wish, and it will run the spamassassin client, and then deliver the message where you wish for inspection, deletion, etc... -- Gary
Re: find -exec surprisingly slow
Hi Gary, On Sat, Aug 14, 2004 at 10:25:46PM -0500, Gary wrote: > Most are patches, and very good. I use Eben Pratt's goodrcptto > personally on my own server, and some that I have built for others > (gives me control for accepting mail from lists only for those lists > that do not subscribe via envelope sender, such as this > one)... there are several to choose from > > http://lifewithqmail.org/lwq.html#smtp-reject > > which will lead you here.. > > http://netdevice.com/qmail/rcptck/ Thanks. I was fairly sure it couldn't be done without patching. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 12:30:01PM +0930 or thereabouts, Paul A. Hoadley wrote: > Techniques for qmail? Without patching it? I thought I had RTFMd > pretty thoroughly, but I am willing to be enlightened. Forgot to add, there are also challenge/auth mechanisms that one can use too: qconfirm and tmda. I have used these in the past, until the simplicity of Eben's goodrcptto made them and RBLDNS /tcp.smtp files outdated and not necessary. For example, on qconfirm, I used to just send it an email and it would list all that was pending. I could then accept / drop / bounce it, etc.. Of course, if you are getting those large numbers, this would be unworkable. -- Gary
Re: find -exec surprisingly slow
it was said: >The original problem was that _bouncing_ these messages is >fruitless---they almost invariably have a forged From address. I'm >getting on average about 10,000 of them per day, so there were >constantly several thousand messages in my queue, as well as several >thousand bounced bounces and failures in my postmaster mailbox every >day. Hello, Ahh! That is much clearer! You may want to look into ucspi-tcp, under sysutils in ports. Its tcpserver, tcprules, and rblsmtpd "sub-programs" do a fairly good job of rejecting connections from undesirable smtp servers - from the individual address all the way to netblock level. See http://cr.yp.to/ucspi-tcp.html for details. Other possible options are something like spamassassin, routing them to /dev/null, etc. Another 2%, Stheg
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 12:30:01PM +0930 or thereabouts, Paul A. Hoadley wrote: > Hello, > > On Sat, Aug 14, 2004 at 09:13:32PM -0500, Gary wrote: > > P> I'm not sure that I can make qmail do anything else. These are spams > > P> sent to non-existent addresses at my domain, being caught by > > P> .qmail-default. > > > > Question... why do you have a .qmail-default file to begin with? If > > you have proper namespace or .qmail- files for your users, it is not > > necessary at all... all would then be bounced. Or if you wish just > > to drop mail coming in to .qmail-default, just put a # in it... > > Good question---without context, my claim that I can do nothing else > seems wrong. What I should have said is "given I have an interest in > collecting all the spams to non-existent addresses, I don't think I > can make qmail do anything other than deliver it to the new/ subdir of > a Maildir." ah, okay... makes sense now. > The original problem was that _bouncing_ these messages is > fruitless---they almost invariably have a forged From address. I'm > getting on average about 10,000 of them per day, so there were > constantly several thousand messages in my queue, as well as several > thousand bounced bounces and failures in my postmaster mailbox every > day. right... this is why I block them at the SMTP level... > IMHO, these messages should be _rejected_ at the SMTP session, though > (AFAICS) qmail won't do this (without being patched). (I am sure I > behaviour.) Anyway, I was about to embark on tracking down a patch to > do SMTP-level rejection, when I decided I would just funnel them into > a Maildir and use them later to train Bogofilter, or whatever. okay.. > > I would never think of collecting them at all, not even allow them > > in. > I may soon change my mind, though my original plan was to put the spam > to use. The sheer volume looks like making that plan unworkable. :-) hee, hee... always with spam.. 
> > There are several techniques just to block them at SMTP negotiation > > all together, so they don't even enter your system... > > Techniques for qmail? Without patching it? I thought I had RTFMd > pretty thoroughly, but I am willing to be enlightened. Most are patches, and very good. I use Eben Pratt's goodrcptto personally on my own server, and some that I have built for others (gives me control for accepting mail from lists only for those lists that do not subscribe via envelope sender, such as this one)... there are several to choose from http://lifewithqmail.org/lwq.html#smtp-reject which will lead you here.. http://netdevice.com/qmail/rcptck/ Other techniques are my own RBL lists, commercial RBLs, etc... -- Gary
Re: [OT] Re: find -exec surprisingly slow
Hello, On Sat, Aug 14, 2004 at 08:01:47PM -0700, stheg olloydson wrote: > What I would do is avoid the problem in the first place by not > having a .qmail-default. Without a .qmail-default, qmail's default behaviour is to _accept_ the message and then _bounce_ it. IMHO, this is _worse_ than (a) saving the spam (which (I had hoped!) might be useful in other contexts), or (b) piping it to the bit bucket. Both (a) and (b) require a .qmail-default. Have I overlooked something really obvious here? Is there a way (preferably without patching it) to get qmail to _reject_ the mail sent to non-existent addresses? -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
"Paul A. Hoadley" <[EMAIL PROTECTED]> wrote: > Hello, > > On Sat, Aug 14, 2004 at 09:13:32PM -0500, Gary wrote: > > > P> I'm not sure that I can make qmail do anything else. These are spams > > P> sent to non-existent addresses at my domain, being caught by > > P> .qmail-default. > > > > Question... why do you have a .qmail-default file to begin with? If > > you have proper namespace or .qmail- files for your users, it is not > > necessary at all... all would then be bounced. Or if you wish just > > to drop mail coming in to .qmail-default, just put a # in it... > > Good question---without context, my claim that I can do nothing else > seems wrong. What I should have said is "given I have an interest in > collecting all the spams to non-existent addresses, I don't think I > can make qmail do anything other than deliver it to the new/ subdir of > a Maildir." > > The original problem was that _bouncing_ these messages is > fruitless---they almost invariably have a forged From address. I'm > getting on average about 10,000 of them per day, so there were > constantly several thousand messages in my queue, as well as several > thousand bounced bounces and failures in my postmaster mailbox every > day. > > IMHO, these messages should be _rejected_ at the SMTP session, though > (AFAICS) qmail won't do this (without being patched). (I am sure I > once read a "security" justification for this behaviour, though I > can't seem to find any justification for it at all now. I am willing > to be convinced otherwise, but IMHO, accepting these messages is bogus > behaviour.) I agree. > Anyway, I was about to embark on tracking down a patch to > do SMTP-level rejection, when I decided I would just funnel them into > a Maildir and use them later to train Bogofilter, or whatever. Well, if you do have a reason to keep them, as example spams for a Bayes filter, for example, then I can't say otherwise. 
I'm surprised that qmail doesn't allow you to reject these properly, but I haven't been using qmail for a while now, so I don't remember. I've switched to Postfix, as this is pretty easy to set up in Postfix. I think the default config file for Postfix is set up this way as it is. I hope you find a better solution. Good luck. > > P> What I am going to do is clear out the Maildir daily > > P> instead of monthly, though. Collecting them has become a significant > > P> drain on disk space---the 400K spams are the result of about a month > > P> and a half of collection. > > > > I would never think of collecting them at all, not even allow them > > in. > > I may soon change my mind, though my original plan was to put the spam > to use. The sheer volume looks like making that plan unworkable. :-) > > > There are several techniques just to block them at SMTP negotiation > > all together, so they don't even enter your system... > > Techniques for qmail? Without patching it? I thought I had RTFMd > pretty thoroughly, but I am willing to be enlightened. -- Bill Moran Potential Technologies http://www.potentialtech.com
[OT] Re: find -exec surprisingly slow
it was said: >I'm not sure that I can make qmail do anything else. These are spams >sent to non-existent addresses at my domain, being caught by >.qmail-default. What I am going to do is clear out the Maildir daily >instead of monthly, though. Collecting them has become a significant >drain on disk space---the 400K spams are the result of about a month >and a half of collection. Hello, What I would do is avoid the problem in the first place by not having a .qmail-default. I don't know how important it is to you to be sure you have no false positive spam rejections to incorrect/misspelled addresses, but is it worth accepting hundreds of thousands of spams and then looking through them to find the very few that may be legitimate? I think you would be better off creating variations in the users' .qmail file, such as paul.hoadley@, phoadley@, [EMAIL PROTECTED] That could be done via a script that gets called when you create a user, so the only extra work would be writing the script the first time and plugging it in. (Of course, you would have to run it against your existing users, too.) As I said, I don't know what your requirements are, just my 2% of the applicable currency's base unit. HTH, Stheg
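The "script that gets called when you create a user" could start from something like the sketch below. It only prints the two variations the message spells out (first.last and first initial plus last); the function name and the naming scheme are assumptions for illustration, and turning the output into actual .qmail entries is deliberately left out because the thread does not describe that step.

```shell
# Hypothetical helper: print address variations for one user.
# Only the two patterns named in the message are generated.
address_variants() {
    first=$1   # given name, e.g. paul
    last=$2    # family name, e.g. hoadley

    # First character of the given name, for the "phoadley" form.
    initial=$(printf '%s' "$first" | cut -c1)

    printf '%s.%s\n' "$first" "$last"
    printf '%s%s\n' "$initial" "$last"
}
```

Called as `address_variants paul hoadley`, it prints one candidate local part per line, which a user-creation script could then loop over.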
Re: find -exec surprisingly slow
Hello, On Sat, Aug 14, 2004 at 09:13:32PM -0500, Gary wrote: > P> I'm not sure that I can make qmail do anything else. These are spams > P> sent to non-existent addresses at my domain, being caught by > P> .qmail-default. > > Question... why do you have a .qmail-default file to begin with? If > you have proper namespace or .qmail- files for your users, it is not > necessary at all... all would then be bounced. Or if you wish just > to drop mail coming in to .qmail-default, just put a # in it... Good question---without context, my claim that I can do nothing else seems wrong. What I should have said is "given I have an interest in collecting all the spams to non-existent addresses, I don't think I can make qmail do anything other than deliver it to the new/ subdir of a Maildir." The original problem was that _bouncing_ these messages is fruitless---they almost invariably have a forged From address. I'm getting on average about 10,000 of them per day, so there were constantly several thousand messages in my queue, as well as several thousand bounced bounces and failures in my postmaster mailbox every day. IMHO, these messages should be _rejected_ at the SMTP session, though (AFAICS) qmail won't do this (without being patched). (I am sure I once read a "security" justification for this behaviour, though I can't seem to find any justification for it at all now. I am willing to be convinced otherwise, but IMHO, accepting these messages is bogus behaviour.) Anyway, I was about to embark on tracking down a patch to do SMTP-level rejection, when I decided I would just funnel them into a Maildir and use them later to train Bogofilter, or whatever. > P> What I am going to do is clear out the Maildir daily > P> instead of monthly, though. Collecting them has become a significant > P> drain on disk space---the 400K spams are the result of about a month > P> and a half of collection. > > I would never think of collecting them at all, not even allow them > in. 
I may soon change my mind, though my original plan was to put the spam to use. The sheer volume looks like making that plan unworkable. :-) > There are several techniques just to block them at SMTP negotiation > all together, so they don't even enter your system... Techniques for qmail? Without patching it? I thought I had RTFMd pretty thoroughly, but I am willing to be enlightened. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
On Sat, Aug 14, 2004 at 08:11:54PM -0400, Garance A Drosihn wrote:
> Where is '.' in the above `find .' command? Is it on the same
> partition as /home/paulh/tmp/spam/sne/ ?
>
> You may find it much faster to do something like:
>     mkdir usermail.new
>     chown user:group usermail.new
>     mv usermail usermail.bigspam
>     mv usermail.new usermail
>     cd usermail.bigspam
>     find . \! -atime +1 -exec mv {} ../usermail \;
>
> My assumption there is that you have a LOT fewer "good files" than
> you have "bad files", so there will be fewer files to move. But I
> am also making the assumption that all your files are in a single
> directory (and not a tree of directories), which may be a bad
> assumption.

All assumptions correct, and that is what I should have done.

> The thing to use is the '-J' option of xargs. That way you can have
> the destination-directory be the last argument in the command that
> gets executed, and yet you're still moving as many files in a single
> `mv' command as possible. E.g., change my earlier `find' command
> to:
>     find . \! -atime +1 -print0 | xargs -0J[] mv [] ../usermail
>
> Check the man page for xargs for a description of -J

Will do. Thanks for the tip.

-- 
Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 12:39:33AM +0100, Matthew Seaman wrote: > find . -atime +1 -print0 | xargs -0 -J % mv % /home/paulh/tmp/spam/sne/ > > xargs defaults to taking up to 5,000 arguments from its stdin to > generate the mv commands (or up to ARG_MAX - 4096 = 61440 bytes), so > that would have done the job with only 80 or so invocations of mv. Thanks for that. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
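A rough way to check the invocation count for the 400K files in this thread is to take the larger of two estimates: one from the 5,000-argument limit and one from the 61440-byte limit. The sketch below is a simplification, not how xargs itself accounts for space: it assumes every pathname has the same length, charges one extra byte per argument, and ignores the bytes used by `mv` and the destination. The function name and parameters are made up for illustration.

```shell
# Hypothetical estimator: how many times would mv run?
# Arguments: file count, max args per run, byte budget per run,
# assumed average pathname length in bytes.
estimate_mv_runs() {
    files=$1
    max_args=$2
    max_bytes=$3
    avg_len=$4

    # Runs needed if only the argument-count limit applied (ceiling).
    by_count=$(( (files + max_args - 1) / max_args ))

    # Runs needed if only the byte limit applied: each argument is
    # charged its length plus one separator byte.
    per_run=$(( max_bytes / (avg_len + 1) ))
    by_bytes=$(( (files + per_run - 1) / per_run ))

    # Whichever limit is hit first decides the real count.
    if [ "$by_bytes" -gt "$by_count" ]; then
        echo "$by_bytes"
    else
        echo "$by_count"
    fi
}
```

With very short names the argument limit dominates and `estimate_mv_runs 400000 5000 61440 10` gives 80; with 40-byte names the byte limit dominates and the same call with `40` gives 268. Either way it is a tiny fraction of 400,000 separate `mv` processes.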
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 01:32:35AM +0200, Erik Trulsson wrote: > You seem to have missed the fact that operations on very large > directories (which a directory with 400K files in it certainly > qualifies as) simply are slow. Good point. I had overlooked that. > Reducing the number of processes spawned will certainly help some, > but a better idea is to not have so many files in a single directory > - that is just asking for trouble. I'm not sure that I can make qmail do anything else. These are spams sent to non-existent addresses at my domain, being caught by .qmail-default. What I am going to do is clear out the Maildir daily instead of monthly, though. Collecting them has become a significant drain on disk space---the 400K spams are the result of about a month and a half of collection. -- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: find -exec surprisingly slow
At 8:31 AM +0930 8/15/04, Paul A. Hoadley wrote:
> Hello,
>
> I'm in the process of cleaning a Maildir full of spam. It has
> somewhere in the vicinity of 400K files in it. I started running
> this yesterday:
>
> find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
> It's been running for well over 12 hours. It certainly is
> working---the spams are slowly moving to their new home---but it is
> taking a long time. It's a very modest system, running 4.8-R on a
> P2-350. I assume this is all overhead for spawning a shell and
> running mv 400K times.

Some of it is that, and some of it is the performance-penalty of deleting files from a directory which has 400K filenames in it, only to add the same files into a directory which will eventually have 400K filenames in it. Directory adds/deletes are not fast when a directory has that many filenames. It is probably even worse if there are other processes still working on the same directory (such as sendmail importing more mail).

Where is '.' in the above `find .' command? Is it on the same partition as /home/paulh/tmp/spam/sne/ ?

You may find it much faster to do something like:

    mkdir usermail.new
    chown user:group usermail.new
    mv usermail usermail.bigspam
    mv usermail.new usermail
    cd usermail.bigspam
    find . \! -atime +1 -exec mv {} ../usermail \;

My assumption there is that you have a LOT fewer "good files" than you have "bad files", so there will be fewer files to move. But I am also making the assumption that all your files are in a single directory (and not a tree of directories), which may be a bad assumption.

> Is there a better way to move all files based on some characteristic
> of their date stamp? Maybe separating the find and the move, piping
> it through xargs?

The thing to use is the '-J' option of xargs. That way you can have the destination-directory be the last argument in the command that gets executed, and yet you're still moving as many files in a single `mv' command as possible. E.g., change my earlier `find' command to:

    find . \! -atime +1 -print0 | xargs -0J[] mv [] ../usermail

Check the man page for xargs for a description of -J

-- 
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
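The swap-the-directory idea above can be wrapped up roughly as follows. This is a sketch under assumptions rather than the exact procedure from the message: the function name is invented, the `chown` step is left out, `-type f` is added so that only regular files are moved, and the batching uses find's POSIX `-exec ... {} +` form instead of `xargs -J`, which is not available everywhere. Whether the find on 4.8-R accepts `{} +` is not something the thread confirms.

```shell
# Hypothetical helper: set the huge directory aside, put an empty one
# in its place, then move back only the recently accessed files.
rescue_recent() {
    dir=$1    # the overfull directory, e.g. usermail
    days=$2   # keep files accessed within roughly this many days

    # Swap in an empty directory under the original name.
    mkdir "$dir.new" || return 1
    mv "$dir" "$dir.bigspam" || return 1
    mv "$dir.new" "$dir" || return 1

    # "! -atime +N" selects the files NOT older than the cutoff, which
    # should be the small minority. "{} +" batches many pathnames into
    # each mv; the inner sh puts the destination last.
    find "$dir.bigspam" -type f ! -atime +"$days" \
        -exec sh -c 'dest=$1; shift; mv "$@" "$dest"/' sh "$dir" {} +
}
```

After `rescue_recent usermail 1`, the old files stay behind in `usermail.bigspam`, where they can be archived or removed in one go without touching the live directory.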
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 08:31:43AM +0930, Paul A. Hoadley wrote:
> Hello,
>
> I'm in the process of cleaning a Maildir full of spam. It has
> somewhere in the vicinity of 400K files in it. I started running
> this yesterday:
>
> find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
> It's been running for well over 12 hours. It certainly is
> working---the spams are slowly moving to their new home---but it is
> taking a long time. It's a very modest system, running 4.8-R on a
> P2-350. I assume this is all overhead for spawning a shell and
> running mv 400K times. Is there a better way to move all files based
> on some characteristic of their date stamp? Maybe separating the find
> and the move, piping it through xargs? It's mostly done now, but I
> will know better for next time.

Yup. Invoking mv 400,000 times is not particularly efficient. Something like this would have been better:

    find . -atime +1 -print0 | xargs -0 -J % mv % /home/paulh/tmp/spam/sne/

xargs defaults to taking up to 5,000 arguments from its stdin to generate the mv commands (or up to ARG_MAX - 4096 = 61440 bytes), so that would have done the job with only 80 or so invocations of mv.

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK
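Where `xargs -J` is not available (it is a BSD xargs option), much the same batching can be had from find alone. The sketch below assumes a find that supports the POSIX `-exec ... {} +` form; the function name and its arguments are illustrative, and `-type f` is an addition that the original command did not have.

```shell
# Sketch only: move_old_files and its arguments are illustrative names.
move_old_files() {
    src=$1    # directory to scan
    dest=$2   # existing directory to move the matches into
    days=$3   # select files last accessed more than this many days ago

    # "{} +" makes find append as many pathnames as will fit to each
    # command, so mv runs once per batch rather than once per file.
    # The inner sh receives the destination as $1 and the batch as the
    # remaining arguments, so the destination can come last for mv.
    find "$src" -type f -atime +"$days" \
        -exec sh -c 'dest=$1; shift; mv "$@" "$dest"/' sh "$dest" {} +
}
```

For the case in this thread the call would look like `move_old_files . /home/paulh/tmp/spam/sne 1`, provided the destination is not underneath the directory being scanned.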
Re: find -exec surprisingly slow
On Sun, Aug 15, 2004 at 08:31:43AM +0930, Paul A. Hoadley wrote:
> Hello,
>
> I'm in the process of cleaning a Maildir full of spam. It has
> somewhere in the vicinity of 400K files in it. I started running
> this yesterday:
>
> find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
> It's been running for well over 12 hours. It certainly is
> working---the spams are slowly moving to their new home---but it is
> taking a long time. It's a very modest system, running 4.8-R on a
> P2-350. I assume this is all overhead for spawning a shell and
> running mv 400K times.

I wouldn't make that assumption. The overhead for starting new processes is probably only a relatively small part of the time. You seem to have missed the fact that operations on very large directories (which a directory with 400K files in it certainly qualifies as) simply are slow.

A directory is essentially just a list of the names of all the files in it and their i-nodes. To find a given file in a directory (e.g. in order to create, delete or rename it) the system needs to do a linear search through all the files in the directory. For directories containing a large number of files this can take some time.

If you have the UFS_DIRHASH kernel option enabled (which I believe is the default since 4.5-R) then the system will keep a bunch of hash tables in memory to avoid having to search through the whole directory every time. There is however an upper limit to how much memory will be used for such hash tables (2MB by default) and if this limit is exceeded (which it probably is in your case) things will slow down again.

The effect of the UFS_DIRHASH option is that instead of directory operations starting to slow down after a few thousand files in the same directory, you can have a few tens of thousands of files before operations start to become noticeably slower.

I am quite certain that if those 400K files had been divided into 40 directories, each with 10K files in it, things would have been much faster.

> Is there a better way to move all files based
> on some characteristic of their date stamp? Maybe separating the find
> and the move, piping it through xargs? It's mostly done now, but I
> will know better for next time.

Reducing the number of processes spawned will certainly help some, but a better idea is to not have so many files in a single directory - that is just asking for trouble.

-- 
Erik Trulsson [EMAIL PROTECTED]
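One way to picture the "40 directories of 10K files" layout is to pick a subdirectory for each file from a checksum of its name. The sketch below is only meant to show the layout: the function names are hypothetical, the choice of `cksum` as the hash is an assumption, and it runs several processes per file, so it is not a fast way to reorganise an existing 400K-file directory. It also assumes no existing file is named like a bucket number.

```shell
# Hypothetical helper: print a bucket number in 0..n-1 for a filename.
bucket_for() {
    name=$1
    n=$2
    # cksum prints "<crc> <length>" for stdin; keep only the CRC.
    sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $(( sum % n ))
}

# Hypothetical helper: spread the regular files directly inside a
# directory over n numbered subdirectories of that directory.
shard_dir() {
    src=$1
    n=$2
    for f in "$src"/*; do
        # Skip subdirectories (including buckets made on earlier runs).
        [ -f "$f" ] || continue
        b=$(bucket_for "$(basename "$f")" "$n")
        mkdir -p "$src/$b"
        mv "$f" "$src/$b/"
    done
}
```

Because the bucket depends only on the name, a later lookup can go straight to `$src/$(bucket_for NAME 40)/NAME` and every per-directory operation works on roughly a fortieth of the entries.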