Re: More large spam....

2010-06-14 Thread Karsten Bräckelmann
On Sun, 2010-06-13 at 11:35 -0400, Charles Gregory wrote:
 On Sat, 12 Jun 2010, Karsten Bräckelmann wrote:

  There are just a very few rules scanning non-textual parts of a mail.
  Large-ish binary attachments don't have much of an impact on
  performance. Large-ish textual attachments potentially do.
 
 Now THAT is a curious comment. All the usage guidelines I have ever read 
 implied or outright stated that scanning mails over a certain size was a 
 significant degradation to system performance. Am I confusing the 

Well, a large message internally of course needs more memory and
slightly more time for parsing.

However, most RE rules, which account for the bulk of the load, are
operating on headers and rendered textual parts. They won't be run
against images, zip files, etc.


 guidelines for antivirus programs with those for SA? Would it be 'safe' to 
 run SA on messages with larger attachments? Anyone ever tested this?

Mind trying it yourself? If you're using spamc, just save such a message
and feed it to spamc with an appropriately large -s option. Does it take
significantly longer, or is it just about like any other spam?
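
A minimal sketch of that test, assuming spamc is on the PATH and talks to
a running spamd. The helper names (spamc_command, time_filter), the 2 MB
limit and the file names are illustrative assumptions, not part of SA.

```python
import subprocess
import sys
import time


def spamc_command(max_size_bytes):
    # -s raises the maximum message size spamc will hand to spamd;
    # anything larger is returned unprocessed.
    return ["spamc", "-s", str(max_size_bytes)]


def time_filter(command, message_path):
    """Pipe a saved message into `command` and return the elapsed seconds."""
    with open(message_path, "rb") as message:
        start = time.perf_counter()
        subprocess.run(command, stdin=message,
                       stdout=subprocess.DEVNULL, check=True)
        return time.perf_counter() - start


if __name__ == "__main__":
    # Pass the saved messages as arguments, e.g. one large spam,
    # one large ham and one ordinary mail for comparison.
    for path in sys.argv[1:]:
        seconds = time_filter(spamc_command(2 * 1024 * 1024), path)
        print(f"{path}: {seconds:.2f}s")
```

Run each message a few times. The first pass may include one-off costs
that have nothing to do with the message size.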

Also, do that test with ham. This is important, since, as you said, you
are merely getting less than one of these a week as spam. How many hams
that size do you get?


As a general thought -- though I believe I stated this before -- how
many messages are affected anyway? Both ham and spam. How many messages
larger than 500k and, say, less than 1M do you get in total? In percent
of your mail stream? Are you really afraid your system cannot cope with
a handful of larger mails per week?
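
One way to get those numbers is sketched below. It assumes the mail is
stored one message per file (Maildir style); the function name
size_band_share and the example path are made up for illustration.

```python
import os


def size_band_share(mail_dir, low=500 * 1024, high=1024 * 1024):
    """Return (count_in_band, total, percent) for the files under mail_dir.

    A file counts as "in band" if its size is strictly between low and high
    bytes. Assumes one message per file.
    """
    in_band = total = 0
    for root, _dirs, files in os.walk(mail_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            total += 1
            if low < size < high:
                in_band += 1
    percent = 100.0 * in_band / total if total else 0.0
    return in_band, total, percent


if __name__ == "__main__":
    # Hypothetical location; point this at your own mail store.
    print(size_band_share(os.path.expanduser("~/Maildir")))
```

Run it separately over ham and spam folders to see both shares.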

Or, to put it in other words: Even if processing such a mail does take
twice or three times as long burning your CPU, at the end of the week,
would you even notice the increased load?
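
As a rough back-of-the-envelope model (my assumption, not a measurement):
if large mails currently skip scanning and every ordinary mail costs the
same, scanning a share p of the stream at k times the ordinary cost adds
p*k/(1-p) to the total scan time. The function name and the inputs are
placeholders; plug in your own figures.

```python
def extra_load_percent(large_share_percent, slowdown_factor):
    """Percent increase in total scan time once large mails are scanned too.

    large_share_percent: share of the mail stream currently skipped for size.
    slowdown_factor: cost of scanning one large mail relative to an ordinary one.
    """
    share = large_share_percent / 100.0
    if not 0.0 <= share < 1.0:
        raise ValueError("large_share_percent must be in [0, 100)")
    return 100.0 * share * slowdown_factor / (1.0 - share)


if __name__ == "__main__":
    # Placeholder inputs: 0.1% of the stream, three times as slow.
    print(f"{extra_load_percent(0.1, 3.0):.2f}% more scan time")
```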


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: More large spam....

2010-06-13 Thread Charles Gregory

On Sat, 12 Jun 2010, Karsten Bräckelmann wrote:

Please do not hijack a thread. Please do not hit Reply, if you do not
intend to reply and contribute to that thread. Removing all quoted text
and changing the Subject does *not* make it a new thread or post.
(Hint: In-Reply-To and References headers.)


(grumble grumble) Stupid mail programs (grumble grumble)
Yeah okay. Not so stupid. I'll comply

Footnote: and I was refraining from commenting on another thread on how 
people 'complain' about features of SA that don't work in ways that match 
*their* style of thinking... Oh, the irony :)



Has there been any progress...

No changes since this has been asked the last time.


(nod) Alright. So far this is still a less than once a week phenomenon, 
for me personally. I just raise it occasionally to put a data point into 
the archives. If my inquiry had shaken loose a bunch of 'me too' comments, 
it might have led somewhere. But it hasn't, so the issue remains on the 
far back burner :)



There are just a very few rules scanning non-textual parts of a mail.
Large-ish binary attachments don't have much of an impact on
performance. Large-ish textual attachments potentially do.


Now THAT is a curious comment. All the usage guidelines I have ever read 
implied or outright stated that scanning mails over a certain size was a 
significant degradation to system performance. Am I confusing the 
guidelines for antivirus programs with those for SA? Would it be 'safe' to 
run SA on messages with larger attachments? Anyone ever tested this?


- C

Re: More large spam....

2010-06-12 Thread Karsten Bräckelmann
Please do not hijack a thread. Please do not hit Reply, if you do not
intend to reply and contribute to that thread. Removing all quoted text
and changing the Subject does *not* make it a new thread or post.

(Hint: In-Reply-To and References headers.)


On Sat, 2010-06-12 at 09:50 -0400, Charles Gregory wrote:
 I got another 1MB spam today.
 
 I still don't want to kill my system by attempting to scan every large 
 mail that comes in.

How many messages between 500k and 1M do you get per day?

 Has there been any progress on an 'option' to scan only text portions of 
 mail past a certain size limit and/or scan only the first X bytes? The 
 former is preferable because it avoids any issues with incomplete mail, or 
 text sections being last

No changes since this has been asked the last time. There are features
for this in 3.3, used by Amavis. This is not used by spamc.

There are just a very few rules scanning non-textual parts of a mail.
Large-ish binary attachments don't have much of an impact on
performance. Large-ish textual attachments potentially do.

