[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 07:19 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 12:02:32AM -0700, [EMAIL PROTECTED] wrote: I wonder if there is any relationship between creating random huge journals and SA randomly growing suddenly to 250MB+. BTW: for someone experiencing this issue, I'd be interested in getting a copy (compressed of course!) of the extreme sized journal, and/or seeing the output of ls -las, stat, etc. There's only 1 way I can think of for a journal file to instantly grow to a large size (in normal usage, it will grow at a rate relative to the amount of mail being processed). When the journal write occurs, and a failure is detected, the code will truncate() the file back to the original size before the write (see BayesStore/DBM::cleanup()). The truncate() is accompanied by a warning: bayes: partial write to bayes journal $path ($len of $nbytes), recovering If this process somehow gets screwed up, the truncate could actually make the file larger and create what is known as a sparse file. ie: a bunch of data, a bunch of nothing (the OS typically inserts nulls when reading from the section that doesn't actually exist on disk), and a bunch of data. This behavior is new in 3.0.0, 2.6 would detect a partial write of journal data, internally jump ahead to the part that wasn't written, and try again. This could potentially lead to multiple writers clobbering each other, which could still happen for a partial write, but at least the journal file should be truncated() to a known good state. For examples of creating spare files: $ perl -e 'open(T, foo); print T hi; truncate(T,256*1048576); close(T);' $ ls -las foo 4 -rw-r--r--1 tvd wheel268435456 Oct 9 09:47 foo $ stat foo File: `foo' Size: 268435456 Blocks: 8 IO Block: 4096 Regular File Device: 803h/2051d Inode: 212997 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 1113/ tvd) Gid: ( 10/ wheel) Access: 2004-10-09 09:47:55.0 -0400 Modify: 2004-10-09 09:47:55.0 -0400 Change: 2004-10-09 09:47:55.0 -0400 So the file only actually has a small number of blocks used on disk to store hi\n (see below for comments about the actual size usage), but the filesystem reports the file as 256MB. In contrast: $ perl -e 'open(T, foo); print T hix(128*1048576); close(T);' $ ls -las foo 262404 -rw-r--r--1 tvd wheel268435456 Oct 9 09:50 foo $ stat foo File: `foo' Size: 268435456 Blocks: 524808 IO Block: 4096 Regular File Device: 803h/2051d Inode: 212997 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 1113/ tvd) Gid: ( 10/ wheel) Access: 2004-10-09 09:47:55.0 -0400 Modify: 2004-10-09 09:50:52.0 -0400 Change: 2004-10-09 09:50:52.0 -0400 This file actually has 256MB of data in it. Notice that the space actually used on disk is much higher. (the ls vs stat output may be a little confusing wrt blocks. ls reports 1k blocks, stat reports 512 byte blocks, and the actual file system block size (smallest amount of allocatable space in the FS) is 4096 bytes. so in the truncate version, there is only 1 FS block allocated for the file (4 x 1k == 1 x 4k), the bottom version has 65601 FS blocks allocated (262404 x 1k blocks == 65601 x 4k blocks)). --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 07:54 --- Sorry, I should have mentioned that I have copies of most of the occurences. However, I don't know if the block usage remains the same when I make a cp -a dir dir.new. I tar-gzipped the following dir, it's now 11 MB. Do you want to have it? Can I upload it to an FTP or shall I provide it to you via ftp or scp? Note that this is happening with SA 3.0 used by MailScanner, this is not a spamd issue, at least as far as I experience it. I have MailScanner+3.0 setup on two machines (our own mailserver and its backup MX), it occurs only on the main mailserver. Both are setup almost identical, the backup MX gets much less mail, though (about 10% of the main mailserver) and it's only spam. It didn't happen with MailScanner+2.63. (All other machines are milter+spamd2.63 because I'm reluctant to try spamd 3.0 with the memory problems getting reported. On one of the spamd machines (with the most mail traffic supposedly) I sometimes see one or two spamd processes collect massive amounts of memory (can be 1 GB), but this happens very rarily, less than once per month. If it is not detected early enough it crashes the machine eventually. I don't know of a way to limit this, AFAIK ulimit applies to logins and this won't work for a daemon, correct?) Here's the output for ls and stat as mentioned above. n8:/home/spamd/bayes.4 # ls -las total 305248 4 drwxrwsr-x2 spamdwww 4096 Oct 8 16:54 . 4 drwxr-xr-x 14 spamdmail 4096 Oct 8 17:01 .. 4 -rw---1 root www16 Oct 8 16:49 bayes.lock 4 -rw-rw-rw-1 spamdwww42 Oct 5 15:41 bayes.mutex 46116 -rw-rw-rw-1 root www 47168872 Oct 8 16:54 bayes_journal 202004 -rw-rw-rw-1 root www 206639592 Oct 8 16:32 bayes_journal.old 1544 -rw-rw-rw-1 spamdwww 2482176 Oct 8 16:04 bayes_seen 16420 -rw-rw-rw-1 spamdwww 20951040 Oct 8 16:50 bayes_toks 4 -rwxr-xr-x1 spamdwww 3499 Apr 12 18:14 create_bayes.pl 39136 -rw-r--r--1 root www 40030104 Oct 5 16:25 dump.txt 4 -rwx--1 spamdwww 943 Apr 12 19:34 stripbadtoken.pl 4 -rwx--1 spamdwww 229 Apr 12 19:06 timetest.pl n8:/home/spamd/bayes.4 # stat bayes_journal File: `bayes_journal' Size: 47168872Blocks: 92232 IO Block: 4096 regular file Device: 802h/2050d Inode: 785771 Links: 1 Access: (0666/-rw-rw-rw-) Uid: (0/root) Gid: (8/ www) Access: 2004-10-08 16:54:09.0 +0200 Modify: 2004-10-08 16:54:17.0 +0200 Change: 2004-10-08 17:01:07.0 +0200 n8:/home/spamd/bayes.4 # stat bayes_journal.old File: `bayes_journal.old' Size: 206639592 Blocks: 404008 IO Block: 4096 regular file Device: 802h/2050d Inode: 785775 Links: 1 Access: (0666/-rw-rw-rw-) Uid: (0/root) Gid: (8/ www) Access: 2004-10-08 16:51:03.0 +0200 Modify: 2004-10-08 16:32:12.0 +0200 Change: 2004-10-08 17:01:46.0 +0200 --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 08:54 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 07:54:27AM -0700, [EMAIL PROTECTED] wrote: However, I don't know if the block usage remains the same when I make a cp -a dir dir.new. I tar-gzipped the following dir, it's now 11 MB. Do you want to Hrm. I believe the issue is whether cp understands sparse files. The Linux cp I have (it looks like you're using Linux) seems to say it possibly supports sparse files based on a crude heuristic to determine if the file is sparse or not. So to be safe, I wouldn't trust cp. have it? Can I upload it to an FTP or shall I provide it to you via ftp or scp? If you can make it available, I'll grab it from you. I can make some ftp space available if that's easier. crashes the machine eventually. I don't know of a way to limit this, AFAIK ulimit applies to logins and this won't work for a daemon, correct?) ulimit applies to processes and their children. logins are simply a shell with children procs. ;) (BTW: ulimit -c 0 is great for httpd and such to prevent core files being written upon crash...) 46116 -rw-rw-rw-1 root www 47168872 Oct 8 16:54 bayes_journal 202004 -rw-rw-rw-1 root www 206639592 Oct 8 16:32 bayes_journal.old File: `bayes_journal' Size: 47168872Blocks: 92232 IO Block: 4096 regular file File: `bayes_journal.old' Size: 206639592 Blocks: 404008 IO Block: 4096 regular file Both of these seem to be non-sparse. In a cp version of the .old file, I'd look for a bunch of text, then a bunch of nulls, then potentially a bunch more text. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 09:23 --- Thanks for the hint. I had a look at the copied bayes_journal and bayes_journal.old files. They seem to contain stuff the normal bayes_journal doesn't have. F.i. the bayes_journal of size 47168872 contains over 800.000 lines. Some of them look like the stuff I see in a normal bayes_journal (but not exactly), but 813090 of the lines have an sa_generated. That's obviously what blows them to the sky. Examples: normal bayes_journal: a few dozen or hundred lines like this: t 1097337643 32578a6c95 blown_up bayes_journal: 99% of lines like this: m h [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] (it seems to start with thousands of these and eventually finishes with a mix of the following) the last 300 lines of the file are like this: (... indicates I skipped lines of the same type) ... t 1097247243 9826be506b n 0 1 c 0 1 1097247243 65cb22ebd8 ... c 0 1 1097247243 850959cf3e m h [EMAIL PROTECTED] n 0 1 c 0 1 1097247243 65cb22ebd8 ... c 0 1 1097247243 850959cf3e m h [EMAIL PROTECTED] EOF Does that already give you enough information or shall I still provide the complete files? --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 09:32 --- In addition to my last comment, I only noticed that now. The vast amount of those sa_generated lines seems to come in doubles. F.i. the file starts with m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] and seems to go on this way almost up to the end. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 10:28 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 09:32:01AM -0700, [EMAIL PROTECTED] wrote: m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m h [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] and seems to go on this way almost up to the end. Hrm. Those are message learn commands (m). Basically it says h/s for ham/spam, and then the message-id, which are all sha1 hash generated based on headers. Shouldn't be multiples together, and there should be token information before it (iirc, n # # (change ham/spam count), c # # token atime (learn token ham/spam, atime), m h/s msgid (message learned, ham/spam, msgid to avoid double learning, etc.), in that order...) Having the same entry multiple times (total) is ok, btw. sa-learn doesn't know a message was learned already until the journal is synced, so the same message can appear multiple times. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 10:38 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 09:23:55AM -0700, [EMAIL PROTECTED] wrote: lines. Some of them look like the stuff I see in a normal bayes_journal (but not exactly), but 813090 of the lines have an sa_generated. That's obviously what blows them to the sky. that's normal. normal bayes_journal: a few dozen or hundred lines like this: t 1097337643 32578a6c95 those are token atime updates. happens during scanning. blown_up bayes_journal: 99% of lines like this: m h [EMAIL PROTECTED] m s [EMAIL PROTECTED] m s [EMAIL PROTECTED] (it seems to start with thousands of these and eventually finishes with a mix of the following) Hrm. the last 300 lines of the file are like this: (... indicates I skipped lines of the same type) ... t 1097247243 9826be506b n 0 1 c 0 1 1097247243 65cb22ebd8 ... c 0 1 1097247243 850959cf3e m h [EMAIL PROTECTED] yeah, that's normal for learning to journal. Does that already give you enough information or shall I still provide the complete files? I'd like to get the file if I can. If there's actually lines and no nulls, it's not a truncate issue, so that's good. I'd like to see what they actually look like though. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 11:39 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 11:10:07AM -0700, [EMAIL PROTECTED] wrote: n (1 line) c (many) m (1 line) t (many) here it should then start over with n. well, the ncm (learn) and t (scan) are different operations, but otherwise, yeah. file maybe took an hour or less. Could be a problem with MailScanner - SA interaction, could it? Possibly, it's hard to say. This is part of my problem with bugs reported when not using our provided code -- it's pretty hard for us to debug/reproduce issues in code we didn't write, we don't use, etc. I don't see how multiple m lines from the same message would go into the journal without trying to learn the same message multiple times. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 14:56 --- Subject: Re: SA 3.0 creates randomly extreme big bayes_journal On Sat, Oct 09, 2004 at 01:38:56PM -0400, Theo Van Dinter wrote: I'd like to get the file if I can. If there's actually lines and no nulls, it's not a truncate issue, so that's good. I'd like to see what they actually look like though. Hrm. I got a copy of an errored journal file: 1096 c 3562112 m 4 n 266 t so yeah, the problem is the 3.5 million m lines. They do, by in large, all look duplicated for some reason. So the lines indicates ~1.8 million mails learned, but no corresponding ham/spam count updates or tokens, which is just wrong. I don't think you learned 1.8 million mails anyway. There is a ton of duplication. Individual msgid and repeat counts, respectively: 106 33120 16 3312 1 736 4 184 2 72 10 8 3 4 2 2 So something's up. I haven't seen this issue in normal spamd-type usage, so I'm tempted to blame MailScanner... However, looking at the code, I found something that seems odd, and could very well cause the issue. In fact, allow me to go: OMG! In 2.6x and 3.0, the sync_journal function (takes the journal data and updates the databases) calls seen_put (and seen_delete) to take the message id and store it in the database. BUT! seen_put and seen_delete check to see if learn_to_journal is set, and if so, defers the update to the journal! OMG OMG OMG! I can even reproduce this in normal SA mode! If you use sa-learn --sync to sync the journal, the problem doesn't exist, for some reason. If you let auto-sync occur during SA runs, the behavior happens due to the reason above. I got the behavior by setting: bayes_learn_to_journal 1 bayes_journal_max_size 1 then shoving messages through causes the problem to occur, and it actually keeps adding the same message over and over as well since seen is never updated. OMG! So the easy solution is to either have a special sync seen_put and seen_delete, or kluge the learn_to_journal setting around the calls. I think the first is the right solution. Patch forthcoming, then please test it for me. :) I have no idea how this hasn't been seen before. This code has been for ages. The m code was new to 2.6, so it's been over a year. geez! --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 [EMAIL PROTECTED] changed: What|Removed |Added CC||dev@spamassassin.apache.org AssignedTo|dev@spamassassin.apache.org |[EMAIL PROTECTED] Severity|normal |major Priority|P5 |P3 Target Milestone|Future |3.0.1 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 15:02 --- taking the ticket, moving to 3.0.1 queue, etc. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee. You are on the CC list for the bug, or are watching someone who is.
[Bug 3872] SA 3.0 creates randomly extreme big bayes_journal
http://bugzilla.spamassassin.org/show_bug.cgi?id=3872 --- Additional Comments From [EMAIL PROTECTED] 2004-10-09 15:11 --- Created an attachment (id=2439) -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2439action=view) suggested patch I'm still stunned about this! OMG! Anyway, here's my patch. It makes a new seen_{put,delete}_direct function that is called directly from the sync_journal function. the normal seen_{put,delete} is still available for the public API, and calls *_direct appropriately which I consider a private API just usable by the sync_journal function. --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is.