Re: Sharing and merging bayes data?
Rajkumar S wrote on Fri, 18 Dec 2009 10:56:46 +0530:

> Is the file format of the bayes db available somewhere?

dbm, gdbm ...

Kai
Re: Sharing and merging bayes data?
On Fri, 18 Dec 2009 10:56:46 +0530 Rajkumar S rajkum...@asianetindia.com wrote:

> On Fri, Dec 18, 2009 at 9:29 AM, Matt Kettler mkettler...@verizon.net wrote:
>
> > As you mentioned, you'd need a custom script (not wildly complicated for a good Perl scripter, but beyond the bounds of someone with only crude scripting skills) as well as historical copies of each database from the last merge.
>
> Is the file format of the bayes db available somewhere? Google did not turn up anything. It would be great if some more information about how to go about merging the db could be posted.

You can use sa-learn --backup to dump it to a text file; the format of that is pretty much self-explanatory. sa-learn --restore can load the merged file back into SA.
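For anyone who wants to script against that dump, here is a minimal Python sketch of reading the per-token records. It assumes the backup is tab-separated and that each token line has the form `t`, spam count, ham count, last-seen time, token; that layout is an assumption and should be checked against a real dump from your SpamAssassin version. The function name `read_tokens` and the sample data are made up for illustration.

```python
from typing import Dict, Iterable, Tuple


def read_tokens(lines: Iterable[str]) -> Dict[str, Tuple[int, int, int]]:
    """Collect {token: (spam_count, ham_count, atime)} from 't' records.

    Assumed record layout (verify against your own dump):
        t <TAB> spam_count <TAB> ham_count <TAB> atime <TAB> token
    Every other line (version, message totals, seen-message ids) is skipped.
    """
    tokens = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 5 and fields[0] == "t":
            _, spam, ham, atime, token = fields
            tokens[token] = (int(spam), int(ham), int(atime))
    return tokens


# Hypothetical sample in the assumed layout, not taken from a real database.
SAMPLE = [
    "v\t3\tdb_version\n",
    "v\t200\tnum_spam\n",
    "v\t200\tnum_nonspam\n",
    "t\t4\t0\t1261100000\ta1b2c3d4e5\n",
    "t\t1\t7\t1261103600\t0f9e8d7c6b\n",
]

if __name__ == "__main__":
    for token, counts in read_tokens(SAMPLE).items():
        print(token, counts)
```

In practice the lines would come from a file produced with `sa-learn --backup > backup.txt`, opened and passed to `read_tokens`.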
Re: Sharing and merging bayes data?
On 12/17/2009 2:50 AM, Rajkumar S wrote:

> Hello,
>
> I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data?

I'm not sure about the merging, but why would you not use a SQL server back-end database for storing the bayes information? Then both servers could reference and update the same data set.
Re: Sharing and merging bayes data?
On 12/17/2009 2:50 AM, Rajkumar S wrote:

> Hello,
>
> I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data?
>
> with regards,
> raj

No. If you're using file-based bayes, there's no good way to share updates between one DB and the other.

The information needed to make such a merger successful isn't stored, because it is not needed for any reason within SpamAssassin. The database merely stores the token, its spam count, its nonspam count, and a last-seen timestamp. If you look at the same token in 2 different databases, you can't really merge these counts, because you don't know how many occurred since your last merge.

If you really want common bayes data between two servers, you should configure bayes to use a SQL server (MySQL, etc.) and point both SpamAssassin configurations to the same database. This also has the benefit that both servers are continuously in sync.
Re: Sharing and merging bayes data?
On 12/17/09 2:50 AM, Rajkumar S wrote:

> Hello,
>
> I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data?
>
> with regards,
> raj

One option, since the 'mx2' version of bayes is heavily weighted towards 'spam', is to do the following nightly: stop spamd, back up bayes from mx1, and restore to mx2. (And it's a lot easier if you use bayes.)

Or, as one poster suggested, use one single MySQL database. It's faster and more stable.
Re: Sharing and merging bayes data?
On Thu, Dec 17, 2009 at 7:44 PM, Michael Scheidell list-s...@secnap.com wrote:

> Or, as one poster suggested, use one single MySQL database. It's faster and more stable.

Thanks everyone, MySQL is the way to go.

raj
Re: Sharing and merging bayes data?
On 12/17/2009 02:14 PM, Michael Scheidell wrote:

> On 12/17/09 2:50 AM, Rajkumar S wrote:
>
> > Hello,
> >
> > I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data?
> >
> > with regards,
> > raj
>
> One option, since the 'mx2' version of bayes is heavily weighted towards 'spam', is to do the following nightly: stop spamd, back up bayes from mx1, and restore to mx2. (And it's a lot easier if you use bayes.)
>
> Or, as one poster suggested, use one single MySQL database. It's faster and more stable.

Or, if you really don't want to use a single MySQL database, yet another alternative might be to disable autolearning and manually train both servers against shared spam/ham folders, run as a nightly cron job.
Re: Sharing and merging bayes data?
On Thu, 17 Dec 2009 09:04:18 -0500 Matt Kettler mkettler...@verizon.net wrote:

> No. If you're using file-based bayes, there's no good way to share updates between one DB and the other.
>
> The information needed to make such a merger successful isn't stored, because it is not needed for any reason within SpamAssassin. The database merely stores the token, its spam count, its nonspam count, and a last-seen timestamp. If you look at the same token in 2 different databases, you can't really merge these counts, because you don't know how many occurred since your last merge.

I'm not saying it's a good idea, but it is possible provided that you retained the result of the previous merge. It should be simple to script too.
Re: Sharing and merging bayes data?
On 12/17/2009 11:17 AM, RW wrote:

> > If you're using file-based bayes, there's no good way to share updates between one DB and the other.
> >
> > The information needed to make such a merger successful isn't stored, because it is not needed for any reason within SpamAssassin. The database merely stores the token, its spam count, its nonspam count, and a last-seen timestamp. If you look at the same token in 2 different databases, you can't really merge these counts, because you don't know how many occurred since your last merge.
>
> I'm not saying it's a good idea, but it is possible provided that you retained the result of the previous merge. It should be simple to script too.

Agreed. I didn't mean to say that a merge is impossible; it's just not possible with the tools that SA comes with, and you need more info than just what's in the current database.

As you mentioned, you'd need a custom script (not wildly complicated for a good Perl scripter, but beyond the bounds of someone with only crude scripting skills) as well as historical copies of each database from the last merge.

Setting up SQL would be much easier.
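To make the "retain the previous merge" idea concrete, here is a rough Python sketch of the arithmetic such a custom script would need. It is not something SpamAssassin ships. It assumes each database has already been loaded into a dict of token to (spam count, nonspam count, last-seen time); the names `Token` and `merge_counts` are invented for this example. For each token the merged count is the previous merge result plus what each server learned since then, i.e. a + b - base.

```python
from typing import Dict, NamedTuple


class Token(NamedTuple):
    spam: int   # times the token was seen in spam
    ham: int    # times the token was seen in nonspam
    atime: int  # last-seen timestamp


def merge_counts(base: Dict[str, Token],
                 a: Dict[str, Token],
                 b: Dict[str, Token]) -> Dict[str, Token]:
    """Three-way merge of two token tables against the previous merge result.

    base is the table both servers were restored from last time; a and b are
    the tables as they look now. A token missing from one server is treated
    as unchanged there, which is a simplification: expiry is not modelled.
    """
    zero = Token(0, 0, 0)
    merged = {}
    for tok in set(a) | set(b):
        old = base.get(tok, zero)
        ta = a.get(tok, old)
        tb = b.get(tok, old)
        # base + (a - base) + (b - base), never below zero
        spam = max(0, ta.spam + tb.spam - old.spam)
        ham = max(0, ta.ham + tb.ham - old.ham)
        merged[tok] = Token(spam, ham, max(ta.atime, tb.atime))
    return merged


if __name__ == "__main__":
    base = {"x": Token(2, 1, 100)}
    mx1 = {"x": Token(5, 1, 200), "y": Token(1, 0, 210)}
    mx2 = {"x": Token(3, 4, 150)}
    print(merge_counts(base, mx1, mx2))
```

The database-wide spam and nonspam message totals would need the same a + b - base treatment, and the merged result has to be kept as the base for the next night's run.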
Re: Sharing and merging bayes data?
On Fri, Dec 18, 2009 at 9:29 AM, Matt Kettler mkettler...@verizon.net wrote:

> As you mentioned, you'd need a custom script (not wildly complicated for a good Perl scripter, but beyond the bounds of someone with only crude scripting skills) as well as historical copies of each database from the last merge.

Is the file format of the bayes db available somewhere? Google did not turn up anything. It would be great if some more information about how to go about merging the db could be posted.

thanks and regards,
raj
Sharing and merging bayes data?
Hello,

I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data?

with regards,
raj