Re: Sharing and merging bayes data?

2009-12-18 Thread Kai Schaetzl
Rajkumar S wrote on Fri, 18 Dec 2009 10:56:46 +0530:

 Is the file format of the bayes db available somewhere?

dbm, gdbm ...

Kai

-- 
Kai Schätzl, Berlin, Germany

Re: Sharing and merging bayes data?

2009-12-18 Thread RW
On Fri, 18 Dec 2009 10:56:46 +0530
Rajkumar S rajkum...@asianetindia.com wrote:

 On Fri, Dec 18, 2009 at 9:29 AM, Matt Kettler
 mkettler...@verizon.net wrote:
  As you mentioned, you'd need a custom script (not wildly
  complicated for a good Perl scripter, but beyond the bounds of
  someone with only crude scripting skills) as well as historical
  copies of each database from the last merge.

 Is the file format of the bayes db available somewhere? Google did not
 turn up anything. It would be great if some more information about
 how to go about merging the db could be posted.

You can use sa-learn --backup to dump it to a text file; the format of
that is pretty much self-explanatory. sa-learn --restore can then load
the merged file back into SA.
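
A minimal sketch of reading such a dump in Python. It assumes the dump
is tab-separated, with "v" lines carrying a value and a variable name,
and "t" lines carrying spam count, ham count, last-seen time and an
encoded token, in that order. That layout is an assumption made for
illustration, so check it against a real dump first. The function name
parse_backup is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# token -> (spam_count, ham_count, last_seen)
Tokens = Dict[str, Tuple[int, int, int]]


def parse_backup(lines: Iterable[str]) -> Tuple[Dict[str, int], Tokens]:
    """Split a text dump into header variables and token records.

    Assumed layout (not confirmed by this thread):
      v <TAB> value <TAB> name [optional comment]
      t <TAB> spam <TAB> ham <TAB> last_seen <TAB> token
    """
    variables: Dict[str, int] = {}
    tokens: Tokens = {}
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if fields[0] == "v" and len(fields) >= 3 and fields[2].strip():
            # The name may be followed by a comment; keep the first word.
            name = fields[2].split()[0]
            variables[name] = int(fields[1])
        elif fields[0] == "t" and len(fields) >= 5:
            spam, ham, seen = (int(x) for x in fields[1:4])
            tokens[fields[4]] = (spam, ham, seen)
        # Any other record type is ignored in this sketch.
    return variables, tokens
```

The dump itself would come from something like
"sa-learn --backup > dump.txt", and a merged file would go back in with
"sa-learn --restore dump.txt".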


Re: Sharing and merging bayes data?

2009-12-17 Thread Thomas Harold

On 12/17/2009 2:50 AM, Rajkumar S wrote:

 Hello,

 I have 2 SA servers running for a single domain. Both were primed with
 a set of 200 spam and ham messages and are now auto learning. After
 about a day both have auto learned different numbers of ham and spam
 mails. Is it possible to merge the bayes data every night and update
 both servers with new merged data?



I'm not sure about the merging, but why not use a SQL server back-end
database for storing the bayes information? Then both servers could
reference and update the same data set.


Re: Sharing and merging bayes data?

2009-12-17 Thread Matt Kettler
On 12/17/2009 2:50 AM, Rajkumar S wrote:
 Hello,

 I have 2 SA servers running for a single domain. Both were primed with
 a set of 200 spam and ham messages and are now auto learning. After
 about a day both have auto learned different numbers of ham and spam
 mails. Is it possible to merge the bayes data every night and update
 both servers with new merged data?

 with regards,

 raj

   

No. If you're using file-based bayes, there's no good way to share
updates between one DB and the other. The information needed to make
such a merger successful isn't stored, because it is not needed for any
reason within SpamAssassin. The database merely stores the token, its
spam count, its nonspam count, and a last-seen timestamp. If you look
at the same token in 2 different databases, you can't really merge these
counts, because you don't know how many occurred since your last merge.

If you really want common bayes data between two servers, you should
configure bayes to use a SQL server (MySQL, etc.) and point both
SpamAssassin configurations to the same database. This also has the
benefit that both servers are continuously in sync.



Re: Sharing and merging bayes data?

2009-12-17 Thread Michael Scheidell

On 12/17/09 2:50 AM, Rajkumar S wrote:

 Hello,

 I have 2 SA servers running for a single domain. Both were primed with
 a set of 200 spam and ham messages and are now auto learning. After
 about a day both have auto learned different numbers of ham and spam
 mails. Is it possible to merge the bayes data every night and update
 both servers with new merged data?

 with regards,

 raj
   
One option, since the 'mx2' version of bayes is heavily weighted towards
'spam', is to stop spamd nightly, back up bayes from mx1, and restore
it to mx2.

(And it's a lot easier if you use bayes.)

Or, as one poster suggested, use a single MySQL database. It's faster
and more stable.



Re: Sharing and merging bayes data?

2009-12-17 Thread Rajkumar S
On Thu, Dec 17, 2009 at 7:44 PM, Michael Scheidell list-s...@secnap.com wrote:
 Or, as one poster suggested, use a single MySQL database. It's faster and
 more stable.

Thanks everyone, MySQL is the way to go.

raj


Re: Sharing and merging bayes data?

2009-12-17 Thread Ned Slider

On 12/17/2009 02:14 PM, Michael Scheidell wrote:

 On 12/17/09 2:50 AM, Rajkumar S wrote:

  Hello,

  I have 2 SA servers running for a single domain. Both were primed with
  a set of 200 spam and ham messages and are now auto learning. After
  about a day both have auto learned different numbers of ham and spam
  mails. Is it possible to merge the bayes data every night and update
  both servers with new merged data?

  with regards,

  raj

 One option, since the 'mx2' version of bayes is heavily weighted towards
 'spam', is to stop spamd nightly, back up bayes from mx1, and restore
 it to mx2.
 (And it's a lot easier if you use bayes.)

 Or, as one poster suggested, use a single MySQL database. It's faster
 and more stable.


Or, if you really don't want to use a single MySQL database, yet another
alternative might be to disable autolearning and manually train both
servers against shared spam/ham folders, run as a nightly cron job.
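
A rough sketch of what that nightly job could run on each server,
written in Python. The folder paths and the function names are
hypothetical; the only sa-learn options used are --spam and --ham. It
assumes autolearning has been turned off in the configuration
(bayes_auto_learn 0) and that both servers can read the same folders.

```python
import subprocess
from typing import List


def training_commands(spam_dir: str, ham_dir: str) -> List[List[str]]:
    """Build the sa-learn invocations for one training run."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def run_training(spam_dir: str, ham_dir: str) -> None:
    """Run each command in turn and stop on the first failure."""
    for cmd in training_commands(spam_dir, ham_dir):
        subprocess.run(cmd, check=True)


# Hypothetical call from a script started by cron on each server:
# run_training("/srv/mail/shared/spam", "/srv/mail/shared/ham")
```

Because both servers learn from the same messages, their databases
should stay close without any merge step.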




Re: Sharing and merging bayes data?

2009-12-17 Thread RW
On Thu, 17 Dec 2009 09:04:18 -0500
Matt Kettler mkettler...@verizon.net wrote:


 No.. If you're using file-based bayes, there's no good way to share
 updates between one DB and the other. The information needed to make
 such a merger successful isn't stored, because it is not needed for
 any reason within SpamAssassin. The database merely stores the token,
 its spam count, its nonspam count, and a last-seen timestamp. If
 you look at the same token in 2 different databases, you can't really
 merge these counts, because you don't know how many occurred since
 your last merge.

I'm not saying it's a good idea, but it is possible provided that you
retained the result of the previous merge. It should be simple to
script too.


Re: Sharing and merging bayes data?

2009-12-17 Thread Matt Kettler
On 12/17/2009 11:17 AM, RW wrote:
  If you're using file-based bayes, there's no good way to share
  updates between one DB and the other. The information needed to make
  such a merger successful isn't stored, because it is not needed for
  any reason within SpamAssassin. The database merely stores the token,
  its spam count, its nonspam count, and a last-seen timestamp. If
  you look at the same token in 2 different databases, you can't really
  merge these counts, because you don't know how many occurred since
  your last merge.
 
 I'm not saying it's a good idea, but it is possible provided that you
 retained the result of the previous merge. It should be simple to
 script too.

   
Agreed. I didn't mean to say that a merge is impossible; it's just not
possible with the tools that SA comes with, and you need more info than
just what's in the current database.

As you mentioned, you'd need a custom script (not wildly complicated
for a good Perl scripter, but beyond the bounds of someone with only
crude scripting skills) as well as historical copies of each database
from the last merge.

Setting up SQL would be much easier.
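
The arithmetic behind such a merge can be sketched in Python as a
three-way merge: each server's growth since the retained previous
merge is added on top of that previous merge. This is only an
illustration under stated assumptions, not anything SA ships. The names
three_way_merge, old, mx1 and mx2 are hypothetical, each database is
assumed to be already loaded as a dict of token to (spam count, nonspam
count, last-seen), and token expiry is ignored apart from clamping
negative results to zero.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (spam_count, nonspam_count, last_seen)
ZERO: Counts = (0, 0, 0)


def three_way_merge(old: Dict[str, Counts],
                    mx1: Dict[str, Counts],
                    mx2: Dict[str, Counts]) -> Dict[str, Counts]:
    """Combine two databases that both started from the same `old` merge.

    merged = old + (mx1 - old) + (mx2 - old) = mx1 + mx2 - old
    """
    merged: Dict[str, Counts] = {}
    for token in set(old) | set(mx1) | set(mx2):
        o = old.get(token, ZERO)
        a = mx1.get(token, ZERO)
        b = mx2.get(token, ZERO)
        # A token expired on one side can push the result below zero.
        spam = max(0, a[0] + b[0] - o[0])
        ham = max(0, a[1] + b[1] - o[1])
        seen = max(o[2], a[2], b[2])
        if spam or ham:
            merged[token] = (spam, ham, seen)
    return merged
```

For example, a token at 5 spam hits in the previous merge that is now
at 8 on one server and 6 on the other comes out at 8 + 6 - 5 = 9. The
overall spam and nonspam message totals would need the same treatment.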





Re: Sharing and merging bayes data?

2009-12-17 Thread Rajkumar S
On Fri, Dec 18, 2009 at 9:29 AM, Matt Kettler mkettler...@verizon.net wrote:
 As you mentioned, you'd need a custom script (not wildly complicated
 for a good Perl scripter, but beyond the bounds of someone with only
 crude scripting skills) as well as historical copies of each database
 from the last merge.

Is the file format of the bayes db available somewhere? Google did not
turn up anything. It would be great if some more information about
how to go about merging the db could be posted.

thanks and regards,

raj


Sharing and merging bayes data?

2009-12-16 Thread Rajkumar S
Hello,

I have 2 SA servers running for a single domain. Both were primed with
a set of 200 spam and ham messages and are now auto learning. After
about a day both have auto learned different numbers of ham and spam
mails. Is it possible to merge the bayes data every night and update
both servers with new merged data?

with regards,

raj