Original Message
Date: Fri, 19 Dec 2008 09:56:04 +
From: Matt Galloway matt.gallo...@senokian.com
To: dspam-users@lists.nuclearelephant.com
Subject: Re: [dspam-users] DSpam global user advice
Many thanks for the amazingly quick reply!
Yes I used dspam_train first to train one of the users which will be
merged into globaluser and then I merged that user into globaluser.
Basically my setup is that we do mail for a large number of people and
we want the ability to use a globaluser so that everyone has training
data, but then the global user will be updated every month with the
training data of 3 users (who get large amounts of spam). Therefore I
want to make sure that at each merge step, it's not just adding the
users, but rather starting again for globaluser.
I don't understand that. Allow me to rephrase:
User A: We call him A
User B: We call him B
User C: We call him C
User D: We call him D
User E: We call him E
User F: We call him F
merged group name: globaluser
Now you have merged A + B + C into globaluser
globaluser is the merged group name and it's used for all users
Now, over some period of time, users A, B and C get a lot of spam and you want
their training to be included in globaluser. Right?
Or do you just want to take the data from A + B + C and recreate
globaluser every month (or so)?
The problem with the second way is:
1) Initial
A has 1000 tokens
B has 2000 tokens
C has 3000 tokens
globaluser has 0 tokens
2) Merging
A + B + C ~ 4500 tokens (just an assumption; fewer than the sum of 6000 because
some tokens are probably found in more than one of A, B and C)
This results in 4500 tokens for globaluser
3) Setting globaluser as merged group for all users
All users now have at least 4500 tokens out of the box
4) Running for one month
A gets 200 new tokens
B gets 100 new tokens
C gets 50 new tokens
5) You do your daily purging and cleaning of the data
A had 1000 (from item 1) + 200 (from item 4) tokens; after purging he has
500 tokens
B had 2000 (from item 1) + 100 (from item 4) tokens; after purging he has
1000 tokens
C had 3000 (from item 1) + 50 (from item 4) tokens; after purging he has
1500 tokens
globaluser still has 4500, and hopefully you have excluded globaluser from
purging
6) You recreate globaluser from A + B + C
Now do the math. What do you think you will get in globaluser? More or fewer
tokens than it had before? You will get FEWER, because A + B + C together have
only 3000 tokens, including the duplicates. And before that, globaluser alone
had 4500 tokens.
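
The steps above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the example numbers from items 1 to 5 (which are themselves assumptions); the variable names are made up for illustration and nothing here calls DSPAM.

```python
# Token counts taken from the example steps above (illustrative numbers only).
initial = {"A": 1000, "B": 2000, "C": 3000}        # item 1
global_after_first_merge = 4500                     # item 2 (assumed overlap)
new_in_month = {"A": 200, "B": 100, "C": 50}        # item 4
after_purge = {"A": 500, "B": 1000, "C": 1500}      # item 5

# Before purging, each user holds the initial tokens plus the new ones.
before_purge = {user: initial[user] + new_in_month[user] for user in initial}

# Best case for a recreated globaluser: the three users share no tokens,
# so the merge result is simply the sum. Any overlap makes it smaller.
recreated_upper_bound = sum(after_purge.values())

print("before purge:", before_purge)
print("recreated globaluser has at most:", recreated_upper_bound)
print("tokens lost compared to the old globaluser, at least:",
      global_after_first_merge - recreated_upper_bound)
```

Even in the best case, the recreated globaluser ends up smaller than the one it replaces.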
So it is a very bad idea to just erase globaluser on each run. And you have to
keep in mind that from the moment you activate the merged group, all
user tokens are just the delta between their tokens and the tokens in
globaluser at the time DSPAM calculated the tokens. So erasing the tokens
in globaluser on each merge run is NOT making DSPAM stronger. It is making it
weaker!
It would be better to train the global user on its own. For example, activate
corpus creation for A, B and C and then regularly train
globaluser from the data you collect at A, B and C. Or exclude A, B and C from
the merged global group and add A, B and C to a merged and managed group.
Then training from A, B and C will flow into globaluser, and the other users
(all except A, B, C) will automatically get the result of the training of A,
B and C.
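
As a rough sketch of the first suggestion (regularly training globaluser from the corpora collected at A, B and C), the Python below only builds the dspam_train command lines and prints them; it does not run anything. The corpus directory layout under /var/spool/dspam-corpus is purely hypothetical, and the argument order (user, spam directory, non-spam directory) is my understanding of dspam_train's usage, so check it against the man page of your DSPAM version first.

```python
from pathlib import Path

# Hypothetical location where the per-user corpora are collected.
CORPUS_ROOT = Path("/var/spool/dspam-corpus")


def build_train_command(target_user, spam_dir, nonspam_dir):
    """Return the argv list for one dspam_train run.

    Assumed usage: dspam_train <user> <spam_dir> <nonspam_dir>.
    """
    return ["dspam_train", target_user, str(spam_dir), str(nonspam_dir)]


def commands_for_sources(target_user, source_users):
    """Build one training command per source user's corpus."""
    commands = []
    for source in source_users:
        base = CORPUS_ROOT / source
        commands.append(
            build_train_command(target_user, base / "spam", base / "nonspam")
        )
    return commands


if __name__ == "__main__":
    for argv in commands_for_sources("globaluser", ["A", "B", "C"]):
        # Print instead of executing, so the sketch is safe to run anywhere.
        print(" ".join(argv))
```

Run from cron once a month, something along these lines would add to globaluser's training rather than replace it.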
Do you understand what I mean?
Does that make sense, or is it completely wacky?
It did not make sense to me. But English is not my native language, and it could
be that I did not understand you correctly.
Also, does anyone have any ideas about my other message regarding the
signature not appearing anywhere? It suddenly disappeared and I don't
know what setting made it go away! So confused!
Thanks again,
Matt
Steve
Steve wrote:
If you want to merge into the global user, then DO NOT delete the old
data in the globaluser. Just merge into it but don't delete.
If you want to start from the beginning, then just delete the old data
and remerge again from whatever userdata you like.
It would be better to use raw mail data and dspam_train to train the
globaluser.
Original Message
Date: Fri, 19 Dec 2008 09:27:46 +
From: Matt Galloway matt.gallo...@senokian.com
To: dspam-users@lists.nuclearelephant.com
Subject: [dspam-users] DSpam global user advice
Hello again,
I have another question... this one is regarding dspam with a global
user. I am using a merged globaluser like so:
globaluser:merged:*
This seems to be working correctly (in the logs it states that the
user's data is being merged with globaluser's data) so that's good. And
I created the globaluser by dspam_merge on a few users. Now what I want
to do is update this globaluser from another merge of a few users, but
what's the best way to go about this? Should I delete all data