Re: [Dspam-user] Global Group Not Training

2010-05-02 Thread Stevan Bajić
On Sat, 1 May 2010 11:11:24 +0200
Stevan Bajić  wrote:

> On Fri, 30 Apr 2010 15:23:41 -0400
> Ed Szynaka  wrote:
> 
> [...]
> > 
> I am still working on the fixup of group support in DSPAM. I never took the 
> time to completely read the code regarding group support. So this is new to 
> me.
> 
> Anyway... just by looking at the code I have now found out that a user can 
> not be in a shared group AND in a merged group. I think the README needs to 
> be rewritten to be more clear regarding how groups work and which groups can 
> be combined and which can not.
> 
> @Ed: You are for sure more fluent with English than I am. How about you 
> rewriting that part of the README? I have rewritten some paragraphs but I 
> will not commit to GIT until I don't get an okay from your part that the 
> group support is fixed. But after the commit to GIT you could sure spare some 
> minutes and rewrite the text about group support in the README. What do you 
> think?
> 
I am going to commit the changed code to GIT. I think this is the best way to 
find problems and fix them. If anyone here is using shared groups, 
shared,merged groups, merged groups, classification network groups or global 
classification groups, then please check out DSPAM from GIT and let me know if 
you find any issues.

> 
> > -- 
> > Ed Szynaka
> 
> -- 
> Kind Regards from Switzerland,
> 
> Stevan Bajić
> 

--
___
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user


Re: [Dspam-user] Global Group Not Training

2010-05-01 Thread Stevan Bajić
On Fri, 30 Apr 2010 15:23:41 -0400
Ed Szynaka  wrote:

[...]
> 
I am still working on the fixup of group support in DSPAM. I never took the 
time to completely read the code regarding group support. So this is new to me.

Anyway... just by looking at the code I have now found out that a user can not 
be in a shared group AND in a merged group. I think the README needs to be 
rewritten to be more clear regarding how groups work and which groups can be 
combined and which can not.

@Ed: You are surely more fluent in English than I am. How about you rewrite 
that part of the README? I have rewritten some paragraphs, but I will not 
commit to GIT until I get an okay from you that the group support is fixed. 
After the commit to GIT you could surely spare a few minutes and rewrite the 
text about group support in the README. What do you think?


> -- 
> Ed Szynaka

-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 15:23:41 -0400
Ed Szynaka  wrote:

[...]
> The README contradicts that
> 
>    --classify
>    Tells DSPAM only to classify the message, and not make any writes to the
>    user's metadata or attempt to deliver/quarantine the message.
> 
>    NOTE: The output of the classification is specific to the user, not
>          including the output of any groups they might be affiliated with,
>          so it is entirely possible that the message would be caught as spam
>          by the group, even if it didn't appear in the classification.  If
>          you want to get the classification for the GROUP, use the group
>          name as the user instead of an individual.
> 
I am going to change the code so that groups are not evaluated if using 
--classify.


> Also where would I look in the debug output to find which set of tokens 
> determined the final result?  In my debugs I only see if a user is included 
> in a 
> group but not whether the user or CLASSIFICATION group data was used for the 
> final result.
> 
> 
> -- 
> Ed Szynaka
> Network/Systems Manager
> LocalNet Corp./CoreComm Internet Services
> 



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 15:23:41 -0400
Ed Szynaka  wrote:

[...]
> 
> The README contradicts that
> 
>    --classify
>    Tells DSPAM only to classify the message, and not make any writes to the
>    user's metadata or attempt to deliver/quarantine the message.
> 
>    NOTE: The output of the classification is specific to the user, not
>          including the output of any groups they might be affiliated with,
>          so it is entirely possible that the message would be caught as spam
>          by the group, even if it didn't appear in the classification.  If
>          you want to get the classification for the GROUP, use the group
>          name as the user instead of an individual.
> 
For merged groups it does not work that way. I have to consult the code to 
see if this is the case. For shared or shared,managed groups one can check the 
group to get the final result. But for a merged group the user tokens are 
merged at runtime with the group tokens, so checking just the user tokens might 
give a different result than checking just the group tokens, and yet another 
result if you check the user including his group membership.
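
To illustrate why the three checks can disagree, here is a minimal sketch. It 
assumes a toy model in which every token carries a (spam hits, innocent hits) 
pair and "merging" simply sums those counts. The function names and the 
scoring ratio are hypothetical and are not DSPAM's actual algorithm.

```python
# Toy illustration only: not DSPAM's real storage format or scoring.
# Each token maps to a (spam_hits, innocent_hits) pair.

def merge_tokens(user, group):
    """Merge two token tables by summing their hit counts per token."""
    merged = dict(user)
    for token, (spam, innocent) in group.items():
        u_spam, u_innocent = merged.get(token, (0, 0))
        merged[token] = (u_spam + spam, u_innocent + innocent)
    return merged

def spam_ratio(tokens, token):
    """Fraction of spam hits for one token; 0.5 when the token is unknown."""
    spam, innocent = tokens.get(token, (0, 0))
    total = spam + innocent
    return spam / total if total else 0.5

user_tokens = {"offer": (0, 2)}    # the user has only seen it in innocent mail
group_tokens = {"offer": (9, 1)}   # the group has mostly seen it in spam
merged_tokens = merge_tokens(user_tokens, group_tokens)

# Three different views of the same token give three different ratios.
print(spam_ratio(user_tokens, "offer"))
print(spam_ratio(group_tokens, "offer"))
print(spam_ratio(merged_tokens, "offer"))
```

In this toy model the user alone, the group alone and the merged view each 
score the token differently, which is the effect described above.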


> Also where would I look in the debug output to find which set of tokens 
> determined the final result?  In my debugs I only see if a user is included 
> in a 
> group but not whether the user or CLASSIFICATION group data was used for the 
> final result.
> 
Take the last patch I sent you. That should print out more info.

> 
> -- 
> Ed Szynaka
> Network/Systems Manager
> LocalNet Corp./CoreComm Internet Services
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
Stevan Bajić wrote:
> On Fri, 30 Apr 2010 14:28:27 -0400
> Ed Szynaka  wrote:
> 
>> Stevan Bajić wrote:
>>> On Fri, 30 Apr 2010 14:03:06 -0400
>>> Ed Szynaka  wrote:
>>>
>>>> Based on my reading of the README I thought --classify was used to 
>>>> determine an individuals results independent of group, mostly for testing.
>>>>
>>> Wrong. Classify should consult groups. If not then the log you send me is 
>>> showing no error. You used "--classify" and according to the above 
>>> statement from you the switch "--classify" should NOT consult groups.
>>>
>> H... I included the 2 classify results assuming it was showing only the 
>> results for the test user and the corpus user.  Then I ran the regular 
>> processing of the message and compared the results to the classify results.  
>> I 
>> thought this would indicate whether it had stuck with the users data or if 
>> it 
>> had deferred to the corpususer result.
>>
>>
>>> ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user  --debug < 
>>> ~/innocent.mail 2>&1
>>> X-DSPAM-Result: ; result="Innocent"; class="Innocent"; 
>>> probability=0.; confidence=0.99; signature=N/A
>>>
>>> ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user corpususer --debug < 
>>> ~/innocent.mail 2>&1
>>> X-DSPAM-Result: corpususer; result="Spam"; class="Spam"; 
>>> probability=1.; confidence=0.55; signature=N/A
>>>
>>> ~/src/dspam-3.9.0/src$ sudo ./dspam --stdout --deliver=innocent,spam --user 
>>>  --debug < ~/spam.mail | grep X-DSPAM
>>> X-DSPAM-Result: Innocent
>>> X-DSPAM-Processed: Fri Apr 30 11:41:05 2010
>>> X-DSPAM-Confidence: 0.9899
>>> X-DSPAM-Probability: 0.
>>> X-DSPAM-Signature: 3856,4bdafa11196654320513085
>> I thought these results indicated that dspam had not deferred to the 
>> corpususer 
>> for its result even though the  has only processed 21 previous 
>> messages.  Am I interpreting these results incorrectly?
>>
> Yes. You are interpreting those results incorrectly.
> 
> IMHO using --classify or --process should result in the same result, class, 
> probability and confidence.
> 
> So there is no way to say if global/classification groups where considered 
> without looking at the debug output.
> 

The README contradicts that

   --classify
   Tells DSPAM only to classify the message, and not make any writes to the
   user's metadata or attempt to deliver/quarantine the message.

   NOTE: The output of the classification is specific to the user, not
         including the output of any groups they might be affiliated with,
         so it is entirely possible that the message would be caught as spam
         by the group, even if it didn't appear in the classification.  If
         you want to get the classification for the GROUP, use the group
         name as the user instead of an individual.

Also where would I look in the debug output to find which set of tokens 
determined the final result?  In my debugs I only see if a user is included in 
a group but not whether the user or CLASSIFICATION group data was used for the 
final result.


-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 14:28:27 -0400
Ed Szynaka  wrote:

> Stevan Bajić wrote:
> > On Fri, 30 Apr 2010 14:03:06 -0400
> > Ed Szynaka  wrote:
> > 
> >> Based on my reading of the README I thought --classify was used to 
> >> determine an 
> >> individuals results independent of group, mostly for testing.
> >>
> > Wrong. Classify should consult groups. If not then the log you send me is 
> > showing no error. You used "--classify" and according to the above 
> > statement from you the switch "--classify" should NOT consult groups.
> > 
> 
> H... I included the 2 classify results assuming it was showing only the 
> results for the test user and the corpus user.  Then I ran the regular 
> processing of the message and compared the results to the classify results.  
> I 
> thought this would indicate whether it had stuck with the users data or if it 
> had deferred to the corpususer result.
> 
> 
> > ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user  --debug < 
> > ~/innocent.mail 2>&1
> > X-DSPAM-Result: ; result="Innocent"; class="Innocent"; 
> > probability=0.; confidence=0.99; signature=N/A
> > 
> > ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user corpususer --debug < 
> > ~/innocent.mail 2>&1
> > X-DSPAM-Result: corpususer; result="Spam"; class="Spam"; 
> > probability=1.; confidence=0.55; signature=N/A
> > 
> > ~/src/dspam-3.9.0/src$ sudo ./dspam --stdout --deliver=innocent,spam --user 
> >  --debug < ~/spam.mail | grep X-DSPAM
> > X-DSPAM-Result: Innocent
> > X-DSPAM-Processed: Fri Apr 30 11:41:05 2010
> > X-DSPAM-Confidence: 0.9899
> > X-DSPAM-Probability: 0.
> > X-DSPAM-Signature: 3856,4bdafa11196654320513085
> 
> I thought these results indicated that dspam had not deferred to the 
> corpususer 
> for its result even though the  has only processed 21 previous 
> messages.  Am I interpreting these results incorrectly?
> 
Yes. You are interpreting those results incorrectly.

IMHO using --classify or --process should result in the same result, class, 
probability and confidence.

So there is no way to tell whether global/classification groups were considered 
without looking at the debug output.


> -- 
> Ed Szynaka
> Network/Systems Manager
> LocalNet Corp./CoreComm Internet Services
> 



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
Stevan Bajić wrote:
> On Fri, 30 Apr 2010 14:03:06 -0400
> Ed Szynaka  wrote:
> 
>> Based on my reading of the README I thought --classify was used to determine 
>> an 
>> individuals results independent of group, mostly for testing.
>>
> Wrong. Classify should consult groups. If not then the log you send me is 
> showing no error. You used "--classify" and according to the above statement 
> from you the switch "--classify" should NOT consult groups.
> 

H... I included the 2 classify results assuming they showed only the 
results for the test user and the corpus user.  Then I ran the regular 
processing of the message and compared the results to the classify results.  I 
thought this would indicate whether it had stuck with the user's data or if it 
had deferred to the corpususer result.


> ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user  --debug < 
> ~/innocent.mail 2>&1
> X-DSPAM-Result: ; result="Innocent"; class="Innocent"; 
> probability=0.; confidence=0.99; signature=N/A
> 
> ~/src/dspam-3.9.0/src$ sudo ./dspam --classify --user corpususer --debug < 
> ~/innocent.mail 2>&1
> X-DSPAM-Result: corpususer; result="Spam"; class="Spam"; probability=1.; 
> confidence=0.55; signature=N/A
> 
> ~/src/dspam-3.9.0/src$ sudo ./dspam --stdout --deliver=innocent,spam --user 
>  --debug < ~/spam.mail | grep X-DSPAM
> X-DSPAM-Result: Innocent
> X-DSPAM-Processed: Fri Apr 30 11:41:05 2010
> X-DSPAM-Confidence: 0.9899
> X-DSPAM-Probability: 0.
> X-DSPAM-Signature: 3856,4bdafa11196654320513085

I thought these results indicated that dspam had not deferred to the corpususer 
for its result even though the  has only processed 21 previous 
messages.  Am I interpreting these results incorrectly?

-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 14:03:06 -0400
Ed Szynaka  wrote:

> Based on my reading of the README I thought --classify was used to determine 
> an 
> individuals results independent of group, mostly for testing.
>
Wrong. Classify should consult groups. If not, then the log you sent me is 
showing no error. You used "--classify" and according to your statement above 
the switch "--classify" should NOT consult groups.


> And the --process 
> would use the group, even if that group was a CLASSIFICATION group.
> 
Correct.


> Ed
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
Stevan Bajić wrote:
> On Fri, 30 Apr 2010 12:03:28 -0400
> Ed Szynaka  wrote:
> 
>> I just want to say thanks for this explanation.  Much more clear than the 
>> README.  I really feel like I get the point of all the syntax now.
>>
> Ohhh boy. I see now why it did not worked with my patch. This here is the 
> issue:
> CTX->operating_mode == DSM_PROCESS
> 
> So the DSPAM must be in processing mode. I guess that is not very logical for 
> a CLASSIFICATION group?
> 

Well, I think that while the names create an unfortunate collision (--classify 
and CLASSIFICATION group), the line above does seem to make sense.

Based on my reading of the README I thought --classify was used to determine an 
individual's results independent of group, mostly for testing, and that 
--process would use the group, even if that group was a CLASSIFICATION group.

Ed
-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka


Stevan Bajić wrote:
> On Fri, 30 Apr 2010 12:03:28 -0400
> Ed Szynaka  wrote:
> 
>> I just want to say thanks for this explanation.  Much more clear than the 
>> README.  I really feel like I get the point of all the syntax now.
>>
> I am trying to fix that logic now. Reading the README I see this here:
> --
>   groupname:classification:*globaluser
> 
>   This will automatically add globaluser as a classification peer to all 
> users.
>   Any user who has less than 1000 innocent messages or 250 spam messages in
>   their corpus, or whose filter is uncertain about a particular message will
>   consult the global dictionary for an answer.
> --
> 
> The documentation IMHO is not very clear.
> 
> When is a global group consulted?
> 
> Option A:
> [having less than 1000 innocent messages] OR [having less than 250 spam 
> messages] OR [uncertain result about a particular message]
> 
> 
> Option B:
> ([having less than 1000 innocent messages] AND [having less than 250 spam 
> messages]) OR [uncertain result about a particular message]
> 
> 
> The original code in DSPAM is not clear about the logic. It is a total mess.
> 
> I personally would say that A is what should be implemented. What is your 
> oppinion?
> 
> 

I would agree that A is the proper logic based on the README.

> 
>> And just in case there was some misunderstanding; I've got no issues with 
>> the 
>> merged group as its implemented, its just not quite what I need.
>>
>> Thanks,
>> Ed
>>

-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 12:03:28 -0400
Ed Szynaka  wrote:

> I just want to say thanks for this explanation.  Much more clear than the 
> README.  I really feel like I get the point of all the syntax now.
> 
Ohhh boy. I see now why it did not work with my patch. This here is the issue:
CTX->operating_mode == DSM_PROCESS

So DSPAM must be in processing mode. I guess that is not very logical for a 
CLASSIFICATION group?
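
As a rough model of that condition, the sketch below shows why a run with 
--classify would never reach the group lookup. The enum and the function are 
hypothetical stand-ins; only the DSM_PROCESS comparison itself comes from the 
code quoted above.

```python
from enum import Enum, auto

class OperatingMode(Enum):
    # Stand-ins for the operating modes; only DSM_PROCESS appears in the thread.
    PROCESS = auto()   # models DSM_PROCESS (normal processing/delivery)
    CLASSIFY = auto()  # hypothetical stand-in for a --classify run

def consults_groups(mode):
    """Mirror the quoted condition: groups are only consulted in process mode."""
    return mode == OperatingMode.PROCESS

print(consults_groups(OperatingMode.PROCESS))
print(consults_groups(OperatingMode.CLASSIFY))
```

With a gate like this, a classification group is silently skipped whenever 
the message is only classified rather than processed.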


> And just in case there was some misunderstanding; I've got no issues with the 
> merged group as its implemented, its just not quite what I need.
> 
> Thanks,
> Ed
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 12:03:28 -0400
Ed Szynaka  wrote:

> I just want to say thanks for this explanation.  Much more clear than the 
> README.  I really feel like I get the point of all the syntax now.
> 
I am trying to fix that logic now. Reading the README I see this here:
--
  groupname:classification:*globaluser

  This will automatically add globaluser as a classification peer to all users.
  Any user who has less than 1000 innocent messages or 250 spam messages in
  their corpus, or whose filter is uncertain about a particular message will
  consult the global dictionary for an answer.
--

The documentation IMHO is not very clear.

When is a global group consulted?

Option A:
[having less than 1000 innocent messages] OR [having less than 250 spam 
messages] OR [uncertain result about a particular message]


Option B:
([having less than 1000 innocent messages] AND [having less than 250 spam 
messages]) OR [uncertain result about a particular message]
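
The difference between the two readings can be sketched as follows. The 
thresholds come from the README text quoted above; the function names are made 
up for illustration, and "uncertain" is simply passed in as a flag rather than 
computed.

```python
# Thresholds taken from the README text quoted above.
MIN_INNOCENT = 1000
MIN_SPAM = 250

def consult_option_a(innocent, spam, uncertain):
    """Option A: any single condition triggers the global group."""
    return innocent < MIN_INNOCENT or spam < MIN_SPAM or uncertain

def consult_option_b(innocent, spam, uncertain):
    """Option B: both corpus counts must be low, or the result uncertain."""
    return (innocent < MIN_INNOCENT and spam < MIN_SPAM) or uncertain

# A user with few innocent messages but plenty of spam, and a confident result:
print(consult_option_a(500, 300, False))
print(consult_option_b(500, 300, False))
```

For such a user the two options disagree: A consults the global group, B does 
not.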


The original code in DSPAM is not clear about the logic. It is a total mess.

I personally would say that A is what should be implemented. What is your 
opinion?



> And just in case there was some misunderstanding; I've got no issues with the 
> merged group as its implemented, its just not quite what I need.
> 
> Thanks,
> Ed
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
I just want to say thanks for this explanation.  Much more clear than the 
README.  I really feel like I get the point of all the syntax now.

And just in case there was some misunderstanding: I've got no issues with the 
merged group as it's implemented; it's just not quite what I need.

Thanks,
Ed

Stevan Bajić wrote:
> On Fri, 30 Apr 2010 09:48:41 -0400
> Ed Szynaka  wrote:
> 
>> Yes that's how the README lists it as well.  The only reason I tried the 
>> variations was to attempt to get the debug to let me know the user would 
>> also 
>> check the group.  I attempted the group name as the same as the corpus 
>> username 
>> to see if it worked similar to merged groups.
>>
> Merged and shared groups use the groupname for internal processing. 
> Classification groups don't do anything with the group name.
> 
> 
>> Mostly I tried the variations because the syntax for merged groups makes 
>> more 
>> sense to me.
>>
> In merged groups that makes sense. Right. In a global group it is completely 
> different. Let me explain:
> groupname:grouptype:groupmember
> 
> groupname is the name of the group
> 
> grouptype is either "merged","shared","shared,managed","inoculation" or 
> "classification"
> 
> groupmember is a list of members separated by ","
> 
> for classification group (aka: "classification") you can turn the group into 
> a global group by using a "*" prefixed member name. Doing that transforms the 
> classification group (aka: classification network) into a global group.
> 
> A normal classification group lists members in the memberlist AND you have to 
> be one of them to be part of that classification network.
> 
> A global group is active FOR ALL users of the system and the member(s) ("*" 
> prefixed) in the memberlist is/are used when a user has less than 1000 
> innocent messages or 250 spam messages AND the message is either SPAM or HAM 
> and the confidence is below 65% (aka: 0.65). Then DSPAM will consult the 
> global group members and query their data to get an score.
> 
> 
>> I'm still not quite sure what the reasoning (if any) there is 
>> behind the Global Group name.
>>
> It is almost like an merged group but only kicking in under certain 
> conditions.
> 
> 
>> And the merged group syntax appears to allow for 
>> adding some users to the group where the Global Groups syntax only allows 
>> adding 
>> all users to the group.
>>
> Yes. That is. GLOBAL = for ALL.
> 
> 
>> Any help would be greatly appreciated.  Merged groups do appear to be doing 
>> an 
>> okay job but are less than ideal solution.
>>
> Why? Can you explain what you find not so good about merged groups?
> 
> 
>> I'll probably be working on doing a 
>> second check against the corpus user instead of using the merged group today 
>> since it'll allow me to implement the low confidence corpus check instead of 
>> always merging in the corpus data.
>>
> Aha. I see.
> 
> 
>> Again thanks for taking a look at this,
>>
> Well... it was anyway time to look at it and try to fix that broken thing.
> 
> 
>> Ed
>>

-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 17:00:34 +0200
 wrote:

> dspam is really complicated!
>
That is not true. No one forces you to use group support. The group support in 
DSPAM is unique: you will hardly find another statistical classifier that 
offers such functionality wrapped in something as simple as the DSPAM group file.


> there is no easier method?
> 
Easier method for what?


> 
> 
> Regards,
>  
> 
> Samuel SALSON
> Contract Systems Administrator (PCIDSS)
> Technical Division/IT Production
> MONECAM
> 
> -Original Message-
> From: Stevan Bajić [mailto:ste...@bajic.ch] 
> Sent: Friday, 30 April 2010 16:04
> To: dspam-user@lists.sourceforge.net
> Subject: Re: [Dspam-user] Global Group Not Training
> 
> On Fri, 30 Apr 2010 09:48:41 -0400
> Ed Szynaka  wrote:
> 
> > Yes that's how the README lists it as well.  The only reason I tried the 
> > variations was to attempt to get the debug to let me know the user would 
> > also 
> > check the group.  I attempted the group name as the same as the corpus 
> > username 
> > to see if it worked similar to merged groups.
> > 
> Merged and shared groups use the groupname for internal processing. 
> Classification groups don't do anything with the group name.
> 
> 
> > Mostly I tried the variations because the syntax for merged groups makes 
> > more 
> > sense to me.
> >
> In merged groups that makes sense. Right. In a global group it is completely 
> different. Let me explain:
> groupname:grouptype:groupmember
> 
> groupname is the name of the group
> 
> grouptype is either "merged","shared","shared,managed","inoculation" or 
> "classification"
> 
> groupmember is a list of members separated by ","
> 
> for classification group (aka: "classification") you can turn the group into 
> a global group by using a "*" prefixed member name. Doing that transforms the 
> classification group (aka: classification network) into a global group.
> 
> A normal classification group lists members in the memberlist AND you have to 
> be one of them to be part of that classification network.
> 
> A global group is active FOR ALL users of the system and the member(s) ("*" 
> prefixed) in the memberlist is/are used when a user has less than 1000 
> innocent messages or 250 spam messages AND the message is either SPAM or HAM 
> and the confidence is below 65% (aka: 0.65). Then DSPAM will consult the 
> global group members and query their data to get an score.
> 
> 
> > I'm still not quite sure what the reasoning (if any) there is 
> > behind the Global Group name.
> >
> It is almost like an merged group but only kicking in under certain 
> conditions.
> 
> 
> > And the merged group syntax appears to allow for 
> > adding some users to the group where the Global Groups syntax only allows 
> > adding 
> > all users to the group.
> > 
> Yes. That is. GLOBAL = for ALL.
> 
> 
> > Any help would be greatly appreciated.  Merged groups do appear to be doing 
> > an 
> > okay job but are less than ideal solution.
> >
> Why? Can you explain what you find not so good about merged groups?
> 
> 
> > I'll probably be working on doing a 
> > second check against the corpus user instead of using the merged group 
> > today 
> > since it'll allow me to implement the low confidence corpus check instead 
> > of 
> > always merging in the corpus data.
> > 
> Aha. I see.
> 
> 
> > Again thanks for taking a look at this,
> >
> Well... it was anyway time to look at it and try to fix that broken thing.
> 
> 
> > Ed
> > 
> -- 
> Kind Regards from Switzerland,
> 
> Stevan Bajić
> 



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
Stevan Bajić wrote:
>> I've added the following to the group file:
>> corpususer:classification:*corpususer
>>
> That is IMHO not correct. The line should probably be:
> corpusgroup:classification:*corpususer
> 

Yes, that's how the README lists it as well.  The only reason I tried the 
variations was to get the debug output to tell me whether the user would also 
check the group.  I used the corpus username as the group name to see if it 
worked similarly to merged groups.

Mostly I tried the variations because the syntax for merged groups makes more 
sense to me.  I'm still not quite sure what reasoning (if any) there is 
behind the Global Group name.  And the merged group syntax appears to allow for 
adding some users to the group, whereas the Global Groups syntax only allows 
adding all users to the group.

> 
>> When sending mail to a test user it does not appear to mark any mail as 
>> spam.  Looking at the debug output there is also no indication that the 
>> corpususer tokens are being consulted.  (I've included the debug outputs 
>> below)
>>
> The whole group evaluation is a mess! I need to fix that ASAP.
> 

Any help would be greatly appreciated.  Merged groups do appear to be doing an 
okay job but are a less than ideal solution.  I'll probably be working today on 
doing a second check against the corpus user instead of using the merged group, 
since that will allow me to implement the low confidence corpus check instead 
of always merging in the corpus data.

Again thanks for taking a look at this,
Ed

-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 09:48:41 -0400
Ed Szynaka  wrote:

> Yes that's how the README lists it as well.  The only reason I tried the 
> variations was to attempt to get the debug to let me know the user would also 
> check the group.  I attempted the group name as the same as the corpus 
> username 
> to see if it worked similar to merged groups.
> 
Merged and shared groups use the groupname for internal processing. 
Classification groups don't do anything with the group name.


> Mostly I tried the variations because the syntax for merged groups makes more 
> sense to me.
>
In merged groups that makes sense. Right. In a global group it is completely 
different. Let me explain:
groupname:grouptype:groupmember

groupname is the name of the group

grouptype is either "merged","shared","shared,managed","inoculation" or 
"classification"

groupmember is a list of members separated by ","

For a classification group (aka: "classification") you can turn the group into 
a global group by using a "*" prefixed member name. Doing that transforms the 
classification group (aka: classification network) into a global group.

A normal classification group lists members in the memberlist AND you have to 
be one of them to be part of that classification network.

A global group is active FOR ALL users of the system, and the member(s) ("*" 
prefixed) in the memberlist is/are used when a user has less than 1000 innocent 
messages or 250 spam messages AND the message is either SPAM or HAM and the 
confidence is below 65% (aka: 0.65). DSPAM will then consult the global group 
members and query their data to get a score.
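
The line format described above can be sketched as a small parser. This is 
only an illustration of the groupname:grouptype:groupmember layout and the "*" 
prefix rule; the function name and the returned dictionary are hypothetical 
and do not reflect how DSPAM itself parses the group file.

```python
def parse_group_line(line):
    """Split one group file line into name, type, and members."""
    # The group type may itself contain a comma ("shared,managed"), so split
    # on ":" first and only then split the member list on ",".
    name, group_type, member_field = line.strip().split(":", 2)
    raw_members = [m.strip() for m in member_field.split(",") if m.strip()]
    return {
        "name": name,
        "type": group_type,
        "members": [m.lstrip("*") for m in raw_members],
        # A classification group with a "*" prefixed member is a global group.
        "is_global": group_type == "classification"
        and any(m.startswith("*") for m in raw_members),
    }

entry = parse_group_line("corpusgroup:classification:*corpususer")
print(entry["type"], entry["members"], entry["is_global"])
```

A line such as "corpusgroup:classification:*corpususer" is flagged as global 
here, while the same line without the "*" would be an ordinary classification 
network.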


> I'm still not quite sure what the reasoning (if any) there is 
> behind the Global Group name.
>
It is almost like a merged group, but it only kicks in under certain conditions.


> And the merged group syntax appears to allow for 
> adding some users to the group where the Global Groups syntax only allows 
> adding 
> all users to the group.
> 
Yes, that's it. GLOBAL = for ALL.


> Any help would be greatly appreciated.  Merged groups do appear to be doing 
> an 
> okay job but are less than ideal solution.
>
Why? Can you explain what you find not so good about merged groups?


> I'll probably be working on doing a 
> second check against the corpus user instead of using the merged group today 
> since it'll allow me to implement the low confidence corpus check instead of 
> always merging in the corpus data.
> 
Aha. I see.


> Again thanks for taking a look at this,
>
Well... it was time anyway to look at it and try to fix that broken thing.


> Ed
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Ed Szynaka
Stevan Bajić wrote:
> On Thu, 29 Apr 2010 23:41:30 +0200
> Stevan Bajić  wrote:
> 
> [...]
>> Classification group is a mess. I need to find time to fix that.
>>
> I have now changed the code that does group parsing and assigning. Can I send 
> you a patch to try out?
> 
I'd be more than happy to try out any patch you might have.

> [...]
>> It is broken. I mean the whole group support is not consistent.
>>
> I never played with classification groups. But now, reading the code, I 
> really ask myself whether classification group support has ever worked 
> properly.
> If I overlook all the other obvious issues and concentrate only on the usage 
> of global groups/classification networks, then I see that the code is only 
> made to switch from innocent to spam. When a global group is used, a spam 
> message automatically gets switched to innocent and is then later checked 
> against the global user.
> The source code and README talk about a "classification network", but I 
> really don't see any big magic here, or anything that would deserve the name 
> "network". The first result from a member of a global/classification group is 
> enough to switch the class state. I would at least expect the code to ask a 
> bunch of members (if there are more members) and then do the class switch 
> based on the combined result, not just take the first member and use that 
> result.
> 

I'd have to agree that I've got anecdotal evidence that the Global Groups 
portion of Classification Groups never worked properly.  Unfortunately I've 
only been charged with this project recently and the previous admin appears 
to have just assumed it worked.

Ed

-- 
Ed Szynaka
Network/Systems Manager
LocalNet Corp./CoreComm Internet Services



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Fri, 30 Apr 2010 09:22:54 +0200
Stevan Bajić  wrote:

[...]
> 
Another issue with the current implementation of classification groups / global 
groups is that there is no mechanism to separate global groups from 
classification groups.

So as soon as you add a user to a global group, the whole agent context gets 
tagged to use global groups. Later, when consulting the group members of any 
classification group (global or network), the code cannot differentiate 
between global and network.

This is problematic if you first add a classification entry to the group file 
and after that a global group entry. Since all members of a classification 
group are added to a node tree list, the first user to appear in that list 
will be the one deciding the outcome of a check. And since that first entry is 
from a classification group (if you list classification groups before global 
groups) and not from a global group, the result will probably not be what the 
end user wanted when he/she activated global groups.
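
The ordering problem described above can be modelled in a few lines. This is a toy model only: the tuple layout and the user name `networkuser` are made up for illustration, and the real code uses a node tree list, not a Python list.

```python
# Hypothetical model of the problem described above; names are made up.
# Each tuple is (group type, member) in the order the group file lists them.
entries = [
    ("classification", "networkuser"),  # plain classification group, listed first
    ("global", "corpususer"),           # global group, listed second
]

# Flat list without the type: the first entry wins, whatever group it came from.
flat_members = [member for _, member in entries]
first_flat = flat_members[0]

# Keeping the type with each member lets the caller ask for global members only.
global_members = [member for kind, member in entries if kind == "global"]
first_global = global_members[0]

print(first_flat)    # networkuser
print(first_global)  # corpususer
```

Keeping the group type next to each member is one way to let a later lookup tell global members from network members.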


-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-30 Thread Stevan Bajić
On Thu, 29 Apr 2010 23:41:30 +0200
Stevan Bajić  wrote:

[...]
> Classification group is a mess. I need to find time to fix that.
> 
I have now changed the code that does group parsing and assigning. Can I send 
you a patch to try out?


[...]
> It is broken. I mean the whole group support is not consistent.
> 
I never played with classification groups. But now, reading the code, I really 
ask myself whether classification group support has ever worked properly.
If I overlook all the other obvious issues and concentrate only on the usage 
of global groups/classification networks, then I see that the code is only made 
to switch from innocent to spam. When a global group is used, a spam message 
automatically gets switched to innocent and is then later checked against the 
global user.
The source code and README talk about a "classification network", but I really 
don't see any big magic here, or anything that would deserve the name 
"network". The first result from a member of a global/classification group is 
enough to switch the class state. I would at least expect the code to ask a 
bunch of members (if there are more members) and then do the class switch 
based on the combined result, not just take the first member and use that 
result.
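
To make the difference concrete, here is a small sketch of the two strategies. It is an illustration only: the function names are invented, and the majority vote is just one possible meaning of "combined result", not something the DSPAM code does.

```python
# Illustration only: the function names and the majority rule are assumptions,
# not the DSPAM implementation.
from collections import Counter


def first_member_decides(member_results):
    """Behaviour described above: the first member's result is used as-is."""
    return member_results[0]


def combined_decision(member_results):
    """One possible 'combined result': a simple majority vote.

    Ties fall back to "innocent" so that mail is not quarantined on a split vote.
    """
    votes = Counter(member_results)
    return "spam" if votes["spam"] > votes["innocent"] else "innocent"


results = ["spam", "innocent", "innocent"]
print(first_member_decides(results))  # spam
print(combined_decision(results))     # innocent
```

With three members where only the first one says spam, the two strategies disagree, which is exactly the case where asking more than one member would matter.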



-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] Global Group Not Training

2010-04-29 Thread Stevan Bajić
On Thu, 29 Apr 2010 17:26:25 -0400
Ed Szynaka  wrote:

> 
> >> Trying to combine the 2 ideas above I tried this in the group file:
> >> corpususer:classification:*
> >>
> >> But unfortunately this causes a double free or corruption error in glibc 
> >> when trying to classify any message.  (I saw a ticket on that 
> >> http://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2990455&group_id=250683
> >> and will be posting to that ticket right after this email)
> >>
> >> 
> > This issue is fixed in GIT repository. Check out and try again.
> >
> >   
> I've checked out the latest copy and the above group file line does not 
> cause the double free error but it does not seem to be working properly 
> either.  Before the return of every email I get this error message 
> repeated several times:
> > WARNING:  nonstandard use of escape in a string literal
> > LINE 1: ...plit_part(split_part(version(),' ',2),'.',1) FROM '\d+')::in...
> >  ^
> > HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
> 
> With the above group file line (corpususer:classification:*) I get no 
> X-DSPAM headers added unless the result is Whitelisted.
>
Classification group is a mess. I need to find time to fix that.
BTW: the format above is not correct. It should be: 
groupname:classification:*classificationuser
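
For reference, the line format can be taken apart like this. The sketch only assumes the `name:type:members` layout seen in the examples in this thread; the comma-separated member list and the field names in the returned dictionary are assumptions, and this is not the DSPAM parser.

```python
# Sketch only: field layout taken from the examples in this thread
# (name:type:members); comma-separated members are an assumption.
def parse_group_line(line):
    """Split one group file line into its parts."""
    name, kind, members = line.strip().split(":", 2)
    parsed = []
    for member in members.split(","):
        member = member.strip()
        parsed.append({
            "user": member.lstrip("*"),          # name without the '*' prefix
            "starred": member.startswith("*"),   # True for '*'-prefixed members
        })
    return {"group": name, "type": kind, "members": parsed}
```

With `corpusgroup:classification:*corpususer` this yields one starred member named `corpususer`. With `corpususer:classification:*` the only member is starred but has an empty name, which shows why a bare `*` does not name a classification user.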


> With 
> "corpususer:classification:*corpususer" it classifies and inserts 
> headers but debug does not have any entry showing corpususer.
>
It is broken. I mean the whole group support is not consistent.


> With 
> "corpususer:merged:*" I do see references to dspam using corpususer.
> >   
> >> My questions are:
> >> How should a Global Group be set up to get the results described in the 
> >> README?
> >> Is there any way to tell if a Global Group is being used?
> >>
> >> 
> > A global group is many things in DSPAM. You mean a "classification" group. 
> > Right? So the question should be: How to check if a classification group is 
> > working.
> >   
> Yes, I believe I'm looking for a classification group: a group that 
> will be used for new users who have no training data and where a trained 
> user's data is not confident in the result. So:
> What format should I use in the group file to get all users to be in a 
> classification group with corpususer?
> And how do I check to see if it's working?
> > Can you post the output of:
> > dspam_stats -H testu...@testdomain.com
> > dspam_admin ag pref testu...@testdomain.com
> > dspam_admin ag pref default
> > sed "/^[\t ]*#\|^[\t ]*$/d" /path/to/your/dspam.conf
> >
> >   
> $ /usr/local/dspam/bin/dspam_stats -H testu...@testdomain.com
> testu...@testdomain.com:
> TP True Positives:39
> TN True Negatives:60
> FP False Positives:9
> FN False Negatives:   29
> SC Spam Corpusfed: 0
> NC Nonspam Corpusfed:  0
> TL Training Left:   2431
> SHR Spam Hit Rate 57.35%
> HSR Ham Strike Rate:  13.04%
> PPV Positive predictive value:81.25%
> OCA Overall Accuracy: 72.26%
>
> $ /usr/local/dspam/bin/dspam_admin ag pref testu...@testdomain.com
> trainingMode=TOE
> spamAction=quarantine
> spamSubject=[SPAM]
> statisticalSedation=5
> enableBNR=on
> enableWhitelist=on
> signatureLocation=headers
> tagSpam=off
> tagNonspam=off
> showFactors=off
> optIn=off
> optOut=off
> whitelistThreshold=10
> makeCorpus=off
> storeFragments=off
> localStore=
> processorBias=on
> fallbackDomain=off
> trainPristine=off
> optOutClamAV=off
> ignoreRBLLookups=off
> RBLInoculate=off
> 
> $ /usr/local/dspam/bin/dspam_admin ag pref default
> trainingMode=TOE
> spamAction=quarantine
> spamSubject=[SPAM]
> statisticalSedation=5
> enableBNR=on
> enableWhitelist=on
> signatureLocation=headers
> tagSpam=off
> tagNonspam=off
> showFactors=off
> optIn=off
> optOut=off
> whitelistThreshold=10
> makeCorpus=off
> storeFragments=off
> localStore=
> processorBias=on
> fallbackDomain=off
> trainPristine=off
> optOutClamAV=off
> ignoreRBLLookups=off
> RBLInoculate=off
> 
> $ sed "/^[\t ]*#\|^[\t ]*$/d" /usr/local/dspam/etc/dspam.conf
> Home /usr/local/dspam/var/dspam
> StorageDriver /usr/local/dspam/lib/dspam/libpgsql_drv.so
> OnFail error
> Trust root
> Trust dspam
> Trust www-data
> Trust mail
> Trust mailnull
> Trust smmsp
> Trust daemon
> TrainingMode toe
> TestConditionalTraining off
> Feature whitelist
> Feature tb=5
> Algorithm graham burton
> Tokenizer chain
> PValue bcr
> WebStats off
> Preference "trainingMode=TOE"   # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
> Preference "spamAction

Re: [Dspam-user] Global Group Not Training

2010-04-29 Thread Stevan Bajić
On Thu, 29 Apr 2010 15:33:28 -0400
Ed Szynaka  wrote:

> Hello,
> I'm trying to figure out if a corpus classification is working 
> properly.  I've setup a Global Group as described in the README but it 
> doesn't appear to be working properly.  I have a user named corpususer 
> and have trained it on 120756 spam and 20581 ham. 
> 
> I've added the following to the group file:
> corpususer:classification:*corpususer
> 
That is IMHO not correct. The line should probably be:
corpusgroup:classification:*corpususer


> When sending mail to a test user it does not appear to mark any mail as 
> spam.  Looking at the debug output there is also no indication that the 
> corpususer tokens are being consulted.  (I've included the debug outputs 
> below)
> 
The whole group evaluation is a mess! I need to fix that ASAP.



> As a test I setup a merged group with the same corpususer by adding this 
> to the group file:
> corpususer:merged:*
> 
> This appears to classify mail better and the debug output (listed below) 
> also shows the corpususer being consulted.  At the moment I am using the 
> merged group but would much prefer to use the classification group, 
> because I'm using a large corpus which would override anything the user 
> did to train their own email.
> 
> Trying to combine the 2 ideas above I tried this in the group file:
> corpususer:classification:*
> 
> But unfortunately this causes a double free or corruption error in glibc 
> when trying to classify any message.  (I saw a ticket on that 
> http://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2990455&group_id=250683
> and will be posting to that ticket right after this email)
> 
> My questions are:
> How should a Global Group be set up to get the results described in the 
> README?
> Is there any way to tell if a Global Group is being used?
> 
> I'm using dspam 3.9.0 with a postgresql backend, compiled from source 
> with the following options:
> ../configure --prefix=/usr/local/dspam --sysconfdir=/usr/local/dspam/etc 
> --with-storage-driver=mysql_drv,pgsql_drv 
> --with-mysql-includes=/usr/include/mysql 
> --with-pgsql-includes=/usr/include/postgresql --enable-daemon 
> --enable-debug --enable-virtual-users --enable-preferences-extension 
> --enable-clamav
> 
> Thanks,
> Ed
> 
> 
> corpususer:classification:*corpususer debug output:
> > 6565: [04/29/2010 14:47:37] No QuarantineAgent option found. Using 
> > standard quarantine.
> > 6565: [04/29/2010 14:47:37] DSPAM Instance Startup
> > 6565: [04/29/2010 14:47:37] input args: /usr/local/dspam/bin/dspam 
> > --stdout --deliver=innocent,spam --user testu...@testdomain.com --debug
> > 6565: [04/29/2010 14:47:37] pass-thru args:
> > 6565: [04/29/2010 14:47:37] processing user testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] uid = 0, euid = 0, gid = 0, egid = 8
> > 6565: [04/29/2010 14:47:37] loading preferences for user 
> > testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam: successful returning 
> > struct for name: testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 3856
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 0
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 0
> > 6565: [04/29/2010 14:47:37] default preferences empty. reverting to 
> > dspam.conf preferences.
> > 6565: [04/29/2010 14:47:37] Loading preferences from dspam.conf
> > 6565: [04/29/2010 14:47:37] using 
> > /usr/local/dspam/var/dspam/opt-in/testu...@testdomain.com.dspam as path
> > 6565: [04/29/2010 14:47:37] using 
> > /usr/local/dspam/var/dspam/opt-out/testu...@testdomain.com.nodspam as path
> > 6565: [04/29/2010 14:47:37] sedation level set to: 5
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam: successful returning 
> > struct for name: testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Loading 7 BNR patterns
> > 6565: [04/29/2010 14:47:39] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Whitelist threshold: 10
> > 
> > 6565: [04/29/2010 14:47:39] Graham-Bayesian Probability: 0.002278 
> > Samples: 15
> > 6565: [04/29/2010 14:47:39] Burton-Bayesian Probability: 0.18 
> > Samples: 27
> > 6565: [04/29/2010 14:47:39] no factors specified; using default
> > 6565: [04/29/2010 14:47:39] Result Confidence: 1.00
> > 6565: [04/29/2010 14:47:39] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Control: [10 10] [10 11] Delta: [0 1]
> > 6565: [04/29/2010 14:47:40] total processing time: 3.01203s
> > 6565: [04/29/2010 14:47:40] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:40] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com

Re: [Dspam-user] Global Group Not Training

2010-04-29 Thread Ed Szynaka



>> Trying to combine the 2 ideas above I tried this in the group file:
>> corpususer:classification:*
>>
>> But unfortunately this causes a double free or corruption error in glibc 
>> when trying to classify any message.  (I saw a ticket on that 
>> http://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2990455&group_id=250683
>> and will be posting to that ticket right after this email)
>
> This issue is fixed in GIT repository. Check out and try again.

  
I've checked out the latest copy and the above group file line does not 
cause the double free error but it does not seem to be working properly 
either.  Before the return of every email I get this error message 
repeated several times:

WARNING:  nonstandard use of escape in a string literal
LINE 1: ...plit_part(split_part(version(),' ',2),'.',1) FROM '\d+')::in...
 ^
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.


With the above group file line (corpususer:classification:*) I get no 
X-DSPAM headers added unless the result is Whitelisted.  With 
"corpususer:classification:*corpususer" it classifies and inserts 
headers but debug does not have any entry showing corpususer.  With 
"corpususer:merged:*" I do see references to dspam using corpususer.
  

>> My questions are:
>> How should a Global Group be set up to get the results described in the 
>> README?
>> Is there any way to tell if a Global Group is being used?
>
> A global group is many things in DSPAM. You mean a "classification" group. 
> Right? So the question should be: How to check if a classification group is 
> working.

Yes, I believe I'm looking for a classification group: a group that 
will be used for new users who have no training data and where a trained 
user's data is not confident in the result. So:
What format should I use in the group file to get all users to be in a 
classification group with corpususer?
And how do I check to see if it's working?

> Can you post the output of:
> dspam_stats -H testu...@testdomain.com
> dspam_admin ag pref testu...@testdomain.com
> dspam_admin ag pref default
> sed "/^[\t ]*#\|^[\t ]*$/d" /path/to/your/dspam.conf

  

$ /usr/local/dspam/bin/dspam_stats -H testu...@testdomain.com
testu...@testdomain.com:
   TP True Positives:                  39
   TN True Negatives:                  60
   FP False Positives:                  9
   FN False Negatives:                 29
   SC Spam Corpusfed:                   0
   NC Nonspam Corpusfed:                0
   TL Training Left:                 2431
   SHR Spam Hit Rate               57.35%
   HSR Ham Strike Rate:            13.04%
   PPV Positive predictive value:  81.25%
   OCA Overall Accuracy:           72.26%
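
The four percentages follow from the four counts if the usual definitions are assumed (hit rate as TP/(TP+FN), and so on). The formulas below are that assumption, not a quote from the dspam_stats source, but they do reproduce the figures above.

```python
# Counts copied from the dspam_stats output above.
tp, tn, fp, fn = 39, 60, 9, 29

shr = tp / (tp + fn)                    # Spam Hit Rate
hsr = fp / (fp + tn)                    # Ham Strike Rate
ppv = tp / (tp + fp)                    # Positive predictive value
oca = (tp + tn) / (tp + tn + fp + fn)   # Overall Accuracy

for label, value in (("SHR", shr), ("HSR", hsr), ("PPV", ppv), ("OCA", oca)):
    print(f"{label} {value:.2%}")
```

So only 39 of the 68 spam messages seen by this test user were caught, which fits the complaint that the corpus user is not being consulted.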
  
$ /usr/local/dspam/bin/dspam_admin ag pref testu...@testdomain.com

trainingMode=TOE
spamAction=quarantine
spamSubject=[SPAM]
statisticalSedation=5
enableBNR=on
enableWhitelist=on
signatureLocation=headers
tagSpam=off
tagNonspam=off
showFactors=off
optIn=off
optOut=off
whitelistThreshold=10
makeCorpus=off
storeFragments=off
localStore=
processorBias=on
fallbackDomain=off
trainPristine=off
optOutClamAV=off
ignoreRBLLookups=off
RBLInoculate=off

$ /usr/local/dspam/bin/dspam_admin ag pref default
trainingMode=TOE
spamAction=quarantine
spamSubject=[SPAM]
statisticalSedation=5
enableBNR=on
enableWhitelist=on
signatureLocation=headers
tagSpam=off
tagNonspam=off
showFactors=off
optIn=off
optOut=off
whitelistThreshold=10
makeCorpus=off
storeFragments=off
localStore=
processorBias=on
fallbackDomain=off
trainPristine=off
optOutClamAV=off
ignoreRBLLookups=off
RBLInoculate=off

$ sed "/^[\t ]*#\|^[\t ]*$/d" /usr/local/dspam/etc/dspam.conf
Home /usr/local/dspam/var/dspam
StorageDriver /usr/local/dspam/lib/dspam/libpgsql_drv.so
OnFail error
Trust root
Trust dspam
Trust www-data
Trust mail
Trust mailnull
Trust smmsp
Trust daemon
TrainingMode toe
TestConditionalTraining off
Feature whitelist
Feature tb=5
Algorithm graham burton
Tokenizer chain
PValue bcr
WebStats off
Preference "trainingMode=TOE"   # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
Preference "spamAction=quarantine"  # { quarantine | tag | deliver } -> default:quarantine
Preference "spamSubject=[SPAM]" # { string } -> default:[SPAM]
Preference "statisticalSedation=5"  # { 0 - 10 } -> default:0
Preference "enableBNR=on"   # { on | off } -> default:off
Preference "enableWhitelist=on" # { on | off } -> default:on
Preference "signatureLocation=headers"  # { message | headers } -> default:message
Preference "tagSpam=off"# { on | off }
Preference "tagNonspam=off" # { on | off }
Preference "showFactors=off"# { on | off } -> default:off
Preference "optIn=off"

Re: [Dspam-user] Global Group Not Training

2010-04-29 Thread Stevan Bajić
On Thu, 29 Apr 2010 15:33:28 -0400
Ed Szynaka  wrote:

> Hello,
> I'm trying to figure out if a corpus classification is working 
> properly.  I've setup a Global Group as described in the README but it 
> doesn't appear to be working properly.  I have a user named corpususer 
> and have trained it on 120756 spam and 20581 ham. 
> 
> I've added the following to the group file:
> corpususer:classification:*corpususer
> 
> When sending mail to a test user it does not appear to mark any mail as 
> spam.  Looking at the debug output there is also no indication that the 
> corpususer tokens are being consulted.  (I've included the debug outputs 
> below)
> 
> As a test I setup a merged group with the same corpususer by adding this 
> to the group file:
> corpususer:merged:*
> 
> This appears to classify mail better and the debug output (listed below) 
> also shows the corpususer being consulted.  At the moment I am using the 
> merged group but would much prefer to use the classification group, 
> because I'm using a large corpus which would override anything the user 
> did to train their own email.
> 
> Trying to combine the 2 ideas above I tried this in the group file:
> corpususer:classification:*
> 
> But unfortunately this causes a double free or corruption error in glibc 
> when trying to classify any message.  (I saw a ticket on that 
> http://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2990455&group_id=250683
> and will be posting to that ticket right after this email)
> 
This issue is fixed in GIT repository. Check out and try again.


> My questions are:
> How should a Global Group be set up to get the results described in the 
> README?
> Is there any way to tell if a Global Group is being used?
> 
A global group is many things in DSPAM. You mean a "classification" group. 
Right? So the question should be: How to check if a classification group is 
working.

Can you post the output of:
dspam_stats -H testu...@testdomain.com
dspam_admin ag pref testu...@testdomain.com
dspam_admin ag pref default
sed "/^[\t ]*#\|^[\t ]*$/d" /path/to/your/dspam.conf
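
The sed expression in the last command just drops comment lines and blank lines from dspam.conf. For anyone without GNU sed at hand, a rough Python equivalent is sketched below; the function name is made up for this example.

```python
import re

# Same pattern as the sed expression: optional tabs/spaces followed by '#',
# or nothing but tabs/spaces.
SKIP = re.compile(r"^[\t ]*#|^[\t ]*$")


def active_lines(text):
    """Return the lines of a config that are neither comments nor blank."""
    return [line for line in text.splitlines() if not SKIP.search(line)]
```

Like the sed version, this keeps lines that carry a trailing comment after a setting; only lines that start with `#` (after optional whitespace) are removed.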


> I'm using dspam 3.9.0 with a postgresql backend, compiled from source 
> with the following options:
> ../configure --prefix=/usr/local/dspam --sysconfdir=/usr/local/dspam/etc 
> --with-storage-driver=mysql_drv,pgsql_drv 
> --with-mysql-includes=/usr/include/mysql 
> --with-pgsql-includes=/usr/include/postgresql --enable-daemon 
> --enable-debug --enable-virtual-users --enable-preferences-extension 
> --enable-clamav
> 
> Thanks,
> Ed
> 
> 
> corpususer:classification:*corpususer debug output:
> > 6565: [04/29/2010 14:47:37] No QuarantineAgent option found. Using 
> > standard quarantine.
> > 6565: [04/29/2010 14:47:37] DSPAM Instance Startup
> > 6565: [04/29/2010 14:47:37] input args: /usr/local/dspam/bin/dspam 
> > --stdout --deliver=innocent,spam --user testu...@testdomain.com --debug
> > 6565: [04/29/2010 14:47:37] pass-thru args:
> > 6565: [04/29/2010 14:47:37] processing user testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] uid = 0, euid = 0, gid = 0, egid = 8
> > 6565: [04/29/2010 14:47:37] loading preferences for user 
> > testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam: successful returning 
> > struct for name: testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 3856
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 0
> > 6565: [04/29/2010 14:47:37] Loading preferences for uid 0
> > 6565: [04/29/2010 14:47:37] default preferences empty. reverting to 
> > dspam.conf preferences.
> > 6565: [04/29/2010 14:47:37] Loading preferences from dspam.conf
> > 6565: [04/29/2010 14:47:37] using 
> > /usr/local/dspam/var/dspam/opt-in/testu...@testdomain.com.dspam as path
> > 6565: [04/29/2010 14:47:37] using 
> > /usr/local/dspam/var/dspam/opt-out/testu...@testdomain.com.nodspam as path
> > 6565: [04/29/2010 14:47:37] sedation level set to: 5
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam: successful returning 
> > struct for name: testu...@testdomain.com
> > 6565: [04/29/2010 14:47:37] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Loading 7 BNR patterns
> > 6565: [04/29/2010 14:47:39] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Whitelist threshold: 10
> > 
> > 6565: [04/29/2010 14:47:39] Graham-Bayesian Probability: 0.002278 
> > Samples: 15
> > 6565: [04/29/2010 14:47:39] Burton-Bayesian Probability: 0.18 
> > Samples: 27
> > 6565: [04/29/2010 14:47:39] no factors specified; using default
> > 6565: [04/29/2010 14:47:39] Result Confidence: 1.00
> > 6565: [04/29/2010 14:47:39] _pgsql_drv_getpwnam returning cached name 
> > testu...@testdomain.com.
> > 6565: [04/29/2010 14:47:39] Control: [10 10] [10 11] Delta: [0 1]
> > 6565: