Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Dear members, a user repeatedly sends us spam messages to train SA. At the moment, training requires manual intervention: an administrator verifies that it's really spam and then runs sa-learn. The user then thinks the process is done, and that the next time the same email arrives, it will

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: > > However, that doesn't happen. > > 0.000 0 338770 0 non-token data: nspam > > 0.000 0 1460807 0 non-token data: nham > what do you expect when you train 4 times more ham than spam? > frankly you
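Harald's point rests on the ham:spam ratio in the Bayes database. The two lines quoted above are in the format printed by `sa-learn --dump magic`, where the count sits in the third numeric column. The following is a minimal sketch, assuming that column layout, that extracts the counts and computes the ratio; the function names `parse_magic` and `ham_spam_ratio` are hypothetical.

```python
import re

# Matches non-token lines from `sa-learn --dump magic`, e.g.
# "0.000          0     338770          0  non-token data: nspam".
# Assumption: the count of interest is the third numeric column.
MAGIC_LINE = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_magic(dump: str) -> dict[str, int]:
    """Return a mapping such as {"nspam": 338770, "nham": 1460807}."""
    counts: dict[str, int] = {}
    for line in dump.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def ham_spam_ratio(counts: dict[str, int]) -> float:
    """Number of ham messages learned per spam message learned."""
    return counts["nham"] / counts["nspam"]


if __name__ == "__main__":
    # The two counts quoted in the thread.
    sample = (
        "0.000          0     338770          0  non-token data: nspam\n"
        "0.000          0    1460807          0  non-token data: nham\n"
    )
    counts = parse_magic(sample)
    print(counts)                               # {'nspam': 338770, 'nham': 1460807}
    print(f"{ham_spam_ratio(counts):.2f} : 1")  # 4.31 : 1
```

With the quoted numbers the database holds roughly 4.3 ham messages for every spam message, which is the imbalance Harald is objecting to.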

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: >> This is a mail gateway for multiple companies. I'm not supposed to read >> e-mails on that, or pick mails that can be used for learning ham > > how did you then manage 1.4 million ham samples in your biased corpus? Looks like in this

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: >> I think I have no control over what is learnt automatically. > surely, don't do autolearning at all. This is a mail gateway for multiple companies. I'm not supposed to read e-mails on that, or pick mails that can be used for learning ham.

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Hello, David Jones [mailto:djo...@ena.com] wrote: > There should be many more rule hits than just these 3. It looks like > network tests aren't happening. > Can you post the original email to pastebin.com with minimal redacting > so the rest of us can run it through our SA to see how it
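David suspects that network tests are not running because only three rules hit. One rough way to check a scanned message is to read the `tests=` list in its `X-Spam-Status` header and see whether any rule names typical of network lookups appear. The sketch below assumes that heuristic; the prefix list, the function names and the sample header are all hypothetical, and the absence of such hits is only a hint. A debug run (`spamassassin -D`) gives a more reliable answer.

```python
import re

# Hypothetical heuristic: rule-name prefixes that usually come from network
# checks (DNSBLs, URI blocklists, SPF/DKIM lookups, Razor/Pyzor/DCC).
# Illustrative assumption only, not an exhaustive inventory.
NETWORK_PREFIXES = ("RCVD_IN_", "URIBL_", "SPF_", "DKIM_", "RAZOR2_", "PYZOR_", "DCC_")


def rule_hits(x_spam_status: str) -> list[str]:
    """Extract rule names from the tests=... field of an X-Spam-Status header."""
    # Unfold the header (drop line breaks), then remove the whitespace that
    # folding leaves after commas inside the comma-separated tests list.
    unfolded = re.sub(r"\r?\n", "", x_spam_status)
    unfolded = re.sub(r",\s+", ",", unfolded)
    match = re.search(r"tests=(\S+)", unfolded)
    if not match or match.group(1) == "none":
        return []
    return match.group(1).split(",")


def network_hits(tests: list[str]) -> list[str]:
    """Keep only rule names that look like results of network lookups."""
    return [t for t in tests if t.startswith(NETWORK_PREFIXES)]


if __name__ == "__main__":
    # Made-up header with three hits and no network rules.
    header = ("X-Spam-Status: No, score=1.5 required=5.0 tests=BAYES_50,HTML_MESSAGE,\n"
              "\tMIME_HTML_ONLY autolearn=no version=3.4.1")
    hits = rule_hits(header)
    print(hits)
    if not network_hits(hits):
        print("no network rule hits: DNS/network tests may be disabled or failing")
```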

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Horváth Szabolcs
Hello, David Jones [mailto:djo...@ena.com] wrote: > With non-English email flow, it's more challenging. If no RBLs hit, then you > really must train your Bayes properly, which requires some way to accurately > determine the ham and spam. You must keep a copy of the ham and spam corpora and be
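One practical benefit of keeping curated ham and spam corpora is that the Bayes database can be rebuilt from them after biased training. The following is a minimal sketch, assuming each corpus is a directory with one verified message per file (for example a maildir `cur` folder). It uses the real `sa-learn` options `--clear`, `--ham` and `--spam`; the corpus paths and function names are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def retrain_commands(ham_dir: Path, spam_dir: Path, rebuild: bool = True) -> list[list[str]]:
    """Build sa-learn invocations that (re)train Bayes from two curated folders.

    Assumption: each directory holds one verified message per file.
    """
    commands: list[list[str]] = []
    if rebuild:
        # Wipe the existing Bayes database so earlier, biased training is discarded.
        commands.append(["sa-learn", "--clear"])
    commands.append(["sa-learn", "--ham", str(ham_dir)])
    commands.append(["sa-learn", "--spam", str(spam_dir)])
    return commands


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence if any sa-learn call fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical corpus locations on the gateway.
    run(retrain_commands(Path("/srv/corpus/ham"), Path("/srv/corpus/spam")))
```

Keeping the commands as data makes the dry run reviewable before anything touches the live database. Run it as the same user whose Bayes database the scanner reads.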