On 14.05.2016 at 19:10, John Hardin wrote:
> On Sat, 14 May 2016, Reindl Harald wrote:

>> On 14.05.2016 at 04:50, John Hardin wrote:
>>> On Sat, 14 May 2016, Reindl Harald wrote:
>>>> On 14.05.2016 at 04:04, John Hardin wrote:
>>>>> How would a webservice be better? That would still be sending
>>>>> customer emails to a third party for processing.
>>>>
>>>> uhm, you missed "and only give out the rules which hit and the
>>>> spam/ham flag"

>>> Ah, OK, I misunderstood what you were suggesting.
>>>
>>> That wouldn't work. That tells you the rules they hit at the time they
>>> were scanned, not which rules they would hit from the current testing
>>> rules.

>> On the other hand, it would reflect the complete mail flow and not just
>> hand-crafted samples.

> They're not hand-*crafted* samples, they're hand-*classified* samples.
> Each message needs to be classified by a reliable human as ham or spam
> for the analysis of the rules that it hits to have any use, or even be
> possible.

That's just nitpicking; I could correct you the same way in German for most of what you would try to express :-)

> That's why something like an SA install that's based on the current SVN
> sandbox rules, gets a forked copy of your mail stream and captures the
> hits is still not useful for anything other than gross "this rule didn't
> hit anything" analysis: you don't know what a given message *should* have
> been, so you can't say anything about the rules that hit it, whether they
> aid that result or hinder it.

How do you imagine such a setup working *in practice*?

> Unless your mail stream prior to SA is *guaranteed* 100% ham (which is
> hugely unlikely, or why would you be running SA at all?) or 100% spam
> (which might be the case for a clean honeypot), you need to review and
> classify the messages manually before performing the scan and reporting
> the rule hits, and that means keeping copies of the pristine messages,
> at least for a while.
>
> I don't know whether statutory requirements make this impossible for you,
> even if you did obtain consent from some of your clients to use their
> mail stream in that manner.

I don't have access to the whole mail flow to classify it, nor is there a technical way to mirror it onto a different setup, nor would SA or even smtpd ever see 95% of the junk, because content filters are the last resort by definition.

>> The ham/spam flag should be tied to a minimum negative score to count as
>> ham and a minimum positive score to count as spam. This should be
>> configurable, because which scores are clear classifications depends on
>> the local environment and adjustments. Here, for example, 7.0 would not
>> be 100% spam, but 12.0 would.
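
A minimal sketch of the threshold idea above, in Python. The cutoffs are assumptions for illustration only. The spam cutoff of 12.0 follows the example given, while the ham cutoff of -2.0 is hypothetical. Both are meant to be configured per site. Scores between the two cutoffs are treated as unclassified and left out of any report.

```python
from typing import Optional

# Hypothetical local defaults; each site would configure its own values.
HAM_MAX_SCORE = -2.0   # assumption: scores at or below this count as clear ham
SPAM_MIN_SCORE = 12.0  # scores at or above this count as clear spam (example value)


def classify(score: float,
             ham_max: float = HAM_MAX_SCORE,
             spam_min: float = SPAM_MIN_SCORE) -> Optional[str]:
    """Return 'ham' or 'spam' for a clear classification, otherwise None."""
    if score <= ham_max:
        return "ham"
    if score >= spam_min:
        return "spam"
    return None  # ambiguous middle band: excluded rather than guessed
```

With these defaults, a score of 7.0 falls into the unclassified band. This matches the point that 7.0 would not be clear spam at this site.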

> That's probably still not reliable enough for use in masscheck. Ham is a
> bit more important; what would you recommend as a lower limit for
> considering a message as ham? How many actual hams would meet that
> requirement? It might be a lot of work for little final benefit. What
> percentage of actual FNs (false negatives) would you see with that
> setting? Those would damage the masscheck analysis.

I would agree if we could call the current masscheck results reliable.

>> It would at least help in the current situation. Take a rule like
>> FSL_HELO_HOME that hits only clear ham yet has a high spam score. If the
>> feature only needed to be enabled, collected the information during
>> scanning and submitted the results once per day, a lot of people running
>> milter-like setups with reject (and no access to rejected mails) could
>> help improve the auto-QA without collecting whole mails.
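
A minimal sketch of the once-per-day summary described above, in Python. It assumes each scan yields its total score and the list of rule names that hit. Only per-rule counts for clearly classified messages are kept, so no message content would leave the site. The function names, the thresholds and the JSON layout are hypothetical. The thread does not specify any submission format or endpoint.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize(results: Iterable[Tuple[float, List[str]]],
              ham_max: float = -2.0,
              spam_min: float = 12.0) -> Dict[str, Dict[str, int]]:
    """Aggregate one day's rule hits, split into clear ham and clear spam.

    Each result is (total score, names of the rules that hit). Messages whose
    score falls between ham_max and spam_min are skipped as unclassified.
    """
    messages: Counter = Counter()
    hits = {"ham": Counter(), "spam": Counter()}
    for score, rules in results:
        if score <= ham_max:
            label = "ham"
        elif score >= spam_min:
            label = "spam"
        else:
            continue  # not a clear classification; keep it out of the counts
        messages[label] += 1
        hits[label].update(set(rules))  # count each rule at most once per message
    return {
        "messages": dict(messages),
        "ham_hits": dict(hits["ham"]),
        "spam_hits": dict(hits["spam"]),
    }


def to_payload(report: Dict[str, Dict[str, int]]) -> str:
    """Serialize the summary as JSON (hypothetical format) for a daily upload."""
    return json.dumps(report, sort_keys=True)
```

Here is an example. Three scans are scored -5.0 (hitting FSL_HELO_HOME), 15.3 and 6.0. The summary counts one ham and one spam and ignores the 6.0 message. It also records that FSL_HELO_HOME hit clear ham, which is the signal that would suggest its score is too high.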

> Potentially. You'd have to be willing to set up a parallel mail
> processing stream using the current SVN sandbox rules, as I described
> above. Performing analysis on the released rules provides no benefit to
> masscheck.

Why would it provide no benefit, when one part of "sa-update" (which currently doesn't get any updates most of the time) is re-scoring badly scored rules? That's really not only about sandbox rules.

