Alex, from Nexus7.
Boyaah!
On 27 Sep 2012 14:34, "Bowie Bailey" <bowie_bai...@buc.com> wrote:
>
>
> On 9/27/2012 1:48 PM, Alexandre Boyer wrote:
>>
>> Alex, from prypiat.
>> Yes, I recycle.
>>
>>
>> On 12-09-27 11:09 AM, Bowie Bailey wrote:
>>>
>>> On 9/27/2012 10:41 AM, Alexandre Boyer wrote:
>>>>
>>>> Hello all,
>>>>
>>>> Here is a small ruleset that I'm working with. I added it to our
>>>> local ruleset in prod:
>>>>
>>>>      # BAD LINKS N-NG ;-) ;
>>>>      # Canada Post
>>>>
>>>>      uri_detail   AJB_CANPOST_BADLINK             raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
>>>>      describe     AJB_CANPOST_BADLINK             Found a mismatch between href and anchored text pretending to link to www.canadapost.ca
>>>>      score        AJB_CANPOST_BADLINK             1.0
>>>>      meta         AJB_CANPOST_PHISH_BADTRACKNUM   AJB_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
>>>>      describe     AJB_CANPOST_PHISH_BADTRACKNUM   Mismatch between href and anchored text + unofficial tracking number from Canada Post
>>>>      score        AJB_CANPOST_PHISH_BADTRACKNUM   2.0
>>>>      #
>>>>      # YouTube
>>>>      uri_detail   AJB_UTUBE_BADLINK   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
>>>>      describe     AJB_UTUBE_BADLINK   Found a mismatch between href and anchored text pretending to link to www.youtube.com
>>>>      score        AJB_UTUBE_BADLINK   0.5
>>>>      # because of link trackers (from mass mailers, for example), we must
>>>>      # meta this with other rules to be sure we are facing our fake YouTube botnet
>>>>      meta      AJB_FK_UTUBE_BOTNET     AJB_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
>>>>      describe  AJB_FK_UTUBE_BOTNET     mismatch between href and anchored text + empty subject = botnet
>>>>      score     AJB_FK_UTUBE_BOTNET     5.5
>>>>      ##
>>>>      # TODO: check if we could work with DKIM, exists:List-Unsubscribe,
>>>>      #    SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
>>>>      #    in order to avoid FPs from mass mailers.
>>>>
>>>> Note the TODO ;-)
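To make the rule logic concrete, here is a rough Python sketch of what the two *_BADLINK uri_detail rules and their meta rules test. It is not SpamAssassin's URIDetail code. It assumes, as I understand the plugin, that every condition on a uri_detail line must hold for the same link, with "raw" being the href, "text" the visible anchor text and "type" the tag the link came from. The function names are hypothetical, and the other rules the metas depend on (Z_CANPOST_TRACKNUM, Z_EMPTY_SUBJ, MIME_HTML_ONLY) are simply passed in as booleans.

```python
import re

# Python's re stands in for SpamAssassin's Perl regexes; the prefix is copied
# from the rules (the \/ escapes are only needed inside Perl's /.../ delimiters).
LURE_PREFIX = r"(?:https?://(?:www\.)?|www\.)"


def badlink(raw: str, text: str, link_type: str, domain: str) -> bool:
    """Hypothetical sketch of one *_BADLINK uri_detail check on a single link.

    raw       -- the href the link really points to (uri_detail key "raw")
    text      -- the visible anchor text (key "text")
    link_type -- the tag the URI came from (key "type"), e.g. "a" or "img"
    domain    -- the impersonated label, e.g. "canadapost" or "youtube"
    """
    label = re.escape(domain) + r"\."
    return (
        re.search(label, raw) is None                          # raw  !~ /<domain>\./
        and re.search(LURE_PREFIX + label, text) is not None   # text =~ /...<domain>\./
        and re.search(r"^a$", link_type) is not None           # type =~ /^a$/
    )


def canpost_phish_badtracknum(badlink_hit: bool, z_canpost_tracknum: bool) -> bool:
    """Meta rule: the Canada Post BADLINK rule hit and Z_CANPOST_TRACKNUM did not."""
    return badlink_hit and not z_canpost_tracknum


def fk_utube_botnet(badlink_hit: bool, z_empty_subj: bool, mime_html_only: bool) -> bool:
    """Meta rule: the YouTube BADLINK rule && Z_EMPTY_SUBJ && MIME_HTML_ONLY."""
    return badlink_hit and z_empty_subj and mime_html_only


if __name__ == "__main__":
    # Made-up example: the anchor text claims YouTube, the href goes elsewhere.
    hit = badlink("http://198.51.100.7/x", "www.youtube.com/watch", "a", "youtube")
    print(hit, fk_utube_botnet(hit, z_empty_subj=True, mime_html_only=True))
```

The meta functions make it visible why the YouTube rule alone scores low: a link-tracker href from a legitimate mass mailer would also satisfy badlink(), so the high score only comes from the conjunction.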
>>>
>>> Don't know if it makes much difference in this case, but...
>>>
>>> (?:https?:\/\/(?:www\.)?|www\.)
>>
>> Should catch:
>> http://
>> https://
>> http://www.
>> https://www.
>> www.
>>
>>> can be simplified to:
>>>
>>> (?:https?:\/\/|www\.)
>>>
>> While this catches:
>> http://
>> https://
>> www.
>>
>> It covers less. It may be overkill, but my regex has one and only one
>> purpose: match any kind of "valid" web link, as per common user
>> experience (i.e. "as seen on TV").
>>
>> The spammer will try to lure the common user by mimicking what the
>> common user is used to seeing, no?
>
>
> Check again.  "http://www." and "https://www." are caught by the "www."
> pattern.  Matching the "https?://" as well is not needed.  That's why I
> mentioned anchoring.  If you were anchoring the front of the regexp, you
> would need this match.  Since you are not, the extra specificity is not
> needed.  My regexp matches exactly the same strings as yours.
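A quick way to check this is to run both prefixes, each followed by the canadapost\. label from the rule, against the forms Alex listed. The sketch below uses Python's re as a stand-in for SpamAssassin's Perl regexes (for alternation, optional groups and leftmost matching they behave the same here); the sample strings and the helper name matches() are made up for illustration. With an unanchored search both patterns agree on every sample; anchored at the start, the simplified one no longer accepts "http://www.canadapost.ca".

```python
import re

# Prefixes from the thread, each followed by the label used in the rule.
# Python's re accepts the Perl-style \/ escape as a literal "/".
ORIGINAL = r"(?:https?:\/\/(?:www\.)?|www\.)canadapost\."
SIMPLIFIED = r"(?:https?:\/\/|www\.)canadapost\."

# The five forms Alex listed, plus one that neither pattern should accept.
samples = [
    "http://canadapost.ca",
    "https://canadapost.ca",
    "http://www.canadapost.ca",
    "https://www.canadapost.ca",
    "www.canadapost.ca",
    "canadapost.ca",
]


def matches(pattern: str, text: str, anchored: bool = False) -> bool:
    """Unanchored search like the rule, or anchored at the start like a leading ^."""
    finder = re.match if anchored else re.search
    return finder(pattern, text) is not None


if __name__ == "__main__":
    for s in samples:
        print(
            f"{s:27}",
            "search:", matches(ORIGINAL, s), matches(SIMPLIFIED, s),
            "anchored:", matches(ORIGINAL, s, True), matches(SIMPLIFIED, s, True),
        )
```

Unanchored, the simplified pattern simply skips past "http://" and matches at the "www." instead, which is why the extra (?:www\.)? only matters once a ^ is added.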
>
>

Oops, that kind of anchoring... I thought you were pointing at the type <a>.

You're definitely right, sorry for the misunderstanding.

I will update my rules with your simpler regex :-)

Alex, sometimes not focused on the right thing ;)

>>
>>> Since you're not anchoring the front of the regexp or trying to
>>> capture the match, the results will be the same.
>>>
>> I'm not capturing because I don't use the match afterwards. On a small
>> system this makes no difference. On large systems (millions of emails
>> filtered a day), it probably does. That's a guess, though; I don't want
>> to prove it on my own systems :-)
>
>
> Right.  No need to capture here or in most SA rules.  I only mentioned it
> since there would be a difference between your original regexp and my
> suggestion if you were doing some capturing.
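To illustrate where capturing would matter, here is a small Python sketch (again using re in place of Perl, with a made-up URL) that turns both prefixes into capturing groups. Both patterns still match, but the original captures "http://www." starting at offset 0, while the simplified one captures "www." starting at offset 7, because the search moves forward to the "www." alternative. With (?:...), as in the posted rules, nothing is captured at all.

```python
import re

TEXT = "http://www.canadapost.ca/track"

# Same two prefixes, but with capturing groups instead of (?:...).
original = re.search(r"(https?://(www\.)?|www\.)canadapost\.", TEXT)
simplified = re.search(r"(https?://|www\.)canadapost\.", TEXT)

# Both find a match, so a rule that only asks "does it match?" behaves the same...
print(original is not None, simplified is not None)
# ...but what they capture, and where the match starts, differs.
print(repr(original.group(1)), original.start())     # 'http://www.' 0
print(repr(simplified.group(1)), simplified.start())  # 'www.' 7

# With a non-capturing group, as in the posted rules, there is nothing to read back.
print(re.search(r"(?:https?://|www\.)canadapost\.", TEXT).groups())  # ()
```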
>
> As I said, it may not make any real difference here; I was simply
> pointing out the possible simplification of the regexp.
>
> --
> Bowie
