On Tue, Oct 28, 2014 at 11:47 AM, francis picabia <fpica...@gmail.com> wrote:
> On Mon, Oct 27, 2014 at 4:55 PM, John Hardin <jhar...@impsec.org> wrote:
>
>> On Mon, 27 Oct 2014, francis picabia wrote:
>>
>>>>>> uri URI_EXAMPLE_EXTRA m;^https?://(?:www\.)?example\.com[^/?];i
>>>
>>> However another spoofed message was received today and the rule
>>> did not capture it.
>>>
>>> If I want to detect something in the form of:
>>> random_server.example.com.junk
>>> I need to wildcard the first bit. Would that be:
>>>
>>> uri URI_EXAMPLE_EXTRA m;^https?://(?:.*\.)?example\.com[^/?];i
>>>
>>> I don't understand what the question mark and colon does inside the ( )
>>> I thought it followed an optional char or expression. Should it be
>>> like this?
>>>
>>> uri URI_EXAMPLE_EXTRA m;^https?://(.*\.)?example\.com[^/?];i
>>
>> (?:) means "group, don't remember the match". () remembers what's matched
>> for future use in the RE (e.g. to check for repeated strings like
>> "abcabcabcabc".
>>
>> Try this:
>>
>> uri URI_EXAMPLE_EXTRA m;^https?://(?:[^./]+\.)*example\.com[^/?];i
>
> Once again, thanks for the RE coding.
>
> I found a false positive it captured with my attempt at this :
>
> <a href="
> http://www.newslettersite.com/redirectnewsletter_login.asp?URL=http://www.secondsite.com/PYB/contact_us.asp&loginemail=u...@example.com&logincode=123456&utm_source=Articles_Air_01112014&utm_medium=email&utm_campaign=newsletter&utm_content=contactus
> "
>
> I've tested your rule with that and it does not tag for the above.
> Great. Hopefully useful to others facing domain spoofs in phishing.

I thought this was a representative test case, but apparently there is
something triggering a false positive when the email is a newsletter
which embeds a user's email within URLs. In the sample I've seen, there
are 34 such possible links which may have triggered the issue, but I
don't know which.
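For anyone following along, the three rule variants quoted above can be compared outside SpamAssassin. This is only a rough sketch: the patterns are translated from the Perl m;...;i form to Python's re module, the rule names (www_only, greedy_any, label_wise) are made up for illustration, and the sample URLs are hypothetical apart from the "http://example.com&" string from the debug trace. It also checks the (?:) versus () distinction.

```python
import re

# Python translations of the three Perl-style patterns quoted in the thread.
# The dictionary keys are illustrative names, not SpamAssassin rule names.
RULES = {
    "www_only": re.compile(r"^https?://(?:www\.)?example\.com[^/?]", re.I),
    "greedy_any": re.compile(r"^https?://(?:.*\.)?example\.com[^/?]", re.I),
    "label_wise": re.compile(r"^https?://(?:[^./]+\.)*example\.com[^/?]", re.I),
}

# Hypothetical sample URLs; only the last string comes from the debug trace.
SAMPLES = [
    "http://www.example.com/login",
    "http://random_server.example.com.junk/login",
    "http://news.example.org/go?to=x.example.com&id=1",
    "http://example.com&",
]


def hits(url):
    """Return the sorted names of the rules whose pattern matches url."""
    return sorted(name for name, rx in RULES.items() if rx.search(url))


for url in SAMPLES:
    print(url, "->", hits(url))

# (?:...) groups without capturing, so no groups are recorded.
assert re.search(r"(?:abc)+", "abcabc").groups() == ()
# (...) captures, which allows a backreference such as \1 for repeats.
assert re.fullmatch(r"(abc)\1+", "abcabcabcabc") is not None
```

In this sketch the greedy (?:.*\.)? variant matches the third sample because .* can run across the path and query string, while the [^./]+ variant does not. All three variants match "http://example.com&", since "&" satisfies [^/?].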
I ran the quarantined sample through spamassassin -D and it shows:

Oct 28 16:24:01.391 [28945] dbg: rules: ran uri rule URI_MYDOMAIN_PHISH ======> got hit: "http://example.com&"

On prior lines in the trace I see other uri rules getting hits, but
they seem to be for different URLs.

The entire body of the email is base64 encoded. Extracting that part
and running base64 -d, I am not finding the hit described by the SA
trace.

This is my method:

zcat spam-jUVZBDml0wS5.gz | grep 'http://example.com'

So the URL is not in the non-base64 part.

zcat spam-jUVZBDml0wS5.gz > /tmp/spamfull
cp /tmp/spamfull /tmp/spam64
vi /tmp/spam64    (to remove headers)
base64 -d /tmp/spam64 | grep 'http://example.com'
(no matches)

Double checked with:

spamassassin -D -lint < /tmp/spamfull 2>&1 | grep http://example.com

Nothing is output except the line above with URI_MYDOMAIN_PHISH.

Is there any suggestion on how to nail down where the match is happening?
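One way to narrow this down, offered as a sketch rather than a known answer: decode every text part with Python's standard email module and search for the bare domain instead of the full "http://example.com" string. The assumption here, which the trace does not confirm, is that the string SpamAssassin reports may be a normalized form that never appears literally in the message (for example, a domain taken from an embedded address with a scheme put in front). The function name find_domain and the demo message are hypothetical.

```python
import email
import re
from email import policy
from email.message import EmailMessage


def find_domain(raw_bytes, needle="example.com", context=40):
    """Yield (content_type, snippet) for each occurrence of needle
    in the decoded text parts of a raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    pattern = re.compile(re.escape(needle), re.I)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            text = payload.decode("utf-8", errors="replace")
        for m in pattern.finditer(text):
            start = max(m.start() - context, 0)
            yield part.get_content_type(), text[start:m.end() + context]


# Hypothetical demo message: an HTML body sent as base64, with a
# recipient address embedded in a link's query string.
demo = EmailMessage()
demo["Subject"] = "newsletter"
demo.set_content(
    '<a href="http://news.example.org/r?loginemail=user@example.com&logincode=1">x</a>',
    subtype="html",
    cte="base64",
)
raw = demo.as_bytes()

# A plain grep of the raw message cannot see the domain...
print(b"example.com" in raw)
# ...but the decoded part shows it, with surrounding context.
for ctype, snippet in find_domain(raw):
    print(ctype, repr(snippet))
```

Against the real sample this would be called as find_domain(open("/tmp/spamfull", "rb").read()). Any snippet where the domain is followed directly by "&" would be a candidate for the link behind the hit.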