[sniffer] Re: GBUdb question

Pete McNeil Tue, 22 Jan 2008 10:48:12 -0800

Hello Rob,

Tuesday, January 22, 2008, 1:11:00 PM, you wrote:


<snip... about auto-drill-down/>

> I'm not confident that this will handle the "forwarded messages"
> scenarios that I described, which I have already custom programmed for
> the specific narrow range of ways that this currently happens with
> my server.

We're hopeful it will work for many cases. If you can identify cases
where it won't work please let us know.

>> Please share an example of the header you would inject.
>>   
> Currently, I'm using the following:

> X-RegEx-Original-IP: 127.0.0.1

> (But "X-RegEx-Original-IP" was arbitrary. This was inherited from an
> antiquated anti-spam utility I used years ago. The "X-RegEx-Original-IP"
> part can change at any time. This could even be a header custom
> designated by Sniffer.)

That seems straightforward enough. Thanks.
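
For illustration, a minimal Python sketch of how a pre-filter might add
such a header before handing the message to the scanner. The header
name is the one from the example above; the function name is
hypothetical, and nothing here implies that SNF reads this header
today.

```python
from email import message_from_string
import ipaddress

# Header name taken from the example above; it is arbitrary and may change.
ORIGINAL_IP_HEADER = "X-RegEx-Original-IP"

def inject_original_ip(raw_message: str, ip: str) -> str:
    """Return the message text with the original-IP header set to `ip`."""
    ipaddress.ip_address(ip)  # raises ValueError if `ip` is not a valid address
    msg = message_from_string(raw_message)
    # Remove any copy already present so a sender-supplied value cannot survive.
    del msg[ORIGINAL_IP_HEADER]
    msg[ORIGINAL_IP_HEADER] = ip
    return msg.as_string()

raw = "From: a@example.com\nSubject: test\n\nbody\n"
tagged = inject_original_ip(raw, "127.0.0.1")
```

Removing any existing copy first matters because the header would
otherwise be trivial for a sender to forge.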

> Even better, another option would be for the IP to be passed to sniffer
> via the command line where sniffer would know to use that one and not 
> bother trying to grab this from the header. Please consider that as a 
> feature request.

I will add that to the list.

<snip about GBUdb training options (disabled training)/>

> That will work. But will this disable the "SNFClient.exe -bad" and
> "SNFClient.exe -good" tools? And will this disable sharing of the
> data? Can data accumulated via these manual reports be shared
> even if "training" is "off"?

The command line tools always work. When you report a "good" or "bad"
hit it has the same effect as GBUdb learning from a message scan.

The information will be stored and shared in exactly the same way.

When you turn off training you are only disabling the system's ability
to learn automatically from scanned messages. Inputs from the command
line utility are still retained.

<snip/>

>> One other thought that I have is that you could use the command
>> line (or the ignore list) to mark the IPs on your internal
>> white-list as Infrastructure (ignore flag). This might effectively
>> train GBUdb to skip those IPs when finding the source of the
>> message - and in any case would render GBUdb inert for those IPs.
> There are too many IPs on that whitelist (it might have been possible 
> were it not that many of these entries are massive blocks of IPs).

Perhaps - that's up to you. However, the GBUdb system is designed to
handle large numbers of IPs without slowing down. It is not uncommon
to have significantly more than half a million IPs in GBUdb on systems
that handle 500 msg/min or more.

The ignore list file is intended to handle local infrastructure so
that if you lose your GBUdb data you can be assured that your local
resources are not tagged as bad sources accidentally.

Other IP records (ignore, good, bad, or ugly) can be entered via the
command line utility with the only real limit being the amount of RAM
you want to commit to the GBUdb.
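
To judge whether a whitelist made of large blocks is practical, it may
help to count the addresses it covers first. This is a rough sketch:
the blocks listed are hypothetical placeholders, and it assumes one
GBUdb record per IP address, which this thread does not confirm.

```python
import ipaddress

# Hypothetical whitelist blocks in CIDR notation; substitute your own.
whitelist_blocks = ["192.0.2.0/24", "198.51.100.0/24", "10.0.0.0/16"]

# Total number of individual addresses covered by all blocks.
total_ips = sum(ipaddress.ip_network(b).num_addresses for b in whitelist_blocks)

print(f"{total_ips} addresses in {len(whitelist_blocks)} blocks")
```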

To give you an idea of scalability, one of our spamtrap processors is
currently handling about 3000 msg/minute (typical) and has the following
GBUdb statistics:

        <gbudb>
                <size bytes='109051904'/>
                <records count='479671'/>
                <utilization percent='96.7379'/>
        </gbudb>
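
As a rough reading of those numbers, the sketch below parses the
statistics block and works out the total size and the average memory
per record. The average simply divides the allocated size by the
record count, so treat it as an approximation rather than the actual
record size.

```python
import xml.etree.ElementTree as ET

# Statistics block copied from the status report above.
stats_xml = """
<gbudb>
    <size bytes='109051904'/>
    <records count='479671'/>
    <utilization percent='96.7379'/>
</gbudb>
"""

root = ET.fromstring(stats_xml)
size_bytes = int(root.find("size").get("bytes"))
record_count = int(root.find("records").get("count"))

size_mib = size_bytes / (1024 * 1024)          # total allocation in MiB
bytes_per_record = size_bytes / record_count   # crude average per record

print(f"{size_mib:.0f} MiB, about {bytes_per_record:.0f} bytes per record")
# prints: 104 MiB, about 227 bytes per record
```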


> Follow-up question...

> If, therefore, I cannot stop GBUdb-processing for a particular message,
> but I turn off truncate for all messages, the way I see it, couldn't I
> simply ignore the GBUdb reporting for some particular messages? (might
> not be as efficient, but I'd get the same result I seek!) But in a case
> where truncate is turned off, if GBUdb reports a message as spam, AND 
> content rules ALSO mark that message as spam, will the return code tell
> me that both GBUdb *and* rules caught the spam? Or do I get one code
> instead of the other (if so, which one?)

If you turn off truncate then you will see the following results by
default in a conventional command-line implementation:

* For messages that match pattern rules you will see the pattern rule
result.

* If a message fails to match a pattern rule but would have been
truncated then it will be treated as black and you will get result
code 40.

* If a message fails to match a pattern rule but the IP falls in the
black range then you will get the black result code 40.

* If the message fails to match a pattern rule and the IP falls in the
caution range then you will get a bad IP result code 63. This is the
same result code you get from SNF when an IP pattern rule has matched.
IP pattern rules are deprecated and will be phased out over time -
GBUdb replaces them.
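
A calling script could branch on those codes roughly as follows. This
is only a sketch of the cases listed above with truncate turned off:
the function name and labels are hypothetical, and any code other than
40 or 63 is passed through without interpretation.

```python
BLACK = 40    # IP in the black range, or a message that would have been truncated
BAD_IP = 63   # IP in the caution range, or a (deprecated) IP pattern rule match

def describe_result(code: int) -> str:
    """Map an SNF result code to a coarse label for the cases described above."""
    if code == BLACK:
        return "gbudb-black"
    if code == BAD_IP:
        return "bad-ip"
    # Anything else is left to the caller, e.g. a pattern rule result.
    return f"other-{code}"

labels = [describe_result(c) for c in (40, 63, 52)]
```

Note that 63 alone does not tell you whether GBUdb or an IP pattern
rule produced it; the detailed scan data is needed for that.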

If you call SNF directly via XCI, or use the command line utility with
the -xhdr option and capture the output, then you also have the ability to
configure SNF to provide detailed information about the scan including
the GBUdb data and all available pattern matches. You could also mine
this data from the log files if you wish.

Note that you can set the x-header option to "api" and it will be
available to the XCI and command line interfaces without being
injected into the message.

--- One other thing ---

You can adjust and even disable the ranges that are defined for the
GBUdb. This allows you to develop your own statistical model for
classifying messages based on the behavior of their source. The ranges
that you select for your GBUdb do not impact what is reported and
learned about the IPs -- only how your system responds to those
statistics.

The new reference settings we have created are such that they are
reasonably safe even when large ISPs are involved. A message source
must be observed producing more than 95% spam before it will create a
truncation event.
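
To make the 95% figure concrete, here is a deliberately simplified
sketch. It treats the rule as a plain ratio of bad to total
observations; the statistics GBUdb actually keeps are not described
here, so the function and its parameters are illustrative assumptions
only.

```python
def should_truncate(good: int, bad: int, threshold: float = 0.95) -> bool:
    """True if the observed spam fraction for a source exceeds the threshold."""
    total = good + bad
    if total == 0:
        return False  # nothing observed yet, so no basis for truncation
    return bad / total > threshold

# A source just shy of the threshold is not truncated.
isp_like = should_truncate(good=6, bad=94)
spam_source = should_truncate(good=1, bad=99)
```

Under this simplified model a large ISP sending 94% spam stays below
the threshold, while a source at 99% crosses it.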

Sad but true - many major ISPs generate just shy of that amount of
spam through various vectors (forwarded mailboxes being one of them).
You may find that the new reference settings produce something very
close to your desired result -- especially if you also provide the
additional "hinting" that you propose.

If you are able to teach your GBUdb to ignore the correct IPs then you
will be able to tighten the range envelopes significantly and
dramatically reduce leakage without adding false positives.

Hope this helps,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

