[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-09 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED




--- Additional Comments From [EMAIL PROTECTED]  2005-05-09 08:57 ---
Committed revision 169334




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-09 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-09 09:21 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

On Mon, May 09, 2005 at 12:40:22AM -0700, [EMAIL PROTECTED] wrote:
 redirector_pattern /^(?:http://)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$/
 
 why is '(?:http://)?' conditional?  should we not just ensure that any input 
 to
 that match will always be a full URL with protocol, and leave off the
 conditionality?

+1  the rest of the uri_list_canonify() code will make sure it's a FQ URL
anyway.





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 16:20 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

On Thu, May 05, 2005 at 08:31:04PM -0700, [EMAIL PROTECTED] wrote:
 There's a lot of stuff in the comments here. I want to make sure that your 
 vote
 is about what ended in the patch and not about things I only talked about 
 doing.

It was the patch, but ...

 1) Mail::SpamAssassin-parse() was called in three places as a class method,
 even though the POD doesn't document it as being one. That happened not to 
 break
 anything before now, but should be fixed independent of this bug. It's a one
 line change in two of those places, and a three line change in the third. The
 API for parse() does not have to be changed.

That's fine.

masses/corpora/mass-find-nonspam:  my $ma = Mail::SpamAssassin-parse 
($dataref);

This has no M::SA object.  I'd probably just change it to use
M::SA::Message-new().

sa-learn.raw:  my $ma = Mail::SpamAssassin-parse($dataref);
tools/speedtest:my $mail = Mail::SpamAssassin-parse (\*FILE);

There's a M::SA object already, so doing $sa-parse() is fine with me.

This shouldn't be part of this ticket.

 2) The Mail::SpamAssassin instance is passed in to the constructors for 
 Message,
 Node, and HTML so they can have access to configuration options. That adds an
 argument to the constructor, a 'main' instance variable that is assigned in 
 the
 constructor and deleted in the finish() method.

This is what I'm -1 on.

 3) The code that actually performs the enhancement.
 #1 should be done anyway. #3 is not the issue you are bringing up.

Sort of.  I'm actually somewhat negative against #3 as well.  This is yet
another reserved IP/RegistrarBoundary thing, except the potential list of
these is much MUCH larger than those other two.  I think I'd rather have a
open redirector domain list via SURBL or something that gets you a few
points if your domain has an open redirector available.  Gives some incentive
to those people to close the redirector.

 I don't know that #2 is a big change to the core API. The three constructors 
 are
  only called in a handful of places in all. It could be argued that we should
 not have core functionality like message parsing and HTML rendering that 
 cannot
 be modified by any configuration options.

Yes, it is a big change.  Message and Message::Node are designed to be
independent of SpamAssassin.  The code's job is to parse a message and
return an internal data format, there's no configuration involved in
that process.

In this case, there's no need to have anything change in the current process.
The HTML is rendered and URIs are pulled out and cooked appropriately.   The
text URIs are handled separately and would also need to be gone over by the
code.

So if we want this support for just the URIBL stuff, it's easy to just add to
that plugin.  If we want it in general, it's a bit more complex.  As far as
our code is concerned, get_uri_list() is the way to get the URI list.  The
URIBL plugin is the only thing that I can see that directly accesses the
HTML uri_detail hash (and nothing seems to look at the old uri array).

This may be a good time to change get_uri_list().  If it's called with
no parameters, return what we currently do (array of uris).  If it's
called with some new parameter, return a uri_detail-esque hash which
includes both HTML and parsed URIs (trivial).  Change URIBL plugin to use
that new get_uri_list() call.  Have get_uri_list() do the full gambit
of redirection, skip it in uri_list_canonify().  Heck, skip the whole
canonification step in HTML.





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 17:14 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector 

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


 This may be a good time to change get_uri_list().

+1

 If it's called with no parameters, return what we currently do (array of
 uris).  If it's called with some new parameter, return a
 uri_detail-esque hash which includes both HTML and parsed URIs
 (trivial). Change URIBL plugin to use that new get_uri_list() call.

- -1.  If we do this, one of the formats should be a *new* API -- having an
API become modal depending on the presence of an argument, especially if
the modality depends on if the arg is undef or not, is bad design.

(why is the undef or not model a bad idea?  because it's too easy to
accidentally mistype something and accidentally wind up passing undef in
as the arg -- leaving you scratching your head as to why the function has
returned the wrong data... we've seen this with setter APIs that
seem to do nothing because the data was accidentally undef.)

A better idea is just to *add* a new form of the API as a new method.

Aside from the above -- can someone explain why this is going to require
a new API?  what's wrong with putting redirector-stripping into the
existing one, since we already do it there anyway for all the other
redirector formats?

 Have get_uri_list() do the full gambit  of redirection, skip it in
 uri_list_canonify().

+1 

 Heck, skip the whole canonification step in HTML.

+1  ;)

- --j.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCfAhmMJF5cimLx9ARAvTiAJ9UGbloE4s7/D1fBTDxRUi4LO2HWACgmQ3M
unZ75f/NnD+usjZVbP3ovEo=
=d65F
-END PGP SIGNATURE-





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 19:13 ---
 what's wrong with putting redirector-stripping into the existing one

As far as I know the only thing wrong is that redirector stripping is part of
the deobfuscation code in HTML.pm, which is part of the parse() processing of
Message and Message::Node, which are independent of any conf options. Thus we
can't make redirector_patterns part of it as long as that gets its patterns from
conf.




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 22:06 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

On Fri, May 06, 2005 at 07:13:26PM -0700, [EMAIL PROTECTED] wrote:
 As far as I know the only thing wrong is that redirector stripping is part of
 the deobfuscation code in HTML.pm, which is part of the parse() processing of

The deobfuscation code is in Util.pm.  It's just called, for the HTML part, in
HTML.pm.  It's called seperately in PerMsgStatus for the parsed URIs.  FYI.





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 23:07 ---
 The deobfuscation code is in Util.pm

Yes, which makes fixing the problem easier. I meant that the deobfuscation step
haas been made part of the parsing of HTML. The fact that the code is in Util
and is also called by PerMsgStatus for text bodies helps to point out how wrong
that is.

The method in Util to canonify URIs only has to be called in one place. We can
just not call it in HTML.pm, cache uncanonified URIs in the HTML metadata, then
call the canonify method in PerMsgStatus after the code has chosen to either use
the cached HTML metadata or parse the text message for URIs.

Is that what you were suggesting in comment #49 that Justin agreed with in
comment #50? Yes, I'm also +1 on that approach. This patch would then go in
cleanly to PerMsgStatus with no impact on HTML.

BTW, I'll open another bug for the three uses of parse() as a class method and
fix those as a separate issue.




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-07 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-07 07:03 ---
I have a patch almost ready that moves the calls to uri_list_canonify from HTML
and get_parsed_uri_list into get_uri_list and changes the URIDNSBL plugin call
to get_parsed_uri_list to instead call get_uri_list.

The API for get_uri_list is not changed. It always calls uri_list_canonify if it
doesn't immediately returned a cached result, and it saves the results in the
metadata for URIs in HTML messages. Since get_uri_list caches its results, the
canonification is only done once.

With that change, the patch for redirection patterns is simpler, only going in
PerMsgStatus instead of also in HTML.




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

Attachment #2845 is|0   |1
   obsolete||




--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 16:38 ---
Created an attachment (id=2846)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2846action=view)
Patch updated to work with HTML messages

 I don't think we should change -parse()

Justin, did you type that comment before seeing my last two comment #31 and
comment #32? All I had to do was have HTML.pm conditionalize its one use of the
conf parameter on $self-{main} being a ref. That assumes that when parse is
being called as a class method we don't care about the redirection_patterns.
The is no change to the parse API.

I just tested it and it all works. New patch is attached that is Daryl's patch
modified to get HTML messages to work according to Justin's suggestions.

If it looks good I think we can check it in to trunk for 3.1.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 16:51 ---
'All I had to do was have HTML.pm conditionalize its one use of the
conf parameter on $self-{main} being a ref. That assumes that when parse is
being called as a class method we don't care about the redirection_patterns.
The is no change to the parse API.'

I don't like the idea of having two different behaviours depending on how the
message object is created...



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 17:41 ---
 I don't like the idea of having two different behaviours depending on
 how the message object is created...

That's what's bothering me about this. The issue for me is what is parse()
defined as doing? Is it a class method or an object method?

I just had an idea... The problem is that configuration options are associated
with the Mail::SpamAssassin instance. We had to do that in order to have
per-user options. But we have changed things since then to have a system
configuration that is loaded once and then extending $main-{conf} with per-user
options.

So why can't the base {conf} options be in a class variable instead of an
instance variable? It would be accessed via a class method, which means we would
not have to pass through the main object to all those constructors.

Is there a reason why Mail::SpamAssassin::Conf-new is called when a new instance
of Mail::SpamAssassin is created and not when the class is loaded?

Would that simplify how we handle the comfiguration options?




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 18:33 ---
Some installations use multiple perl threads, each of which creates a
Mail::SpamAssassin object.  Beware of breaking thread safety.





--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 18:37 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector 

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


yep, you're right -- Mail::SpamAssassin::parse() is clearly an object
method, not a class method, as it's defined in the POD (plus it makes more
sense that way given that config settings could modify how messages are
parsed!).  we should fix any calls that use it as a class method to do the
opposite.

- --j.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCesplMJF5cimLx9ARAuDpAJ40fJYPDzRdBAiUoVrRa4vhmF95FACgrzbJ
6Hh5sogjcI+T1OO4jOqsoEE=
=JrZD
-END PGP SIGNATURE-





--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

Attachment #2846 is|0   |1
   obsolete||




--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 19:23 ---
Created an attachment (id=2847)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2847action=view)
Patch to fix the bug and deal with all problems so far mentioned in comments

This patch includes the previous fix for HTML messages and also changes
sa-learn, tools/speedtest and masses/corpora/mass-find-nonspam to call parse as
an instance method, not as an object method. NOTE: I don't know how to test the
last two, so I did not even run them with the change. The change was trivial,
but someone should test it.

I also added a new test to t/uri.t that uses redirection_patterns. It isn't an
end-to-end test starting with a message (although I did test that by hand) but
it does exercise the redirection parser.

Now I think it's ready for review :-)




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 19:55 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

On Thu, May 05, 2005 at 07:23:57PM -0700, [EMAIL PROTECTED] wrote:
 This patch includes the previous fix for HTML messages and also changes
 sa-learn, tools/speedtest and masses/corpora/mass-find-nonspam to call parse 
 as
 an instance method, not as an object method. NOTE: I don't know how to test 
 the

-1

I see no reason to make large changes to the API of the core of our code
to add in a small enhancement to the functionality.  This is better done
in a different place.

Also, FYI, Message is not actually specific to SA so ruining the design
isn't really a great idea.


I'd be okay just adding the functionality to the URIDNSBL plugin for now.
Alternately, a separate plugin or doing it in get_uri_list().





--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 20:31 ---
Theo,

There's a lot of stuff in the comments here. I want to make sure that your vote
is about what ended in the patch and not about things I only talked about doing.

The way it ended up the changes are:

1) Mail::SpamAssassin-parse() was called in three places as a class method,
even though the POD doesn't document it as being one. That happened not to break
anything before now, but should be fixed independent of this bug. It's a one
line change in two of those places, and a three line change in the third. The
API for parse() does not have to be changed.

2) The Mail::SpamAssassin instance is passed in to the constructors for Message,
Node, and HTML so they can have access to configuration options. That adds an
argument to the constructor, a 'main' instance variable that is assigned in the
constructor and deleted in the finish() method.

3) The code that actually performs the enhancement.

#1 should be done anyway. #3 is not the issue you are bringing up.

I don't know that #2 is a big change to the core API. The three constructors are
 only called in a handful of places in all. It could be argued that we should
not have core functionality like message parsing and HTML rendering that cannot
be modified by any configuration options.

A plugin wouldn't help. Right now there is no way for the HTML rendering to make
use of anything in conf, for this enhancement or for any future one that has to
do with HTML.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


Re: [Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread Daniel Quinlan
 I'd be okay just adding the functionality to the URIDNSBL plugin for now.
 Alternately, a separate plugin or doing it in get_uri_list().

It might be better to add a new plugin API that could be called from the
URIDNSBL plugin or any other place that wants it.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 20:35 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

 I'd be okay just adding the functionality to the URIDNSBL plugin for now.
 Alternately, a separate plugin or doing it in get_uri_list().

It might be better to add a new plugin API that could be called from the
URIDNSBL plugin or any other place that wants it.





--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 20:48 ---
How would the new plugin API work? Is the idea that URIs are extracterd and
deobfuscated in HTML.pm and later the URIs can be passed to a plugin to be
checked for redirection?




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

 AssignedTo|[EMAIL PROTECTED] |dev@spamassassin.apache.org




--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 00:21 ---
Hmm, just got back in... having config options available to the various parsing
code sounds handy to me, but I'm happy with whatever you guys think is best.



--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 13:12 ---
I see a basic design issue here. Going all the way back to the spamassasssin
script, processing looks like this, as a rough outline:
  
   $mail = Mail::SpamAssassin-parse( message text )
   $results = Mail::SpamAssassin-check($mail)

Until now everything that parse() has done has been independent of _any_
configuration options. That makes parse() something that understands the RFC
standard email, MIME, and HTML formats, but acts as a standalone library.

Unless we change that design, the redirection pattern code _has_ to be done
during the call to check(), using the deobfuscated URis that are parth of the
$mail object that parse() returns.

In the design we have been using in these patches we think about the problem
like this: Spammers use redirection as a way to obfuscate the URLs they are
sending you to. We find the target URLs in our deobfuscation routine.

Treating redirection patterns as just another form of obfuscation (#0) won't
work because 1) we want to have the flexibility of placing the patterns in a
configuration file; 2) parse() behaviour is independent of any configuration
file; 3) deobfuscation is performed in parse().

To get this to work we change any one of the four numbered items in the previous
paragraph. I think the easiest to change is #0, by performing the redirection
pattern stuff in check(), not in parse(). Changing #3 is what I did with the
patch I submitted -- It is not a lot of code change, but it is a fundamental
design change to parse().

The problem with doing redirection patterns in check() is that we want this to
act as a filter on all URIs before they are passed to all other processing in
check(). Do we have a place to put that in? Is that the change to the plugin API
that you were talking about, Daniel, to have a way of inserting a filter before
all other processing, or at least a filter on the URI list?



--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 15:01 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector 

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


OK, moving that code to a check()-time call is the least destructive to
existing design, APIs and usage, so I'm happy with that.

now -- why not just change HTML.pm to add the URIs found in a raw form,
and then in check(), call into a new Mail:SpamAssassin::HTML method which
postprocesses that list to canonify the URIs?  That would seem to be
the minimal code change.

PS: I don't see where the plugin API needs changing, either.

- --j.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCe+k8MJF5cimLx9ARAhH7AJ440Ho48b90FCmdeQDJqkXkSsPWaACeODrH
rBvcvrqav89UMYNvodj+47w=
=oNvJ
-END PGP SIGNATURE-





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-06 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-06 15:12 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

On Fri, May 06, 2005 at 12:21:07AM -0700, [EMAIL PROTECTED] wrote:
 Hmm, just got back in... having config options available to the various 
 parsing
 code sounds handy to me, but I'm happy with whatever you guys think is best.

I'm -1 on that idea, at least for Message (and therefore HTML).  It's designed
to be separate from the rest of SpamAssassin so I don't want to see it getting
Conf intertwined.  It's easy enough to make some extra options available to
Message::new() when we have to.





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

   Target Milestone|3.1.0   |3.2.0




--- Additional Comments From [EMAIL PROTECTED]  2005-05-04 21:16 ---
bumping to 3.2.0 for now




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-04 23:37 ---
'Is it the case that the @ISA in HTML.pm means
that it inherits from HTML::Parser and not from Mail::SpamAssassin, and it has
no access to the conf stuff in the main SpamAssassin object? The only call to
HTML-new() that I found was in Mail::SpamAssassin::Message::Node which also
seems not to have access to the main object and therefore no access to conf.
Which means that if you want HTML.pm to make use of the redirector_patterns
configuration value it is going to be quite tricky.'

not too tricky -- pass the 'main' pointer through all object constructors so
that Node can have a ptr, and then pass that in to the HTML object as well so it
can take a ptr there too.

just remember to delete that pointer in finish() in Node.pm, and add a new
finish() API to HTML.pm which will delete that ptr.  otherwise you wind up with
the danger of a circular reference.



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

Attachment #2788 is|0   |1
   obsolete||




--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 02:31 ---
Created an attachment (id=2845)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2845action=view)
patch 2788 updated to patch current svn but no other changes


Justin, is it really that simple? The Node instance is created in Message which
also doesn't have access to the 'main' pointer. So you would have to pass it in
to Message, which would then also need a finish() method to delete it. That's
three levels of apssing it through and dealing with a finish method.

What I'm most concerned about is how this relates to the loading the per-user
conf and the copying of the configiguration for the child processes in spamd.
Does anything special have to be done in passing the 'main' pointer in to the
Message object to Node to HTML because of that?

If you can confirm that those are not of concern, then I can modify Message and
Node and HTML accordingly.

In the meantime, this attachment is the same as the last one but upgraded to
apply cleanly to current svn trunk. I'm uploading it to make it easier to work
with the patch.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 10:27 ---
Sidney --

yep, it really is that simple ;)  it's unfortunate that it has to be passed
through, but given that the HTML parsing (a complex step that requires access to
configuration) takes place at the end of those 3 objects, I think it's
unavoidable.  we've had to do this in the past, too, on other objects.

this btw is why I generally set {main} on objects, even if it's not used
immediately -- it may be useful down the line...

it may be simpler if you can pass a $main directly from one method call to
another, instead of having to set $self-{main} and then have a $self-finish()
method to clean it up later.  that really depends on the call flow though.

another simpler option would be to refactor so that the HTML parsing can take
place from another class that *does* know {main}.  but I think that would be
much more intrusive and damaging, would break encapsulation by moving HTML
parsing away from the data it's working on (the Node object), and generally
doesn't make sense since this isn't a big deal.

'What I'm most concerned about is how this relates to the loading the per-user
conf and the copying of the configiguration for the child processes in spamd.
Does anything special have to be done in passing the 'main' pointer in to the
Message object to Node to HTML because of that?'

nope.  the 'main' pointer never changes between user scans -- instead the
$main-{conf} object reference does.   So as long as you store the main ptr via
$self-{main} = $main and access via $self-{main}-{conf}, instead of
taking a copy of the pointer a la $self-{conf} = $main-{conf} and then using
$self-{conf} later, you'll be OK.



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 14:25 ---
Justin said:
 yep, it really is that simple ;)

Wanna bet? :-)

sa-learn calls Mail::SpamAssassin::parse which I guess is a class method, which
gets passed the string 'Mail::SpamAssassin' as the first argument, and so does
not have a Mail::SpamAssassin opject as $self to pass to the
Mail::SpamAssassin::Message-new that it calls, which then calls
Mail::SpamAssassin::Message::Node-new which then calls
Mail::SpamAssassin::HTML-new which then can't access the main object because
there isn't any.

Ok, I see that sa-learn has an instance of Mail::SpamAssassin there called
$spamtest. Would it work to replace the call to Mail::SpamAssassin-parse with
$spamtest-parse?

grep shows that parse is the only function besides new that is called as a class
method, and it is used that way in sa-learn, corpora/mass-find-nonspam, and
tools/speedtest. I did't look in the latter two to see if they have instances of
Mail::SpamAssassin handy when they call parse so it can be called as an instance
method instead of as a class method.

Yeh, this simple :-)



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 14:44 ---
It looks like in all three cases where parse is called as a class method, the
caller realy wants to do some parsing of a message for some specific information
and can afford to have the parser not use any configuration options.

I just did some more googling about O-O perl... How does this sound? The main
object is passed through until it gets to the HTML object that wants to use it.
In HTML the test for parsing redirection_patterns is conditionalized on it being
called as an object method instead of a class method by wrapping it in

 if ($self-{main})

Does that look right?



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 14:57 ---
typo. I meant (and making what I'm wrapping explicit)

my $redirector_patterns;
$redirector_patterns =
 $self-{main}-{conf}-{redirector_patterns} if ref $self-{main};
   [ ... ]
my @tmp = Mail::SpamAssassin::Util::uri_list_canonify($redirector_patterns, 
$uri);

Since uri_list_canonify ignores the first argument if it is undef.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-05 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-05 15:14 ---
ick.  ok, if the code is run at message-parse-time without the $main object
being available easily, that certainly makes life harder.

I don't think we should change -parse() to require a Mail::SA object be passed
in -- that's changing the API (and quite a lot of assumptions in the existing
calling code).  I don't think we can assume that there even will *be* a Mail::SA
object in existence by that stage.

in that case, the only way I can see to do it is to defer the use of {conf}
data, e.g. redirection_patterns, until the caller actually *reads* the URI list
in PerMsgStatus, and at that stage, use the $conf data (since it is available by
then).

I don't think we should add another version of the parse() API to deal with this
case.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 16:10 ---
Actually, when I don't mis-type the uri, it works fine in a plain text message.
 Sidney, can you attach the html message you're using to the bug?



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 16:24 ---
Created an attachment (id=2832)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2832action=view)
Test spam containing redirected url that is not parsed


The attached message contains an obfuscated nate.com redirection.

I added some dbg messages to Util.pm in the uri_canonify_list method.

It is being called once with the first argument undef and the second argument
containing the URL to be cleaned, which is a nate.com redirection that is
obfuscated.

The second time uri_canonify_list is called it has the redirector patterns in
the first argument, but the second argument is an empty array.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 17:01 ---
I think I see the problem.

In HTML.pm the call to uri_list_canonify passes in
$self-{conf}-{redirector_patterns} but unlike in PerMsgStatus.pm, where the
same call works, there is no 'conf' = $main-{conf} defined in new().

You either have to add conf to uri_list_canonify as it is in PerMsgStatus or
else change the argument in the call to be
$self-{main}-{conf}-{redirector_patterns}




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 17:09 ---
Well, I'm wrong about the fix - I don't know much about how O-O perl works and
what the SUPER::new in HTML::new does compared to the way things are in
PerMsgStatus.

But it does look like $self-{conf}-{redirector_patterns} is not what you
expected in HTML.pm.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 17:42 ---
I read up a little on perl O-O... Is it the case that the @ISA in HTML.pm means
that it inherits from HTML::Parser and not from Mail::SpamAssassin, and it has
no access to the conf stuff in the main SpamAssassin object? The only call to
HTML-new() that I found was in Mail::SpamAssassin::Message::Node which also
seems not to have access to the main object and therefore no access to conf.
Which means that if you want HTML.pm to make use of the redirector_patterns
configuration value it is going to be quite tricky.




--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-04 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 17:46 ---
Subject: Re:  [review] RFE: please add pattern for nate.com redirector

Yeah, I just took a look at it now.  It appears that I just copied the 
line over from PerMsgStatus.pm without any thought.  Worse I failed to 
test it with an HTML message.

I don't have time to work on it now.  I'll see if I can get to it in the 
next few days.






--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-03 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-02 23:07 ---
+1 looks great to me



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-03 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

  Status Whiteboard||needs 1 more vote






--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-03 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 15:18 ---
What should this look like when it works? I ran spamassassin -t -D on a spam I
have that uses the nate.com redirector and all I saw in the logs that looke
relevant was the following that shows: 1) The redirectors were added to the
config; 2) The redirection was to neverp4yretail.com; 3) nate.com is skipped in
uridnsbl processing; 4) no other reference in the logs to neverp4yretail.com.

Should there be something in the debug log, or might it be working and there is
no relevant dbg stgatement to show it?

dbg: config: adding redirector regex: (?i-xsm:^(?:http://)?chkpt\.zdnet\.
com/chkpt/\w+/(.*)$)
dbg: config: adding redirector regex: (?i-xsm:^(?:http://)?www\.nate\.com
/r/\w+/(.*)$)

dbg: uri: html uri found, http://www.nate.com/r/DM03/n%65verp4%79re%74%61
%69%6c%2eco%6d/%62%61m/?m%61%6e=%6Di%634%39
dbg: uri: cleaned html uri, http://www.nate.com/r/DM03/neverp4yretail.com
/bam/?man=mic49
dbg: uri: cleaned html uri, http://www.nate.com/r/DM03/n%65verp4%79re%74%
61%69%6c%2eco%6d/%62%61m/?m%61%6e=%6Di%634%39
dbg: uridnsbl: domain nate.com in skip list
dbg: uridnsbl: domain nate.com in skip list



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-03 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 15:31 ---
quinlan added nate.com to the uridnsbl_skip_domain list in r160005 so any checks
on nate.com (including the domain being redirected to) are skipped.



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-05-03 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176





--- Additional Comments From [EMAIL PROTECTED]  2005-05-03 15:48 ---
Right!  That was just a coincidence.

For some reason it doesn't work with the encoded url, I'll have to take a look.



--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


[Bug 4176] [review] RFE: please add pattern for nate.com redirector

2005-04-13 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=4176


[EMAIL PROTECTED] changed:

   What|Removed |Added

Summary|RFE: please add pattern for |[review] RFE: please add
   |nate.com redirector |pattern for nate.com
   ||redirector






--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.