Re: regexp help?

2000-01-26 Thread Mikko Hänninen

Jim B [EMAIL PROTECTED] wrote on Wed, 26 Jan 2000:
> I've been looking at how Mutt uses these regexps, and it seems I may need
> something more like:
>
> ignore this match this ignore this
>
> From what I can tell, the expression actually is more of the "ignore" part
> rather than the "match" part... like a "negative" match.  Am I right on
> this?

I think you are.

And that unfortunately means that, as far as I can see, you can't have
the kind of reply_regexp that you'd like, because there's no way to
specify "skip these chars" -- they'll be included no matter what you do,
or else the part at the end won't be.

It would be possible to solve this case with just a reply_regexp if the
tracking number got added somewhere to the beginning of the subject, not
to the end.  As it is, you either need to convince the tracking software
to start adding References headers, or to use a different method for
munging the subjects, or to figure out some way to fix the subjects
and/or References headers (procmail? a custom Perl script that maintains a
database of seen subject lines and adds a References header?).
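
A minimal sketch of that last idea (remember subjects already seen and
supply a References header for later ones), written in Python rather than
Perl.  It is not Mutt or procmail code.  The names normalize_subject and
ThreadIndex are hypothetical, and the assumption that the tracking tag is
always a trailing "[IMS<digits>]" comes only from the sample subjects in
this thread.

```python
import re

# Assumptions: replies start with "Re:" or "AW:", and the tracking tag is a
# trailing "[IMS<digits>]" as in the sample subjects from this thread.
REPLY_PREFIX = re.compile(r"^\s*((re|aw):\s*)+", re.IGNORECASE)
TRACKING_TAG = re.compile(r"\s*\[IMS\d+\]\s*$")


def normalize_subject(subject):
    """Strip reply prefixes and the trailing tracking tag, then lowercase."""
    subject = REPLY_PREFIX.sub("", subject)
    subject = TRACKING_TAG.sub("", subject)
    return subject.strip().lower()


class ThreadIndex:
    """Remembers the first Message-ID seen for each normalized subject."""

    def __init__(self):
        self.first_seen = {}

    def references_for(self, subject, message_id):
        """Return the Message-ID to reference, or None for a new thread."""
        key = normalize_subject(subject)
        if key in self.first_seen:
            return self.first_seen[key]
        self.first_seen[key] = message_id
        return None


index = ThreadIndex()
print(index.references_for(
    "RE: [GeneralService] Word1 Word2 [IMS21250368242002]",
    "<first@example.invalid>"))
print(index.references_for(
    "RE: [GeneralService] Word1 Word2 [IMS21250368262914]",
    "<second@example.invalid>"))
```

The first call prints None (new thread) and the second prints the first
message's ID, which a delivery filter could then write into a References
header.  The index here lives in memory only; a real filter would have to
store it on disk between deliveries.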


David T-G [EMAIL PROTECTED] wrote on Tue, 25 Jan 2000:
> Mutt regexp gurus: Is specifying the A-Z necessary if we don't care about
> the case?  If we do and have to use A-Z, will the re|aw exp match upper
> case versions?

Me either, but I think it follows the usual regexp rules, since the
re|aw part works (most of my replies begin with Re:, not re:, yet that
works for me) -- no reason why an [a-z] match would work differently.

(I mean, I haven't looked at the code, but the easiest way to do a case
insensitive match is to lowercase the whole string first before
matching, instead of trying to munge the regexp parser somehow,
so I'd be willing to bet that's how it's done).


Regards,
Mikko
-- 
// Mikko Hänninen, aka. Wizzu  //  [EMAIL PROTECTED]  //  http://www.iki.fi/wiz/
// The Corrs list maintainer  //   net.freak  //   DALnet IRC operator /
// Interests: roleplaying, Linux, the Net, fantasy & scifi, the Corrs /
Clouds are high flying fogs.



Re: regexp help?

2000-01-26 Thread David DeSimone

David T-G [EMAIL PROTECTED] wrote:

> Mutt regexp gurus:  Is specifying the A-Z necessary if we don't care
> about the case?  If we do and have to use A-Z, will the re|aw exp
> match upper case versions?

Specifying any capital letter in your regexp will cause Mutt to match
the regexp case-sensitively.  If there are no capital letters in the
regexp, it will be performed case-insensitively.

So a regexp like "^(re|aw): [A-Z]*" will fail to match "Re: ",
because the "A-Z" part causes a case-sensitive match.  If the
regexp was "^(re|aw): [a-z]*", it would match.
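
The rule described above can be sketched in Python as follows.  This is an
illustration only: Python's re module stands in for the regexp library
Mutt uses, the function names are made up for the example, and the sketch
does not treat backslash-escaped characters specially (an assumption; Mutt
itself may).

```python
import re


def smart_case_flags(pattern):
    """Case-insensitive unless the pattern contains an uppercase letter.

    Simplification: escaped characters are not treated specially here.
    """
    if any(ch.isupper() for ch in pattern):
        return 0
    return re.IGNORECASE


def matches(pattern, subject):
    """True if the pattern matches under the smart-case rule."""
    return re.search(pattern, subject, smart_case_flags(pattern)) is not None


print(matches("^(re|aw): [A-Z]*", "Re: Hello"))  # False
print(matches("^(re|aw): [a-z]*", "Re: Hello"))  # True
```

The first pattern contains "A-Z", so it is matched case-sensitively and the
lowercase "re" no longer matches "Re".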

-- 
David DeSimone   | "The doctrine of human equality reposes on this:
[EMAIL PROTECTED]   |  that there is no man really clever who has not
Hewlett-Packard  |  found that he is stupid." -- Gilbert K. Chesterton
UX WTEC Engineer |PGP: 5B 47 34 9F 3B 9A B0 0D  AB A6 15 F1 BB BE 8C 44



Re: regexp help?

2000-01-26 Thread David DeSimone

Jim B [EMAIL PROTECTED] wrote:

> I believe this can be corrected by a well-written regular expression, but
> I'm not so good at that so I'm wondering if anyone can help.

This won't work with Mutt as it's currently coded, because Mutt assumes
that the reply_regexp matches something at the *beginning* of the
string, and takes anything following that match as the "real subject."

Mutt uses code like this:

if (e->subject)
{
  regmatch_t pmatch[1];

  rfc2047_decode (e->subject, e->subject, mutt_strlen (e->subject) + 1);

  if (regexec (ReplyRegexp.rx, e->subject, 1, pmatch, 0) == 0)
    e->real_subj = e->subject + pmatch[0].rm_eo;
  else
    e->real_subj = e->subject;
}

The pmatch[] array holds the offsets of the matches for the regular
expression and its subexpressions.  pmatch[0], used here, covers the
match of the entire regular expression.  As written, the "real subject"
is taken to be whatever comes after the reply_regexp match.  So...

> Subject: RE: [GeneralService] Word1 Word2 [IMS21250368242002]
>
> set reply_regexp = "^(re|aw):[ \t]*[ \t]\[[a-z0-9]*\]"
>
> but that apparently doesn't work.

Nope, that won't work, because the reply_regexp matches the entire
string, leaving no "leftover" text for the "real subject".
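
To make the "everything after the match" behaviour concrete, here is a
small Python sketch of the same logic.  It is not Mutt code: Python's re
module stands in for regexec(), the match is always case-insensitive for
simplicity, and the name real_subject and both patterns are chosen only
for this example.

```python
import re


def real_subject(reply_regexp, subject):
    """Return whatever follows the end of the reply_regexp match.

    Mirrors `subject + pmatch[0].rm_eo`: if nothing matches, the whole
    subject is the real subject.
    """
    m = re.search(reply_regexp, subject, re.IGNORECASE)
    return subject[m.end():] if m else subject


first_subject = "RE: [GeneralService] Word1 Word2 [IMS21250368242002]"
second_subject = "RE: [GeneralService] Word1 Word2 [IMS21250368262914]"

# A prefix-only pattern leaves the changing tracking number in place.
prefix_only = r"^(re|aw):[ \t]*"
first = real_subject(prefix_only, first_subject)
second = real_subject(prefix_only, second_subject)
print(first)
print(second)
print(first == second)

# A pattern that reaches the trailing tag swallows the whole line.
whole_line = r"^(re|aw):.*\[[a-z0-9]*\]$"
swallowed = real_subject(whole_line, first_subject)
print(repr(swallowed))
```

Either the two real subjects differ because the tracking number is still
there, or the pattern eats the entire subject and nothing is left over.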

By the way, something to watch out for in .muttrc's:

set reply_regexp = "^(re|aw):[ \t]*"

This doesn't do what you think it does; Mutt sees the "\t" as simply
"t", because backslashes are parsed within double-quotes.  So the regexp
comes out as "^(re|aw):[ t]*", and so a subject like "Re: Tuesday" comes
out with a real subject of "uesday".  :)

Maybe there's a way to get middle-matching of subjects; I'll have to
play with it.

-- 
David DeSimone   | "The doctrine of human equality reposes on this:
[EMAIL PROTECTED]   |  that there is no man really clever who has not
Hewlett-Packard  |  found that he is stupid." -- Gilbert K. Chesterton
UX WTEC Engineer |PGP: 5B 47 34 9F 3B 9A B0 0D  AB A6 15 F1 BB BE 8C 44



Regexp dangers (was Re: regexp help?)

2000-01-26 Thread Marius Gedminas

On Wed, Jan 26, 2000 at 02:13:23PM -0600, David DeSimone wrote:
> By the way, something to watch out for in .muttrc's:
>
> set reply_regexp = "^(re|aw):[ \t]*"
>
> This doesn't do what you think it does; Mutt sees the "\t" as simply
> "t", because backslashes are parsed within double-quotes.  So the regexp
> comes out as "^(re|aw):[ t]*", and so a subject like "Re: Tuesday" comes
> out with a real subject of "uesday".  :)

I always wondered how many backslashes I need in such places ;).

However my ~/.muttrc contains this:

  set reply_regexp="^(re|aw)(\\[[0-9]+\\])?:[ \t]*" # TheBat! uses Re[%d]:

and when I query Mutt (:set ?reply_regexp), I see

  reply_regexp="^(re|aw)(\[[0-9]+\])?:[ .]*"

I suppose that `.' in brackets is the actual tab character, so your warning
is not necessary in this case.
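
The query output above is consistent with a single backslash-processing
pass over the double-quoted string.  A rough Python model of such a pass
follows.  The rules in it are an assumption inferred from that output, not
taken from Mutt's source, and unescape_double_quoted is a made-up name.

```python
def unescape_double_quoted(text):
    """One pass of backslash processing for the inside of "...".

    Assumed rules: backslash + t, n or r becomes the control character;
    a backslash before any other character yields that character.
    """
    controls = {"t": "\t", "n": "\n", "r": "\r"}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(controls.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The text between the double quotes, exactly as typed in the muttrc line.
as_typed = r"^(re|aw)(\\[[0-9]+\\])?:[ \t]*"
as_stored = unescape_double_quoted(as_typed)
print(as_stored.replace("\t", "<TAB>"))
```

Under these assumed rules each "\\" collapses to a single backslash and
"\t" becomes a real tab, which matches what the query shows.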

OTOH things like

  send-hook '~C "\\foo[^ ]*@"'   ...

or

  color index red default '~h "X-Note: Message-ID seen before recently\."'

are a different issue.  I'm not quite sure how many `\'s there should
be before that `.' and I'm a little too lazy to experiment to find
out. :-)  (I'm sure a single \ there is not enough, but even an unescaped
dot would work for me there, and I don't expect many false matches.)

Marius Gedminas
-- 
F U cn rd dis U mst uz Unix.



Re: regexp help?

2000-01-25 Thread David T-G

Jim --

...and then Jim B said...
% Hi, I just started using Mutt last night, coming from the Pine world.  I

Welcome to the real world :-)


...
% 
% However, the IMC machine writes a new tracking number into the Subject

Ouch!

...
% 
% For example, one message may look like this:
% 
% Subject: RE: [GeneralService] Word1 Word2 [IMS21250368242002]
% 
% I will reply to it, then the customer will reply again, and the Subject
% will change to this:
% 
% Subject: RE: [GeneralService] Word1 Word2 [IMS21250368262914]
% 
% The only thing that changes is the tracking number.
% 
% I've tried the following:
% 
% set reply_regexp = "^(re|aw):[ \t]*[ \t]\[A-Z*0-9*]"
% 
% but that apparently doesn't work.

I suppose it's too much to ask your server software maintainer to quit
changing the subject line, eh? :-)

Let's see, here...  Your regexp says to match a "re" or "aw" at the
front of the line, a colon, any spaces or tabs, a single space or tab,
an opening bracket, any cap letters, any numbers, and a closing bracket.

Looks like you don't have a match for your actual topic or words parts.

Could you be looking for something more like

  "^(re|aw):[ \t]\\[[A-Za-z]*\\][ \t]([A-Za-z]*[ \t]*)*\\[[A-Z]*[0-9]*\\]"

instead, which (we hope) matches a "re" or "aw" at the front, a colon, 
a space or tab, an opening bracket, any number of letters, a closing
bracket, a space or tab, any number of (any number of letters and any
number of spaces or tabs), an opening bracket, any number of cap letters,
any number of digits, and a closing bracket (whew!)?  That, at least,
should better match your described subject line...

On the other hand, if you simply wanted to match on anything between your
"re|aw" and the actual tracking number, you probably wanted ".*" instead
of "*" between the two whitespace classes -- but I don't think that will
work for your threading.

Mutt regexp gurus: Is specifying the A-Z necessary if we don't care about
the case?  If we do and have to use A-Z, will the re|aw exp match upper
case versions?


% 
% Can anyone supply a regexp that should handle this?  It would help SO
% much.
% 
% Thanks!!


:-D
-- 
David T-G   * It's easier to fight for one's principles
(play) [EMAIL PROTECTED]  * than to live up to them. -- fortune cookie
(work) [EMAIL PROTECTED]
http://www.bigfoot.com/~davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
The "new millennium" starts at the beginning of 2001.  There was no year 0.
Note: If bigfoot.com gives you fits, try sector13.org in its place. *sigh*




Re: regexp help?

2000-01-25 Thread Jim B

Hi, thanks for replying.  :)


On Tue, 25 Jan 2000, David T-G wrote:

> Looks like you don't have a match for your actual topic or words parts.

I've been looking at how Mutt uses these regexps, and it seems I may need
something more like:

ignore this match this ignore this

From what I can tell, the expression actually is more of the "ignore" part
rather than the "match" part... like a "negative" match.  Am I right on
this?


> On the other hand, if you simply wanted to match on anything between your
> "re|aw" and the actual tracking number, you probably wanted ".*" instead
> of "*" between the two white classes -- but I don't think that that will
> work for your threading.

I think this would be more of what I need, come to think of it.  I want to
match everything between the re|aw and the tracking number.  Can I just
match the middle part and ignore the re|aw and bracketed sections?


> Mutt regexp gurus: Is specifying the A-Z necessary if we don't care about
> the case?  If we do and have to use A-Z, will the re|aw exp match upper
> case versions?

Well, I'm no guru :) but the re|aw does match upper and lower case for
some reason.  I got the whole beginning of my expression from the default
produced by the www-based muttrc generator, and it seems to work as
advertised.