Re: regexp help?
Jim B [EMAIL PROTECTED] wrote on Wed, 26 Jan 2000:
> I've been looking at how Mutt uses these regexps, and it seems I may
> need something more like:
>
>     ignore this    match this    ignore this
>
> From what I can tell, the expression actually is more of the "ignore"
> part rather than the "match" part... like a "negative" match. Am I
> right on this?

I think you are. And that unfortunately means that, as far as I can
see, you can't have the kind of reply_regexp that you'd like, because
there's no way to specify "skip these chars": either they'll be
included no matter what you do, or the part at the end won't be.

It would be possible to solve this case with just a reply_regexp if the
tracking number got added somewhere at the beginning of the subject,
not at the end. As it is, you either need to convince the tracking
software to start adding References headers, or to use a different
method for munging the subjects, or to figure out some way to fix the
subjects and/or References headers (procmail? a custom Perl script that
maintains a database of seen subject lines and adds a References
header?).

David T-G [EMAIL PROTECTED] wrote on Tue, 25 Jan 2000:
> Mutt regexp gurus: Is specifying the A-Z necessary if we don't care
> about the case? If we do and have to use A-Z, will the re|aw exp
> match upper case versions?

I don't know either, but I think it follows the usual regexp rules,
since the re|aw part works (most of my replies begin with Re:, not re:,
yet that works for me) -- no reason why an [a-z] match would work
differently. (I mean, I haven't looked at the code, but the easiest way
to do a case-insensitive match is to lowercase the whole string before
matching, instead of trying to modify the regexp parser somehow, so I'd
be willing to bet that's how it's done.)

Regards,
Mikko
-- 
// Mikko Hänninen, aka. Wizzu // [EMAIL PROTECTED] // http://www.iki.fi/wiz/
// The Corrs list maintainer // net.freak // DALnet IRC operator /
// Interests: roleplaying, Linux, the Net, fantasy scifi, the Corrs /
Clouds are high flying fogs.
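One way to picture the "fix the subjects" idea is a small helper that reduces each subject to a key that stays the same across replies. This is only a sketch: the names `thread_key`, `REPLY_PREFIX` and `TRACKING_TAG` are made up for illustration, and the shape of the tag (`[IMS` followed by digits, at the very end of the subject) is a guess based on the two sample subjects quoted in this thread.

```python
import re

# Assumption: reply prefixes look like "Re:" or "AW:", possibly repeated.
REPLY_PREFIX = re.compile(r"^((re|aw):[ \t]*)+", re.IGNORECASE)

# Assumption: the tracker appends a tag such as "[IMS21250368242002]"
# at the very end of the subject line.
TRACKING_TAG = re.compile(r"[ \t]*\[IMS[0-9]+\][ \t]*$")


def thread_key(subject: str) -> str:
    """Return the subject with reply prefixes and the trailing tag removed."""
    without_prefix = REPLY_PREFIX.sub("", subject.strip())
    return TRACKING_TAG.sub("", without_prefix)
```

A filter script could store the Message-ID seen for each key and add a References header when the same key shows up again; how that is wired into procmail is left open here.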
Re: regexp help?
David T-G [EMAIL PROTECTED] wrote:
> Mutt regexp gurus: Is specifying the A-Z necessary if we don't care
> about the case? If we do and have to use A-Z, will the re|aw exp
> match upper case versions?

Specifying any capital letter in your regexp will cause Mutt to match
the regexp case-sensitively. If there are no capital letters in the
regexp, the match will be performed case-insensitively.

So a regexp like "^(re|aw): [A-Z]*" will fail to match "Re: ", because
the "A-Z" part causes a case-sensitive match. If the regexp were
"^(re|aw): [a-z]*", it would match.

-- 
David DeSimone    | "The doctrine of human equality reposes on this:
[EMAIL PROTECTED] |  that there is no man really clever who has not
Hewlett-Packard   |  found that he is stupid." -- Gilbert K. Chesterton
UX WTEC Engineer  |  PGP: 5B 47 34 9F 3B 9A B0 0D AB A6 15 F1 BB BE 8C 44
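The rule described above can be tried out with a short Python sketch. It is not Mutt's code: `compile_mutt_style` is a hypothetical helper, Python's `re` module stands in for the regexp library Mutt uses, and the sketch assumes that every uppercase character in the pattern counts, including the ones inside a range like [A-Z].

```python
import re


def compile_mutt_style(pattern: str) -> re.Pattern:
    # Rule as described above: a capital letter anywhere in the pattern
    # makes the match case-sensitive; otherwise case is ignored.
    # Assumption: no exceptions for escaped or bracketed characters.
    has_upper = any(ch.isupper() for ch in pattern)
    flags = 0 if has_upper else re.IGNORECASE
    return re.compile(pattern, flags)
```

With this helper, "^(re|aw): [A-Z]*" does not match "Re: ", while "^(re|aw): [a-z]*" does, which is the behaviour the message describes.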
Re: regexp help?
Jim B [EMAIL PROTECTED] wrote:
> I believe this can be corrected by a well-written regular expression,
> but I'm not so good at that so I'm wondering if anyone can help.

This won't work with Mutt as it's currently coded, because Mutt assumes
that the reply_regexp matches something at the *beginning* of the
string, and takes anything following that match as the "real subject."

Mutt uses code like this:

    if (e->subject)
    {
      regmatch_t pmatch[1];

      rfc2047_decode (e->subject, e->subject, mutt_strlen (e->subject) + 1);

      if (regexec (ReplyRegexp.rx, e->subject, 1, pmatch, 0) == 0)
        e->real_subj = e->subject + pmatch[0].rm_eo;
      else
        e->real_subj = e->subject;
    }

The pmatch[] array contains a list of matches to subexpressions within
the regular expression. pmatch[0], used here, matches the entire
regular expression. As written, the "real subject" is taken to be
whatever comes after the reply_regexp. So...

> Subject: RE: [GeneralService] Word1 Word2 [IMS21250368242002]
>
> set reply_regexp = "^(re|aw):[ \t]*[ \t]\[[a-z0-9]*\]"
>
> but that apparently doesn't work.

Nope, that won't work, because the reply_regexp matches the entire
string, leaving no "leftover" text for the "real subject".

By the way, something to watch out for in .muttrc's:

    set reply_regexp = "^(re|aw):[ \t]*"

This doesn't do what you think it does; Mutt sees the "\t" as simply
"t", because backslashes are parsed within double-quotes. So the
regexp comes out as "^(re|aw):[ t]*", and so a subject like
"Re: Tuesday" comes out with a real subject of "uesday". :)

Maybe there's a way to get middle-matching of subjects; I'll have to
play with it.

-- 
David DeSimone    | "The doctrine of human equality reposes on this:
[EMAIL PROTECTED] |  that there is no man really clever who has not
Hewlett-Packard   |  found that he is stupid." -- Gilbert K. Chesterton
UX WTEC Engineer  |  PGP: 5B 47 34 9F 3B 9A B0 0D AB A6 15 F1 BB BE 8C 44
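The same logic is easy to play with in Python. The following is a rough sketch of the quoted C fragment, not a port of Mutt: `real_subject` is a hypothetical name, `re.search` stands in for regexec(), `match.end()` plays the role of pmatch[0].rm_eo, and the match is assumed to be case-insensitive because the patterns here contain no capital letters.

```python
import re


def real_subject(subject: str, reply_regexp: str) -> str:
    # On a match, keep everything after the end of the whole match;
    # otherwise the complete subject is the "real subject".
    match = re.search(reply_regexp, subject, re.IGNORECASE)
    return subject[match.end():] if match else subject


PREFIX_ONLY = r"^(re|aw):[ \t]*"
first = real_subject("RE: [GeneralService] Word1 Word2 [IMS21250368242002]", PREFIX_ONLY)
second = real_subject("RE: [GeneralService] Word1 Word2 [IMS21250368262914]", PREFIX_ONLY)
```

Here `first` and `second` still end in different tracking numbers, because only text at the front of the subject can be cut away. Passing "^(re|aw):[ t]*" with the subject "Re: Tuesday" reproduces the "uesday" effect mentioned above.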
Regexp dangers (was Re: regexp help?)
On Wed, Jan 26, 2000 at 02:13:23PM -0600, David DeSimone wrote:
> By the way, something to watch out for in .muttrc's:
>
>     set reply_regexp = "^(re|aw):[ \t]*"
>
> This doesn't do what you think it does; Mutt sees the "\t" as simply
> "t", because backslashes are parsed within double-quotes. So the
> regexp comes out as "^(re|aw):[ t]*", and so a subject like
> "Re: Tuesday" comes out with a real subject of "uesday". :)

I always wondered how many backslashes I need in such places ;).
However, my ~/.muttrc contains this:

    set reply_regexp="^(re|aw)(\\[[0-9]+\\])?:[ \t]*" # TheBat! uses Re[%d]:

and when I query Mutt (:set ?reply_regexp), I see

    reply_regexp="^(re|aw)(\[[0-9]+\])?:[ .]*"

I suppose that `.' in brackets is the actual tab character, so your
warning is not necessary in this case.

OTOH things like

    send-hook '~C "\\foo[^ ]*@"' ...

or

    color index red default '~h "X-Note: Message-ID seen before recently\."'

are a different issue. I'm not quite sure how many `\'s there should
be before that `.' and I'm a little too lazy to experiment to find it
out. :-) (I'm sure a single \ there is not enough, but even an
unescaped dot would work for me there, and I don't expect many false
matches.)

Marius Gedminas
-- 
F U cn rd dis U mst uz Unix.
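The observation above suggests a simple model of what happens to backslashes inside double quotes. The sketch below only encodes that model and is not taken from Mutt's parser: `unescape_double_quoted` is a hypothetical name, and it assumes that "\t" becomes a tab while a backslash before any other character just yields that character.

```python
import re


def unescape_double_quoted(text: str) -> str:
    # Assumption: "\t" -> tab; backslash + any other char -> that char,
    # so "\\[" in the config file ends up as "\[" in the regexp.
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\t" if nxt == "t" else nxt)
        else:
            out.append(ch)
    return "".join(out)


# The reply_regexp from the ~/.muttrc line quoted above, as typed.
REPLY_REGEXP = unescape_double_quoted(r"^(re|aw)(\\[[0-9]+\\])?:[ \t]*")
```

Under these assumptions `REPLY_REGEXP` contains `\[`, `\]` and a real tab, matching the `:set ?reply_regexp` output, and a single backslash before a dot would be consumed before the regexp is compiled, which fits the suspicion that one `\` is not enough there.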
Re: regexp help?
Jim --

...and then Jim B said...
% Hi, I just started using Mutt last night, coming from the Pine world. I

Welcome to the real world :-)

...
%
% However, the IMC machine writes a new tracking number into the Subject

Ouch!

...
%
% For example, one message may look like this:
%
% Subject: RE: [GeneralService] Word1 Word2 [IMS21250368242002]
%
% I will reply to it, then the customer will reply again, and the Subject
% will change to this:
%
% Subject: RE: [GeneralService] Word1 Word2 [IMS21250368262914]
%
% The only thing that changes is the tracking number.
%
% I've tried the following:
%
% set reply_regexp = "^(re|aw):[ \t]*[ \t]\[A-Z*0-9*]"
%
% but that apparently doesn't work.

I suppose it's too much to ask your server software maintainer to quit
changing the subject line, eh? :-)

Let's see, here... Your regexp says to match a "re" or "aw" at the
front of the line, a colon, any spaces or tabs, a single space or tab,
an opening bracket, any cap letters, any numbers, and a closing
bracket.

Looks like you don't have a match for your actual topic or words parts.
Could you be looking for something more like

    "^(re|aw):[ \t]\[A-Za-z*][ \t]([A-Za-z]*[ \t]*)*\[A-Z*0-9*]"

instead, which (we hope) matches a "re" or "aw" at the front, a colon,
a space or tab, an opening bracket, any number of letters, a closing
bracket, a space or tab, any number of (any number of letters and any
number of spaces or tabs), an opening bracket, any number of cap
letters, any number of digits, and a closing bracket (whew!)? That, at
least, should better match your described subject line...

On the other hand, if you simply wanted to match on anything between
your "re|aw" and the actual tracking number, you probably wanted ".*"
instead of "*" between the two white classes -- but I don't think that
that will work for your threading.

Mutt regexp gurus: Is specifying the A-Z necessary if we don't care
about the case? If we do and have to use A-Z, will the re|aw exp match
upper case versions?

%
% Can anyone supply a regexp that should handle this? It would help SO
% much.
%
% Thanks!!

:-D
-- 
David T-G                    * It's easier to fight for one's principles
(play) [EMAIL PROTECTED]     * than to live up to them. -- fortune cookie
(work) [EMAIL PROTECTED]
http://www.bigfoot.com/~davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
The "new millennium" starts at the beginning of 2001. There was no year 0.
Note: If bigfoot.com gives you fits, try sector13.org in its place. *sigh*
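The suggested pattern and its prose description can be checked against the sample subject with a few lines of Python. This is a sketch under assumptions: Python's `re` stands in for Mutt's regexp library, both patterns are given as the regexp engine would see them (after any muttrc quoting), and `REWRITTEN` is a hypothetical variant in which the literal brackets are escaped and the bracket expressions are closed. It also allows digits inside words, since "Word1" contains one.

```python
import re

SUBJECT = "RE: [GeneralService] Word1 Word2 [IMS21250368242002]"

# As posted: "\[" is a literal "[", so "A-Za-z*]" is read as plain
# characters rather than as a bracket expression.
AS_POSTED = r"^(re|aw):[ \t]\[A-Za-z*][ \t]([A-Za-z]*[ \t]*)*\[A-Z*0-9*]"

# Hypothetical rewrite: escaped literal brackets around real bracket
# expressions. All lowercase, so it relies on a case-insensitive match.
REWRITTEN = r"^(re|aw):[ \t]*\[[a-z]*\][ \t]*([a-z0-9]+[ \t]+)*\[[a-z0-9]*\]"

posted_match = re.search(AS_POSTED, SUBJECT, re.IGNORECASE)
rewritten_match = re.search(REWRITTEN, SUBJECT, re.IGNORECASE)
```

In this sketch `posted_match` is None, while `rewritten_match` covers the subject from the first character to the last. A reply_regexp that swallows the whole subject leaves nothing behind to thread on, which is the limitation raised elsewhere in the thread.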
Re: regexp help?
Hi, thanks for replying. :)

On Tue, 25 Jan 2000, David T-G wrote:
> Looks like you don't have a match for your actual topic or words
> parts.

I've been looking at how Mutt uses these regexps, and it seems I may
need something more like:

    ignore this    match this    ignore this

From what I can tell, the expression actually is more of the "ignore"
part rather than the "match" part... like a "negative" match. Am I
right on this?

> On the other hand, if you simply wanted to match on anything between
> your "re|aw" and the actual tracking number, you probably wanted ".*"
> instead of "*" between the two white classes -- but I don't think
> that that will work for your threading.

I think this would be more of what I need, come to think of it. I want
to match everything between the re|aw and the tracking number. Can I
just match the middle part and ignore the re|aw and bracketed sections?

> Mutt regexp gurus: Is specifying the A-Z necessary if we don't care
> about the case? If we do and have to use A-Z, will the re|aw exp
> match upper case versions?

Well, I'm no guru, :) but the re|aw does match upper and lower case for
some reason. I got the whole beginning of my expression from the
default produced by the www-based muttrc generator and it seems to work
as advertised.