Hey guys, GMime author here...

On 02/07/2011 11:28 PM, Duncan wrote:
> Duncan posted on Mon, 07 Feb 2011 23:06:24 +0000 as excerpted:
>
>> Meanwhile, something else URL related that /used/ to annoy me, tho I've
>> not noticed it recently so maybe it's fixed (?), is unspaced commas or
>> the like, terminating a URL. Here's testing it:
>>
>> http://example.com, Does the URL include the comma?
>>
>> http://example.com. What about the terminating dot?
>>
>> http://example.com? Question mark?
>
> Hmm... pan got those three right, now (as of... see the headers for git
> commit, it's been a bit since I rebuilt).
>
>> "http://example.com" Double-quote?
>>
>> 'http://example.com' Single-quote?
>>
>> http://example.com: Colon?
>
> ... and those three wrong. Pan didn't include the leading quote on either
> of those, but parsed the trailing punctuation as part of the URL on all
> three.
>
>> Those of us using pan to follow this list, thru gmane or whatever,
>> should get pan's behavior with the above tested directly. I guess I'll
>> post a followup with the results for anyone using a standard mail
>> client.

I wrote up a quick test to see if it might be a bug in GMime 2.4's gmime-filter-html.c implementation, and it appears to get all of the above URLs correct[1]. It also didn't get confused by <http://example.com ... and then no >, so I'm guessing that Pan doesn't use GMime for this feature(?) and that maybe it has some custom regexes or something.

I think a number of GNOME apps (including gnome-terminal) use regexes that you may be able to steal (assuming they are not the same ones already used by Pan). Another option is to use GMime's url scanner instead. You can see example usage in gmime/gmime-filter-html.c (you'll need something like the 'patterns' array at the top, although you could probably drop the mask bit unless you want to keep a similar url vs addrspec feature). The overall API is similar to regex, so it could almost be used as a drop-in replacement.

I mention this because it might be easier to try this out than to debug/fix Pan's current url regexes (I say this as a non-perl programmer who is very much intimidated by regex syntax ;-)

As an added bonus, my url-scanner trie graph approach is ~13x faster than regex for this particular purpose (or at least *was* back when I first wrote it ~6-7 years ago).

Hope that helps...

Jeff

1. There is, however, a difference between what GMime matches as the url string and what Thunderbird matches in the double-quotes example (Thunderbird includes both the leading and closing quotes, GMime only matches what is between them).

_______________________________________________
Pan-devel mailing list
Pan-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-devel
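
For anyone who wants to replay the test cases from this thread outside of Pan, here is a minimal Python sketch of the boundary handling they exercise. It is not GMime's url scanner and not Pan's code: it uses a plain regex rather than a trie, and the name `find_urls`, the `CANDIDATE` pattern, and the `TRAILING` character set are all assumptions made up for this illustration.

```python
import re

# Assumption: punctuation that should not count as part of a URL when it is
# the last thing in the candidate match. Chosen to fit the thread's examples.
TRAILING = ".,:;?!)]}"

# Assumption: a candidate starts at an http/https scheme and runs until
# whitespace, an angle bracket, or a quote character.
CANDIDATE = re.compile(r"https?://[^\s<>\"']+")


def find_urls(text):
    """Return the URLs found in text, with trailing punctuation trimmed."""
    urls = []
    for match in CANDIDATE.finditer(text):
        # Drop sentence punctuation that directly follows the URL.
        url = match.group(0).rstrip(TRAILING)
        # Skip a bare scheme that has nothing left after "://".
        if url.split("://", 1)[1]:
            urls.append(url)
    return urls
```

Calling `find_urls('"http://example.com" Double-quote?')` or `find_urls("http://example.com: Colon?")` returns `["http://example.com"]`, which is the result the thread expects for every one of the test lines. The trade-off in this sketch is that a URL which legitimately ends in one of the `TRAILING` characters, or which contains a quote, gets cut short.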