https://bugzilla.wikimedia.org/show_bug.cgi?id=42513
--- Comment #9 from Krinkle <[email protected]> ---
(In reply to comment #8)
> (In reply to comment #6)
> > On further investigation and following links from that gist, it looks like
> > the regexes in mw.Uri aren't so half-baked.
> >
> > They actually come from here:
> > http://blog.stevenlevithan.com/archives/parseuri
>
> If they weren't half-baked, mw.Uri wouldn't crash on '@' in the URL.
>
> These regexes are already awful (177 characters? Seriously?), and are bound
> to get worse as we discover more edge-cases.
>
> What about password with a '@' in it? (Yeah, this is allowed as far as I
> know,

No, those need to be URL-escaped, afaik. At least in Chrome and in the Node.js
modules I have used, the password (especially the @ symbol) needs to be
URL-escaped or it won't work.

> While on second thought the <a>-abuse I linked isn't a good idea, we really
> need a serious parsing library.

I'm open to suggestions, though it will need to be flexible and replaceable
in-place to allow a smooth transition (same API and unit tests, different
internal implementation).

The module is generic enough to easily allow a different parser. But there is
some ambiguity about the fields to be extracted and their meaning[1], so even a
"perfect" library may need modification to work for us and our interpretation
of those terms.

[1] http://tantek.com/2011/238/b1/many-ways-slice-url-name-pieces
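
To illustrate the escaping point above, here is a minimal sketch. It assumes the standard URL class found in modern browsers and Node.js, not mw.Uri itself, and the user name, password, host and path are made up for the example. It percent-encodes a password containing '@' before putting it into the URL, then decodes it again after parsing.

```javascript
// Hypothetical credentials and host, used only for illustration.
const user = 'alice';
const password = 'p@ss:word';
const host = 'example.org';

// Percent-encode the userinfo parts so that '@' and ':' inside them
// cannot be mistaken for the userinfo/host delimiters.
const url = 'https://' + encodeURIComponent(user) + ':' +
    encodeURIComponent(password) + '@' + host + '/wiki/Main_Page';

// Parse with the standard URL class (an assumption; mw.Uri has its own API).
const parsed = new URL(url);

// The parser keeps the password percent-encoded; decode it to get the original back.
const decodedPassword = decodeURIComponent(parsed.password);

console.log(url);
console.log(parsed.hostname);
console.log(decodedPassword);
```

With the '@' escaped as %40, the only literal '@' left in the string is the one separating the credentials from the host, so a parser does not have to guess which one is the delimiter.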
