https://bugzilla.wikimedia.org/show_bug.cgi?id=42513

--- Comment #9 from Krinkle <[email protected]> ---
(In reply to comment #8)
> (In reply to comment #6)
> > On further investigation and following links from that gist, it looks like
> > the regexes in mw.Uri aren't so half-baked.
> > 
> > They actually come from here:
> > http://blog.stevenlevithan.com/archives/parseuri
> 
> If they weren't half-baked, mw.Uri wouldn't crash on '@' in the URL. 
> 
> These regexes are already awful (177 characters? Seriously?), and are bound
> to get worse as we discover more edge-cases.
> 
> What about password with a '@' in it? (Yeah, this is allowed as far as I
> know,

No, those need to be URL-escaped afaik. At least in Chrome and in the nodejs
modules I used, the password (especially the @ symbol) needs to be URL-escaped
or it won't work.
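
A rough sketch of what escaping means here. The user name, password and host
are made up for illustration, and this uses the standard encodeURIComponent
function and the platform's URL class, not mw.Uri itself:

```javascript
// Hypothetical credentials, for illustration only.
const user = 'alice';
const rawPassword = 'p@ss';

// Percent-encode the password so its '@' cannot be mistaken for the
// delimiter between the userinfo part and the host.
const escapedPassword = encodeURIComponent(rawPassword);

const href = 'http://' + user + ':' + escapedPassword + '@example.org/';

// The platform parser keeps the password in its escaped form;
// decode it to get the original string back.
const parsed = new URL(href);
const decodedPassword = decodeURIComponent(parsed.password);
```

With the '@' written as %40, the only literal '@' left in the string is the
one that separates the credentials from the host.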

> While on second thought the <a>-abuse I linked isn't a good idea, we really
> need a serious parsing library.

I'm open to suggestions, though it'll need to be flexible and be replaceable
in-place to allow a smooth transition (same API and unit tests, different
internal implementation).

The module is generic enough to easily allow a different parser.
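
To illustrate the kind of in-place swap meant here, a minimal sketch follows.
All names in it (createUriClass, regexParser, urlParser) are invented for the
example and are not mw.Uri's actual internals; the regex is deliberately tiny
and nowhere near a complete URI grammar:

```javascript
// Hypothetical sketch: build a Uri class around whichever parser is given.
// The public surface (fields plus toString) stays the same either way.
function createUriClass(parse) {
  return class Uri {
    constructor(str) {
      Object.assign(this, parse(str));
    }
    toString() {
      return this.protocol + '://' + this.host + this.path;
    }
  };
}

// Parser 1: a small regex, standing in for the current approach.
function regexParser(str) {
  const m = /^([a-z][a-z0-9+.-]*):\/\/([^\/?#]*)([^?#]*)/i.exec(str);
  if (!m) {
    throw new Error('Bad URI: ' + str);
  }
  return { protocol: m[1], host: m[2], path: m[3] };
}

// Parser 2: delegate to the platform's URL class instead.
function urlParser(str) {
  const u = new URL(str);
  return {
    protocol: u.protocol.replace(/:$/, ''),
    host: u.host,
    path: u.pathname
  };
}

const RegexUri = createUriClass(regexParser);
const NativeUri = createUriClass(urlParser);
const sampleUri = 'http://example.org/wiki/Main_Page';
const viaRegex = new RegexUri(sampleUri);
const viaNative = new NativeUri(sampleUri);
```

Both classes expose the same fields and the same toString(), so one set of
unit tests could in principle be run against either implementation.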

But there is some ambiguity on the fields to be extracted and their meaning[1],
so even a "perfect" library may need modification to work for us and our
interpretation of those terms.

[1] http://tantek.com/2011/238/b1/many-ways-slice-url-name-pieces
