https://bugzilla.wikimedia.org/show_bug.cgi?id=41151

--- Comment #2 from Gabriel Wicke 2012-10-19 16:28:24 UTC ---
[09:19] <liangent> $linkTrail = '/^([a-z]+)(.*)$/sD'; is in MessagesEn.php
[09:20] <liangent> this shouldn't include CJK characters, right?
[09:20] <gwicke> I would think so
[09:20] <liangent> but parsoid includes CJK chars in linktrail..
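To show what the English pattern from MessagesEn.php accepts, here is a minimal Python sketch of `/^([a-z]+)(.*)$/sD` applied to the text following a link. The helper name `split_trail` is hypothetical. The PCRE `s` modifier maps to `re.DOTALL`, and `D` (dollar matches only at the very end) maps to `\Z`. Only ASCII lowercase letters can form the trail, so CJK text directly after a link is left alone.

```python
import re

# English link trail from MessagesEn.php: /^([a-z]+)(.*)$/sD
# PCRE 's' -> re.DOTALL; PCRE 'D' (dollar only at the very end) -> \Z.
EN_LINKTRAIL = re.compile(r"^([a-z]+)(.*)\Z", re.DOTALL)


def split_trail(text_after_link: str) -> tuple[str, str]:
    """Hypothetical helper: return (trail, rest) for text following a [[link]]."""
    m = EN_LINKTRAIL.match(text_after_link)
    if m is None:
        # No lowercase ASCII letters at the start: there is no trail.
        return "", text_after_link
    return m.group(1), m.group(2)


print(split_trail("s and more"))  # ('s', ' and more')
print(split_trail("中文"))          # ('', '中文')
```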
[09:21] <gwicke> interesting- I guess we approximate the regexp to something more liberal right now
[09:22] <gwicke> there is no i18n support yet, so we don't use the localized regexps
[09:23] <gwicke> we currently have tail:( ![A-Z \t(),.:\n\r-] tc:text_char { return tc } )*
[09:24] <gwicke> I think the idea was to be very liberal about tails in the tokenizer, and to convert/validate based on language in token stream transforms
[09:24] <gwicke> invalid tails can then be converted back to a text token
[09:25] <gwicke> the A-Z might be a bit fishy in that context though..
