https://bugzilla.wikimedia.org/show_bug.cgi?id=41151
--- Comment #2 from Gabriel Wicke <[email protected]> 2012-10-19 16:28:24 UTC ---
[09:19] <liangent> $linkTrail = '/^([a-z]+)(.*)$/sD'; is in MessagesEn.php
[09:20] <liangent> this shouldn't include CJK characters, right?
[09:20] <gwicke> I would think so
[09:20] <liangent> but parsoid includes CJK chars in linktrail..
[09:21] <gwicke> interesting- I guess we approximate the regexp to something more liberal right now
[09:22] <gwicke> there is no i18n support yet, so we don't use the localized regexps
[09:23] <gwicke> we currently have tail:( ![A-Z \t(),.:\n\r-] tc:text_char { return tc } )*
[09:24] <gwicke> I think the idea was to be very liberal about tails in the tokenizer, and to convert/validate based on language in token stream transforms
[09:24] <gwicke> invalid tails can then be converted back to a text token
[09:25] <gwicke> the A-Z might be a bit fishy in that context though..
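For illustration, the two-stage idea from the log (liberal tail in the tokenizer, then validation against the language's regexp, with the invalid part handed back as text) could be sketched roughly as below. This is Python rather than Parsoid's own code, and it is only a sketch under assumptions: the function names split_liberal and validate_tail are hypothetical, the liberal pattern is a rough stand-in for the quoted PEG rule (text_char may exclude more characters than shown here), and only the English $linkTrail pattern is taken from the log.

```python
import re

# Python rendering of the English $linkTrail pattern quoted in the log.
# PCRE's /s maps to re.DOTALL; /D (dollar matches only at the very end)
# is approximated with \Z.
LINK_TRAIL_EN = re.compile(r"^([a-z]+)(.*)\Z", re.DOTALL)

# Rough stand-in (assumption) for the tokenizer's liberal tail rule:
# consume characters until one of the excluded characters appears.
LIBERAL_TAIL = re.compile(r"^[^A-Z \t(),.:\n\r-]*")


def split_liberal(text):
    """Tokenizer stage: split text after a link into (liberal_tail, rest)."""
    m = LIBERAL_TAIL.match(text)  # always matches, possibly zero-width
    return text[:m.end()], text[m.end():]


def validate_tail(tail, rest):
    """Transform stage: keep only what the language regexp accepts as a
    link trail; the remainder goes back in front of the plain text."""
    m = LINK_TRAIL_EN.match(tail)
    if m is None:
        # No valid trail at all: the whole tail becomes text again.
        return "", tail + rest
    return m.group(1), m.group(2) + rest


if __name__ == "__main__":
    for following in ["s are", "s中文 more", "中文"]:
        tail, rest = split_liberal(following)
        print(repr(following), "->", validate_tail(tail, rest))
```

With the English pattern, a liberal tail such as "s中文" would be trimmed to the trail "s", and the CJK characters would end up in the text that follows the link.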
