User "Brion VIBBER" posted a comment on MediaWiki.r90092.

Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/90092#c20970
Commit summary:

fix for bug29371 . regex wordwrap with UTF8: do not use \b metacharacter. The 
problem is that JavaScript recognizes word boundaries only before/after ASCII 
letters (and numbers/underscore)

Comment:

One more case I'm concerned about is a regression when punctuation appears 
directly before a word:

* "Goat-horned epidendrum": a match on "goat horn" turns up "<nowiki><span 
class="highlight">Goat</span>-<span class="highlight">horn</span>ed 
epidendrum</nowiki>" in the current deployment. I think the regex in this 
revision won't match on the "horn".
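
For illustration, here is a minimal sketch of one way to keep the "horn" match 
without relying on \b. It is not the code from r90092: the helper names 
(escapeRegExp, highlight) and the explicit list of ASCII boundary characters 
are assumptions made for this example. A word start is treated as the beginning 
of the string or any position right after whitespace or common ASCII 
punctuation.

```javascript
// Hypothetical helpers for illustration only; not taken from r90092.

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap each search term in a highlight span when it starts a word.
// A word start is the beginning of the text, or a position preceded by
// whitespace or one of the listed ASCII punctuation characters (assumed set).
function highlight(text, query) {
  const terms = query.split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) {
    return text;
  }
  const boundary = "(^|[\\s\\-.,;:!?\"'()\\[\\]{}/])";
  const pattern = new RegExp(boundary + "(" + terms.join("|") + ")", "gi");
  // $1 is the boundary character (kept outside the span), $2 the matched term.
  return text.replace(pattern, '$1<span class="highlight">$2</span>');
}

// \b treats a non-ASCII letter as a non-word character, so "cole" is found
// inside "école" even though it is not a separate word there.
console.log(/\bcole\b/.test("école")); // true

// The punctuation-aware boundary still matches "horn" after the hyphen...
console.log(highlight("Goat-horned epidendrum", "goat horn"));
// ...and leaves "école" alone, because "é" is not in the boundary set.
console.log(highlight("école", "cole"));
```

The boundary set here covers ASCII punctuation only, so a hyphen before "horn" 
is handled while an em-dash or CJK bracket is not.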

Once that's cleared up I'm probably OK with marking it resolved. For now, 
though, you should only set this one back to 'new' to remind others to check; 
I don't think the system will let you resolve it yourself.


It's also worth testing texts that might be more complicated to match entirely, 
like:
* similar cases with non-ASCII punctuation chars, such as em-dash, German or 
French quotes, or CJK brackets/quotes/etc
* text without spaces (Japanese, Chinese, Thai text?)

Those probably don't work well already, though, so they wouldn't be 
regressions from the previous code. :) So don't worry about adding/fixing 
those yet.
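
If someone does want to probe those cases later, a rough sketch of a test 
helper could look like the following. It assumes an engine with Unicode 
property escapes (\p{P} with the "u" flag, ES2018 or later), and the function 
name startsWord is made up for this example; it is not part of the reviewed 
code.

```javascript
// Hypothetical test helper; assumes ES2018 Unicode property escapes.
// Returns true when `term` occurs at the start of the text or right after
// whitespace or any Unicode punctuation character (general category P).
function startsWord(text, term) {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("(?:^|[\\s\\p{P}])" + escaped, "iu").test(text);
}

// Non-ASCII punctuation directly before the term.
console.log(startsWord("Goat\u2014horned", "horn")); // em-dash: true
console.log(startsWord("«horn»", "horn"));           // French quotes: true
console.log(startsWord("„horn“", "horn"));           // German quotes: true
console.log(startsWord("「horn」", "horn"));          // CJK brackets: true

// A non-ASCII letter before the term is not a boundary.
console.log(startsWord("éhorn", "horn"));            // false

// Text without spaces: no boundary precedes the term, so a
// boundary-based match does not find it.
console.log(startsWord("東京都", "京都"));             // false
```

The last case shows why text without spaces needs a different approach than 
boundary matching.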


_______________________________________________
MediaWiki-CodeReview mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
