User "Brion VIBBER" posted a comment on MediaWiki.r90092.

Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/90092#c20970

Commit summary:
fix for bug 29371. regex wordwrap with UTF8: do not use \b metacharacter. The problem is that JavaScript recognizes word boundaries only before/after ASCII letters (and numbers/underscore)

Comment:

One more case I'm concerned about is regressions on punctuation that appears before a word:
* "Goat-horned epidendrum", match on "goat horn" turns up "<nowiki><span class="highlight">Goat</span>-<span class="highlight">horn</span>ed epidendrum</nowiki>" in current deployment. I think the current regex won't match on the "horn".

If that's cleared up then I'm probably ok marking it resolved -- you should only set this one back to 'new' though, to remind others to check; I don't think the system will let you resolve it yourself.

It's also worth testing texts that might be more complicated to match entirely, like:
* similar cases with non-ASCII punctuation chars, such as em-dash, German or French quotes, or CJK brackets/quotes/etc
* text without spaces (Japanese, Chinese, Thai text?)

Those, however, probably already don't work well, so they're not going to be regressions from the previous code. :) So don't worry about adding/fixing those yet.
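For anyone checking the cases above, here is a rough sketch of one way to test them. It is not the code from r90092: `highlightWords` is a hypothetical helper, and it assumes a JavaScript engine with lookbehind and Unicode property escapes (ES2018), which the deployed code may not be able to rely on. It treats a term as word-initial when it is not preceded by a letter or digit in any script, so a hyphen or other punctuation before the term does not block the match.

```javascript
// Hypothetical helper for illustration only; not the regex used in r90092.
// Wraps every search term that starts a word in a highlight span.
function highlightWords(text, query) {
  // Escape regex syntax characters so terms are matched literally.
  const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const terms = query.trim().split(/\s+/).filter(Boolean).map(escapeRegex);
  if (terms.length === 0) {
    return text;
  }
  // Instead of the ASCII-only \b, require that the term is not preceded by
  // a letter (\p{L}) or number (\p{N}) from any script.
  const pattern = new RegExp(
    '(?<![\\p{L}\\p{N}])(' + terms.join('|') + ')',
    'giu'
  );
  return text.replace(pattern, '<span class="highlight">$1</span>');
}

// \b sees no boundary between a space and a non-ASCII letter:
console.log(/\bété/.test('un été')); // false
console.log(/\bete/.test('un ete')); // true

// The punctuation case from the comment:
console.log(highlightWords('Goat-horned epidendrum', 'goat horn'));
// <span class="highlight">Goat</span>-<span class="highlight">horn</span>ed epidendrum
```

This only covers scripts that separate words with spaces or punctuation; text without spaces (Japanese, Chinese, Thai) would still need a different approach.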
