Edit report at https://bugs.php.net/bug.php?id=63663&edit=1
ID:                 63663
Updated by:         ahar...@php.net
Reported by:        kobrien at kiva dot org
Summary:            str_word_count does not properly handle non-latin characters
-Status:            Open
+Status:            Analyzed
Type:               Bug
Package:            Strings related
Operating System:   Ubuntu 12.04
PHP Version:        5.3.20-dev
Block user comment: N
Private report:     N

New Comment:

This is due to the use of isalpha() internally, which doesn't play well with
multibyte encodings like UTF-8, regardless of the locale setting: isalpha()
classifies one byte at a time, while each non-Latin letter in UTF-8 spans
several bytes.

Fundamentally, this is the same issue as bug #27668. I'm not sure there's a
lot we can do about this in PHP 5.x, but it's worth noting if and when we
revisit Unicode string handling internally.

Previous Comments:
------------------------------------------------------------------------
[2012-12-01 02:29:17] kobrien at kiva dot org

Description:
------------
The function str_word_count() does not work properly on non-Latin characters:
it returns zero. On Latin characters it works as expected and returns the
number of words in the string.

Test script:
---------------
<?php
print str_word_count("PHP function str_word_count does not properly handle non-latin characters") . "\n"; // returns 11
print str_word_count("Хабилло житель Яванского района. Ему 70 лет. Он женат. У него четверо детей. Хабилло филолог. Он более двадцати лет работает по профессии. Также Хабилло занимается виноградарством. У него имеется небольшой виноградник. Этим видом деятельности Хабилло занимается 15 лет."); // returns 0, but should return 37

Expected result:
----------------
The second call should return 37

Actual result:
--------------
The second call returns 0

------------------------------------------------------------------------
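
A possible userland workaround, sketched here only to illustrate the
byte-versus-character issue described in the comment above. It is not a fix
proposed in this report. It assumes the input is valid UTF-8 and that PCRE was
built with Unicode support; count_words_unicode() is a hypothetical helper
name, not a PHP function. It counts runs of Unicode letters only, so digits
such as "70" are not counted and a hyphenated word counts as two. Its result
can therefore differ from str_word_count() and from the 37 the reporter
expects.

```php
<?php
// Hypothetical helper, not part of PHP: counts runs of Unicode letters.
// Assumes $text is valid UTF-8 and PCRE has Unicode property support.
function count_words_unicode($text)
{
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE treat
    // the pattern and subject as UTF-8 instead of single bytes.
    $count = preg_match_all('/\p{L}+/u', $text, $matches);

    // preg_match_all() returns false on error (for example, invalid UTF-8).
    return $count === false ? 0 : $count;
}

print count_words_unicode("Привет мир") . "\n"; // prints 2
```

Matching on \p{L} sidesteps isalpha() entirely, which is why the locale
setting does not matter for this sketch.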