Edit report at https://bugs.php.net/bug.php?id=63663&edit=1

 ID:                 63663
 Updated by:         ahar...@php.net
 Reported by:        kobrien at kiva dot org
 Summary:            str_word_count does not properly handle non-latin
                     characters
-Status:             Open
+Status:             Analyzed
 Type:               Bug
 Package:            Strings related
 Operating System:   Ubuntu 12.04
 PHP Version:        5.3.20-dev
 Block user comment: N
 Private report:     N

 New Comment:

This is due to the use of isalpha() internally. isalpha() classifies a single
byte at a time, so it cannot recognise the multibyte sequences that encodings
like UTF-8 use for non-Latin letters, regardless of the locale setting.

Fundamentally, this is the same issue as bug #27668. I'm not sure there's a
lot we can do about this in PHP 5.x, but it's worth noting if and when we
revisit Unicode string handling internally.
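
For anyone who needs a count in the meantime, a userland workaround along the
following lines may help. This is only a minimal sketch, not a proposed fix: it
assumes the input is valid UTF-8 and that PCRE was built with Unicode property
support, and utf8_word_count() is a hypothetical helper name, not an existing
PHP function.

```php
<?php
// Hypothetical helper (not part of PHP): counts words in a UTF-8 string.
// A "word" here is a Unicode letter followed by any run of letters,
// combining marks, apostrophes or hyphens, roughly mirroring the
// characters str_word_count() accepts for Latin text.
function utf8_word_count($text)
{
    // The /u modifier makes PCRE read the subject as UTF-8 code points
    // instead of single bytes, which is what isalpha() cannot do.
    $count = preg_match_all("/\\p{L}[\\p{L}\\p{M}'-]*/u", $text, $matches);

    // preg_match_all() returns false on error (for example, invalid
    // UTF-8 input); the cast turns that into 0.
    return (int) $count;
}

echo utf8_word_count("Хабилло житель Яванского района.") . "\n"; // prints 4
```

Unlike str_word_count(), this sketch takes no format or charlist argument, and
digits are never counted as part of a word.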


Previous Comments:
------------------------------------------------------------------------
[2012-12-01 02:29:17] kobrien at kiva dot org

Description:
------------
The function str_word_count() does not work properly on non-Latin characters:
it returns a value of zero. On Latin characters it works as expected and
returns the number of words in the string.

Test script:
---------------
<?php
print str_word_count("PHP function str_word_count does not properly handle 
non-latin characters") . "\n";

// returns 11

print str_word_count("Хабилло житель Яванского 
района. Ему 70 лет. Он женат. У него четверо 
детей. Хабилло филолог. Он более двадцати 
лет работает по профессии. Также Хабилло 
занимается виноградарством. У него имеется 
небольшой виноградник. Этим видом 
деятельности Хабилло занимается 15 лет.");

// returns 0, but should return 37

Expected result:
----------------
The second instruction should return 37

Actual result:
--------------
The second instruction returns 0


------------------------------------------------------------------------


