Edit report at https://bugs.php.net/bug.php?id=65361&edit=1

 ID:                 65361
 Updated by:         a...@php.net
 Reported by:        pascal dot chevrel at free dot fr
 Summary:            Transliteration has uppercase problems with letter J
                     in Serbian
-Status:             Open
+Status:             Feedback
 Type:               Bug
 Package:            Unicode Engine related
 Operating System:   Linux
 PHP Version:        5.5.1
 Block user comment: N
 Private report:     N

 New Comment:

OK, then it has to be ICU itself. I was previously testing on Windows, which has
ICU 50, but Ubuntu 13.04 ships with ICU 48, and I can reproduce what you describe
there.

Which ICU version do you use? Most Linux distros ship ICU 48 at the moment. Maybe
you could try a newer ICU, even 51? Either way, from what I can see it's unlikely
to be a PHP bug.
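
To answer the version question quickly, a minimal sketch like the following
could be used. It assumes the intl extension is loaded; the function name
icu_version_report is hypothetical and only used for illustration, while
INTL_ICU_VERSION and INTL_ICU_DATA_VERSION are constants defined by intl.

```php
<?php
// Hypothetical helper: describe the ICU build that the intl extension uses.
function icu_version_report()
{
    if (!extension_loaded('intl')) {
        return 'intl extension is not loaded';
    }
    // Both constants are provided by the intl extension.
    return 'ICU library version: ' . INTL_ICU_VERSION
        . ', ICU data version: ' . INTL_ICU_DATA_VERSION;
}

echo icu_version_report(), "\n";
```

Running this on both machines would show whether the differing results
line up with differing ICU builds.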

Thanks.


Previous Comments:
------------------------------------------------------------------------
[2013-07-30 16:49:20] pascal dot chevrel at free dot fr

All my sources are in UTF-8; I rechecked with the isutf8 command.

------------------------------------------------------------------------
[2013-07-30 16:43:45] a...@php.net

Is your source Cyrillic string UTF-8 encoded? I have no idea how to encode it
otherwise, but with a UTF-8 source it gives the transliteration you expect. So
that might be the key.
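
One way to check the encoding from within PHP itself is sketched below. It
relies on preg_match() with the /u modifier failing on input that is not valid
UTF-8; the helper name is_valid_utf8 is hypothetical, and the script file is
assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $s is a well-formed UTF-8 byte sequence.
function is_valid_utf8($s)
{
    // With the /u modifier, preg_match() returns false instead of 1
    // when the subject is not valid UTF-8.
    return preg_match('//u', $s) === 1;
}

$source = 'Најгледанији сајтови';
var_dump(is_valid_utf8($source));

// Hex dump of the first two bytes; the Cyrillic letter Н (U+041D)
// is encoded as d0 9d in UTF-8.
echo bin2hex(substr($source, 0, 2)), "\n";
```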

------------------------------------------------------------------------
[2013-07-30 14:44:15] pascal dot chevrel at free dot fr

Description:
------------
The Transliterator class does not work correctly when converting from Serbian
in Cyrillic script to Serbian in Latin script. All the j letters in Cyrillic
are systematically converted to an uppercase J in Latin-script Serbian, while
they should be a lowercase j inside a word.

Online conversion tools, which are probably also based on ICU, don't have this
bug and do the conversion correctly.

I am attaching a code sample that shows the bug. I verified that the bug exists
in both PHP 5.4 and 5.5.

Thanks!

Test script:
---------------
<?php
$t = Transliterator::create('Serbian-Latin/BGN');
$source = 'Најгледанији сајтови';
echo '<ul>'
    . '<li>Cyrillic source: ' . $source . '</li>'
    . '<li>Expected transliteration: Najgledaniji sajtovi</li>'
    . '<li>Actual transliteration: ' . $t->transliterate($source) . '</li>'
    . '</ul>';


Expected result:
----------------
This string:
Најгледанији сајтови

Should be transliterated to:
Najgledaniji sajtovi



Actual result:
--------------
But PHP transliterates it to:
NaJgledaniJi saJtovi
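
The symptom can also be detected mechanically, which may help when comparing
ICU versions. The sketch below flags any uppercase letter that directly
follows a lowercase letter; the helper name has_midword_uppercase is
hypothetical, and the check is only a heuristic for this report, not a
general test of transliteration correctness.

```php
<?php
// Hypothetical helper: true when an uppercase letter directly follows a
// lowercase letter, as in "NaJgledaniJi".
function has_midword_uppercase($s)
{
    // \p{Ll} matches a lowercase letter, \p{Lu} an uppercase letter.
    return preg_match('/\p{Ll}\p{Lu}/u', $s) === 1;
}

var_dump(has_midword_uppercase('NaJgledaniJi saJtovi')); // bool(true)
var_dump(has_midword_uppercase('Najgledaniji sajtovi')); // bool(false)
```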


------------------------------------------------------------------------


