Edit report at https://bugs.php.net/bug.php?id=65361&edit=1
ID: 65361
User updated by: pascal dot chevrel at free dot fr
Reported by: pascal dot chevrel at free dot fr
Summary: Transliteration has uppercase problems with letter J in Serbian
-Status: Feedback
+Status: Open
Type: Bug
Package: Unicode Engine related
Operating System: Linux
PHP Version: 5.5.1
Block user comment: N
Private report: N
New Comment:
"but with UTF-8 source it gives the translit you expect"
That's not the case for me. Do you have an example online that shows my test
script working, for example a gist on GitHub?
Previous Comments:
------------------------------------------------------------------------
[2013-07-30 17:16:30] [email protected]
OK, then it has to be ICU itself. I was previously testing on Windows, which
has ICU 50, but Ubuntu 13.04 ships with ICU 48, and I can reproduce what you
describe there. Which ICU version do you use? Most Linux distros ship ICU 48
at this time. Maybe you could try a newer ICU, even 51? Either way, from what
I can see it's unlikely to be a PHP bug.
Thanks.
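
To answer the ICU version question from PHP itself, a minimal sketch follows. It
assumes the intl extension is loaded (it defines the INTL_ICU_VERSION and
INTL_ICU_DATA_VERSION constants) and PHP 7 or later for the type declarations.
The function name icu_version_report() is hypothetical and only used for this
illustration.

```php
<?php
// Hypothetical helper: reports which ICU library the intl extension uses.
function icu_version_report(): string
{
    if (!extension_loaded('intl')) {
        return "intl extension not loaded\n";
    }
    // Both constants are defined by the intl extension.
    return 'ICU version: ' . INTL_ICU_VERSION . "\n"
         . 'ICU data version: ' . INTL_ICU_DATA_VERSION . "\n";
}

echo icu_version_report();
```

Running this on both machines (Windows and Ubuntu in the comment above) would
show whether the two results really come from different ICU releases.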
------------------------------------------------------------------------
[2013-07-30 16:49:20] pascal dot chevrel at free dot fr
All my sources are in UTF-8; I rechecked with the isutf8 command-line tool.
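
The same check can also be done inside the script, which rules out an editor or
web server re-encoding the file. This is only a sketch, assuming PHP 7 or later;
describe_encoding() is a hypothetical helper name, and the check relies on
preg_match() with the /u modifier rejecting input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and dumps its bytes.
function describe_encoding(string $s): array
{
    return [
        // With the /u modifier, preg_match() returns false on malformed UTF-8.
        'valid_utf8' => preg_match('//u', $s) === 1,
        'bytes'      => strlen($s),
        'hex'        => bin2hex($s),
    ];
}

$source = 'Најгледанији сајтови';
$info = describe_encoding($source);

echo $info['valid_utf8'] ? "valid UTF-8\n" : "NOT valid UTF-8\n";
echo $info['bytes'], " bytes: ", $info['hex'], "\n";
```

If the hex dump does not start with d09d (the UTF-8 bytes for the Cyrillic
letter Н), the string reaching the transliterator is not what the source file
appears to contain.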
------------------------------------------------------------------------
[2013-07-30 16:43:45] [email protected]
Is your source Cyrillic string UTF-8 encoded? I have no idea how else it could
be encoded, but with UTF-8 source it gives the translit you expect. So that
might be the key.
------------------------------------------------------------------------
[2013-07-30 14:44:15] pascal dot chevrel at free dot fr
Description:
------------
The Transliterator class does not work correctly when converting Serbian from
Cyrillic to Latin script. Every Cyrillic letter ј is converted to an uppercase
J in the Latin output, while it should be a lowercase j inside a word.
Online conversion tools, which are probably also based on ICU, do not have this
bug and convert correctly.
I am attaching a code sample that shows the bug. I tested that it exists in
both PHP 5.4 and 5.5.
Thanks!
Test script:
---------------
<?php
$t = Transliterator::create('Serbian-Latin/BGN');
$source = 'Најгледанији сајтови';
echo '<ul>'
. '<li>Cyrillic source: ' . $source . '</li>'
. '<li>Expected transliteration: Najgledaniji sajtovi</li>'
. '<li>Actual transliteration: ' . $t->transliterate($source) . '</li>'
. '</ul>';
Expected result:
----------------
This string:
Најгледанији сајтови
Should be transliterated to:
Najgledaniji sajtovi
Actual result:
--------------
But PHP transliterates it to:
NaJgledaniJi saJtovi
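
To make the difference between the two results explicit, here is a small
sketch that lists the byte offsets where the expected and actual strings differ
only by letter case. It assumes PHP 7 or later and ASCII output strings, as in
this report; case_only_mismatches() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns offsets where two ASCII strings differ only in case.
function case_only_mismatches(string $expected, string $actual): array
{
    $offsets = [];
    $len = min(strlen($expected), strlen($actual));
    for ($i = 0; $i < $len; $i++) {
        if ($expected[$i] !== $actual[$i]
            && strtolower($expected[$i]) === strtolower($actual[$i])) {
            $offsets[] = $i;
        }
    }
    return $offsets;
}

$expected = 'Najgledaniji sajtovi';
$actual   = 'NaJgledaniJi saJtovi';

// Prints the offsets of the wrongly uppercased letters.
echo implode(', ', case_only_mismatches($expected, $actual)), "\n";
```

For the strings above, the only mismatches are the three positions of the
letter j, which matches the description of the bug.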
------------------------------------------------------------------------