Edit report at https://bugs.php.net/bug.php?id=63450&edit=1
ID: 63450
Comment by: trollofdarkness at gmail dot com
Reported by: trollofdarkness at gmail dot com
Summary: iconv returns false when illegal character encountered
Status: Not a bug
Type: Bug
Package: ICONV related
Operating System: Debian 5 Lenny
PHP Version: 5.4.8
Block user comment: N
Private report: N
New Comment:
Hi,
So, I had a look at it and this is not a libiconv-related bug. It is a
glibc-related bug (so still iconv, but the glibc implementation), as I was
using the glibc implementation rather than GNU libiconv.
My glibc was version 2.7. I tested on another machine, an Ubuntu 12.04 LTS
server with glibc 2.14, and the bug was not present there. So it seems to
affect older versions of glibc rather than recent ones.
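For anyone wanting to check which implementation their own build is linked
against, here is a minimal sketch. ICONV_IMPL and ICONV_VERSION are constants
provided by PHP's iconv extension; the helper name describe_iconv_backend() is
made up for this example.

```php
<?php
// Report which iconv implementation this PHP build uses.
// describe_iconv_backend() is a hypothetical helper name, not a PHP function.
function describe_iconv_backend()
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }
    // ICONV_IMPL is the implementation name (for example "glibc" or
    // "libiconv"); ICONV_VERSION is that implementation's version string.
    return sprintf('%s %s', ICONV_IMPL, ICONV_VERSION);
}

echo describe_iconv_backend(), PHP_EOL;
```

The same information should also appear in the iconv section of phpinfo().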
To correct the problem on Debian, you can recompile PHP to use the libiconv
implementation instead of the glibc one.
But it is not easy, because PHP looks for the glibc implementation BEFORE
libiconv and selects it if present, whatever --with-iconv=something value you
pass to ./configure.
I used the solution presented here:
<http://stackoverflow.com/questions/4743080/how-can-i-force-php-to-use-the-libiconv-version-of-iconv-instead-of-the-centos-i/4851065#4851065>
and, as one of the comments states, I had to change the top-level configure
file and not (only) the one in ext/iconv. (Note that you first have to
download libiconv and compile it, but that's just
wget && ./configure && make && make install.)
I now have the libiconv implementation in use and it's working perfectly.
I strongly think PHP should change the behaviour of the configure file: we
should not have to edit it to use the libiconv implementation, we should just
be able to use the right configure parameter!
Previous Comments:
------------------------------------------------------------------------
[2012-11-06 22:19:13] trollofdarkness at gmail dot com
Hi Rasmus,
Thanks for your help!
I will look at that right away and post an update on whether downgrading
libiconv works.
------------------------------------------------------------------------
[2012-11-06 21:54:00] [email protected]
This is not a PHP issue. This is a change in recent versions of libiconv. If
you link PHP against an older version of libiconv it will work again, or you
can use mb_convert_encoding(). And we have a new uconverter extension feature
coming that will do a better job than either of these. See
https://wiki.php.net/rfc/uconverter
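For those who cannot relink PHP, the mbstring route could be sketched as a
fallback wrapper like the one below. This is only an illustration:
convert_lossy() is a hypothetical helper name, it assumes the mbstring
extension is loaded, and it uses mb_substitute_character('none') so that
unconvertible characters are dropped, roughly mimicking //IGNORE.

```php
<?php
// Try iconv with //IGNORE first; if it returns false, fall back to mbstring.
// convert_lossy() is a hypothetical helper name, not part of PHP.
function convert_lossy($text, $from = 'UTF-8', $to = 'ISO-8859-15')
{
    // @ suppresses the notice iconv raises on an illegal character.
    $converted = @iconv($from, $to . '//IGNORE', $text);
    if ($converted !== false) {
        return $converted;
    }
    if (!function_exists('mb_convert_encoding')) {
        return false; // mbstring not available, nothing to fall back to
    }
    // Drop unconvertible characters instead of replacing them with "?",
    // then restore the previous substitution setting.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $converted = mb_convert_encoding($text, $to, $from);
    mb_substitute_character($previous);
    return $converted;
}

// "\xE4\xB8\xAD" is a UTF-8 encoded CJK character with no ISO-8859-15 equivalent.
var_dump(convert_lossy("foo \xE4\xB8\xAD foo"));
```

Trying iconv first keeps the existing behaviour on builds where //IGNORE works,
so the mbstring path is only taken when iconv fails.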
------------------------------------------------------------------------
[2012-11-06 21:45:33] trollofdarkness at gmail dot com
Description:
------------
Hi everyone,
Since, I think, version 5.3.x came out (and still with 5.4.8), I have been
experiencing issues with iconv.
Specifically, when an illegal character is encountered and the //IGNORE flag is
set on the target charset, the function returns FALSE instead of just skipping
this character.
This is problematic because if a single character in a 50 000 character string
is "illegal", the whole output is lost because of that one character.
It does not happen with the TRANSLIT flag.
I experienced this with UTF-8 (from) and ISO-8859-15 (to) charsets; I did not
test other ones. Below is an example to reproduce the bug.
Note: I saw there are other bug reports about similar issues, but they all
say the string is cut. In my case, it literally returns false. So this might
be a different problem?
Test script:
---------------
<?php
$str = "
foo
è
foo
";
$result = iconv("UTF-8", "ISO-8859-15"."//IGNORE", $str);
var_dump($result); // false, instead of "foo ... foo"
?>
Expected result:
----------------
foo
foo
Actual result:
--------------
false
------------------------------------------------------------------------