Edit report at https://bugs.php.net/bug.php?id=48147&edit=1
ID:                 48147
Comment by:         [email protected]
Reported by:        kulakov74 at yandex dot ru
Summary:            iconv with //IGNORE cuts the string
Status:             Feedback
Type:               Bug
Package:            ICONV related
Operating System:   Linux
PHP Version:        5.*, 6CVS (2009-05-05)
Block user comment: N
Private report:     N

New Comment:

I submitted an updated bug to glibc, which correctly describes the incorrect behavior in glibc

http://sourceware.org/bugzilla/show_bug.cgi?id=13541

The facts of the matter are as follows:

1) glibc has inconsistent behavior about what the EILSEQ error code is supposed to mean, between its documentation and its behavior

2) glibc and libiconv have different behavior

3) A user of PHP who would like to use iconv to convert between two character sets while ignoring malformed characters *cannot do so* with the most recent versions of PHP (5.4+). (Trust me, I've tried.) In old versions of PHP, this functionality was available. Thus, this bug is a regression.

If you want to blame upstream, that's fine by me, but I'm not optimistic on glibc getting updated any time in the near future, and there is a well understood (and implemented elsewhere) fix which gives us the correct behavior.

Previous Comments:
------------------------------------------------------------------------
[2012-01-08 12:33:12] [email protected]

To me it looks like there is no bug (as stated in the redhat issues).

Also even if there was one, it would not be a PHP bug but iconv's. Or do you have any information that shows that PHP is causing this problem here?

------------------------------------------------------------------------
[2011-12-23 00:49:31] [email protected]

I think I understand how to fix this bug, without modifying glibc.

We need to modify our invocation of iconv in order to mirror the behavior of iconv_prog.c:process_block() when the '-c' flag is set (if we mimic the code closely enough, we also get an extra bonus of sensible block processing behavior, which is better than the horrible over-allocation iconv does right now). In particular, we need to handle the EILSEQ error code correctly.

------------------------------------------------------------------------
[2011-12-18 22:34:38] [email protected]

Upstream bugs:

http://sources.redhat.com/bugzilla/show_bug.cgi?id=13517
http://sources.redhat.com/bugzilla/show_bug.cgi?id=13518

------------------------------------------------------------------------
[2011-12-18 19:37:53] [email protected]

Not broken in latest version of libiconv

ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n --version
iconv (GNU libiconv 1.14)
Copyright (C) 2000-2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
15312
ezyang@javelin:~/Desktop/libiconv-1.14/src$ iconv -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
iconv: illegal input sequence at position 8168
8157

------------------------------------------------------------------------
[2009-05-07 13:58:21] [email protected]

We still can't fix bugs in glibc iconv implementation. Try this on command line and you get same results:

# iconv -f utf-8 -t iso-8859-1 iconv.html > /dev/null
iconv: illegal input sequence at position 3589

# iconv -f utf-8 -t iso-8859-1//IGNORE iconv.html > /dev/null
iconv: illegal input sequence at position 8168

------------------------------------------------------------------------

The remainder of the comments for this report are too long.
To view the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=48147
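
As an illustration of the approach suggested in the 2011-12-23 comment (handle EILSEQ in the caller instead of relying on //IGNORE), here is a minimal C sketch. It is not the code from iconv_prog.c and not an actual PHP patch; the function name convert_skipping_invalid and its fixed-buffer interface are hypothetical. It assumes a stateless source encoding such as UTF-8, where stepping over one rejected byte and retrying is safe, and it gives up on E2BIG rather than growing the output buffer.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PHP or glibc): convert in[0..inlen)
 * from encoding `from` to encoding `to`, dropping every input byte that
 * iconv() rejects with EILSEQ. Returns the number of bytes written to
 * `out`, or (size_t)-1 if the conversion is unsupported or `out` is
 * too small.
 */
size_t convert_skipping_invalid(const char *from, const char *to,
                                const char *in, size_t inlen,
                                char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv() takes char **, but does not modify the input bytes */
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = outcap;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;            /* all remaining input was converted */

        if (errno == EILSEQ) {
            /* inp points at the rejected sequence: skip one byte and retry */
            inp++;
            inleft--;
            continue;
        }
        if (errno == EINVAL)
            break;            /* truncated sequence at end of input: drop it */

        /* E2BIG (output full) or an unexpected error: give up */
        iconv_close(cd);
        return (size_t)-1;
    }

    /* Emit any pending shift sequence (a no-op for stateless targets) */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        return (size_t)-1;
    }

    iconv_close(cd);
    return outcap - outleft;
}
```

Called with from = "UTF-8" and to = "ISO-8859-1", this keeps converting past a bad byte instead of stopping there, which is the behavior the report expects from //IGNORE. Whether unrepresentable characters are also dropped depends on the iconv implementation reporting them as EILSEQ.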
