Edit report at https://bugs.php.net/bug.php?id=48147&edit=1

 ID:                 48147
 Comment by:         [email protected]
 Reported by:        kulakov74 at yandex dot ru
 Summary:            iconv with //IGNORE cuts the string
 Status:             Feedback
 Type:               Bug
 Package:            ICONV related
 Operating System:   Linux
 PHP Version:        5.*, 6CVS (2009-05-05)
 Block user comment: N
 Private report:     N

 New Comment:

I submitted an updated bug to glibc that correctly describes the incorrect 
behavior in glibc: http://sourceware.org/bugzilla/show_bug.cgi?id=13541

The facts of the matter are as follows:

1) glibc's documentation and its actual behavior disagree about what the 
EILSEQ error code is supposed to mean
2) glibc and libiconv have different behavior
3) A user of PHP who would like to use iconv to convert between two character 
sets while ignoring malformed characters *cannot do so* with the most recent 
versions of PHP (5.4+). (Trust me, I've tried.) In old versions of PHP, this 
functionality was available. Thus, this bug is a regression.

If you want to blame upstream, that's fine by me, but I'm not optimistic about 
glibc getting updated any time in the near future, and there is a 
well-understood (and implemented elsewhere) fix which gives us the correct 
behavior.


Previous Comments:
------------------------------------------------------------------------
[2012-01-08 12:33:12] [email protected]

To me it looks like there is no bug (as stated in the redhat issues). Also, even 
if there were one, it would not be a PHP bug but iconv's.

Or do you have any information that shows that PHP is causing this problem here?

------------------------------------------------------------------------
[2011-12-23 00:49:31] [email protected]

I think I understand how to fix this bug, without modifying glibc. We need to 
modify our invocation of iconv in order to mirror the behavior of 
iconv_prog.c:process_block() when the '-c' flag is set (if we mimic the code 
closely enough, we also get an extra bonus of sensible block processing 
behavior, which is better than the horrible over-allocation iconv does right 
now). In particular, we need to handle the EILSEQ error code correctly.
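
A minimal sketch of the EILSEQ-handling idea, in C. This is not the 
iconv_prog.c code and not PHP's implementation: it is a simplified variant 
that opens the converter without //IGNORE and skips one input byte whenever 
iconv() reports EILSEQ, growing the output buffer on E2BIG. The helper name 
convert_skipping_invalid is hypothetical, and the sketch assumes a stateless 
target encoding (no shift-state flush is performed).

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): convert in[0..in_len) from
 * charset `from` to charset `to`, dropping every input byte that iconv()
 * rejects with EILSEQ. Returns a malloc'd, NUL-terminated buffer (caller
 * frees) and stores the converted length in *out_len; NULL on failure.
 * Assumes a stateless target encoding: no shift sequence is flushed.
 */
char *convert_skipping_invalid(const char *from, const char *to,
                               const char *in, size_t in_len,
                               size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = in_len + 16;          /* grown on demand, see E2BIG below */
    char *out = malloc(cap + 1);       /* +1 for the trailing NUL */
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;            /* iconv() takes a non-const pointer */
    size_t src_left = in_len;
    size_t used = 0;

    while (src_left > 0) {
        char *dst = out + used;
        size_t dst_left = cap - used;
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        used = cap - dst_left;         /* keep whatever was converted */

        if (rc != (size_t)-1)
            break;                     /* all input consumed */

        if (errno == EILSEQ) {
            /* src points at the offending byte: skip it and carry on. */
            src++;
            src_left--;
        } else if (errno == E2BIG) {
            /* Output buffer full: double it and retry. */
            cap *= 2;
            char *bigger = realloc(out, cap + 1);
            if (bigger == NULL) {
                free(out);
                iconv_close(cd);
                return NULL;
            }
            out = bigger;
        } else {
            /* EINVAL (truncated sequence at end of input) or other error. */
            break;
        }
    }

    iconv_close(cd);
    out[used] = '\0';
    *out_len = used;
    return out;
}
```

Called as convert_skipping_invalid("UTF-8", "ISO-8859-1", buf, len, &n), the 
helper keeps going past a malformed byte instead of cutting the string at the 
first error. Skipping a single byte per EILSEQ is a design choice for 
simplicity; it relies on the converter rejecting the leftover continuation 
bytes in turn.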

------------------------------------------------------------------------
[2011-12-18 22:34:38] [email protected]

Upstream bugs:

http://sources.redhat.com/bugzilla/show_bug.cgi?id=13517
http://sources.redhat.com/bugzilla/show_bug.cgi?id=13518

------------------------------------------------------------------------
[2011-12-18 19:37:53] [email protected]

Not broken in the latest version of libiconv:

ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n --version
iconv (GNU libiconv 1.14)
Copyright (C) 2000-2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
15312
ezyang@javelin:~/Desktop/libiconv-1.14/src$ iconv -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
iconv: illegal input sequence at position 8168
8157

------------------------------------------------------------------------
[2009-05-07 13:58:21] [email protected]

We still can't fix bugs in the glibc iconv implementation. Try this on the 
command line and you get the same results:

# iconv -f utf-8 -t iso-8859-1 iconv.html > /dev/null
iconv: illegal input sequence at position 3589

# iconv -f utf-8 -t iso-8859-1//IGNORE iconv.html > /dev/null
iconv: illegal input sequence at position 8168


------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=48147

