Hi!
Don't have a good solution, but some ideas:
1. There's http://php.net/manual/en/class.uconverter.php, which uses the ICU
converter. It can recognize tons of charsets/encodings
(http://site.icu-project.org/charts/charset) and can filter out bad
characters, though the way to achieve it may be a bit
I made a patch [0] for T39665 [1] about 6 months ago. It has been
rotting in Gerrit ever since.
The core bug is related to glibc's iconv implementation and PHP (and
HHVM as well, I think). To work around the iconv bug, I wrote a little
helper function that will use mb_convert_encoding() instead if it is