On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote:
> I think what I need is some code to strip non-utf8 characters from a string
> -- even if that string has the utf8 bit switched on. I thought that Encode
> would do that for me, but in this case apparently not. Anyone got an
> example?

Try this:

    Encode::_utf8_off($string);
    $string = Encode::decode('utf8', $string);

That will replace any byte sequences which are invalid UTF-8 with the Unicode
replacement character (U+FFFD). If you want to guarantee that the flag is on
first, do this:

    utf8::upgrade($string);
    Encode::_utf8_off($string);
    $string = Encode::decode('utf8', $string);

Devel::Peek's Dump() function will come in handy for checking results.

Cheers,

Marvin Humphrey
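
For reference, the three steps above can be wrapped in a small helper. This is only a minimal sketch: scrub_utf8 is a hypothetical name, not something Encode provides, and the sample string (a Latin-1 byte inside a scalar whose UTF-8 flag has been forced on) is an assumed stand-in for the kind of data described in the question.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): returns a copy of the input
# in which malformed UTF-8 sequences are replaced by U+FFFD.
sub scrub_utf8 {
    my ($string) = @_;              # work on a copy, leave the caller's scalar alone
    utf8::upgrade($string);         # make sure the internal buffer is UTF-8, flag on
    Encode::_utf8_off($string);     # drop the flag so the buffer is seen as raw bytes
    return Encode::decode('utf8', $string);   # default CHECK substitutes U+FFFD
}

# Assumed test data: 0xE9 is "e acute" in Latin-1, but on its own it is
# not a valid UTF-8 sequence. Forcing the flag on leaves the scalar
# claiming to be UTF-8 while holding a malformed buffer.
my $bad = "caf\xE9 au lait";
Encode::_utf8_on($bad);

my $clean = scrub_utf8($bad);
print "replacement character present\n" if $clean =~ /\x{FFFD}/;
```

Note that utf8::upgrade() treats an unflagged byte string as Latin-1, so plain byte strings come through this helper with their characters unchanged; the replacement only kicks in when the flag was already on and the buffer underneath it is malformed.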