> I use the following code to get rss and parse it, but the code
> occasionally has issues with gb2312 or big-5 encoded feeds, and fails
> to parse them. Other times it works just fine. Any thoughts?
> Maybe SimpleXMLElement is simply not meant for other language encodings...

I normalize to UTF-8 before handing the feed to SimpleXML, and that seems
to work. For character set conversions I run both mb_convert_encoding and
iconv and compare the results to make sure they match. There are two
exceptions: for gb2312 and euc-kr I use mb_convert_encoding only, and for
windows-1256 and windows-1254 I use iconv only. [1] shows my code.

HTH,
Darren

[1]:
    $s_mb = false;
    if ($encoding == 'gb2312' || $encoding == 'euc-kr') {
        // iconv is not coping with certain characters very well, so just use mbstring
        $s_iconv = $s_mb = mb_convert_encoding($s, 'UTF-8', $encoding);
    }
    if ($s_mb === false) {
        $s_iconv = iconv($encoding, 'UTF-8', $s);
        if ($encoding == 'windows-1256' || $encoding == 'windows-1254') {
            // Handle encodings not supported by the mbstring extension
            $s_mb = $s_iconv;
        } else {
            $s_mb = mb_convert_encoding($s, 'UTF-8', $encoding);
        }
    }

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/ (Shodan Go Bet - who will win?)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

-- 
PHP Unicode & I18N Mailing List (http://www.php.net/)
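
For what it's worth, here is a minimal sketch of how the normalize-then-parse step described above might be wired up to SimpleXML. The function names to_utf8() and parse_feed() are made up for illustration, and the sketch assumes the caller already knows the feed's encoding (for example from the XML declaration or the HTTP headers). It also adds one step that is not in the code above: after conversion the XML declaration still names the old encoding, and libxml acts on that declaration, so the sketch rewrites it to UTF-8 before parsing. It skips the iconv/mbstring cross-check for brevity.

```php
<?php
// Hypothetical helper: convert $s from $encoding to UTF-8.
// Returns the converted string, or false if iconv fails.
function to_utf8(string $s, string $encoding)
{
    $enc = strtolower($encoding);
    if ($enc === 'utf-8') {
        return $s; // nothing to do
    }
    if ($enc === 'gb2312' || $enc === 'euc-kr') {
        // Per the post above: use mbstring only for these two encodings.
        return mb_convert_encoding($s, 'UTF-8', $encoding);
    }
    // Everything else goes through iconv in this sketch.
    return iconv($encoding, 'UTF-8', $s);
}

// Hypothetical helper: normalize a feed to UTF-8, then parse it.
// Returns a SimpleXMLElement, or false on failure.
function parse_feed(string $xml, string $encoding)
{
    $utf8 = to_utf8($xml, $encoding);
    if ($utf8 === false) {
        return false;
    }
    // Assumption: the declaration, if present, is at the very start.
    // Rewrite encoding="..." so the parser does not convert a second time.
    $utf8 = preg_replace(
        '/^(<\?xml[^>]*encoding=)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
    return simplexml_load_string($utf8);
}
```

Usage would be something like `$feed = parse_feed($raw, 'gb2312');` followed by a check for `false` before touching `$feed->channel`.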