Hi. Having removed the BOMs, I did a quick analysis of the various encodings declared in the XML files.
57,020 files examined in PHPDOC and PEARDOC - all languages.

No files with a BOM.

ISO-8859-1   : 20,761 files
ISO-8859-2   :  3,185 files
ISO-8859-7   :    428 files
ISO-8859-8   :      2 files
WINDOWS-1255 :    194 files
BIG5         :     83 files
GB2312       :    887 files

That's 25,540 files which are not marked as encoded with UTF-8.

What should the encoding be, and what is the impact of it NOT being UTF-8?

The next analysis I did was to see how many files' content was only [\x00-\x7f]. These are the files whose content would not need to change to match UTF-8.

ISO-8859-1   : 11,509 files
ISO-8859-2   :    228 files
ISO-8859-7   :      1 file
ISO-8859-8   :      1 file
WINDOWS-1255 :     27 files
BIG5         :     18 files
GB2312       :      5 files

That's 11,789 files which can safely be retagged as UTF-8 without any problems, as the content is essentially ASCII only.

That leaves 13,751 files not encoded as UTF-8.

Shall I commit the ASCII -> UTF-8 change?

(Running for cover ... )

--
-----
Richard Quadling
"Standing on the shoulders of some very clever giants!"
EE : http://www.experts-exchange.com/M_248814.html
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
ZOPA : http://uk.zopa.com/member/RQuadling
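
For anyone wanting to reproduce this kind of tally, here is a minimal sketch in Python. It is not the script used for the numbers above. The function names (declared_encoding, retag_as_utf8, tally) and the "*.xml" glob are assumptions made for illustration, and a file with no encoding declaration is counted as UTF-8, which is the XML default when there is no BOM.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of a file (assumes any BOM has already been removed).
DECL_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding in upper case, or UTF-8 if none is declared."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii").upper() if match else "UTF-8"


def retag_as_utf8(data: bytes) -> bytes:
    """Rewrite the declared encoding to UTF-8, but only for ASCII-only content.

    Pure ASCII bytes are identical in UTF-8 and in every encoding listed
    above, so only the declaration changes. Anything else is returned as-is.
    """
    match = DECL_RE.match(data)
    if match is None or not data.isascii():
        return data
    return data[:match.start(1)] + b"UTF-8" + data[match.end(1):]


def tally(root: str):
    """Count files per declared encoding, and how many of each are ASCII only."""
    declared = Counter()
    ascii_only = Counter()
    for path in Path(root).rglob("*.xml"):  # hypothetical file pattern
        data = path.read_bytes()
        encoding = declared_encoding(data)
        declared[encoding] += 1
        if data.isascii():  # every byte is in the range 0x00-0x7f
            ascii_only[encoding] += 1
    return declared, ascii_only
```

Calling tally() on a checkout directory returns two counters keyed by declared encoding. Files that are not ASCII only are deliberately left alone by retag_as_utf8(), since they would need a real transcoding step from their declared encoding rather than just a new declaration.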