> i tried that kind of stuff - it did not seem to work.
> i will try again... if anyone has any ideas i.e. "use iconv to convert
> to A, then use DOM stuff, then use iconv to move it back to UTF8..."
> etc. i am all ears.

Nope - for example, this is the input text (apologies if your reader
isn't UTF-8), in Simplified Chinese:


Output is this:


The funny thing is that I don't need to alter the actual content at all,
only the values of the "href" and "src" attributes, which are all
standard Latin-character URLs anyway.

Here's the simplest code that reproduces the behavior:

$q = db_query("SELECT id,old FROM testing", "redirects");
while(list($id, $doc) = db_rows($q)) {
        $new = fix_document($doc);
        $new = db_escape($new);
        db_query("UPDATE testing SET new='$new' WHERE id=$id",
                "redirects");
}

function fix_document($string) {
        $dom = new DomDocument('1.0', 'UTF-8');
        $dom->preserveWhiteSpace = false;
        // parse the stored HTML so saveHTML() has something to serialize
        $dom->loadHTML($string);
        return $dom->saveHTML();
}

(Note: it is not the db functions. If I do this:

function fix_document($string) {
        return $string;
}

the content is unaltered.)

Anyone with any ideas? Any options to feed to the DOM stuff? It's also
converting the text to HTML entities, which I don't want either.
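
One thing that might be worth trying, as a sketch only: loadHTML() does
not take its input encoding from the DomDocument constructor, so the
parser can be told about UTF-8 by prepending a charset meta tag to the
markup before loading it. The version of fix_document() below assumes
the stored documents really are UTF-8. rewrite_url() and the example.com
hosts are made up for illustration, and whether saveHTML() then emits
raw UTF-8 or numeric entities may depend on the libxml2 build.

```php
<?php
// Hypothetical helper: stands in for whatever URL rewriting is needed.
function rewrite_url($url) {
    return str_replace('http://old.example.com', 'http://new.example.com', $url);
}

function fix_document($string) {
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->preserveWhiteSpace = false;

    // Assumption: $string is UTF-8. The meta tag is what tells the HTML
    // parser so; the constructor argument alone does not affect loadHTML().
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($hint . $string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Touch only the attributes of interest; text nodes are left alone.
    foreach (array('a' => 'href', 'img' => 'src') as $tag => $attr) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attr)) {
                $node->setAttribute($attr, rewrite_url($node->getAttribute($attr)));
            }
        }
    }

    return $dom->saveHTML();
}
```

Note that the returned markup includes the doctype, html, head (with
the added meta tag) and body wrappers that loadHTML() creates around a
fragment, so those would need stripping if only the fragment is wanted.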

