On 22 May 2011 08:17, Eli Orr (Office) <eli....@logodial.com> wrote:
> Hi Adam,
> I have proof that the XML advice does not work in real cases I have had.
> We are using XML in our system, but when you edit the XML with a text
> editor and put in the XML header declaring UTF-8
> <?xml version="1.0" encoding="UTF-8"?>
> it DOES NOT ensure the text inside is encoded in UTF-8; in many cases it
> may be in some other iso-xxx encoding.

The point of the header is to tell readers which encoding is used. Of
course that means errors are possible: setting the header is not
magic, and it doesn't change the rest of the file. When you create XML
documents, you need to make sure the contents of the file actually
match the encoding declared in the header.

Anyway, from your perspective, the header is an indication of the
encoding, not a foolproof way of determining it.
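
To illustrate the mismatch described above, here is a minimal sketch that
compares the encoding named in the XML declaration with the actual bytes.
The function name declared_encoding_matches() is hypothetical, and the
sketch assumes PHP 8 with the mbstring extension loaded. Note that the
check is only meaningful for encodings such as UTF-8 that have invalid
byte sequences; for ISO-8859-1 every byte string is valid, so the result
is always true.

```php
<?php
// Hypothetical helper (not part of PHP): compare the declared XML encoding
// with the actual bytes. Returns null when nothing usable is declared.
function declared_encoding_matches(string $xml): ?bool
{
    // Pull the value of encoding="..." out of the leading XML declaration.
    $pattern = '/^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']/';
    if (!preg_match($pattern, $xml, $m)) {
        return null;
    }
    try {
        // mb_check_encoding() validates the bytes against the named encoding.
        return mb_check_encoding($xml, $m[1]);
    } catch (\ValueError $e) {
        // PHP 8 throws ValueError for encoding names mbstring does not know.
        return null;
    }
}
```

A file whose header says UTF-8 but whose body contains a lone 0xE1 byte
(an ISO-8859-1 'á') would make this return false.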

> My question was about a function that scans the bytes of the file and
> decides WITHOUT the BOM header.
> I mean by checking the byte sequences in the file.
> I claim that WITHOUT a BOM it might be impossible to be sure it is UTF-8
> encoding, which is a whole escape-sequence logic
> that may convert one character into one, two or three characters.

http://se.php.net/manual/en/function.mb-detect-encoding.php - the
first comment should be interesting to you.

If you try to use mb_detect_encoding to detect whether a string is
valid UTF-8, use strict mode (pass true as the third argument); it is
pretty worthless otherwise.

    $str = 'áéóú'; // ISO-8859-1
    mb_detect_encoding($str, 'UTF-8'); // 'UTF-8'
    mb_detect_encoding($str, 'UTF-8', true); // false
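
For the BOM-less case, a rough sketch along the same lines is to validate
the raw bytes strictly as UTF-8. The helper name looks_like_utf8() is made
up for illustration, and mbstring is assumed to be available. Two caveats:
pure ASCII also passes, and a positive result only means the bytes are
well-formed UTF-8, not that the author intended UTF-8. In practice,
ISO-8859-x text containing accented characters very rarely validates.

```php
<?php
// Hypothetical helper: guess whether raw bytes are UTF-8 without a BOM.
function looks_like_utf8(string $bytes): bool
{
    // Strict validation: every multi-byte sequence must be well formed.
    return mb_check_encoding($bytes, 'UTF-8');
}

// The same four letters, written as explicit bytes so the result does
// not depend on how this source file itself is saved.
$latin1 = "\xE1\xE9\xF3\xFA";                 // ISO-8859-1 bytes
$utf8   = "\xC3\xA1\xC3\xA9\xC3\xB3\xC3\xBA"; // UTF-8 bytes

var_dump(looks_like_utf8($latin1)); // bool(false)
var_dump(looks_like_utf8($utf8));   // bool(true)
```

For a file, the bytes could come from file_get_contents() before being
passed to the helper.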


WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15

PHP General Mailing List (http://www.php.net/)