Re: [PHP-I18N] utf-8 characters delivered as question marks involuntary

Darren Cook Fri, 12 Sep 2008 06:14:42 -0700

>     input => output
> A: 3313 => 3313;  
> B: 3313 => 3?13;  
> -------------------------
> 3 -> a variable-length character;
> 1 -> a one byte ascii character;
> ? -> a one byte ascii question mark (literally)
> ...
> Sometimes (probability 1/(2-20)) I get a question mark instead of a
> particular variable-length character when the other variable-length
> characters are displayed correctly within the same page.


A question mark is often substituted for a character that has no code
point in the target charset. So, what is the code point of the character
that turns into a question mark? Does that same code point output okay
some of the time?

Of course, this explanation is a bit dubious if input is UTF-8, internal
encoding is UTF-8 and output is UTF-8.
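
One way to answer the code point question is to dump the raw bytes on
both sides of the processing step. The following is only a minimal
sketch: describe_bytes() is a hypothetical helper name, not part of PHP,
and it assumes the suspect string is available in a variable. A literal
question mark shows up as the single byte 3F, while an intact multi-byte
character shows up as two or more bytes of 80 or above.

```php
<?php
// Hypothetical helper (not a PHP built-in): describe a string's raw bytes
// so that a literal "?" (0x3F) can be told apart from a multi-byte character.
function describe_bytes(string $s): string
{
    // bin2hex() gives two hex digits per byte; split them into pairs.
    $hex = strtoupper(implode(' ', str_split(bin2hex($s), 2)));

    // preg_match() with the /u modifier returns false on malformed UTF-8,
    // so an empty pattern works as a simple validity check.
    $valid = preg_match('//u', $s) === 1;

    return sprintf(
        '%d bytes [%s] valid UTF-8: %s',
        strlen($s),
        $hex,
        $valid ? 'yes' : 'no'
    );
}

// Compare the string before and after the step that is under suspicion.
echo describe_bytes("\xE3\x81\x82"), "\n"; // a complete 3-byte character
echo describe_bytes("?"), "\n";            // a literal question mark
echo describe_bytes("\xE3\x81"), "\n";     // a character cut short
```

If the input dump shows a multi-byte sequence and the output dump shows
3F in the same position, something in between replaced the character
rather than merely displaying it badly.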

> More details for an instant: - Only characters read from a utf-8
> encoded file with a function file_get_contents() seem to be displayed
> correctly at all times; - Characters that are read by the php.exe
> (please excuse me here) do fail once in a while

Ah, could you be reading a partial multi-byte character and processing
it? To take an extreme example: if you read 8 bytes, process them, then
repeat with the next 8 bytes, and the data is an equal mix of characters
1, 2 and 3 bytes long, then there is a high probability you will be
processing garbage.
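
To illustrate, here is a rough sketch of chunked processing that never
cuts a character in half. Both function names are made up for this
example, and it assumes the data is otherwise valid UTF-8 and that it
arrives in fixed-size byte chunks (simulated here with str_split()). The
idea is to hold back any trailing incomplete character and prepend it to
the next chunk.

```php
<?php
// Hypothetical helper: how many leading bytes of $buf form only complete
// UTF-8 characters. Assumes $buf is valid UTF-8 apart from a possibly
// truncated final character.
function utf8_complete_prefix_length(string $buf): int
{
    $len = strlen($buf);
    // Walk back from the end (at most 4 bytes) to find the last lead byte.
    for ($back = 1; $back <= 4 && $back <= $len; $back++) {
        $byte = ord($buf[$len - $back]);
        if (($byte & 0xC0) === 0x80) {
            continue; // 10xxxxxx: continuation byte, keep looking
        }
        // The lead byte tells us how long the character should be.
        if ($byte < 0x80) {
            $need = 1;
        } elseif ($byte >= 0xF0) {
            $need = 4;
        } elseif ($byte >= 0xE0) {
            $need = 3;
        } else {
            $need = 2;
        }
        // If the final character is cut short, leave it out of the prefix.
        return $need > $back ? $len - $back : $len;
    }
    return $len; // empty buffer, or no lead byte found (not valid UTF-8)
}

// Hypothetical helper: split $data into pieces of roughly $chunkBytes bytes,
// carrying any incomplete trailing character over to the next piece.
function split_utf8_chunks(string $data, int $chunkBytes): array
{
    $chunks = [];
    $carry = '';
    foreach (str_split($data, $chunkBytes) as $piece) {
        $buf = $carry . $piece;
        $safe = utf8_complete_prefix_length($buf);
        if ($safe > 0) {
            $chunks[] = substr($buf, 0, $safe);
        }
        // Cast because substr() can return false on older PHP versions.
        $carry = (string) substr($buf, $safe);
    }
    if ($carry !== '') {
        $chunks[] = $carry; // leftover bytes: the input ended mid-character
    }
    return $chunks;
}

// Two 3-byte characters mixed with ASCII, read 2 bytes at a time.
$data = "a\xE3\x81\x82b\xE3\x81\x84c";
foreach (split_utf8_chunks($data, 2) as $chunk) {
    echo bin2hex($chunk), "\n";
}
```

With a plain 2-byte split the second piece would start in the middle of
a character; with the carry-over, each piece that gets processed holds
whole characters only.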

Darren


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

-- 
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
