Tom Worster wrote:

Because, I thought, HTML files were basically just text files with
different file extensions, and that those other characters would not
store or display properly if saved in .txt format. Was I mistaken?

yes. see

which says that html uses the UCS, a character-by-character equivalent to
the Unicode character set. so if you use a unicode character encoding (such
as utf-8) then you have a direct encoding for every unicode character in

I wonder if you've looked at any of those non-encoded characters in vi
while shelled in.

so a utf-8 html file or stream should normally have no entities other
than &lt;, &gt;, &amp; and perhaps &quot; as needed.
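
For anyone following along, here is a minimal sketch of that point in PHP. It assumes the script file itself is saved as UTF-8; the function name escape_for_utf8_html() and the sample string are made up for illustration. htmlspecialchars() escapes only the markup-significant characters and leaves the non-English letters alone, while htmlentities() also rewrites every character that has a named entity.

```php
<?php
// Hypothetical helper: escape only what HTML markup requires,
// leaving every other UTF-8 character as-is.
function escape_for_utf8_html(string $text): string
{
    // ENT_QUOTES also escapes single and double quotes,
    // which matters inside attribute values.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$sample = 'Ümit & Çağrı <ok>';

// Only &, < and > are replaced here.
$minimal = escape_for_utf8_html($sample);

// htmlentities() additionally converts letters that have named entities.
$verbose = htmlentities($sample, ENT_QUOTES, 'UTF-8');

echo $minimal, "\n";
echo $verbose, "\n";
```

Whether the characters then display correctly still depends on the page being served and declared as UTF-8.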

Well there it is, that "should" word. I've already named one example
where "should" doesn't work as expected in all cases, here's another
one: Client blog that uses Wordpress and the UTF-8 charset. Text that
is copied & pasted with non-English letters & fancy punctuation marks
displays alright in the body-text of posts, but not in post subject
lines. I think that's because Wordpress is doing some conversions in
the background, but isn't doing them everywhere. So when one of those
is present in the subject line, there's a little glyph that shows up
when viewed in a Web browser. Unless that character is properly encoded
by me. Have you looked at the Wordpress wp-includes/formatting.php file?
Lots of busy work in the form of substituting one thing for another
thing, there.

That is supposed to be a UTF-8 encoded text file, between 1/3 and 1/2
of the characters do not display correctly on my screen.

why this doesn't work for you is not clear. it could be that your browser
has a preference configured to override the charset specified in the http
headers. or perhaps the browser does not observe the specified content type
for txt files.
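
One way to rule out the server side is to send the charset explicitly and check that the bytes really are valid UTF-8. This is only a sketch: content_type_header() and is_valid_utf8() are hypothetical helper names, and it assumes the text is being served through PHP rather than as a static file.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a text resource.
function content_type_header(string $mime, string $charset = 'UTF-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// Hypothetical helper: an empty pattern with the /u modifier matches
// only when the subject is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$body = "Türkçe karakterler: ç ğ ı ö ş ü\n";

// Headers must go out before any output.
if (!headers_sent()) {
    header(content_type_header('text/plain'));
}

echo is_valid_utf8($body) ? $body : "body is not valid UTF-8\n";
```

If the header and the bytes both check out and the text is still garbled, the problem is more likely on the client side.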

I don't think it has anything to do with configuration preferences like
that. I'm running Win2K as indicated earlier, the client is running a Mac
and he sees the same thing I do - in Safari and MacFF both. What OS are
you running? I'd expect XP or Vista to behave slightly differently from
Win2k, and who knows what different Linux distros will do w/out testing.

Either way,
this next link suggests that Turkish characters with no equivalent in
the English language should be encoded for Web display:

don't believe everything you read on the web. while some browsers may
tolerate it, i don't think pages encoded according to those suggestions
would even be valid html.

I'll let you investigate that at - I satisfied
my curiosity about that long before you said anything.

And because that is off-topic, I'll throw this in:

The consensus seems to be that the proposed "ifset()" and "ifempty()"
functions are more effort than they are worth. What I'd like to know
is, why "empty()" still exists when every time I turn around, the
mentors I turn to locally tell me not to use it, to use "isset()"
instead. Because empty() doesn't work with zero. Anyone care to take
a stab at that?

perhaps because it's hard to get rid of language elements without breaking
existing code?

Perhaps. I think Chris said it best, they serve different purposes.
It's been my observation though, they can generally be interchanged
with minor changes to syntax. Unless working with zero.
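
To make the zero case concrete, here is a small sketch comparing the two. The helper names and the sample array are invented for illustration. isset() asks "does this exist and is it not null", while empty() asks "is this missing or falsy", so 0 and '0' are set but also empty.

```php
<?php
// Hypothetical helper: report what isset() and empty() say about one key.
function describe(array $data, string $key): array
{
    return [
        'isset' => isset($data[$key]),
        'empty' => empty($data[$key]),
    ];
}

// Hypothetical helper: "a value was supplied", treating 0 and '0' as valid input.
function has_value(array $data, string $key): bool
{
    return isset($data[$key]) && $data[$key] !== '';
}

$form = [
    'quantity' => 0,      // a legitimate zero
    'code'     => '0',    // zero as a string, e.g. from a form post
    'note'     => '',     // submitted but blank
    'coupon'   => null,   // explicitly null
    'name'     => 'abc',
];

foreach (['quantity', 'code', 'note', 'coupon', 'name', 'missing'] as $key) {
    $r = describe($form, $key);
    printf(
        "%-8s isset=%-5s empty=%-5s has_value=%s\n",
        $key,
        var_export($r['isset'], true),
        var_export($r['empty'], true),
        var_export(has_value($form, $key), true)
    );
}
```

So the two are not interchangeable whenever zero is valid input: empty() rejects it, and isset() on its own accepts a blank string.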


PHP General Mailing List (