(Rasmus wrote:) "If you fwrite UTF-8 data to the file, then it is a UTF-8 
file."

Thanks Rasmus! Honestly, that is REALLY helpful!

I was just coming back here to post that I had found that very same answer. 
But I am glad to hear it confirmed by the experts.

Bottom line: I was just being silly/ignorant.

I went and downloaded a simple hex editor and compared the actual binary
output of several files that I had created using both PHP and my favorite
text editor (EmEditor, from emeditor.com). I then realized what probably
everyone else here already knew: as long as the text contains only plain
ASCII characters, the actual binary output from "Windows-1252",
"ISO-8859-1" and "UTF-8 without the byte order mark" is completely
identical!
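
For anyone who wants to see this without a hex editor, here is a minimal sketch in Python (not PHP, just because it is quick to run anywhere). The helper name `encoded_forms` and the sample strings are made up for illustration; "cp1252" is Python's codec name for Windows-1252.

```python
# Encode the same text under three encodings and compare the raw bytes.
ENCODINGS = ("cp1252", "iso-8859-1", "utf-8")


def encoded_forms(text):
    """Return {encoding name: bytes} for each encoding in ENCODINGS."""
    return {name: text.encode(name) for name in ENCODINGS}


ascii_forms = encoded_forms("Hello, world!")
accent_forms = encoded_forms("caf\u00e9")  # "cafe" with an acute accent on the e

# ASCII-only text: all three encodings produce the same bytes.
print(len(set(ascii_forms.values())) == 1)

# Text with an accented letter: UTF-8 needs two bytes for it (c3 a9),
# while the two single-byte encodings use one byte (e9).
for name, raw in accent_forms.items():
    print(name, raw.hex(" "))
```

So the files only start to differ once a character outside the ASCII range shows up.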

I had the false impression that when a file was saved in UTF-8, there
was an actual binary "marker" that specified this (e.g. binary marker =
"This file is saved in UTF-8!"). There simply is no such thing. The only
thing that would set "UTF-8" apart, binary-wise, is the BOM (the three
bytes EF BB BF at the start of the file), and I had stripped that out,
making the file exactly the same as plain old "ANSI" (since I didn't have
any characters that are encoded differently in UTF-8, such as accented
letters or characters from other languages).
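
A small Python sketch of the BOM point, for illustration only. The function name `strip_utf8_bom` is hypothetical; `codecs.BOM_UTF8` and the "utf-8-sig" codec are part of Python's standard library.

```python
import codecs


def strip_utf8_bom(data):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = "Hello".encode("utf-8-sig")  # "utf-8-sig" writes the BOM first
without_bom = "Hello".encode("utf-8")   # plain UTF-8, no BOM

print(with_bom.hex(" "))
print(without_bom.hex(" "))

# Once the BOM is gone, nothing distinguishes the bytes from "ANSI".
print(strip_utf8_bom(with_bom) == "Hello".encode("cp1252"))
```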

My text editor displays the current character encoding in the status bar,
but since there was no way for it to tell whether the file had been saved
as Windows-1252 or UTF-8, it just displayed that the file was encoded
"windows default - ISO-8859-1". This is where I got confused.
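
To show why an editor can only guess, here is a rough Python sketch of one possible detection heuristic. I have no idea how EmEditor actually does it; the function `guess_encoding` and its return labels are entirely made up for illustration.

```python
import codecs


def guess_encoding(data):
    """Guess an encoding from raw bytes (an illustrative heuristic only)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    if all(byte < 0x80 for byte in data):
        # Pure ASCII: valid in UTF-8, Windows-1252 and ISO-8859-1 alike,
        # so the bytes alone cannot settle the question.
        return "ascii (ambiguous)"
    try:
        data.decode("utf-8")
        return "utf-8 (guessed)"
    except UnicodeDecodeError:
        return "single-byte (e.g. windows-1252)"


print(guess_encoding(b"Hello"))
print(guess_encoding(b"caf\xc3\xa9"))
print(guess_encoding(b"caf\xe9"))
```

With an ASCII-only file, an editor in this situation has nothing to go on and falls back to whatever its default is.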

It turned out that my PHP script had been faithfully saving the file in
UTF-8 the whole time, and everything was fine. I was just not educated
enough about what actually changes when you save a file in UTF-8 that
doesn't contain any characters that differ from ANSI (in my case, the
"change" was nothing, since ALL of the characters in my test document were
interchangeable with ANSI).

Well, this has been a learning experience! I hope that this post will help
some poor ignoramus like myself sometime in the future! :) And hopefully I
am right about what I said above, and not flaunting my ignorance once
again, lol.

Thanks again to everyone who helped me! You guys really got me on the right
track, not least by simply causing me to think more deeply about what I
was asking.

-Jon


"Rasmus Lerdorf" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Jon M. wrote:
>> No matter what I do to the strings to encode them in whatever format 
>> before using "fwrite", it ALWAYS seems to end up writing the actual file 
>> in "iso-8859-1".
>>
>> Isn't the encoding of the characters in PHP's strings, and the encoding 
>> of the actual binary file on your hard drive, two totally different 
>> things? Or am I just misinformed?
>
> A file is completely defined by its contents.  If you fwrite UTF-8 data to 
> the file, then it is a UTF-8 file.  Whether your editor, or whatever it is 
> you are using to determine the file is being written as iso-8859-1 is 
> smart enough to pick this up is a completely different question.
>
> Why don't you try creating the same contents with PHP and with your 
> preferred text editor and then compare the contents.  Perhaps your editor 
> is dropping a hint somewhere in it that you are not writing to the file 
> from PHP.
>
> -Rasmus 

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
