Re: [WSG] Encoding odities

2008-07-13 Thread Nikita The Spider The Spider
On Sun, Jul 13, 2008 at 2:49 PM, Mordechai Peller <[EMAIL PROTECTED]> wrote:
> David Hucklesby wrote:
>>
>> FWIW - The META content-type is only relevant to pages read from
>> a local file-- for example, when someone saves your page to disk.
>
> Not true. I recently had some non-local UTF-8 files where some special
> characters weren't displaying properly in IE6. When I added the missing meta
> tag, the problem was solved.

Mordechai, you're correct. The encoding declared in the META tag *can*
be relevant, although it is trumped by what (if anything) is specified
in the HTTP header. When loading pages directly from disk, obviously
there are no HTTP headers involved.
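
The precedence described above can be sketched in a few lines of Python.
This is a simplified illustration, not what any particular browser does:
the function name declared_encoding and the regular expression are
assumptions made for the example, and real browsers apply further
sniffing rules when nothing is declared.

```python
import re
from email.message import Message

# Simplified pattern for <meta http-equiv="Content-Type" content="...; charset=X">.
# Hypothetical helper for illustration; real browsers use a more forgiving parser.
META_RE = re.compile(
    rb'<meta\s+http-equiv=["\']?content-type["\']?\s+'
    rb'content=["\'][^"\']*charset=([\w.:-]+)',
    re.IGNORECASE,
)

def declared_encoding(content_type_header, html_bytes):
    """Return the declared encoding: HTTP header first, META second, else None."""
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()
        if charset:
            return charset  # the HTTP header trumps anything in the markup
    match = META_RE.search(html_bytes[:1024])  # META is only expected near the top
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # nothing declared; a browser would fall back to guessing
```

In this sketch a tag written with name="content-type" rather than
http-equiv="Content-Type" does not match, since HTML 4 defines the
declaration with http-equiv.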

You might find this article interesting:
http://NikitaTheSpider.com/articles/EncodingDivination.html


-- 
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more


***
List Guidelines: http://webstandardsgroup.org/mail/guidelines.cfm
Unsubscribe: http://webstandardsgroup.org/join/unsubscribe.cfm
Help: [EMAIL PROTECTED]
***



Re: [WSG] Encoding odities

2008-07-13 Thread Mordechai Peller

David Hucklesby wrote:
> FWIW - The META content-type is only relevant to pages read from
> a local file-- for example, when someone saves your page to disk.

Not true. I recently had some non-local UTF-8 files where some special
characters weren't displaying properly in IE6. When I added the missing
meta tag, the problem was solved.






Re: [WSG] Encoding odities

2008-07-11 Thread David Hucklesby
On Thu, 10 Jul 2008 15:25:13 +0100, Barney Carroll wrote:
> Thanks for your swift responses, all.
>
> The validator gives me an unconditional pass after putting in Kevin's
> properly-formed tag:
>
> Notepad++ was really nice, but I suspected it was being a bit silly. On a
> Mac I'd use BBEdit (loved that) but PCs seem to be short on really
> head-above-the-crowd open source code editors.
>
> Nikita, that's worth knowing... And yes it is ending up US-ASCII, but I'd
> just like to be sure I'm sticking to the lowest common denominator...
>
> Regards,
> Barney
>
> On 7/10/08, Nikita The Spider The Spider <[EMAIL PROTECTED]> wrote:
>> On Thu, Jul 10, 2008 at 8:27 AM, Barney Carroll
>> <[EMAIL PROTECTED]> wrote:
>>> Hello all,
>>>
>>> I've got a problem with character set encoding I'd like to rectify. I
>>> use UTF-8 as a matter of convenience and ideology, and don't believe it
>>> should be that much of a problem. My editor (Notepad++) is set to create
>>> new files in UTF-8 without a byte order mark, but when I retrieve files
>>> from my server it tells me that they're ANSI.
>>
>> Does "ANSI" mean US-ASCII? The most popular single-byte encodings
>> (ISO-8859-X, Win-1252) and UTF-8 are supersets of US-ASCII, so a
>> US-ASCII file is also valid UTF-8 (and ISO-8859-X and Win-1252) all at
>> the same time. It's pretty easy to write English-language pages that
>> are 100% pure US-ASCII, so this might be your situation. Notepad++ has
>> saved the file as UTF-8, but in this situation that doesn't look any
>> different from US-ASCII (i.e. "ANSI").
>>
[...]
~

FWIW - The META content-type is only relevant to pages read from
a local file-- for example, when someone saves your page to disk.

For files served from the web, browsers look for encoding information
in the response headers. Many servers are set up to send ISO-8859-1
(or the more recent ISO-8859-15). If you want to include glyphs from
the Unicode character set with such encoding, you must use HTML
entities. If you use things like "curly quotes" for example, your text
quickly becomes unreadable. UTF-8 encoding lets you add these as
regular text.
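
To make the difference concrete, here is a small Python sketch. It is
only an illustration: the sample string and variable names are made up
for the example.

```python
text = "\u201cHello\u201d"  # curly double quotes around Hello

# ISO-8859-1 has no code points for curly quotes, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Falling back to numeric character references keeps the file pure ISO-8859-1.
as_entities = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# UTF-8 can represent the same characters directly, three bytes each.
as_utf8 = text.encode("utf-8")
```

The entity form is what ends up in the source as &#8220; and &#8221;,
while the UTF-8 form keeps the characters themselves in the file.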

It's not just the editor that needs to encode things properly. You
also have to make sure the file is uploaded as "binary" rather than
the usual ASCII conversion. Your server must also send the correct
encoding header.

The W3C has information on how to do that using your .htaccess file:

 

Firefox developer tools or Opera "Info" sidebar can tell you what
the headers are.

Hoping this helps.


Cordially,
David
--







Re: [WSG] Encoding odities

2008-07-10 Thread James Ellis
Barney,

One other thing you might want to check is how (mime type, character encoding) 
your web server is serving the file.

If you are using a server side language then most (if not all) can send HTTP
headers to a browser, including a Content-Type header. In PHP, for instance,
you'd do this to send a UTF-8 encoded HTML file:

    header('Content-Type: text/html; charset=utf-8');

and this to send a UTF-8 encoded XML file (and so on ad nauseam):

    header('Content-Type: text/xml; charset=utf-8');
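
For readers not using PHP, roughly the same idea can be sketched with
Python's standard library. This is a minimal illustration, not a
production server: the names utf8_html_response and Utf8PageHandler and
the sample page are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def utf8_html_response(text):
    """Encode text as UTF-8 and build headers that declare that encoding."""
    body = text.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body

class Utf8PageHandler(BaseHTTPRequestHandler):
    # Hypothetical page content, used only for this illustration.
    page = "<!DOCTYPE html><title>Test</title><p>\u201ccurly quotes\u201d</p>"

    def do_GET(self):
        headers, body = utf8_html_response(self.page)
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To try it locally (blocks until interrupted):
#   HTTPServer(("127.0.0.1", 8000), Utf8PageHandler).serve_forever()
```

The point is the same as in the PHP lines: the bytes sent and the
charset named in the header have to agree.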

If you are just serving static HTML pages, without middleware in the way, then 
check your web server config for how it sends text/html pages. In Apache 
you'll find this in the main config file, if you don't have access to that 
then try the .htaccess methods.

Of course, whether or not your document includes characters from other 
character sets that cannot be mapped to UTF8 is another matter ;) - if the 
browser cannot render the character you'll see the odd ?? characters in their 
place.
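
A common way to end up with odd characters is a mismatch between the
encoding of the bytes and the encoding the browser is told to use. The
Python sketch below shows both directions of that mistake. The sample
word is made up for the example, and how a replacement character is drawn
varies by browser and font.

```python
original = "caf\u00e9"                 # 'café'
utf8_bytes = original.encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with the wrong single-byte encoding succeeds but yields mojibake.
misread = utf8_bytes.decode("iso-8859-1")   # 'cafÃ©'

# The reverse mistake: ISO-8859-1 bytes read as UTF-8 are invalid, and a
# lenient decoder substitutes U+FFFD, the replacement character.
latin1_bytes = original.encode("iso-8859-1")               # b'caf\xe9'
replaced = latin1_bytes.decode("utf-8", errors="replace")  # 'caf\ufffd'
```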

Cheers
James


On Friday 11 July 2008 00:25:13 Barney Carroll wrote:
> Thanks for your swift responses, all.
>
> The validator gives me an unconditional pass after putting in Kevin's
> properly-formed tag:
> 
>
> Notepad++ was really nice, but I suspected it was being a bit silly. On a
> Mac I'd use BBEdit (loved that) but PCs seem to be short on really
> head-above-the-crowd open source code editors.
>
> Nikita, that's worth knowing... And yes it is ending up US-ASCII, but I'd
> just like to be sure I'm sticking to the lowest common denominator...
>
> Regards,
> Barney
>
> On 7/10/08, Nikita The Spider The Spider <[EMAIL PROTECTED]> wrote:
> > On Thu, Jul 10, 2008 at 8:27 AM, Barney Carroll
> > <[EMAIL PROTECTED]> wrote:
> > > Hello all,
> > >
> > > I've got a problem with character set encoding I'd like to rectify. I
> > > use UTF-8 as a matter of convenience and ideology, and don't believe it
> > > should be that much of a problem. My editor (Notepad++) is set to create
> > > new files in UTF-8 without a byte order mark, but when I retrieve files
> > > from my server it tells me that they're ANSI.
> >
> > Does "ANSI" mean US-ASCII? The most popular single-byte encodings
> > (ISO-8859-X, Win-1252) and UTF-8 are supersets of US-ASCII, so a
> > US-ASCII file is also valid UTF-8 (and ISO-8859-X and Win-1252) all at
> > the same time. It's pretty easy to write English-language pages that
> > are 100% pure US-ASCII, so this might be your situation. Notepad++ has
> > saved the file as UTF-8, but in this situation that doesn't look any
> > different from US-ASCII (i.e. "ANSI").
> >
> > Here's an ASCII chart. There are a lot of things floating around out
> > there that claim to be "extended ASCII", "Microsoft ASCII", "Updated
> > ASCII", etc. None of them are official. ASCII ends at character 127,
> > end of story.
> > http://www.jimprice.com/jim-asc.shtml
> >
> > Here's a list of valid charset names:
> > http://www.iana.org/assignments/character-sets
> >
> > > I ran an automatic W3C validation of my markup just a second ago after
> > > making some edits and it warns me that no character set encoding was
> > > specified (even though the first tag in my heads is <meta
> > > name="content-type" content="text/html; charset=UTF-8">).
> >
> > For us to figure out why that is, you'll need to share a URL with us.
> >
> >
> >
> > --
> > Philip
> > http://NikitaTheSpider.com/
> > Whole-site HTML validation, link checking and more
> >







Re: [WSG] Encoding odities

2008-07-10 Thread Barney Carroll
Thanks for your swift responses, all.

The validator gives me an unconditional pass after putting in Kevin's
properly-formed tag:


Notepad++ was really nice, but I suspected it was being a bit silly. On a
Mac I'd use BBEdit (loved that) but PCs seem to be short on really
head-above-the-crowd open source code editors.

Nikita, that's worth knowing... And yes it is ending up US-ASCII, but I'd
just like to be sure I'm sticking to the lowest common denominator...

Regards,
Barney

On 7/10/08, Nikita The Spider The Spider <[EMAIL PROTECTED]> wrote:
>
> On Thu, Jul 10, 2008 at 8:27 AM, Barney Carroll
> <[EMAIL PROTECTED]> wrote:
> > Hello all,
> >
> > I've got a problem with character set encoding I'd like to rectify. I use
> > UTF-8 as a matter of convenience and ideology, and don't believe it should
> > be that much of a problem. My editor (Notepad++) is set to create new files
> > in UTF-8 without a byte order mark, but when I retrieve files from my server
> > it tells me that they're ANSI.
>
> Does "ANSI" mean US-ASCII? The most popular single-byte encodings
> (ISO-8859-X, Win-1252) and UTF-8 are supersets of US-ASCII, so a
> US-ASCII file is also valid UTF-8 (and ISO-8859-X and Win-1252) all at
> the same time. It's pretty easy to write English-language pages that
> are 100% pure US-ASCII, so this might be your situation. Notepad++ has
> saved the file as UTF-8, but in this situation that doesn't look any
> different from US-ASCII (i.e. "ANSI").
>
> Here's an ASCII chart. There are a lot of things floating around out
> there that claim to be "extended ASCII", "Microsoft ASCII", "Updated
> ASCII", etc. None of them are official. ASCII ends at character 127,
> end of story.
> http://www.jimprice.com/jim-asc.shtml
>
> Here's a list of valid charset names:
> http://www.iana.org/assignments/character-sets
>
>
>
> > I ran an automatic W3C validation of my markup just a second ago after
> > making some edits and it warns me that no character set encoding was
> > specified (even though the first tag in my heads is <meta
> > name="content-type" content="text/html; charset=UTF-8">).
>
>
> For us to figure out why that is, you'll need to share a URL with us.
>
>
>
> --
> Philip
> http://NikitaTheSpider.com/
> Whole-site HTML validation, link checking and more
>



Re: [WSG] Encoding odities

2008-07-10 Thread Nikita The Spider The Spider
On Thu, Jul 10, 2008 at 8:27 AM, Barney Carroll
<[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I've got a problem with character set encoding I'd like to rectify. I use
> UTF-8 as a matter of convenience and ideology, and don't believe it should
> be that much of a problem. My editor (Notepad++) is set to create new files
> in UTF-8 without a byte order mark, but when I retrieve files from my server
> it tells me that they're ANSI.

Does "ANSI" mean US-ASCII? The most popular single-byte encodings
(ISO-8859-X, Win-1252) and UTF-8 are supersets of US-ASCII, so a
US-ASCII file is also valid UTF-8 (and ISO-8859-X and Win-1252) all at
the same time. It's pretty easy to write English-language pages that
are 100% pure US-ASCII, so this might be your situation. Notepad++ has
saved the file as UTF-8, but in this situation that doesn't look any
different from US-ASCII (i.e. "ANSI").
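
A quick way to see this is to decode the same bytes under each encoding.
The Python sketch below is only an illustration, and the sample markup is
made up for the example.

```python
ascii_bytes = b"<!DOCTYPE html><title>Plain English page</title>"

# Every byte is below 128, so the file is pure US-ASCII...
is_pure_ascii = all(b < 128 for b in ascii_bytes)

# ...and decoding it under any of these encodings gives the identical string.
decoded = {enc: ascii_bytes.decode(enc)
           for enc in ("ascii", "utf-8", "iso-8859-1", "cp1252")}
all_same = len(set(decoded.values())) == 1

# One non-ASCII character is enough to make the encodings diverge.
utf8_e_acute = "\u00e9".encode("utf-8")         # two bytes: b'\xc3\xa9'
latin1_e_acute = "\u00e9".encode("iso-8859-1")  # one byte:  b'\xe9'
```

As long as the file stays within those first 128 code points, a tool has
no way to tell from the bytes alone which of these encodings was used
to save it.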

Here's an ASCII chart. There are a lot of things floating around out
there that claim to be "extended ASCII", "Microsoft ASCII", "Updated
ASCII", etc. None of them are official. ASCII ends at character 127,
end of story.
http://www.jimprice.com/jim-asc.shtml

Here's a list of valid charset names:
http://www.iana.org/assignments/character-sets
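
As a rough local check of a charset label, Python's codec registry can be
queried. Note the assumption here: this tests what Python recognises,
which is not the same thing as the IANA registry, and the helper name
python_codec_name is made up for the example.

```python
import codecs

def python_codec_name(label):
    """Return Python's canonical codec name for a charset label, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None
```

For example, python_codec_name("UTF-8") gives "utf-8", while a label
Python does not know gives None.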


> I ran an automatic W3C validation of my markup just a second ago after
> making some edits and it warns me that no character set encoding was
> specified (even though the first tag in my heads is <meta
> name="content-type" content="text/html; charset=UTF-8">).

For us to figure out why that is, you'll need to share a URL with us.


-- 
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more





RE: [WSG] Encoding odities

2008-07-10 Thread Erickson, Kevin (DOE)
try this:




From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Barney Carroll
Sent: Thursday, July 10, 2008 8:27 AM
To: wsg@webstandardsgroup.org
Subject: [WSG] Encoding odities



Hello all,

I've got a problem with character set encoding I'd like to rectify. I
use UTF-8 as a matter of convenience and ideology, and don't believe it
should be that much of a problem. My editor (Notepad++) is set to create
new files in UTF-8 without a byte order mark, but when I retrieve files
from my server it tells me that they're ANSI.


I ran an automatic W3C validation of my markup just a second ago after
making some edits and it warns me that no character set encoding was
specified (even though the first tag in my heads is <meta
name="content-type" content="text/html; charset=UTF-8">).

More than a little confused about this. Could it be that this is a
contradiction with the fact that my files are somehow converted back to
ANSI by the server or something?

Any help much appreciated,
Regards,
Barney






Re: [WSG] Encoding odities

2008-07-10 Thread Aldona
I had this exact problem with Notepad++ as well. If you open the file in
regular Notepad or another editor you can see the characters which wind
up just before the first official characters (usually the doctype). I
never found a way around the problem but I can say that PSPad is a great
editor. It handles Unix, DOS or Mac files and can deal with lots of
different programming languages.
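
The stray characters described above sound like a UTF-8 byte order mark,
though that is an assumption, since the post does not say so. Under that
assumption, a short Python sketch can detect and strip it. The function
name strip_utf8_bom is made up for the example.

```python
import codecs

def strip_utf8_bom(data):
    """Return (had_bom, data_without_bom) for a bytes object."""
    if data.startswith(codecs.BOM_UTF8):   # the three bytes EF BB BF
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

# Read as Windows-1252, those three bytes show up as three visible characters.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")
```

Running strip_utf8_bom on the raw bytes of a file shows whether the
editor wrote a BOM, whatever its settings claim.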


IceKat

Barney Carroll wrote:


Hello all,

I've got a problem with character set encoding I'd like to rectify. I 
use UTF-8 as a matter of convenience and ideology, and don't believe 
it should be that much of a problem. My editor (Notepad++) is set to 
create new files in UTF-8 without a byte order mark, but when I 
retrieve files from my server it tells me that they're ANSI.


I ran an automatic W3C validation of my markup just a second ago after
making some edits and it warns me that no character set encoding was
specified (even though the first tag in my heads is <meta
name="content-type" content="text/html; charset=UTF-8">).


More than a little confused about this. Could it be that this is a 
contradiction with the fact that my files are somehow converted back 
to ANSI by the server or something?


Any help much appreciated,
Regards,
Barney



