Re: character sets in HTML files?

2001-10-19 Thread Bill Janssen
> > One of the advantages of using Python 2 for parsing is that it can work
> > with a complete 32-bit Unicode charset encoding (UTF-8), rather than
> > just a locale-specific subset, and includes support for transforming
> > many (most) subsets into UTF-8.
>
> My understanding is that you

Re: character sets in HTML files?

2001-10-19 Thread David A. Desrosiers
> One of the advantages of using Python 2 for parsing is that it can work
> with a complete 32-bit Unicode charset encoding (UTF-8), rather than
> just a locale-specific subset, and includes support for transforming
> many (most) subsets into UTF-8.

My understanding is that you need the
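For anyone following along, the "transforming subsets into UTF-8" step quoted above can be sketched in a few lines. This is only an illustration, written for current Python 3 rather than the Python 2 of the thread; the function name `to_utf8` is made up for the example and is not Plucker code. Note that UTF-8 itself is a variable-width encoding of Unicode.

```python
def to_utf8(raw, source_charset):
    """Transcode bytes from a named legacy charset into UTF-8 bytes.

    `to_utf8` is a hypothetical helper for illustration only.
    """
    # Decode the legacy bytes into Unicode text, then re-encode as UTF-8.
    text = raw.decode(source_charset)
    return text.encode("utf-8")


# "café" as ISO-8859-1 bytes: the e-acute is the single byte 0xE9.
latin1_bytes = b"caf\xe9"
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")
```

In UTF-8 the same character takes two bytes (0xC3 0xA9), which is why a parser has to know the source charset before it can convert anything.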

Re: character sets in HTML files?

2001-10-18 Thread Bill Janssen
> Remember, implementing an XML parser is no trivial matter. If the
> XML page or application fails validation, the page is bitbucketed. In the
> current scheme, Plucker tries to make sense of what's left of the broken
> HTML, but with XML, that's not allowed.

Luckily, Python 2 comes with t

Re: character sets in HTML files?

2001-10-18 Thread David A. Desrosiers
> Should plucker just parse XML and feed non-xml stuff to tidy to
> reformat? Just an idea to simplify things. I think it simplifies
> things, at least.

Remember, implementing an XML parser is no trivial matter. If the XML page or application fails validation, the page is bitbucketed.
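To make the strictness point concrete, here is a rough sketch that feeds the same broken markup to an XML parser and to a lenient HTML parser. It assumes current Python 3 and only the standard library (`xml.etree.ElementTree` and `html.parser`); the sample markup and the names `parse_as_xml`, `parse_as_html`, and `TagCollector` are invented for the example and are not what Plucker does. Strictly speaking, the failure shown is a well-formedness error rather than a validation error.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <p> and <br> are never closed, which is common
# in hand-written HTML but not well-formed XML.
BROKEN = "<html><body><p>unclosed paragraph<br></body></html>"


def parse_as_xml(markup):
    """Return True if the markup parses as XML, False if it is rejected."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An XML parser must stop at the first well-formedness error.
        return False


class TagCollector(HTMLParser):
    """Lenient parser that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_html(markup):
    """Return the start tags a tolerant HTML parser finds in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

With the broken sample, `parse_as_xml(BROKEN)` reports failure while `parse_as_html(BROKEN)` still recovers the tag sequence, which is the trade-off behind routing non-XML input through something like tidy first.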

Re: character sets in HTML files?

2001-10-18 Thread MJ Ray
Bill Janssen <[EMAIL PROTECTED]> writes:

> As soon as we add an XML component to the parser... It's on my list.

Should plucker just parse XML and feed non-xml stuff to tidy to reformat? Just an idea to simplify things. I think it simplifies things, at least.

> Actually, if you read the XHTML

Re: character sets in HTML files?

2001-10-17 Thread Bill Janssen
> > I've been reading the HTTP and HTML specs about character sets.
>
> Shouldn't you be using the xhtml specs now?
> --
> MJR

As soon as we add an XML component to the parser... It's on my list.

Actually, if you read the XHTML specs, you'll see that they refer you back to the HTML specs for

Re: character sets in HTML files?

2001-10-17 Thread MJ Ray
Bill Janssen <[EMAIL PROTECTED]> writes:

> I've been reading the HTTP and HTML specs about character sets.

Shouldn't you be using the xhtml specs now?
--
MJR

character sets in HTML files?

2001-10-16 Thread Bill Janssen
I've been reading the HTTP and HTML specs about character sets. The HTTP spec says, "If a page is of type 'text/*', and the HTTP headers don't specify a character set, assume ISO-8859-1". The HTML spec says, "Don't follow the HTTP spec rules about the default being ISO-8859-1", and "Use the HTTP
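As a rough illustration of the two rules paraphrased above, the sketch below reads the charset from a Content-Type header value and makes the ISO-8859-1 fallback optional. It assumes current Python 3 and the standard library's `email.message.Message` for header parsing; the function name `http_charset` and the `assume_http_default` flag are hypothetical and exist only for this example. It covers the header case only.

```python
from email.message import Message


def http_charset(content_type, assume_http_default=True):
    """Pick a charset from a Content-Type header value.

    `http_charset` and `assume_http_default` are illustrative names,
    not part of any spec or of Plucker.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (returned lowercased).
    declared = msg.get_content_charset()
    if declared:
        return declared

    # HTTP rule as quoted: text/* with no charset means ISO-8859-1.
    # The HTML rule as quoted says not to rely on that default, so the
    # fallback can be switched off by the caller.
    if assume_http_default and msg.get_content_maintype() == "text":
        return "iso-8859-1"

    return None
```

Calling `http_charset("text/html")` applies the HTTP default, while `http_charset("text/html", assume_http_default=False)` returns `None`, leaving the caller to look for a charset elsewhere.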