Re: [whatwg] Internal character encoding declaration, Drop UTF-32, and UTF and BOM terminology

2007-06-25 Thread Ian Hickson
On Sun, 24 Jun 2007, Peter Karlsson wrote: I don't think forbidding BOCU-1 is a good idea. If there is ever a proper specification written of it, it could be very useful as a compression format for documents. BOCU-1 has been used for security attacks. It's on the no fly list. Do

Re: [whatwg] Internal character encoding declaration, Drop UTF-32, and UTF and BOM terminology

2007-06-24 Thread Peter Karlsson
Ian Hickson: I don't think forbidding BOCU-1 is a good idea. If there is ever a proper specification written of it, it could be very useful as a compression format for documents. BOCU-1 has been used for security attacks. It's on the no fly list. Do you have any references on that, or are

Re: [whatwg] Internal character encoding declaration, Drop UTF-32, and UTF and BOM terminology

2007-06-23 Thread Ian Hickson
On Sat, 11 Mar 2006, Henri Sivonen wrote: I think allowing in-place decoder change (when feasible) would be good for performance. Done. I think it would be beneficial to additionally stipulate that 1. The meta element-based character encoding information declaration is

Re: [whatwg] Internal character encoding declaration

2006-03-16 Thread Henri Sivonen
On Mar 14, 2006, at 15:07, Peter Karlsson wrote: Henri Sivonen on 2006-03-14: Transcoding is very popular, especially in Russia. In *proxies* *today*? What's the point considering that browsers have supported the Cyrillic encoding soup *and* UTF-8 for years? The mod_charset is not

Re: [whatwg] Internal character encoding declaration

2006-03-16 Thread Peter Karlsson
Henri Sivonen on 2006-03-16: Right. So, as a data point, it neither proves nor disproves the legends about transcoding *proxies* around Russia and Japan. The only transcoding proxies I know about are WAP gateways. They tend to do interesting things with input, especially when the source

Re: [whatwg] Internal character encoding declaration

2006-03-16 Thread Ivan Sagalaev
Peter Karlsson wrote: Transcoding is very popular, especially in Russia. Ahem... I wouldn't say it is. Only the most, shall we say, conservative hosters still insist on these archaic setups and refuse to understand that trying to stick everything into windows-1251 has long been unnecessary. But overall

Re: [whatwg] Internal character encoding declaration

2006-03-14 Thread Henri Sivonen
On Mar 14, 2006, at 10:03, Peter Karlsson wrote: Henri Sivonen on 2006-03-11: I think it would be beneficial to additionally stipulate that 1. The meta element-based character encoding information declaration is expected to work only if the Basic Latin range of characters maps to the same

Re: [whatwg] Internal character encoding declaration

2006-03-13 Thread Lachlan Hunt
Henri Sivonen wrote: If a meta element whose http-equiv attribute has the value Content-Type (compare case-insensitively) and whose content attribute has a value that begins with text/html; charset=, the string in the content attribute following the start text/html; charset= is taken, white
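
A minimal sketch of the check quoted in the excerpt above, for illustration only. The function name charset_from_meta and the constant PREFIX are hypothetical. The excerpt is cut off after "white", so whatever processing follows is not modelled here, and matching the content prefix exactly (case-sensitively) is an assumption rather than something the excerpt states.

```python
from typing import Optional

# Literal prefix named in the excerpt. Exact, case-sensitive matching of this
# prefix is an assumption; the excerpt only says the value "begins with" it.
PREFIX = "text/html; charset="


def charset_from_meta(http_equiv: str, content: str) -> Optional[str]:
    """Return the string following the prefix, or None if the meta does not qualify.

    Hypothetical helper: it takes the two attribute values as already-parsed
    strings and does nothing beyond the steps visible in the excerpt.
    """
    # The http-equiv value is compared case-insensitively, as the excerpt says.
    if http_equiv.lower() != "content-type":
        return None
    if not content.startswith(PREFIX):
        return None
    # Take the string following the prefix; later steps are elided in the
    # excerpt and deliberately left out.
    return content[len(PREFIX):]
```

For example, charset_from_meta("content-type", "text/html; charset=utf-8") yields "utf-8", while a meta with any other http-equiv value yields None.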

Re: [whatwg] Internal character encoding declaration

2006-03-13 Thread Henri Sivonen
On Mar 13, 2006, at 16:12, Lachlan Hunt wrote: Henri Sivonen wrote: Authors are advised not to use the UTF-32 encoding or legacy encodings. (Note: I think UTF-32 on the Web is harmful and utterly pointless, I agree about it being pointless, but why is it considered harmful? Opportunity

Re: [whatwg] Internal character encoding declaration

2006-03-11 Thread Henri Sivonen
On Mar 10, 2006, at 22:49, Ian Hickson wrote: I'm actually considering just requiring that UAs support rewinding (by defining the exact semantics of how to parse for the meta header). Is this something people would object to? I think allowing in-place decoder change (when feasible) would

Re: [whatwg] Internal character encoding declaration

2006-03-11 Thread Henri Sivonen
On Mar 11, 2006, at 17:10, Henri Sivonen wrote: Initialize a character decoder that decodes the bytes 0x20–0x7E (inclusive) as well as 0x09, 0x0A and 0x0D to the Unicode code points of the same (zero-extended) value and maps all other bytes to U+FFFD and raises a REWIND flag On further
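
A rough sketch of the provisional decoder described in the excerpt above, for illustration only. The name provisional_decode and the (text, flag) return shape are hypothetical. It also assumes the REWIND flag is raised whenever a byte outside the listed ranges is seen, which is one reading of the excerpt; what the flag triggers afterwards is not shown in the excerpt and is not modelled.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def provisional_decode(data: bytes) -> tuple[str, bool]:
    """Decode with the ASCII-only mapping from the excerpt.

    Returns the decoded text and a boolean standing in for the REWIND flag
    (assumed here to be set when any byte falls outside the listed ranges).
    """
    chars = []
    rewind = False
    for b in data:
        if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D):
            # Same (zero-extended) value as the byte.
            chars.append(chr(b))
        else:
            # Every other byte maps to U+FFFD and raises the flag.
            chars.append(REPLACEMENT)
            rewind = True
    return "".join(chars), rewind
```

With this mapping, markup written in the Basic Latin range decodes unchanged, and any other byte is replaced and leaves the flag set.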

Re: [whatwg] Internal character encoding declaration

2006-03-10 Thread Bjoern Hoehrmann
* Ian Hickson wrote: Currently the behaviour is very underspecified here: http://whatwg.org/specs/web-apps/current-work/#documentEncoding I'd like to rewrite that bit. It will require a lot of research; of existing authoring practices, of current UAs, and of author needs. If anyone wants to

Re: [whatwg] Internal character encoding declaration

2005-08-13 Thread Henri Sivonen
On Aug 8, 2005, at 21:42, Henri Sivonen wrote: and must have the content attribute set to the literal value text/html; charset= should the space after ';' be one or more white space characters? Or actually zero or more. -- Henri Sivonen http://hsivonen.iki.fi/

[whatwg] Internal character encoding declaration

2005-08-08 Thread Henri Sivonen
Quoting from WA1 draft section 2.2.5.1. Specifying and establishing the document's character encoding: The meta element may also be used, in HTML only (not in XHTML) to provide UAs with character encoding information for the file. To do this, the meta element must be the first element in the