Re: Security bug in XmlHttpRequest, setRequestHeader()

2012-01-06 Thread Anne van Kesteren
On Fri, 06 Jan 2012 00:26:25 +0100, Hill, Brad bh...@paypal-inc.com  
wrote:
As this behavior is at least partially formally documented in   
http://tools.ietf.org/html/rfc3875#section-4.1.18 , and very widely  
implemented, the algorithm for XHR should be updated to at least  
consider _, and possibly all non-alphanumeric characters, as  
equivalent to - for purposes of comparison to the blacklisted header  
set.
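
For reference, a minimal sketch of the comparison rule proposed above: lower-case
the header name and treat every non-alphanumeric character as equivalent to "-"
before checking it against the forbidden-header list (the list below is
abbreviated and purely illustrative):

    var FORBIDDEN = ['content-length', 'host', 'transfer-encoding']; // abbreviated
    function isForbiddenHeaderName(name) {
      // Map e.g. "Content_Length" and "Content-Length" to the same key.
      var normalized = name.toLowerCase().replace(/[^a-z0-9]/g, '-');
      return FORBIDDEN.indexOf(normalized) !== -1;
    }
    isForbiddenHeaderName('Content_Length');  // true under the proposed rule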


We do not consider this to be an issue. (If it's an issue at all, it's an  
issue with those libraries.)


http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/thread.html#msg1349


--
Anne van Kesteren
http://annevankesteren.nl/



Re: Security bug in XmlHttpRequest, setRequestHeader()

2012-01-06 Thread Julian Reschke

On 2012-01-06 09:49, Anne van Kesteren wrote:

On Fri, 06 Jan 2012 00:26:25 +0100, Hill, Brad bh...@paypal-inc.com
wrote:

As this behavior is at least partially formally documented in
http://tools.ietf.org/html/rfc3875#section-4.1.18 , and very widely
implemented, the algorithm for XHR should be updated to at least
consider _, and possibly all non-alphanumeric characters, as
equivalent to - for purposes of comparison to the blacklisted header
set.


We do not consider this to be an issue. (If it's an issue at all, it's
an issue with those libraries.)

http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/thread.html#msg1349


See also the thread starting 
http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0317.html.


If people are concerned by this, I'd recommend submitting an erratum for 
RFC 3875.


Best regards, Julian



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote:

 But, if the browser does not support UTF-32, then the table in step (4) of
  [1] is supposed to apply, which would interpret the initial two bytes FF
 FE
  as UTF-16LE according to the current language of [1], and further,
 return a
  confidence level of certain.
 
  I see the problem now. It seems that the table in step (4) should be
  changed to interpret an initial FF FE as UTF-16LE only if the following
 two
  bytes are not 00.
 


 That wouldn't actually bring browsers and the spec closer together; it
 would actually bring them further apart.


 At first glance, it looks like it makes the spec allow WebKit and IE's
 behavior, which (unfortunately) includes UTF-32 detection, by allowing them
 to fall through to step 7, where they're allowed to detect things however
 they want.


 However, that's ignoring step 5.  If step 4 passes through, then step 5
 would happen next.  That means this carefully-constructed file would be
 detected as UTF-8 by step 5:


 http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding


 That's not what happens in any browser; FF detects it as UTF-16 and WebKit
 and IE detect it as UTF-32.  This change would require it to be detected as
 UTF-8, which would have security implications if implemented, e.g. a page
 outputting escaped user-inputted text in UTF-32 might contain a string like
 this, followed by a hostile script, when interpreted as UTF-8.


 This really isn't worth spending time on; you're free to press this if you
 like, but I'm moving on.


 --
 Glenn Maynard


I'm getting responseType json landed in WebKit, and going to do so
without the restriction of the JSON source being UTF-8.  We default our
decoding to UTF-8 if none is dictated by the server or overrideMIMEType(),
but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or
UTF-32(BE/LE) if the context is encoded as such, and accept the source
as-is.

It's a matter of having that perfect recipe of easiest implementation +
most interoperability.  It actually adds complication to our decoder if we
do something special just for (perfectly legit) JSON payloads.  I think
keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will
be reducing our interoperability and complicating our code base.  If we
don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the
JSON grammar and JSON.parse will do the leg work.  As someone else stated,
this is a good fight but probably not the right battlefield.
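
To make the decoding behavior described above concrete, here is a minimal sketch
of BOM-based switching (illustrative only, not WebKit's actual decoder; the
UTF-32LE check has to precede the UTF-16LE one because its BOM starts with FF FE):

    function sniffBOM(bytes) {  // bytes: the first four octets of the body
      if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return 'UTF-8';
      if (bytes[0] === 0xFF && bytes[1] === 0xFE && bytes[2] === 0x00 && bytes[3] === 0x00) return 'UTF-32LE';
      if (bytes[0] === 0x00 && bytes[1] === 0x00 && bytes[2] === 0xFE && bytes[3] === 0xFF) return 'UTF-32BE';
      if (bytes[0] === 0xFF && bytes[1] === 0xFE) return 'UTF-16LE';
      if (bytes[0] === 0xFE && bytes[1] === 0xFF) return 'UTF-16BE';
      return null;  // no BOM: fall back to the declared charset, else UTF-8
    }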


Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
Please be careful with quote markers; you quoted text written by me as
written by Glenn Adams.

On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote:
 I'm getting responseType json landed in WebKit, and going to do so without
 the restriction of the JSON source being UTF-8.  We default our decoding to
 UTF-8 if none is dictated by the server or overrideMIMEType(), but we also
 do BOM detection and will gracefully switch to UTF-16(BE/LE) or
 UTF-32(BE/LE) if the context is encoded as such, and accept the source
 as-is.

 It's a matter of having that perfect recipe of easiest implementation +
 most interoperability.  It actually adds complication to our decoder if we

Accepting content that other browsers don't will result in pages being
created that work only in WebKit.  That gives the least
interoperability, not the most.

If this behavior gets propagated into other browsers, that's even
worse.  Gecko doesn't support UTF-32, and adding it would be a huge
step backwards.

 do something special just for (perfectly legit) JSON payloads.  I think
 keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be
 reducing our interoperability and complicating our code base.  If we don't
 want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON
 grammar and JSON.parse will do the leg work.

Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec.

Also, I'm a bit confused.  You talk about the rudimentary encoding
detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
mechanisms (HTTP headers and overrideMimeType).  These are separate
and unrelated.  If you're using HTTP mechanisms, then the JSON spec
doesn't enter into it.  If you're using both HTTP headers (HTTP) and
UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
two.  I can't tell what mechanism you're actually using.

 As someone else stated, this is a good fight but probably not the right 
 battlefield.

Strongly disagree.  Preventing legacy messes from being perpetuated
into new APIs is one of the *only* battlefields available, where we
can get people to stop using legacy encodings without breaking
existing content.

Anne: There's one related change I'd suggest.  Currently, if a JSON
response says Content-Type: application/json; charset=Shift_JIS,
the explicit charset will be silently ignored and UTF-8 will be used.
I think this should be explicitly rejected, returning null as the JSON
response entity body.  Don't decode as UTF-8 despite an explicitly
conflicting header, or people will start sending bogus charset values
without realizing it.
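
From the page author's side, the proposed behavior would look roughly like this
(the endpoint is hypothetical; the point is only that a mislabeled response
yields null rather than a silently UTF-8-decoded object):

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/data.json');  // hypothetical endpoint
    xhr.responseType = 'json';
    xhr.onload = function () {
      // If the server sent "Content-Type: application/json; charset=Shift_JIS",
      // the suggestion above is that xhr.response be null instead of the result
      // of decoding the body as UTF-8.
      console.log(xhr.response);
    };
    xhr.send();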

-- 
Glenn Maynard



Re: [XHR] responseType json

2012-01-06 Thread Julian Reschke

On 2012-01-06 17:20, Glenn Maynard wrote:
 ...

Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec.
...


You seem to feel strongly about this (and I might agree for UTF-32).

How about raising this issue in a place where there's an actual chance 
to cause changes? (- IETF apps-discuss)


Best regards, Julian



Re: [XHR] responseType json

2012-01-06 Thread Boris Zbarsky

On 1/6/12 11:20 AM, Glenn Maynard wrote:

Accepting content that other browsers don't will result in pages being
created that work only in WebKit.  That gives the least
interoperability, not the most.


I assume Jarred was talking about interoperability with content, not 
with other browsers.


And thus start most races to the bottom in web-land

-Boris



Re: [XHR] responseType json

2012-01-06 Thread Julian Reschke

On 2012-01-06 17:56, Boris Zbarsky wrote:

On 1/6/12 11:20 AM, Glenn Maynard wrote:

Accepting content that other browsers don't will result in pages being
created that work only in WebKit. That gives the least
interoperability, not the most.


I assume Jarred was talking about interoperability with content, not
with other browsers.

And thus start most races to the bottom in web-land


One could argue that it isn't a race to the bottom when the component 
accepts what is defined as valid (by the media type); and that the real 
problem is that another spec tries to profile that.


Best regards, Julian



RE: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

2012-01-06 Thread Young, Milan
The HTML Speech XG worked for over a year prioritizing use cases against
timelines and packaged all of that into a recommendation complete with
IDLs and examples.  So while I understand that WebApps may not have the
time to review the entirety of this work, it's hard to see how
dissecting it would speed the process of understanding.

 

Perhaps a better approach would be to find half an hour to present to
select members of WebApps the content of the recommendation and the
possible relevance to their group.  Does that sound reasonable?

 

Thanks

From: Glen Shires [mailto:gshi...@google.com] 
Sent: Wednesday, January 04, 2012 11:15 PM
To: public-webapps@w3.org
Cc: public-xg-htmlspe...@w3.org; Arthur Barstow; Dan Burnett
Subject: Speech Recognition and Text-to-Speech Javascript API - seeking
feedback for eventual standardization

 

As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
recently wrapped up its work on use cases, requirements, and proposals
for adding automatic speech recognition (ASR) and text-to-speech (TTS)
capabilities to HTML.  The work of the group is documented in the
group's Final Report. [2]  The members of the group intend this work to
be input to one or more working groups, in W3C and/or other standards
development organizations such as the IETF, as an aid to developing full
standards in this space.

 

Because that work was so broad, Art Barstow asked (below) for a
relatively specific proposal.  We at Google are proposing that a subset
of it be accepted as a work item by the Web Applications WG.
Specifically, we are proposing this Javascript API [3], which enables
web developers to incorporate speech recognition and synthesis into
their web pages. This simplified subset enables developers to use
scripting to generate text-to-speech output and to use speech
recognition as an input for forms, continuous dictation and control, and
it supports the majority of use-cases in the Incubator Group's Final
Report.

 

We welcome your feedback and ask that the Web Applications WG consider
accepting this Javascript API [3] as a work item.

 

[1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter

[2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

[3] API:
http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html

 

Bjorn Bringert

Satish Sampath

Glen Shires

 

On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires gshi...@google.com
wrote:

Milan,

The IDLs contained in both documents are in the same format and order,
so it's relatively easy to compare the two side-by-side:
http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#speechreco-section
and
http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#api_description
The semantics of the attributes, methods and events have not changed,
and both IDLs link directly to the definitions contained in the Speech
XG Final Report.

 

As you mention, we agree that the protocol portions of the Speech XG
Final Report are most appropriate for consideration by a group such as
IETF, and believe such work can proceed independently, particularly
because the Speech XG Final Report has provided a roadmap for these to
remain compatible.  Also, as shown in the Speech XG Final Report -
Overview
(http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#introductory),
the Speech Web API is not dependent on the Speech
Protocol and a Default Speech service can be used for local or remote
speech recognition and synthesis.

 

Glen Shires

 

On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan milan.yo...@nuance.com
wrote:

Hello Glen,

 

The proposal says that it contains a simplified subset of the
JavaScript API.  Could you please clarify which elements of the
HTMLSpeech recommendation's JavaScript API were omitted?   I think this
would be the most efficient way for those of us familiar with the XG
recommendation to evaluate the new proposal.

 

I'd also appreciate clarification on how you see the protocol being
handled.  In the HTMLSpeech group we were thinking about this as a
hand-in-hand relationship between W3C and IETF like WebSockets.  Is this
still your (and Google's) vision?

 

Thanks

From: Glen Shires [mailto:gshi...@google.com] 
Sent: Thursday, December 22, 2011 11:14 AM
To: public-webapps@w3.org; Arthur Barstow
Cc: public-xg-htmlspe...@w3.org; Dan Burnett


Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
standardization

 

We at Google believe that a scripting-only (Javascript) subset of the
API defined in the Speech XG Incubator Group Final Report is of
appropriate scope for consideration by the WebApps WG.

 

The enclosed scripting-only subset supports the majority of the
use-cases and samples in the XG proposal. Specifically, it enables
web-pages to generate speech output and to use speech recognition as an
input for forms, continuous dictation and control. The Javascript API
will allow 

Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 11:20 AM, Glenn Maynard gl...@zewt.org wrote:

 Please be careful with quote markers; you quoted text written by me as
 written by Glenn Adams.


Sorry, copying from the archives into Gmail is a pain.



 On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org
 wrote:
  I'm getting responseType json landed in WebKit, and going to do so
 without
  the restriction of the JSON source being UTF-8.  We default our decoding
 to
  UTF-8 if none is dictated by the server or overrideMIMEType(), but we
 also
  do BOM detection and will gracefully switch to UTF-16(BE/LE) or
  UTF-32(BE/LE) if the context is encoded as such, and accept the source
  as-is.
 
  It's a matter of having that perfect recipe of easiest implementation +
  most interoperability.  It actually adds complication to our decoder if
 we

 Accepting content that other browsers don't will result in pages being
 created that work only in WebKit.


WebKit is used in many walled garden environments, so we consider these
scenarios, but as a secondary goal to our primary goal of being a standards
compliant browser engine.  The point being, there will always be content
that's created solely for WebKit, so that's not a good argument to make.
 So generally speaking, if someone is aiming to create content that's
x-browser compatible, they'll do just that and use the least common
denominators.


  That gives the least
 interoperability, not the most.


 If this behavior gets propagated into other browsers, that's even
 worse.  Gecko doesn't support UTF-32, and adding it would be a huge
 step backwards.


We're not adding anything here, it's a matter of complicating and taking
away from our decoder for one particular case.  You're acting like we're
adding UTF-32 support for the first time.



  do something special just for (perfectly legit) JSON payloads.  I think
  keeping that UTF-8 bit in the spec is fine, but I don't think WebKit
 will be
  reducing our interoperability and complicating our code base.  If we
 don't
  want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON
  grammar and JSON.parse will do the leg work.

 Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF
 spec.


So let's change the IETF spec as well - are we even fighting that battle
yet?



 Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.






  As someone else stated, this is a good fight but probably not the right
 battlefield.

 Strongly disagree.  Preventing legacy messes from being perpetuated
 into new APIs is one of the *only* battlefields available, where we
 can get people to stop using legacy encodings without breaking
 existing content.


"without breaking existing content" - and yet killing UTF-16 and UTF-32
support just for responseType json would break existing UTF-16 and UTF-32
JSON.  Well, which is it?

Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
for the web platform.  But it's also plausible to push these restrictions
not just in one spot in XHR, but across the web platform and also where the
web platform defers to external specs (e.g. JSON).  In this particular
case, an author will be more likely to just use responseText + JSON.parse
for content he/she cannot control - the content won't end up changing and
our initiative is circumvented.

I suggest taking this initiative elsewhere (at least in parallel), i.e.,
getting RFC4627 to only support UTF-8 encoding if that's the larger
picture.  To say that a legit JSON source can be stored as any Unicode
encoding but can only be transported as UTF-8 in this one particular XHR
case is inconsistent and only leads to worse interoperability and confusion
to those looking up these specs - if I go to JSON spec first, I'll see all
those encodings are supported and wonder why it doesn't work in this one
instance.  Are we out to totally confuse the hell out of authors?



 Anne: There's one related change I'd suggest.  Currently, if a JSON
 response says Content-Type: application/json; charset=Shift_JIS,
 the explicit charset will be silently ignored and UTF-8 will be used.
 I think this should be explicitly rejected, returning null as the JSON
 response entity body.  Don't decode as UTF-8 despite an explicitly
 conflicting header, or people will start sending bogus charset values
 without realizing it.


+1


 --
 Glenn Maynard


Re: [XHR] responseType json

2012-01-06 Thread Boris Zbarsky

On 1/6/12 12:13 PM, Jarred Nicholls wrote:

WebKit is used in many walled garden environments, so we consider these
scenarios, but as a secondary goal to our primary goal of being a
standards compliant browser engine.  The point being, there will always
be content that's created solely for WebKit, so that's not a good
argument to make.  So generally speaking, if someone is aiming to create
content that's x-browser compatible, they'll do just that and use the
least common denominators.


People never aim to create content that's cross-browser compatible per 
se, with a tiny minority of exceptions.


People aim to create content that reaches users.

What that means is that right now people are busy authoring webkit-only 
websites on the open web because they think that webkit is the only UA 
that will ever matter on mobile.  And if you point out this assumption 
to these people, they will tell you right to your face that it's a 
perfectly justified assumption.  The problem is bad enough that both 
Trident and Gecko have seriously considered implementing support for 
some subset of -webkit CSS properties.  Note that "people" here includes 
divisions of Google.


As a result, any time WebKit deviates from standards, that _will_ 100% 
guaranteed cause sites to be created that depend on those deviations; 
the other UAs then have the choice of not working on those sites or 
duplicating the deviations.


We've seen all this before, circa 2001 or so.

Maybe in this particular case it doesn't matter, and maybe the spec in 
this case should just change, but if so, please argue for that, as the 
rest of your mail does, not for the principle of shipping random spec 
violations just because you want to.   In general if WebKit wants to do 
special webkitty things in walled gardens that's fine.  Don't pollute 
the web with them if it can be avoided.  Same thing applies to other 
UAs, obviously.


-Boris



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 3:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 1/6/12 12:13 PM, Jarred Nicholls wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a
 standards compliant browser engine.  The point being, there will always
 be content that's created solely for WebKit, so that's not a good
 argument to make.  So generally speaking, if someone is aiming to create
 content that's x-browser compatible, they'll do just that and use the
 least common denominators.


 People never aim to create content that's cross-browser compatible per se,
 with a tiny minority of exceptions.

 People aim to create content that reaches users.

 What that means is that right now people are busy authoring webkit-only
 websites on the open web because they think that webkit is the only UA that
 will ever matter on mobile.  And if you point out this assumption to these
 people, they will tell you right to your face that it's a perfectly
 justified assumption.  The problem is bad enough that both Trident and
 Gecko have seriously considered implementing support for some subset of
 -webkit CSS properties.  Note that "people" here includes divisions of
 Google.

 As a result, any time WebKit deviates from standards, that _will_ 100%
 guaranteed cause sites to be created that depend on those deviations; the
 other UAs then have the choice of not working on those sites or duplicating
 the deviations.

 We've seen all this before, circa 2001 or so.

 Maybe in this particular case it doesn't matter, and maybe the spec in
 this case should just change, but if so, please argue for that, as the rest
 of your mail does, not for the principle of shipping random spec violations
 just because you want to.


I think my entire mail was quite clear that the spec is inconsistent with
rfc4627 and perhaps that's where the changes need to happen, or else yield
to it.  Let's not be dogmatic here, I'm just pointing out the obvious
disconnect.

This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.  This is a 2-way street, and often times
it's the spec that needs to change, not the implementation.  The point is,
there needs to be a very compelling reason to breach the contract of a
media type's existing spec that would yield inconsistent results from the
rest of the web platform layers, and involve taking away functionality that
is working perfectly fine and can handle all the legit content that's
already out there (as rare as it might be).

Let's get Crockford on our side, let him know there's a lot of support for
banishing UTF-16 and UTF-32 forever and change rfc4627.


   In general if WebKit wants to do special webkitty things in walled
 gardens that's fine.  Don't pollute the web with them if it can be avoided.
  Same thing applies to other UAs, obviously.


IE and WebKit have gracefully handled UTF-32 for a long time in other parts
of the platform, and despite it being an unsupported codec of the HTML
spec, they've continued to do so.  I've had nothing to do with this, so I'm
not to be held responsible for its present perpetuation ;)  My argument is
focused around the JSON media type's spec, which blatantly contradicts.




 -Boris




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2012-01-06 Thread Ms2ger

On 01/06/2012 10:28 PM, Jarred Nicholls wrote:

This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.


With this kind of attitude, frankly, you shouldn't be implementing a spec.

HTH
Ms2ger




Re: Use Cases for Connectionless Push support in Webapps recharter

2012-01-06 Thread Bryan Sullivan
That is correct, the essential value in notification bearer flexibility is
resource conservation and contextual adaptability (e.g. bearer selection when
conditions warrant a change or limit choices).

On Wednesday, January 4, 2012, Charles Pritchard ch...@jumis.com wrote:
 a) Don't drain the battery.
 b) Don't waste bandwidth.
 c) Don't use the more expensive connection when a less expensive
connection is also available.


 On Jan 4, 2012, at 6:38 PM, Glenn Adams gl...@skynav.com wrote:

 what are the qualitative differences (if any) between these three use
cases?

 On Tue, Jan 3, 2012 at 5:51 PM, Bryan Sullivan bls...@gmail.com wrote:

 I had an action item to provide some use cases for the Webapps
 recharter process, related to the Push based on extending server-sent
 events topic at the last F2F (draft API proposal that was presented:
 http://bkaj.net/w3c/eventsource-push.html).

 The intent of the action item was to establish a basis for a Webapps
 charter item related to extending eventsource (or coming up with a new
 API) for the ability to deliver arbitrary notifications/data to
 webapps via connectionless bearers, as informationally described in
 Server-Sent Events (http://dev.w3.org/html5/eventsource/).

 Here are three use cases:

 1)  One of Bob’s most-used apps is a social networking webapp which
 enables him to remain near-realtime connected to his friends and
 colleagues. During his busy social hours, when he’s out clubbing, his
 phone stays pretty much connected full time, with a constant stream of
 friend updates. He likes to remain just as connected though during
 less-busy times, for example during the workday as friends post their
 lunch plans or other updates randomly. While he wants his favorite app
 to remain ready to alert him, he doesn’t want the app to drain his
 battery just to remain connected during low-update periods.

 2)  Alice is a collector, and is continually watching or bidding in
 various online auctions. When auctions are about to close, she knows
 the activity can be fast and furious and is usually watching her
 auction webapp closely. But in the long slow hours between auction
 closings, she still likes for her webapp to alert her about bids and
 other auction updates as they happen, without delay. She needs for her
 auction webapp to enable her to continually watch multiple auctions
 without fear that its data usage during the slow periods will
 adversely impact her profits.

 3)  Bob uses a web based real-time communications service and he
wants
 to be available to his friends and family even when his application is
 not running. Bob travels frequently and it is critical for him to
 optimize data usage and preserve battery. Bob’s friends can call him
 up to chat using video/audio or just text and he wants to make sure
 they can reach him irrespective of what device and what network he is
 connected to at any given time.

 Comments/questions?

 --
 Thanks,
 Bryan Sullivan




-- 
Thanks,
Bryan Sullivan


Re: [XHR] responseType json

2012-01-06 Thread Ojan Vafai
On Fri, Jan 6, 2012 at 12:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 1/6/12 12:13 PM, Jarred Nicholls wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a
 standards compliant browser engine.  The point being, there will always
 be content that's created solely for WebKit, so that's not a good
 argument to make.  So generally speaking, if someone is aiming to create
 content that's x-browser compatible, they'll do just that and use the
 least common denominators.


 People never aim to create content that's cross-browser compatible per se,
 with a tiny minority of exceptions.

 People aim to create content that reaches users.

 What that means is that right now people are busy authoring webkit-only
 websites on the open web because they think that webkit is the only UA that
 will ever matter on mobile.  And if you point out this assumption to these
 people, they will tell you right to your face that it's a perfectly
 justified assumption.  The problem is bad enough that both Trident and
 Gecko have seriously considered implementing support for some subset of
 -webkit CSS properties.  Note that "people" here includes divisions of
 Google.

 As a result, any time WebKit deviates from standards, that _will_ 100%
 guaranteed cause sites to be created that depend on those deviations; the
 other UAs then have the choice of not working on those sites or duplicating
 the deviations.

 We've seen all this before, circa 2001 or so.

 Maybe in this particular case it doesn't matter, and maybe the spec in
 this case should just change, but if so, please argue for that, as the rest
 of your mail does, not for the principle of shipping random spec violations
 just because you want to.   In general if WebKit wants to do special
 webkitty things in walled gardens that's fine.  Don't pollute the web with
 them if it can be avoided.  Same thing applies to other UAs, obviously.


I'm ambivalent about whether we should restrict to utf8 or not. On the one
hand, having everyone on utf8 would greatly simplify the web. On the other
hand, I can imagine this hurting download size for japanese/chinese
websites (i.e. they'd want utf-16).

I agree with Boris that we don't need to pollute the web if we want to
expose this to WebKit's walled-garden environments. We have mechanisms for
exposing things only to those environments specifically to avoid this
problem. Lets keep this discussion focused on what's best for the web. We
can make WebKit do the appropriate thing.


Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote:

 On 01/06/2012 10:28 PM, Jarred Nicholls wrote:

 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.


 With this kind of attitude, frankly, you shouldn't be implementing a spec.


I resent that comment, because I'm one of the few that fight in WebKit to
get us 100% spec compliant in XHR (don't even get me started with how many
violations there are in Firefox, IE, and Opera...WebKit isn't the only
one mind you), but that doesn't mean any spec addition, as fluid as it is
in the early stages, is gospel.  In this case I simply think it wasn't
debated enough before going in - actually it wasn't debated at all, it was
just placed in there and now I'm a bad guy for pointing out its disconnect?
 I think your attitude is far poorer.

The web platform changes all the time - if this matter is shored up, then
implementations will change accordingly.



 HTH
 Ms2ger





Re: [XHR] responseType json

2012-01-06 Thread Bjoern Hoehrmann
* Jarred Nicholls wrote:
This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.  This is a 2-way street, and often times
it's the spec that needs to change, not the implementation.  The point is,
there needs to be a very compelling reason to breach the contract of a
media type's existing spec that would yield inconsistent results from the
rest of the web platform layers, and involve taking away functionality that
is working perfectly fine and can handle all the legit content that's
already out there (as rare as it might be).

You have yet to explain how you propose WebKit should behave, and it is
rather unclear to me whether the proposed behavior is in line with the
existing HTTP, MIME, and JSON specifications. A HTTP response with

  Content-Type: application/json;charset=iso-8859-15

for instance must not be treated as ISO-8859-15 encoded as there is no
charset parameter for the application/json media type, and there is no
other reason to treat it as ISO-8859-15, so it's either an error, or
you silently ignore the unrecognized parameter.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:58 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 Long experience shows that people who say things like "I'm going to code
 against the Rec instead of the draft, because the Rec is more stable"


I know that's a common error, but I never said I was going against a Rec.
 My point was that the editor's draft is fluid enough that it can be
debated and changed, as it's clearly not perfect at any point in time.
 Debating a change to it doesn't put anyone in the wrong, and certainly
doesn't mean I'm violating it - because tomorrow, my proposed violation
could be the current state of the spec.



 RFC4627, for example, is six years old.  This was right about the
 beginning of the time when "UTF-8 everywhere, dammit" was really
 starting to gain hold as a reasonable solution to encoding hell.
 Crockford, as well, is not a browser dev, nor is he closely connected
 to browser devs in a capacity that would really inform him of why
 supporting multiple encodings on the web is so painful.  So, looking
 to that RFC for guidance on current best-practice is not a good idea.

 This issue has been debated and argued over for a long time, far
 predating the current XHR bit.  There's a reason why new file formats
 produced in connection with web stuff are utf8-only.  It's good for
 the web if we're consistent about this.


 ~TJ



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:54 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 * Jarred Nicholls wrote:
 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.  This is a 2-way street, and often times
 it's the spec that needs to change, not the implementation.  The point is,
 there needs to be a very compelling reason to breach the contract of a
 media type's existing spec that would yield inconsistent results from the
 rest of the web platform layers, and involve taking away functionality
 that
 is working perfectly fine and can handle all the legit content that's
 already out there (as rare as it might be).

 You have yet to explain how you propose WebKit should behave, and it is
 rather unclear to me whether the proposed behavior is in line with the
 existing HTTP, MIME, and JSON specifications. A HTTP response with

  Content-Type: application/json;charset=iso-8859-15

 for instance must not be treated as ISO-8859-15 encoded as there is no
 charset parameter for the application/json media type, and there is no
 other reason to treat it as ISO-8859-15, so it's either an error, or
 you silently ignore the unrecognized parameter.


I think the spec should clarify this.  I agree with Glenn Maynard's
proposal: if a server sends a specific charset to use that isn't UTF-8, we
should explicitly reject it, never decode or parse the text and return
null.  Silently decoding in UTF-8 when the server or author is dictating
something different could cause confusion.


 --
 Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
 Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a standards
 compliant browser engine.  The point being, there will always be content
 that's created solely for WebKit, so that's not a good argument to make.
  So generally speaking, if someone is aiming to create content that's
 x-browser compatible, they'll do just that and use the least common
 denominators.


If you support UTF-16 here, then people will use it.  That's always the
pattern on the web--one browser implements something extra, and everyone
else ends up having to implement it--whether or not it was a good
idea--because people accidentally started depending on it.  I don't know
why we have to keep repeating this mistake.

We're not adding anything here, it's a matter of complicating and taking
 away from our decoder for one particular case.  You're acting like we're
 adding UTF-32 support for the first time.


Of course you are; you're adding UTF-16 and UTF-32 support to the
responseType == json API.

Also, since JSON uses zero-byte detection, which isn't used by HTML at all,
you'd still need code in your decoder to support that--which means you're
forcing everyone else to complicate *their* decoders with this special case.

XHR's behavior, if the change I suggested is accepted, shouldn't require
special cases in a decoding layer.  I'd have the decoder expose the final
encoding in use (which I'd expect to be available already), and when
.response is queried, return null if the final encoding used by the decoder
wasn't UTF-8.  This means the decoding would still take place for other
encodings, but the end result would be discarded by XHR.  This puts the
handling for this restriction within the XHR layer, rather than at the
decoder layer.
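
A sketch of that layering, with hypothetical names (this is not a real browser
interface, only the shape of the check being described):

    function jsonResponse(decodedText, finalEncoding) {
      // finalEncoding is whatever encoding the decoder reports it actually used.
      if (finalEncoding !== 'UTF-8') return null;  // refuse non-UTF-8 JSON bodies
      try {
        return JSON.parse(decodedText);
      } catch (e) {
        return null;  // a parse failure also yields a null response entity body
      }
    }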

I said:

  Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.


Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte
detection.  My question remains, though: what exactly are you doing?  Do
you do zero-byte detection?  Do you do BOM detection?  What's the order of
precedence between zero-byte and/or BOM detection, HTTP Content-Type
headers, and overrideMimeType if they disagree?  All of this would need to
be specified; currently none of it is.
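
For reference, the zero-byte detection in RFC 4627 section 3 keys off the
null-byte pattern of the first four octets (the first two characters of a JSON
text are always ASCII), roughly:

    function detectJsonEncoding(b) {  // b: the first four octets
      if (b[0] === 0 && b[1] === 0 && b[2] === 0 && b[3] !== 0) return 'UTF-32BE';
      if (b[0] === 0 && b[1] !== 0 && b[2] === 0 && b[3] !== 0) return 'UTF-16BE';
      if (b[0] !== 0 && b[1] === 0 && b[2] === 0 && b[3] === 0) return 'UTF-32LE';
      if (b[0] !== 0 && b[1] === 0 && b[2] !== 0 && b[3] === 0) return 'UTF-16LE';
      return 'UTF-8';  // no null bytes in the first four octets
    }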



 without breaking existing content and yet killing UTF-16 and UTF-32
 support just for responseType json would break existing UTF-16 and UTF-32
 JSON.  Well, which is it?


This is a new feature; there isn't yet existing content using a
responseType of json to be broken.

 Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
 for the web platform.  But it's also plausible to push these restrictions
 not just in one spot in XHR, but across the web platform


I've yet to see a workable proposal to do this across the web platform, due
to backwards-compatibility.  That's why it's being done more narrowly,
where it can be done without breaking existing pages.  If you have any
novel ideas to do this across the platform, I guarantee everyone on the
list would like to hear them.  Failing that, we should do what we can where
we can.

and also where the web platform defers to external specs (e.g. JSON).  In
 this particular case, an author will be more likely to just use
 responseText + JSON.parse for content he/she cannot control - the content
 won't end up changing and our initiative is circumvented.


Of course not.  It tells the developer that something's wrong, and he has
the choice of working around it or fixing his service.  If just 25% of
those people make the right choice, this is a win.  It also helps
discourage new services from being written using legacy encodings.  We
can't stop people from doing the wrong thing, but that doesn't mean we
shouldn't point people in the right direction.

This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.


This is the worst thing I've seen anyone say in here in a long time.

On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke julian.resc...@gmx.de wrote:

 One could argue that it isn't a race to the bottom when the component
 accepts what is defined as valid (by the media type); and that the real
 problem is that another spec tries to profile that.


First off, it's common and perfectly normal for an API exposing features
from another spec to explicitly limit the allowed profile of that spec.
Saying JSON through this API must be UTF-8 is perfectly OK.

Second, this 

Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls


Sent from my iPhone

On Jan 6, 2012, at 7:11 PM, Glenn Maynard gl...@zewt.org wrote:

 On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote:
 WebKit is used in many walled garden environments, so we consider these 
 scenarios, but as a secondary goal to our primary goal of being a standards 
 compliant browser engine.  The point being, there will always be content 
 that's created solely for WebKit, so that's not a good argument to make.  So 
 generally speaking, if someone is aiming to create content that's x-browser 
 compatible, they'll do just that and use the least common denominators.
 
 If you support UTF-16 here, then people will use it.  That's always the 
 pattern on the web--one browser implements something extra, and everyone else 
 ends up having to implement it--whether or not it was a good idea--because 
 people accidentally started depending on it.  I don't know why we have to 
 keep repeating this mistake.
 
 We're not adding anything here, it's a matter of complicating and taking 
 away from our decoder for one particular case.  You're acting like we're 
 adding UTF-32 support for the first time.
 
 Of course you are; you're adding UTF-16 and UTF-32 support to the 
 responseType == json API.
 
 Also, since JSON uses zero-byte detection, which isn't used by HTML at all, 
 you'd still need code in your decoder to support that--which means you're 
 forcing everyone else to complicate *their* decoders with this special case.
 
 XHR's behavior, if the change I suggested is accepted, shouldn't require 
 special cases in a decoding layer.  I'd have the decoder expose the final 
 encoding in use (which I'd expect to be available already), and when 
 .response is queried, return null if the final encoding used by the decoder 
 wasn't UTF-8.  This means the decoding would still take place for other 
 encodings, but the end result would be discarded by XHR.  This puts the 
 handling for this restriction within the XHR layer, rather than at the 
 decoder layer.

That's why I'd like to see the spec changed to clarify the discarding if the 
encoding was supplied and isn't UTF-8.

 
 I said:
 Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.
 
 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte 
 detection.  My question remains, though: what exactly are you doing?  Do you 
 do zero-byte detection?  Do you do BOM detection?  What's the order of 
 precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, 
 and overrideMimeType if they disagree?  All of this would need to be 
 specified; currently none of it is.

None of that matters if a specific codec is the one all be all.  If that's the 
consensus then that's it, period.

WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; 
the XHR payload runs through it before it would pass to JSON.parse.  Read the 
code if you're interested.  I would need to change the text decoder to skip BOM 
detection for this one case unless the spec added that wording of discarding 
when encoding != UTF-8, then that can be enforced all in XHR with no decoder 
changes.  I don't want to get hung on explaining WebKit's specific impl. 
details.

 
  
 without breaking existing content and yet killing UTF-16 and UTF-32 support 
 just for responseType json would break existing UTF-16 and UTF-32 JSON.  
 Well, which is it?
 
 This is a new feature; there isn't yet existing content using a responseType 
 of json to be broken.
 
 Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for 
 the web platform.  But it's also plausible to push these restrictions not 
 just in one spot in XHR, but across the web platform
 
 I've yet to see a workable proposal to do this across the web platform, due 
 to backwards-compatibility.  That's why it's being done more narrowly, where 
 it can be done without breaking existing pages.  If you have any novel ideas 
 to do this across the platform, I guarantee everyone on the list would like 
 to hear them.  Failing that, we should do what we can where we can.
 
 and also where the web platform defers to external specs (e.g. JSON).  In 
 this particular case, an author will be more likely to just use responseText 
 + JSON.parse for content he/she cannot control - the content won't end up 
 changing and our initiative is circumvented.
 
 Of course not.  It tells the developer that something's wrong, and he has the 
 choice of working around it or fixing his service.  If just 25% of those 
 people make the right choice, this is a 

Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote:

 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte
 detection.  My question remains, though: what exactly are you doing?  Do
 you do zero-byte detection?  Do you do BOM detection?  What's the order of
 precedence between zero-byte and/or BOM detection, HTTP Content-Type
 headers, and overrideMimeType if they disagree?  All of this would need to
 be specified; currently none of it is.


 None of that matters if a specific codec is the one all be all.  If that's
 the consensus then that's it, period.

 WebKit shares a single text decoder globally for HTML, XML, plain text,
 etc. the XHR payload runs through it before it would pass to JSON.parse.
  Read the code if you're interested.  I would need to change the text
 decoder to skip BOM detection for this one case unless the spec added that
 wording of discarding when encoding != UTF-8, then that can be enforced all
 in XHR with no decoder changes.  I don't want to get hung on explaining
 WebKit's specific impl. details.


All of the details I asked about are user-visible, not WebKit
implementation details, and would need to be specified if encodings other
than UTF-8 were allowed.  I do think this should remain UTF-8 only, but if
you want to discuss allowing other encodings, these are things that would
need to be defined (which requires a clear proposal, not read the code).

I assume it's not using the exact same decoder logic as HTML.  After all,
that would allow non-Unicode encodings.

-- 
Glenn Maynard


Re: Pressing Enter in contenteditable: p or br or div?

2012-01-06 Thread Ojan Vafai
BCC: whatwg, CC:public-webapps since discussion of the editing spec has
moved

I'm OK with this conclusion, but I still strongly prefer div to be the
default single-line container name. Also, I'd really like the default
single-line container name to be configurable in some way. Different apps
have different needs and it's crappy for them to have to handle enter
themselves just to get a different block type on enter.

Something like document.execCommand("DefaultBlock", false, tagName). What
values are valid for tagName are open to discussion. At a minimum, I'd want
to see div, p and br. As one proof that this is valuable, the Closure
editor supports these three with custom code and they are all used in
different apps. I'm tempted to say that any block type should be allowed,
but I'd be OK with starting with the tree above. For example, I could see a
use-case for li if you wanted an editable widget that only contained a
single list.
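
A hypothetical usage sketch of the command proposed here; neither "DefaultBlock"
nor these values exist in any engine today:

    // Proposed, not implemented: pick the element used to wrap new lines.
    document.execCommand("DefaultBlock", false, "p");    // wrap lines in <p>
    document.execCommand("DefaultBlock", false, "div");  // wrap lines in <div>
    document.execCommand("DefaultBlock", false, "br");   // separate lines with <br>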

Ojan

On Mon, May 30, 2011 at 1:16 PM, Aryeh Gregor simetrical+...@gmail.com wrote:

 On Thu, May 12, 2011 at 4:28 PM, Aryeh Gregor simetrical+...@gmail.com
 wrote:
  Behavior for Enter in contenteditable in current browsers seems to be
  as follows:
 
  * IE9 wraps all lines in p (including if you start typing in an
  empty text box).  If you hit Enter multiple times, it inserts empty
  ps.  Shift-Enter inserts br.
  * Firefox 4.0 just uses br _moz_dirty= for Enter and Shift-Enter,
  always.  (What's _moz_dirty for?)
  * Chrome 12 dev doesn't wrap a line when you start typing, but when
  you hit Enter it wraps the new line in a div.  Hitting Enter
  multiple times outputs divbr/div, and Shift-Enter always inserts
  br.
  * Opera 11.10 wraps in p like IE, but for blank lines it uses
  pbr/p instead of just p/p (they render the same).
 
  What behavior do we want?

 I ended up going with the general approach of IE/Opera:


 http://aryeh.name/spec/editcommands/editcommands.html#additional-requirements

 It turns out WebKit and Opera make the insertParagraph command behave
 essentially like hitting Enter, so I actually wrote all the
 requirements there (IE's and Firefox's behavior for insertParagraph
 was very different and didn't seem useful):


 http://aryeh.name/spec/editcommands/editcommands.html#the-insertparagraph-command

 The basic idea is that if the cursor isn't wrapped in a single-line
 container (address, dd, div, dt, h*, li, p, pre) then the current line
 gets wrapped in a p.  Then the current single-line container is
 split in two, mostly.  Exceptions are roughly:

 * For pre and address, insert a br instead of splitting the element.
  (This matches Firefox for pre and address, and Opera for pre but not
 address.  IE/Chrome make multiple pres/addresses.)
 * For an empty li/dt/dd, destroy it and break out of its container, so
 hitting Enter twice in a list breaks out of the list.  (Everyone does
 this for li, only Firefox does for dt/dd.)
 * If the cursor is at the end of an h* element, make the new element a
 p instead of a header.  (Everyone does this.)
 * If the cursor is at the end of a dd/dt element, it switches to dt/dd
 respectively.  (Only Firefox does this, but it makes sense.)

 Like the rest of the spec, this is still a rough draft and I haven't
 tried to pin corner cases down yet, so it's probably not advisable to
 try implementing it yet as written.  As always, you can see how the
 spec implementation behaves for various input by looking at
 autoimplementation.html:

 http://aryeh.name/spec/editcommands/autoimplementation.html#insertparagraph
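
To make the model above concrete, a small illustrative sketch of the outcomes it
describes (markup shown as strings, "|" marking the caret; corner cases are
still open in the draft):

    var examples = [
      // No single-line container: the line is wrapped in a <p>, then split in two.
      { before: "foo|bar",         after: "<p>foo</p><p>|bar</p>" },
      // Enter at the end of a header starts a <p>, not another header.
      { before: "<h1>Title|</h1>", after: "<h1>Title</h1><p>|</p>" }
    ];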



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Jan 6, 2012, at 8:10 PM, Glenn Maynard gl...@zewt.org wrote:

 On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote:
 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte 
 detection.  My question remains, though: what exactly are you doing?  Do you 
 do zero-byte detection?  Do you do BOM detection?  What's the order of 
 precedence between zero-byte and/or BOM detection, HTTP Content-Type 
 headers, and overrideMimeType if they disagree?  All of this would need to 
 be specified; currently none of it is.
 
 None of that matters if a specific codec is the one all be all.  If that's 
 the consensus then that's it, period.
 
 WebKit shares a single text decoder globally for HTML, XML, plain text, etc. 
 the XHR payload runs through it before it would pass to JSON.parse.  Read the 
 code if you're interested.  I would need to change the text decoder to skip 
 BOM detection for this one case unless the spec added that wording of 
 discarding when encoding != UTF-8, then that can be enforced all in XHR with 
 no decoder changes.  I don't want to get hung on explaining WebKit's specific 
 impl. details.
 
 All of the details I asked about are user-visible, not WebKit implementation 
 details, and would need to be specified if encodings other than UTF-8 were 
 allowed.  I do think this should remain UTF-8 only, but if you want to 
 discuss allowing other encodings, these are things that would need to be 
 defined (which requires a clear proposal, not read the code).

Of course; I apologize, I didn't mean it as a dismissal. I just figured if we 
are settled on one codec then I'd spare ourselves the time.  I'm also mobile :) 
I could provide you those details if no decoding changes (enforcement) were 
done in WebKit, if you'd like.  But since this is a new API, might as well just 
stick to UTF-8.

 
 I assume it's not using the exact same decoder logic as HTML.  After all, 
 that would allow non-Unicode encodings.

Not exact, but close.  For discussion's sake and in this context, you could 
call it the Unicode text decoder that does BOM detection and switches Unicode 
codecs automatically.  For enforced UTF-8 I'd (have to) disable the BOM 
detection, but additionally could avoid decoding altogether if the specified 
encoding is not explicitly UTF-8 (and that was a part of the spec).  We'll make 
it work either way :)

 
 -- 
 Glenn Maynard
 


[editing] tab in an editable area WAS: [whatwg] behavior when typing in contentEditable elements

2012-01-06 Thread Ojan Vafai
BCC: whatwg, CC:public-webapps since discussion of the editing spec has
moved

On Tue, Jun 14, 2011 at 12:54 PM, Aryeh Gregor simetrical+...@gmail.com wrote:

 You suggest that the tab key in browsers should act like indent, as in

dedicated text editors.  This isn't tenable -- it means that if you're
 using Tab to cycle through focusable elements on the page, as soon as
 it hits a contenteditable area it will get stuck and start doing
 something different.  No current browser does this, for good reason.


There are strong use-cases for both. In an app like Google Docs you
certainly want tab to act like indent. In a mail app, it's more of a
toss-up. In something like the Google+ sharing widget, you certainly want
it to maintain normal web tabbing behavior. Anecdotally, gmail has an
internal lab to enable document-like tabbing behavior and it is crazy
popular. People gush over it.

We should make this configurable via execCommand:
document.execCommand("TabBehavior", false, bitmask);

The bitmask is because you might want a different set of behaviors:
-Tabbing in lists
-Tabbing in table cells
-Tabbing in blockquotes
-Tab in none of the above: insert a tab
-Tab in none of the above: insert X spaces (X is controlled by the CSS
tab-size property?)
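
A hypothetical sketch of how the bitmask might be used (the flag values and the
"TabBehavior" command are part of the proposal, not an existing API):

    var TAB_IN_LISTS       = 1 << 0;  // Tab indents/outdents list items
    var TAB_IN_TABLE_CELLS = 1 << 1;  // Tab moves between table cells
    var TAB_IN_BLOCKQUOTES = 1 << 2;  // Tab adjusts blockquote nesting
    var TAB_INSERTS_TAB    = 1 << 3;  // otherwise, insert a tab character
    document.execCommand("TabBehavior", false,
        String(TAB_IN_LISTS | TAB_IN_TABLE_CELLS));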

Ojan


Re: [XHR] responseType json

2012-01-06 Thread Tab Atkins Jr.
On Fri, Jan 6, 2012 at 1:45 PM, Jarred Nicholls jar...@webkit.org wrote:
 On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote:
 On 01/06/2012 10:28 PM, Jarred Nicholls wrote:
 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.

 With this kind of attitude, frankly, you shouldn't be implementing a spec.

 I resent that comment, because I'm one of the few that fight in WebKit to
 get us 100% spec compliant in XHR (don't even get me started with how many
 violations there are in Firefox, IE, and Opera...WebKit isn't the only one
 mind you), but that doesn't mean any spec addition, as fluid as it is in the
 early stages, is gospel.  In this case I simply think it wasn't debated
 enough before going in - actually it wasn't debated at all, it was just
 placed in there and now I'm a bad guy for pointing out its disconnect?  I
 think your attitude is far poorer.

 The web platform changes all the time - if this matter is sured up, then
 implementations will change accordingly.

While Ms2ger was a bit short, there's a reason.  Long experience shows
that people who say things like "I'm going to code against the Rec
instead of the draft, because the Rec is more stable" often end up
causing pain for everyone else, because that "more stable" Rec is also
*more wrong*, precisely because "stable" means "hasn't been updated to
take into account new information or to fix bugs."  This happens even
for smaller differences - well-meaning devs coding to the Working
Draft of a spec on /TR instead of the error-corrected Editor's Draft
cause never-ending pain.

Old RFCs are also often a source of pain, because we quite often find
that the authors aren't fully versed in the complexities and
subtleties of the public web.  They may be operating from an academic
or corporate standpoint, or otherwise be contained in a local
experience-minimum that affects their view of what's reasonable.

RFC4627, for example, is six years old.  This was right about the
beginning of the time when "UTF-8 everywhere, dammit" was really
starting to gain hold as a reasonable solution to encoding hell.
Crockford, as well, is not a browser dev, nor is he closely connected
to browser devs in a capacity that would really inform him of why
supporting multiple encodings on the web is so painful.  So, looking
to that RFC for guidance on current best-practice is not a good idea.

This issue has been debated and argued over for a long time, far
predating the current XHR bit.  There's a reason why new file formats
produced in connection with web stuff are utf8-only.  It's good for
the web if we're consistent about this.

~TJ



Re: [XHR] responseType json

2012-01-06 Thread Tab Atkins Jr.
On Fri, Jan 6, 2012 at 12:36 PM, Ojan Vafai o...@chromium.org wrote:
 I'm ambivalent about whether we should restrict to utf8 or not. On the one
 hand, having everyone on utf8 would greatly simplify the web. On the other
 hand, I can imagine this hurting download size for japanese/chinese websites
 (i.e. they'd want utf-16).

Note that this may be subject to the same counter-intuitive forces
that cause UTF-8 to usually be better for CJK HTML pages (because a
lot of the source is ASCII markup).  In JSON, all of the markup
artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII,
along with numbers, bools, and null.  Only the contents of strings can
be non-ascii.

JSON is generally lighter on markup than XML-like languages, so the
effect may not be as pronounced, but it shouldn't be dismissed without
some study.  At minimum, it will *reduce* the size difference between
the two.
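
One rough way to check this for a concrete payload (assumes a BMP-only string,
so UTF-16 costs two bytes per code unit; UTF-8 size measured with TextEncoder):

    var json = JSON.stringify({ "見出し": "こんにちは、世界", "count": 42 });
    var utf8Bytes  = new TextEncoder().encode(json).length;
    var utf16Bytes = json.length * 2;  // plus 2 for a BOM, if one is sent
    console.log(utf8Bytes + " bytes as UTF-8, " + utf16Bytes + " as UTF-16");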

~TJ



Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 4:45 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 Note that this may be subject to the same counter-intuitive forces
 that cause UTF-8 to usually be better for CJK HTML pages (because a
 lot of the source is ASCII markup).  In JSON, all of the markup
 artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII,
 along with numbers, bools, and null.  Only the contents of strings can
 be non-ascii.

 JSON is generally lighter on markup than XML-like languages, so the
 effect may not be as pronounced, but it shouldn't be dismissed without
 some study.  At minimum, it will *reduce* the size difference between
 the two.


And more fundamentally, this is trying to repurpose charsets as a
compression mechanism.  If you want compression, use compression
(Transfer-Encoding: gzip):

-rw-rw-r-- 1 glenn glenn 7274 Jan 06 23:59 test-utf8.txt
-rw-rw-r-- 1 glenn glenn 3672 Jan 06 23:59 test-utf8.txt.gz
-rw-rw-r-- 1 glenn glenn 6150 Jan 06 23:59 test-utf16.txt
-rw-rw-r-- 1 glenn glenn 3468 Jan 06 23:59 test-utf16.txt.gz

The difference even without compression isn't enough to warrant the
complexity (~15%), and with compression the difference is under 10%.

(Test case is simply copying the rendered text from
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
in Firefox.)
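
A Node.js sketch along the same lines, for reproducing the comparison with your
own payload (the sample object is made up; Glenn's numbers came from Japanese
Wikipedia text):

    var zlib = require('zlib');
    var json = JSON.stringify({ title: "メインページ", body: "..." });
    [['utf8', Buffer.from(json, 'utf8')],
     ['utf16le', Buffer.from(json, 'utf16le')]].forEach(function (pair) {
      console.log(pair[0] + ': ' + pair[1].length + ' bytes, ' +
                  zlib.gzipSync(pair[1]).length + ' gzipped');
    });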

-- 
Glenn Maynard