Re: inline declarative manifest, was Re: New manifest spec - ready for FPWD?

2013-12-04 Thread Henri Sivonen
On Wed, Dec 4, 2013 at 8:16 AM, Jonas Sicking jo...@sicking.cc wrote:
 On Dec 3, 2013 9:25 PM, Marcos Caceres w...@marcosc.com wrote:
 On Wednesday, December 4, 2013 at 9:40 AM, Jonas Sicking wrote:
 We currently have both <script>...</script> and <script src=...>, as
 well as both <style>...</style> and <style src>. A big reason we have
 both is for author convenience.


 I thought this was because <script> and <style> are “this page’s script
 and style” (i.e., the scope is very specific).

 This is different to the manifest, which is “this loosely grouped set of
 navigable resources forms a web-app”.

 Some web-apps are single-page. If they are simple enough I don't see
 anything wrong with that.

I think we shouldn't optimize for the single-page case. Even
single-page apps probably have some bitmaps that they don't include as
data: URLs. On the other hand, for multi-page apps an inline manifest
would be really inefficient. That is, external-only manifests seem
quite reasonable to me.

 <meta name=manifest content='{
   "a": 1,
   "b": "foopy"
 }'>

Are manifests really short enough for this kind of thing?

What happened to the idea from February to stick a JSON-based caching
description that desugars into NavigationController into the same
manifest? Are we absolutely sure that we don't want the manifest to
grow to do AppCache-ish things that pretty much require the
declaration to be an attribute on <html>?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
http://hsivonen.fi/



Re: File API for Review

2013-03-08 Thread Henri Sivonen
Additionally, I think http://www.w3.org/TR/FileAPI/#dfn-type should
clarify that the browser must not use statistical methods to guess the
charset parameter part of the type as part of determining the type.
Firefox currently asks the magic 8-ball, but the magic 8-ball is broken.
AFAICT, WebKit does not guess, so I hope it's possible to remove the
guessing from Firefox.

(The guessing in Firefox relies on a big chunk of legacy code that's
broken and shows no signs of ever getting fixed properly. The File API
is currently the only thing in Firefox that exposes the mysterious
behavior of said legacy code to the Web using the default settings of
Firefox, so I'm hoping to remove the big chunk of legacy code instead
of fixing it properly.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Review of the template spec

2012-12-11 Thread Henri Sivonen
I reviewed 
http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html#dfn-template
. Sorry about the delay.

Comments:

XML parsing isn’t covered. (Should have the wormhole per the
discussion at TPAC.)

XSLT output isn’t covered. (When an XSLT program tries to generate a
template element with children, the children should go through the
wormhole.)

Interaction with the DOM to XDM mapping isn’t covered per discussion
at TPAC. (Expected template contents not to appear in the XDM when
invoking the XPath DOM API (for consistency with querySelectorAll) but
expected them to appear in the XDM when an XSLT transformation is
being processed (to avoid precluding use cases).)

 1. If DOCUMENT does not have a browsing context, Let TEMPLATE CONTENTS OWNER 
 be DOCUMENT and abort these steps.
 2. Otherwise, Let TEMPLATE CONTENTS OWNER be a new Document node that does 
 not have a browsing context.

Is there a big win from this inconsistency? Why not always have a
separate doc as the template contents owner?

Do we trust the platform never to introduce a way to plant a document
that does not have a browsing context into a browsing context?
(Unlikely, but do we really want to make the bet?)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Making template play nice with XML and tags-and-text

2012-08-05 Thread Henri Sivonen
On Wed, Jul 18, 2012 at 11:35 PM, Adam Barth w...@adambarth.com wrote:
 On Wed, Jul 18, 2012 at 11:29 AM, Adam Klein ad...@chromium.org wrote:

 On Wed, Jul 18, 2012 at 9:19 AM, Adam Barth w...@adambarth.com wrote:

 Inspired by a conversation with hsivonen in #whatwg, I spend some time
 thinking about how we would design template for an XML world.  One idea I
 had was to put the elements inside the template into a namespace other than
 http://www.w3.org/1999/xhtml.

On the face of things, this seems a lot less scary than the wormhole
model. I think this merits further exploration! Thank you!

 One question about your proposal: do the contents of template in an HTML
 Unlike the existing wormhole template semantics, in this approach the
 tags-and-text inside template would translate into DOM as usual for XML.
 We'd get the inert behavior for free because we'd avoid defining any
 behavior for elements in the http://www.w3.org/2012/xhtml-template namespace
 (just as no behavior is defined today).

 This does get you inertness, but doesn't avoid querySelector matching
 elements inside template.

If changes of the magnitude discussed here are on the table for HTML
parsing, I don't see why querySelectorAll() or even Selectors should
be assumed to be unchangeable.

 Also, the elements inside template, though they appear to be HTML,
 wouldn't have any of the IDL attributes one might expect, e.g., a
 href=foo/a would have no href property in JS (nor would img have
 src, etc). They are, perhaps, too inert.

I think that's not a problem, because you're not supposed to mutate
the template anyway. You're supposed to clone the template and then
mutate the clone.

 That's unfortunate.  I guess that means CSS styles will get applied to them
 as well, which wouldn't be what authors would want.

That's not really a problem as long as subtrees with elements in the
template namespaces are rooted at a display: none; template element.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-06-15 Thread Henri Sivonen
On Thu, Jun 14, 2012 at 11:48 PM, Ian Hickson i...@hixie.ch wrote:
 Does anyone object to me adding template, content, and shadow to
 the HTML parser spec next week?

I don't object to adding them if they create normal child elements in
the DOM. I do object if template has a null firstChild and the new
property that leads to a fragment that belongs to a different owner
document.

(My non-objection to creating normal children in the DOM should not be
read as a commitment to support templates in Gecko.)


-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-06-15 Thread Henri Sivonen
On Tue, Jun 12, 2012 at 12:14 AM, Rafael Weinstein rafa...@google.com wrote:
 On Mon, Jun 11, 2012 at 3:13 PM, Henri Sivonen hsivo...@iki.fi wrote:
 On Thu, Jun 7, 2012 at 8:35 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Just saying that querySelector/All doesn't match elements in a
 template (unless the scope is inside the template already) would work,
 but it means that we have to make sure that all future similar APIs
 also pay attention to this.

 I think that would be preferable compared to opening the Pandora's box
 of breaking the correspondence between the markup and the DOM tree.
 Besides, we'd need something like this for the XML case anyway if the
 position the spec takes is that it shies away from changing the
 correspondence between XML source and the DOM.

 In general, I think the willingness to break the correspondence
 between the source and the DOM should be the same for both HTML and
 XML serializations. If you believe that it's not legitimate to break
 the correspondence between XML source and the DOM, it would be logical
 to treat radical changes to the correspondence between HTML source and
 the DOM as equally suspect.

 I think looking at this as whether we are breaking the correspondence
 between source and DOM may not be helpful -- because it's likely to
 be a matter of opinion.

Arguments that the correspondence between the source and the DOM is a
matter of opinion in the HTML case are exactly the sort of slippery
slope I'm worried about. After all, in theory, we could make the
parsing algorithm output any data structure whatsoever. If we make an
exception here, I expect that we'll see more proposals that involve
the parser generating non-traditional DOM structures in order to
accommodate supposed API benefits at the expense of old DOM-assuming
code working generically.

In the XML case, the correspondence between the source and the DOM is
not a matter of opinion. At least not at present. It seems that the
template spec is not willing to change that. Why? Why doesn't the same
answer apply to HTML?

I think we shouldn't violate the DOM Consistency Design Principle and
make templates have wormholes to another document when parsed from
text/html but have normal children when parsed from
application/xhtml+xml. That sort of route will lead to having to
implement template inertness twice and one of the solutions will be
one that's supposedly being avoided by the proposed design for HTML.

 There are several axes of presence for elements WRT a Document:

 -serialization: do the elements appear in the serialization of the
 Document, as delivered to the client and if the client re-serializes
 via innerHTML, etc...
 -DOM traversal: do the elements appear via traversing the document's
 childNodes hierarchy
 -querySelector*, get*by*, etc: are the elements returned via various
 document-level query mechanisms
 -CSS: are the elements considered for matching any present or future
 document-level selectors

I'm arguing that these axes should be coupled the way they are now and
have always been. (And if you want them decoupled, the decoupling
should be done e.g. using selectors that specifically prohibit
templates in the parent chain of the selected node or using APIs that
aren't low-level tree accessors like firstChild, childNodes and
getElementById().)

For better or worse, the DOM is a data structure that represents the
markup and then has some higher-level-feature sugaring. It's not a
data structure whose shape is dictated by the higher-level features.
The DOM is so fundamental to the platform that I think the bar for
changing its nature should be very high. Certainly higher than one
vendor wishing to proceed with the feature that they've come up with.
Certainly not for a feature that could be implemented without breaking
the traditional model with relative ease (by making the inertness check
walk the parent chain [or propagate an equivalent flag down for O(1)
reads] and rooting selector queries differently or using *:not(template)).
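
For illustration, a minimal sketch of the parent-chain check applied on
the selector-query side (the helper name is made up; only standard DOM
calls are assumed):

    // Return the selector matches under root, skipping any node that
    // has a template element in its parent chain.
    function queryOutsideTemplates(root, selector) {
      return Array.prototype.filter.call(
        root.querySelectorAll(selector),
        function (node) {
          for (var a = node.parentNode; a; a = a.parentNode) {
            if (a.nodeType === 1 && a.localName === 'template') {
              return false; // has a template ancestor; treat as inert
            }
          }
          return true;
        });
    }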

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-06-11 Thread Henri Sivonen
On Thu, Jun 7, 2012 at 8:35 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Just saying that querySelector/All doesn't match elements in a
 template (unless the scope is inside the template already) would work,
 but it means that we have to make sure that all future similar APIs
 also pay attention to this.

I think that would be preferable compared to opening the Pandora's box
of breaking the correspondence between the markup and the DOM tree.
Besides, we'd need something like this for the XML case anyway if the
position the spec takes is that it shies away from changing the
correspondence between XML source and the DOM.

In general, I think the willingness to break the correspondence
between the source and the DOM should be the same for both HTML and
XML serializations. If you believe that it's not legitimate to break
the correspondence between XML source and the DOM, it would be logical
to treat radical changes to the correspondence between HTML source and
the DOM as equally suspect.

I worry that if we take the position here that it's okay to change
the correspondence between the source and the DOM in order to
optimize for a real or perceived need, it will open the floodgates for
all sorts of arguments that we can make the parser generate whatever
data structures regardless of what the input looks like, and we'll end
up in a world of pain. It's bad enough that isindex is a parser macro.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] chunked

2012-06-07 Thread Henri Sivonen
On Thu, May 24, 2012 at 8:59 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Thu, May 24, 2012 at 2:54 AM, Jonas Sicking jo...@sicking.cc wrote:
 Is there a reason not to add chunked-text and chunked-arraybuffer
 to the spec right now?

 1. Why not just have
 http://html5labs.interoperabilitybridges.com/streamsapi/ to address
 this case?

It appears that Microsoft's proposal involves potentially buffering
the XHR response body until the Web author-supplied script chooses to
initiate a read from the stream object. Or am I missing something?

The chunked-text and the chunked-arraybuffer response types are
both simpler than the stream proposal and have the advantage that they
don't involve the browser engine buffering the XHR response body: if
the Web author-provided script fails to handle the chunks as they
arrive, the data is gone and doesn't fill up buffer space.

(Some of my remarks on IRC were confused, because I mistook the
Worker-specific API for the API proposed for the main thread.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-07 Thread Henri Sivonen
On Wed, Jun 6, 2012 at 6:49 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Flip-flopping is irrelevant.

It's irrelevant in the sense that flip-flopping isn't bad in and of
itself. Changing one's mind is okay, and it's good to acknowledge past
mistakes. However, in many cases, if one's mind changes after
interoperable implementations have been made available to Web authors,
it may be worse to try to fix past mistakes than to just acknowledge
them as mistakes and try not to make more mistakes like that in the
future.

 Only what is good for authors is.

I believe in this case not changing the way SVG script content
tokenizes would be best for authors.

 If
 deployed content would break as a result of a change, we either find a
 new way to accommodate the desired change, or drop it.  But we need
 the compat data about that breakage before we can claim that it will
 occur.

Support Existing Content is not the only relevant Design Principle
here. Degrade Gracefully is also relevant. I believe it would be bad
for authors if SVG-in-HTML content tokenized subtly differently in the
long tail of old (old when viewed from the future) browsers, which is
likely to include IE9 and, at this rate, IE10, and also a mix of
versions of the Android stock browser for a long time. (Also possibly
various left-over versions of Firefox, Safari and Opera.)

Past data suggests that IE is updated slowly and that the Android
stock browser typically doesn't get updated at all (until it is
abandoned by switching to another browser or by switching to another
device).

Arguments about it being okay to violate the Degrade Gracefully
principle because the future is longer than the past (so it's always
worthwhile to make things better for the future) would apply to
pretty much all breaking changes to the Web platform and have the same
problems this time as when the argument is applied to other breaking
changes to the Web platform.

 The SVGWG would like to make things as good for authors as possible.
 Past positions don't matter, except insofar as the history of their
 effects on the specs persists.

The reason why I care about correcting recounts of past SVG working
group opinion on this topic is that I think it's better for learning
from mistakes if the learning is based on the truth of what happened.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-06-07 Thread Henri Sivonen
On Wed, Jun 6, 2012 at 7:13 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 A call like document.querySelectorAll('p') doesn't *want* to get the
 p inside the template.

I think it's backwards to assume that querySelectorAll() has to work a
particular way, observe that that's not what authors want, and then
change the DOM in response.

There are various solutions that don't involve drastic changes to the
correspondence between the markup and the DOM, for example:

* Invoking querySelectorAll() on a wrapper element that's known not to
be a parent of the templates on the page.

* Using a selector that fails to match elements whose ancestor chain
contains a template element.

* Introducing an API querySelectorNonTemplate(). (Don't say "All" if
you don't mean *all*.)
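
As a sketch of the first option (the wrapper id and markup are
hypothetical), queries are simply scoped to a container that is known
to hold no templates:

    <div id="content"><!-- page content; all templates kept outside --></div>
    <script>
      var content = document.getElementById('content');
      var paras = content.querySelectorAll('p'); // cannot match template contents
    </script>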

Even though XML has fallen out of favor, I think violations of the DOM
Consistency principle and features that don't work with the XHTML
serialization should be considered huge red flags indicating faulty
design.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-06 Thread Henri Sivonen
On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
 I think the SVG working group should learn to stand by its past
 mistakes. Not standing by them in the sense of thinking the past
 mistakes are great but in the sense of not causing further
 disturbances by flip-flopping.

 For what it's worth, I've not seen any flip-flopping on this. Over
 the years that I've asked the SVG WG the detailed question of whether
 they prefer this parsing model for scripts in SVG-in-HTML, I've
 consistently gotten the answer that they prefer it.

At the time when SVG parsing was being added to text/html, vocal
members of the SVG working group were adamant that parsing should work
the same as for XML so that output from existing tools that had XML
serializers could be copied and pasted into text/html in a text
editor. Suggestions went as far as insisting a full XML parser be
embedded inside the HTML parser.

For [citation needed], see e.g. Requirement 1 in
http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not
the only place where the requirement was expressed but the first one I
found when searching the archives) and requirements 1 and 2 as well as
the first sentence under Summary in
http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

 I'm also not sure how this is at all relevant here given that we
 should do what's best for authors, even when we learn over time what's
 best for authors.

At this point, what's best for authors includes considerations of
consistent behavior across already-deployed browsers (including IE9,
soon IE10 and the Android stock browser) and future browsers.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-06-06 Thread Henri Sivonen
On Tue, Jun 5, 2012 at 12:42 AM, Ian Hickson i...@hixie.ch wrote:
 On Wed, 4 Apr 2012, Rafael Weinstein wrote:
 On Mon, Apr 2, 2012 at 3:21 PM, Dimitri Glazkov dglaz...@chromium.org 
 wrote:
 
  Perhaps lost among other updates was the fact that I've gotten the
  first draft of HTML Templates spec out:
 
  http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html

 I think the task previously was to show how dramatic the changes to the
 parser would need to be. Talking to Dimitri, it sounds to me like they
 turned out to be less open-heart-surgery and more quick outpatient
 procedure. Adam, Hixie, Henri, how do you guys feel about the
 invasiveness of the parser changes that Dimitri has turned out here?

 I think it's more or less ok, but it has the problem that it doesn't give
 a way to reset the insertion mode again while inside a template.

I still think that breaking the old correspondence between markup and
the DOM and shrugging the XML side off is a big mistake. Why would it
be substantially harder to check inertness by walking the parent chain
(which normally won't be excessively long) as opposed to checking a
flag on the owner document?

I strongly believe that the template contents should be children of
the template element in the DOM instead of being behind a special
wormhole to another document while parsing and serializing as if the
special wormhole wasn't there.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [manifest] screen sizes, Re: Review of Web Application Manifest Format and Management APIs

2012-06-06 Thread Henri Sivonen
On Sun, May 27, 2012 at 7:45 PM, Anant Narayanan an...@mozilla.com wrote:
 Well, we haven't received this request from developers explicitly yet, but
 one can imagine a situation in which a developer makes an app only for
 mobile phones (Instagram?) and doesn't want users to use it on desktops.
 Even though it'll technically work, it might look ugly due to scaling.

Shouldn't it be up to the user to refrain from using ugly apps instead
of the developer preventing them?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-15 Thread Henri Sivonen
On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
 Issue 1: How to handle tokens which precede the first start tag

 Options:
 a) Queue them, and then later run them through tree construction once
 the implied context element has been picked

 b) Create a new insertion mode like "waiting for context element", which
 probably ignores end tags and doctype and inserts character tokens and
 comments. Once the implied context element is picked, reset the
 insertion mode appropriately, and proceed normally.

I prefer b).

I'm assuming the use case for this stuff isn't that authors throw
random stuff at the API and then insert the result somewhere. I expect
authors to pass string literals or somewhat cooked string literals to
the API knowing where they're going to insert the result but, as a
matter of convenience, not telling the API the insertion point.

If you know you are planning to insert stuff as a child of tbody,
don't start your string literal with stuff that would tokenize as
characters!
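
To make that concrete, a hedged example using the document.parse()
method proposed under Issue 3 below (not implemented anywhere; the
tbody variable is made up):

    // The string begins with a start tag token (tr), so the implied
    // context can be picked without queuing leading character tokens.
    var rows = document.parse('<tr><td>A</td><td>B</td></tr>');
    myTbody.appendChild(rows);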

(Firefox currently does not have the capability to queue tokens.
Speculative parsing in Firefox is not based on queuing tokens. See
https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
details.)

 Issue 2: How to infer a non-HTML implied context element

 Options:
 a) By tagName alone. When multiple namespaces match, prefer HTML, and
 then either SVG or MathML (possibly on a per-tagName basis)

 b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about
use cases are right) is <a>. (Fragment parsing makes scripts useless
anyway by setting their "already started" flag, authors probably
shouldn't be adding styles by parsing <style>, both HTML and SVG
<font> are considered harmful, and cross-browser support for Content
MathML is far off on the horizon.)

So I prefer a), possibly with <a>-specific elaborations if we can come
up with some. Generic solutions seem to involve more complexity. For
example, if we supported a generic attribute for forcing SVG
interpretation, would it put us on a slippery slope to support it when
it appears on tokens that aren't the first start tag token in a
contextless fragment parse?

 Issue 3: What form does the API take

 a) Document.innerHTML

 b) document.parse()

 c) document.createDocumentFragment()

I prefer b) because:
 * It doesn't involve creating the fragment as a separate step.
 * It doesn't need to be foolishly consistent with the HTML vs. XML
design errors of innerHTML.
 * It's shorter than document.createDocumentFragment().
 * Unlike innerHTML, it is a method, so we can add more arguments
later (or right away) to refine its behavior.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] Template element parser changes = Proposal for adding DocumentFragment.innerHTML

2012-05-11 Thread Henri Sivonen
On Wed, May 9, 2012 at 7:45 PM, Rafael Weinstein rafa...@google.com wrote:
 I'm very much of a like mind with Henri here, in that I'm frustrated
 with the situation we're currently in WRT SVG & MathML & parsing
 foreign content in HTML, etc... In particular, I'm tempted to feel
 like SVG and MathML made this bed for themselves and they should now
 have to sleep in it.

I think that characterization is unfair to MathML.  The math working
group tried hard to avoid local name collisions with HTML.  They
didn't want to play namespace games.  As I understand it, they were
forced into a different namespace by W3C strategy tax arising from the
NAMESPACE ALL THE THINGS! attitude.

SVG is the language that introduced collisions with both HTML and
MathML and threw unconventional camel casing into the mix.

On Fri, May 11, 2012 at 1:44 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 The innerHTML API is convenient.  It lets you set the entire
 descendant tree of an element, creating elements and giving them
 attributes, in a single call, using the same syntax you'd use if you
 were writing it in HTML (modulo some extra quote-escaping maybe).

I'm less worried about magic in an API that's meant for representing
tree literals in JavaScript as a sort of E4H without changing the
JavaScript itself than I am about magic in APIs that are meant for
parsing arbitrary potentially user-supplied content.

If we are designing an API for the former case rather than the latter
case, I'm OK with the following magic:
 * Up until the first start tag parser behaves as in in body (Tough
luck if you want to use ![CDATA[  or U+ before the first tag,
though I could be convinced that the parser should start in a mode
that enables ![CDATA[.)
 * if the first start tag is any MathML 3 element name except set or
image, start behaving as if setting innerHTML on math (details of
that TBD) before processing the start tag token further and then
continue to behave like when setting innerHTML on math.
 * otherwise, if the first start tag is any SVG 1.1 element name
except script, style, font or a, start behaving as if setting
innerHTML on svg (details of that TBD) before processing the start
tag token further and then continue to behave like when setting
innerHTML on svg.
 * otherwise, set the insertion mode per HTML-centric template rules
proposed so far.
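
A rough sketch of the dispatch described above (not an implementation;
MATHML3_NAMES and SVG11_NAMES are assumed sets of local names taken
from the respective specs):

    // Pick the implied context from the first start tag token's name.
    function impliedContext(tagName) {
      if (MATHML3_NAMES.has(tagName) && tagName !== 'set' && tagName !== 'image') {
        return 'math'; // behave as if setting innerHTML on <math>
      }
      if (SVG11_NAMES.has(tagName) &&
          ['script', 'style', 'font', 'a'].indexOf(tagName) === -1) {
        return 'svg'; // behave as if setting innerHTML on <svg>
      }
      return 'html'; // fall back to the HTML-centric template rules
    }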

Open question: Should it be possible to use a magic attribute on the
first tag token to disambiguate it as MathML or SVG? xmlns=... would
be an obvious disambiguator, but the values are unwieldy.  Should
xlink:href be used as a disambiguator for <a>? If the use case is
putting tree literals in code, it probably doesn't make sense to use
script or style (either HTML or SVG) in that kind of context
anyway. And SVG <font> has been rejected by Mozilla and Microsoft.

I still think that having to create a DocumentFragment first and then
set innerHTML on it is inconvenient and we should have a method on
document that takes a string to parse and returns the resulting
DocumentFragment, e.g. document.parse(string) to keep it short.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] Template element parser changes = Proposal for adding DocumentFragment.innerHTML

2012-05-11 Thread Henri Sivonen
 also add a method on Document that parses a
string using the HTML parser regardless of the HTMLness flag of the
document and returns a DocumentFragment (or has an optional extra
argument for forcing XML parsing explicitly).


-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] Template element parser changes = Proposal for adding DocumentFragment.innerHTML

2012-05-09 Thread Henri Sivonen
On Tue, Apr 24, 2012 at 6:39 AM, Rafael Weinstein rafa...@google.com wrote:
 What doesn't appear to be controversial is the parser changes which
 would allow the template element to have arbitrary top-level content
 elements.

It's not controversial as long as an HTML context is assumed.  I think
it is still controversial for SVG and MathML elements that aren't
wrapped in an <svg> or <math> element.

 I'd like to propose that we add DocumentFragment.innerHTML which
 parses markup into elements without a context element.

Why should the programmer first create a document fragment and then
set a property on it? Why not introduce four methods on Document that
return a DocumentFragment: document.parseFragmentHTML (parses like
template.innerHTML), document.parseFragmentSVG (parses like
svg.innerHTML), document.parseFragmentMathML (parses like
math.innerHTML) and document.parseFragmentXML (parses like innerHTML
in the XML mode without namespace context)? This would avoid magic for
distinguishing the HTML <a> and the SVG <a>.
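
Hypothetical usage of the four proposed methods (none of them exist in
any engine):

    // The caller states the vocabulary, so there is no need to guess
    // between the HTML <a> and the SVG <a>.
    var htmlFrag = document.parseFragmentHTML('<a href="#x">x</a>');
    var svgFrag = document.parseFragmentSVG('<a xlink:href="#x"><text>x</text></a>');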

On Thu, Apr 26, 2012 at 8:23 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 (In my dreams, we just merge SVG into the HTML namespace, and then
 this step disappears.)

In retrospect, it would have been great if Namespaces in XML had never
been introduced and SVG, MathML and HTML shared a single namespace.
However, at this point trying to merge the namespaces would lead to
chameleon namespaces which are evil and more trouble than fixing the
historical mistake is worth.  I feel very strongly that vendors and
the W3C should stay away from turning SVG into a chameleon namespace.
SVG is way more established than CSS gradients or Flexbox in terms of
what kind of changes are acceptable.

See http://lists.w3.org/Archives/Public/www-archive/2009Feb/0065.html
as well as various XML experiences from non-browser contexts.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] Template element parser changes = Proposal for adding DocumentFragment.innerHTML

2012-05-09 Thread Henri Sivonen
On Wed, May 9, 2012 at 11:39 AM, James Graham jgra...@opera.com wrote:
 document.parse(string, [auto|html|svg|mathml|xml])

Makes sense at least for the options other than auto.

 With auto being the default and doing magic, and the other options
 allowing one to disable the magic.

I worry about introducing magic into APIs.  It starts with good
intentions but can lead to regrettable results.

For MathML, if the parser contains a list of all MathML elements, the
magic can work at any point in time if the parser and the input are
from the same point in time, but will fail if the parser is of older
vintage than the input given to it.  (image can be treated as MathML
here, since presumably the image magic is only needed for parsing
legacy HTML pages.) Are we OK with the incorrect behavior when the
input and the parser are of different vintage? (Note that the full
document HTML parsing algorithm only has this problem for camel case
SVG elements, which is a non-problem if the SVG working group can
refrain from introducing more camel case elements.)

With SVG, there's the problem that <a> is common and ambiguous.  It
seems bad to introduce an API that does reliable magic for almost
everything but is unreliable for one common thing.  Solving this
problem with lookahead would be bad, because it would be surprising
for <a> in <a></a> and <a> in <a><path/></a> to mean different things.
 Solving this problem with chameleon namespaces would introduce worse
problems.

I don't see any good way to solve the contextless <a> vs. <a> problem
with magic.  (If SVG moves away from xlink:href, we can't even use
attributes to disambiguate.)

If we end up doing (flawed) list-based magic, we need to make sure the
SVG working group is on board and henceforth avoids local name
collisions with HTML and MathML.

If the API required the caller to request SVG explicitly, would it be
a sure thing that jQuery would build magic heuristics on the library
side?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-04-18 Thread Henri Sivonen
On Tue, Apr 3, 2012 at 1:21 AM, Dimitri Glazkov dglaz...@chromium.org wrote:
 Perhaps lost among other updates was the fact that I've gotten the
 first draft of HTML Templates spec out:

 http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html

"Once parsed, the template contents must not be in the document tree."

That's surprising, radical and weird.  Why are the template contents
hosted in a document fragment that the template element points to
using a non-child property?  Why aren't the template contents simply
hosted as a subtree rooted at the template element?

This also breaks the natural mapping between XML source and the DOM in
the XML case.

This weirdness also requires a special case to the serialization algorithm.

If the document fragment wasn't there and the contents of the template
were simply children of template element, the parsing algorithm
changes would look rather sensible.

Wouldn't it make more sense to host the template contents as normal
descendants of the template element and to make templating APIs accept
either template elements or document fragments as template input?  Or
to make the template elements have a cloneAsFragment() method if the
template fragment is designed to be cloned as the first step anyway?
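
A sketch of the latter suggestion (cloneAsFragment() is a proposed
name, not an existing API; the ids are made up):

    // With template contents as ordinary descendants, cloning them out
    // as a fragment replaces the wormhole lookup.
    var instance = document.querySelector('#row-template').cloneAsFragment();
    document.querySelector('#data-table tbody').appendChild(instance);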

When implementing this, making embedded content inert is probably the
most time-consuming part, and just using a document fragment as a
wrapper isn't good enough anyway, since for example img elements load
their src even when not inserted into the DOM tree. Currently, Gecko
can make embedded content inert on a per-document basis.  This
capability is used for documents returned by XHR, createDocument and
createHTMLDocument. It looks like the template proposal will involve
computing inertness from the ancestor chain (template ancestor or
DocumentFragment marked as inert as an ancestor).  It's unclear to me
what the performance impact will be.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [Clipboard] Mathematical Proofs in HTML5 Documents

2012-04-03 Thread Henri Sivonen
On Tue, Apr 3, 2012 at 4:57 AM, Adam Sobieski adamsobie...@hotmail.com wrote:
 MathML3 includes annotation and annotation-xml elements which can
 provide parallel representations of mathematical semantics

 1. Having entire proofs in math elements. Proof formats could then express
 semantics in annotation or annotation-xml elements. OpenMath content
 dictionaries could come to exist for mathematical proof structures.

 2. Having proofs in HTML5 document structure, possibly containing one or
 more math element instances, while utilizing XML attributes from other
 XMLNS.

Does any browser currently support any kind of XML-based clipboard
flavor? If you transfer MathML islands using an HTML clipboard flavor,
you can't use arbitrary namespaces.

 3. Having proofs in HTML5 document structure, possibly containing one or
 more math element instances, while utilizing RDFA
 (http://dev.w3.org/html5/rdfa/). Proof structure and semantics can overlay
 the HTML5 and/or the RDFA can relate elements to referenced external
 resources.

What kind of software do you expect to consume this kind of data?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [webcomponents] HTML Parsing and the template element

2012-02-08 Thread Henri Sivonen
On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov dglaz...@chromium.org wrote:
 == IDEA 1: Keep template contents parsing in the tokenizer ==

Not this!

Here's why:
Making something look like markup but then not tokenizing it as markup
is confusing. The confusion leads to authors not having a clear mental
model of what's going on and where stuff ends. Trying to make things
"just work" for authors leads to even more confusing "here be dragons"
solutions. Check out
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state

Making something that looks like markup but isn't tokenized as markup
also makes the delta between HTML and XHTML greater. Some people may
be ready to throw XHTML under the bus completely at this point, but
this also goes back to the confusion point. Apart from namespaces, the
mental model you can teach for XML is remarkably sane. Whenever HTML
deviates from it, it's a complication in the understandability of
HTML.

Also, multi-level parsing is in principle bad for perf. (How bad
really? Dunno.) I *really* don't want to end up writing a single-pass
parser that has to be black-box indistinguishable from something
that's defined as a multi-pass parser.

(There might be a longer essay about how this sucks in the public-html
archives, since the SVG WG proposed something like this at one point,
too.)

 == IDEA 2: Just tweak insertion modes ==

I think a DWIM insertion mode that switches to another mode and
reprocesses the token upon the first start tag token *without* trying
to return to the DWIM insertion mode when the matching end tag is seen
for the start tag that switched away from the DWIM mode is something
that might be worth pursuing. If we do it, I think we should make it
work for a fragment parsing API that doesn't require context beyond
assuming HTML, too. (I think we shouldn't try to take the DWIM so far
that a contextless API would try to guess HTML vs. SVG vs. MathML.)

The violation of the Degrade Gracefully principle and tearing the
parser spec open right when everybody converged on the spec worry me,
though. I'm still hoping for a design that doesn't require parser
changes at all and that doesn't blow up in legacy browsers (even
better if the results in legacy browsers were sane enough to serve as
input for a polyfill).

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Obsolescence notices on old specifications, again

2012-01-23 Thread Henri Sivonen
On Mon, Jan 23, 2012 at 11:01 AM, Ms2ger ms2...@gmail.com wrote:
 I propose that we add a pointer to the contemporary specification to the
 following specifications:

 * DOM 2 Core (DOM4)
 * DOM 2 Views (HTML)
 * DOM 2 Events (D3E)
 * DOM 2 Style (CSSOM)
 * DOM 2 Traversal and Range (DOM4)
 * DOM 2 HTML (HTML)
 * DOM 3 Core (DOM4)

 and a recommendation against implementing the following specifications:

 * DOM 3 Load and Save
 * DOM 3 Validation

 Hearing no objections, I'll try to move this forward.

I support adding such notices to the above-mentioned specs.

On Mon, Jan 23, 2012 at 10:38 PM, Glenn Adams gl...@skynav.com wrote:
 I work in an industry where devices are certified against final
 specifications, some of which are mandated by laws and regulations. The
 current DOM-2 specs are still relevant with respect to these certification
 processes and regulations.

I think proliferating obsolete stuff is harmful.

Which laws or regulations require compliance with some of the
above-mentioned specs? Have bugs been filed on those laws and
regulations?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] responseType json

2011-12-13 Thread Henri Sivonen
On Mon, Dec 12, 2011 at 7:08 PM, Jarred Nicholls jar...@sencha.com wrote:
 There's no feeding (re: streaming) of data to a parser, it's buffered until
 the state is DONE (readyState == 4) and then an XML doc is created upon the
 first access to responseXML or response.  Same will go for the JSON parser
 in our first iteration of implementing the json responseType.

FWIW, Gecko parses XML and HTML in a streaming way as data arrives
from the network. When readyState changes to DONE, the document has
already been parsed.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] responseType json

2011-12-12 Thread Henri Sivonen
On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com wrote:
  A good compromise would be to only throw it away (and thus restrict
 responseText access) upon the first successful parse when accessing
 .response.

I disagree. Even though conceptually the spec says that you first
accumulate text and then invoke JSON.parse, I think we should
allow for implementations that feed an incremental JSON parser as data
arrives from the network and throw away each input buffer after
pushing it to the incremental JSON parser.

That is, in order to allow more memory-efficient implementations in
the future, I think we shouldn't expose responseText for JSON.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] responseType json

2011-12-02 Thread Henri Sivonen
On Fri, Dec 2, 2011 at 3:41 PM, Robin Berjon ro...@berjon.com wrote:
 On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote:
 I tied it to UTF-8 to further the fight on encoding proliferation and 
 encourage developers to always use that encoding.

 That's a good fight, but I think this is the wrong battlefield. IIRC (valid) 
 JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are 
 detectable rather easily. The only thing this limitation is likely to bring 
 is pain when dealing with resources outside one's control.

Browsers don't support UTF-32. It has no use cases as an interchange
encoding beyond writing evil test cases. Defining it as a valid
encoding is reprehensible.

Does anyone actually transfer JSON as UTF-16?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: XPath and find/findAll methods

2011-11-29 Thread Henri Sivonen
On Tue, Nov 29, 2011 at 7:33 AM, Liam R E Quin l...@w3.org wrote:
 (2) Not a dead end

 XSLT 1 and XPath 1 are not evolutionary dead ends although it's true
 that neither the xt nor the libxml2 library supports XSLT 2 and XPath 2.
 There's some support (along with XQuery) in the Qt libraries, and also
 in C++ with XQilla and Zorba.  There are maybe 50 implementations of
 XPath 2 and/or XQuery 2 that I've encountered.  XQuery 3.0 and XPath 3.0
 are about to go to Last Call, we hope, and XSLT 3.0 to follow next year.
 The work is very much active and alive.

Sure, XPath and XSLT keep being developed. What I meant by
"evolutionary dead end" is that the XPath 1.0-compatible evolutionary
path has been relegated to a separate mode instead of XPath 2.0 and
newer being compatible by design. So the new development you cite
happens with Compatibility Mode set to false. To remain compatible
with existing content, browsers would presumably have to live in the
Compatibility Mode set to true world, which would mean browsers living
on a forked evolutionary path that isn't the primary interest of the
WGs working on the evolution.

I don't have enough data about existing XPath-using Web content to
know how badly the Web would break if browsers started interpreting
existing XPath (1.x) expressions as XPath 2.x expressions with
Compatibility Mode set to false, but the fact that the WG felt that it
needed to define a compatibility mode suggests that the WG itself
believed the changes to be breaking ones.

    /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong

This example depends on unprefixed name expressions matching the
(X)HTML namespace when tested against an element and no namespace when
tested against attributes. And that trick only works with (X)HTML
nodes.

Selectors have the advantage that they wildcard the namespace by
default, so it's feasible to define APIs that don't even have
namespace binding mechanisms.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: XPath and find/findAll methods

2011-11-25 Thread Henri Sivonen
On Thu, Nov 24, 2011 at 5:19 PM, Julian Reschke julian.resc...@gmx.de wrote:
 Well, the use case is to allow browsers to move to XPath2/XSLT2 at some
 point in the future, without having to maintain another engine.

Sorry about bringing up the XPath2 rathole that's now expanding into
the XSLT2 rathole.

My point was that since XPath2/XSLT2 made incompatible changes, there
isn't a smooth path for moving to XPath2/XSLT2 proper in browsers in
the future even if browser vendors felt that it was worthwhile to
expend the effort. There seems to be a potential migration path to
XPath2_compat/XSLT2_compat, though, but do the people who want
XPath2/XSLT2 want just the compat mode variants or the variants that
the relevant WG treats as the primary ones?

In any case, I think XPath2/XSLT2 have a bad investment/payoff ratio
from the browser point of view, so I think it makes sense for people
who want to use XSLT2 in browsers to license Saxon-CE (XSLT2
implemented in JavaScript) from Saxonica instead of expecting native
implementations in browsers.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] HTML in XHR implementation feedback

2011-11-24 Thread Henri Sivonen
On Mon, Nov 21, 2011 at 8:26 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Wed, Nov 16, 2011 at 2:40 AM, Henri Sivonen hsivo...@iki.fi wrote:
  * For text/html responses for response types "" and "document", the
 character encoding is established by taking the first match from this
 list in this order:
   - HTTP charset parameter
   - BOM
   - HTML-compliant meta prescan up to 1024 bytes.
   - UTF-8

 I still think that we are putting large parts of the world at a
 significant disadvantage here since they would not be able to use this
 feature together with existing content, which I would imagine is a
 large argument for this feature at all.

 Here is what I propose. How about we add a .defaultCharset property.
 When not set we use the list as described above. If set, the contents
 of .defaultCharset is used in place of UTF8.

I think that makes sense as a solution if it turns out that a solution
is needed. I think adding that feature now would be a premature
addition of complexity--especially considering that responseText has
existed for this long with a UTF-8 default without a .defaultCharset
property.

  * When there is no HTTP-level charset parameter, progress events are
 stalled and responseText made null until the parser has found a BOM or
 a charset meta or has seen 1024 bytes or the EOF without finding
 either BOM or charset meta.

 Why? I wrote the gecko code specifically so that we can adjust
 .responseText once we know the document charset. Given that we're only
 scanning 1024 bytes, this shouldn't ever require more than 1024 bytes
 of extra memory (though the current implementation doesn't take
 advantage of that).

I meant that stalling stops at EOF if the file is shorter than 1024
bytes. However, this point will become moot, because supporting HTML
parsing per spec in the default mode broke Wolfram Alpha and caused
wasteful parsing on Gmail, so per IRC discussion with Anne and Olli,
I'm preparing to limit HTML parsing to responseType == "document"
only.

  * Making responseType == "" not support HTML parsing at all and to
 treat text/html as an unknown type for the purpose of character
 encoding.

 I don't understand what the part after the "and" means. But the part
 before it sounds quite interesting to me. It would also resolve any
 concerns about breaking existing content.

The part after the "and" means the old behavior. This is now the plan.

On Mon, Nov 21, 2011 at 8:28 PM, Jonas Sicking jo...@sicking.cc wrote:
 The side effect is that meta prescan doesn't happen in the
 synchronous mode for text/html resources. This is displeasingly
 inconsistent but makes sense if the sync mode is treated as an evil
 legacy feature rather than as an evolving part of the platform.

 I'm not sure what this means. Aren't we only doing meta prescan when
 parsing a HTML document?

The meta prescan is done only when parsing HTML.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: XPath and find/findAll methods

2011-11-24 Thread Henri Sivonen
On Wed, Nov 23, 2011 at 11:05 PM, Julian Reschke julian.resc...@gmx.de wrote:
 could you elaborate on what you mean by no smooth evolutionary path to
 XPath 2.x? (Are you referring to specs or to implementations?)

Specs. XPath 2.0 changed XPath in an incompatible way. There's a
compatibility mode for interpreting existing XPath 1.0 queries, but
it seems like a bad idea to build on a spec whose authors have put
compatibility into a side mode and what's considered the main thing
isn't fully compatible with existing queries.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: XPath and find/findAll methods

2011-11-24 Thread Henri Sivonen
On Thu, Nov 24, 2011 at 3:49 PM, Robin Berjon ro...@berjon.com wrote:
  Node.prototype.queryXPath = function (xpath) {

 So, now for the money question: should we charter this?

Since IE and Opera already have a solution, it seems to me that unless
that solution has bad flaws, it would make more sense to spec what
they already support instead inventing a new API.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: XPath and find/findAll methods

2011-11-22 Thread Henri Sivonen
On Tue, Nov 22, 2011 at 12:28 AM, Martin Kadlec bs-ha...@myopera.com wrote:
 Only reason why XPath is dead on the web is because there is not (yet) easy 
 way to use it.

It's worth noting that XPath in browsers is XPath 1.0 which doesn't
have a smooth evolutionary path to XPath 2.x, so browser XPath is an
evolutionary dead end unless forked on a different evolutionary path
than W3C XPath.

Even though XPath might be very important to its user base, in the big
picture it isn't the kind of Web platform feature that would generate
a lot of Web developer mindshare if a browser vendor invested in it.
Chances are that investments in CSS always have a higher return on
investment (in terms of Web developer mindshare) than investments in
XPath. In this situation, I expect there to be no enthusiasm for
polishing what's an evolutionary dead end (XPath 1.0) or for launching
something incompatible that'd require a lot of up-front work (XPath
2.x) while still having to support the existing evolutionary dead end.

Furthermore, XPath 2.x would be a slippery slope towards dependencies
on XML Schema. Even though it's an optional feature, it's prudent to
leave a wide safety margin around optional features. Otherwise,
there's a risk of getting sucked into implementing bad optional
features anyway.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: TAG Comment on

2011-11-21 Thread Henri Sivonen
On Mon, Nov 21, 2011 at 12:05 PM, Anne van Kesteren ann...@opera.com wrote:
 On Mon, 21 Nov 2011 00:47:05 +0100, Mark Nottingham m...@mnot.net wrote:

 For example, some browsers still (!) support <blink>, but that doesn't
 mean we should promote its use.

 FWIW, <blink> is defined as a feature in HTML5 that browsers are
 expected to implement.

Conflating specs with promotion worries me. In particular, it worries
me that the "specs == promotion" mindset might lead to hiding some
features from the spec, which would lead back to the bad old days when
specs were not even seriously trying to contain the description of what
needs to be implemented in order to successfully render the Web.

By all means put some kind of Surgeon General's warning about race
conditions on localStorage but, please, let's not hide the description
of the feature from specs.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] HTML in XHR implementation feedback

2011-11-21 Thread Henri Sivonen
On Wed, Nov 16, 2011 at 12:40 PM, Henri Sivonen hsivo...@iki.fi wrote:
  * Making XHR not support HTML parsing in the synchronous mode.

In reference to the other thread about discouraging synchronous XHR
(outside Workers), this change ended up being made in Gecko. (HTML
parsing in XHR still hasn't made its way to the Nightly channel, so
don't expect to see it quite yet.)

The side effect is that meta prescan doesn't happen in the
synchronous mode for text/html resources. This is displeasingly
inconsistent but makes sense if the sync mode is treated as an evil
legacy feature rather than as an evolving part of the platform.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[XHR2] HTML in XHR implementation feedback

2011-11-16 Thread Henri Sivonen
I landed support for HTML parsing in XHR in Gecko today. It has not
yet propagated to the Nightly channel.

Here's how it behaves:

 * Contrary to the spec, for response types other than "" and
"document", character encoding determination for text/html happens the
same way as for unknown types.

 * For text/html responses for response types "" and "document", the
character encoding is established by taking the first match from this
list in this order:
   - HTTP charset parameter
   - BOM
   - HTML-compliant meta prescan up to 1024 bytes.
   - UTF-8
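
The same decision list expressed as pseudo-code (sniffBOM and
metaPrescan stand in for the corresponding algorithm steps; they are
not real APIs):

    function charsetForTextHTML(httpCharset, bytes) {
      if (httpCharset) return httpCharset;  // HTTP charset parameter
      var bom = sniffBOM(bytes);            // UTF-8 or UTF-16 BOM
      if (bom) return bom;
      var meta = metaPrescan(bytes, 1024);  // HTML-compliant meta prescan
      if (meta) return meta;
      return 'UTF-8';                       // final fallback
    }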

 * In particular, the following have no effect on the character encoding:
   - meta discovered by the tree builder algorithm
   - The user-configurable fallback encoding
   - Locale-specific defaults
   - The encoding of the document that invoked XHR
   - Byte patterns in the response (beyond BOM and meta). Even the
BOMless UTF-16 detection that Firefox does when heuristic detection
has otherwise been turned off is skipped for XHR.

 * When there is no HTTP-level charset parameter, progress events are
stalled and responseText made null until the parser has found a BOM or
a charset meta or has seen 1024 bytes or the EOF without finding
either BOM or charset meta.

 * If the response is a multipart response, XHR behaves as if it
didn't support HTML parsing for the subparts of the response. (The
multipart handling infrastructure in Gecko makes assumptions that are
incorrect for the off-the-main-thread parsing infrastructure. Since
the plan is to move XML parsing off the main thread, too, we'll need
to find out whether multipart support is a worthwhile feature to keep.
If it is, we need to add some mechanisms to make multipart work when
subparts are parsed off the main thread. If not, we should drop the
feature, in my opinion.)

 * HTML parsing is supported in the synchronous mode, but I'd be quite
happy to remove that support in order to curb sync XHR proliferation.

 * I believe the implementation otherwise matches the spec, but
exposing the document via responseXML should be considered to be at
risk. See below.

Risks:

 * Stalling progress events while waiting for meta could, in theory,
deadlock an existing Web app when the Web app does long polling with
responseType == "", gets a text/html response without a charset
declaration, the first chunk of the response is shorter than 1024
bytes and the server won't send more before the client side informs
the server via another channel that the first chunk has been
processed.
   - If this turns out to be a Real Problem, my plan is to make
responseText show decoded text up to the first byte that isn't one of
0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 -
0x5A, and 0x61 - 0x7A.
   - I think this risk is low.

 * responseXML now becomes non-null for HTTP error responses that have
a text/html response body. This might be a problem if Web apps that
expect to get XML responses check for HTTP errors by checking
responseXML for null. We'll see how bad breakage nightly testers
report.
   - I think this risk is high.
   - If this turns out to be a Real Problem, the solution would be to
make HTML parsing (including the meta prescan) available only when
responseType == "document". (Note that xhr.response maps to
responseText when responseType == "", so if responseXML is made null
when responseType == "", xhr.response wouldn't work for retrieving the
tree.) This change might even be a good idea performance-wise to avoid
adding HTML parsing overhead for legacy uses of XHR that don't set
responseType.

Spec change proposals so far:

 * I suggest making responseType modes other than "" and "document"
not consider the internal character encoding declarations in HTML (or
XML).

Spec change proposals that I'm not making yet but might make in near future:

 * Making responseType == "" not support HTML parsing at all and to
treat text/html as an unknown type for the purpose of character
encoding.

 * Making XHR not support HTML parsing in the synchronous mode.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-11 Thread Henri Sivonen
On Fri, Nov 11, 2011 at 11:49 AM, Anne van Kesteren ann...@opera.com wrote:
 Unfortunately <style> and <script> are parsed differently depending on
 whether they live in foreign content or not. However this is something we
 can fix, and it would lead to a lot of other benefits (such as scripts
 parsed consistently when moved around in the markup).

 I do agree it would make sense to parse these consistently throughout
 text/html.

I think http://www.w3.org/Bugs/Public/show_bug.cgi?id=10901 should
remain as WONTFIX.

 * We have interop between Gecko, WebKit, Trident (since IE9 on this
point) and Presto (once Ragnarök ships). Hooray! Interop is hard. When
it has been achieved, we shouldn't self-sabotage it.

 * Even if we considered the replacement cycles of Firefox and Chrome
to be fast enough that Firefox or Chrome legacy didn't matter,
Microsoft has deployed XML-style tokenization of SVG in IE. (Apple has
deployed the parsing in Safari, too.) Microsoft says IE9 will be
supported until January 2020. Even if IE9 doesn't have a large active
userbase all the way until 2020, I think authors who try to make Web
content that works would be worse off if we had a period of even a
couple of years with browsers that support SVG-in-text/html tokenizing
<script>/<style> content substantially differently. (Which would be the
case if we changed the tokenization but MS didn't agree to issue such
a drastic change as a patch for IE9.)

 * If we changed SVG <style> to tokenize like HTML <style>, we'd most
likely end up breaking the kind of copy-paste scenarios that the SVG
WG really wanted to work in the first place. (This argument also
applies to <script>, but <style> is even more likely to occur in the kind
of content one would expect to be able to copy-paste and have it just
work.) Maybe the SVG WG has changed their mind now, but we shouldn't
let them flipflop now that we've reached interop.

This ship has sailed.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-11 Thread Henri Sivonen
On Fri, Nov 11, 2011 at 1:11 PM, Jonas Sicking jo...@sicking.cc wrote:
 Microsoft has expressed support for changing the parser here.

As a patch for IE9?

 Have you ever actually talked to the SVG WG about this specific issue?

Yes, at the time foreign lands were being specced in HTML and the SVG
WG had to be dragged in kicking and screaming, because they didn't
want SVG-in-HTML to be supported at all at first.

 If not, please stop arguing that the SVG group wants the currently
 specced behavior.

I'm not arguing about what they want *today*. I'm saying what they wanted
earlier and why doing something different now would be bad.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-11 Thread Henri Sivonen
On Thu, Nov 10, 2011 at 7:32 PM, Jonas Sicking jo...@sicking.cc wrote:
 I don't think we should make up rules for where it makes sense to
 insert DOM and where it doesn't. After all, we support .innerHTML on
 all HTML elements (and soon maybe all Elements), and not just a subset
 of them, right?

Yes, but with innerHTML on elements, we always have a context node, so
there isn't magic DWIM involved.

But you don't need to look far to find special cases with difficult
elements: We also support createContextualFragment with all possible
contexts except we special-case things so that if the context is html
in the (X)HTML namespace, the behavior is as if the context had been
body in the (X)HTML namespace.

One reasonable view is that solutions should always be complete and
make sense (for some notion of making sense) for all inputs. Another
reasonable view is that we shouldn't do anything with completeness
as the rationale and that everything that needs notable additional
engineering needs to be justified by use cases. If no one really wants
to use DWIM parsing to create a DocumentFragment that has the html
element in the (X)HTML namespace as its child, why put the engineering
effort into supporting such a case?

Currently, per spec (and Ragnarök, Chrome and Firefox comply and
interoperate), if you take an HTML document that has <head> and
<body> (as normal) and assign document.body.outerHTML =
document.body.outerHTML, you get an extra <head> so that the document
has 2 heads. Would you expend engineering effort, for the sake of
making sense in all cases for completeness, to get rid of the extra
<head> even though there are likely no use cases and 3 out of 4 engines
interoperate while complying with the spec?
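
Spelled out, the case is just this (per the spec behavior described
above, starting from a plain document with one <head> and one <body>):

  // Reassigning the body's outerHTML reparses the markup in a
  // fragment context that synthesizes a fresh <head>:
  document.body.outerHTML = document.body.outerHTML;
  // The document now has two <head> elements.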

 And requiring that a context node is passed in in all cases when HTML
 is parsed is terrible developer ergonomics.

One possibility is that instead of adding innerHTML to
DocumentFragment, we add three methods to Document:
DocumentFragment parseFragment(DOMString htmlMarkup);
DocumentFragment parseSvgFragment(DOMString svgMarkup);
DocumentFragment parseMathFragment(DOMString mathmlMarkup);

parseFragment would do roughly the kind of DWIM Yehuda suggested. That
is, you'd get to use <tr> with it but not <html>. parseSvgFragment
would invoke the HTML fragment parsing algorithm with "svg" in the SVG
namespace as the context. parseMathFragment would invoke the HTML
fragment parsing algorithm with "math" in the MathML namespace as the
context.

As a bonus, developers would need to call createDocumentFragment() first.
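
To make the ergonomics concrete, a sketch of how the proposed methods
might be used (none of these methods exists anywhere yet):

  // Hypothetical; parseFragment DWIMs the context, so <tr> works:
  var rows = document.parseFragment("<tr><td>hello</td></tr>");
  someTable.appendChild(rows);

  // Hypothetical; parses with <svg> as the context:
  var g = document.parseSvgFragment("<g><path d='M0 0 L10 10'/></g>");
  someSvgElement.appendChild(g);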

 frag.innerHTML = "<g></g>";
 someSVGElement.appendChild(frag);

 seems very possible to make work

Making it work is a problem with <a>.

I think we should have three DocumentFragment-returning parsing
methods instead of packing a lot of magic into innerHTML on
DocumentFragment, when having to obtain a DocumentFragment first and
filling it as a separate step sucks as far as developer ergonomics go.

 someTableElement.innerHTML = "<tr>...</tr><div></div>";

 will just drop the div on the floor.

By what mechanism? (I didn't implement and run Yehuda's suggestion,
but I'm pretty sure it wouldn't drop the <div>. Why would we put
additional effort into dropping the <div>?)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-11 Thread Henri Sivonen
On Fri, Nov 11, 2011 at 1:42 PM, Henri Sivonen hsivo...@iki.fi wrote:
 As a bonus, developers would need to call createDocumentFragment() first.

Doh. Would *not* need to call createDocumentFragment() first.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-10 Thread Henri Sivonen
On Fri, Nov 4, 2011 at 1:03 AM, Yehuda Katz wyc...@gmail.com wrote:
 It would be useful if there was a way to take a String of HTML and parse it
 into a document fragment. This should work even if the HTML string contains
 elements that are invalid in the in body insertion mode.
 Something like this code should work:
   var frag = document.createDocumentFragment();
   frag.innerHTML = "<tr><td>hello</td></tr>";
   someTable.appendChild(frag);

It's easy for me to believe that there are valid use cases where the
first tag encountered is <tr>.

 This would probably require a new, laxer insertion mode, which would behave
 similarly to the body insertion mode, but with different semantics in the A
 start tag whose tag name is one of: caption, col, colgroup, frame,
 head, tbody, td, tfoot, th, thead, tr case.

What are the use cases for having this work with <head> and <frame> as
first-level tags in the string? Do you also want it to work with <html>,
<body> and <frameset>?

What about SVG and MathML elements?

I totally sympathize that this is a problem with <tr>, but developing
a complete solution that works sensibly even when you do stuff like
frag.innerHTML = "<head></head>"
frag.innerHTML = "<head><div></div></head>"
frag.innerHTML = "<frameset></frameset><a><!-- b -->"
frag.innerHTML = "<html><body>foo</html>bar<tr></tr>"
frag.innerHTML = "<html><body>foo</html><tr></tr>"
frag.innerHTML = "<div></div><tr></tr>"
frag.innerHTML = "<tr></tr><div></div>"
frag.innerHTML = "<g><path/></g>"
is a much trickier problem than your <tr> example makes it first seem.

Do you have use cases for tags other than <tr> appearing as the outermost tag?

What would you expect my examples above to do, and why?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: innerHTML in DocumentFragment

2011-11-10 Thread Henri Sivonen
On Fri, Nov 4, 2011 at 2:54 PM, João Eiras jo...@opera.com wrote:
 * stripScripts is a boolean that tells the parser to strip unsafe content
 like scripts, event listeners and embeds/objects which would be handled by a
 3rd party plugin according to user agent policy.

"According to user agent policy" is a huge interoperability problem.
(IIRC, Collin Jackson listed IE's toStaticHTML as an example of a bad
security feature for this reason in his USENIX talk.)

If we expose an HTML sanitizer to Web content as a DOM API, we should
have a clear normative spec that says what exactly the sanitizer does.
Stuff to debate includes what to do about Content MathML, what to do
about <object> elements that appear to reference SVG and what to do
about <embed> elements that bear Microdata attributes.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Sanatising HTML content through sandboxing

2011-11-10 Thread Henri Sivonen
On Wed, Nov 9, 2011 at 9:54 AM, Adam Barth w...@adambarth.com wrote:
 Also, a div doesn't represent a security boundary.  It's difficult to
 sandbox something unless you have a security boundary around it.
 IMHO, an easy way to solve this problem is to just exposes an
 HTMLParser object, analogous to DOMParser, which folks can use to
 safely parse HTML,

DOMParser.parseFromString already takes a content type as the second
argument. The plan is to support HTML parsing when the second argument
is text/html.
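
For illustration, the planned usage would look like this (text/html
support in parseFromString is the plan stated above, not something
every engine ships yet):

  var parser = new DOMParser();
  var doc = parser.parseFromString("<p>Hello</p>", "text/html");
  // doc is a full Document; scripts in the input don't run.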

 e.g., from XMLHttpRequest.

XMLHttpRequest Level 2 has built-in support for HTML parsing. No need
to first get responseText and then pass it to something else.
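
For example, per the XHR Level 2 draft, something along these lines
works without a separate parsing step:

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "page.html", true);
  xhr.responseType = "document";
  xhr.onload = function () {
    // xhr.response is a Document parsed from the text/html response
    // (with scripting disabled).
    var title = xhr.response.querySelector("title");
  };
  xhr.send();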

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] responseText for text/html before the encoding has stabilized

2011-11-07 Thread Henri Sivonen
On Mon, Nov 7, 2011 at 9:57 AM, Jonas Sicking jo...@sicking.cc wrote:
 It would be really nice if we could move forward with this thread.

I was planning on reporting back when I have something that passes all
mochitests. This has been delayed by other stuff, particularly fallout
from the new View Source highlighter.

 My preference is still to not do any HTML/XML specific processing when
 .responseType is set to anything other than "" or "document". This
 allows us to make encoding handling consistent for text and a
 possible future incremental text type.

My patch doesn't do HTML-specific processing when responseType is not
"" or "document".

 Also, the current spec leads to quite strange results if we end up
 supporting more text-based formats directly in XHR. For example in
 Gecko we've added experimental support for parsing into JSON. If we
 added this to a future version of XHR, this would mean that if a JSON
 resource was served as a text/html Content-Type, we'd simultaneously
 parse as HTML in order to detect encoding, and JSON in order to return
 a result to the page.

responseType == "" being weird that way with XML isn't new. I guess
the main difference is that mislabeling JSON as text/html might be
more probable than mislabeling it as XML when e.g. PHP defaults to
text/html responses.

One way to address this is to not support new response types with
responseType == "" and force authors to set responseType to "json" if
they want to read responseJSON.

 So what I suggest is that we make the current steps 4 and 5 *only*
 apply if .responseType is set to "" or "document". This almost matches
 what we've implemented in Gecko, though in gecko we also skip step 6
 which IMHO is a bug (if for no other reason, we should skip a UTF8 BOM
 if one is present).

Makes sense.

 As to the question which HTML charset encoding-detection rules to
 apply when .responseType is set to "" or "document" and content is
 served as HTML I'm less sure what the answer is. It appears clear that
 we can't reload a resource the same way a normal page does when hitting
 a <meta> which wasn't found during prescan and which declares a
 charset different from the one currently used.

 However my impression is that a good number of HTML documents out
 there don't use UTF-8 and do declare a charset using <meta> within the
 first 1024 bytes. Additionally I do hear *a lot* that authors have a
 hard time setting HTTP header due to not having full access to
 configurations of their hosting server (as well as configurations
 being hard to do even when access is available).

 Hence it seems like we at least want to run the prescan, though if
 others think otherwise I'd be interested to hear.

My current patch runs the prescan.

 There is also the issue of if we should take into account the encoding
 of the page which started the XHR (we do for navigation at least in
 Gecko), as well as if we should take user settings into account. I
 still believe that we'll exclude large parts of the world from
 transitioning to developing AJAX based websites if we drop all of
 these things, however I have not yet gathered that data.

I think we shouldn't take the encoding of the invoking page into
account. We have an excellent opportunity to avoid propagating that
kind of legacy badness. I think we should take the opportunity to make
a new feature less crazy.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] responseText for text/html before the encoding has stabilized

2011-10-03 Thread Henri Sivonen
On Fri, Sep 30, 2011 at 8:05 PM, Jonas Sicking jo...@sicking.cc wrote:
 Unless responseType == "" or responseType == "document" I don't think we
 should do *any* HTML or XML parsing. Even the minimal amount needed to
 do charset detection.

I'd be happy to implement it that way.

 For responseType == "text" we currently *only* look at HTTP headers and
 if nothing is found we fall back to using UTF-8. Though arguably we
 should also check for a BOM, but don't currently.

Not checking for the BOM looks like a bug to me though not a
particularly serious one given that the default is UTF-8, so the
benefit of checking the BOM is that people can use UTF-16. But using
UTF-16 on the wire is a bad idea anyway.

This could be fixed for consistency without too much hardship, but it
would be a rather poor use of developer time.

On Fri, Sep 30, 2011 at 9:05 PM, Ian Hickson i...@hixie.ch wrote:
 So... the prescanning is generally considered optional

I consider that a spec bug. For the sake of well-defined behavior, I
think the spec should require buffering up to 1024 bytes in order to
look for a charset <meta> without a timeout (but buffering should stop
as soon as a charset <meta> has been seen, so that if the <meta>
appears early, there's no useless stalling until the 1024-byte
boundary).

 (the only benefit
 really is that it avoids reloads in bad cases), and indeed implementations
 are somewhat encouraged to abort it early if the server only sent a few
 bytes (because that will shorten the time until something is displayed).

Firefox has buffered up to 1024 bytes without a timeout since Firefox
4. I have received no reports of scripts locking due to the buffering.
There have been a couple of reports of incremental display of progress
messages having become non-incremental, but those were non-fatal and
easy to fix (by declaring the encoding).

 Also, it has a number of false-positives, e.g. it doesn't ignore the
 contents of script elements.

I think restarts with scripts are much worse than mostly-theoretical
false positives. (If someone puts a charset <meta> inside a <script>,
they are doing it very wrong.)

 Do we really want to put it into the critical path in this way?

For responseType == "" and responseType == "document", I think doing
so would be less surprising than ignoring <meta>. For responseType ==
"text" and responseType == "chunked-text" or any response type that
doesn't actually involve running the full HTML parser, I'd rather not
run the <meta> prescan, either.

 I agree that the reloading alternative is even worse.

Yes.

 What about just
 relying on the Content-Type charset= and defaulting to UTF-8 if it isn't
 there, and not doing any in-page stuff?

That would be easy to implement, but it would be strange not to
support some ways of declaring the encoding that are considered
conforming by HTML.

 How is the encoding determined for, e.g., text/plain or text/css files
 brought down through XHR and viewed through responseText?

Per spec, @charset isn't honored for text/css, so in that sense, not
honoring <meta> would be consistent. However, I'd be hesitant to stop
honoring the XML declaration for XML, since there could well be content
depending on it. XML and CSS probably won't end up being treated
consistently with each other. But then, XHR doesn't support parsing
into a CSS OM.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-30 Thread Henri Sivonen
On Thu, Sep 29, 2011 at 11:27 PM, Jonas Sicking jo...@sicking.cc wrote:
 Finally, XHR allows the programmer using XHR to override the MIME
 type, including the charset parameter, so if the person adding new XHR
 code can't change the encoding declarations on legacy data, (s)he can
 override the UTF-8 last resort from JS (and a given repository of
 legacy data pretty often has a self-consistent encoding that the XHR
 programmer can discover ahead of time). I think requiring the person
 adding XHR code to write that line is much better than adding more
 locale and/or user setting-dependent behavior to the Web platform.

 This is certainly a good point, and is likely generally the easiest
 solution for someone rolling out an AJAX version of a new website
 rather than requiring webserver configuration changes. However it
 still doesn't solve the case where a website uses different encodings
 for different documents as described above.

If we want to *really* address that problem, I think the right way to
address it in XHR would be to add a way to XHR to override the HTML
last resort encoding so that authors who are dealing with a content
repository migrated partially to UTF-8 can set the last resort to the
legacy encoding they know they have instead of ending up overriding
the whole HTTP Content-Type for the UTF-8 content. (I'm assuming here
that if someone is migrating a site from a legacy encoding to UTF-8,
the UTF-8 parts declare that they are UTF-8. Authors who migrate to
UTF-8 but, even after realizing that legacy encodings suck and UTF-8
rocks, are *still* too clueless to *declare* that they use UTF-8 don't
deserve any further help from browsers, IMO.)
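
A purely hypothetical API shape for such an override, just to make the
idea concrete (no such member exists in any draft):

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/partially-migrated/page.html", true);
  // Hypothetical property: consulted only when no HTTP charset, BOM
  // or <meta> declares an encoding; declared-UTF-8 pages unaffected.
  xhr.lastResortEncoding = "windows-1251";
  xhr.send();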

 I'm particularly keen to hear how this will affect locales which do
 not use ascii by default. Most of the contents I personally consume is
 written in english or swedish. Most of which is generally legible even
 if decoded using the wrong encoding. I'm under the impression that
 that is not the case for for example Chinese or Hindi documents. I
 think it would be sad if we went with any particular solution here
 without consulting people from those locales.

The old way of putting Hindi content on the Web relied on
intentionally misencoded downloadable fonts. From the browser's point
of view, such deep legacy text is Windows-1252. Hindi content that
works without misencoded fonts is UTF-8. So I think Hindi isn't
relevant to this thread.

Users in CJK and Cyrillic locales are the ones most hurt by authors
not declaring their encodings (well, actually, readers of CJK and
Cyrillic languages whose browsers are configured for other locales are
hurt *even* more), so I think it would be completely backwards for
browsers to complicate new features in order to enable authors in the
CJK and Cyrillic locales deploy *new* features and *still* not declare
encodings. Instead, I think we should design new features to make
authors everywhere get their act together and declare their encodings.
(Note that this position is much less extreme than the more
enlightened position e.g. HTML5 App Cache manifests take: Requiring
everyone to use UTF-8 for a new feature so that declarations aren't
needed.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] responseText for text/html before the encoding has stabilized

2011-09-30 Thread Henri Sivonen
On Fri, Sep 30, 2011 at 3:04 PM, Anne van Kesteren ann...@opera.com wrote:
 I do not see why "text" and "moz-chunked-text" have to be the same. Surely
 we do not want XML encoding detection to kick in for chunks.

Do "text" and the default mode need to be the same for responseText for
text/html and XML types? It seems annoying to have to run the <meta>
prescan or to run the XML declaration detection without running a full
parse in the "text" mode.

 Having deterministic decoding and waiting for 1024 bytes if the MIME type is
 text/html seems reasonable.

Seems reasonable for the modes that have a non-null responseXML for text/html.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] responseText for text/html before the encoding has stabilized

2011-09-30 Thread Henri Sivonen
On Fri, Sep 30, 2011 at 3:35 PM, Anne van Kesteren ann...@opera.com wrote:
 On Fri, 30 Sep 2011 14:29:32 +0200, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, Sep 30, 2011 at 3:04 PM, Anne van Kesteren ann...@opera.com
 wrote:

 I do not see why "text" and "moz-chunked-text" have to be the same.
 Surely we do not want XML encoding detection to kick in for chunks.

 Do "text" and the default mode need to be the same for responseText for
 text/html and XML types? It seems annoying to have to run the <meta>
 prescan or to run the XML declaration detection without running a full
 parse in the "text" mode.

 Unless we disable responseText and responseXML when responseType is not the
 empty string I am not sure that makes sense.

responseType is a newish feature. If it's OK for responseType ==
"chunked-text" to use encoding determination rules that differ from
responseType == "" or responseType == "document", why should
responseType == "text" have to be consistent with responseType == ""
instead of being consistent with responseType == "chunked-text"?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-29 Thread Henri Sivonen
On Thu, Sep 29, 2011 at 3:30 AM, Jonas Sicking jo...@sicking.cc wrote:
 Do we have any guesses or data as to what percentage of existing pages
 would parse correctly with the above suggestion?

I don't have guesses or data, because I think the question is irrelevant.

When XHR is used for retrieving responseXML for legacy text/html, I'm
not expecting legacy data that doesn't have encoding declarations to be
UTF-8 encoded. I want to use UTF-8 for consistency with legacy
responseText and for well-defined behavior. (In the HTML parsing
algorithm at least, we value well-defined behavior over guessing the
author's intent correctly.) When people add responseXML usage for
text/html, I expect them to add encoding declarations (if they are
missing) when they add XHR code that uses responseXML for text/html.

We assume for security purposes that an origin is under the control of
one authority--i.e. that authority can change stuff within the origin.
I'm suggesting that when XHR is used to retrieve text/html data from
the same origin, if the text/html data doesn't already have its
encoding declared, the person exercising the origin's authority to add
XHR should take care of exercising the origin's authority to modify
the text/html resources to add encoding declarations.

XHR can't be used for retrieving different-origin legacy data without
the other origin opting in using CORS. I posit that it's less onerous
for the other origin to declare its encoding than to add CORS support.
Since the other origin needs to participate anyway, I think it's
reasonable to require declaring the encoding to be part of the
participation.

Finally, XHR allows the programmer using XHR to override the MIME
type, including the charset parameter, so if the person adding new XHR
code can't change the encoding declarations on legacy data, (s)he can
override the UTF-8 last resort from JS (and a given repository of
legacy data pretty often has a self-consistent encoding that the XHR
programmer can discover ahead of time). I think requiring the person
adding XHR code to write that line is much better than adding more
locale and/or user setting-dependent behavior to the Web platform.
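
For example, the existing override mentioned above looks like this
when the programmer knows the legacy repository's encoding ahead of
time:

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/legacy/data.html", true);
  // Override the charset parameter before send():
  xhr.overrideMimeType("text/html; charset=windows-1252");
  xhr.send();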

 What outcome do you suggest and why? It seems you aren't suggesting
 doing stuff that involves a parser restart? Are you just arguing
 against UTF-8 as the last resort?

 I'm suggesting that we do the same thing for XHR loading as we do for
 iframe loading. With exception of not ever restarting the parser.
 The goals are:

 * Parse as much of the HTML on the web as we can.
 * Don't ever restart a network operation as that significantly
 complicates the progress reporting as well as can have bad side
 effects since XHR allows arbitrary headers and HTTP methods.

So you suggest scanning the first 1024 bytes heuristically and suggest
varying the last resort encoding.

Would you decode responseText using the same encoding that's used for
responseXML? If yes, that would mean changing the way responseText
decodes in Gecko when there's no declaration.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[XHR2] responseText for text/html before the encoding has stabilized

2011-09-29 Thread Henri Sivonen
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#text-response-entity-body says:
"The text response entity body is a DOMString representing the
response entity body." and "If charset is null and mime is text/html
follow the rules set forth in the HTML specification to determine the
character encoding. Let charset be the determined character encoding."
Furthermore, the response entity body is defined while the state is
LOADING: "The response entity body is the fragment of the entity body
of the response received so far (LOADING) or the complete entity body
of the response (DONE)."

The spec is silent on what responseText for text/html should be if
responseText is read before it is known that the rules set forth in
the HTML specification to determine the character encoding will no
longer change their result. This looks like a spec bug.

There are three obvious solutions:
1) Change the encoding used for responseText as more data becomes
available so that previous responseText is not guaranteed to be a
prefix of subsequent responseText.
2) Make XHR pretend it hasn't seen any data at all before it has seen
so much that the encoding decision is final.
3) Not using the HTML rules for responseText.

Solution #1 is what Gecko now does with XML, but fortunately XML
doesn't allow non-ASCII before the XML declaration, so you can't
detect this from outside the black box. With HTML, solution #1 would
mean handing a footgun to Web authors who might not prepare for cases
where previous responseText stops being a prefix of subsequent
responseText.

Solution #2 could, in the worst case (assuming we aren't doing the
worst of worst cases; i.e. we aren't allowing parser restarts
arbitrarily late), stall until 1024 bytes have been seen, which risks
breaking existing comet apps if there exist comet apps that use
responseText with slowly-arriving text/html responses that don't have
a BOM, don't have an early <meta> and don't have an HTTP charset and
that require the JS part of the app to act on data within the
first 1024 bytes before the server sends more. (OK, it would be silly
to write comet apps with responseText using text/html as opposed to
e.g. text/plain or whatever and not put a charset declaration on the
HTTP layer, but this is the Web, so who knows if such apps exist.)

Solution #3 would make the text/html side inconsistent with the XML
side and could lead to confusion especially in the default mode if
responseXML does honor metas (within the first 1024 bytes). Solution
#3 would be easy to implement, though.

As a complication, since Saturday, Gecko supports a moz-chunked-text
response type which modifies the behavior of response and responseText
so that they only show a string consisting of new text since the
previous progress event. moz-chunked-text isn't specced anywhere (to
my knowledge), but IRC discussion with Olli indicates that it's
assumed that, even going forward, the encoding decision is made the
same way for moz-chunked-text and text response types. This
assumption obviously excludes solution #1 above, since chunks reported
before <meta> could use a different encoding compared to chunks after
<meta>, which wouldn't make sense.

It's worth noting that moz-chunked-text turns off responseXML, so
it's not unthinkable to use non-HTML rules for moz-chunked-text.
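
For reference, a sketch of how the Gecko-only mode is used
(consumeChunk() is a hypothetical app function):

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/stream", true);
  xhr.responseType = "moz-chunked-text";
  xhr.onprogress = function () {
    // response holds only the text received since the previous
    // progress event, not the accumulated responseText.
    consumeChunk(xhr.response);
  };
  xhr.send();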

In IRC discussion with Olli, we gravitated towards solution #2, but we
didn't consider the comet stalling aspect in that discussion.

In any case, all this should be specced properly and it currently isn't. :-(

It seems to me that all these cannot be true:
 * responseText and responseXML use the same encoding detection rules.
 * The text and default modes use the same encoding detection rules.
 * text and moz-chunked-text use the same encoding detection rules.
 * moz-chunked-text uses the same encoding for all chunks.
 * All imaginable badly written comet apps are guaranteed to continue working.
 * responseXML considers <meta> in a deterministic way (no timer for
bailing out before 1024 bytes if the network stalls).

Which property do we give up?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-28 Thread Henri Sivonen
On Wed, Sep 28, 2011 at 4:16 AM, Jonas Sicking jo...@sicking.cc wrote:
 So it sounds like your argument is that we should do meta prescan
 because we can do it without breaking any new ground. Not because it's
 better or was inherently safer before webkit tried it out.

The outcome I am suggesting is that character encoding determination
for text/html in XHR should be:
 1) HTTP charset
 2) BOM
 3) <meta> prescan
 4) UTF-8
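
In code, the suggested cascade amounts to something like this
(sniffBom() and prescanMeta() are hypothetical helper names, not
Gecko internals):

  function determineXhrHtmlEncoding(httpCharset, firstBytes) {
    if (httpCharset) return httpCharset;      // 1) HTTP charset
    var bom = sniffBom(firstBytes);           // 2) BOM
    if (bom) return bom;
    var meta = prescanMeta(firstBytes, 1024); // 3) <meta> prescan, first 1024 bytes
    if (meta) return meta;
    return "UTF-8";                           // 4) last resort
  }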

My rationale is:
 * Restarting the parser sucks. Full heuristic detection and
non-prescan <meta> require restarting.
 * Supporting HTTP charset, the BOM and the <meta> prescan means supporting
all the cases where the author is declaring the encoding in a
conforming way.
 * Supporting the <meta> prescan even for responseText is safe to the
extent content is not already broken in WebKit.
 * Not doing even heuristic detection on the first 1024 bytes allows
us to avoid one of the unpredictability and
non-interoperability-inducing legacy flaws that encumber HTML when
loading it into a browsing context.
 * Using a clamped last resort encoding instead of a user setting or
locale-dependent encoding allows us to avoid one of the
unpredictability and non-interoperability-inducing legacy flaws that
encumber HTML when loading it into a browsing context.
 * Using UTF-8 as opposed to Windows-1252 or a user setting or
locale-dependent encoding as the last resort encoding allows the same
encoding to be used in the responseXML and responseText cases without
breaking existing responseText usage that expects UTF-8 (UTF-8 is the
responseText default in Gecko).

What outcome do you suggest and why? It seems you aren't suggesting
doing stuff that involves a parser restart? Are you just arguing
against UTF-8 as the last resort?

 And in any case, it's easy to figure out where the
 data was loaded from after the fact, so debugging doesn't seem any
 harder.

If that counts as not harder, I concede this point.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-26 Thread Henri Sivonen
On Mon, Sep 26, 2011 at 12:46 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Fri, Sep 23, 2011 at 1:26 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking jo...@sicking.cc wrote:
 I agree that there are no legacy requirements on XHR here, however I
 don't think that that is the only thing that we should look at. We
 should also look at what makes the feature the most useful. An extreme
 counter-example would be that we could let XHR refuse to parse any
 HTML page that didn't pass a validator. While this wouldn't break any
 existing content, it would make HTML-in-XHR significantly less useful.

 Applying all the legacy text/html craziness to XHR could break current
 use of XHR to retrieve responseText of text/html resources (assuming
 that we want responseText for text/html to work like responseText for XML
 in the sense that the same character encoding is used for responseText
 and responseXML).

 This doesn't seem to only be a problem when using crazy parts of
 text/html charset detection. Simply looking for meta charset in the
 first 1024 characters will change behavior and could cause page
 breakage.

 Or am I missing something?

Yes: WebKit already performs the <meta> prescan for text/html when
retrieving responseText via XHR even though it doesn't support full
HTML parsing in XHR (so responseXML is still null).
http://hsivonen.iki.fi/test/moz/xhr/charset-xhr.html

Thus, apps broken by the <meta> prescan would already be broken in
WebKit (unless, of course, they browser sniff in a very strange way).

And apps that wouldn't be OK with using UTF-8 as the fallback encoding
when there's no HTTP-level charset, no BOM and no <meta> in the first
1024 bytes would already be broken in Gecko.

 Applying all the legacy text/html craziness to XHR would make data
 loading in programs fail in subtle and hard-to-debug ways depending on
 the browser localization and user settings. At least when loading into
 a browsing context, there's visual feedback of character misdecoding
 and the feedback can be attributed back to a given file. If
 setting-dependent misdecoding happens in the XHR data loading
 machinery of an app, it's much harder to figure out what part of the
 system the problem should be attributed to.

 Could you provide more detail here. How are you imagining this data
 being used such that it's not being displayed to the user.

 I.e. can you describe an application that would break in a non-visual
 way and where it would be harder to detect where the data originated
 from compared to for example iframe usage.

If a piece of text came from XHR and got injected into a visible DOM,
it's not immediately obvious which HTTP response it came from.
-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-23 Thread Henri Sivonen
On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking jo...@sicking.cc wrote:
 I agree that there are no legacy requirements on XHR here, however I
 don't think that that is the only thing that we should look at. We
 should also look at what makes the feature the most useful. An extreme
 counter-example would be that we could let XHR refuse to parse any
 HTML page that didn't pass a validator. While this wouldn't break any
 existing content, it would make HTML-in-XHR significantly less useful.

Applying all the legacy text/html craziness to XHR could break current
use of XHR to retrieve responseText of text/html resources (assuming
that we want responseText for text/html to work like responseText for XML
in the sense that the same character encoding is used for responseText
and responseXML).

Applying all the legacy text/html craziness to XHR would make data
loading in programs fail in subtle and hard-to-debug ways depending on
the browser localization and user settings. At least when loading into
a browsing context, there's visual feedback of character misdecoding
and the feedback can be attributed back to a given file. If
setting-dependent misdecoding happens in the XHR data loading
machinery of an app, it's much harder to figure out what part of the
system the problem should be attributed to.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR2] Avoiding charset dependencies on user settings

2011-09-23 Thread Henri Sivonen
On Fri, Sep 23, 2011 at 11:26 AM, Henri Sivonen hsivo...@iki.fi wrote:
 Applying all the legacy text/html craziness

Furthermore, applying full legacy text/html craziness involves parser
restarts for GET requests. With a browsing context, that means
renavigation, but I really don't want to support parser restarts in
XHR.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Adding Web Intents to the Webapps WG deliverables

2011-09-22 Thread Henri Sivonen
On Tue, Sep 20, 2011 at 6:53 AM, Ian Hickson i...@hixie.ch wrote:
 Why not just improve both navigator.registerContentHandler and
 navigator.registerProtocolHandler?

 In particular, why are intents registered via a new HTML element rather
 than an API?

Web Activities addresses this problem space without a new HTML element:
https://github.com/mozilla/openwebapps/blob/master/docs/ACTIVITIES.md

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[XHR2] Avoiding charset dependencies on user settings

2011-09-22 Thread Henri Sivonen
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#document-response-entity-body
says:
If final MIME type is text/html let document be a Document object that
represents the response entity body parsed following the rules set
forth in the HTML specification for an HTML parser with scripting
disabled. [HTML]

Since there's presumably no legacy content using XHR to read
responseXML for text/html (and expecting HTML parsing) and since (in
Gecko at least) responseText for non-XML tries HTTP charset and falls
back on UTF-8, it seems it doesn't make sense to implement full-blown
legacy charset craziness for text/html in XHR.

Specifically, it seems that it makes sense to skip heuristic detection
and to use UTF-8 (as opposed to Windows-1252 or a locale-dependent
value) as the fallback encoding if there's neither a <meta> nor an HTTP
charset, since UTF-8 is the pre-existing fallback for responseText and
responseText is already used with text/html.

As it stands, the XHR2 spec defers to a part of HTML that has
legacy-oriented optional features. It seems that it makes sense to
clamp down on those options for XHR.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Overriding the MIME type in XHR2 after the request has started

2011-04-19 Thread Henri Sivonen
In reference to
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#dom-xmlhttprequest-overridemimetype

It seems to me that XHR2 allows overrideMimeType() to be called at any
time so that it affects calls to the responseXML getter that follow
the overrideMimeType() call. And subsequent overrideMimeType() calls can
make the responseXML getter return different things later.

This is bad, because it requires synchronous parsing when the
responseXML getter is called. OTOH, if overrideMimeType() calls were
honored only before the send() method has been called, parsing to DOM
could be implemented progressively (potentially off-the-main-thread) as
the resource representation downloads and the responseXML getter could
return this eagerly-parsed Document and always return the same document
on subsequent calls.
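
Concretely, the pattern that the current draft text permits, as I read
it, is something like:

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "data", true);
  xhr.send();
  xhr.onload = function () {
    // Overriding after send() means the bytes received must be
    // reparsed synchronously when responseXML is read:
    xhr.overrideMimeType("application/xml");
    var doc = xhr.responseXML;
    xhr.overrideMimeType("text/xml; charset=windows-1252");
    var doc2 = xhr.responseXML; // may differ from doc
  };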

Are there compelling use cases for allowing overrideMimeType() after
send() has been called? I assume that typically one would use
overrideMimeType() when knowing ahead of time that the config of the
server responding to XHR is bogus.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




[XHR2] Overriding the MIME type in XHR2 after the request has started

2011-04-19 Thread Henri Sivonen
Adding [XHR2] to the subject to comply with the instructions. Sorry
about the noise.

On Tue, 2011-04-19 at 12:04 +0300, Henri Sivonen wrote:
 In reference to
 http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#dom-xmlhttprequest-overridemimetype
 
 It seems to me that XHR2 allows overrideMimeType() to be called at any
 time so that it affects calls to the responseXML getter that follow
 the overrideMimeType() call. And subsequent overrideMimeType() calls can
 make the responseXML getter return different things later.
 
 This is bad, because it requires synchronous parsing when the
 responseXML getter is called. OTOH, if overrideMimeType() calls were
 honored only before the send() method has been called, parsing to DOM
 could be implemented progressively (potentially off-the-main-thread) as
 the resource representation downloads and the responseXML getter could
 return this eagerly-parsed Document and always return the same document
 on subsequent calls.
 
 Are there compelling use cases for allowing overrideMimeType() after
 send() has been called? I assume that typically one would use
 overrideMimeType() when knowing ahead of time that the config of the
 server responding to XHR is bogus.
 

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




Use cases for Range::createContextualFragment and script nodes

2010-10-20 Thread Henri Sivonen
When WebKit or Firefox trunk create an HTML script element node via 
Range::createContextualFragment, the script has its 'already started' flag set, 
so the script won't run when inserted into a document. In Opera 10.63 and in 
Firefox 3.6.x, the script doesn't have the 'already started' flag set, so the 
script behaves like a script created with document.createElement("script") when 
inserted into a document.
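
A minimal way to observe the difference (the commented behavior is per
the engines listed above):

  var range = document.createRange();
  range.selectNodeContents(document.body);
  var frag = range.createContextualFragment(
      "<script>alert('ran')<\/script>");
  document.body.appendChild(frag);
  // WebKit, Firefox trunk: no alert ('already started' is set).
  // Opera 10.63, Firefox 3.6.x: the alert fires.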

I'd be interested in use cases around createContextualFragment in order to get 
a better idea of which behavior should be the correct behavior going forward.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Please special-case html as the context of Range.createContextualFragment when the doc is an HTML doc

2010-09-06 Thread Henri Sivonen
For a future revision of DOM Range:

Please specify that when the document associated with Range is an HTML document 
(has its HTMLness bit set per HTML5) and the context node (startContainer of 
the Range) has local name html in the XHTML namespace, the context passed to 
the HTML fragment parsing algorithm should be body in the XHTML namespace 
instead. This is required for compat with existing scripts.
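
A sketch of the scenario in question (with the special-casing asked
for above, the last line parses as if the context were <body>):

  var range = document.createRange();
  // startContainer becomes the root <html> element:
  range.selectNodeContents(document.documentElement);
  // Without the special case, <html> would be the fragment parsing
  // context; existing scripts expect body-like parsing here.
  var frag = range.createContextualFragment("<div>hi</div>");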

See https://bugzilla.mozilla.org/show_bug.cgi?id=585819 and 
dependencies/duplicates.

(Also, if the document is an HTML document and the range doesn't have an 
identifiable context element node even after walking the parent chain, the 
context passed to the HTML fragment parsing algorithm should be body in the 
XHTML namespace. I don't know DOM Range well enough to say how this situation 
can arise.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Issues with XML Dig Sig and XML Canonicalization; was Re: Rechartering WebApp WG

2010-02-22 Thread Henri Sivonen
Sorry about the slow response time.

On Feb 12, 2010, at 16:07, Marcos Caceres wrote:
 What we are discussing is if Mozilla's solution for signing Zip files 
 (JAR-based) [1] is easier for vendors to implement/maintain and authors to 
 deal with when compared to the W3C Widget solution of using XML Dig Sig.

I think it's clear that JAR/XPI signing is simpler than XML Dig Sig, because 
JAR signing operates on a plain-text, line-based manifest and, thus, doesn't 
require XML canonicalization before the signing step.

I have previously listed the summary of issues in
http://lists.w3.org/Archives/Public/public-webapps/2009AprJun/0178.html

 Thus far, in terms of ease of use for authors, little in the way of concrete 
 evidence has been presented of one signing method being easier than the other 
 (specially by looking at the complexity of using Mozilla's command line-based 
 tool [1] compared to BONDI's SDK [2]). This is not to say that Mozilla (or 
 anyone, given its open source nature) could not make a super easy tool for 
 signing zip files.

FWIW, I think I haven't ever argued anything about the ease of use of Mozilla's 
XPI signing tool. I have previously argued that Sun's jar signing tools were 
widely available. (Previously, I was unaware that .jar and .xpi used different 
crypto algorithms. Since .xpi is newer, one might assume it has a better 
algorithm in terms of crypto characteristics but obviously not in terms of 
network effects of tool availability.)

 However, the proof is in the pudding here: By virtue that Bondi's SDK 
 includes a tool that allows widgets to be signed with a few clicks is 
 evidence that the W3C's Widgets Signature specification is capable of being 
 used to produce easy to use products.

I don't think I've ever claimed that the production of easy-to-use products 
weren't *possible*. My claim was that XML Canonicalization (whether Exclusive 
or not) introduces enough *implementation* complexity that previously, buggy 
canonicalization code has been deployed, which has led to signatures failing 
to validate with other implementations that weren't bugwards-compatible with 
the signer's implementation.

Here's evidence of bugs in just one high-profile Canonicalization 
implementation:
https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit

 In terms of implementation, Mozilla has previously raised concerns about XML 
 canonicalization (which I don't fully understand, hence the growing email cc 
 list) - but by the virtue that people have implemented the Widget signing 
 spec, I await to see if Mozilla's concerns will materialize in practice and 
 actually hinder interoperability - I'm not saying this is FUD, but we need 
 proof.

The above is proof of *previous* interop-sensitive bugs in a widely-deployed 
Canonicalization implementation. There's no reason to believe that 
complexity-induced bugs of this kind are unique to one implementation. Instead, 
I think it's fair to expect any from-scratch implementation of Canonicalization 
to be prone to similar bugs *that could be avoided by using jar signing 
instead*, since jar signing omits the Canonicalization step entirely.

Unfortunately, due to confidentiality concerns of people deploying crypto 
software, I can't give you concrete deployment stories where the above-cited 
bugs have caused interop issues. I can only point to the public bug list and 
assert that bugs in this area have had actual interop consequences at 
deployment time.

(Also, having to have a Canonicalization impl. adds code bloat compared to 
having a jar signing impl.)

 It's too early to make the call that widget signing is flawed. And it's 
 important to note that no one that has implemented has come back to the WG 
 raising any concerns or screaming bloody murder.

It could be that people don't sign widgets very often. I don't recall ever 
seeing a signed Firefox extension or a signed Eclipse plug-in.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: Notifications

2010-02-12 Thread Henri Sivonen
On Feb 10, 2010, at 20:35, John Gregg wrote:

 I agree that this is a good distinction, but I think even considering ambient 
 notifications there is a question of how much interaction should be 
 supported.  NotifyOSD, for example, does not allow the user to take any 
 action in response to a notification.  

Being able to acknowledge an ambient notification could be an optional feature 
that isn't supported on Ubuntu as long as NotifyOSD doesn't support 
acknowledging notifications. (If it's a problem to make acknowledgement 
optional, I think making HTML notification optional is going to be a bigger 
problem...)

FWIW, Microsoft explicitly says notifications must be ignorable and don't 
persist. Notifications aren't modal and don't require user interaction, so 
users can freely ignore them. In Windows Vista® and later, notifications are 
displayed for a fixed duration of 9 seconds.
http://msdn.microsoft.com/en-us/library/aa511497.aspx
As such, it's always unsafe to design UI in a way that expects the users to be 
able to acknowledge a given notification.

 So a very simple use case: email web app wants to alert you have new mail 
 outside the frame, and allow the user to click on that alert and be taken to 
 the inbox page.  This does not work on NotifyOSD, because they explicitly 
 don't support that part of the D-bus notifications spec.  However, Growl 
 would support this. 

If acknowledgement support is super-important to Web apps, surely it should be 
to native apps, too. It seems to me that it would be a bad outcome for users if 
the Ubuntu desktop and the Web platform disagree on this point and it causes 
the duplication of notification mechanisms. I think it would make more sense to 
either add org.freedesktop.Notifications.ActionInvoked to NotifyOSD (if 
acknowledgeability is the Right Thing) or not to add acknowledgeability to the 
Web platform (if that's the Right Thing). Having two groups of platform 
designers (the designers of the Ubuntu desktop and the designers of the Web 
platform) disagree on what the Right Thing is makes the users lose.

CCing mpt in case he can share some insight into why NotifyOSD explicitly 
doesn't support org.freedesktop.Notifications.ActionInvoked.

On Feb 11, 2010, at 00:10, Drew Wilson wrote:
 it seems like the utility of being able to put markup such as bold text, or 
 graphics, or links in a notification should be self-evident,

It's not self-evident. If it were, surely native apps would be bypassing 
NotifyOSD and Growl to get more bolded and linkified notifications.

On Feb 11, 2010, at 16:07, Jeremy Orlow wrote:

 As has been brought up repeatedly, growl and the other notification engines 
 are used by a SMALL FRACTION of all web users.  I suspect a fraction of a 
 percent.  Why are we bending over backwards to make this system work on those 
 platforms?

More seriously though: Virtually every user of an up-to-date Ubuntu 
installation has the notification engine installed. As for Growl, the kind of 
users who install Growl are presumably the kind of users who care about 
notifications of multiple concurrent things the most. Furthermore, it seems 
that notifications are becoming more a part of operating system platforms. For 
example, it looks like Windows 7 has a system API for displaying notifications: 
http://msdn.microsoft.com/en-us/library/ee330740%28VS.85%29.aspx

 Are there other examples where we've dumbed down an API to the least common 
 denominator for a small fraction of users?  Especially when there's no 
 technical reason why these providers could not be made more advanced (for 
 example, embed webkit to display fully functional notifications)?

It's not a given that declining to force all ambient notifications into a
consistent form is an advancement in user experience terms.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: Notifications

2010-02-10 Thread Henri Sivonen
On Feb 3, 2010, at 20:54, Drew Wilson wrote:

 Following up on breaking out createHTMLNotification() and 
 createNotification() vs combining them into one large API - I believe the 
 intent is that a given user agent may not support all types of notifications 
 (for example, a mobile phone application may only support text + icon 
 notifications, not HTML notifications).

My main concern isn't mobile phones in the abstract but mapping to concrete 
system-wide notification mechanisms: Growl and NotifyOSD on Mac and Ubuntu 
respectively.

So far, the only use case I've seen (on the WHATWG list) for HTML notifications 
that aren't close to the kind of notifications that Growl and NotifyOSD support 
has been a calendar alarm.

I agree that calendar alarm is a valid use case, but I think HTML vs. not HTML 
isn't the right taxonomy. Rather, it seems to me that there are ambient 
notifications (that dismiss themselves after a moment even if unacknowledged) 
and notifications that are all about interrupting the user until explicitly 
dismissed (calendar alarms).

I think the API for ambient notifications should be designed so that browsers 
can map all ambient notifications to Growl and NotifyOSD. As for notifications 
that require explicit acknowledgement, I think it would be worthwhile to 
collect use cases beyond calendar alarms first and not heading right away to 
generic HTML notifications.

If it turns out that notifications that require explicit acknowledgements are 
virtually always calendar alarms or alarm clock notifications, it might make 
sense to design an API explicitly for those. For example, it could be desirable 
to allow a privileged calendar Web app to schedule such alarms to fire on a 
mobile device without having to keep a browsing context or a worker active at 
the notification time.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: XMLHttpRequest Comments from W3C Forms WG

2009-12-17 Thread Henri Sivonen
On Dec 16, 2009, at 21:47, Klotz, Leigh wrote:

 I'd like to suggest that the main issue is dependency of the XHR document on 
 concepts where HTML5 is the only specification that defines several core 
 concepts of the Web platform architecture, such as event loops, event handler 
 attributes, etc.

A user agent that doesn't implement the core concepts isn't much use for 
browsing the Web. Since the point of the XHR spec is getting interop among Web 
browsers, it isn't a good allocation of resources to make XHR not depend on 
things that a user agent that is suitable for browsing the Web needs to support 
anyway.

XHR interop doesn't matter much if XHR is transplanted into an environment 
where the other pieces fail to be interoperable with Web browsing software. 
That is, in such a case, it isn't much use if XHR itself works like XHR in 
browsers--the system as a whole still doesn't interoperate with Web browsers.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [selectors-api] querySelector with namespace

2009-11-26 Thread Henri Sivonen
On Nov 26, 2009, at 13:18, Jonathan Watt wrote:

 During a discussion about xml:id I was about to make the throw away comment 
 that
 you could use querySelector to easily work around lack of support for xml:id,
 but on checking it turns out that's not the case. querySelector, it seems,
 cannot be used to select on a specific namespace, since you can only use
 namespace prefixes in selectors, and querySelector does not resolve prefixes.

Isn't the easiest solution not to support xml:id on the Web? It's not supported 
in Gecko, WebKit or Trident. What's the upside of adding it?

xml:id doesn't enable functionality that the id attribute on HTML, MathML or 
SVG elements doesn't enable, but xml:id comes with all sorts of complications. 
In addition to this complication, it has the complication that in an 
xml:id-enabled world, an element doesn't have a single attribute that has 
IDness. Instead, it has to have two (the natural choice flowing from XML specs) 
or the IDness of attributes has to depend on the presence of other attributes 
(the choice taken by SVG 1.2 Tiny).

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: XMLSerializer should run HTML serialization algorithm when input doc is HTML

2009-07-02 Thread Henri Sivonen

On Jul 2, 2009, at 12:11, Giovanni Campagna wrote:

 2009/7/2 Cameron McCormack c...@mcc.id.au:
  Henri Sivonen:
   Gecko bug:
   https://bugzilla.mozilla.org/show_bug.cgi?id=500937

   The proposed patch there and (based on black-box testing) WebKit solve
   the issue by running the HTML serialization algorithm when the owner
   document of the input node is an HTML document.

   This should probably be in a spec somewhere.

  We'd need a spec for XMLSerializer first, I guess.

 Then we need a discussion about the possibility of having a spec for
 XMLSerializer, having already DOM3LS.

It should be pretty clear by now that XHR/DOMParser/XMLSerializer won
and DOM3 LS lost. DOM3 LS is now just a legacy burden, but XHR,
DOMParser and XMLSerializer need specs in order to iron out interop
issues.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





XMLSerializer should run HTML serialization algorithm when input doc is HTML

2009-07-01 Thread Henri Sivonen

Gecko bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=500937

The proposed patch there and (based on black-box testing) WebKit solve  
the issue by running the HTML serialization algorithm when the owner  
document of the input node is an HTML document.


This should probably be in a spec somewhere.
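
That is, the behavior being converged on is (a sketch; exact output
naturally varies with the document):

  var serializer = new XMLSerializer();
  var markup = serializer.serializeToString(document.body);
  // When the owner document is an HTML document, this uses the HTML
  // serializer, so void elements come out as e.g. <br>, not <br/>.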

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Please include a statement of purpose and user interaction expectations for feature

2009-06-23 Thread Henri Sivonen

On Jun 2, 2009, at 16:02, Robin Berjon wrote:


On Jun 2, 2009, at 14:57 , Henri Sivonen wrote:
Please include a corresponding UA requirement to obtain
authorization from the user for the features imported with
<feature>. (It seems that the security aspect requires an
authorization and doesn't make sense if the dangerous features are
simply imported silently.) As far as I can tell, the spec doesn't
currently explain what the UA is supposed to do with the 'feature
list' once built.


I don't think that that is a good idea. The purpose of <feature> is
to provide a hook through which a widget may communicate with a
security policy. What's in the security policy really isn't up to
P+C to define (though it certainly should be defined somewhere else).
Maybe it could ask the user, as you state, but maybe it could see  
that the widget was signed by a trusted party, or know that the  
device doesn't have any sensitive data for a given API, or maybe  
anything goes on the full moon.



I see. The track record with Java APIs doesn't fill me with confidence  
that the Right Thing will be done, but I guess this is outside the  
scope of interop-oriented specs. (My current phone asks me every time  
Google Maps Mobile wants to use the network, doesn't allow me to grant  
that permission permanently, and yet doesn't ask me when GMM wants to  
grab my geolocation and send it to Google.)


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Please include a statement of purpose and user interaction expectations for feature

2009-06-22 Thread Henri Sivonen

On Jun 16, 2009, at 15:42, Marcos Caceres wrote:


Based on Arve and Robin's additional feedback, I've added  the
following to the spec as part of The Feature Element section:

How a user agent makes use of features depends on the user agent's
security policy, hence activation and authorization requirements for
features are beyond the scope of this specification.

Is that satisfactory?


I think it's better than what was in the spec before. However, if a  
reader doesn't already know what feature is for, I think the current  
text might not make it quite clear.


I notice that now the definition of what a feature is includes a video  
codec in addition to APIs. Does BONDI expect video codecs to be  
sensitive to security policies? Do you envision undeclared video  
codecs being withheld from the HTML5 source fallback and  
canPlayType()?
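
Concretely (my illustration, not from either spec):

const video = document.createElement("video");
video.canPlayType('video/ogg; codecs="theora"'); // "probably", "maybe" or ""
// Is a widget engine expected to answer "" here unless the widget
// declared a feature for the codec?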


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





[widgets] Please include a statement of purpose and user interaction expectations for feature

2009-06-02 Thread Henri Sivonen
Please state the purpose of feature. (That it's for authorizing  
features that don't participate in the Web-oriented browser security  
model.)


Please include a corresponding UA requirement to obtain authorization  
from the user for the features imported with feature. (It seems that  
the security aspect requires an authorization and doesn't make sense  
if the dangerous features are simply imported silently.) As far as I  
can tell, the spec doesn't currently explain what the UA is supposed  
to do with the 'feature list' once built.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





[widgets] Purpose and utility of feature unclear

2009-06-01 Thread Henri Sivonen

Regarding http://dev.w3.org/2006/waf/widgets/#the-feature-element

I don't understand the purpose and utility of the feature element.

Using a feature element denotes that, at runtime, a widget may  
attempt to access the feature identified by the feature element's  
name attribute.


Why is this useful to denote? What happens if a widget doesn't denote  
that it'll attempt to use a feature but does so anyway?


Using a feature element denotes that, at runtime, a widget may  
attempt to access the feature identified by the feature element's  
name attribute.



Why aren't all the implemented features simply available like in a Web  
browser engine?


A user agent can expose a feature through, for example, an API, in  
which case a user agent that supports the [Widgets-APIs]  
specification can allow authors to check if a feature is loaded via the  
hasFeature() method.


Wouldn't this have all the same problems that DOM hasFeature() has had  
previously and the problems that have been pointed out as reasons not  
to have feature detection at-rules in CSS? Namely, that  
implementations have the incentive to claim that they have a feature  
as soon as they have a partial buggy implementation.
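
The DOM precedent I have in mind, as a sketch:

// Engines have had every incentive to answer true here as soon as a
// partial, buggy implementation existed:
document.implementation.hasFeature("MutationEvents", "2.0");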


A boolean attribute that indicates whether or not this feature must  
be available to the widget at runtime. In other words, the required  
attribute denotes that a feature is absolutely needed by the widget  
to function correctly, and without the availability of this feature  
the widget serves no useful purpose or won't execute properly.


What's a widget engine expected to do when an unrecognized feature is  
declared as required?



<feature name="http://example.org/api.geolocation" required="false"/>



Suppose a WG creates a feature for the Web, the feature is not part of  
the Widgets 1.0 Family of specs and the WG doesn't assign a feature  
string for the feature because the WG doesn't consider widgets. Next,  
suppose browser engines implement the feature making it  
unconditionally available to Web content.


Now, if such a browser engine is also a widget engine, does it make  
the feature's availability on the widget side conditional to importing  
it with feature? If it does, what's the point of not making the  
feature available unconditionally? If it doesn't, what's the point of  
feature?


If there are two such engines, how do they converge on the same  
feature name string if the specifiers of the feature itself just meant  
it to be available to Web content unconditionally and didn't bother to  
mint a widget feature string?


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Public keys in widgets URI scheme?

2009-05-27 Thread Henri Sivonen

On May 27, 2009, at 18:32, Adam Barth wrote:


3) A developer can write two widgets that occupy the same origin
(again, but re-using the public key).  These widgets will be able to
interact more freely, for example by sharing the same localStorage,
etc.



I thought the point of the UUID was to isolate even different instances  
of the same widget.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Jar signing vs. XML signatures

2009-04-17 Thread Henri Sivonen

On Apr 17, 2009, at 13:24, Robin Berjon wrote:

Trying to separate the discussion from the change request: would you  
be satisfied if requirements to perform C14N were removed and  
reliance on XSD data types for definition purposes were replaced  
with something less scary (though in this case this is a bit of a  
FUD argument, Henri; the referenced types aren't overwhelming)?


My preferred change would be adopting jar signing. However, if that's  
not feasible, my next preferred option would indeed be removing the  
requirement to perform canonicalization (i.e. sign XML as binary with  
a detached traditional binary signature block).


As for the data types, I'd be satisfied if the datatypes were defined  
in such a way that attribute value parsing algorithms and conversion  
methods that a browser engine has to contain anyway were reusable.  
This should include well-defined behavior in the case of non- 
conforming input.


For example, for dates (which is a datatype that widgets add--not  
something that comes from XML signatures), it makes more sense to  
reuse an appropriate microsyntax definition from HTML5 than to  
delegate to XSD. XSD not only makes leading and trailing whitespace  
conforming and fails to define behavior for non-conforming dates; it  
even allows leap seconds! (Is it a FUD argument that XSD dates deviate  
from the value space that is typically used in Posix date conversions  
between multi-unit tuples and epoch seconds?)
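
To make explicit what I mean by defining the format in prose, here's a
sketch (entirely mine, not from HTML5 or the widgets spec) of a
constrained date-time parser with well-defined rejection of
non-conforming input:

function parseStrictDateTime(input: string): Date | null {
  // Exactly YYYY-MM-DDTHH:MM:SSZ: no surrounding whitespace allowed.
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$/.exec(input);
  if (m === null) return null; // non-conforming input: defined rejection
  const [, year, month, day, hour, minute, second] = m.map(Number);
  if (month < 1 || month > 12 || hour > 23 || minute > 59 || second > 59) {
    return null; // second 60, i.e. a leap second, is deliberately rejected
  }
  const date = new Date(Date.UTC(year, month - 1, day, hour, minute, second));
  // Reject out-of-range days (e.g. 2009-02-30) by checking for rollover:
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return null;
  }
  return date;
}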


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Jar signing vs. XML signatures

2009-04-15 Thread Henri Sivonen

On Apr 15, 2009, at 15:00, Marcos Caceres wrote:

On Tue, Apr 14, 2009 at 4:19 PM, Henri Sivonen hsivo...@iki.fi  
wrote:

On Apr 14, 2009, at 14:38, Marcos Caceres wrote:

I think it would be more productive to help us address the issues  
that you mentioned, instead of asking us to dump everything and start  
again.



So the issues were:
 1) The complexity of canonicalization/reserialization of XML.


I think this is an issue that needs to be taken up with XML Security
WG or whoever is working on the canonicalization spec.


That's not the point. The point is that XML signatures try to solve a  
more complex problem than what needs solving for signing a zip file.  
It would be useless to tell people who do want to solve the complex  
problem that they should solve it without canonicalization.


But widgets don't really need to solve the problem XML signatures  
solve. Widgets could get away with signing a manifest traditionally  
without the signing method knowing that what is being signed happens  
to be XML.


This would result in having to reserve one file name for the manifest  
(either in a jar-ish text format or in a W3C-ish XML format) and a  
range of file names for detached signatures in traditional binary  
formats that off-the-shelf crypto libraries support. The cost of  
putting the signatures inside the manifest XML file is that you end up  
importing complexity like canonicalization.


The above approach won't quite work without a bit of elaboration if you  
really want the distributor to sign the author signature and not just  
sign the same manifest. (What's the purpose of "A distributor  
signature MUST have a ds:Reference for any author signature, if one is  
present within the widget package"? Why does it matter for the widget  
engine that the distributor signed the author signature if both sign  
the same manifest?)



 2) Spec dependency on XSD.


We can probably address this and use prose as you suggested. So you
recommend we follow HTML5 here, right?


Yes, I think the HTML5 approach to defining syntax is better.

Given that you understand the problem, can you maybe propose some  
text?


I'm not sure I understand the problem that the spec is solving. :-)  
For example, I don't know where the code for actually parsing  
CreatedType is supposed to come from. However, my wild guess is that  
unless widget impls are supposed to bring in huge off-the-shelf XSD  
machinery, it would be better to use English to define a more  
constrained date format here the way HTML5 does than to defer to XSD.  
Of course, that doesn't help much if you import XSD deps otherwise  
from XML signatures. If you are bringing in XSD machinery anyway,  
defining things without XSD might even hurt. What's the expected  
software reuse scenario here?


Instead of canonicalizing the manifest XML and using XML signature, you  
could treat the manifest XML as a binary file and sign it the  
traditional way, leaving a detached binary signature in the format  
customary for the signing cipher in the zip file. This would address  
issues #1 and #2.


That is our intention.


Do you mean that's the existing plan (that's not what it looks like)  
or that that's a new intention?


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





[widgets] Jar signing vs. XML signatures

2009-04-14 Thread Henri Sivonen
I noticed that widget packaging uses XML signatures (notorious for  
bugs in canonicalization/reserialization code) for signing zip files.  
However, signing zip files has been solved long ago for Java jar  
files. The mechanism or a variation of it is also used for Mozilla xpi  
files and ODF documents.


Wouldn't it be simpler to use jar signing instead of inventing a new  
way of signing zip files with implementation dependencies on XML  
signatures and spec dependencies on XSD? (Why does the spec have  
dependencies on XSD?)


Jar signing is pretty simple compared to XML canonicalization &  
reserialization. When you need to reserialize XML, you import all the  
troubles of serializing XML (see e.g.
https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit
). The META-INF folder is ugly, but unsigned widgets could omit it,  
and it isn't much uglier than an XML signature file on the top level  
of the zip archive.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Jar signing vs. XML signatures

2009-04-14 Thread Henri Sivonen

On Apr 14, 2009, at 11:57, Thomas Roessler wrote:


On 14 Apr 2009, at 10:27, Henri Sivonen wrote:

Wouldn't it be simpler to use jar signing instead of inventing a  
new way of signing zip files with implementation dependencies on  
XML signatures and spec dependencies on XSD? (Why does the spec  
have dependencies on XSD?)


Which XSD dependency do you mean?  The only XSD dependencies I could  
think of right now are ones that say things like "the value of this  
attribute is of type anyURI" or "the value space of this element is  
a restriction on the base64Binary XSD type".  XML Signature does not  
require schema validation, or anything like that.


Hence, spec dependencies.

I don't find the string anyURI in the spec, but anyURI is a great  
example of why defining syntax in terms of XSD datatypes is a bad idea:

http://hsivonen.iki.fi/thesis/html5-conformance-checker#iri

XSD datatypes are too vague: they allow whitespace where the spec  
writer didn't mean to allow whitespace, or allow surprising values  
(like 0 and 1 when the spec writer thought (s)he'd be allowing true  
and false). It is much safer to define datatypes in precise English  
prose like HTML5 does than to expect XSD to match what is really meant.
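
For example (my illustration of the contrast):

// All of these are schema-valid xsd:boolean values: "true", "false",
// "1", "0", and whitespace-padded variants of each (the whiteSpace
// facet collapses the padding before validation). A prose-defined
// keyword, by contrast, can be matched exactly:
const parseKeyword = (s: string): boolean | null =>
  s === "true" ? true : s === "false" ? false : null;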


When you need to reserialize XML, you import all the troubles of  
serializing XML (see e.g.
https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit
).


The only place where you actually need canonicalization is when  
hashing the SignedInfo element inside the signature file (i.e., once  
per signature verification).


Given that the signature format is profiled down pretty heavily in  
the widget signing spec, I'd dare a guess that most of the  
complexity isn't ever used, so a careful implementation might be  
able to write a c14n implementation that bails out on anything that  
doesn't look like a signature that follows the constraints in this  
format.



If you need to do canonicalization even in one place, you need a  
properly debugged implementation of it. If the signature format is  
profiled heavily, doesn't it mean you can't even use an off-the-shelf  
implementation of XML signatures?


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Jar signing vs. XML signatures

2009-04-14 Thread Henri Sivonen

On Apr 14, 2009, at 13:01, Thomas Roessler wrote:


On 14 Apr 2009, at 11:42, Henri Sivonen wrote:


XSD datatypes are too vague, allow whitespace where the spec writer  
didn't mean to allow whitespace or allow surprising values (like  
0 and 1 when the spec writer thought (s)he'd be allowing true  
and false). It is much safer to define datatypes in precise  
English prose like HTML5 does than to expect XSD to match what is  
really meant.


There's an interesting discussion to be had here; however, I doubt  
it's in scope for this WG.  (In other words, this strikes me as a  
rathole.)


I don't see why widgets need to depend on XML signatures at all.

When you need to reserialize XML, you import all the troubles of  
serializing XML (see e.g.
https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit
).


The only place where you actually need canonicalization is when  
hashing the SignedInfo element inside the signature file (i.e.,  
once per signature verification).


Given that the signature format is profiled down pretty heavily in  
the widget signing spec, I'd dare a guess that most of the  
complexity isn't ever used, so a careful implementation might be  
able to write a c14n implementation that bails out on anything  
that doesn't look like a signature that follows the constraints in  
this format.



If you need to do canonicalization even in one place, you need a  
properly debugged implementation of it. If the signature format is  
profiled heavily, doesn't it mean you can't even use an off-the-shelf  
implementation of XML signatures?


Much of the complexity of canonicalization (and signature in  
general) comes from the need to deal with pretty arbitrary nodesets  
generated by transform chains.  The widget signature profile does  
not use (i.e., it's a MUST NOT) any transform chains.


Since the use of transforms is a choice of the signature  
application, you shouldn't have any trouble using existing toolkits.


This all seems like needless complexity to me. To sign a zip archive,  
one needs a manifest file that contains digests for all the other zip  
entries and a signature for the manifest file. Even if widgets use an  
XML manifest instead of a jar-style plaintext manifest (which would be  
supported by existing jar signing tools; analogously to the zip format  
itself having been chosen due to pre-existing tool support), why would  
one want to sign the manifest XML with the XML signature machinery  
instead of signing it as a sequence of bytes using a well-established  
detached signature format for signing a file of bytes?
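
Concretely, the whole thing could be as small as this sketch (mine; the
attribute names mimic jar manifests, and nothing here comes from the
widgets spec):

import { createHash } from "node:crypto";

// Build a jar-style plain-text manifest of per-entry digests. The
// resulting bytes are then signed with an ordinary detached signature;
// the signing code never needs to know the manifest is structured text.
function buildManifest(entries: Map<string, Buffer>): string {
  let manifest = "Manifest-Version: 1.0\n\n";
  for (const [name, bytes] of entries) {
    const digest = createHash("sha256").update(bytes).digest("base64");
    manifest += `Name: ${name}\nSHA-256-Digest: ${digest}\n\n`;
  }
  return manifest;
}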


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Jar signing vs. XML signatures

2009-04-14 Thread Henri Sivonen

On Apr 14, 2009, at 14:38, Marcos Caceres wrote:

I think it would be more productive to help us address the issues  
that you mentioned, instead of asking us to dump everything and  
start again.



So the issues were:
 1) The complexity of canonicalization/reserialization of XML.
 2) Spec dependency on XSD.
 3) Inability to use existing jar signing tools.

If you are already profiling XML signature a lot and are already using  
a detached signature file, it seems to me that you are one step away  
from optimizing away canonicalization:


Instead of canonicalizing the manifest XML and using XML signature,  
you could treat the manifest XML as a binary file and sign it the  
traditional way leaving a detached binary signature in the format  
customary for the signing cipher in the zip file. This would address  
issues #1 and #2.


But then if you are signing the XML manifest file the traditional way,  
you are a step away from using jar-compatible manifests. :-) This  
would address issue #3.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Content-type sniffing and file extension to MIME mapping

2009-03-09 Thread Henri Sivonen

On Mar 6, 2009, at 15:29, Marcos Caceres wrote:

2. The XHTML mapping should also appear in the file identification  
table [2].


What version of XHTML should I be pointing to? 1.0 or 1.1?



Does it need to say anything more than that .xhtml maps to  
application/xhtml+xml? The media type is defined by RFC 3236. As  
implemented, the  
media type isn't restricted to a particular point version of XHTML and  
browsers don't implement 1.1.


(In fact, the media types in the table aren't defined by the specs in  
the Defined by column in general, but are defined by RFCs.)


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [selectors-api] SVG WG Review of Selectors API

2009-01-28 Thread Henri Sivonen


On Jan 27, 2009, at 00:18, Alex Russell wrote:

We just need to invent a pseudo-property for elements which can be  
matched by a :not([someProperty=your_ns_here]).



To select SVG elements while avoiding HTML elements of the same name,  
a selector that prohibits the local name foreignObject between an  
ancestor svg element and the selector subject would be good enough.
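
Since Selectors can't directly say "no foreignObject between the svg
ancestor and the subject", here's the same check done imperatively (my
sketch, checking local names only for brevity):

function inSvgProper(el: Element): boolean {
  for (let a = el.parentElement; a !== null; a = a.parentElement) {
    if (a.localName === "foreignObject") return false; // HTML island in SVG
    if (a.localName === "svg") return true; // reached an svg ancestor first
  }
  return false;
}
const svgAnchors =
  Array.from(document.querySelectorAll("a")).filter(inSvgProper);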


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/





Re: [widgets] Trimming attribute values, a bad idea?

2008-12-03 Thread Henri Sivonen


On Dec 3, 2008, at 04:51, Marcos Caceres wrote:


So, for instance, <access network=" false"/> is ok.

Does anyone see any problem with this? Should I revert back to being
strict and having UA do comparisons without trimming?



Experience with HTML, SVG and MathML indicates that when trimming is  
specified, implementors don't always do it. My conclusion is that it's  
better to specify that keyword attribute values are compared without  
trimming. (Unless, of course, the attribute in question takes a  
whitespace-separated list of tokens.)
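
Concretely, for <access network=" false"/> the two options look like
this (illustrative code, not spec text):

const raw = el.getAttribute("network") ?? ""; // " false"
const strict = raw === "false";         // no match; handling fully defined
const lenient = raw.trim() === "false"; // matches, but only if every
                                        // engine remembers to trim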


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/





Re: Support for compression in XHR?

2008-09-12 Thread Henri Sivonen


On Sep 11, 2008, at 22:59, Jonas Sicking wrote:

Wouldn't a better solution then be that when the web page sets the  
flag on the XHR object the browser will always compress the data?  
And leave it up to the web page to ensure that it doesn't enable  
capabilities that the web server doesn't support. After all, it's  
the web page's responsibility to know many other aspects of server  
capabilities, such as if GET/POST/DELETE is supported for a given URI.



This is the approach I've taken with Validator.nu. Validator.nu  
supports gzipped request bodies. If someone reads the docs for the Web  
service API so that they can program client code for the service, they  
should notice what the documentation says about compression.
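
For what it's worth, a client that has read the docs might do something
like this (a sketch using today's CompressionStream and fetch APIs for
brevity; in 2008 one would use a script or native gzip implementation,
and the media type is just an example):

async function postGzipped(url: string, body: string): Promise<Response> {
  // Compress the request entity body up front...
  const gzipped = await new Response(
    new Blob([body]).stream().pipeThrough(new CompressionStream("gzip"))
  ).arrayBuffer();
  // ...and label it, trusting the service docs that the server accepts it:
  return fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      "Content-Encoding": "gzip",
    },
    body: gzipped,
  });
}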


There is, though, the problem that now compression support is part of  
the published API as opposed to being an orthogonal transport feature,  
so removing incoming compression support from Validator.nu (e.g. if  
bandwidth were abundant and CPU were the bottleneck) would break  
existing clients. This is not a problem with same-site XHR, though,  
when the same entity controls the server and the JS program performing  
the requests and can update the JS program immediately.


(Validator.nu also advertises Accept-Encoding: gzip via OPTIONS, but  
I'm not aware of any client automatically picking it up from there.)


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/