Re: inline declarative manifest, was Re: New manifest spec - ready for FPWD?
On Wed, Dec 4, 2013 at 8:16 AM, Jonas Sicking <jo...@sicking.cc> wrote:
> On Dec 3, 2013 9:25 PM, Marcos Caceres <w...@marcosc.com> wrote:
>> On Wednesday, December 4, 2013 at 9:40 AM, Jonas Sicking wrote:
>>> We currently have both <script>...</script> and <script src=...>, as well as both <style>...</style> and <style src>. A big reason we have both is for author convenience.
>> I thought this was because <script> and <style> are “this page’s script and style” (i.e., the scope is very specific). This is different from the manifest, which is “this loosely grouped set of navigable resources forms a web-app”. Some web-apps are single-page.
> If they are simple enough I don't see anything wrong with that. I think we shouldn't optimize for the single-page case. Even single-page apps probably have some bitmaps that they don't include as data: URLs. On the other hand, for multi-page apps an inline manifest would be really inefficient. That is, external-only manifests seem quite reasonable to me.
>
> <meta name=manifest content='{ a: 1, b: foopy }'>

Are manifests really short enough for this kind of thing? What happened to the idea from February of sticking a JSON-based caching description that desugars into NavigationController into the same manifest? Are we absolutely sure that we don't want the manifest to grow to do AppCache-ish things that pretty much require the declaration to be an attribute on <html>?

-- Henri Sivonen hsivo...@hsivonen.fi http://hsivonen.fi/
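One incidental wrinkle with the quoted `<meta name=manifest content='{ a: 1, b: foopy }'>` shorthand: that content value is not valid JSON (keys and string values must be double-quoted), so an inline manifest would have to carry the fully quoted form. A minimal sketch (the `tryParseManifest` helper is purely illustrative, not any proposed API):

```javascript
// Illustrative helper: report whether a manifest string is valid JSON.
function tryParseManifest(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, value: null };
  }
}

const shorthand = "{ a: 1, b: foopy }";       // as written in the email
const valid = '{ "a": 1, "b": "foopy" }';     // what JSON actually requires

console.log(tryParseManifest(shorthand).ok);      // false — unquoted keys/values
console.log(tryParseManifest(valid).value.b);     // "foopy"
```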
Re: File API for Review
Additionally, I think http://www.w3.org/TR/FileAPI/#dfn-type should clarify that the browser must not use statistical methods to guess the charset parameter part of the type as part of determining the type. Firefox currently asks the magic 8-ball, but the magic 8-ball is broken. AFAICT, WebKit does not guess, so I hope it's possible to remove the guessing from Firefox. (The guessing in Firefox relies on a big chunk of legacy code that's broken and shows no signs of ever getting fixed properly. The File API is currently the only thing in Firefox that exposes the mysterious behavior of said legacy code to the Web using the default settings of Firefox, so I'm hoping to remove the big chunk of legacy code instead of fixing it properly.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Review of the template spec
I reviewed http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html#dfn-template . Sorry about the delay. Comments:

XML parsing isn't covered. (Should have the wormhole per the discussion at TPAC.)

XSLT output isn't covered. (When an XSLT program tries to generate a template element with children, the children should go through the wormhole.)

Interaction with the DOM to XDM mapping isn't covered per discussion at TPAC. (Expected template contents not to appear in the XDM when invoking the XPath DOM API (for consistency with querySelectorAll) but expected them to appear in the XDM when an XSLT transformation is being processed (to avoid precluding use cases).)

> 1. If DOCUMENT does not have a browsing context, Let TEMPLATE CONTENTS OWNER be DOCUMENT and abort these steps.
> 2. Otherwise, Let TEMPLATE CONTENTS OWNER be a new Document node that does not have a browsing context.

Is there a big win from this inconsistency? Why not always have a separate doc as the template contents owner? Do we trust the platform never to introduce a way to plant a document that does not have a browsing context into a browsing context? (Unlikely, but do we really want to make the bet?)

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Making template play nice with XML and tags-and-text
On Wed, Jul 18, 2012 at 11:35 PM, Adam Barth <w...@adambarth.com> wrote:
> On Wed, Jul 18, 2012 at 11:29 AM, Adam Klein <ad...@chromium.org> wrote:
>> On Wed, Jul 18, 2012 at 9:19 AM, Adam Barth <w...@adambarth.com> wrote:
>>> Inspired by a conversation with hsivonen in #whatwg, I spent some time thinking about how we would design template for an XML world. One idea I had was to put the elements inside the template into a namespace other than http://www.w3.org/1999/xhtml.
>> On the face of things, this seems a lot less scary than the wormhole model. I think this merits further exploration!
> Thank you!
>> One question about your proposal: do the contents of template in an HTML …
>>> Unlike the existing wormhole template semantics, in this approach the tags-and-text inside template would translate into DOM as usual for XML. We'd get the inert behavior for free because we'd avoid defining any behavior for elements in the http://www.w3.org/2012/xhtml-template namespace (just as no behavior is defined today).
>> This does get you inertness, but doesn't avoid querySelector matching elements inside template.

If changes of the magnitude discussed here are on the table for HTML parsing, I don't see why querySelectorAll() or even Selectors should be assumed to be unchangeable.

>> Also, the elements inside template, though they appear to be HTML, wouldn't have any of the IDL attributes one might expect, e.g., <a href=foo></a> would have no href property in JS (nor would img have src, etc). They are, perhaps, too inert.

I think that's not a problem, because you're not supposed to mutate the template anyway. You're supposed to clone the template and then mutate the clone.

> That's unfortunate. I guess that means CSS styles will get applied to them as well, which wouldn't be what authors would want.

That's not really a problem as long as subtrees with elements in the template namespaces are rooted at a display:none template element.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] HTML Parsing and the template element
On Thu, Jun 14, 2012 at 11:48 PM, Ian Hickson <i...@hixie.ch> wrote:
> Does anyone object to me adding template, content, and shadow to the HTML parser spec next week?

I don't object to adding them if they create normal child elements in the DOM. I do object if template has a null firstChild and the new property that leads to a fragment that belongs to a different owner document. (My non-objection to creating normal children in the DOM should not be read as a commitment to support templates in Gecko.)

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] HTML Parsing and the template element
On Tue, Jun 12, 2012 at 12:14 AM, Rafael Weinstein <rafa...@google.com> wrote:
> On Mon, Jun 11, 2012 at 3:13 PM, Henri Sivonen <hsivo...@iki.fi> wrote:
>> On Thu, Jun 7, 2012 at 8:35 PM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
>>> Just saying that querySelector/All doesn't match elements in a template (unless the scope is inside the template already) would work, but it means that we have to make sure that all future similar APIs also pay attention to this.
>> I think that would be preferable compared to opening the Pandora's box of breaking the correspondence between the markup and the DOM tree. Besides, we'd need something like this for the XML case anyway if the position the spec takes is that it shies away from changing the correspondence between XML source and the DOM. In general, I think the willingness to break the correspondence between the source and the DOM should be the same for both HTML and XML serializations. If you believe that it's not legitimate to break the correspondence between XML source and the DOM, it would be logical to treat radical changes to the correspondence between HTML source and the DOM as equally suspect.
> I think looking at this as whether we are breaking the correspondence between source and DOM may not be helpful -- because it's likely to be a matter of opinion.

Arguments that the correspondence between the source and the DOM is a matter of opinion in the HTML case are the sort of slippery slope I'm worried about. After all, in theory, we could make the parsing algorithm output whatever data structure. If we make an exception here, I expect that we'll see more proposals that will involve the parser generating non-traditional DOM structures in order to accommodate supposed API benefits at the expense of the old DOM-assuming stuff working generically.

In the XML case, the correspondence between the source and the DOM is not a matter of opinion. At least not at present. It seems that the template spec is not willing to change that. Why?
Why doesn't the same answer apply to HTML? I think we shouldn't violate the DOM Consistency Design Principle and make templates have wormholes to another document when parsed from text/html but have normal children when parsed from application/xhtml+xml. That sort of route will lead to having to implement template inertness twice, and one of the solutions will be one that's supposedly being avoided by the proposed design for HTML.

> There are several axes of presence for elements WRT a Document:
> - serialization: do the elements appear in the serialization of the Document, as delivered to the client and if the client re-serializes via innerHTML, etc.
> - DOM traversal: do the elements appear via traversing the document's childNodes hierarchy
> - querySelector*, get*by*, etc.: are the elements returned via various document-level query mechanisms
> - CSS: are the elements considered for matching any present or future document-level selectors

I'm arguing that these axes should be coupled the way they are now and have always been. (And if you want them decoupled, the decoupling should be done e.g. using selectors that specifically prohibit templates in the parent chain of the selected node or using APIs that aren't low-level tree accessors like firstChild, childNodes and getElementById().)

For better or worse, the DOM is a data structure that represents the markup and then has some higher-level-feature sugaring. It's not a data structure whose shape is dictated by the higher-level features. The DOM is so fundamental to the platform that I think the bar for changing its nature should be very high. Certainly higher than one vendor wishing to proceed with the feature that they've come up with. Certainly not for a feature that could be implemented without breaking the traditional model with relative ease (making the inertness check walk the parent chain [or propagate an equivalent flag down for O(1) read] and rooting selector queries differently or using *:not(template)).
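The parent-chain walk mentioned above can be sketched in a few lines. This is only an illustration of the idea, not any engine's actual code; the nodes are mocked as plain objects with `localName` and `parentNode` so the logic is visible without a real DOM:

```javascript
// Decide inertness by walking the parent chain looking for a template
// ancestor, instead of checking a flag on a separate owner document.
function isInert(node) {
  for (let p = node.parentNode; p; p = p.parentNode) {
    if (p.localName === "template") return true;
  }
  return false;
}

// Mock tree: <div><template><p id=a></p></template><p id=b></p></div>
const div = { localName: "div", parentNode: null };
const template = { localName: "template", parentNode: div };
const a = { localName: "p", parentNode: template };
const b = { localName: "p", parentNode: div };

console.log(isInert(a)); // true  — inside a template
console.log(isInert(b)); // false — ordinary content
```

The O(1) variant mentioned in brackets would set an "inert" bit on each node at insertion time by copying the parent's bit (true if the parent is a template), trading a little insertion work for constant-time reads.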
-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] HTML Parsing and the template element
On Thu, Jun 7, 2012 at 8:35 PM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
> Just saying that querySelector/All doesn't match elements in a template (unless the scope is inside the template already) would work, but it means that we have to make sure that all future similar APIs also pay attention to this.

I think that would be preferable compared to opening the Pandora's box of breaking the correspondence between the markup and the DOM tree. Besides, we'd need something like this for the XML case anyway if the position the spec takes is that it shies away from changing the correspondence between XML source and the DOM.

In general, I think the willingness to break the correspondence between the source and the DOM should be the same for both HTML and XML serializations. If you believe that it's not legitimate to break the correspondence between XML source and the DOM, it would be logical to treat radical changes to the correspondence between HTML source and the DOM as equally suspect. I worry that if we take the position here that it's okay to change the correspondence between the source and the DOM in order to optimize for a real or perceived need, it will open the floodgates for all sorts of arguments that we can make the parser generate whatever data structures regardless of what the input looks like, and we'll end up in a world of pain. It's bad enough that isindex is a parser macro.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR] chunked
On Thu, May 24, 2012 at 8:59 AM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Thu, May 24, 2012 at 2:54 AM, Jonas Sicking <jo...@sicking.cc> wrote:
>> Is there a reason not to add chunked-text and chunked-arraybuffer to the spec right now?
> 1. Why not just have http://html5labs.interoperabilitybridges.com/streamsapi/ to address this case?

It appears that Microsoft's proposal involves potentially buffering the XHR response body until the Web author-supplied script chooses to initiate a read from the stream object. Or am I missing something?

The chunked-text and chunked-arraybuffer response types are both simpler than the stream proposal and have the advantage that they don't involve the browser engine buffering the XHR response body: if the Web author-provided script fails to handle the chunks as they arrive, the data is gone and doesn't fill up buffer space.

(Some of my remarks on IRC were confused, because I mistook the Worker-specific API for the API proposed for the main thread.)

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, Jun 6, 2012 at 6:49 PM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
> Flip-flopping is irrelevant.

It's irrelevant in the sense that flip-flopping isn't bad in and of itself. Changing one's mind is okay and it's good to acknowledge past mistakes. However, in many cases, if one's mind is changed after interoperable implementations have been made available to Web authors, it may be worse to try to fix past mistakes instead of just acknowledging them as mistakes and trying not to make more mistakes like that in the future.

> Only what is good for authors is.

I believe in this case not changing the way SVG script content tokenizes would be best for authors.

> If deployed content would break as a result of a change, we either find a new way to accommodate the desired change, or drop it. But we need the compat data about that breakage before we can claim that it will occur.

Support Existing Content is not the only relevant Design Principle here. Degrade Gracefully is also relevant. I believe it would be bad for authors if SVG-in-HTML content tokenized subtly differently in the long tail of old (old when viewed from the future) browsers, which is likely to include IE9 and, at this rate, IE10, and also a mix of versions of the Android stock browser for a long time. (Also possibly various left-over versions of Firefox, Safari and Opera.) Past data suggests that IE is updated slowly and that the Android stock browser typically doesn't get updated at all (until it is abandoned by switching to another browser or by switching to another device).

Arguments about it being okay to violate the Degrade Gracefully principle because the future is longer than the past (so it's always worthwhile to make things better for the future) would apply to pretty much all breaking changes to the Web platform and have the same problems this time as when the argument is applied to other breaking changes to the Web platform.

> The SVGWG would like to make things as good for authors as possible.
> Past positions don't matter, except insofar as the history of their effects on the specs persists.

The reason why I care about correcting recounts of past SVG working group opinion on this topic is that I think it's better for learning from mistakes if the learning is based on the truth of what happened.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] HTML Parsing and the template element
On Wed, Jun 6, 2012 at 7:13 PM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
> A call like document.querySelectorAll('p') doesn't *want* to get the p inside the template.

I think it's backwards to assume that querySelectorAll() works in a particular way that's not what authors want and to change the DOM in response. There are various solutions that don't involve drastic changes to the correspondence between the markup and the DOM, for example:
* Invoking querySelectorAll() on a wrapper element that's known not to be a parent of the templates on the page.
* Using a selector that fails to match elements whose ancestor chain contains a template element.
* Introducing an API querySelectorNonTemplate(). (Don't say All if you don't mean *all*.)

Even though XML has fallen out of favor, I think violations of the DOM Consistency principle and features that don't work with the XHTML serialization should be considered huge red flags indicating faulty design.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
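The third option above could be as simple as a post-filter over an ordinary query. A sketch under stated assumptions: `querySelectorNonTemplate` is only a name floated in the email, not a real API, and the nodes here are mocked as plain objects rather than real DOM nodes:

```javascript
// Keep only the matches that do NOT have a template ancestor, leaving
// querySelectorAll itself (and the DOM shape) completely unchanged.
function outsideTemplate(matches) {
  return matches.filter(node => {
    for (let p = node.parentNode; p; p = p.parentNode) {
      if (p.localName === "template") return false;
    }
    return true;
  });
}

// Mock of <body><p id=x></p><template><p id=y></p></template></body>
const body = { localName: "body", parentNode: null };
const x = { localName: "p", parentNode: body };
const tmpl = { localName: "template", parentNode: body };
const y = { localName: "p", parentNode: tmpl };

// querySelectorAll('p') would match both x and y; the filter keeps only x.
console.log(outsideTemplate([x, y]).length); // 1
```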
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking <jo...@sicking.cc> wrote:
>> I think the SVG working group should learn to stand by its past mistakes. Not standing by them in the sense of thinking the past mistakes are great but in the sense of not causing further disturbances by flip-flopping.
> For what it's worth, I've not seen any flip-flopping on this. Over the years that I've asked the SVG WG the detailed question on if they prefer to have the parsing model for scripts in SVG-in-HTML I've consistently gotten the answer that they prefer this.

At the time when SVG parsing was being added to text/html, vocal members of the SVG working group were adamant that parsing should work the same as for XML so that output from existing tools that had XML serializers could be copied and pasted into text/html in a text editor. Suggestions went as far as insisting that a full XML parser be embedded inside the HTML parser. For [citation needed], see e.g. Requirement 1 in http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not the only place where the requirement was expressed but the first one I found when searching the archives) and requirements 1 and 2 as well as the first sentence under Summary in http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

> I'm also not sure how this is at all relevant here given that we should do what's best for authors, even when we learn over time what's best for authors.

At this point, what's best for authors includes considerations of consistent behavior across already-deployed browsers (including IE9, soon IE10 and the Android stock browser) and future browsers.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] HTML Parsing and the template element
On Tue, Jun 5, 2012 at 12:42 AM, Ian Hickson <i...@hixie.ch> wrote:
> On Wed, 4 Apr 2012, Rafael Weinstein wrote:
>> On Mon, Apr 2, 2012 at 3:21 PM, Dimitri Glazkov <dglaz...@chromium.org> wrote:
>>> Perhaps lost among other updates was the fact that I've gotten the first draft of HTML Templates spec out: http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html
>> I think the task previously was to show how dramatic the changes to the parser would need to be. Talking to Dimitri, it sounds to me like they turned out to be less open-heart-surgery and more quick outpatient procedure. Adam, Hixie, Henri, how do you guys feel about the invasiveness of the parser changes that Dimitri has turned out here?
> I think it's more or less ok, but it has the problem that it doesn't give a way to reset the insertion mode again while inside a template.

I still think that breaking the old correspondence between markup and the DOM and shrugging the XML side off is a big mistake. Why would it be substantially harder to check inertness by walking the parent chain (which normally won't be excessively long) as opposed to checking a flag on the owner document? I strongly believe that the template contents should be children of the template element in the DOM instead of being behind a special wormhole to another document while parsing and serializing as if the special wormhole weren't there.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [manifest] screen sizes, Re: Review of Web Application Manifest Format and Management APIs
On Sun, May 27, 2012 at 7:45 PM, Anant Narayanan <an...@mozilla.com> wrote:
> Well, we haven't received this request from developers explicitly yet, but one can imagine a situation in which a developer makes an app only for mobile phones (Instagram?) and doesn't want users to use it on desktops. Even though it'll technically work, it might look ugly due to scaling.

Shouldn't it be up to the user to refrain from using ugly apps instead of the developer preventing them?

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafa...@google.com> wrote:
> Issue 1: How to handle tokens which precede the first start tag
>
> Options:
> a) Queue them, and then later run them through tree construction once the implied context element has been picked
> b) Create a new insertion mode, like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

I prefer b). I'm assuming the use case for this stuff isn't that authors throw random stuff at the API and then insert the result somewhere. I expect authors to pass string literals or somewhat cooked string literals to the API knowing where they're going to insert the result but not telling the insertion point to the API as a matter of convenience. If you know you are planning to insert stuff as a child of tbody, don't start your string literal with stuff that would tokenize as characters!

(Firefox currently does not have the capability to queue tokens. Speculative parsing in Firefox is not based on queuing tokens. See https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the details.)

> Issue 2: How to infer a non-HTML implied context element
>
> Options:
> a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
> b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about use cases are right) is <a>. (Fragment parsing makes scripts useless anyway by setting their "already started" flag, authors probably shouldn't be adding styles by parsing style, both HTML and SVG font are considered harmful, and cross-browser support for Content MathML is far off in the horizon.) So I prefer a), possibly with a-specific elaborations if we can come up with some. Generic solutions seem to involve more complexity.
For example, if we supported a generic attribute for forcing SVG interpretation, would it put us on a slippery slope to support it when it appears on tokens that aren't the first start tag token in a contextless fragment parse?

> Issue 3: What form does the API take
>
> a) Document.innerHTML
> b) document.parse()
> c) document.createDocumentFragment()

I prefer b) because:
* It doesn't involve creating the fragment as a separate step.
* It doesn't need to be foolishly consistent with the HTML vs. XML design errors of innerHTML.
* It's shorter than document.createDocumentFragment().
* Unlike innerHTML, it is a method, so we can add more arguments later (or right away) to refine its behavior.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] Template element parser changes => Proposal for adding DocumentFragment.innerHTML
On Wed, May 9, 2012 at 7:45 PM, Rafael Weinstein <rafa...@google.com> wrote:
> I'm very much of a like mind with Henri here, in that I'm frustrated with the situation we're currently in WRT SVG MathML parsing foreign content in HTML, etc... In particular, I'm tempted to feel like SVG and MathML made this bed for themselves and they should now have to sleep in it.

I think that characterization is unfair to MathML. The math working group tried hard to avoid local name collisions with HTML. They didn't want to play namespace games. As I understand it, they were forced into a different namespace by W3C strategy tax arising from the NAMESPACE ALL THE THINGS! attitude. SVG is the language that introduced collisions with both HTML and MathML and threw unconventional camel casing into the mix.

On Fri, May 11, 2012 at 1:44 AM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
> The innerHTML API is convenient. It lets you set the entire descendant tree of an element, creating elements and giving them attributes, in a single call, using the same syntax you'd use if you were writing it in HTML (modulo some extra quote-escaping maybe).

I'm less worried about magic in an API that's meant for representing tree literals in JavaScript as a sort of E4H without changing the JavaScript itself than I am about magic in APIs that are meant for parsing arbitrary potentially user-supplied content. If we are designing an API for the former case rather than the latter case, I'm OK with the following magic:

* Up until the first start tag, the parser behaves as in "in body". (Tough luck if you want to use <![CDATA[ or U+ before the first tag, though I could be convinced that the parser should start in a mode that enables <![CDATA[.)
* If the first start tag is any MathML 3 element name except set or image, start behaving as if setting innerHTML on math (details of that TBD) before processing the start tag token further, and then continue to behave like when setting innerHTML on math.
* Otherwise, if the first start tag is any SVG 1.1 element name except script, style, font or a, start behaving as if setting innerHTML on svg (details of that TBD) before processing the start tag token further, and then continue to behave like when setting innerHTML on svg.
* Otherwise, set the insertion mode per the HTML-centric template rules proposed so far.

Open question: Should it be possible to use a magic attribute on the first tag token to disambiguate it as MathML or SVG? xmlns=... would be an obvious disambiguator, but the values are unwieldy. Should xlink:href be used as a disambiguator for a?

If the use case is putting tree literals in code, it probably doesn't make sense to use script or style (either HTML or SVG) in that kind of context anyway. And SVG font has been rejected by Mozilla and Microsoft anyway.

I still think that having to create a DocumentFragment first and then set innerHTML on it is inconvenient and we should have a method on document that takes a string to parse and returns the resulting DocumentFragment, e.g. document.parse(string) to keep it short.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] Template element parser changes => Proposal for adding DocumentFragment.innerHTML
… also add a method on Document that parses a string using the HTML parser regardless of the HTMLness flag of the document and returns a DocumentFragment (or has an optional extra argument for forcing XML parsing explicitly). -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] Template element parser changes => Proposal for adding DocumentFragment.innerHTML
On Tue, Apr 24, 2012 at 6:39 AM, Rafael Weinstein <rafa...@google.com> wrote:
> What doesn't appear to be controversial is the parser changes which would allow the template element to have arbitrary top-level content elements.

It's not controversial as long as an HTML context is assumed. I think it is still controversial for SVG and MathML elements that aren't wrapped in an svg or math element.

> I'd like to propose that we add DocumentFragment.innerHTML which parses markup into elements without a context element.

Why should the programmer first create a document fragment and then set a property on it? Why not introduce four methods on Document that return a DocumentFragment: document.parseFragmentHTML (parses like template.innerHTML), document.parseFragmentSVG (parses like svg.innerHTML), document.parseFragmentMathML (parses like math.innerHTML) and document.parseFragmentXML (parses like innerHTML in the XML mode without namespace context)? This would avoid magic for distinguishing HTML <a> and SVG <a>.

On Thu, Apr 26, 2012 at 8:23 PM, Tab Atkins Jr. <jackalm...@gmail.com> wrote:
> (In my dreams, we just merge SVG into the HTML namespace, and then this step disappears.)

In retrospect, it would have been great if Namespaces in XML had never been introduced and SVG, MathML and HTML shared a single namespace. However, at this point, trying to merge the namespaces would lead to chameleon namespaces, which are evil and more trouble than fixing the historical mistake is worth. I feel very strongly that vendors and the W3C should stay away from turning SVG into a chameleon namespace. SVG is way more established than CSS gradients or Flexbox in terms of what kind of changes are acceptable. See http://lists.w3.org/Archives/Public/www-archive/2009Feb/0065.html as well as various XML experiences from non-browser contexts.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [webcomponents] Template element parser changes => Proposal for adding DocumentFragment.innerHTML
On Wed, May 9, 2012 at 11:39 AM, James Graham <jgra...@opera.com> wrote:
> document.parse(string, [auto|html|svg|mathml|xml])

Makes sense, at least for the options other than auto.

> With auto being the default and doing magic, and the other options allowing one to disable the magic.

I worry about introducing magic into APIs. It starts with good intentions but can lead to regrettable results.

For MathML, if the parser contains a list of all MathML elements, the magic can work at any point in time if the parser and the input are from the same point in time, but it will fail if the parser is of older vintage than the input given to it. (image can be treated as MathML here, since presumably the magic image is only needed for parsing legacy HTML pages.) Are we OK with the incorrect behavior when the input and the parser are of different vintage? (Note that the full document HTML parsing algorithm only has this problem for camel case SVG elements, which is a non-problem if the SVG working group can refrain from introducing more camel case elements.)

With SVG, there's the problem that a is common and ambiguous. It seems bad to introduce an API that does reliable magic for almost everything but is unreliable for one common thing. Solving this problem with lookahead would be bad, because it would be surprising for the a in <a></a> and the a in <a><path/></a> to mean different things. Solving this problem with chameleon namespaces would introduce worse problems. I don't see any good way to solve the contextless HTML <a> vs. SVG <a> problem with magic. (If SVG moves away from xlink:href, we can't even use attributes to disambiguate.)

If we end up doing (flawed) list-based magic, we need to make sure the SVG working group is on board and henceforth avoids local name collisions with HTML and MathML. If the API required the caller to request SVG explicitly, would it be a sure thing that jQuery would build magic heuristics on the library side?

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
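The list-based magic and its vintage problem can be made concrete with a sketch. This is purely illustrative: the name lists below are tiny samples (a real implementation would carry all of MathML 3 minus set and image, and all of SVG 1.1 minus script, style, font and a), and `inferContext` is a hypothetical helper, not any proposed API:

```javascript
// Sample lists, frozen at the parser's "vintage". Any element added to a
// spec after this list was compiled silently falls through to HTML.
const MATHML = new Set(["math", "mrow", "mtext", "mfrac"]);
const SVG = new Set(["svg", "path", "circle", "linearGradient"]);

// Pick an implied context from the first start tag name, preferring HTML
// (option a in the earlier "implied context" email).
function inferContext(firstStartTag) {
  if (MATHML.has(firstStartTag)) return "math";
  if (SVG.has(firstStartTag)) return "svg";
  return "html"; // includes the ambiguous a, which defaults to HTML
}

console.log(inferContext("mfrac")); // "math"
console.log(inferContext("path"));  // "svg"
console.log(inferContext("a"));     // "html" — the unreliable common case
```

The sketch makes both failure modes visible: an input of newer vintage than the lists is misclassified as HTML, and `a` can never be classified reliably at all.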
Re: [webcomponents] HTML Parsing and the template element
On Tue, Apr 3, 2012 at 1:21 AM, Dimitri Glazkov <dglaz...@chromium.org> wrote:
> Perhaps lost among other updates was the fact that I've gotten the first draft of HTML Templates spec out: http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html

"Once parsed, the template contents must not be in the document tree."

That's surprising, radical and weird. Why are the template contents hosted in a document fragment that the template element points to using a non-child property? Why aren't the template contents simply hosted as a subtree rooted at the template element? This also breaks the natural mapping between XML source and the DOM in the XML case. This weirdness also requires a special case in the serialization algorithm. If the document fragment weren't there and the contents of the template were simply children of the template element, the parsing algorithm changes would look rather sensible.

Wouldn't it make more sense to host the template contents as normal descendants of the template element and to make templating APIs accept either template elements or document fragments as template input? Or to make template elements have a cloneAsFragment() method if the template fragment is designed to be cloned as the first step anyway?

When implementing this, making embedded content inert is probably the most time-consuming part, and just using a document fragment as a wrapper isn't good enough anyway, since for example img elements load their src even when not inserted into the DOM tree. Currently, Gecko can make embedded content inert on a per-document basis. This capability is used for documents returned by XHR, createDocument and createHTMLDocument. It looks like the template proposal will involve computing inertness from the ancestor chain (template ancestor or DocumentFragment marked as inert as an ancestor). It's unclear to me what the performance impact will be.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
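The cloneAsFragment() idea floated above can be sketched without committing to the wormhole design. A minimal sketch, assuming nothing beyond the email itself: cloneAsFragment() is only a proposed name, and the "DOM" here is mocked as plain objects with `localName` and `children` so the cloning logic runs standalone:

```javascript
// Deep-copy a mocked node.
function cloneNode(node) {
  return { localName: node.localName, children: node.children.map(cloneNode) };
}

// Proposed convenience: clone a template's (normal, child) contents into a
// fresh fragment-like object, ready for mutation and insertion.
function cloneAsFragment(template) {
  return { fragment: true, children: template.children.map(cloneNode) };
}

// Mock of <template><p></p></template> whose contents are ordinary children.
const template = {
  localName: "template",
  children: [{ localName: "p", children: [] }],
};

const frag = cloneAsFragment(template);
console.log(frag.children[0].localName);            // "p"
console.log(frag.children[0] !== template.children[0]); // true — a fresh copy
```

The point of the sketch is that cloning-on-use gives API consumers a mutable fragment while the template contents stay where the markup says they are: as children of the template element.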
Re: [Clipboard] Mathematical Proofs in HTML5 Documents
On Tue, Apr 3, 2012 at 4:57 AM, Adam Sobieski adamsobie...@hotmail.com wrote: MathML3 includes annotation and annotation-xml elements which can provide parallel representations of mathematical semantics.

1. Having entire proofs in math elements. Proof formats could then express semantics in annotation or annotation-xml elements. OpenMath content dictionaries could come to exist for mathematical proof structures.

2. Having proofs in HTML5 document structure, possibly containing one or more math element instances, while utilizing XML attributes from other XMLNS.

Does any browser currently support any kind of an XML-based clipboard flavor? If you transfer MathML islands using an HTML clipboard flavor, you can't use arbitrary namespaces.

3. Having proofs in HTML5 document structure, possibly containing one or more math element instances, while utilizing RDFa (http://dev.w3.org/html5/rdfa/). Proof structure and semantics can overlay the HTML5 and/or the RDFa can relate elements to referenced external resources.

What kind of software do you expect to consume this kind of data? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
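For reference, a minimal sketch of the parallel-markup structure MathML3 defines (the element names are from MathML3; the particular expression is illustrative):

```
<!-- Presentation markup for "a + b" paired with a content-markup
     annotation via MathML3's semantics/annotation-xml mechanism. -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
    <annotation-xml encoding="MathML-Content">
      <apply><plus/><ci>a</ci><ci>b</ci></apply>
    </annotation-xml>
  </semantics>
</math>
```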
Re: [webcomponents] HTML Parsing and the template element
On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov dglaz...@chromium.org wrote:

== IDEA 1: Keep template contents parsing in the tokenizer ==

Not this! Here's why: Making something look like markup but then not tokenizing it as markup is confusing. The confusion leads to authors not having a clear mental model of what's going on and where stuff ends. Trying to make things just work for authors leads to even more confusing "here be dragons" solutions. Check out http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state

Making something that looks like markup but isn't tokenized as markup also makes the delta between HTML and XHTML greater. Some people may be ready to throw XHTML under the bus completely at this point, but this also goes back to the confusion point. Apart from namespaces, the mental model you can teach for XML is remarkably sane. Whenever HTML deviates from it, it's a complication in the understandability of HTML.

Also, multi-level parsing is in principle bad for perf. (How bad really? Dunno.) I *really* don't want to end up writing a single-pass parser that has to be black-box indistinguishable from something that's defined as a multi-pass parser. (There might be a longer essay about how this sucks in the public-html archives, since the SVG WG proposed something like this at one point, too.)

== IDEA 2: Just tweak insertion modes ==

I think a DWIM insertion mode that switches to another mode and reprocesses the token upon the first start tag token *without* trying to return to the DWIM insertion mode when the matching end tag is seen for the start tag that switched away from the DWIM mode is something that might be worth pursuing. If we do it, I think we should make it work for a fragment parsing API that doesn't require context beyond assuming HTML, too. (I think we shouldn't try to take the DWIM so far that a contextless API would try to guess HTML vs. SVG vs. MathML.)
The violation of the Degrade Gracefully principle and tearing the parser spec open right when everybody converged on the spec worry me, though. I'm still hoping for a design that doesn't require parser changes at all and that doesn't blow up in legacy browsers (even better if the results in legacy browsers were sane enough to serve as input for a polyfill). -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Obsolescence notices on old specifications, again
On Mon, Jan 23, 2012 at 11:01 AM, Ms2ger ms2...@gmail.com wrote: I propose that we add a pointer to the contemporary specification to the following specifications:

* DOM 2 Core (DOM4)
* DOM 2 Views (HTML)
* DOM 2 Events (D3E)
* DOM 2 Style (CSSOM)
* DOM 2 Traversal and Range (DOM4)
* DOM 2 HTML (HTML)
* DOM 3 Core (DOM4)

and a recommendation against implementing the following specifications:

* DOM 3 Load and Save
* DOM 3 Validation

Hearing no objections, I'll try to move this forward.

I support adding such notices to the above-mentioned specs.

On Mon, Jan 23, 2012 at 10:38 PM, Glenn Adams gl...@skynav.com wrote: I work in an industry where devices are certified against final specifications, some of which are mandated by laws and regulations. The current DOM-2 specs are still relevant with respect to these certification processes and regulations.

I think proliferating obsolete stuff is harmful. Which laws or regulations require compliance with some of the above-mentioned specs? Have bugs been filed on those laws and regulations? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR] responseType json
On Mon, Dec 12, 2011 at 7:08 PM, Jarred Nicholls jar...@sencha.com wrote: There's no feeding (re: streaming) of data to a parser, it's buffered until the state is DONE (readyState == 4) and then an XML doc is created upon the first access to responseXML or response. Same will go for the JSON parser in our first iteration of implementing the json responseType. FWIW, Gecko parses XML and HTML in a streaming way as data arrives from the network. When readyState changes to DONE, the document has already been parsed. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR] responseType json
On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com wrote: A good compromise would be to only throw it away (and thus restrict responseText access) upon the first successful parse when accessing .response.

I disagree. Even though conceptually, the spec says that you first accumulate text and then you invoke JSON.parse, I think we should allow for implementations that feed an incremental JSON parser as data arrives from the network and throw away each input buffer after pushing it to the incremental JSON parser. That is, in order to allow more memory-efficient implementations in the future, I think we shouldn't expose responseText for JSON. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
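The implementation strategy being argued for can be sketched like this (a hypothetical illustration, not Gecko code; note that this simplified version still retains the decoded text internally, whereas a true incremental parser would consume each chunk and let the buffer be freed immediately):

```javascript
// Sketch: an XHR-like receiver that accepts network chunks and only
// exposes the final parsed JSON value, never the intermediate text.
// (Hypothetical illustration of the API shape being argued for.)
class StreamingJsonReceiver {
  constructor() {
    this.decoder = new TextDecoder("utf-8"); // JSON on the wire as UTF-8
    this.text = "";
  }
  push(chunk) {
    // stream: true handles multi-byte sequences split across chunks
    this.text += this.decoder.decode(chunk, { stream: true });
  }
  finish() {
    this.text += this.decoder.decode(); // flush any trailing bytes
    return JSON.parse(this.text);
    // Note: no responseText-style accessor is offered at any point.
  }
}
```

The point of the shape is that callers never see partially decoded text, so an engine is free to swap in a genuinely incremental parser later without changing observable behavior.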
Re: [XHR] responseType json
On Fri, Dec 2, 2011 at 3:41 PM, Robin Berjon ro...@berjon.com wrote: On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. That's a good fight, but I think this is the wrong battlefield. IIRC (valid) JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are detectable rather easily. The only thing this limitation is likely to bring is pain when dealing with resources outside one's control. Browsers don't support UTF-32. It has no use cases as an interchange encoding beyond writing evil test cases. Defining it as a valid encoding is reprehensible. Does anyone actually transfer JSON as UTF-16? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
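The "detectable rather easily" claim rests on the fact that a JSON text begins with ASCII-range characters, so the encoding family shows up as a null-byte pattern in the first four bytes. This is the detection scheme described in RFC 4627 section 3; the function below is an illustrative sketch (BOM handling omitted for brevity):

```javascript
// Guess the encoding of a JSON byte stream from its first four bytes,
// per the null-byte pattern of RFC 4627 section 3 (illustrative sketch).
function detectJsonEncoding(bytes) {
  const [a, b, c, d] = bytes;
  if (a === 0 && b === 0 && c === 0) return "UTF-32BE"; // 00 00 00 xx
  if (a === 0 && c === 0) return "UTF-16BE";            // 00 xx 00 xx
  if (b === 0 && c === 0 && d === 0) return "UTF-32LE"; // xx 00 00 00
  if (b === 0 && d === 0) return "UTF-16LE";            // xx 00 xx 00
  return "UTF-8";                                       // xx xx xx xx
}
```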
Re: XPath and find/findAll methods
On Tue, Nov 29, 2011 at 7:33 AM, Liam R E Quin l...@w3.org wrote: (2) Not a dead end XSLT 1 and XPath 1 are not evolutionary dead ends although it's true that neither the xt nor the libxml2 library supports XSLT 2 and XPath 2. There's some support (along with XQuery) in the Qt libraries, and also in C++ with XQilla and Zorba. There are maybe 50 implementations of XPath 2 and/or XQuery 2 that I've encountered. XQuery 3.0 and XPath 3.0 are about to go to Last Call, we hope, and XSLT 3.0 to follow next year. The work is very much active and alive.

Sure, XPath and XSLT keep being developed. What I meant by evolutionary dead end is that the XPath 1.0-compatible evolutionary path has been relegated to a separate mode instead of XPath 2.0 and newer being compatible by design. So the new development you cite happens with Compatibility Mode set to false. To remain compatible with existing content, browsers would presumably have to live in the Compatibility Mode set to true world, which would mean browsers living on a forked evolutionary path that isn't the primary interest of the WGs working on the evolution.

I don't have enough data about existing XPath-using Web content to know how badly the Web would break if browsers started interpreting existing XPath (1.x) expressions as XPath 2.x expressions with Compatibility Mode set to false, but the fact that the WG felt that it needed to define a compatibility mode suggests that the WG itself believed the changes to be breaking ones.

/html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong

This example depends on unprefixed name expressions matching the (X)HTML namespace when tested against an element and no namespace when tested against attributes. And that trick only works with (X)HTML nodes. Selectors have the advantage that they wildcard the namespace by default, so it's feasible to define APIs that don't even have namespace binding mechanisms. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XPath and find/findAll methods
On Thu, Nov 24, 2011 at 5:19 PM, Julian Reschke julian.resc...@gmx.de wrote: Well, the use case is to allow browsers to move to XPath2/XSLT2 at some point in the future, without having to maintain another engine. Sorry about bringing up the XPath2 rathole that's now expanding into the XSLT2 rathole. My point was that since XPath2/XSLT2 made incompatible changes, there isn't a smooth path for moving to XPath2/XSLT2 proper in browsers in the future even if browser vendors felt that it was worthwhile to expend the effort. There seems to be a potential migration path to XPath2_compat/XSLT2_compat, though, but do the people who want XPath2/XSLT2 want just the compat mode variants or the variants that the relevant WG treats as the primary ones? In any case, I think XPath2/XSLT2 have a bad investment/payoff ratio from the browser point of view, so I think it makes sense for people who want to use XSLT2 in browsers to license Saxon-CE (XSLT2 implemented in JavaScript) from Saxonica instead of expecting native implementations in browsers. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] HTML in XHR implementation feedback
On Mon, Nov 21, 2011 at 8:26 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, Nov 16, 2011 at 2:40 AM, Henri Sivonen hsivo...@iki.fi wrote: * For text/html responses for response types "" and "document", the character encoding is established by taking the first match from this list in this order:

- HTTP charset parameter
- BOM
- HTML-compliant meta prescan up to 1024 bytes
- UTF-8

I still think that we are putting large parts of the world at a significant disadvantage here since they would not be able to use this feature together with existing content, which I would imagine is a large argument for this feature at all. Here is what I propose. How about we add a .defaultCharset property. When not set we use the list as described above. If set, the contents of .defaultCharset is used in place of UTF-8.

I think that makes sense as a solution if it turns out that a solution is needed. I think adding that feature now would be a premature addition of complexity, especially considering that responseText has existed for this long with a UTF-8 default without a .defaultCharset property.

* When there is no HTTP-level charset parameter, progress events are stalled and responseText made null until the parser has found a BOM or a charset meta or has seen 1024 bytes or the EOF without finding either BOM or charset meta.

Why? I wrote the Gecko code specifically so that we can adjust .responseText once we know the document charset. Given that we're only scanning 1024 bytes, this shouldn't ever require more than 1024 bytes of extra memory (though the current implementation doesn't take advantage of that).

I meant that stalling stops at EOF if the file is shorter than 1024 bytes. However, this point will become moot, because supporting HTML parsing per spec in the default mode broke Wolfram Alpha and caused wasteful parsing on Gmail, so per IRC discussion with Anne and Olli, I'm preparing to limit HTML parsing to responseType == "document" only.
* Making responseType == "" not support HTML parsing at all and to treat text/html as an unknown type for the purpose of character encoding.

I don't understand what the part after the "and" means. But the part before it sounds quite interesting to me. It would also resolve any concerns about breaking existing content.

The part after "and" means the old behavior. This is now the plan.

On Mon, Nov 21, 2011 at 8:28 PM, Jonas Sicking jo...@sicking.cc wrote: The side effect is that meta prescan doesn't happen in the synchronous mode for text/html resources. This is displeasingly inconsistent but makes sense if the sync mode is treated as an evil legacy feature rather than as an evolving part of the platform.

I'm not sure what this means. Aren't we only doing the meta prescan when parsing an HTML document?

The meta prescan is done only when parsing HTML. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XPath and find/findAll methods
On Wed, Nov 23, 2011 at 11:05 PM, Julian Reschke julian.resc...@gmx.de wrote: could you elaborate on what you mean by no smooth evolutionary path to XPath 2.x? (Are you referring to specs or to implementations?) Specs. XPath 2.0 changed XPath in an incompatible way. There's a compatibility mode for interpreting existing XPath 1.0 queries, but it seems like a bad idea to build on a spec whose authors have put compatibility into a side mode and what's considered the main thing isn't fully compatible with existing queries. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XPath and find/findAll methods
On Thu, Nov 24, 2011 at 3:49 PM, Robin Berjon ro...@berjon.com wrote: Node.prototype.queryXPath = function (xpath) { So, now for the money question: should we charter this?

Since IE and Opera already have a solution, it seems to me that unless that solution has bad flaws, it would make more sense to spec what they already support instead of inventing a new API. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XPath and find/findAll methods
On Tue, Nov 22, 2011 at 12:28 AM, Martin Kadlec bs-ha...@myopera.com wrote: Only reason why XPath is dead on the web is because there is not (yet) easy way to use it. It's worth noting that XPath in browsers is XPath 1.0 which doesn't have a smooth evolutionary path to XPath 2.x, so browser XPath is an evolutionary dead end unless forked on a different evolutionary path than W3C XPath. Even though XPath might be very important to its user base, in the big picture it isn't the kind of Web platform feature that would generate a lot of Web developer mindshare if a browser vendor invested in it. Chances are that investments in CSS always have a higher return on investment (in terms of Web developer mindshare) than investments in XPath. In this situation, I expect there to be no enthusiasm for polishing what's an evolutionary dead end (XPath 1.0) or for launching something incompatible that'd require a lot of up-front work (XPath 2.x) while still having to support the existing evolutionary dead end. Furthermore, XPath 2.x would be a slippery slope towards dependencies on XML Schema. Even though it's an optional feature, it's prudent to leave a wide safety margin around optional features. Otherwise, there's a risk of getting sucked into implementing bad optional features anyway. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: TAG Comment on
On Mon, Nov 21, 2011 at 12:05 PM, Anne van Kesteren ann...@opera.com wrote: On Mon, 21 Nov 2011 00:47:05 +0100, Mark Nottingham m...@mnot.net wrote: For example, some browsers still (!) support blink, but that doesn't mean we should promote its use. FWIW, blink is defined as a feature in HTML5 browsers are expected to implement.

Conflating specs with promotion worries me. In particular, it worries me that the specs == promotion mindset might lead to hiding some features from the spec, which would lead back to the bad old days when specs were not even seriously trying to contain the description of what needs to be implemented in order to successfully render the Web. By all means put some kind of Surgeon General's warning about race conditions on localStorage but, please, let's not hide the description of the feature from specs. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] HTML in XHR implementation feedback
On Wed, Nov 16, 2011 at 12:40 PM, Henri Sivonen hsivo...@iki.fi wrote: * Making XHR not support HTML parsing in the synchronous mode. In reference to the other thread about discouraging synchronous XHR (outside Workers), this change ended up being made in Gecko. (HTML parsing in XHR still hasn't made its way to the Nightly channel, so don't expect to see it quite yet.) The side effect is that meta prescan doesn't happen in the synchronous mode for text/html resources. This is displeasingly inconsistent but makes sense if the sync mode is treated as an evil legacy feature rather than as an evolving part of the platform. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[XHR2] HTML in XHR implementation feedback
I landed support for HTML parsing in XHR in Gecko today. It has not yet propagated to the Nightly channel. Here's how it behaves:

* Contrary to the spec, for response types other than "" and "document", character encoding determination for text/html happens the same way as for unknown types.

* For text/html responses for response types "" and "document", the character encoding is established by taking the first match from this list in this order:
- HTTP charset parameter
- BOM
- HTML-compliant meta prescan up to 1024 bytes
- UTF-8

* In particular, the following have no effect on the character encoding:
- meta discovered by the tree builder algorithm
- The user-configurable fallback encoding
- Locale-specific defaults
- The encoding of the document that invoked XHR
- Byte patterns in the response (beyond BOM and meta). Even the BOMless UTF-16 detection that Firefox does when heuristic detection has otherwise been turned off is skipped for XHR.

* When there is no HTTP-level charset parameter, progress events are stalled and responseText made null until the parser has found a BOM or a charset meta or has seen 1024 bytes or the EOF without finding either BOM or charset meta.

* If the response is a multipart response, XHR behaves as if it didn't support HTML parsing for the subparts of the response. (The multipart handling infrastructure in Gecko makes assumptions that are incorrect for the off-the-main-thread parsing infrastructure. Since the plan is to move XML parsing off the main thread, too, we'll need to find out whether multipart support is a worthwhile feature to keep. If it is, we need to add some mechanisms to make multipart work when subparts are parsed off the main thread. If not, we should drop the feature, in my opinion.)

* HTML parsing is supported in the synchronous mode, but I'd be quite happy to remove that support in order to curb sync XHR proliferation.
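The first-match encoding list above amounts to a tiny selection function (an illustrative sketch of the described behavior, not Gecko source):

```javascript
// Pick the character encoding for a text/html XHR response with
// responseType "" or "document", per the first-match list above.
// Each argument is the encoding name found by that source, or null
// when that source yielded nothing. (Illustrative sketch only.)
function pickHtmlXhrEncoding(httpCharset, bomEncoding, metaCharset) {
  // Note what is deliberately absent: meta found by the tree builder,
  // the user-configurable fallback, locale defaults, the invoking
  // document's encoding, and byte-pattern heuristics.
  return httpCharset || bomEncoding || metaCharset || "UTF-8";
}
```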
* I believe the implementation otherwise matches the spec, but exposing the document via responseXML should be considered to be at risk. See below.

Risks:

* Stalling progress events while waiting for meta could, in theory, deadlock an existing Web app when the Web app does long polling with responseType == "", gets a text/html response without a charset declaration, the first chunk of the response is shorter than 1024 bytes and the server won't send more before the client side informs the server via another channel that the first chunk has been processed.
- If this turns out to be a Real Problem, my plan is to make responseText show decoded text up to the first byte that isn't one of 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A.
- I think this risk is low.

* responseXML now becomes non-null for HTTP error responses that have a text/html response body. This might be a problem if Web apps that expect to get XML responses check for HTTP errors by checking responseXML for null. We'll see how bad breakage nightly testers report.
- I think this risk is high.
- If this turns out to be a Real Problem, the solution would be to make HTML parsing (including the meta prescan) available only when responseType == "document". (Note that xhr.response maps to responseText when responseType == "", so if responseXML is made null when responseType == "", xhr.response wouldn't work for retrieving the tree.) This change might even be a good idea performance-wise to avoid adding HTML parsing overhead for legacy uses of XHR that don't set responseType.

Spec change proposals so far:

* I suggest making responseType modes other than "" and "document" not consider the internal character encoding declarations in HTML (or XML).

Spec change proposals that I'm not making yet but might make in the near future:

* Making responseType == "" not support HTML parsing at all and to treat text/html as an unknown type for the purpose of character encoding.
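The byte whitelist in the first mitigation above can be written out as a membership test (a hypothetical helper illustrating the listed ranges, not Gecko code):

```javascript
// Membership test for the byte whitelist mentioned above: responseText
// could safely reveal decoded text up to the first byte outside this
// set while the encoding is still undetermined. (Hypothetical helper
// spelling out the listed ranges, not Gecko code.)
function isSafeStallByte(b) {
  return b === 0x09 || b === 0x0A || b === 0x0C || b === 0x0D ||
         (b >= 0x20 && b <= 0x22) || b === 0x26 || b === 0x27 ||
         (b >= 0x2C && b <= 0x3F) || (b >= 0x41 && b <= 0x5A) ||
         (b >= 0x61 && b <= 0x7A);
}
```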
* Making XHR not support HTML parsing in the synchronous mode. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Fri, Nov 11, 2011 at 11:49 AM, Anne van Kesteren ann...@opera.com wrote: Unfortunately style and script are parsed differently depending on if they live in foreign content or not. However this is something we can fix, and would lead to a lot of other benefits (such as scripts parsed consistently when moved around in the markup). I do agree it would make sense to parse these consistently throughout text/html.

I think http://www.w3.org/Bugs/Public/show_bug.cgi?id=10901 should remain as WONTFIX.

* We have interop between Gecko, WebKit, Trident (since IE9 on this point) and Presto (once Ragnarök ships). Hooray! Interop is hard. When it has been achieved, we shouldn't self-sabotage it.

* Even if we considered the replacement cycles of Firefox and Chrome to be fast enough that Firefox or Chrome legacy didn't matter, Microsoft has deployed XML-style tokenization of SVG in IE. (Apple has deployed the parsing in Safari, too.) Microsoft says IE9 will be supported until January 2020. Even if IE9 doesn't have a large active userbase all the way until 2020, I think authors who try to make Web content that works would be worse off if we had a period of even a couple of years with browsers that support SVG-in-text/html but tokenize script/style content substantially differently. (Which would be the case if we changed the tokenization but MS didn't agree to issue such a drastic change as a patch for IE9.)

* If we changed SVG style to tokenize like HTML style, we'd most likely end up breaking the kind of copypaste scenarios that the SVG WG really wanted to work in the first place. (This argument also applies to script, but style is even more likely to occur in the kind of content one would expect to be able to copypaste and have it just work.) Maybe the SVG WG has changed their mind now, but we shouldn't let them flipflop now that we've reached interop. This ship has sailed. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Fri, Nov 11, 2011 at 1:11 PM, Jonas Sicking jo...@sicking.cc wrote: Microsoft has expressed support for changing the parser here.

As a patch for IE9?

Have you ever actually talked to the SVG WG about this specific issue?

Yes, at the time foreign lands were being specced in HTML, and the SVG WG had to be dragged in kicking and screaming, because they didn't want SVG-in-HTML to be supported at all at first.

If not, please stop arguing that the SVG group wants the currently specced behavior.

I'm not arguing about what they want *today*. I'm saying what they wanted earlier and why doing something different would be bad. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Thu, Nov 10, 2011 at 7:32 PM, Jonas Sicking jo...@sicking.cc wrote: I don't think we should make up rules for where it makes sense to insert DOM and where it doesn't. After all, we support .innerHTML on all HTML elements (and soon maybe all Elements), and not just a subset of them, right?

Yes, but with innerHTML on elements, we always have a context node, so there isn't magic DWIM involved.

But you don't need to look far to find special cases with difficult elements: We also support createContextualFragment with all possible contexts except we special-case things so that if the context is html in the (X)HTML namespace, the behavior is as if the context had been body in the (X)HTML namespace.

One reasonable view is that solutions should always be complete and make sense (for some notion of making sense) for all inputs. Another reasonable view is that we shouldn't do anything with completeness as the rationale and that everything that needs notable additional engineering needs to be justified by use cases. If no one really wants to use DWIM parsing to create a DocumentFragment that has the html element in the (X)HTML namespace as its child, why put the engineering effort into supporting such a case?

Currently, per spec (and Ragnarök, Chrome and Firefox comply and interoperate), if you take an HTML document that has head and body (as normal) and assign document.body.outerHTML = document.body.outerHTML, you get an extra head so that the document has 2 heads. Would you expend engineering effort, for the sake of making sense in all cases for completeness, to get rid of the extra head even though there are likely no use cases and 3 out of 4 engines interoperate while complying with the spec?

And requiring that a context node is passed in in all cases when HTML is parsed is terrible developer ergonomics.
One possibility is that instead of adding innerHTML to DocumentFragment, we add three methods to Document:

DocumentFragment parseFragment(DOMString htmlMarkup);
DocumentFragment parseSvgFragment(DOMString svgMarkup);
DocumentFragment parseMathFragment(DOMString mathmlMarkup);

parseFragment would do roughly the kind of DWIM Yehuda suggested. That is, you'd get to use <tr> with it but not <html>. parseSvgFragment would invoke the HTML fragment parsing algorithm with svg in the SVG namespace as the context. parseMathFragment would invoke the HTML fragment parsing algorithm with math in the MathML namespace as the context. As a bonus, developers would need to call createDocumentFragement() first.

frag.innerHTML = "<g></g>"; someSVGElement.appendChild(frag); seems very possible to make work

Making it work is a problem with <a>. I think we should have three DocumentFragment-returning parsing methods instead of packing a lot of magic into innerHTML on DocumentFragment, when having to obtain a DocumentFragment first and filling it as a separate step sucks as far as developer ergonomics go.

someTableElement.innerHTML = "<tr>...</tr><div></div>"; will just drop the div on the floor.

By what mechanism? (I didn't implement and run Yehuda's suggestion, but I'm pretty sure it wouldn't drop the div. Why would we put additional effort into dropping the div?) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Fri, Nov 11, 2011 at 1:42 PM, Henri Sivonen hsivo...@iki.fi wrote: As a bonus, developers would need to call createDocumentFragement() first. Doh. Would *not* need to call createDocumentFragement() first. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Fri, Nov 4, 2011 at 1:03 AM, Yehuda Katz wyc...@gmail.com wrote: It would be useful if there was a way to take a String of HTML and parse it into a document fragment. This should work even if the HTML string contains elements that are invalid in the in body insertion mode. Something like this code should work:

var frag = document.createDocumentFragment();
frag.innerHTML = "<tr><td>hello</td></tr>";
someTable.appendChild(frag);

It's easy for me to believe that there are valid use cases where the first tag encountered is tr. This would probably require a new, laxer insertion mode, which would behave similarly to the body insertion mode, but with different semantics in the "A start tag whose tag name is one of: caption, col, colgroup, frame, head, tbody, td, tfoot, th, thead, tr" case.

What are the use cases for having this work with head and frame as first-level tags in the string? Do you also want it to work with html, body and frameset? What about SVG and MathML elements? I totally sympathize that this is a problem with tr, but developing a complete solution that works sensibly even when you do stuff like

frag.innerHTML = "<head></head>";
frag.innerHTML = "<head><div></div></head>";
frag.innerHTML = "<frameset></frameset><a><!-- b -->";
frag.innerHTML = "<html><body>foo</html>bar<tr></tr>";
frag.innerHTML = "<html><body>foo</html><tr></tr>";
frag.innerHTML = "<div></div><tr></tr>";
frag.innerHTML = "<tr></tr><div></div>";
frag.innerHTML = "<g><path/></g>";

is a much trickier problem than your tr example makes it first seem. Do you have use cases for tags other than tr appearing as the outermost tag? What would you expect my examples above to do and why? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: innerHTML in DocumentFragment
On Fri, Nov 4, 2011 at 2:54 PM, João Eiras jo...@opera.com wrote: * stripScripts is a boolean that tells the parser to strip unsafe content like scripts, event listeners and embeds/objects which would be handled by a 3rd party plugin according to user agent policy. According to user agent policy is a huge interoperability problem. (IIRC, Collin Jackson listed IE's toStaticHTML as an example of a bad security feature for this reason in his USENIX talk.) If we expose an HTML sanitizer to Web content as a DOM API, we should have a clear normative spec that says what exactly the sanitizer does. Stuff to debate includes what to do about Content MathML, what to do about object elements that appear to reference SVG and what to do about embed elements that bear Microdata attributes. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Sanatising HTML content through sandboxing
On Wed, Nov 9, 2011 at 9:54 AM, Adam Barth w...@adambarth.com wrote: Also, a div doesn't represent a security boundary. It's difficult to sandbox something unless you have a security boundary around it. IMHO, an easy way to solve this problem is to just exposes an HTMLParser object, analogous to DOMParser, which folks can use to safely parse HTML, DOMParser.parseFromString already takes a content type as the second argument. The plan is to support HTML parsing when the second argument is text/html. e.g., from XMLHttpRequest. XMLHttpRequest Level 2 has built-in support for HTML parsing. No need to first get responseText and then pass it to something else. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] responseText for text/html before the encoding has stabilized
On Mon, Nov 7, 2011 at 9:57 AM, Jonas Sicking jo...@sicking.cc wrote: It would be really nice if we could move forward with this thread.

I was planning on reporting back when I have something that passes all mochitests. This has been delayed by other stuff, particularly fallout from the new View Source highlighter.

My preference is still to not do any HTML/XML specific processing when .responseType is set to anything other than "" or "document". This allows us to make encoding handling consistent for text and a possible future incremental text type.

My patch doesn't do HTML-specific processing when responseType is not "" or "document".

Also, the current spec leads to quite strange results if we end up supporting more text-based formats directly in XHR. For example in Gecko we've added experimental support for parsing into JSON. If we added this to a future version of XHR, this would mean that if a JSON resource was served with a text/html Content-Type, we'd simultaneously parse it as HTML in order to detect the encoding and as JSON in order to return a result to the page.

responseType == "" being weird that way with XML isn't new. I guess the main difference is that mislabeling JSON as text/html might be more probable than mislabeling it as XML when e.g. PHP defaults to text/html responses. One way to address this is to not support new response types with responseType == "" and force authors to set responseType to "json" if they want to read responseJSON.

So what I suggest is that we make the current steps 4 and 5 *only* apply if .responseType is set to "" or "document". This almost matches what we've implemented in Gecko, though in Gecko we also skip step 6 which IMHO is a bug (if for no other reason, we should skip a UTF-8 BOM if one is present).

Makes sense.

As to the question of which HTML charset encoding-detection rules to apply when .responseType is set to "" or "document" and content is served as HTML, I'm less sure what the answer is.
It appears clear that we can't reload a resource the same way a normal page does when hitting a meta which wasn't found during the prescan and which declares a charset different from the one currently used. However, my impression is that a good number of HTML documents out there don't use UTF-8 and do declare a charset using meta within the first 1024 bytes. Additionally, I do hear *a lot* that authors have a hard time setting HTTP headers due to not having full access to the configuration of their hosting server (as well as configuration being hard to do even when access is available). Hence it seems like we at least want to run the prescan, though if others think otherwise I'd be interested to hear. My current patch runs the prescan. There is also the issue of whether we should take into account the encoding of the page which started the XHR (we do for navigation, at least in Gecko), as well as whether we should take user settings into account. I still believe that we'll exclude large parts of the world from transitioning to developing AJAX-based websites if we drop all of these things; however, I have not yet gathered that data. I think we shouldn't take the encoding of the invoking page into account. We have an excellent opportunity to avoid propagating that kind of legacy badness. I think we should take the opportunity to make a new feature less crazy. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
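For concreteness, here is a rough sketch of the 1024-byte meta prescan being discussed. This is an assumption-laden simplification, not the HTML spec's algorithm: the real prescan tokenizes attributes and also handles meta http-equiv declarations, whereas this regex-based stand-in only shows the shape of the idea (never look past the first 1024 bytes).

```javascript
// Simplified stand-in for the HTML <meta> charset prescan (NOT the
// spec algorithm; the real prescan tokenizes attributes).
function prescanForCharset(bytes) {
  // Decode the prefix as ISO-8859-1 so every byte maps to one code point.
  const prefix = Buffer.from(bytes.slice(0, 1024)).toString('latin1');
  // Look for something shaped like <meta ... charset=...> in that prefix.
  const m = /<meta[^>]*charset\s*=\s*["']?([^"'\s;>]+)/i.exec(prefix);
  return m ? m[1].toLowerCase() : null;
}
```

A declaration within the first 1024 bytes is honored; one that arrives later is simply never seen, which is what makes the decision final without a parser restart.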
Re: [XHR2] responseText for text/html before the encoding has stabilized
On Fri, Sep 30, 2011 at 8:05 PM, Jonas Sicking jo...@sicking.cc wrote: Unless responseType=="" or responseType=="document" I don't think we should do *any* HTML or XML parsing. Even the minimal amount needed to do charset detection. I'd be happy to implement it that way. For responseType=="text" we currently *only* look at HTTP headers and if nothing is found we fall back to using UTF-8. Though arguably we should also check for a BOM, but don't currently. Not checking for the BOM looks like a bug to me, though not a particularly serious one given that the default is UTF-8, so the benefit of checking the BOM is that people can use UTF-16. But using UTF-16 on the wire is a bad idea anyway. This could be fixed for consistency without too much hardship, but it's a rather useless use of developer time. On Fri, Sep 30, 2011 at 9:05 PM, Ian Hickson i...@hixie.ch wrote: So... the prescanning is generally considered optional I consider that a spec bug. For the sake of well-defined behavior, I think the spec should require buffering up to 1024 bytes in order to look for a charset meta without a timeout (but buffering should stop as soon as a charset meta has been seen, so that if the meta appears early, there's no useless stalling until the 1024-byte boundary). (the only benefit really is that it avoids reloads in bad cases), and indeed implementations are somewhat encouraged to abort it early if the server only sent a few bytes (because that will shorten the time until something is displayed). Firefox has buffered up to 1024 bytes without a timeout since Firefox 4. I have received no reports of scripts locking up due to the buffering. There have been a couple of reports of incremental display of progress messages having become non-incremental, but those were non-fatal and easy to fix (by declaring the encoding). Also, it has a number of false positives, e.g. it doesn't ignore the contents of script elements. 
I think restarts with scripts are much worse than mostly-theoretical false positives. (If someone puts a charset meta inside a script, they are doing it very wrong.) Do we really want to put it into the critical path in this way? For responseType == "" and responseType == "document", I think doing so would be less surprising than ignoring meta. For responseType == "text" and responseType == "chunked-text" or any response type that doesn't actually involve running the full HTML parser, I'd rather not run the meta prescan, either. I agree that the reloading alternative is even worse. Yes. What about just relying on the Content-Type charset= and defaulting to UTF-8 if it isn't there, and not doing any in-page stuff? That would be easy to implement, but it would be strange not to support some ways of declaring the encoding that are considered conforming by HTML. How is the encoding determined for, e.g., text/plain or text/css files brought down through XHR and viewed through responseText? Per spec, @charset isn't honored for text/css, so in that sense, not honoring meta would be consistent. However, I'd be hesitant to stop honoring the XML declaration for XML, since there could well be content depending on it. XML and CSS probably won't end up being treated consistently with each other. But then, XHR doesn't support parsing into a CSS OM. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] Avoiding charset dependencies on user settings
On Thu, Sep 29, 2011 at 11:27 PM, Jonas Sicking jo...@sicking.cc wrote: Finally, XHR allows the programmer using XHR to override the MIME type, including the charset parameter, so if the person adding new XHR code can't change the encoding declarations on legacy data, (s)he can override the UTF-8 last resort from JS (and a given repository of legacy data pretty often has a self-consistent encoding that the XHR programmer can discover ahead of time). I think requiring the person adding XHR code to write that line is much better than adding more locale and/or user setting-dependent behavior to the Web platform. This is certainly a good point, and is likely generally the easiest solution for someone rolling out an AJAX version of a new website, rather than requiring webserver configuration changes. However, it still doesn't solve the case where a website uses different encodings for different documents, as described above. If we want to *really* address that problem, I think the right way to address it in XHR would be to add a way for XHR to override the HTML last-resort encoding, so that authors who are dealing with a content repository migrated partially to UTF-8 can set the last resort to the legacy encoding they know they have, instead of ending up overriding the whole HTTP Content-Type for the UTF-8 content. (I'm assuming here that if someone is migrating a site from a legacy encoding to UTF-8, the UTF-8 parts declare that they are UTF-8. Authors who migrate to UTF-8 but, even after realizing that legacy encodings suck and UTF-8 rocks, are *still* too clueless to *declare* that they use UTF-8 don't deserve any further help from browsers, IMO.) I'm particularly keen to hear how this will affect locales which do not use ASCII by default. Most of the content I personally consume is written in English or Swedish, most of which is generally legible even if decoded using the wrong encoding. I'm under the impression that that is not the case for, for example, Chinese or Hindi documents. 
I think it would be sad if we went with any particular solution here without consulting people from those locales. The old way of putting Hindi content on the Web relied on intentionally misencoded downloadable fonts. From the browser's point of view, such deep legacy text is Windows-1252. Hindi content that works without misencoded fonts is UTF-8. So I think Hindi isn't relevant to this thread. Users in CJK and Cyrillic locales are the ones most hurt by authors not declaring their encodings (well, actually, readers of CJK and Cyrillic languages whose browsers are configured for other locales are hurt *even* more), so I think it would be completely backwards for browsers to complicate new features in order to enable authors in the CJK and Cyrillic locales to deploy *new* features and *still* not declare encodings. Instead, I think we should design new features to make authors everywhere get their act together and declare their encodings. (Note that this position is much less extreme than the more enlightened position that e.g. HTML5 App Cache manifests take: requiring everyone to use UTF-8 for a new feature so that declarations aren't needed.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] responseText for text/html before the encoding has stabilized
On Fri, Sep 30, 2011 at 3:04 PM, Anne van Kesteren ann...@opera.com wrote: I do not see why "text" and "moz-chunked-text" have to be the same. Surely we do not want XML encoding detection to kick in for chunks. Do "text" and the default need to be the same for responseText for text/html and XML types? It seems annoying to have to run the meta prescan or the XML declaration detection without running a full parse in the "text" mode. Having deterministic decoding and waiting for 1024 bytes if the MIME type is text/html seems reasonable. Seems reasonable for the modes that have a non-null responseXML for text/html. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] responseText for text/html before the encoding has stabilized
On Fri, Sep 30, 2011 at 3:35 PM, Anne van Kesteren ann...@opera.com wrote: On Fri, 30 Sep 2011 14:29:32 +0200, Henri Sivonen hsivo...@iki.fi wrote: On Fri, Sep 30, 2011 at 3:04 PM, Anne van Kesteren ann...@opera.com wrote: I do not see why "text" and "moz-chunked-text" have to be the same. Surely we do not want XML encoding detection to kick in for chunks. Do "text" and the default need to be the same for responseText for text/html and XML types? It seems annoying to have to run the meta prescan or the XML declaration detection without running a full parse in the "text" mode. Unless we disable responseText and responseXML when responseType is not the empty string, I am not sure that makes sense. responseType is a newish feature. If it's OK for responseType == "chunked-text" to use encoding determination rules that differ from responseType == "" or responseType == "document", why should responseType == "text" have to be consistent with responseType == "" instead of being consistent with responseType == "chunked-text"? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] Avoiding charset dependencies on user settings
On Thu, Sep 29, 2011 at 3:30 AM, Jonas Sicking jo...@sicking.cc wrote: Do we have any guesses or data as to what percentage of existing pages would parse correctly with the above suggestion? I don't have guesses or data, because I think the question is irrelevant. When XHR is used for retrieving responseXML for legacy text/html, I'm not expecting legacy data that doesn't have encoding declarations to be UTF-8 encoded. I want to use UTF-8 for consistency with legacy responseText and for well-defined behavior. (In the HTML parsing algorithm at least, we value well-defined behavior over guessing the author's intent correctly.) When people add responseXML usage for text/html, I expect them to add encoding declarations (if they are missing) when they add XHR code that uses responseXML for text/html. We assume for security purposes that an origin is under the control of one authority, i.e. that authority can change stuff within the origin. I'm suggesting that when XHR is used to retrieve text/html data from the same origin, if the text/html data doesn't already have its encoding declared, the person exercising the origin's authority to add XHR should take care of exercising the origin's authority to modify the text/html resources to add encoding declarations. XHR can't be used for retrieving different-origin legacy data without the other origin opting in using CORS. I posit that it's less onerous for the other origin to declare its encoding than to add CORS support. Since the other origin needs to participate anyway, I think it's reasonable to require declaring the encoding to be part of the participation. 
Finally, XHR allows the programmer using XHR to override the MIME type, including the charset parameter, so if the person adding new XHR code can't change the encoding declarations on legacy data, (s)he can override the UTF-8 last resort from JS (and a given repository of legacy data pretty often has a self-consistent encoding that the XHR programmer can discover ahead of time). I think requiring the person adding XHR code to write that line is much better than adding more locale and/or user setting-dependent behavior to the Web platform. What outcome do you suggest and why? It seems you aren't suggesting doing stuff that involves a parser restart? Are you just arguing against UTF-8 as the last resort? I'm suggesting that we do the same thing for XHR loading as we do for iframe loading, with the exception of not ever restarting the parser. The goals are: * Parse as much of the HTML on the web as we can. * Don't ever restart a network operation, as that significantly complicates progress reporting and can have bad side effects, since XHR allows arbitrary headers and HTTP methods. So you suggest scanning the first 1024 bytes heuristically and varying the last-resort encoding. Would you decode responseText using the same encoding that's used for responseXML? If yes, that would mean changing the way responseText decodes in Gecko when there's no declaration. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
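The override being discussed amounts to one line at the XHR call site. A hedged, browser-only sketch follows; fetchLegacyHtml and the windows-1252 choice are invented for illustration. Note that overrideMimeType() replaces the entire effective media type, which is exactly why a dedicated last-resort override would be a less blunt tool.

```javascript
// Hypothetical helper: force a known legacy encoding for a repository
// whose resources lack encoding declarations. overrideMimeType()
// replaces the whole effective Content-Type, so this is a blunt tool:
// there is no way to override only the last-resort encoding.
function fetchLegacyHtml(url, onload) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  // The repository is known (out of band) to be windows-1252 throughout.
  xhr.overrideMimeType('text/html; charset=windows-1252');
  xhr.onload = () => onload(xhr.responseText);
  xhr.send();
}
```

If the same repository also contained UTF-8 resources that declare their encoding, this blanket override would misdecode them, which is the complaint about overriding the whole Content-Type rather than just the fallback.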
[XHR2] responseText for text/html before the encoding has stabilized
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#text-response-entity-body says: The text response entity body is a DOMString representing the response entity body. and If charset is null and mime is text/html follow the rules set forth in the HTML specification to determine the character encoding. Let charset be the determined character encoding. Furthermore, the response entity body is defined while the state is LOADING: The response entity body is the fragment of the entity body of the response received so far (LOADING) or the complete entity body of the response (DONE). The spec is silent on what responseText for text/html should be if responseText is read before it is known that the rules set forth in the HTML specification to determine the character encoding will no longer change their result. This looks like a spec bug. There are three obvious solutions: 1) Change the encoding used for responseText as more data becomes available so that previous responseText is not guaranteed to be a prefix of subsequent responseText. 2) Make XHR pretend it hasn't seen any data at all before it has seen so much that the encoding decision is final. 3) Not using the HTML rules for responseText. Solution #1 is what Gecko now does with XML, but fortunately XML doesn't allow non-ASCII before the XML declaration, so you can't detect this from outside the black box. With HTML, solution #1 would mean handing a footgun to Web authors who might not prepare for cases where previous responseText stops being a prefix of subsequent responseText. Solution #2 could, in the worst case (assuming we aren't doing the worst of worst cases; i.e. 
we aren't allowing parser restarts arbitrarily late), stall until 1024 bytes have been seen, which risks breaking existing comet apps if there exist comet apps that use responseText with slowly-arriving text/html responses that don't have a BOM, don't have an early meta, don't have an HTTP charset, and that require the JS part of the app to act on data within the first 1024 bytes before the server sends more. (OK, it would be silly to write comet apps with responseText using text/html as opposed to e.g. text/plain or whatever and not put a charset declaration on the HTTP layer, but this is the Web, so who knows if such apps exist.) Solution #3 would make the text/html side inconsistent with the XML side and could lead to confusion, especially in the default mode if responseXML does honor metas (within the first 1024 bytes). Solution #3 would be easy to implement, though. As a complication, since Saturday, Gecko supports a moz-chunked-text response type which modifies the behavior of response and responseText so that they only show a string consisting of new text since the previous progress event. moz-chunked-text isn't specced anywhere (to my knowledge), but IRC discussion with Olli indicates that it's assumed that, even going forward, the encoding decision is made the same way for the moz-chunked-text and text response types. This assumption obviously excludes solution #1 above, since chunks reported before meta could use a different encoding compared to chunks after meta, which wouldn't make sense. It's worth noting that moz-chunked-text turns off responseXML, so it's not unthinkable to use non-HTML rules for moz-chunked-text. In IRC discussion with Olli, we gravitated towards solution #2, but we didn't consider the comet stalling aspect in that discussion. In any case, all this should be specced properly, and it currently isn't. :-( It seems to me that all these cannot be true: * responseText and responseXML use the same encoding detection rules. 
* The "text" and default modes use the same encoding detection rules. * "text" and moz-chunked-text use the same encoding detection rules. * moz-chunked-text uses the same encoding for all chunks. * All imaginable badly written comet apps are guaranteed to continue working. * responseXML considers meta in a deterministic way (no timer for bailing out before 1024 bytes if the network stalls). Which property do we give up? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
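To make solution #1's footgun concrete, here is a small illustration using TextDecoder rather than XHR itself (the byte string is contrived): once the encoding decision changes mid-stream, a string read earlier is no longer a prefix of the string read later.

```javascript
// Why solution #1 is a footgun: the same bytes decoded before and after
// the encoding decision changes. Byte 0xE9 is "é" in windows-1252 but
// an incomplete sequence in UTF-8.
const bytes = Buffer.from('caf\xe9 <meta charset="windows-1252">', 'latin1');

// Early read: no meta seen yet, so fall back to UTF-8; the lone 0xE9
// decodes to U+FFFD (replacement character).
const early = new TextDecoder('utf-8').decode(bytes.subarray(0, 4));

// Later read: the meta has been seen, everything is re-decoded.
const late = new TextDecoder('windows-1252').decode(bytes);

console.log(late.startsWith(early)); // false: prefix guarantee broken
```

A script that buffered the early value and then did string arithmetic against the later value would silently corrupt its data, which is exactly the hazard authors "might not prepare for".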
Re: [XHR2] Avoiding charset dependencies on user settings
On Wed, Sep 28, 2011 at 4:16 AM, Jonas Sicking jo...@sicking.cc wrote: So it sounds like your argument is that we should do the meta prescan because we can do it without breaking any new ground. Not because it's better or was inherently safer before WebKit tried it out. The outcome I am suggesting is that character encoding determination for text/html in XHR should be: 1) HTTP charset 2) BOM 3) meta prescan 4) UTF-8 My rationale is: * Restarting the parser sucks. Full heuristic detection and non-prescan meta require restarting. * Supporting the HTTP charset, the BOM and the meta prescan means supporting all the cases where the author is declaring the encoding in a conforming way. * Supporting the meta prescan even for responseText is safe to the extent that content is not already broken in WebKit. * Not doing even heuristic detection on the first 1024 bytes allows us to avoid one of the unpredictability- and non-interoperability-inducing legacy flaws that encumber HTML when loading it into a browsing context. * Using a clamped last-resort encoding instead of a user setting or locale-dependent encoding allows us to avoid another of those flaws. * Using UTF-8, as opposed to Windows-1252 or a user setting or locale-dependent encoding, as the last-resort encoding allows the same encoding to be used in the responseXML and responseText cases without breaking existing responseText usage that expects UTF-8 (UTF-8 is the responseText default in Gecko). What outcome do you suggest and why? It seems you aren't suggesting doing stuff that involves a parser restart? Are you just arguing against UTF-8 as the last resort? And in any case, it's easy to figure out where the data was loaded from after the fact, so debugging doesn't seem any harder. If that counts as not harder, I concede this point. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
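The four-step order above can be sketched as a function. This is a hedged illustration, not Gecko's implementation: determineEncoding is a made-up name, and step 3 uses a crude regex where the HTML spec prescribes a real prescan.

```javascript
// Sketch of the proposed order for text/html in XHR:
// HTTP charset -> BOM -> meta prescan -> clamped UTF-8 last resort.
function determineEncoding(httpCharset, bytes) {
  // 1) An HTTP-level charset parameter wins outright.
  if (httpCharset) return httpCharset.toLowerCase();
  // 2) Byte order mark.
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return 'utf-8';
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return 'utf-16le';
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) return 'utf-16be';
  // 3) <meta> prescan over the first 1024 bytes only (crude regex here).
  const prefix = Buffer.from(bytes.slice(0, 1024)).toString('latin1');
  const m = /<meta[^>]*charset\s*=\s*["']?([^"'\s;>]+)/i.exec(prefix);
  if (m) return m[1].toLowerCase();
  // 4) Clamped last resort: UTF-8, never a user-setting or locale guess.
  return 'utf-8';
}
```

The point of the rationale list is that every branch here is deterministic: the same bytes and headers always yield the same encoding, regardless of browser localization or user settings.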
Re: [XHR2] Avoiding charset dependencies on user settings
On Mon, Sep 26, 2011 at 12:46 PM, Jonas Sicking jo...@sicking.cc wrote: On Fri, Sep 23, 2011 at 1:26 AM, Henri Sivonen hsivo...@iki.fi wrote: On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking jo...@sicking.cc wrote: I agree that there are no legacy requirements on XHR here, however I don't think that that is the only thing that we should look at. We should also look at what makes the feature the most useful. An extreme counter-example would be that we could let XHR refuse to parse any HTML page that didn't pass a validator. While this wouldn't break any existing content, it would make HTML-in-XHR significantly less useful. Applying all the legacy text/html craziness to XHR could break current use of XHR to retrieve responseText of text/html resources (assuming that we want responseText for text/html to work like responseText for XML in the sense that the same character encoding is used for responseText and responseXML). This doesn't seem to only be a problem when using crazy parts of text/html charset detection. Simply looking for a meta charset in the first 1024 characters will change behavior and could cause page breakage. Or am I missing something? Yes: WebKit already performs the meta prescan for text/html when retrieving responseText via XHR even though it doesn't support full HTML parsing in XHR (so responseXML is still null). http://hsivonen.iki.fi/test/moz/xhr/charset-xhr.html Thus, apps broken by the meta prescan would already be broken in WebKit (unless, of course, they browser sniff in a very strange way). And apps that wouldn't be OK with using UTF-8 as the fallback encoding when there's no HTTP-level charset, no BOM and no meta in the first 1024 bytes would already be broken in Gecko. Applying all the legacy text/html craziness to XHR would make data loading in programs fail in subtle and hard-to-debug ways depending on the browser localization and user settings. 
At least when loading into a browsing context, there's visual feedback of character misdecoding and the feedback can be attributed back to a given file. If setting-dependent misdecoding happens in the XHR data loading machinery of an app, it's much harder to figure out what part of the system the problem should be attributed to. Could you provide more detail here? How are you imagining this data being used such that it's not being displayed to the user? I.e., can you describe an application that would break in a non-visual way, and where it would be harder to detect where the data originated from compared to, for example, iframe usage? If a piece of text came from XHR and got injected into a visible DOM, it's not immediately obvious which HTTP response it came from. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] Avoiding charset dependencies on user settings
On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking jo...@sicking.cc wrote: I agree that there are no legacy requirements on XHR here, however I don't think that that is the only thing that we should look at. We should also look at what makes the feature the most useful. An extreme counter-example would be that we could let XHR refuse to parse any HTML page that didn't pass a validator. While this wouldn't break any existing content, it would make HTML-in-XHR significantly less useful. Applying all the legacy text/html craziness to XHR could break current use of XHR to retrieve responseText of text/html resources (assuming that we want responseText for text/html to work like responseText for XML in the sense that the same character encoding is used for responseText and responseXML). Applying all the legacy text/html craziness to XHR would make data loading in programs fail in subtle and hard-to-debug ways depending on the browser localization and user settings. At least when loading into a browsing context, there's visual feedback of character misdecoding and the feedback can be attributed back to a given file. If setting-dependent misdecoding happens in the XHR data loading machinery of an app, it's much harder to figure out what part of the system the problem should be attributed to. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR2] Avoiding charset dependencies on user settings
On Fri, Sep 23, 2011 at 11:26 AM, Henri Sivonen hsivo...@iki.fi wrote: Applying all the legacy text/html craziness Furthermore, applying full legacy text/html craziness involves parser restarts for GET requests. With a browsing context, that means renavigation, but I really don't want to support parser restarts in XHR. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Adding Web Intents to the Webapps WG deliverables
On Tue, Sep 20, 2011 at 6:53 AM, Ian Hickson i...@hixie.ch wrote: Why not just improve both navigator.registerContentHandler and navigator.registerProtocolHandler? In particular, why are intents registered via a new HTML element rather than an API? Web Activities addresses this problem space without a new HTML element: https://github.com/mozilla/openwebapps/blob/master/docs/ACTIVITIES.md -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[XHR2] Avoiding charset dependencies on user settings
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#document-response-entity-body says: If final MIME type is text/html let document be a Document object that represents the response entity body parsed following the rules set forth in the HTML specification for an HTML parser with scripting disabled. [HTML] Since there's presumably no legacy content using XHR to read responseXML for text/html (and expecting HTML parsing), and since (in Gecko at least) responseText for non-XML tries the HTTP charset and falls back on UTF-8, it seems it doesn't make sense to implement full-blown legacy charset craziness for text/html in XHR. Specifically, it seems that it makes sense to skip heuristic detection and to use UTF-8 (as opposed to Windows-1252 or a locale-dependent value) as the fallback encoding if there's neither a meta nor an HTTP charset, since UTF-8 is the pre-existing fallback for responseText and responseText is already used with text/html. As it stands, the XHR2 spec defers to a part of HTML that has legacy-oriented optional features. It seems that it makes sense to clamp down those options for XHR. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Overriding the MIME type in XHR2 after the request has started
In reference to http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#dom-xmlhttprequest-overridemimetype It seems to me that XHR2 allows overrideMimeType() to be called at any time, so that it affects the calls to the responseXML getter that follow the overrideMimeType() call, and subsequent overrideMimeType() calls can make the responseXML getter return different things later. This is bad, because it requires synchronous parsing when the responseXML getter is called. OTOH, if overrideMimeType() calls were honored only before the send() method has been called, parsing to DOM could be implemented progressively (potentially off the main thread) as the resource representation downloads, and the responseXML getter could return this eagerly-parsed Document and always return the same document on subsequent calls. Are there compelling use cases for allowing overrideMimeType() after send() has been called? I assume that typically one would use overrideMimeType() when knowing ahead of time that the config of the server responding to the XHR is bogus. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[XHR2] Overriding the MIME type in XHR2 after the request has started
Adding [XHR2] to the subject to comply with the instructions. Sorry about the noise. On Tue, 2011-04-19 at 12:04 +0300, Henri Sivonen wrote: In reference to http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#dom-xmlhttprequest-overridemimetype It seems to me that XHR2 allows overrideMimeType() to be called at any time, so that it affects the calls to the responseXML getter that follow the overrideMimeType() call, and subsequent overrideMimeType() calls can make the responseXML getter return different things later. This is bad, because it requires synchronous parsing when the responseXML getter is called. OTOH, if overrideMimeType() calls were honored only before the send() method has been called, parsing to DOM could be implemented progressively (potentially off the main thread) as the resource representation downloads, and the responseXML getter could return this eagerly-parsed Document and always return the same document on subsequent calls. Are there compelling use cases for allowing overrideMimeType() after send() has been called? I assume that typically one would use overrideMimeType() when knowing ahead of time that the config of the server responding to the XHR is bogus. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Use cases for Range::createContextualFragment and script nodes
When WebKit or Firefox trunk create an HTML script element node via Range::createContextualFragment, the script has its 'already started' flag set, so the script won't run when inserted into a document. In Opera 10.63 and in Firefox 3.6.x, the script doesn't have the 'already started' flag set, so the script behaves like a script created with document.createElement("script") when inserted into a document. I'd be interested in use cases around createContextualFragment in order to get a better idea of which behavior should be the correct behavior going forward. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Please special-case html as the context of Range.createContextualFragment when the doc is an HTML doc
For a future revision of DOM Range: Please specify that when the document associated with a Range is an HTML document (has its HTMLness bit set per HTML5) and the context node (startContainer of the Range) has the local name "html" in the XHTML namespace, the context passed to the HTML fragment parsing algorithm should be "body" in the XHTML namespace instead. This is required for compat with existing scripts. See https://bugzilla.mozilla.org/show_bug.cgi?id=585819 and dependencies/duplicates. (Also, if the document is an HTML document and the range doesn't have an identifiable context element node even after walking the parent chain, the context passed to the HTML fragment parsing algorithm should be "body" in the XHTML namespace. I don't know DOM Range well enough to say how this situation can arise.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
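For illustration, here is a hypothetical, browser-only sketch of the pattern existing scripts rely on (fragmentFromMarkup is made up, and the exact parser output in the "html"-context case is a rough characterization, not a claim about any particular engine):

```javascript
// Hypothetical, browser-only sketch of the compat pattern: a collapsed
// Range near the top of the document, where startContainer ends up being
// the <html> element.
function fragmentFromMarkup(markup) {
  const range = document.createRange();
  range.selectNodeContents(document.documentElement); // context: <html>
  // With "html" as the fragment parsing context, ordinary markup like
  // "<div>hi</div>" would be run through the before-head insertion modes
  // (generating implied structure) rather than parsed as body content.
  // Treating the context as "body", as requested above, yields the bare
  // <div>, which is what existing scripts expect.
  return range.createContextualFragment(markup);
}
```

The requested spec change makes this pattern keep working without each script having to walk down to a body-level context node first.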
Re: Issues with XML Dig Sig and XML Canonicalization; was Re: Rechartering WebApp WG
Sorry about the slow response time. On Feb 12, 2010, at 16:07, Marcos Caceres wrote: What we are discussing is whether Mozilla's solution for signing Zip files (JAR-based) [1] is easier for vendors to implement/maintain and authors to deal with when compared to the W3C Widget solution of using XML Dig Sig. I think it's clear that JAR/XPI signing is simpler than XML Dig Sig, because JAR signing operates on a plain-text, line-based manifest and, thus, doesn't require XML canonicalization before the signing step. I have previously listed the summary of issues in http://lists.w3.org/Archives/Public/public-webapps/2009AprJun/0178.html Thus far, in terms of ease of use for authors, little in the way of concrete evidence has been presented of one signing method being easier than the other (especially by looking at the complexity of using Mozilla's command line-based tool [1] compared to BONDI's SDK [2]). This is not to say that Mozilla (or anyone, given its open source nature) could not make a super easy tool for signing zip files. FWIW, I think I haven't ever argued anything about the ease of use of Mozilla's XPI signing tool. I have previously argued that Sun's jar signing tools were widely available. (Previously, I was unaware that .jar and .xpi used different crypto algorithms. Since .xpi is newer, one might assume it has a better algorithm in terms of crypto characteristics but obviously not in terms of network effects of tool availability.) However, the proof is in the pudding here: the fact that BONDI's SDK includes a tool that allows widgets to be signed with a few clicks is evidence that the W3C's Widgets Signature specification is capable of being used to produce easy-to-use products. I don't think I've ever claimed that the production of easy-to-use products wasn't *possible*. 
My claim was that XML Canonicalization (whether Exclusive or not) introduces enough *implementation* complexity that previously, buggy canonicalization code has been deployed, which has led to signatures failing to validate with other implementations that weren't bugwards-compatible with the signer's implementation. Here's evidence of bugs in just one high-profile Canonicalization implementation: https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit In terms of implementation, Mozilla has previously raised concerns about XML canonicalization (which I don't fully understand, hence the growing email cc list) - but by the virtue that people have implemented the Widget signing spec, I await to see if Mozilla's concerns will materialize in practice and actually hinder interoperability - I'm not saying this is FUD, but we need proof. The above is proof of *previous* interop-sensitive bugs in a widely-deployed Canonicalization implementation. There's no reason to believe that complexity-induced bugs of this kind are unique to one implementation. Instead, I think it's fair to expect any from-scratch implementation of Canonicalization to be prone to similar bugs *that could be avoided by using jar signing instead*, since jar signing omits the Canonicalization step entirely. Unfortunately, due to confidentiality concerns of people deploying crypto software, I can't give you concrete deployment stories where the above-cited bugs have caused interop issues. I can only point to the public bug list and assert that bugs in this area have had actual interop consequences at deployment time. (Also, having to have a Canonicalization impl. adds code bloat compared to having a jar signing impl.) It's too early to make the call that widget signing is flawed. And it's important to note that no one who has implemented it has come back to the WG raising any concerns or screaming bloody murder. 
It could be that people don't sign widgets very often. I don't recall ever seeing a signed Firefox extension or a signed Eclipse plug-in. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Notifications
On Feb 10, 2010, at 20:35, John Gregg wrote: I agree that this is a good distinction, but I think even considering ambient notifications there is a question of how much interaction should be supported. NotifyOSD, for example, does not allow the user to take any action in response to a notification. Being able to acknowledge an ambient notification could be an optional feature that isn't supported on Ubuntu as long as NotifyOSD doesn't support acknowledging notifications. (If it's a problem to make acknowledgement optional, I think making HTML notifications optional is going to be a bigger problem...) FWIW, Microsoft explicitly says notifications must be ignorable and non-persistent: Notifications aren't modal and don't require user interaction, so users can freely ignore them. In Windows Vista® and later, notifications are displayed for a fixed duration of 9 seconds. http://msdn.microsoft.com/en-us/library/aa511497.aspx As such, it's always unsafe to design UI in a way that expects the users to be able to acknowledge a given notification. So a very simple use case: email web app wants to alert you have new mail outside the frame, and allow the user to click on that alert and be taken to the inbox page. This does not work on NotifyOSD, because they explicitly don't support that part of the D-Bus notifications spec. However, Growl would support this. If acknowledgement support is super-important to Web apps, surely it should be to native apps, too. It seems to me that it would be a bad outcome for users if the Ubuntu desktop and the Web platform disagree on this point and it causes the duplication of notification mechanisms. I think it would make more sense to either add org.freedesktop.Notifications.ActionInvoked to NotifyOSD (if acknowledgeability is the Right Thing) or not to add acknowledgeability to the Web platform (if that's the Right Thing). 
Having two groups of platform designers (the designers of the Ubuntu desktop and the designers of the Web platform) disagree on what the Right Thing is makes the users lose. CCing mpt in case he can share some insight into why NotifyOSD explicitly doesn't support org.freedesktop.Notifications.ActionInvoked. On Feb 11, 2010, at 00:10, Drew Wilson wrote: it seems like the utility of being able to put markup such as bold text, or graphics, or links in a notification should be self-evident, It's not self-evident. If it were, surely native apps would be bypassing NotifyOSD and Growl to get more bolded and linkified notifications. On Feb 11, 2010, at 16:07, Jeremy Orlow wrote: As has been brought up repeatedly, growl and the other notification engines are used by a SMALL FRACTION of all web users. I suspect a fraction of a percent. Why are we bending over backwards to make this system work on those platforms? More seriously though: Virtually every user of an up-to-date Ubuntu installation has the notification engine installed. As for Growl, the kind of users who install Growl are presumably the kind of users who care the most about notifications of multiple concurrent things. Furthermore, it seems that notifications are becoming more a part of operating system platforms. For example, it looks like Windows 7 has a system API for displaying notifications: http://msdn.microsoft.com/en-us/library/ee330740%28VS.85%29.aspx Are there other examples where we've dumbed down an API to the least common denominator for a small fraction of users? Especially when there's no technical reason why these providers could not be made more advanced (for example, embed webkit to display fully functional notifications)? It's not a given that it's an advancement in user-experience terms not to force all ambient notifications into a consistent form. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: Notifications
On Feb 3, 2010, at 20:54, Drew Wilson wrote: Following up on breaking out createHTMLNotification() and createNotification() vs combining them into one large API - I believe the intent is that a given user agent may not support all types of notifications (for example, a mobile phone application may only support text + icon notifications, not HTML notifications). My main concern isn't mobile phones in the abstract but mapping to concrete system-wide notification mechanisms: Growl and NotifyOSD on Mac and Ubuntu respectively. So far, the only use case I've seen (on the WHATWG list) for HTML notifications that aren't close to the kind of notifications that Growl and NotifyOSD support has been a calendar alarm. I agree that a calendar alarm is a valid use case, but I think HTML vs. not HTML isn't the right taxonomy. Rather, it seems to me that there are ambient notifications (that dismiss themselves after a moment even if unacknowledged) and notifications that are all about interrupting the user until explicitly dismissed (calendar alarms). I think the API for ambient notifications should be designed so that browsers can map all ambient notifications to Growl and NotifyOSD. As for notifications that require explicit acknowledgement, I think it would be worthwhile to collect use cases beyond calendar alarms first rather than head right away to generic HTML notifications. If it turns out that notifications that require explicit acknowledgement are virtually always calendar alarms or alarm clock notifications, it might make sense to design an API explicitly for those. For example, it could be desirable to allow a privileged calendar Web app to schedule such alarms to fire on a mobile device without having to keep a browsing context or a worker active at the notification time. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XMLHttpRequest Comments from W3C Forms WG
On Dec 16, 2009, at 21:47, Klotz, Leigh wrote: I'd like to suggest that the main issue is the dependency of the XHR document on HTML5 concepts. HTML5 is the only specification that defines several core concepts of the Web platform architecture, such as event loops, event handler attributes, etc. A user agent that doesn't implement the core concepts isn't much use for browsing the Web. Since the point of the XHR spec is getting interop among Web browsers, it isn't a good allocation of resources to make XHR not depend on things that a user agent that is suitable for browsing the Web needs to support anyway. XHR interop doesn't matter much if XHR is transplanted into an environment where the other pieces fail to be interoperable with Web browsing software. That is, in such a case, it isn't much use if XHR itself works like XHR in browsers--the system as a whole still doesn't interoperate with Web browsers. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [selectors-api] querySelector with namespace
On Nov 26, 2009, at 13:18, Jonathan Watt wrote: During a discussion about xml:id I was about to make the throwaway comment that you could use querySelector to easily work around lack of support for xml:id, but on checking it turns out that's not the case. querySelector, it seems, cannot be used to select on a specific namespace, since you can only use namespace prefixes in selectors, and querySelector does not resolve prefixes. Isn't the easiest solution not to support xml:id on the Web? It's not supported in Gecko, WebKit or Trident. What's the upside of adding it? xml:id doesn't enable functionality that the id attribute on HTML, MathML or SVG elements doesn't enable, but xml:id comes with all sorts of complications. In addition to the complication above, there's the complication that in an xml:id-enabled world, an element doesn't have a single attribute that has IDness. Instead, it has to have two (the natural choice flowing from XML specs) or the IDness of attributes has to depend on the presence of other attributes (the choice taken by SVG 1.2 Tiny). -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: XMLSerializer should run HTML serialization algorithm when input doc is HTML
On Jul 2, 2009, at 12:11, Giovanni Campagna wrote: 2009/7/2 Cameron McCormack c...@mcc.id.au: Henri Sivonen: Gecko bug: https://bugzilla.mozilla.org/show_bug.cgi?id=500937 The proposed patch there and (based on black-box testing) WebKit solve the issue by running the HTML serialization algorithm when the owner document of the input node is an HTML document. This should probably be in a spec somewhere. We’d need a spec for XMLSerializer first, I guess. Then we need a discussion about the possibility of having a spec for XMLSerializer, given that we already have DOM 3 LS. It should be pretty clear by now that XHR/DOMParser/XMLSerializer won and DOM 3 LS lost. DOM 3 LS is now just a legacy burden, but XHR, DOMParser and XMLSerializer need specs in order to iron out interop issues. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
XMLSerializer should run HTML serialization algorithm when input doc is HTML
Gecko bug: https://bugzilla.mozilla.org/show_bug.cgi?id=500937 The proposed patch there and (based on black-box testing) WebKit solve the issue by running the HTML serialization algorithm when the owner document of the input node is an HTML document. This should probably be in a spec somewhere. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Please include a statement of purpose and user interaction expectations for feature
On Jun 2, 2009, at 16:02, Robin Berjon wrote: On Jun 2, 2009, at 14:57 , Henri Sivonen wrote: Please include a corresponding UA requirement to obtain authorization from the user for the features imported with feature. (It seems that the security aspect requires an authorization and doesn't make sense if the dangerous features are simply imported silently.) As far as I can tell, the spec doesn't currently explain what the UA is supposed to do with the 'feature list' once built. I don't think that that is a good idea. The purpose of feature is to provide a hook through which a widget may communicate with a security policy. What's in the security policy really isn't up to P+C to define (though it certainly should be defined somewhere else). Maybe it could ask the user, as you state, but maybe it could see that the widget was signed by a trusted party, or know that the device doesn't have any sensitive data for a given API, or maybe anything goes on the full moon. I see. The track record with Java APIs doesn't fill me with confidence that the Right Thing will be done, but I guess this is outside the scope of interop-oriented specs. (My current phone asks me every time Google Maps Mobile wants to use the network and doesn't allow me to grant the permission permanently, yet doesn't ask me when GMM wants to grab my geolocation and send it to Google.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Please include a statement of purpose and user interaction expectations for feature
On Jun 16, 2009, at 15:42, Marcos Caceres wrote: Based on Arve and Robin's additional feedback, I've added the following to the spec as part of The Feature Element section: How a user agent makes use of features depends on the user agent's security policy, hence activation and authorization requirements for features are beyond the scope of this specification. Is that satisfactory? I think it's better than what was in the spec before. However, if a reader doesn't already know what feature is for, I think the current text might not make it quite clear. I notice that now the definition of what a feature is includes a video codec in addition to APIs. Does BONDI expect video codecs to be sensitive to security policies? Do you envision undeclared video codecs being withheld from the HTML5 source element fallback and canPlayType()? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[widgets] Please include a statement of purpose and user interaction expectations for feature
Please state the purpose of feature. (That it's for authorizing features that don't participate in the Web-oriented browser security model.) Please include a corresponding UA requirement to obtain authorization from the user for the features imported with feature. (It seems that the security aspect requires an authorization and doesn't make sense if the dangerous features are simply imported silently.) As far as I can tell, the spec doesn't currently explain what the UA is supposed to do with the 'feature list' once built. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[widgets] Purpose and utility of feature unclear
Regarding http://dev.w3.org/2006/waf/widgets/#the-feature-element I don't understand the purpose and utility of the feature element. Using a feature element denotes that, at runtime, a widget may attempt to access the feature identified by the feature element's name attribute. Why is this useful to denote? What happens if a widget doesn't denote that it'll attempt to use a feature but does so anyway? Using a feature element denotes that, at runtime, a widget may attempt to access the feature identified by the feature element's name attribute. Why aren't all the implemented features simply available like in a Web browser engine? A user agent can expose a feature through, for example, an API, in which case a user agent that supports the [Widgets-APIs] specification can allow authors to check if a feature is loaded via the hasFeature() method. Wouldn't this have all the same problems that DOM hasFeature() has had previously and the problems that have been pointed out as reasons not to have feature detection at-rules in CSS? Namely, that implementations have the incentive to claim that they have a feature as soon as they have a partial buggy implementation. A boolean attribute that indicates whether or not this feature must be available to the widget at runtime. In other words, the required attribute denotes that a feature is absolutely needed by the widget to function correctly, and without the availability of this feature the widget serves no useful purpose or won't execute properly. What's a widget engine expected to do when an unrecognized feature is declared as required? <feature name="http://example.org/api.geolocation" required="false"/> Suppose a WG creates a feature for the Web, the feature is not part of the Widgets 1.0 Family of specs and the WG doesn't assign a feature string for the feature because the WG doesn't consider widgets. Next, suppose browser engines implement the feature making it unconditionally available to Web content. 
Now, if such a browser engine is also a widget engine, does it make the feature's availability on the widget side conditional on importing it with feature? If it does, what's the point of not making the feature available unconditionally? If it doesn't, what's the point of feature? If there are two such engines, how do they converge on the same feature name string if the specifiers of the feature itself just meant it to be available to Web content unconditionally and didn't bother to mint a widget feature string? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Public keys in widgets URI scheme?
On May 27, 2009, at 18:32, Adam Barth wrote: 3) A developer can write two widgets that occupy the same origin (again, by re-using the public key). These widgets will be able to interact more freely, for example by sharing the same localStorage, etc. I thought the point of the UUID was to isolate even different instances of the same widget. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Jar signing vs. XML signatures
On Apr 17, 2009, at 13:24, Robin Berjon wrote: Trying to separate the discussion from the change request: would you be satisfied if requirements to perform C14N were removed and reliance on XSD data types for definition purposes were replaced with something less scary (though in this case this is a bit of a FUD argument Henri, the referenced types aren't overwhelming)? My preferred change would be adopting jar signing. However, if that's not feasible, my next preferred option would indeed be removing the requirement to perform canonicalization (i.e. sign XML as binary with a detached traditional binary signature block). As for the data types, I'd be satisfied if the datatypes were defined in such a way that attribute value parsing algorithms and conversion methods that a browser engine has to contain anyway were reusable. This should include well-defined behavior in the case of non-conforming input. For example, for dates (which is a datatype that widgets add--not something that comes from XML signatures), it makes more sense to reuse an appropriate microsyntax definition from HTML5 than to delegate to XSD. Not only does XSD make leading and trailing whitespace conforming and fail to define behavior in the case of non-conforming dates; XSD even allows leap seconds! (Is it a FUD argument that XSD dates deviate from the value space that is typically used in Posix date conversions between multi-unit tuples and epoch seconds?) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
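To make the contrast concrete, here is a minimal Python sketch (the `parse_date` helper is hypothetical, not taken from any spec) of an HTML5-style date microsyntax: behavior for non-conforming input is well defined (the function returns None), surrounding whitespace is rejected rather than silently stripped, and leap seconds cannot arise because the syntax has no seconds field at all.

```python
import re

# Strict YYYY-MM-DD microsyntax in the spirit of HTML5 date parsing.
# Hypothetical illustration only; not the text of any actual spec.
_DATE_RE = re.compile(r"([0-9]{4,})-([0-9]{2})-([0-9]{2})")
_DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def parse_date(s):
    """Return (year, month, day) for a conforming date string, None otherwise."""
    m = _DATE_RE.fullmatch(s)  # fullmatch: no leading or trailing characters allowed
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    if not 1 <= month <= 12:
        return None
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    max_day = 29 if (month == 2 and leap) else _DAYS_IN_MONTH[month - 1]
    if not 1 <= day <= max_day:
        return None
    return (year, month, day)
```

Contrast with xs:date, where whitespace is collapsed before validation and handling of non-conforming values is left to the processor.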
Re: [widgets] Jar signing vs. XML signatures
On Apr 15, 2009, at 15:00, Marcos Caceres wrote: On Tue, Apr 14, 2009 at 4:19 PM, Henri Sivonen hsivo...@iki.fi wrote: On Apr 14, 2009, at 14:38, Marcos Caceres wrote: I think it would be more productive to help us address the issues that you mentioned, instead of asking us to dump everything and start again. So the issues were: 1) The complexity of canonicalization/reserialization of XML. I think this is an issue that needs to be taken up with the XML Security WG or whoever is working on the canonicalization spec. That's not the point. The point is that XML signatures try to solve a more complex problem than what needs solving for signing a zip file. It would be useless to tell people who do want to solve the complex problem that they should solve it without canonicalization. But widgets don't really need to solve the problem XML signatures solve. Widgets could get away with signing a manifest traditionally without the signing method knowing that what is being signed happens to be XML. This would result in having to reserve one file name for the manifest (either in a jar-ish text format or in a W3C-ish XML format) and a range of file names for detached signatures in traditional binary formats that off-the-shelf crypto libraries support. The cost of putting the signatures inside the manifest XML file is that you end up importing complexity like canonicalization. The above approach won't quite work without a bit of elaboration: you really want the distributor to sign the author signature and not just sign the same manifest. (What's the purpose of A distributor signature MUST have a ds:Reference for any author signature, if one is present within the widget package. Why does it matter for the widget engine that the distributor signed the author signature if both sign the same manifest?) 2) Spec dependency on XSD. We can probably address this and use prose as you suggested. So you recommend we follow HTML5 here, right? 
Yes, I think the HTML5 approach to defining syntax is better. Given that you understand the problem, can you maybe propose some text? I'm not sure I understand the problem that the spec is solving. :-) For example, I don't know where the code for actually parsing CreatedType is supposed to come from. However, my wild guess is that unless widget impls are supposed to bring in huge off-the-shelf XSD machinery, it would be better to use English to define a more constrained date format here the way HTML5 does than to defer to XSD. Of course, that doesn't help much if you import XSD deps otherwise from XML signatures. If you are bringing in XSD machinery anyway, defining things without XSD might even hurt. What's the expected software reuse scenario here? Instead of canonicalizing the manifest XML and using XML signature, you could treat the manifest XML as a binary file and sign it the traditional way leaving a detached binary signature in the format customary for the signing cipher in the zip file. This would address issues #1 and #2. That is our intention. Do you mean that's the existing plan (that's not what it looks like) or that that's a new intention? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[widgets] Jar signing vs. XML signatures
I noticed that widget packaging uses XML signatures (notorious for bugs in canonicalization/reserialization code) for signing zip files. However, signing zip files has been solved long ago for Java jar files. The mechanism or a variation of it is also used for Mozilla xpi files and ODF documents. Wouldn't it be simpler to use jar signing instead of inventing a new way of signing zip files with implementation dependencies on XML signatures and spec dependencies on XSD? (Why does the spec have dependencies on XSD?) Jar signing is pretty simple compared to XML canonicalization/reserialization. When you need to reserialize XML, you import all the troubles of serializing XML (see e.g. https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit ). The META-INF folder is ugly, but unsigned widgets could omit it, and it isn't much uglier than an XML signature file at the top level of the zip archive. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
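For readers unfamiliar with the jar approach, here is a rough Python sketch of its core idea: a plain-text, line-based manifest listing a digest per zip entry, which is then the only thing that needs signing. This is an illustration in the spirit of META-INF/MANIFEST.MF, not the exact output of jarsigner; the file names and contents are made up.

```python
import base64
import hashlib
import io
import zipfile

def build_manifest(zip_bytes):
    """Build a jar-style plain-text manifest with one digest entry per zip member."""
    lines = ["Manifest-Version: 1.0", ""]
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            digest = base64.b64encode(hashlib.sha256(zf.read(name)).digest()).decode()
            lines += [f"Name: {name}", f"SHA-256-Digest: {digest}", ""]
    return "\n".join(lines)

# Build a tiny widget-like zip in memory and produce its manifest.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("config.xml", "<widget/>")
    zf.writestr("index.html", "<!doctype html>")
manifest = build_manifest(buf.getvalue())
```

Because the manifest is plain text, signing it requires no serialization-aware processing at all: a signer hashes and signs the manifest bytes as-is.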
Re: [widgets] Jar signing vs. XML signatures
On Apr 14, 2009, at 11:57, Thomas Roessler wrote: On 14 Apr 2009, at 10:27, Henri Sivonen wrote: Wouldn't it be simpler to use jar signing instead of inventing a new way of signing zip files with implementation dependencies on XML signatures and spec dependencies on XSD? (Why does the spec have dependencies on XSD?) Which XSD dependency do you mean? The only XSD dependencies I could think of right now are ones that say things like the value of this attribute is of type anyURI or the value space of this element is a restriction on the base64Binary XSD type. XML Signature does not require schema validation, or anything like that. Hence, spec dependencies. I don't find the string anyURI in the spec, but anyURI is a great example of why defining syntax in terms of XSD datatypes is a bad idea: http://hsivonen.iki.fi/thesis/html5-conformance-checker#iri XSD datatypes are too vague, allow whitespace where the spec writer didn't mean to allow whitespace or allow surprising values (like 0 and 1 when the spec writer thought (s)he'd be allowing true and false). It is much safer to define datatypes in precise English prose like HTML5 does than to expect XSD to match what is really meant. When you need to reserialize XML, you import all the troubles of serializing XML (see e.g. https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit ). The only place where you actually need canonicalization is when hashing the SignedInfo element inside the signature file (i.e., once per signature verification). Given that the signature format is profiled down pretty heavily in the widget signing spec, I'd dare a guess that most of the complexity isn't ever used, so a careful implementation might be able to write a c14n implementation that bails out on anything that doesn't look like a signature that follows the constraints in this format. 
If you need to do canonicalization even in one place, you need a properly debugged implementation of it. If the signature format is profiled heavily, doesn't it mean you can't even use an off-the-shelf implementation of XML signatures? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
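The reason canonicalization exists at all is that the same XML infoset can be serialized to different byte sequences. A sketch of the problem, using nothing beyond the Python standard library:

```python
import hashlib

# Two serializations of the same XML infoset: attribute order and quote
# style differ, so the bytes (and therefore any byte-level hash) differ
# even though an XML parser reports identical content for both.
doc_a = b'<e a="1" b="2"/>'
doc_b = b"<e b='2' a='1'></e>"

hash_a = hashlib.sha256(doc_a).hexdigest()
hash_b = hashlib.sha256(doc_b).hexdigest()
```

XML Signature bridges this gap with c14n; jar signing sidesteps it by never reserializing, signing the stored bytes exactly as they are.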
Re: [widgets] Jar signing vs. XML signatures
On Apr 14, 2009, at 13:01, Thomas Roessler wrote: On 14 Apr 2009, at 11:42, Henri Sivonen wrote: XSD datatypes are too vague, allow whitespace where the spec writer didn't mean to allow whitespace or allow surprising values (like 0 and 1 when the spec writer thought (s)he'd be allowing true and false). It is much safer to define datatypes in precise English prose like HTML5 does than to expect XSD to match what is really meant. There's an interesting discussion to be had here; however, I doubt it's in scope for this WG. (In other words, this strikes me as a rathole.) I don't see why widgets need to depend on XML signatures at all. When you need to reserialize XML, you import all the troubles of serializing XML (see e.g. https://issues.apache.org/bugzilla/buglist.cgi?query_format=advanced&product=Security&component=Canonicalization&cmdtype=doit ). The only place where you actually need canonicalization is when hashing the SignedInfo element inside the signature file (i.e., once per signature verification). Given that the signature format is profiled down pretty heavily in the widget signing spec, I'd dare a guess that most of the complexity isn't ever used, so a careful implementation might be able to write a c14n implementation that bails out on anything that doesn't look like a signature that follows the constraints in this format. If you need to do canonicalization even in one place, you need a properly debugged implementation of it. If the signature format is profiled heavily, doesn't it mean you can't even use an off-the-shelf implementation of XML signatures? Much of the complexity of canonicalization (and signature in general) comes from the need to deal with pretty arbitrary nodesets generated by transform chains. The widget signature profile does not use (i.e., it's a MUST NOT) any transform chains. Since the use of transforms is a choice of the signature application, you shouldn't have any trouble using existing toolkits. 
This all seems like needless complexity to me. To sign a zip archive, one needs a manifest file that contains digests for all the other zip entries and a signature for the manifest file. Even if widgets use an XML manifest instead of a jar-style plaintext manifest (which would be supported by existing jar signing tools; analogously to the zip format itself having been chosen due to pre-existing tool support), why would one want to sign the manifest XML with the XML signature machinery instead of signing it as a sequence of bytes using a well-established detached signature format for signing a file of bytes? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
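The "sign the manifest as a sequence of bytes" idea can be sketched as follows, using HMAC from the Python standard library as a stand-in for a real asymmetric signature algorithm (which the stdlib lacks); the key and manifest content are hypothetical.

```python
import hashlib
import hmac

def sign_manifest(manifest_bytes, key):
    # HMAC stands in for a real detached signature (e.g. RSA over SHA-256).
    # The point: the input is the manifest's raw bytes, with no
    # canonicalization step anywhere in the pipeline.
    return hmac.new(key, manifest_bytes, hashlib.sha256).digest()

def verify_manifest(manifest_bytes, key, sig):
    return hmac.compare_digest(sign_manifest(manifest_bytes, key), sig)

manifest = b'<?xml version="1.0"?><manifest>...</manifest>'  # treated as opaque bytes
key = b"illustrative-shared-secret"
sig = sign_manifest(manifest, key)
```

Note that any byte-level change to the manifest, even XML-insignificant whitespace, invalidates the signature, which is exactly the simpler contract being argued for.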
Re: [widgets] Jar signing vs. XML signatures
On Apr 14, 2009, at 14:38, Marcos Caceres wrote: I think it would be more productive to help us address the issues that you mentioned, instead of asking us to dump everything and start again. So the issues were: 1) The complexity of canonicalization/reserialization of XML. 2) Spec dependency on XSD. 3) Inability to use existing jar signing tools. If you are already profiling XML signature a lot and are already using a detached signature file, it seems to me that you are one step away from optimizing away canonicalization: Instead of canonicalizing the manifest XML and using XML signature, you could treat the manifest XML as a binary file and sign it the traditional way leaving a detached binary signature in the format customary for the signing cipher in the zip file. This would address issues #1 and #2. But then if you are signing the XML manifest file the traditional way, you are a step away from using jar-compatible manifests. :-) This would address issue #3. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Content-type sniffing and file extension to MIME mapping
On Mar 6, 2009, at 15:29, Marcos Caceres wrote: 2. The XHTML mapping should also appear in the file identification table [2]. What version of XHTML should I be pointing to? 1.0 or 1.1? Does it need to say anything more than that .xhtml maps to application/xhtml+xml? The media type is defined by RFC 3236. As implemented, the media type isn't restricted to a particular point version of XHTML, and browsers don't implement 1.1. (In fact, the media types in the table aren't defined by the specs in the Defined by column in general, but are defined by RFCs.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [selectors-api] SVG WG Review of Selectors API
On Jan 27, 2009, at 00:18, Alex Russell wrote: We just need to invent a pseudo-property for elements which can be matched by a :not([someProperty=your_ns_here]). To select SVG elements while avoiding HTML elements of the same name, a selector that prohibits the local name foreignObject between an ancestor svg element and the selector subject would be good enough. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [widgets] Trimming attribute values, a bad idea?
On Dec 3, 2008, at 04:51, Marcos Caceres wrote: So, for instance, <access network=" false"> is ok. Does anyone see any problem with this? Should I revert to being strict and having the UA do comparisons without trimming? Experience with HTML, SVG and MathML indicates that when trimming is specified, implementors don't always do it. My conclusion is that it's better to specify that keyword attribute values are compared without trimming. (Unless, of course, the attribute in question takes a whitespace-separated list of tokens.) -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
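The two comparison policies under discussion can be sketched like this (the helper names are hypothetical, not from any spec):

```python
def matches_untrimmed(value, keyword):
    # Strict policy: the attribute value must match exactly,
    # so a value of " false" does NOT count as the keyword "false".
    return value == keyword

def matches_trimmed(value, keyword):
    # Lenient policy: surrounding whitespace is stripped first,
    # so " false" DOES count as "false".
    return value.strip() == keyword
```

The interop argument is that implementors reliably ship the first behavior but forget the trimming in the second, so specifying strict comparison is safer.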
Re: Support for compression in XHR?
On Sep 11, 2008, at 22:59, Jonas Sicking wrote: Wouldn't a better solution then be that when the web page sets the flag on the XHR object the browser will always compress the data? And leave it up to the web page to ensure that it doesn't enable capabilities that the web server doesn't support. After all, it's the web page's responsibility to know many other aspects of server capabilities, such as if GET/POST/DELETE is supported for a given URI. This is the approach I've taken with Validator.nu. Validator.nu support gzipped request bodies. If someone reads the docs for the Web service API so that they can program client code for the service, they should notice what the documentation says about compression. There is, though, the problem that now compression support is part of the published API as opposed to being an orthogonal transport feature, so removing incoming compression support from Validator.nu (e.g. if bandwidth were abundant and CPU were the bottle neck) would break existing clients. This is not a problem with same-site XHR, though, when the same entity controls the server and the JS program performing the requests and can update the JS program immediately. (Validator.nu also advertises Accept-Encoding: gzip via OPTIONS, but I'm not aware of any client automatically picking it up from there.) -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/