Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Wed, Oct 31, 2012 at 7:33 PM, Ian Hickson i...@hixie.ch wrote:
> On Wed, 31 Oct 2012, Johan Sundström wrote:
>> On Wednesday, October 31, 2012 at 15:02 , Ian Hickson wrote:
>>> On Tue, 30 Oct 2012, Johan Sundström wrote:
>>>> That said, I would still much enjoy a future where javascript:alert(document.doctype) would tell you something rich about the page that we today need deep knowledge of document.compatMode and/or combinations of XMLSerializer and parsers, or deep study of DocumentType refdocs to tease out.
>>>
>>> Can you elaborate on that?
>>
>> Sure – rich as in not [object DocumentType], but
>
> Well the toString() isn't what matters, it's what you can get from the rest of the attributes on the object. Or are you just saying you wish .toString() would expose the concatenated string? That would just be a convenience method. Is it worth the compat risk?

Yes, this is where our opinions differ. :-)

To me, it is the (lack of) language integration that is the heart of the matter and the source of my itch – not that what I attempted to do proved impossible to cobble together with a (perfectly functional!) solution using other documented DOM APIs scattered about in other disjunct parts of the browser object model, or pasting together object properties and programmer provided constant strings to almost reproduce the value sought. My own hack unintentionally got it wrong in several ways, for example, and I deem that unnecessary brittleness.

From my own experience, the only guaranteed safe, reliable and cross browser method for figuring out an object's class name is Object.prototype.toString.call(object_of_interest), so I would sacrifice consistency with DocumentType.prototype.toString behaviours of the past in an instant for a more useful and intuitive one. If a novice programmer's expectations of what happens when she uses the object in a string context are met, I'd call that an improvement here.

>> …on apple.com: <!DOCTYPE html>
>> …on roxen.com: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
>
> I don't understand how that is different than document.compatMode, really, other than the latter not exposing limited quirks mode. But in theory at least, this information is already exposed.

It tells me what the doctype is, instead of the name of a bucket the browser sorts the doctype into for various semantic and standards compliance (and/or political) reasons. Both features are excellent, when they are the feature you seek, and they already bear decent names helping with their findability and learnability.

I am essentially weary of the long knowledge gap and edit distance between alert(document.doctype) and alert((new XMLSerializer).serializeToString(document.doctype)) – that we can straddle both in this group we already proved; I aspire to help the other 99%.

>> …on the Firefox default homepage: <!DOCTYPE html [ <!ENTITY % htmlDTD PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" [...]
>
> This is for XML, right? In HTML the bit in the square brackets would just be dropped. It's not clear that it's worth exposing just for XML... Anyway, this is the DOM Core spec, so I'll let Anne, Aryeh, and Ms2ger give you a proper answer. :-)

It probably is, and it's also where the change would be useful; were SVG and other DOMs exempt from returning a string serialization, it would be a substantially less useful change.

--
 / Johan Sundström, http://ecmanaut.blogspot.com/
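The class-name technique mentioned in the message above can be sketched as follows. This is only an illustration: `classOf` and `fakeDoctype` are hypothetical names that do not appear in the thread, and the stand-in object merely imitates what a friendlier DocumentType.prototype.toString might return.

```javascript
// Hypothetical helper (not from the thread): report an object's class name
// through Object.prototype.toString, bypassing any overridden toString().
function classOf(value) {
  // The call yields strings such as "[object Array]"; strip the wrapper.
  return Object.prototype.toString.call(value).slice(8, -1);
}

// Stand-in for a doctype node whose own toString() is "rich", in the
// spirit of the proposal. A plain object is used so this runs anywhere.
const fakeDoctype = {
  name: 'html',
  toString() {
    return '<!DOCTYPE ' + this.name + '>';
  },
};

// String context uses the overridden method...
console.log(String(fakeDoctype));
// ...while the class-name lookup is unaffected by the override.
console.log(classOf(fakeDoctype));
console.log(classOf([]), classOf(new Date()), classOf(null));
```

The point of the sketch is that overriding toString() on one prototype need not break code that relies on Object.prototype.toString.call() for type detection, which may soften part of the compatibility concern.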
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Tue, Oct 30, 2012 at 3:20 AM, Stewart Brodie stewart.bro...@antplc.com wrote:
>> Hi everybody! Serializing a complete HTML document DOM to a string is surprisingly hard in javascript.
>
> Does XMLSerializer().serializeToString(document) not meet your requirement?

Ah – good thinking. (new XMLSerializer).serializeToString(document) does indeed do a pretty excellent job of it, including the crazy hacks people do with conditional comments outside of the root node, which I hadn't figured I would be able to piece back together from an already parsed page.

While I hate to admit it, maybe on some level there is benefit to much of the DOM APIs being javascript hostile to force you towards the occasional really well-paved paths like the above, when you can find them.

My use case was taking as good a snapshot of an already live web page's structure from a non-privileged bookmarklet, for archival purposes (i.e. essentially what a curl of the page would do). For my purposes, it is a bonus that I actually get the current state of the page with whatever DOM mods have transpired since it loaded rather than what curl would produce, so I think XMLSerializer is a good friend.

That said, I would still much enjoy a future where javascript:alert(document.doctype) would tell you something rich about the page that we today need deep knowledge of document.compatMode and/or combinations of XMLSerializer and parsers, or deep study of DocumentType refdocs to tease out.

Is there a case against it in people using it where they ought to pick other solutions?

--
 / Johan Sundström, http://ecmanaut.blogspot.com/

--
Stewart Brodie
Team Leader - ANT Galio Browser
ANT Software Limited
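For readers who want to try the XMLSerializer route discussed above, here is a minimal sketch. The function name `snapshotDocument` and its optional `serializer` parameter are assumptions added for illustration (the parameter only exists so the function can be exercised outside a browser); the thread itself just calls (new XMLSerializer).serializeToString(document).

```javascript
// Sketch: serialize an entire document, doctype and comments outside the
// root element included, using the standard XMLSerializer interface.
// `serializer` is an illustrative parameter; in a page, omit it and the
// browser's XMLSerializer is used.
function snapshotDocument(doc, serializer = new XMLSerializer()) {
  return serializer.serializeToString(doc);
}

// In a browser console or bookmarklet one might write:
//   const html = snapshotDocument(document);
// which reflects the live DOM, including changes made after page load.
```

Note that the result is an XML-style serialization of the current DOM state, so it may differ from the bytes the server originally sent.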
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Tue, 30 Oct 2012, Johan Sundström wrote:
> That said, I would still much enjoy a future where javascript:alert(document.doctype) would tell you something rich about the page that we today need deep knowledge of document.compatMode and/or combinations of XMLSerializer and parsers, or deep study of DocumentType refdocs to tease out.

Can you elaborate on that?

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Wednesday, October 31, 2012 at 15:02 , Ian Hickson wrote:
> On Tue, 30 Oct 2012, Johan Sundström wrote:
>> That said, I would still much enjoy a future where javascript:alert(document.doctype) would tell you something rich about the page that we today need deep knowledge of document.compatMode and/or combinations of XMLSerializer and parsers, or deep study of DocumentType refdocs to tease out.
>
> Can you elaborate on that?

Sure – rich as in not [object DocumentType], but

…on apple.com:

  <!DOCTYPE html>

…on roxen.com:

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

…on the Firefox default homepage:

  <!DOCTYPE html [
    <!ENTITY % htmlDTD PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
    %htmlDTD;
    <!ENTITY % globalDTD SYSTEM "chrome://global/locale/global.dtd">
    %globalDTD;
    <!ENTITY % aboutHomeDTD SYSTEM "chrome://browser/locale/aboutHome.dtd">
    %aboutHomeDTD;
    <!ENTITY % syncBrandDTD SYSTEM "chrome://browser/locale/syncBrand.dtd">
    %syncBrandDTD;
    <!-- These strings are used in the about:home page -->
    <!ENTITY abouthome.pageTitle "&brandFullName; Start Page">
    <!ENTITY abouthome.searchEngineButton.label "Search">
    <!-- LOCALIZATION NOTE (abouthome.defaultSnippet1.v1): text in <a/> will be linked to the Firefox features page on mozilla.com -->
    <!ENTITY abouthome.defaultSnippet1.v1 "Thanks for choosing Firefox! To get the most out of your browser, learn more about the <a>latest features</a>.">
    <!-- LOCALIZATION NOTE (abouthome.defaultSnippet2.v1): text in <a/> will be linked to the featured add-ons on addons.mozilla.org -->
    <!ENTITY abouthome.defaultSnippet2.v1 "It's easy to customize your Firefox exactly the way you want it. <a>Choose from thousands of add-ons</a>.">
    <!ENTITY abouthome.bookmarksButton.label "Bookmarks">
    <!ENTITY abouthome.historyButton.label "History">
    <!ENTITY abouthome.settingsButton.label "Settings">
    <!ENTITY abouthome.addonsButton.label "Add-ons">
    <!ENTITY abouthome.appsButton.label "Marketplace">
    <!ENTITY abouthome.downloadsButton.label "Downloads">
    <!ENTITY abouthome.syncButton.label "&syncBrand.shortName.label;">
    <!ENTITY % browserDTD SYSTEM "chrome://browser/locale/browser.dtd">
    %browserDTD;
  ]>

--
 / Johan Sundström, http://ecmanaut.blogspot.com/
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Wed, 31 Oct 2012, Johan Sundström wrote:
> On Wednesday, October 31, 2012 at 15:02 , Ian Hickson wrote:
>> On Tue, 30 Oct 2012, Johan Sundström wrote:
>>> That said, I would still much enjoy a future where javascript:alert(document.doctype) would tell you something rich about the page that we today need deep knowledge of document.compatMode and/or combinations of XMLSerializer and parsers, or deep study of DocumentType refdocs to tease out.
>>
>> Can you elaborate on that?
>
> Sure – rich as in not [object DocumentType], but

Well the toString() isn't what matters, it's what you can get from the rest of the attributes on the object. Or are you just saying you wish .toString() would expose the concatenated string? That would just be a convenience method. Is it worth the compat risk?

> …on apple.com: <!DOCTYPE html>
> …on roxen.com: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

I don't understand how that is different than document.compatMode, really, other than the latter not exposing limited quirks mode. But in theory at least, this information is already exposed.

> …on the Firefox default homepage: <!DOCTYPE html [ <!ENTITY % htmlDTD PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" [...]

This is for XML, right? In HTML the bit in the square brackets would just be dropped. It's not clear that it's worth exposing just for XML...

Anyway, this is the DOM Core spec, so I'll let Anne, Aryeh, and Ms2ger give you a proper answer. :-)

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
Johan Sundström oyas...@gmail.com wrote:
> Hi everybody! Serializing a complete HTML document DOM to a string is surprisingly hard in javascript.

Does XMLSerializer().serializeToString(document) not meet your requirement?

--
Stewart Brodie
Team Leader - ANT Galio Browser
ANT Software Limited
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On 30 Oct 2012 at 10:20, Stewart Brodie stewart.bro...@antplc.com wrote:
> Johan Sundström oyas...@gmail.com wrote:
>> Serializing a complete HTML document DOM to a string is surprisingly hard in javascript.
>
> Does XMLSerializer().serializeToString(document) not meet your requirement?

I was wondering that too. I use it to get the content of an iframe into a string, so I can send that off to a database. Seems to work without problems (Safari Mac 6.0.1). But I too had to ask how to do that; it wasn't particularly obvious that that was what I should have been using (to me at any rate).

--
Cheers -- Tim
[whatwg] Proposal for window.DocumentType.prototype.toString
Hi everybody!

Serializing a complete HTML document DOM to a string is surprisingly hard in javascript. As a fairly seasoned javascript hacker I figured this might do it:

  document.doctype + document.documentElement.outerHTML

It doesn't. No browser has a useful window.DocumentType.prototype.toString that returns either the original document's <!DOCTYPE ...> before parsing – or a semantically equivalent post-parsing one. Google Chrome shows one in its devtools, but seems not to export some way of getting at it to programmers.

My proposal is that we specify this more useful behaviour for javascript-running browsers, so it does become as simple as above. A rough sketch of how a polyfill might implement the latter window.DocumentType.prototype.toString:

  https://gist.github.com/3977584

Even as a polyfill, the above is rather limited, though: I believe only Firefox implements internalSubset today, and probably only in XML contexts. The most useful implementation would IMO be a native one that reproduces the doctype as it was formatted in the source document.

Thoughts?

--
 / Johan Sundström, http://ecmanaut.blogspot.com/
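To make the idea above concrete, here is one way such a serializer could look. It is a sketch written for this thread, not the code from the linked gist; `doctypeToString` is a hypothetical name, and the function only relies on the name, publicId, systemId and (where available) internalSubset properties of a doctype node. It produces a semantically equivalent doctype, not the original source formatting.

```javascript
// Sketch of a doctype serializer (hypothetical; not the gist's code).
// Accepts any object with name, publicId, systemId and, optionally,
// internalSubset, e.g. document.doctype in a browser.
function doctypeToString(doctype) {
  if (!doctype) return ''; // documents without a doctype node
  let out = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) out += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  // internalSubset may be missing or empty, notably in text/html documents.
  if (doctype.internalSubset) out += ' [' + doctype.internalSubset + ']';
  return out + '>';
}

// Possible use in a page (assumption: run in a browser):
//   doctypeToString(document.doctype) + document.documentElement.outerHTML
```

A polyfill along the lines of the proposal would presumably assign a wrapper around this function to DocumentType.prototype.toString, with the compatibility trade-offs that implies.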
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On 10/29/12 8:58 PM, Johan Sundström wrote:
> Serializing a complete HTML document DOM to a string is surprisingly hard in javascript.

I thought there were plans to put innerHTML on Document. Did that go nowhere?

> As a fairly seasoned javascript hacker I figured this might do it: document.doctype + document.documentElement.outerHTML

This seems lossy in many cases (most obviously: when the HTML uses conditional comments, though there are also various XHTML-specific issues).

> The most useful implementation would IMO be a native one that reproduces the doctype as it was formatted in the source document.

That might be worth doing independent of the serialization issue.

-Boris
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> On 10/29/12 8:58 PM, Johan Sundström wrote:
>> Serializing a complete HTML document DOM to a string is surprisingly hard in javascript.
>
> I thought there were plans to put innerHTML on Document. Did that go nowhere?

There were plans to put it on DocumentFragment. But IIRC no other browser vendors voiced an interest and Hixie was opposed because he thought it would encourage people to do more string-based DOM building. The WebKit patch for this floundered as a result. I still think it's a good idea.
Re: [whatwg] Proposal for window.DocumentType.prototype.toString
On Mon, 29 Oct 2012, Johan Sundström wrote:
> Serializing a complete HTML document DOM to a string is surprisingly hard in javascript. As a fairly seasoned javascript hacker I figured this might do it: document.doctype + document.documentElement.outerHTML It doesn't. No browser has a useful window.DocumentType.prototype.toString that returns either the original document's <!DOCTYPE ...> before parsing – or a semantically equivalent post-parsing one.

If you know the document is always going to be in the no-quirks mode, then you can just stick <!DOCTYPE HTML> at the start. If you need to be able to tell what the mode is but are ok with ignoring the limited quirks mode, then you can use document.compatMode to pick whether to use that string or none, as in:

  (document.compatMode == 'CSS1Compat' ? '<!DOCTYPE HTML>' : '') + document.documentElement.outerHTML

That will drop any comment nodes around the root element, in case that matters.

If you want to get the actual DOCTYPE strings, you can make a simple serialisation function for doctype nodes that uses the three attributes on that object to string together the full thing (much as you do in the polyfill you mentioned).

> I believe only Firefox implements internalSubset today

Since the internal subset has no meaning in text/html, that's ok if your goal is just to be semantically equivalent.

> The most useful implementation would IMO be a native one that reproduces the doctype as it was formatted in the source document.

What's your use case, exactly?

On Mon, 29 Oct 2012, Boris Zbarsky wrote:
> I thought there were plans to put innerHTML on Document. Did that go nowhere?

Lack of implementor interest killed it a while ago.

On Mon, 29 Oct 2012, Ojan Vafai wrote:
> On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky bzbar...@mit.edu wrote:
>> I thought there were plans to put innerHTML on Document. Did that go nowhere?
>
> There were plans to put it on DocumentFragment.

That was a different plan, but yes, there have also been proposals to do that. This was in the context of templates; a better solution to which has since been worked on in public-webapps.

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
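The compatMode-based one-liner from the message above can be wrapped in a small function for reuse. `serializeWithMode` is a hypothetical name, and taking the document as a parameter is an assumption made so the sketch can be tried with a stand-in object; the caveats stated in the message still apply (limited quirks mode is not distinguished, and comments around the root element are dropped).

```javascript
// Sketch of the compatMode approach described above.
// `doc` is expected to look like a Document: it needs compatMode and
// documentElement.outerHTML. The function name is illustrative only.
function serializeWithMode(doc) {
  // 'CSS1Compat' covers no-quirks (and limited quirks) mode;
  // 'BackCompat' means quirks mode, where no doctype is emitted.
  const prefix = doc.compatMode === 'CSS1Compat' ? '<!DOCTYPE HTML>' : '';
  return prefix + doc.documentElement.outerHTML;
}

// In a page: serializeWithMode(document)
```

This preserves the rendering mode rather than the doctype text itself, which is the distinction the rest of the thread turns on.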