Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Sun, Oct 28, 2012 at 6:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Same as the comment I quoted?  Or same as something else?

Same as you quoted.


 Well, the Gecko parser preserves the host at this stage assuming the URI was
 correctly formatted with a host.  Again:

   blah://foo/bar = blah://foo/bar

 The interesting things happen when you have 0, 1, or 3 slashes between ':'
 and foo.  The handling of foo after this point is a separate issue.

Those are handled the same as in Gecko (which I think also matches
Safari; Chrome strips all starting slashes, for instance if you have
four, but I did not copy that).


 In Gecko, it's part of URL parsing.  More precisely, it's part of the
 normalization performed as part of constructing a URL object from a
 string.  Since this is also how we parse URLs, it's effectively all part of
 the package.

 But note that it would be a bit odd if file://c:/ claimed to have a host of
 c with a default port or some such...

Maybe I should introduce a file host state that supports colons in
the host name (or special case the host state further, but the
former seems cleaner). Most browsers seem to fail currently on input
such as file://c:/ but this is on a Mac so maybe that's the
difference. I would prefer having the parsing be consistent though.


 7 and 8 are not, though at some point we'll need to define equality
 comparisons anyway.

Yeah, I guess at some point someone would need to write a "processing
file: URLs" specification (for post-parsing operations). On the other
hand, it's not entirely clear to me that this needs to be interoperable.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 5:00 AM, Anne van Kesteren wrote:

But note that it would be a bit odd if file://c:/ claimed to have a host of
c with a default port or some such...


Maybe I should introduce a file host state that supports colons in
the host name (or special case the host state further, but the
former seems cleaner).


I don't think that's particularly desirable.  The c: is totally part 
of the path; treating it otherwise would just be confusing.  Imo.



Most browsers seem to fail currently on input
such as file://c:/ but this is on a Mac


Yes, doing that on a Mac would just be wrong.


I would prefer having the parsing be consistent though.


You mean across Windows and non-Windows?  I'm not sure that's viable.

-Boris



Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Mon, Oct 29, 2012 at 3:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 10/29/12 5:00 AM, Anne van Kesteren wrote:
 Maybe I should introduce a file host state that supports colons in
 the host name (or special case the host state further, but the
 former seems cleaner).

 I don't think that's particularly desirable.  The c: is totally part of
 the path; treating it otherwise would just be confusing.  Imo.

But at that point in a URL you cannot have a path. A path starts with
a slash after the host. Especially if you want file://test/ to parse
with test being the host.
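
To make the general rule concrete, here is a minimal sketch that splits a
URL of the form scheme://host/path without any file:-specific handling. The
helper name splitHostAndPath is hypothetical and this is not the parser from
the URL spec; it only illustrates why file://c:/ ends up with "c:" in the
host position when file://test/ is expected to yield the host "test".

```javascript
// Hypothetical helper, not the URL Standard's parser: applies only the
// general "two slashes, then a host, then a path starting with /" rule.
function splitHostAndPath(input) {
  // scheme, "://", everything up to the next slash, then the rest (if any).
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^\/]*)(\/.*)?$/i.exec(input);
  if (match === null) {
    return null; // not of the form scheme://...
  }
  return {
    scheme: match[1].toLowerCase(),
    host: match[2],
    path: match[3] === undefined ? "" : match[3],
  };
}

console.log(splitHostAndPath("file://test/")); // host is "test", path is "/"
console.log(splitHostAndPath("file://c:/"));   // host is "c:", path is "/"
```

Under this rule the drive letter can only be read as part of the path if it
is special-cased somewhere.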


 Most browsers seem to fail currently on input
 such as file://c:/ but this is on a Mac

 Yes, doing that on a Mac would just be wrong

I suppose, I would hate it though for new URL(...) to depend on the platform.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 10:53 AM, Anne van Kesteren wrote:

But at that point in a URL you cannot have a path. A path starts with
a slash after the host.


The point is that on Windows, Gecko parses file://c:/something as 
file:///c:/something


As in, it's an exception to the general "if there are two slashes after
the file: then the next thing is a host" rule.
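
A rough sketch of that exception, for illustration only. This is not
Gecko's code; the function name applyDriveLetterException and the explicit
isWindows flag are assumptions made up for the example, and the only
behavior taken from the description above is the rewrite of
file://c:/something to file:///c:/something on Windows.

```javascript
// Hypothetical sketch of the Windows-only exception described above.
// isWindows is passed in explicitly so the platform dependence is visible.
function applyDriveLetterException(input, isWindows) {
  // "file://" followed by a drive letter and colon, then "/..." or the end.
  const driveAsHost = /^file:\/\/([a-z]:)(\/.*)?$/i;
  const match = driveAsHost.exec(input);
  if (!isWindows || match === null) {
    return input; // general rule applies: whatever follows "//" is the host
  }
  const rest = match[2] === undefined ? "" : match[2];
  // Move the drive letter into the path by giving the URL an empty host.
  return "file:///" + match[1] + rest;
}

console.log(applyDriveLetterException("file://c:/something", true));
console.log(applyDriveLetterException("file://c:/something", false));
console.log(applyDriveLetterException("file://test/", true));
```

The same input string yields different results depending on the flag, which
is exactly the platform dependence under discussion.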



I suppose, I would hate it though for new URL(...) to depend on the platform.


I'm not sure there are great solutions here.  :(

-Boris


[whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Johan Sundström
Hi everybody!

Serializing a complete HTML document DOM to a string is surprisingly
hard in javascript. As a fairly seasoned javascript hacker I figured
this might do it:

  document.doctype + document.documentElement.outerHTML

It doesn't. No browser has a useful window.DocumentType.prototype.toString
that returns either the original document's <!DOCTYPE ...> from before
parsing, or a semantically equivalent post-parsing one. Google Chrome
shows one in its devtools, but does not seem to expose a way for
programmers to get at it.

My proposal is we specify this more useful behaviour for
javascript-running browsers, so it does become as simple as above. A
rough sketch of how a polyfill might implement the latter
window.DocumentType.prototype.toString:

  https://gist.github.com/3977584

Even as a polyfill, the above is rather limited, though:  I believe
only Firefox implements internalSubset today, and probably only in
XML contexts. The most useful implementation would IMO be a native one
that reproduces the doctype as it was formatted in the source
document.

Thoughts?

-- 
 / Johan Sundström, http://ecmanaut.blogspot.com/


Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Boris Zbarsky

On 10/29/12 8:58 PM, Johan Sundström wrote:

Serializing a complete HTML document DOM to a string is surprisingly
hard in javascript.


I thought there were plans to put innerHTML on Document.  Did that go 
nowhere?



As a fairly seasoned javascript hacker I figured
this might do it:

   document.doctype + document.documentElement.outerHTML


This seems lossy in many cases (most obviously: when the HTML uses 
conditional comments, though there are also various XHTML-specific issues).



The most useful implementation would IMO be a native one
that reproduces the doctype as it was formatted in the source
document.


That might be worth doing independent of the serialization issue.

-Boris


Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Ojan Vafai
On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 10/29/12 8:58 PM, Johan Sundström wrote:

 Serializing a complete HTML document DOM to a string is surprisingly
 hard in javascript.


 I thought there were plans to put innerHTML on Document.  Did that go
 nowhere?


There were plans to put it on DocumentFragment. But IIRC no other browser
vendors voiced an interest and Hixie was opposed because he thought it
would encourage people to do more string-based DOM building. The WebKit
patch for this floundered as a result. I still think it's a good idea.


Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Ian Hickson
On Mon, 29 Oct 2012, Johan Sundström wrote:
 
 Serializing a complete HTML document DOM to a string is surprisingly 
 hard in javascript. As a fairly seasoned javascript hacker I figured 
 this might do it:
 
   document.doctype + document.documentElement.outerHTML

 It doesn't. No browser has a useful window.DocumentType.prototype that 
 returns either the original document's <!DOCTYPE ...> before parsing – 
 or a semantically equivalent post-parsing one.

If you know the document is always going to be in the no-quirks mode, then 
you can just stick <!DOCTYPE HTML> at the start. If you need to be able 
to tell what the mode is but are ok with ignoring the limited quirks 
mode, then you can use document.compatMode to pick whether to use that 
string or none, as in:

   (document.compatMode == 'CSS1Compat' ? '<!DOCTYPE HTML>' : '') +
   document.documentElement.outerHTML

That will drop any comment nodes around the root element, in case that 
matters. If you want to get the actual DOCTYPE strings, you can make a 
simple serialisation function for doctype nodes that uses the three 
attributes on that object to string together the full thing (much as you 
do in the polyfill you mentioned).
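
For instance, something along these lines might do as a starting point. It
is only a sketch: the function name serializeDoctype is made up for the
example, it reads the name, publicId and systemId attributes of the doctype
node, and it assumes neither identifier contains a double quote character.

```javascript
// Sketch of a doctype serialiser built from name, publicId and systemId.
// Accepts any object with those three properties (e.g. document.doctype),
// and returns "" when there is no doctype node at all.
function serializeDoctype(doctype) {
  if (!doctype) {
    return "";
  }
  let out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) {
      out += ' "' + doctype.systemId + '"';
    }
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + ">";
}

// In a page this would be called as:
//   serializeDoctype(document.doctype) + document.documentElement.outerHTML
console.log(serializeDoctype({ name: "html", publicId: "", systemId: "" }));
```

The result is semantically equivalent rather than byte-identical: original
casing, whitespace and any internal subset are not recovered.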


 I believe only Firefox implements internalSubset today

Since the internal subset has no meaning in text/html, that's ok if your 
goal is just to be semantically equivalent.


 The most useful implementation would IMO be a native one that 
 reproduces the doctype as it was formatted in the source document.

What's your use case, exactly?


On Mon, 29 Oct 2012, Boris Zbarsky wrote:
 
 I thought there were plans to put innerHTML on Document.  Did that go 
 nowhere?

Lack of implementor interest killed it a while ago.


On Mon, 29 Oct 2012, Ojan Vafai wrote:
 On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 
  I thought there were plans to put innerHTML on Document.  Did that go 
  nowhere?
 
 There were plans to put it on DocumentFragment.

That was a different plan, but yes, there have also been proposals to do 
that. This was in the context of templates; a better solution to which has 
since been worked on in public-webapps.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'