On 10/14/2014 03:41 AM, Anne van Kesteren wrote:
On Tue, Oct 14, 2014 at 1:05 AM, Sam Ruby <ru...@intertwingly.net> wrote:
1) rows where the notes merely say "href" are cases where parse errors are
thrown and failure is returned. The expected results are an object that
returns the original href, but empty values for all other properties. I
don't see this behavior in the spec:
https://url.spec.whatwg.org/#url-parsing
That is what you get when e.g. using <a>. If you use new URL() the
object would fail to construct so you cannot observe the other
properties. I'm not sure why you think it doesn't follow from the
specification. If you return failure, there's no URL returned, so why
would the properties return something?
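
The difference between the two entry points can be sketched as follows. This is only an illustration, assuming a runtime with a global `URL` constructor (a browser or Node.js). `anchorLikeParse` is a hypothetical helper that mimics the `<a>` behavior described in this thread (original href kept, other properties empty); it is not part of any API, and current browsers may differ in detail.

```javascript
// Hypothetical helper: models how the thread describes <a> exposing a URL
// that failed to parse (href preserved, every other property empty).
function anchorLikeParse(input) {
  try {
    // new URL() throws on failure, so on that path there is no object
    // whose properties could be observed.
    const url = new URL(input);
    return {
      failed: false,
      href: url.href,
      protocol: url.protocol,
      hostname: url.hostname,
      pathname: url.pathname,
    };
  } catch (e) {
    return { failed: true, href: input, protocol: "", hostname: "", pathname: "" };
  }
}

const result = anchorLikeParse("not a url"); // no base URL, so parsing fails
console.log(result.failed, JSON.stringify(result.href));
```

With a helper like this, the "href" rows in the test data correspond to the `catch` branch, while `new URL()` on its own only ever surfaces the exception.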
Given that I've found problems in the spec, my implementation, and the
test data, I'm trying to guess at what is the desired behavior. As one
source for clues, I've looked at the now-unmaintained library:
https://github.com/annevk/url/blob/master/url.js#L62
And, as noted above, this is consistent with urltestdata.txt.
Given all of the above, would you suggest changing the spec or the
expected test results?
2) rows that contain "href hostname" appear to be ones where the expected
results have not been updated to include the host to IDNA mapping.
Looking at the first of those
http://intertwingly.net/stories/2014/10/13/urltest-results/eb3950fcc8
it seems something might be broken here on your end.
Can you explain what you think is broken? It isn't completely obvious,
but the input string in that case contains U+200B, U+2060, U+FEFF:
http://www.fileformat.info/info/unicode/char/200B/index.htm
http://www.fileformat.info/info/unicode/char/2060/index.htm
http://www.fileformat.info/info/unicode/char/feff/index.htm
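
Because these three code points render as nothing, it can help to list them explicitly when inspecting a test input. The following is a small sketch; `findInvisible` and the sample string are hypothetical and are not taken from urltestdata.txt.

```javascript
// The three invisible code points mentioned above.
const INVISIBLE = new Map([
  [0x200b, "ZERO WIDTH SPACE"],
  [0x2060, "WORD JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"],
]);

// Hypothetical helper: report where invisible code points occur in a string.
function findInvisible(input) {
  const hits = [];
  let index = 0; // UTF-16 offset of the current code point
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    if (INVISIBLE.has(cp)) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex, name: INVISIBLE.get(cp) });
    }
    index += ch.length;
  }
  return hits;
}

// Made-up sample input, for illustration only.
const sample = "http://a\u200Bb\u2060c\uFEFF.example/";
for (const hit of findInvisible(sample)) {
  console.log(hit.index, hit.codePoint, hit.name);
}
```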
I'll also note that the results I produce are consistent with
Presto/2.12.388.
3) rows that contain "href protocol hostname pathname" need further
investigation. I suspect that these are based on my using a library to
normalize the IDNA mapping, and it "helpfully" cleans up other problems like
removing U+0000 characters from the input.
E.g. for http://intertwingly.net/stories/2014/10/13/urltest-results/7a0e86d240
per http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt
U+FDD0 is disallowed meaning failure ought to be returned. What you
have as outcome for "whatwg" does not match urltestdata.txt (including
the version you are using).
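
U+FDD0 is a Unicode noncharacter, and a library that silently drops such code points will hide the failure. As a rough sketch, a pre-check like the one below could flag them before the host reaches the IDNA step. It covers only noncharacters; IdnaMappingTable.txt disallows many other ranges that are not modeled here. The function names and the sample host are assumptions made for illustration.

```javascript
// Noncharacters are U+FDD0..U+FDEF plus the last two code points of every
// plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Hypothetical pre-check: true if any code point in the host is a noncharacter.
function hostHasNoncharacter(host) {
  for (const ch of host) {
    if (isNoncharacter(ch.codePointAt(0))) return true;
  }
  return false;
}

// Illustrative host only.
console.log(hostHasNoncharacter("\uFDD0zyx.com"));
```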
Agreed. As I indicated, I need to look further into the library that I
am using.
P.S. I didn't update to the latest test data yet; but from what I can see
the changes wouldn't materially affect the results, so I am publishing now.
It affects what happens for http://%30%78%63%30%2e%30%32%35%30.01%2e,
http://192.168.0.257, and
http://\uff10\uff38\uff43\uff10\uff0e\uff10\uff12\uff15\uff10\uff0e\uff10\uff11.
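
What these three inputs have in common is that their hosts end up looking like IPv4 addresses once decoded. The sketch below is a simplified reading of that idea, not a transcription of the specification. It assumes plain percent-decoding plus NFKC and lowercasing as a stand-in for the real host processing (which uses the IDNA mapping), and `canonicalHostText`, `parseIPv4Number`, and `parseIPv4` are hypothetical names.

```javascript
// Stand-in for host processing: percent-decode, NFKC-normalize, lowercase.
function canonicalHostText(host) {
  return decodeURIComponent(host).normalize("NFKC").toLowerCase();
}

// Parse one dotted part: a "0x" prefix means hex, a leading "0" means octal.
function parseIPv4Number(part) {
  if (part === "") return NaN;
  let radix = 10;
  let digits = part;
  if (/^0x/i.test(part)) {
    radix = 16;
    digits = part.slice(2);
  } else if (part.length > 1 && part[0] === "0") {
    radix = 8;
    digits = part.slice(1);
  }
  if (digits === "") return 0; // a bare "0x" counts as zero
  const valid = { 8: /^[0-7]+$/, 10: /^[0-9]+$/, 16: /^[0-9a-f]+$/i }[radix];
  return valid.test(digits) ? parseInt(digits, radix) : NaN;
}

// Returns a dotted-decimal string, or null to signal failure.
function parseIPv4(host) {
  const parts = host.split(".");
  if (parts[parts.length - 1] === "") parts.pop(); // tolerate one trailing dot
  if (parts.length === 0 || parts.length > 4) return null;
  const numbers = parts.map(parseIPv4Number);
  if (numbers.some(Number.isNaN)) return null;
  const last = numbers.pop();
  if (numbers.some((n) => n > 255)) return null;
  // The last number fills all remaining bytes, so its limit depends on count.
  if (last >= 256 ** (4 - numbers.length)) return null;
  let address = last;
  numbers.forEach((n, i) => {
    address += n * 256 ** (3 - i);
  });
  return [24, 16, 8, 0]
    .map((shift) => Math.floor(address / 2 ** shift) % 256)
    .join(".");
}

const hosts = [
  "%30%78%63%30%2e%30%32%35%30.01%2e",
  "192.168.0.257",
  "\uff10\uff38\uff43\uff10\uff0e\uff10\uff12\uff15\uff10\uff0e\uff10\uff11",
];
for (const host of hosts) {
  console.log(parseIPv4(canonicalHostText(host)));
}
```

Under these assumptions the first and third hosts both decode to the text 0xc0.0250.01 (hex, octal, octal) and map to 192.168.0.1, while 192.168.0.257 fails because its last part exceeds 255.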
I do plan to update to the latest expected test results, but meanwhile I
am still trying to determine places where these results aren't correct
or current with the specification.
- Sam Ruby