On Jul 24, 2010, at 9:55 AM, Adam Barth wrote:

> 2010/7/23 Ian Fette (イアンフェッティ) <[email protected]>:
>> http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization
>>  lists
>> some interesting cases we've come across on the anti-phishing team in
>> Google. To the extent you're concerned with / interested in
>> canonicalization, it may be worth taking a look at (not to suggest you
>> follow that in determining how to parse/canonicalize URLs, but rather to
>> make sure that you have some "correct" way of handling the listed URLs).
> 
> Thanks.  That's helpful.
> 
>> BTW, are you covering canonicalization?
> 
> Yes.  The three main things I'm hoping to cover are parsing,
> canonicalization, and resolving relative URLs.

Is there any place in the Web platform where "canonicalize" is exposed by 
itself in a Web-facing way? I think resolve against a base and parse into 
components are the only algorithms whose effects can be observed directly. I 
think we only need to spec "canonicalize" if it turns out to be a useful 
subroutine.
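
As a rough illustration of the two operations I mean, here is a minimal sketch 
using Python's urllib.parse. The library, the example URLs, and the variable 
names are assumptions chosen for illustration only; browsers expose these 
operations through their own interfaces and may well differ in edge cases.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL and relative reference, for illustration only.
base = "http://example.com/a/b/c.html"
relative = "../d.html?x=1#frag"

# Operation 1: resolve a relative reference against a base URL.
resolved = urljoin(base, relative)

# Operation 2: parse the resolved URL into its components.
parts = urlsplit(resolved)

print(resolved)
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```

Note that neither step is a standalone "canonicalize": any normalization 
(such as collapsing the ".." segment) shows up only as a side effect of 
resolving or parsing.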

There's also the related question of what browsers should do with input typed 
into the URL field. Other than establishing that these rules may be different 
between the URL field and URLs present in content, I'm not sure this is 
amenable to spec. But perhaps a survey of what browsers do would be useful.

Regards,
Maciej
