Re: [public-webapps] Comment on Widget URI (2)
Hi Larry,

On Tue, Dec 15, 2009 at 3:52 PM, Robin Berjon ro...@berjon.com wrote:

Hi Larry,

On Dec 9, 2009, at 17:55, Larry Masinter wrote:

http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5 gives several different examples of normalization and comparison of strings for the purpose of identification.

Yes. That's why we indicate that "A producer MUST generate URIs that are normalised according to chapter 5.3.2. Syntax-Based Normalization of [RFC3987]."

RFC 3987 further states that IRIs already in Unicode MUST NOT be normalized before parsing or interpreting. It goes on to add further details in the rest of 5.3.2.2.

I can't figure out from the document of the Widget: URI scheme which, if any, of the comparison algorithms are recommended. In fact, the assertion that using UTF-8 is recommended seems like it would result in ambiguous interpretation of URIs if some implementations use UTF-8 and others don't.

I'm sorry, I can't find a single location in either the published draft or the editor's draft that states that UTF-8 is recommended for widget URIs.

The Widget P+C specification states that it is recommended to use UTF-8 for the file name field of the local file header of a file entry. One may indeed be able to use something else, and user agents may indeed be able to do something with that, but really all bets are off.

Zip 6.3 only supports UTF-8 and CP437. When UTF-8 is used, it must be implicitly marked as such. Hence, you always know which encoding you are getting:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT

So, if I have a file named Voß.html and a relative IRI that points to voss.html, do they match or not? You say case sensitive, do you mean byte for byte? Do half-width romaji characters match the full-width romaji characters?

Does anyone ever really mean byte for byte in string comparisons? Since these IRIs are not normalised, would you prefer "codepoint for codepoint" appended to "case sensitive"? Or am I missing something in your comment?

Perhaps it's necessary to dig further into the widget spec to ensure this is not an ambiguity, but the question was whether the widget specification was well-defined, and my comment was that it didn't seem to be.

P+C is a separate specification over which WURIs are but a layer, but likewise, is indicating codepoint matching what you are requesting there? Sorry for being thick, but it is hard to be certain what the desired outcome of your comment is.

Agreed. It might help to check the following algorithm in P+C:
http://dev.w3.org/2006/waf/widgets/#rule-for-finding-a-file-within-a-widget-0

And the definition of a zip relative path:
http://dev.w3.org/2006/waf/widgets/#zip-relative-path

--
Marcos Caceres
http://datadriven.com.au
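As an illustration of the point about Zip file name encodings, here is a minimal sketch, not taken from the widget specifications, of how one might inspect which encoding each entry of an archive declares. It assumes that the marking mentioned above corresponds to general purpose bit 11 (0x800) of the Zip entry header, and that any entry without that bit is treated as CP437. The function name filename_encodings is hypothetical; zipfile.ZipFile, infolist() and ZipInfo.flag_bits come from the Python standard library.

```python
import io
import zipfile

# Assumption: general purpose bit 11 marks a UTF-8 encoded file name.
UTF8_FLAG = 0x800


def filename_encodings(zip_bytes):
    """Map each entry name to the file name encoding its header declares.

    Hypothetical helper: entries with bit 11 set are reported as UTF-8,
    all others as CP437.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
            for info in archive.infolist()
        }
```

Knowing the declared encoding only tells a user agent how to decode the stored bytes into characters. It does not by itself settle whether two decoded names should be considered a match.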
RE: [public-webapps] Comment on Widget URI (2)
http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5 gives several different examples of normalization and comparison of strings for the purpose of identification.

There are significant differences in alternatives for how to do comparison of Unicode file names. I can't figure out from the document of the Widget: URI scheme which, if any, of the comparison algorithms are recommended. In fact, the assertion that using UTF-8 is recommended seems like it would result in ambiguous interpretation of URIs if some implementations use UTF-8 and others don't.

So, if I have a file named Voß.html and a relative IRI that points to voss.html, do they match or not? You say case sensitive, do you mean byte for byte? Do half-width romaji characters match the full-width romaji characters? Note that different operating systems normalize Unicode file names differently.

Perhaps it's necessary to dig further into the widget spec to ensure this is not an ambiguity, but the question was whether the widget specification was well-defined, and my comment was that it didn't seem to be.

Larry
--
http://larry.masinter.net

-----Original Message-----
From: Robin Berjon [mailto:ro...@berjon.com]
Sent: Thursday, November 19, 2009 6:00 AM
To: Larry Masinter
Cc: public-webapps@w3.org
Subject: Re: [public-webapps] Comment on Widget URI (2)

Dear Larry,

thank you for your comments.

On Oct 10, 2009, at 19:44, Larry Masinter wrote:

2) ** WELL-DEFINED MAPPING TO FILES **

Section 4.4 Step 2 makes normative reference:
http://www.w3.org/TR/widgets/#rule-for-finding-a-file-within-a-widget-

The algorithm there seems to be lacking a clear definition of "matches" which deals reasonably with the issues surrounding matching and equivalence for Unicode strings, or the handling of character sets in IRIs which are not represented in UTF-8.

Suggestion (Editorial): Move the definition of the mapping algorithm into the URI scheme registration document so that its definition can be reviewed for completeness.

Suggestion (Technical): Define exactly and precisely what "match" means and make it clear what the appropriate response or error conditions are if there is more than one file that matches.

This comment concerns P+C, and I'm unsure about what change you are requesting where. Could you please provide an example of an issue in the current setup and explain how you would like to see it addressed?

--
Robin Berjon - http://berjon.com/
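To make the question of what "match" could mean more concrete, the following sketch compares file names under three candidate rules. It is only an illustration under stated assumptions, not the algorithm of P+C or of the widget URI draft: the function names are hypothetical, and the third rule (NFKC followed by case folding) is just one of several possible lenient comparisons. unicodedata.normalize and str.casefold are from the Python standard library.

```python
import unicodedata


def match_codepoints(a, b):
    # Strict rule: same sequence of Unicode code points.
    # For names stored as valid UTF-8 this is equivalent to byte-for-byte.
    return a == b


def match_nfc(a, b):
    # Canonical rule: compare after NFC normalization, so that a
    # precomposed character equals its base + combining mark form.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def match_nfkc_casefold(a, b):
    # Lenient rule (an assumption for illustration): compatibility
    # normalization plus case folding, which also equates full-width
    # and half-width letters and folds "ß" to "ss".
    def fold(s):
        return unicodedata.normalize("NFKC", s).casefold()

    return fold(a) == fold(b)
```

Under these three rules, Voß.html and voss.html match only with match_nfkc_casefold, and a full-width letter matches its ordinary counterpart only under that same rule. This is why the choice of rule needs to be stated explicitly rather than left to each implementation.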
Re: [public-webapps] Comment on Widget URI (2)
Dear Larry,

thank you for your comments.

On Oct 10, 2009, at 19:44, Larry Masinter wrote:

2) ** WELL-DEFINED MAPPING TO FILES **

Section 4.4 Step 2 makes normative reference:
http://www.w3.org/TR/widgets/#rule-for-finding-a-file-within-a-widget-

The algorithm there seems to be lacking a clear definition of "matches" which deals reasonably with the issues surrounding matching and equivalence for Unicode strings, or the handling of character sets in IRIs which are not represented in UTF-8.

Suggestion (Editorial): Move the definition of the mapping algorithm into the URI scheme registration document so that its definition can be reviewed for completeness.

Suggestion (Technical): Define exactly and precisely what "match" means and make it clear what the appropriate response or error conditions are if there is more than one file that matches.

This comment concerns P+C, and I'm unsure about what change you are requesting where. Could you please provide an example of an issue in the current setup and explain how you would like to see it addressed?

--
Robin Berjon - http://berjon.com/