Re: [public-webapps] Comment on Widget URI (2)

2009-12-15 Thread Marcos Caceres
Hi Larry,

On Tue, Dec 15, 2009 at 3:52 PM, Robin Berjon ro...@berjon.com wrote:
 Hi Larry,

 On Dec 9, 2009, at 17:55 , Larry Masinter wrote:
 http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5

 gives several different examples of normalization and
 comparison of strings for the purpose of identification.

 Yes. That's why we indicate that "A producer MUST generate URIs that are 
 normalised according to chapter 5.3.2. Syntax-Based Normalization of 
 [RFC3987]".

 RFC 3987 further states that IRIs already in Unicode MUST NOT be normalized 
 before parsing or interpreting. It goes on to add further details in the 
 rest of 5.3.2.2.

 I can't figure out from the document of the
 Widget: URI scheme which, if any, of the comparison
 algorithms are recommended. In fact, the assertion
 that using UTF-8 is recommended seems like it would
 result in ambiguous interpretation of URIs if some
 implementations use UTF-8 and others don't.

 I'm sorry, I can't find a single location in either the published draft or 
 the editor's draft that states that for widget URIs UTF-8 is recommended.

 The Widget P+C specification states that it is recommended to use UTF-8 for 
 the file name field of the local file header of a file entry. One may indeed 
 be able to use something else, and user agents may indeed be able to do 
 something with that, but really all bets are off.


Zip 6.3 only supports UTF-8 and CP437. When UTF-8 is used, it must be
explicitly marked as such. Hence, you always know which encoding you
are getting:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
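
A minimal Python sketch of reading that marker, assuming the standard
zipfile module and that general purpose bit 11 (0x800) is the APPNOTE's
language encoding flag. The helper name entry_encodings is hypothetical:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def entry_encodings(zip_bytes: bytes) -> dict:
    """Map each entry name to the encoding its flag bits declare."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
            for info in zf.infolist()
        }


# Build a small archive in memory with one ASCII and one non-ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("index.html", "<p>hi</p>")
    zf.writestr("Voß.html", "<p>hallo</p>")

encodings = entry_encodings(buf.getvalue())
```

Python's writer only sets the flag when a name cannot be written as ASCII,
so an unflagged name falls back to the CP437 interpretation.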

 So, if I have a file named Voß.html and a relative
 IRI that points to voss.html, do they match or not?
 You say case sensitive, do you mean byte for byte?
 Do half-width romaji characters match the full-width
 romaji characters?

 Does anyone ever really mean byte for byte in string comparisons? Since these 
 IRIs are not normalised, would you prefer that "codepoint for codepoint" be 
 appended to "case sensitive"? Or am I missing something in your comment?
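
 To make the distinction concrete, here is a small Python sketch (an
 illustration only, not text from either specification). It assumes the
 same name has been stored once as UTF-8 and once as CP437:

```python
name = "Voß.html"

# The same code points serialised under the two encodings Zip can declare.
utf8_bytes = name.encode("utf-8")
cp437_bytes = name.encode("cp437")

# Byte-for-byte comparison depends on the encoding used for storage.
same_bytes = utf8_bytes == cp437_bytes

# Codepoint-for-codepoint comparison is done after decoding each side.
same_codepoints = utf8_bytes.decode("utf-8") == cp437_bytes.decode("cp437")

# Codepoint comparison is still strict: no case folding, no normalisation.
matches_voss = name == "voss.html"
```

 Byte comparison fails across encodings, codepoint comparison does not, and
 neither treats Voß.html and voss.html as the same name.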

 Perhaps it's necessary to dig further into the
 widget spec to ensure this is not an ambiguity, but
 the question was whether the widget specification
 was well-defined, and my comment was that it
 didn't seem to be.

 P+C is a separate specification over which WURIs are but a layer. Likewise, 
 is indicating codepoint matching what you are requesting there? 
 Sorry for being thick, but it is hard to be certain what the desired outcome 
 of your comment is.


Agreed. It might help to check the following algorithm in P+C:
http://dev.w3.org/2006/waf/widgets/#rule-for-finding-a-file-within-a-widget-0

And the definition of a zip relative path:
http://dev.w3.org/2006/waf/widgets/#zip-relative-path



-- 
Marcos Caceres
http://datadriven.com.au



RE: [public-webapps] Comment on Widget URI (2)

2009-12-09 Thread Larry Masinter
http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5

gives several different examples of normalization and
comparison of strings for the purpose of identification.

There are significant differences in alternatives for
how to do comparison of Unicode file names.

I can't figure out from the document of the
Widget: URI scheme which, if any, of the comparison
algorithms are recommended. In fact, the assertion
that using UTF-8 is recommended seems like it would
result in ambiguous interpretation of URIs if some
implementations use UTF-8 and others don't.

So, if I have a file named Voß.html and a relative
IRI that points to voss.html, do they match or not?
You say case sensitive, do you mean byte for byte?
Do half-width romaji characters match the full-width
romaji characters?
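
The alternatives can be sketched in Python, assuming three candidate
definitions of "match". The function names are hypothetical, and none of
them is claimed to be what the widget specifications require:

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """Strict comparison, code point by code point."""
    return a == b


def caseless_equal(a: str, b: str) -> bool:
    """Comparison after Unicode full case folding."""
    return a.casefold() == b.casefold()


def compat_caseless_equal(a: str, b: str) -> bool:
    """Comparison after compatibility normalisation (NFKC) and case folding."""
    def key(s: str) -> str:
        return unicodedata.normalize(
            "NFKC", unicodedata.normalize("NFKC", s).casefold()
        )
    return key(a) == key(b)


# "index" written with full-width letters (U+FF49, U+FF4E, ...).
FULLWIDTH_INDEX = "\uff49\uff4e\uff44\uff45\uff58.html"
```

Under these definitions, Voß.html and voss.html match only once case
folding is applied, and the full-width name matches index.html only once
compatibility normalisation is applied as well.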

Note that different operating systems normalize
Unicode file names differently.

Perhaps it's necessary to dig further into the
widget spec to ensure this is not an ambiguity, but
the question was whether the widget specification
was well-defined, and my comment was that it
didn't seem to be.

Larry
--
http://larry.masinter.net


-----Original Message-----
From: Robin Berjon [mailto:ro...@berjon.com] 
Sent: Thursday, November 19, 2009 6:00 AM
To: Larry Masinter
Cc: public-webapps@w3.org
Subject: Re: [public-webapps] Comment on Widget URI (2)

Dear Larry,

thank you for your comments.

On Oct 10, 2009, at 19:44 , Larry Masinter wrote:
 2) ** WELL-DEFINED MAPPING TO FILES **
 
 Section 4.4 Step 2 makes normative reference:
 
 http://www.w3.org/TR/widgets/#rule-for-finding-a-file-within-a-widget- 
 
 The algorithm there seems to be lacking a clear definition of matches
 which deals reasonably with the issues surrounding matching and equivalence
 for Unicode strings, or the handling of character sets in IRIs which are
 not represented in UTF-8.
 
 Suggestion (Editorial): Move the definition of the mapping algorithm
 into the URI scheme registration document so that its definition can 
 be reviewed for completeness.
 Suggestion (Technical): Define exactly and precisely what match means
 and make it clear what the appropriate response or error conditions are
 if there is more than one file that matches.

This comment concerns P+C, and I'm unsure about what change you are requesting 
where. Could you please provide an example of an issue in the current setup and 
explain how you would like to see it addressed?

-- 
Robin Berjon - http://berjon.com/






Re: [public-webapps] Comment on Widget URI (2)

2009-11-19 Thread Robin Berjon
Dear Larry,

thank you for your comments.

On Oct 10, 2009, at 19:44 , Larry Masinter wrote:
 2) ** WELL-DEFINED MAPPING TO FILES **
 
 Section 4.4 Step 2 makes normative reference:
 
 http://www.w3.org/TR/widgets/#rule-for-finding-a-file-within-a-widget-
 
 The algorithm there seems to be lacking a clear definition of matches
 which deals reasonably with the issues surrounding matching and equivalence
 for Unicode strings, or the handling of character sets in IRIs which are
 not represented in UTF-8.
 
 Suggestion (Editorial): Move the definition of the mapping algorithm
 into the URI scheme registration document so that its definition can 
 be reviewed for completeness.
 Suggestion (Technical): Define exactly and precisely what match means
 and make it clear what the appropriate response or error conditions are
 if there is more than one file that matches.

This comment concerns P+C, and I'm unsure about what change you are requesting 
where. Could you please provide an example of an issue in the current setup and 
explain how you would like to see it addressed?
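
One hypothetical shape such an example could take, sketched in Python. It
assumes a package holding two entries that differ only in Unicode
normalisation form. The find_entry helper and AmbiguousMatch error are
illustrative names, not the P+C algorithm:

```python
import unicodedata


class AmbiguousMatch(Exception):
    """Raised when more than one entry matches the requested path."""


def find_entry(path, entries):
    """Return the entry matching path after NFC normalisation, or None."""
    key = unicodedata.normalize("NFC", path)
    matches = [e for e in entries if unicodedata.normalize("NFC", e) == key]
    if len(matches) > 1:
        raise AmbiguousMatch(f"{len(matches)} entries match {path!r}")
    return matches[0] if matches else None


# Precomposed e-acute (U+00E9) versus "e" plus combining acute (U+0301).
entries = ["caf\u00e9.html", "cafe\u0301.html", "index.html"]

# Strict codepoint matching sees two different files here.
exact = [e for e in entries if e == "caf\u00e9.html"]
```

With strict codepoint matching the request finds exactly one file. With
normalised matching both entries match, and the sketch reports that as an
error rather than picking one silently.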

-- 
Robin Berjon - http://berjon.com/