Re: Client side JavaScript i18n API
On 26.4.2010 21.49, ext Nebojša Ćirić c...@chromium.org wrote: We have a first draft at http://docs.google.com/Doc?id=dhttrq5v_0c8k5vkdh (it has view/edit permissions). It describes a small subset of the final API we intend to implement. We've picked date/time formatting and collation as must-haves for the first iteration. Have you guys looked at the I18N facilities of the Dojo JavaScript framework? [1] There seems to be some overlap with your API, and Dojo is already well established in the field. Maybe there are some opportunities to reuse and/or coordinate, to avoid duplicate APIs. The Dojo formatting modules cover dates, numbers, and currencies in a locale-specific manner, and are based on the CLDR (nice!). [2] Dojo has the concept of locale [3], but it looks like it doesn't support locale-specific collation out of the box, and I don't know if you can plug in your own comparison (i.e. an implementation of the UCA). Maybe someone who knows more can confirm or deny. Best regards, Jere [1] http://www.dojotoolkit.org/ [2] http://docs.dojocampus.org/quickstart/internationalization/number-and-currency-formatting [3] http://www.ibm.com/developerworks/web/library/wa-dojo/
[widget-uri] Widget URI ABNF definition comments
Robin, all, I had a look at the Widget URIs Working Draft of 08 Sept 2009, and a couple of questions popped into mind, since I'm a sucker for all things ABNF. :-)

/1/ Is it the intent that the 'opaque authority' corresponds exactly to the iauthority definition in the ABNF? If so, am I correct in assuming that it doesn't matter at this point that there is no mention of the iuserinfo and port components wrt. widget URIs, because the opaque authority intentionally has no semantics?

/2/ Also, I'm trying to figure out what the relationship is between zip-rel-path (as found in Widgets PC) and ihier-part (as found in RFC 3987) -- if we rely on RFC 3987 then I think these parts must agree. Based on the current ABNF definition of the widget URI, it seems that the only matching variant of ihier-part is '// iauthority ipath-abempty', since that is the only one with //, although I'm not sure why it needs to be so. If we disregard iauthority (for reasons detailed above), the question then becomes: is zip-rel-path compliant with ipath-abempty? Both of those definitions are quite complicated, and I would be (pleasantly) surprised if it turned out that they do match. My concern here is that we should not say a widget URI is an RFC 3987 compliant IRI unless that definition is really valid, so I'm trying to make the connections and tie up any loose ends to be able to say that with confidence.

/3/ Since we rely on RFC 3987, I guess the widget URI definition should reuse components from IRI as much as possible. But in the end, it all boils down to whether someone using a bona fide IRI validator (if those even exist) would be able to get consistent results when feeding it with widget URIs. And of course it becomes more interesting when widget URIs point to files inside the widget package that have non-ASCII characters in their names.

/4/ Finally, the IRI vs URI naming debate applies as ever.
I agree it's messy in that we are so accustomed to URIs, but really should be using IRIs, and that not everyone is conditioned to mentally replace URI with IRI every time. Maybe changing the document name to Widgets 1.0: Widget Resource Identifiers would sidestep some of the problem. :-) Best regards, Jere
Re: [widget-uri] Widget URI ABNF definition comments
On 15.9.2009 13.55, ext timeless timel...@gmail.com wrote: On Tue, Sep 15, 2009 at 1:49 PM, jere.kapy...@nokia.com wrote: Maybe changing the document name to Widgets 1.0: Widget Resource Identifiers would sidestep some of the problem. :-) So we can have URNs, URLs, URIs, IRIs, and WRIs? i'm not sure that's an improvement :) Well, maybe not. But you forgot XRI by OASIS [1]. :-) --Jere [1] http://en.wikipedia.org/wiki/Extensible_Resource_Identifier
Re: [widgets] PC LC comments on I18N/L10N
On 2.7.2009 14.40, ext Marcos Caceres marc...@opera.com wrote: On Tue, Jun 30, 2009 at 10:27 AM, jere.kapy...@nokia.com wrote: On 29.6.2009 13.30, ext Marcos Caceres marc...@opera.com wrote: /JK1/ OK, I've checked it and I think it is now easier to find all the relevant stuff. However, the Localization guidelines section (now 8.1) is marked as non-normative, but I think it should be as normative as 8.2 and 8.3. Especially since there is some CC behaviour described. Agreed. fixed. DoC: OK, although in the July 1 ED the section is still marked as non-normative. Maybe the section should be called Localization model to make it sound more, eh, normative? The complex example refers to several files which really have the same purpose. I think they should also have the same name, otherwise they cannot be found by the same reference. That is, /locales/es/gatos.html should be called /locales/es/cats.html. Or is it intentional? Never is:) That is a left over mistake from when we had multiple configs. /JK2/ Looks like the third file in 8.4.2 is still labeled locales/es/gatos.html. Those cats are tough to manage! :-) Argh! Fixed. DoC: OK This statement in the authoring guideline is puzzling: '[That is,] authors cannot simply put shared files into a language level folder, but need to put all files needed into the language level folder for the widget to work (for example, having a.gif in both /locales/zh-Hans/ folder and locales/zh).' Isn't this the opposite of what is supposed to happen in the fallback model? If the same a.gif is good for both zh-Hans and zh, it should be possible for the author to include it just once in /locales/zh. Yes, this is correct. If the user's language list includes 'zh-Hans', it will also include 'zh', as per Step 5. So a.gif will be found eventually. Right. /JK3/ So then, isn't the statement I quoted above actually incorrect? Authors *can* simply put files into a language level folder, so that duplicates of the exact same file are not needed. 
If they have the same name but the content is different, then that's OK too. (Meaning that a resource called 'a.gif' could be completely different for zh-hans-cn and zh, and what is ultimately retrieved depends on the UA locale.) One more way to put it is that if your UA locale is zh-hans-cn, and there is no 'locales/zh-hans-cn/a.gif', but there is a 'locales/zh/a.gif', then the latter would be found and used. If you agree with this reasoning, then I think the statement should be removed. Or am I misreading it somehow? This is actually pretty crucial to the working of the whole model, so it's important to determine that we do have the same understanding, and that the spec text doesn't lead readers up the garden path in any way.

Ok, I think I stuffed the description up. I mean this:
locales/en/b.gif
locales/en-au/a.gif
locales/en-us/a.gif
If en is matched, then a.gif would obviously be missing (as it is not in locales/en/). I'm trying to say that authors should not rely on the topmost level. I obviously need to dump the sentence in the spec, but I still need to make the above clear. Can you help me out with that?

I think the problem with the authoring guideline in 8.2 is that there are too many disconnected examples. The third example is all that is needed. When that is moved to the front of the section, then all the prose can be about it. Here is my suggestion. It gets rid of the two other examples and focuses on the zh-Hans-CN example.

8 Authoring guideline: Authors need to avoid region, script or other subtags except where they add useful distinguishing information to a locale folder. In addition, avoid including empty locale folders in a widget package (unless there is a good reason to include them).
An example of a widget that uses folder-based localization:

widget.wgt
locales/zh-Hans-CN/a.gif
locales/zh-Hans-CN/f.gif
locales/zh-Hans/a.gif
locales/zh-Hans/b.gif
locales/zh/a.gif
locales/zh/b.gif
locales/zh/c.gif
a.gif
b.gif
c.gif
d.gif
index.html
config.xml

Authors can further facilitate the localization process by grouping files into folder hierarchies made up of matching subtags, as is shown in the example. Note also that the user agent treats any file or folder outside the container for localized content as unlocalized content.

Assuming the widget's locale is zh-hans-cn:
- A reference to a.gif would resolve to locales/zh-Hans-CN/a.gif
- A reference to b.gif would resolve to locales/zh-Hans/b.gif
- A reference to c.gif would resolve to locales/zh/c.gif
- A reference to d.gif would resolve to d.gif, as it is not associated with any locales and is hence available to all locales.

This works at all sub-levels, so long as the parent subtag matches the child subtags. So, for example, the CN region can make use of the localized files in the zh-Hans folder level, the zh folder level, and the unlocalized files at the
Re: [widgets] PC LC comments on I18N/L10N
Hi Marcos, thanks for your effort. See below for specific points. (Marked accordingly, see end of this e-mail for the legend.) On 29.6.2009 13.30, ext Marcos Caceres marc...@opera.com wrote: Hi Jere, Fixes and some questions below. I got stuck on your last point, can you please clarify it or suggest more clearly what you want me to do there? Sure, see inline below. Right. I've moved everything and renamed examples Localization examples. This would make the material flow better and have all the concepts defined before they are used. I will need you to check this before we republish. Is that OK? Also, the whole of Section 7 should actually appear right before the processing steps (i.e., after the current Section 8). I moved everything as you suggested. /JK1/ OK, I've checked it and I think it is now easier to find all the relevant stuff. However, the Localization guidelines section (now 8.1) is marked as non-normative, but I think it should be as normative as 8.2 and 8.3. Especially since there is some CC behaviour described. The complex example refers to several files which really have the same purpose. I think they should also have the same name, otherwise they cannot be found by the same reference. That is, /locales/es/gatos.html should be called /locales/es/cats.html. Or is it intentional? Never is:) That is a left over mistake from when we had multiple configs. /JK2/ Looks like the third file in 8.4.2 is still labeled locales/es/gatos.html. Those cats are tough to manage! :-) This statement in the authoring guideline is puzzling: '[That is,] authors cannot simply put shared files into a language level folder, but need to put all files needed into the language level folder for the widget to work (for example, having a.gif in both /locales/zh-Hans/ folder and locales/zh).' Isn't this the opposite of what is supposed to happen in the fallback model? 
If the same a.gif is good for both zh-Hans and zh, it should be possible for the author to include it just once in /locales/zh. Yes, this is correct. If the user's language list includes 'zh-Hans', it will also include 'zh', as per Step 5. So a.gif will be found eventually. Right. /JK3/ So then, isn't the statement I quoted above actually incorrect? Authors *can* simply put files into a language level folder, so that duplicates of the exact same file are not needed. If they have the same name but the content is different, then that's OK too. (Meaning that a resource called 'a.gif' could be completely different for zh-hans-cn and zh, and what is ultimately retrieved depends on the UA locale.) One more way to put it is that if your UA locale is zh-hans-cn, and there is no 'locales/zh-hans-cn/a.gif', but there is a 'locales/zh/a.gif', then the latter would be found and used. If you agree with this reasoning, then I think the statement should be removed. Or am I misreading it somehow? This is actually pretty crucial to the working of the whole model, so it's important to determine that we do have the same understanding, and that the spec text doesn't lead readers up the garden path in any way. Priority is probably a bad term to use with regard to localized folders. I changed it to: [[ In the example below, assuming the widget's locale is zh-hans-cn: * The a.gif file in the zh-Hans-CN locale folder would be used instead of the a.gif file in the zh-Hans locale folder. * The b.gif file in the zh-Hans locale folder would be used instead of the b.gif file in the zh locale folder. * The c.gif in the zh locale folder would be used instead of the c.gif file root of the widget package. * The d.gif file would always be used from the root of the widget, as it is not associated with any locales and is hence available to all locales. ]] I think this now accurately explains what should happen, thanks. 
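The fallback behaviour agreed on above can be illustrated with a small sketch. This is not text from the spec: the function names `lookup_chain` and `resolve` are made up for illustration, the package listing is the zh-Hans-CN example quoted elsewhere in this thread, and lowercasing whole paths is an assumption based on the remark that language tags and Zip paths compare case-insensitively.

```python
from typing import Iterable, Optional


def lookup_chain(locale: str) -> list[str]:
    """Return the locale followed by its successively truncated parents."""
    subtags = locale.lower().split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def resolve(reference: str, locale: str, package_files: Iterable[str]) -> Optional[str]:
    """Find the package path that a relative reference resolves to (hypothetical helper)."""
    # Assumption: paths are compared case-insensitively, so index by lowercased path.
    by_folded_path = {path.lower(): path for path in package_files}
    for tag in lookup_chain(locale):
        candidate = f"locales/{tag}/{reference}".lower()
        if candidate in by_folded_path:
            return by_folded_path[candidate]
    # No localized version found: fall back to the unlocalized file in the root.
    return by_folded_path.get(reference.lower())


PACKAGE = [
    "locales/zh-Hans-CN/a.gif", "locales/zh-Hans-CN/f.gif",
    "locales/zh-Hans/a.gif", "locales/zh-Hans/b.gif",
    "locales/zh/a.gif", "locales/zh/b.gif", "locales/zh/c.gif",
    "a.gif", "b.gif", "c.gif", "d.gif", "index.html", "config.xml",
]

for name in ("a.gif", "b.gif", "c.gif", "d.gif"):
    print(name, "->", resolve(name, "zh-hans-cn", PACKAGE))
```

With the widget locale zh-hans-cn, the four references land in the zh-Hans-CN folder, the zh-Hans folder, the zh folder and the package root respectively, which is the behaviour the bracketed text describes.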
/4/ The xml:lang attribute Does the XML specification state that the values of xml:lang attributes must be unique across instances of the same element? No. If yes, it is probably redundant to repeat that in the context of all the elements in the configuration document. If not, the statement about uniqueness could still be factored out, for example to section 8.4, to avoid repetition. Although redundant, I think I will leave this as is. If it's still annoying, we can remove it in CR as it would be an editorial change and it would have no normative impact. Is that OK? Yes, that's OK. (I was just trying to eradicate duplicate text.) In the first example of Step 5, why would en and en-au be swapped around when decomposed? I was missing an en, which might have been causing confusion. Taking the first part of the example: en-us,en-au,fr,en Would become: en-us,en-au,fr,en And would normalize to: en-us,en,en-au Sorry, but that doesn't correspond to anything I see in the spec now... :-( Maybe my original comment was misdirected, but you kind of lost me there. _however_, during
[widgets] PC LC comments on I18N/L10N
Marcos, all, here's a bunch of comments related to the I18N/L10N related parts of the Widgets 1.0: Packaging and Configuration spec (Last Call i.e. Working Draft 28 May 2009). /1/ Order of material From an editorial standpoint, I think that the 7.2 Examples subsection should really be the last one in this section. Information about element-based localization, which now is 8.15, should appear in section 7, because the concepts are used in Section 8 before they are even defined. It would be good to have all related info in the same section. My suggestion for the organization of the material is: 7 Internationalization and localization 7.1 Localization guidelines 7.2 Folder-based localization (used to be 7.3) 7.3 Element-based localization (used to be 8.15) 7.4 Localization examples (used to be 7.2 Examples, note new name) This would make the material flow better and have all the concepts defined before they are used. Also, the whole of Section 7 should actually appear right before the processing steps (i.e., after the current Section 8). /2/ Content of examples The localization examples are non-normative, but many developers are going to study them closely, so it pays to fine-tune them a little (and fix a few bugs). In the simple example, the second file /sp/index.html should be labeled /locales/es/index.html. Similarly, in the complex example, /locales/sp/ should be /locales/es. The complex example refers to several files which really have the same purpose. I think they should also have the same name, otherwise they cannot be found by the same reference. That is, /locales/es/gatos.html should be called /locales/es/cats.html. Or is it intentional? In Fallback Behaviour Example, first paragraph, last sentence should read: The purpose of this 'fallback' model is to reduce the number of files that need to be created in order to achieve localization of a widget package. 
(remove 'n' from 'then', add 'in order') /3/ Folder-based localization Suggested addition to the authoring guideline: A Conformance Checker (CC) SHOULD issue a warning if there are empty locale folders in the widget package. This statement in the authoring guideline is puzzling: '[That is,] authors cannot simply put shared files into a language level folder, but need to put all files needed into the language level folder for the widget to work (for example, having a.gif in both /locales/zh-Hans/ folder and locales/zh).' Isn't this the opposite of what is supposed to happen in the fallback model? If the same a.gif is good for both zh-Hans and zh, it should be possible for the author to include it just once in /locales/zh. If the user's language list includes 'zh-Hans', it will also include 'zh', as per Step 5. So a.gif will be found eventually. Replace 'shared1.gif, shared2.gif' with 'shared1.gif and shared2.gif'. Priority is probably a bad term to use with regard to localized folders. /4/ The xml:lang attribute Does the XML specification state that the values of xml:lang attributes must be unique across instances of the same element? If yes, it is probably redundant to repeat that in the context of all the elements in the configuration document. If not, the statement about uniqueness could still be factored out, for example to section 8.4, to avoid repetition. /5/ Processing steps Step 3: the concept of localized config doc appears in the Configuration Defaults table, but that doesn't seem to exist anymore. (See also Step 6.) Step 5: replace unprocessed locales lists with unprocessed locales list throughout. Consider replacing user agent's locales with user agent locales throughout the spec. In the first example of Step 5, why would en and en-au be swapped around when decomposed? Would 'canonicalization' be a more suitable term here than 'decomposition'? 
/6/ Runtime resolution of localized resources What specification should describe how a reference to a resource (which could be in a localized folder) is resolved at runtime, based on the user's language range? That is not quite in the domain of the packaging specification, given that it is runtime behavior. This functionality is sketched in the fallback behavior example, but it is non-normative. Thanks for considering these comments. Best regards, Jere @ Nokia
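As a rough illustration of the conformance-checker warning suggested in point /3/, the check for empty locale folders could look something like the sketch below. The function name and the reliance on Zip directory entries (names ending in '/') are assumptions made for this example; the spec does not prescribe how a CC detects the condition.

```python
def empty_locale_folders(entry_names):
    """Return locale folders that exist as Zip directory entries but contain no files.

    entry_names is a list of Zip entry names, for example the result of
    zipfile.ZipFile(path).namelist(). Hypothetical helper, not from the spec.
    """
    folders = set()
    non_empty = set()
    for name in entry_names:
        parts = name.split("/")
        # Only look at entries of the form locales/<tag>/...
        if len(parts) < 3 or parts[0] != "locales" or not parts[1]:
            continue
        folder = "/".join(parts[:2]) + "/"
        folders.add(folder)
        # A name that does not end in '/' is a file somewhere below the folder.
        if not name.endswith("/"):
            non_empty.add(folder)
    return sorted(folders - non_empty)


entries = [
    "config.xml", "index.html",
    "locales/", "locales/es/", "locales/es/index.html",
    "locales/fr/", "locales/de/", "locales/de/img/",
]
for folder in empty_locale_folders(entries):
    print("warning: empty locale folder", folder)
```

A folder that only contains empty subfolders (locales/de/ in the sample) is reported as empty too, which seems to match the intent of the guideline.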
[widgets] PC LC, general comments
Marcos, all, some more review comments for Widgets 1.0 Packaging and Configuration LC (Working Draft 28 May 2009), this time of a more general nature (not so much L10N-related). /1/ RFC 2119 keyword usage In writing specs, the work split between MAY and SHOULD is sometimes problematic. I tend to use MAY when there is something that is slightly out of line but permissible in some cases, and SHOULD when it is generally a very good idea or almost mandatory to do something, but it can be skipped given enough reasons. This is why I reacted to this statement in section 3.1: A user agent MAY support the [Widgets-DigSig] specification ... And would have written it as: A user agent SHOULD support ... The same goes for section 3.2. I would say: a user agent SHOULD support the its:span element and the its:dir attribute, and MAY support other ITS elements and attributes. Also, the right name for ITS is Internationalization Tag Set (ation vs. ed). 4 Conformance Checker Last sentence in the section: CCs are NOT REQUIRED to display all messages at once. Here, NOT REQUIRED is not valid RFC 2119 keyword usage. Suggest to remove this sentence altogether, and change the preceding sentence to say: The wording and presentation of the advisory messages are implementation-dependent. 5 Zip Archive It would be enough to say: The packaging format for the files of a widget is the Zip archive format as defined in the [ZIP] file format specification. For conformance (last paragraph in section before note) I suggest, for clarity: To conform to this specification, a Zip archive MUST contain one or more file entries and MUST be a valid Zip archive. (c.f MUST NOT be invalid) 5.2 Version of Zip For Zip version 2.0 features, the MAYs should not be marked up as keywords. 8.5 The widget element, definition of width attribute: Which view modes SHOULD honor the value of this attribute is defined in the the [Widgets-Views] specification. 
This is invalid keyword usage, suggest to replace with: The view modes that honor the value of this attribute are defined in the [Widgets-Views] specification.

/2/ UTF-8 encoding So, we still can't mandate the use of UTF-8 as the path encoding? Of course, RECOMMENDED is fairly strong, but I would prefer MUST. As soon as you move out of the US-ASCII range, the bytes in CP437 and UTF-8 will differ even for the limited number of non-US-ASCII characters that CP437 can represent. Marcin has already commented on the ABNF of utf8-chars, I'll participate in that discussion if necessary. 5.4 Reserved Characters: suggest to reverse the order of the glyph and character columns. 8.11 The content element, definition of charset attribute: - Despite widespread use, the name charset is not good. A more accurate name would be encoding. - Suggest to replace (a user agent are REQUIRED to support [UTF-8]) with User agents MUST support [UTF-8].

/3/ JPEG icons This may have been beaten to death before, but I wondered why a JPEG file is not allowed as the default icon.

/4/ Configuration document It is not currently an error to have a file called 'config.xml' anywhere inside the widget's folder structure. The only one that matters is the one in the root folder. Maybe the CC should warn about the presence of multiple config.xml documents? In 8.2 Proprietary Extensions, first para, last sentence: For the sake of interoperability, extensions to the configuration document are NOT RECOMMENDED. Two problems: 1) NOT RECOMMENDED is not valid keyword usage; 2) it is OK to extend the configuration document because there is a well-defined mechanism that uses namespaces. Suggestion: remove this statement. Version attribute: in the ABNF, doesn't 1*2DIGIT mean that a version number like 123 would be invalid? Build numbers especially are often three or four digits long. Of course, this is only a guideline, but still... Numeric attribute: apparently all numeric attributes are non-negative.
Isn't there any use case for negative numbers, ever? Also, non-Western numerals are not acceptable by this definition, but that might not be a showstopper in this case. Keyword attribute: Is the set of allowable characters inherited straight from the XML spec? Is it self-evident that keywords can't contain spaces because it is the keyword list separator? Couldn't comma also be used as a separator? Boolean attribute: it is stated that only true and false are valid values, but that if the value is something else, the default is false. That seems like a contradiction. Are values other than literal true and false supposed to be actual errors, or are they silently coerced into false? (If I accidentally type required=rue it would come out as false...) /5/ Rule for Parsing a Non-negative Integer Couldn't we just refer to some well-established definition, if there is one handy? Seems like this must have been done many times before. And what about those negative numbers, then? Not needed? :-) Hope this helps, Jere @ Nokia
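Three of the points above (the CP437 vs UTF-8 difference, the 1*2DIGIT rule, and the boolean attribute default) are easy to check concretely. The following is only a sketch: the sample file name and the helper names are invented for illustration, and `parse_boolean_attribute` encodes the 'silently coerced into false' reading, which is exactly the interpretation being questioned.

```python
import re

# 1) The same non-ASCII file name yields different bytes in CP437 and UTF-8.
name = "é.gif"  # sample name chosen for illustration
cp437_bytes = name.encode("cp437")
utf8_bytes = name.encode("utf-8")
print(cp437_bytes, utf8_bytes)

# 2) ABNF 1*2DIGIT means "one or two digits", so a three-digit component fails.
ONE_OR_TWO_DIGITS = re.compile(r"[0-9]{1,2}")


def is_one_or_two_digits(text: str) -> bool:
    """Check a single version component against 1*2DIGIT (hypothetical helper)."""
    return ONE_OR_TWO_DIGITS.fullmatch(text) is not None


# 3) Boolean attribute: anything other than the literal "true" becomes False
#    under the coercion reading, so a typo such as "rue" goes unnoticed.
def parse_boolean_attribute(value: str) -> bool:
    return value == "true"


print(is_one_or_two_digits("12"), is_one_or_two_digits("123"))
print(parse_boolean_attribute("true"), parse_boolean_attribute("rue"))
```

If values other than true and false were instead treated as errors, `parse_boolean_attribute` would have to raise or report a problem for "rue" rather than return False.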
Re: [widgets] Base folder and resolution of relative paths
Hi Francois, the idea of treating the widget package essentially as a local HTTP server in terms of resources is quite interesting, although like you said there will be differences. In terms of localized resources, it seems a good strategy to use the same mechanism as HTTP and Accept-Language to select localized resources. I don't know if anyone has done it like this before in the context of an application; maybe not. But somehow I think we don't go all the way to q values. (?)

The most important concept here is really that the locale (or language tag, if you will) is specific to any given resource. That does have the risk of having partially localized applications, as you observed, but that is really the developer's concern. They just need to make a concerted effort to localize all that is needed. Of course there will always be widgets that have nothing localized, so the mechanism must allow the fallback content to be found as the last step of the lookup. The drawback there is that if resources are always looked for according to the UA language list, it might be wasted effort (and a performance hit) for those widgets that don't have anything localized. The search would go all the way from the deepest branch to the root in vain before finding the fallback content.

I'm thinking the widget locale (as derived from the UA language list) should be organized by primary subtag so that the most specific tag comes first, like so: "en-us, en, fr-ca, fr-fr, fr". At least Marcos seemed to like this idea. That would simplify things by making the base folder essentially the first tag in the widget locale. (Or could this be folded to just en-us, fr-ca, fr-fr? Content negotiation should know to drop the subtags successively anyway.)

Assumptions/premises:
- The content negotiation is applied only to widget: URIs (incidentally, would a relative URI inside widget content then always be considered a widget: URI?)
- When referring to content inside the widget package, the 'locales/some-language-tag' part is never included in the URI. Instead, the content negotiation mechanism should kick in and find the resource given just a relative URI like 'flag.png'.
- If the content negotiation finds no localized content (i.e. there is no requested representation for the resource), a resource with the specified name in the root of the widget is used. If not present, it is probably an error.

I'm trying to help Marcos hammer out these things in the PC spec, so I really appreciate your feedback. --Jere -- Jere Käpyaho (jere.kapy...@nokia.com) Specialist, Developer Platforms Standardization Devices RD, Nokia Corporation

On 11.5.2009 11.26, ext Francois Daoust f...@w3.org wrote: Hi Jere, Let me try to clarify my thoughts... Note I'm actually suggesting an amended C3: one locale per resource, and one locale for element-based localization. For folder-based localization, my rationale is that it's useful to view a widget package as a local HTTP server because it allows one to create a widget from an existing server-based web site with limited changes. There will always be differences but trying to stick to HTTP rules when possible (and mostly harmless, i.e. when there's no strong benefit not to follow these rules) sounds like a good idea. In HTTP exchanges, the locale of a resource is resource-specific: the server returns the resource in a locale that best matches the list defined in the Accept-Language HTTP header field of the HTTP request. Two requests with the same language list sent to two different resources may return resources in two different languages. That's up to the content provider to make sure that things are consistent. I'm suggesting that you use the same mechanism for folder-based localization. In practice, this means that there is no one locale for folder-based localization, it's rather one locale per resource.
It sure makes it possible to have half of a widget in Russian while the other half is in Chinese, which does not make a lot of sense. But then the same thing is possible with server-based web sites, and returned languages are based on the list of languages supposedly understood by the end user anyway. For element-based localization, I don't feel strongly one way or the other. Here, we're not talking about one resource per language, but about one resource that contains information in different languages. It seems better to define one locale for the whole resource for consistency reasons, and that's what C3 suggests. Francois.

jere.kapy...@nokia.com wrote: Hi Francois et al, reading the comments below, I'm puzzled as to why C3 would be preferable. It says: ... two different locales: one specifically for folder-based localization, and the other specifically for element-based localization. What is the rationale for this? I would think that the locale influencing the selection of material should be the same for both (and use default unlocalized
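The ordering of the widget locale list proposed at the top of this message (grouped by primary subtag, most specific tag first) could be derived from the UA language list roughly as follows. This is a sketch under stated assumptions, not an algorithm taken from the spec: the function name is invented, q values are ignored as suggested in the message, the wildcard '*' is simply skipped, and ties between equally specific tags keep their first-seen order.

```python
def widget_locale_list(ua_languages):
    """Expand a UA language list into an ordered list of lookup tags (hypothetical helper)."""
    groups = {}  # primary subtag -> candidate tags, in first-seen order
    for language_range in ua_languages:
        subtags = language_range.strip().lower().split("-")
        if not subtags[0] or subtags[0] == "*":
            continue  # assumption: empty ranges and the wildcard are ignored
        candidates = groups.setdefault(subtags[0], [])
        # Add the range itself, then each successively truncated parent.
        for n in range(len(subtags), 0, -1):
            tag = "-".join(subtags[:n])
            if tag not in candidates:
                candidates.append(tag)
    ordered = []
    for candidates in groups.values():
        # sorted() is stable, so equally specific tags keep their original order.
        ordered.extend(sorted(candidates, key=lambda tag: -tag.count("-")))
    return ordered


print(widget_locale_list(["en-US", "fr-CA", "fr-FR"]))
```

For the folded list "en-us, fr-ca, fr-fr" this yields the expanded form "en-us, en, fr-ca, fr-fr, fr" from the message, so either notation could serve as input.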
Re: I18N issue: case-sensitivity of locale subdirectories
On 5.5.2009 13.16, ext Marcos Caceres marc...@opera.com wrote: On Wed, Apr 29, 2009 at 4:16 PM, Robin Berjon ro...@berjon.com wrote: Assume we have two localisation subdirectories: locales/en/ locales/EN/ What happens? BCP47 (which we reference) is defined to be case-insensitive so it doesn't help us much in this respect. There are multiple options: a) we define a canonical casing and all others are ignored; b) we select an order of priority and we only consider one (the first to match); c) we select an order of priority and we merge them all (in that order, with a given precedence rule); d) the device on which the user agent is catches fire.

I think that (a) should be ruled out because as BCP47 tells us, ISO639-1 recommends lowercase (language codes), ISO3166-1 recommends uppercase (country codes), and ISO15924 recommends titlecase (script codes). These are different, but likely to be confusing, and I don't think that developers should have to worry about that. Agreed.

Because BCP47 is indeed case-insensitive [1], both en and EN (and also eN and En) are considered equivalent. While it is probably an oversight or error to provide several variants of the same language tag with different character case anyway, they need to be considered somehow because they *are* equivalent, unless it is made explicit that this is an error in the packaging. The path inside the widget's ZIP file is already defined as case-insensitive, so it is actually already an error to have two or more folders with names that differ only by character case. Even if some implementation unzips the content of the widget to a local filesystem, we have no control over whether filenames in that filesystem are case-insensitive or case-sensitive.

I don't have a strong opinion on this, but I do have a preference for a rule based on (b): if multiple locale subdirectories have the same case-insensitive name, then the one that comes first in ASCII-code order (e.g.
in order: EN, En, eN, en) is used and the others are ignored. This seems reasonable. I will add this. I suggest that the widget packaging rules say that any localized folders must be unique in terms of a case-insensitive match, otherwise the packaging is invalid [2]. This also allows us to not talk about ASCII code ordering. Furthermore, there is then no need to merge the contents of such folders. For the degenerate (but unfortunately unavoidable) case where someone has managed to slip in two or more such folders, define a canonical casing (obvious suggestion: lowercase) and use it, then simply ignore any others. The argument in favour of only using one is that we already have to merge multiple directories, and adding one merge operation for what is in all probability a user error seems like too much complexity for little value (I'm happy to be contradicted by implementers however). Picking ASCII-code order is based on the fact that the directory names must be ASCII here (the others must be discarded), and picking the first is arbitrary. Thoughts? I support b. Added some of your text above to the spec. I guess none of a)-d) really fit my observations as such. It's more like additional packaging rules + shades of a). Note that for comparisons with the widget locale value you still need to case-fold [3] everything anyway. There is no guarantee that the widget locale matches any localized subfolder name as such, because the widget locale itself could use capitalization that really carries no meaning, but fails to match any localized folder unless you do a case-insensitive comparison. In this case the comparison can be also language-insensitive, because BCP47 language tags consist of US-ASCII characters. 
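The two checks discussed here, uniqueness of locale folders under a case-insensitive match and case-folded comparison against the widget locale, might be sketched like this. The function names are made up for illustration; plain `str.lower()` is used on the grounds stated above that BCP47 tags consist of US-ASCII characters, so the comparison can be language-insensitive.

```python
from collections import defaultdict


def find_case_collisions(folder_names):
    """Group locale folder names that differ only by character case."""
    by_folded = defaultdict(list)
    for name in folder_names:
        by_folded[name.lower()].append(name)
    # sorted() on ASCII names gives ASCII-code order (EN, En, eN, en).
    return {folded: sorted(names) for folded, names in by_folded.items() if len(names) > 1}


def match_locale_folder(widget_locale, folder_names):
    """Return the folder whose name equals the widget locale, ignoring case."""
    wanted = widget_locale.lower()
    for name in folder_names:
        if name.lower() == wanted:
            return name
    return None


folders = ["en", "EN", "fr", "zh-Hans"]
print(find_case_collisions(folders))
print(match_locale_folder("ZH-hans", folders))
```

Under the packaging rule suggested above, a non-empty result from `find_case_collisions` would mark the package as invalid; under option (b), the first name in each sorted list would be the one used.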
Hope this helps, Jere [1] http://tools.ietf.org/html/bcp47#section-2.1 [2] http://dev.w3.org/2006/waf/widgets/#invalid-widgets [3] http://www.w3.org/International/wiki/Case_folding [0] http://dev.w3.org/cvsweb/~checkout~/2006/waf/widgets/i18n.html?rev=1.29&content-type=text/html;%20charset=utf-8 -- Robin Berjon - http://berjon.com/ Feel like hiring me? Go to http://robineko.com/ -- Marcos Caceres http://datadriven.com.au -- Jere Käpyaho (jere.kapy...@nokia.com) Specialist, Developer Platforms Standardization Devices RD, Nokia Corporation
Re: Widgets 1.0 Packaging and Configuration: I18N comments...
Marcos, thanks for a lucid and thorough widget I18N model proposal. This should really help all concerned to come to agreement about how widgets should be internationalized and localized. Also in this case I18N turned out to be more than many perhaps thought it would, but the effort is worth it, given the importance of being able to present widgets in the user's language. I've taken a first look at the proposal document (rev1.19 as of April 15); please accept a few comments. Wrt PROPOSAL A2, I would strongly recommend that the single language tag be non-empty. There is always some default language that can be retrieved or derived. Allowing an empty language tag serves no purpose, and really just complicates matters. Wrt PROPOSAL A1, the same thing applies. The language tag list should be non-empty. Although I'm not convinced that multiple language tags are of value. In this context, it should also be noted that the same widget probably shouldn't use content labeled with two or more completely disjoint language tags. At least the language must be a common denominator, even if subtags differ. So there should never be a case where the user's preferred language list is used naively when locating resources that are not found. (Example from A1 shows en-us,en,fr,* -- in this case if you want resource X but it is not found for 'en-us' or 'en', then you shouldn't go looking for an 'fr' version of it.) In The widget's locale, the first question strikes me as odd. The localizable materials in the configuration document and in the folders seem like disjoint enough that there is no issue of mixing those two. The widget's locale is the same for both, as it is determined from the UA locale. If there are no entries for the widget's locale in the config file, or if there is no relevant folder content, then a fallback/default is used. The second question should be simple enough: the widget's locale should be represented as a single language tag, based on the UA locale. 
If there are indeed multiple preferred languages, in order of preference, then it is the first one. IMHO there shouldn't be any overlap between folder content and config document. They serve different purposes: the config document has mostly housekeeping data, whereas folder content is presented to the user dynamically at runtime. This is why I don't like PROPOSAL B3. The locale for both config elements and folders must be the same. My preference is clearly to have one well-defined, non-empty language tag represent the widget's locale. If there are no config elements or folder content for that, then there must be fallback elements / content that is not tagged with anything, and that serves as the default data. Note that this is different from what is described in PROPOSAL C1. Unlocalized content can and should be used even if there is a derived widget locale, because somebody just didn't provide localized content for that particular language tag. I think PROPOSAL C3 is the closest to this approach, but it considers the widget locale to be more than one language tag. Neither PROPOSAL D1 nor D2 says exactly what I would like. PROPOSAL D2 comes close, but any resources that are not tagged with a language tag (i.e. are default resources) should probably go in the 'locales' folder. However, it doesn't really matter where they are found, so if it is OK to place arbitrary files in the root of the widget anyway, then the defaults might just as well be there. Regarding xml:base, PROPOSAL E1 is the most sensible one. This is what Dashboard must be doing also, although someone more intimate with WebKit might want to have a look there. However, the issue of subtags complicates things a bit: if you really want to find resources that are not available for the most specific tag, but could exist for a less specific tag, then a static xml:base is not enough, and additional processing rules for resource access are needed. 
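To make the kind of processing rule meant here concrete, the sketch below looks up a resource for a single widget locale, drops subtags from the right until something is found, and finally falls back to untagged default content. It never jumps to an unrelated language. The `locales/<tag>/` layout, the lowercase folder names, and the function names are assumptions for illustration only; the sketch also ignores the finer points of BCP 47 lookup, such as the special handling of single-letter subtags.

```python
def candidate_tags(widget_locale):
    """Yield the widget locale and then progressively less specific tags.

    Tags are lowercased, since the comparison is case-insensitive.
    """
    parts = widget_locale.lower().split("-")
    for end in range(len(parts), 0, -1):
        yield "-".join(parts[:end])


def locate_resource(name, widget_locale, files):
    """Return the path to use for `name`, or None if it does not exist at all.

    `files` is the set of paths inside the widget package (hypothetical layout:
    localized content under locales/<tag>/, defaults in the package root).
    """
    for tag in candidate_tags(widget_locale):
        candidate = "locales/" + tag + "/" + name
        if candidate in files:
            return candidate
    # No localized variant: fall back to the untagged default content.
    return name if name in files else None
```

With a static xml:base only the first candidate would ever be tried, which is why exact-match-only tags would make the extra rules unnecessary.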
If subtags are not honoured, and the language tag must always be an exact match, then a static xml:base would be enough, I guess. I don't know where we currently stand on the subtag issue, but it seems like an important use case. Hope I've added at least some value to the discussion. It might be good to ask other WG members to provide their explicit preferences about the different proposals, or at least input such as I have. I hope and think you can extract some of my preferences from above. Best regards, Jere On 3.4.2009 18.02, ext Marcos Caceres marc...@opera.com wrote: Hi Addison, On Fri, Apr 3, 2009 at 4:38 PM, Phillips, Addison addi...@amazon.com wrote: Hello Webapps, Thanks for the response. Is there a new draft or editor's copy where these changes are made? Yes, see [1]; but we are still working out the details. As this change caused some radical changes in the spec, I am still working out how to make the processing model work. I'm currently
Re: [widgets] Zip endian issue?
Well, the ZIP file specification does say that all values are stored in little-endian byte order unless otherwise specified. The local file header signature is the four bytes 50 4B 03 04, in this order, always. Endianness is not even an issue, if you read and compare individual bytes. How could the ZIP file be entirely transposed on media? It is a binary file; if some entity is transposing it, it's not the same file anymore. A practical example of where and how this would be happening is needed. --Jere On 3.4.2009 11.54, ext Marcos Caceres marc...@opera.com wrote: Hi, I recently chatted with Josh and he pointed out that there might be an endian issue when checking for magic numbers. [[ timeless.b...@gmail.com: zip files are guaranteed not to have ENDIAN issues? If the first four bytes of the potential Zip archive do not match the magic numbers for a Zip archive (50 4B 03 04) me: ah, good point timeless.b...@gmail.com: i.e. could you have 4B500403 or whatever the correct endian corruptions are :) me: All values are stored in little-endian byte order unless otherwise specified. timeless.b...@gmail.com: yeah, but could the file be entirely transposed on media test it.. create a zip file w/a single tiny file and then use some perl magic to swap each pair of bytes :) (i'm not awake enough to do that) me: neither am I ]] Anyone with experience in this area want to propose some text for the PC spec to fix this issue. Kind regards, Marcos -- Marcos Caceres http://datadriven.com.au
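For what it's worth, the byte-wise check and the pair-swapping experiment mentioned in the chat can be tried with a few lines of Python instead of perl. This is only a sketch; the helper names are invented here, and the tiny archive is built in memory with the standard zipfile module.

```python
import io
import zipfile

# Local file header signature, as four individual bytes in file order.
ZIP_LOCAL_FILE_HEADER = bytes([0x50, 0x4B, 0x03, 0x04])


def looks_like_zip(data):
    """Compare the first four bytes one by one; no integer decoding involved."""
    return data[:4] == ZIP_LOCAL_FILE_HEADER


def swap_byte_pairs(data):
    """Swap each adjacent pair of bytes, as suggested in the chat log."""
    swapped = bytearray(data)
    for i in range(0, len(swapped) - 1, 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return bytes(swapped)


def make_tiny_zip():
    """Build a Zip archive containing a single tiny file, entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hi")
    return buffer.getvalue()
```

A pair-swapped archive starts with 4B 50 04 03 and simply fails the check, i.e. it is treated as "not a Zip archive" like any other corrupted file. Endianness only enters the picture if the signature is read as a 32-bit integer, in which case it has to be decoded as little-endian.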
Re: [widgets] Further argument for making config.xml mandatory
In the context of the discussion about having a mandatory config file, I proposed to simplify matters even further and have just one config file, with the note that this proposal could be ignored in the interest of time and/or effort. There are pros and cons to both approaches, as Marcos has itemized. While I think it was great of Marcos to consider this proposal, this issue could really be resolved either way, and I certainly don't want this to hold up the spec. Then again, not all have expressed sufficient rationale to back up their opinion, or expressed any opinion at all. The configuration file does not seem to contain so much translatable data that it would be an issue for localization, and that issue was not raised by the W3C I18N WG either (on the contrary, rather). That was why I thought the simplification was a good idea, but I'm also OK with multiple config files. For the record, you can still count me as backing the idea of a single, mandatory config file, with some elements repeated and distinguished by an xml:lang attribute. --Jere On 22.3.2009 20.06, Art Barstow art.bars...@nokia.com wrote: On Mar 19, 2009, at 12:06 PM, ext Marcos Caceres wrote: On Thu, Mar 19, 2009 at 4:52 PM, Andrew Welch andrew.j.we...@gmail.com wrote: That's exactly what I was talking about when I said 'even though the XML i18n guidelines say it's bad practice'. Ahh very sorry, I just saw the email after that containing the code sample, and gmail collapses the quoted parts, my bad. However, Addison Phillips, the Chair of i18n core, said the following in the formal feedback representing the i18n WG's LC comments for the spec [1]: Section 7.4 (Widget) The various language bearing elements such as name, description, etc. are of the zero-or-one type. However, it is typically better to allow any number of these elements to occur, provided that none share the same xml:lang. This allows for localization (which is part of the point in allowing xml:lang on the element). 
So we have been blessed by them to do this... umm this somewhat questionable, yet problem solving thing :) [1] http://lists.w3.org/Archives/Public/public-webapps/2009JanMar/0259.html That's interesting, because xml:lang seems pretty redundant otherwise! Alright, let's see a show of hands for this approach! Who supports us just having a single config.xml with a bunch of repeated elements, but with different xml:langs? Advantages here are: * we only need to make very small modifications to the parsing model. * no more searching for config docs in locale folders * no multiple parsing of config files Disadvantages: * large, and, if not careful, hard to maintain config files My experience working with localizers is that separating their data (concerns) into separate files was a good model and eliminates potential issues with a localizer accidentally removing or changing some other part of the config file. Marcos - what part(s) of the old/current model - i.e. where there is a root config file, a config file may also be installed in a locale-specific directory, and the locale-specific config file would only contain those parts that are localized - are not properly specified (i.e. need more spec work)? All - my take of positions here is as follows. Please let us know if this data is not correct and if you have not indicated your preference, please do so ASAP (before the March 26 Voice Conference): Only one root config file: Jere, Marcos. A root config file plus one per locale directory: Benoit, Josh. -Regards, Art Barstow
Re: [widgets] Further argument for making config.xml mandatory
I still think that more than one config document is the most confusing aspect of this. Having just one (mandatory) config document, with the localized parts tagged with xml:lang attributes, would be the simplest. However, as I understand it, the separate config files were recommended by the W3C I18N group. If this decision were reversed, then anything in the config document that could (as per the schema) have an xml:lang attribute would by definition be localizable/localized. Others (like id, version etc.) would not be. That would also free the implementation from collecting all the various config documents, just to create and store an intersection of the elements. If you have two values for the same element, then who wins? The most specific (from the config in the localized folder), or the least specific (the default/fallback one from the root)? Proposal (feel free to ignore, due to pressure to be feature complete): make the config file mandatory, but allow it only in the root, then allow multiple elements with unique xml:lang attributes for those elements that are localizable. --Jere On 19.3.2009 16.24, ext Marcos Caceres marc...@opera.com wrote: On Thu, Mar 19, 2009 at 1:15 PM, Priestley, Mark, VF-Group mark.priest...@vodafone.com wrote: Hi Marcos, All, I would like to raise a comment in support of making the configuration document at the root of the widget mandatory. The localisation model currently described by [1] allows for multiple configuration documents; zero or one at the root of the widget and zero or one at the root of each locales folder. 
While we support the approach of allowing localisation of the configuration document (with the addition of the fallback mechanism that has been previously discussed), one concern we had with such an approach was that it doesn't make sense to localise some of the information in the configuration document, for example: the feature element, (the replacement for) the access element, the license element, the id and version attributes (and maybe others?). In fact in some cases, allowing different values could present security risks. Previously we (Vodafone) had considered an approach of requiring user agents to, for example, create a list of all feature elements present in any valid configuration document. We had not yet thought how to handle the case in which the different configuration documents contain different id attribute values. However, now that there is a proposal to make the configuration document at the root of the widget mandatory, I would like to propose that a better (although not pretty) solution would be to specify which attributes and elements are localisable. The non-localisable attributes / elements would only be valid if included in the configuration document at the root of the widget. Thoughts? Proposal: not localizable: the widget's id and version attributes; feature and its options; access. The following elements would be localizable: widget (but no id or version, derived from root config, if available), name, description, author, license, icon, content, preference, screenshot. FWIW, I think this will confuse authors... and irritate the poor souls who need to implement this :) Kind regards, Marcos -- Marcos Caceres http://datadriven.com.au
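As a rough illustration of the single-config alternative discussed in this thread (one root config.xml, localizable elements repeated with distinct xml:lang values, id and version stated once), the selection logic could look something like the sketch below. The sample document, its values, and the function name are invented for this example; it uses no namespace for brevity and is not taken from the spec.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical single config document with repeated, language-tagged elements.
SAMPLE_CONFIG = """\
<widget id="http://example.org/widget" version="1.0">
  <name>Weather</name>
  <name xml:lang="fi">Sää</name>
  <name xml:lang="en-us">Weather (US)</name>
  <description>Local weather forecast</description>
  <description xml:lang="fi">Paikallinen sääennuste</description>
</widget>
"""


def localized_text(root, element_name, widget_locale):
    """Pick the text of `element_name` for the widget locale.

    Tries the full tag, then less specific tags, then the untagged default.
    """
    by_lang = {}
    for element in root.findall(element_name):
        lang = (element.get(XML_LANG) or "").lower()
        by_lang.setdefault(lang, element.text)  # first occurrence per tag wins
    parts = widget_locale.lower().split("-")
    for end in range(len(parts), 0, -1):
        tag = "-".join(parts[:end])
        if tag in by_lang:
            return by_lang[tag]
    return by_lang.get("")  # untagged element serves as the default
```

Since there is only one document, the "who wins" question between root and locale-specific config files does not arise, and non-localizable data such as id and version can only ever have one value.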
Re: Widget Signature update
One (possibly minor) point regarding the filename rule: At least the Widgets 1.0 PC spec uses ABNF (RFC 5234) and refers to it, maybe this would be good also in the DigSig spec? The rule expressed in ABNF would be something like:
signature-filename = "signature" non-zero-digit *DIGIT ".xml"
non-zero-digit = %x31-39
Here, DIGIT is a prefabricated rule defined in RFC 5234. This rule says that in between the strings there must be one non-zero digit, followed by zero or more normal digits. The normative reference for ABNF would be (grabbed from the PC spec): <dt><dfn id="abnf">[ABNF]</dfn></dt> <dd>RFC 5234, <a href="http://www.ietf.org/rfc/rfc5234.txt"><cite>Augmented BNF for Syntax Specifications: <abbr title="Augmented Backus-Naur Form">ABNF</abbr></cite></a>. D. Crocker and P. Overell. January 2008.</dd> --Jere On 9.3.2009 22.51, Hirsch Frederick (Nokia-CIC/Boston) frederick.hir...@nokia.com wrote: I updated section 4 to correspond to this: If the signatures list is not empty, sort the list of signatures by the file name field in ascending numerical order (e.g. signature1.xml followed by signature2.xml followed by signature3.xml etc). regards, Frederick Frederick Hirsch Nokia On Mar 6, 2009, at 10:07 AM, ext Marcos Caceres wrote: Hi Frederick, On 3/6/09 3:59 PM, Frederick Hirsch wrote: I've updated the widget signature document distributor file naming convention to the following after discussing this with Josh (thanks Josh): Naming convention for a distributor signature: |signature [1-9][0-9]* .xml| * Each distributor signature /MUST/ have a name consisting of the string signature followed by a digit in the range 1-9 inclusive, followed by zero or more digits in the range 0-9 inclusive and then .xml, as stated by the BNF. An example is signature20.xml. * Leading zeros are disallowed in the numbers. * Any file name that does not match this BNF /MUST/ be ignored. Thus a file named signature01.xml will be ignored. A warning /MAY/ be generated. 
* There is no requirement that all the signature file names form a contiguous set of numeric values. * These signatures /MUST/ be sorted numerically based on the numeric portion of the name. Thus signature2.xml precedes signature11.xml, for example. See draft http://dev.w3.org/2006/waf/widgets-digsig/#distributor-signatures I also updated the notation section, changed the code format to be italic (without color), and updated the body style to not be quite so large. Please indicate any comments or corrections on the list. The changes look good to me! Thank you. Kind regards, Marcos
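The naming and ordering rules quoted above can be checked with a short sketch like the following. It is only an illustration: the function name is invented, and treating the match as case-sensitive is an assumption made here, not something the quoted text states.

```python
import re

# "signature", one digit 1-9, zero or more digits 0-9, then ".xml".
SIGNATURE_NAME = re.compile(r"signature([1-9][0-9]*)\.xml")


def sorted_distributor_signatures(file_names):
    """Keep only valid distributor signature names, sorted numerically.

    Names that do not match the rule (e.g. leading zeros) are ignored.
    """
    numbered = []
    for name in file_names:
        match = SIGNATURE_NAME.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

Sorting on the integer value rather than on the string is what puts signature2.xml before signature11.xml; a plain string sort would reverse them. Note also that quoted strings in ABNF are case-insensitive per RFC 5234, so if the rule were specified in ABNF the spec would need to say explicitly whether SIGNATURE1.XML is meant to match.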
Re: [widgets] Minutes from 5 March 2009 Voice Conference
Easier on the eye, but to me it's pretty close to the color of RFC 2119 keyword style (em.ct). Seems like the body text font has grown in size somewhat, compared to other specs. --Jere On 5.3.2009 18.03, Hirsch Frederick (Nokia-CIC/Boston) frederick.hir...@nokia.com wrote: I updated the style for code items in the Digital Signature specification to brown. Does this work better? It does not conflict with other color uses as far as I can tell. Please look at http://dev.w3.org/2006/waf/widgets-digsig/ (refresh) regards, Frederick Frederick Hirsch Nokia On Mar 5, 2009, at 10:11 AM, Barstow Art (Nokia-CIC/Boston) wrote: JS: re styling, orange doesn't work well for me regarding readability MC: I can help with that FH: I'll take a pass at that