Re: Client side JavaScript i18n API

2010-04-27 Thread Jere.Kapyaho
On 26.4.2010 21.49, ext Nebojša Ćirić c...@chromium.org wrote:
 We have a first draft at
 http://docs.google.com/Doc?id=dhttrq5v_0c8k5vkdh (it has view/edit
 permissions).
 
 It describes a small subset of the final API we intend to implement.
 We've picked date/time formatting and collation as must have for the
 first iteration.
 
Have you guys looked at the I18N facilities of the Dojo JavaScript
framework? [1] Seems like there is some overlap with your API, and Dojo is
already well established in the field. Maybe there are some opportunities to
reuse and/or coordinate, to avoid duplicate APIs.

The Dojo formatting modules cover dates, numbers, and currencies in a
locale-specific manner, and are based on the CLDR (nice!). [2]

Dojo has the concept of locale [3], but looks like it doesn't support
locale-specific collation out of the box, and I don't know if you can plug
in your own comparison (i.e. an implementation of the UCA). Maybe someone
who knows more can confirm or deny.

Best regards,
Jere

[1] http://www.dojotoolkit.org/
[2] http://docs.dojocampus.org/quickstart/internationalization/number-and-currency-formatting
[3] http://www.ibm.com/developerworks/web/library/wa-dojo/




[widget-uri] Widget URI ABNF definition comments

2009-09-15 Thread Jere.Kapyaho
Robin, all,

I had a look at the Widget URIs Working Draft of 08 Sept 2009, and a couple
of questions popped into mind, since I'm a sucker for all things ABNF. :-)

/1/ Is it the intent that the 'opaque authority' corresponds exactly to the
iauthority definition in the ABNF? If so, am I correct in assuming that it
doesn't matter at this point that there is no mention of the iuserinfo and
port components wrt. widget URIs, because the opaque authority intentionally
has no semantics?

/2/ Also, I'm trying to figure out what is the relationship between
zip-rel-path (as found in Widgets P&C) and ihier-part (as found in RFC 3987)
-- if we rely on RFC 3987 then I think these parts must agree. Based on the
current ABNF definition of the widget URI, it seems that the only matching
variant of ihier-part is

"//" iauthority ipath-abempty

since that is the only one with //, although I'm not sure why it needs to
be so.

If we disregard iauthority (for reasons detailed above), the question then
becomes: is zip-rel-path compliant with ipath-abempty? Both of those
definitions are quite complicated, and I would be (pleasantly) surprised if
it turned out that they do match.

My concern here is that we should not say a widget URI is an RFC 3987
compliant IRI unless that definition is really valid, so I'm trying to make
the connections and tie any loose ends to be able to say that with
confidence.

/3/ Since we rely on RFC 3987, I guess the widget URI definition should
reuse components from IRI as much as possible. But in the end, it all boils
down to whether someone using a bona fide IRI validator (if those even
exist) would be able to get consistent results when feeding it with widget
URIs. And of course it becomes more interesting when widget URIs point to
files inside the widget package that have non-ASCII characters in their
names.

/4/ Finally, the IRI vs URI naming debate applies as ever. I agree it's
messy in that we are so accustomed to URIs, but really should be using IRIs,
and that not everyone is conditioned to mentally replace URI with IRI every
time. Maybe changing the document name to Widgets 1.0: Widget Resource
Identifiers would sidestep some of the problem. :-)

Best regards,
Jere




Re: [widget-uri] Widget URI ABNF definition comments

2009-09-15 Thread Jere.Kapyaho
On 15.9.2009 13.55, ext timeless timel...@gmail.com wrote:

On Tue, Sep 15, 2009 at 1:49 PM, jere.kapy...@nokia.com wrote:
 Maybe changing the document name to Widgets 1.0: Widget Resource
 Identifiers would sidestep some of the problem. :-)

So we can have URNs, URLs, URIs, IRIs, and WRIs? i'm not sure that's
an improvement :)

Well, maybe not. But you forgot XRI by OASIS [1]. :-)

--Jere

[1] http://en.wikipedia.org/wiki/Extensible_Resource_Identifier


Re: [widgets] P&C LC comments on I18N/L10N

2009-07-03 Thread Jere.Kapyaho
On 2.7.2009 14.40, ext Marcos Caceres marc...@opera.com wrote:

 On Tue, Jun 30, 2009 at 10:27 AM, jere.kapy...@nokia.com wrote:
 On 29.6.2009 13.30, ext Marcos Caceres marc...@opera.com wrote:
 /JK1/ OK, I've checked it and I think it is now easier to find all the
 relevant stuff. However, the Localization guidelines section (now 8.1) is
 marked as non-normative, but I think it should be as normative as 8.2 and
 8.3. Especially since there is some CC behaviour described.
 
 Agreed. fixed.

DoC: OK, although in the July 1 ED the section is still marked as
non-normative. Maybe the section should be called Localization model to
make it sound more, eh, normative?

 
 The complex example refers to several files which really have the same
 purpose. I think they should also have the same name, otherwise they cannot
 be found by the same reference. That is, /locales/es/gatos.html should be
 called /locales/es/cats.html. Or is it intentional?
 
 Never is:) That is a left over mistake from when we had multiple configs.
 
 /JK2/ Looks like the third file in 8.4.2 is still labeled
 locales/es/gatos.html. Those cats are tough to manage! :-)
 
 Argh! Fixed.

DoC: OK

 This statement in the authoring guideline is puzzling: '[That is,] authors
 cannot simply put shared files into a language level folder, but need to
 put
 all files needed into the language level folder for the widget to work (for
 example, having a.gif in both /locales/zh-Hans/ folder and
 locales/zh).' Isn't this the opposite of what is supposed to happen in
 the
 fallback model? If the same a.gif is good for both zh-Hans and zh, it
 should be possible for the author to include it just once in /locales/zh.
 
 Yes, this is correct.
 
 If the user's language list includes 'zh-Hans', it will also include 'zh',
 as per Step 5. So a.gif will be found eventually.
 
 Right.
 
 /JK3/ So then, isn't the statement I quoted above actually incorrect?
 Authors *can* simply put files into a language level folder, so that
 duplicates of the exact same file are not needed. If they have the same name
 but the content is different, then that's OK too. (Meaning that a resource
 called 'a.gif' could be completely different for zh-hans-cn and zh, and what
 is ultimately retrieved depends on the UA locale.)
 
 One more way to put it is that if your UA locale is zh-hans-cn, and there is
 no 'locales/zh-hans-cn/a.gif', but there is a 'locales/zh/a.gif', then the
 latter would be found and used.
 
 If you agree with this reasoning, then I think the statement should be
 removed. Or am I misreading it somehow? This is actually pretty crucial to
 the working of the whole model, so it's important to determine that we do
 have the same understanding, and that the spec text doesn't lead readers up
 the garden path in any way.
 
 Ok, I think I stuffed the description up. I mean this:
 
   locales/en/b.gif
   locales/en-au/a.gif
   locales/en-us/a.gif
 
 If en is matched, then a.gif would obviously be missing (as it is
 not in locales/en/). I'm trying to say that authors should not rely on
 the top most level.
 
 I obviously need to dump the sentence in the spec, but I still need to
 make the above clear. Can you help me out with that?

I think the problem with the authoring guideline in 8.2 is that there are
too many disconnected examples. The third example is all that is needed.
When that is moved to the front of the section, then all the prose can be
about it.

Here is my suggestion. It gets rid of the two other examples and focuses on
the zh-Hans-CN example.

Authoring guideline:

Authors need to avoid region, script or other subtags except where they add
useful distinguishing information to a locale folder. In addition, avoid
including empty locale folders in a widget package (unless there is a good
reason to include them).

An example of a widget that uses folder-based localization:

widget.wgt  
  locales/zh-Hans-CN/a.gif
  locales/zh-Hans-CN/f.gif
  locales/zh-Hans/a.gif
  locales/zh-Hans/b.gif
  locales/zh/a.gif
  locales/zh/b.gif
  locales/zh/c.gif
  a.gif
  b.gif
  c.gif
  d.gif
  index.html
  config.xml

Authors can further facilitate the localization process by grouping files
into folder hierarchies made up of matching subtags, as is shown in the
example. Note also that the user agent treats any file or folder outside the
container for localized content as unlocalized content.

Assuming the widget's locale is zh-hans-cn:
- A reference to a.gif would resolve to locales/zh-Hans-CN/a.gif
- A reference to b.gif would resolve to locales/zh-Hans/b.gif
- A reference to c.gif would resolve to locales/zh/c.gif
- A reference to d.gif would resolve to d.gif, as it is not associated with
any locales and is hence available to all locales.
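
The lookup in the list above can be sketched as follows. This is only an illustration under stated assumptions: the names PACKAGE, fallback_chain and resolve are hypothetical, the locale is truncated one subtag at a time from the right, and entries are compared case-insensitively. It is not text from the spec.

```python
from typing import List, Optional, Set

# File entries of the example package above, relative to the package root.
PACKAGE: Set[str] = {
    "locales/zh-Hans-CN/a.gif", "locales/zh-Hans-CN/f.gif",
    "locales/zh-Hans/a.gif", "locales/zh-Hans/b.gif",
    "locales/zh/a.gif", "locales/zh/b.gif", "locales/zh/c.gif",
    "a.gif", "b.gif", "c.gif", "d.gif", "index.html", "config.xml",
}


def fallback_chain(locale: str) -> List[str]:
    """Drop subtags right to left: zh-hans-cn, then zh-hans, then zh."""
    subtags = locale.lower().split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def resolve(reference: str, locale: str,
            package: Set[str] = PACKAGE) -> Optional[str]:
    """Return the package entry a reference resolves to, or None."""
    # Index entries by their lowercased path (assumed case-insensitive match).
    by_folded = {entry.lower(): entry for entry in package}
    for tag in fallback_chain(locale):
        hit = by_folded.get(f"locales/{tag}/{reference}".lower())
        if hit is not None:
            return hit
    # Nothing localized was found: use the unlocalized file at the root.
    return by_folded.get(reference.lower())


for name in ("a.gif", "b.gif", "c.gif", "d.gif"):
    print(name, "->", resolve(name, "zh-hans-cn"))
```

With the widget locale set to zh-hans-cn, the four references resolve to the same entries as in the list above.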

This works at all sub-levels, so long as the parent subtag matches the child
subtags. So, for example, the CN region can make use of the localized
files in the zh-Hans folder level, the zh folder level, and the
unlocalized files at the 

Re: [widgets] P&C LC comments on I18N/L10N

2009-06-30 Thread Jere.Kapyaho
Hi Marcos,

thanks for your effort. See below for specific points. (Marked accordingly,
see end of this e-mail for the legend.)

On 29.6.2009 13.30, ext Marcos Caceres marc...@opera.com wrote:

 Hi Jere,
 
 Fixes and some questions below. I got stuck on your last point, can
 you please clarify it or suggest more clearly what you want me to do
 there?

Sure, see inline below.

 Right. I've moved everything and renamed "Examples" to "Localization examples".
 
 This would make the material flow better and have all the concepts defined
 before they are used.
 
 I will need you to check this before we republish. Is that OK?
 
 Also, the whole of Section 7 should actually appear right before the
 processing steps (i.e., after the current Section 8).
 
 I moved everything as you suggested.

/JK1/ OK, I've checked it and I think it is now easier to find all the
relevant stuff. However, the Localization guidelines section (now 8.1) is
marked as non-normative, but I think it should be as normative as 8.2 and
8.3. Especially since there is some CC behaviour described.

 The complex example refers to several files which really have the same
 purpose. I think they should also have the same name, otherwise they cannot
 be found by the same reference. That is, /locales/es/gatos.html should be
 called /locales/es/cats.html. Or is it intentional?
 
 Never is:) That is a left over mistake from when we had multiple configs.

/JK2/ Looks like the third file in 8.4.2 is still labeled
locales/es/gatos.html. Those cats are tough to manage! :-)

 This statement in the authoring guideline is puzzling: '[That is,] authors
 cannot simply put shared files into a language level folder, but need to put
 all files needed into the language level folder for the widget to work (for
 example, having a.gif in both /locales/zh-Hans/ folder and
 locales/zh).' Isn't this the opposite of what is supposed to happen in the
 fallback model? If the same a.gif is good for both zh-Hans and zh, it
 should be possible for the author to include it just once in /locales/zh.
 
 Yes, this is correct.
 
 If the user's language list includes 'zh-Hans', it will also include 'zh',
 as per Step 5. So a.gif will be found eventually.

 Right.

/JK3/ So then, isn't the statement I quoted above actually incorrect?
Authors *can* simply put files into a language level folder, so that
duplicates of the exact same file are not needed. If they have the same name
but the content is different, then that's OK too. (Meaning that a resource
called 'a.gif' could be completely different for zh-hans-cn and zh, and what
is ultimately retrieved depends on the UA locale.)

One more way to put it is that if your UA locale is zh-hans-cn, and there is
no 'locales/zh-hans-cn/a.gif', but there is a 'locales/zh/a.gif', then the
latter would be found and used.

If you agree with this reasoning, then I think the statement should be
removed. Or am I misreading it somehow? This is actually pretty crucial to
the working of the whole model, so it's important to determine that we do
have the same understanding, and that the spec text doesn't lead readers up
the garden path in any way.

 Priority is probably a bad term to use with regard to localized folders.
 
 I changed it to:
 [[
 In the example below, assuming the widget's locale is zh-hans-cn:
 * The a.gif file in the zh-Hans-CN locale folder would be used
 instead of the a.gif file in the zh-Hans locale folder.
 * The b.gif file in the zh-Hans locale folder would be used instead of
 the b.gif file in the zh locale folder.
 * The c.gif in the zh locale folder would be used instead of the c.gif
 file root of the widget package.
 * The d.gif file would always be used from the root of the widget, as
 it is not associated with any locales and is hence available to all
 locales.
 ]]

I think this now accurately explains what should happen, thanks.

 /4/ The xml:lang attribute
 
 Does the XML specification state that the values of xml:lang attributes must
 be unique across instances of the same element?
 
 No.
 
 If yes, it is probably
 redundant to repeat that in the context of all the elements in the
 configuration document. If not, the statement about uniqueness could still
 be factored out, for example to section 8.4, to avoid repetition.
 
 Although redundant, I think I will leave this as is. If it's still
 annoying, we can remove it in CR as it would be an editorial change
 and it would have no normative impact. Is that OK?

Yes, that's OK. (I was just trying to eradicate duplicate text.)

 In the first example of Step 5, why would en and en-au be swapped around
 when decomposed?
 
 I was missing an en, which might have been causing confusion. Taking
 the first part of the example:
 en-us,en-au,fr,en
 
 Would become:
 en-us,en-au,fr,en
 
 And would normalize to:
 en-us,en,en-au

Sorry, but that doesn't correspond to anything I see in the spec now... :-(
Maybe my original comment was misdirected, but you kind of lost me there.

 _however_, during 

[widgets] P&C LC comments on I18N/L10N

2009-06-03 Thread Jere.Kapyaho
Marcos, all,

here's a bunch of comments related to the I18N/L10N related parts of the
Widgets 1.0: Packaging and Configuration spec (Last Call i.e. Working Draft
28 May 2009).

/1/ Order of material

From an editorial standpoint, I think that the 7.2 Examples subsection
should really be the last one in this section.

Information about element-based localization, which now is 8.15, should
appear in section 7, because the concepts are used in Section 8 before they
are even defined. It would be good to have all related info in the same
section.

My suggestion for the organization of the material is:

7 Internationalization and localization
7.1 Localization guidelines
7.2 Folder-based localization (used to be 7.3)
7.3 Element-based localization (used to be 8.15)
7.4 Localization examples (used to be 7.2 Examples, note new name)

This would make the material flow better and have all the concepts defined
before they are used.

Also, the whole of Section 7 should actually appear right before the
processing steps (i.e., after the current Section 8).


/2/ Content of examples

The localization examples are non-normative, but many developers are going
to study them closely, so it pays to fine-tune them a little (and fix a few
bugs).

In the simple example, the second file /sp/index.html should be labeled
/locales/es/index.html. Similarly, in the complex example, /locales/sp/
should be /locales/es.

The complex example refers to several files which really have the same
purpose. I think they should also have the same name, otherwise they cannot
be found by the same reference. That is, /locales/es/gatos.html should be
called /locales/es/cats.html. Or is it intentional?

In Fallback Behaviour Example, first paragraph, last sentence should read:
The purpose of this 'fallback' model is to reduce the number of files that
need to be created in order to achieve localization of a widget package.
(remove 'n' from 'then', add 'in order')


/3/ Folder-based localization

Suggested addition to the authoring guideline: A Conformance Checker (CC)
SHOULD issue a warning if there are empty locale folders in the widget
package.

This statement in the authoring guideline is puzzling: '[That is,] authors
cannot simply put shared files into a language level folder, but need to put
all files needed into the language level folder for the widget to work (for
example, having a.gif in both /locales/zh-Hans/ folder and
locales/zh).' Isn't this the opposite of what is supposed to happen in the
fallback model? If the same a.gif is good for both zh-Hans and zh, it
should be possible for the author to include it just once in /locales/zh.
If the user's language list includes 'zh-Hans', it will also include 'zh',
as per Step 5. So a.gif will be found eventually.

Replace 'shared1.gif, shared2.gif' with 'shared1.gif and shared2.gif'.

Priority is probably a bad term to use with regard to localized folders.


/4/ The xml:lang attribute

Does the XML specification state that the values of xml:lang attributes must
be unique across instances of the same element? If yes, it is probably
redundant to repeat that in the context of all the elements in the
configuration document. If not, the statement about uniqueness could still
be factored out, for example to section 8.4, to avoid repetition.


/5/ Processing steps

Step 3: the concept of localized config doc appears in the Configuration
Defaults table, but that doesn't seem to exist anymore. (See also Step 6.)

Step 5: replace "unprocessed locales lists" with "unprocessed locales list"
throughout.

Consider replacing "user agent's locales" with "user agent locales"
throughout the spec.

In the first example of Step 5, why would en and en-au be swapped around
when decomposed? Would 'canonicalization' be a more suitable term here than
'decomposition'?


/6/ Runtime resolution of localized resources

What specification should describe how a reference to a resource (which
could be in a localized folder) is resolved at runtime, based on the user's
language range? That is not quite in the domain of the packaging
specification, given that it is runtime behavior. This functionality is
sketched in the fallback behavior example, but it is non-normative.


Thanks for considering these comments.

Best regards,
Jere @ Nokia




[widgets] P&C LC, general comments

2009-06-03 Thread Jere.Kapyaho
Marcos, all,

some more review comments for Widgets 1.0 Packaging and Configuration LC
(Working Draft 28 May 2009), this time of a more general nature (not so much
L10N-related).


/1/ RFC 2119 keyword usage

In writing specs, the work split between MAY and SHOULD is sometimes
problematic. I tend to use MAY when there is something that is slightly out
of line but permissible in some cases, and SHOULD when it is generally a
very good idea or almost mandatory to do something, but it can be skipped
given enough reasons.

This is why I reacted to this statement in section 3.1:
A user agent MAY support the [Widgets-DigSig] specification ...
And would have written it as:
A user agent SHOULD support ...

The same goes for section 3.2. I would say: a user agent SHOULD support the
its:span element and the its:dir attribute, and MAY support other ITS
elements and attributes. Also, the right name for ITS is
Internationalization Tag Set (ation vs. ed).

4 Conformance Checker
Last sentence in the section: CCs are NOT REQUIRED to display all messages
at once. Here, NOT REQUIRED is not valid RFC 2119 keyword usage. Suggest
to remove this sentence altogether, and change the preceding sentence to
say: The wording and presentation of the advisory messages are
implementation-dependent.

5 Zip Archive
It would be enough to say: The packaging format for the files of a widget
is the Zip archive format as defined in the [ZIP] file format
specification.

For conformance (last paragraph in section before note) I suggest, for
clarity: To conform to this specification, a Zip archive MUST contain one
or more file entries and MUST be a valid Zip archive. (cf. MUST NOT be
invalid)

5.2 Version of Zip
For Zip version 2.0 features, the MAYs should not be marked up as keywords.

8.5 The widget element, definition of width attribute: Which view modes
SHOULD honor the value of this attribute is defined in the
[Widgets-Views] specification. This is invalid keyword usage, suggest to
replace with: The view modes that honor the value of this attribute are
defined in the [Widgets-Views] specification.

 
/2/ UTF-8 encoding

So, we still can't mandate the use of UTF-8 as the path encoding? Of course,
RECOMMENDED is fairly strong, but I would prefer MUST. As soon as you move
out of the US-ASCII range, the bytes in CP437 and UTF-8 will be different even
for the limited number of non-US-ASCII characters that CP437 can represent.
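
To make the byte-level difference concrete, here is a small sketch using Python's built-in cp437 and utf-8 codecs. The file names are made up for illustration.

```python
# Encode the same path under both encodings and compare the raw bytes.
ascii_name = "index.html"
accented_name = "café.html"  # 'é' is representable in CP437 and in Unicode

# ASCII-only names are byte-identical in both encodings.
ascii_same = ascii_name.encode("cp437") == ascii_name.encode("utf-8")

# Outside US-ASCII the byte sequences diverge.
cp437_bytes = accented_name.encode("cp437")
utf8_bytes = accented_name.encode("utf-8")

print(ascii_same)
print(cp437_bytes.hex(" "))
print(utf8_bytes.hex(" "))
```

CP437 stores 'é' as the single byte 0x82, while UTF-8 stores it as the two bytes 0xC3 0xA9, so a reader that guesses the wrong encoding gets a different file name.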

Marcin has already commented on the ABNF of utf8-chars, I'll participate in
that discussion if necessary.

5.4 Reserved Characters: suggest to reverse the order of the glyph and
character columns.

8.11 The content element, definition of charset attribute:
- Despite widespread use, the name charset is not good. A more accurate
name would be encoding.
- Suggest to replace (a user agent are REQUIRED to support [UTF-8]) with
User agents MUST support [UTF-8].


/3/ JPEG icons

This may have been beaten to death before, but I wondered why a JPEG file is
not allowed as the default icon.


/4/ Configuration document

It is not currently an error to have a file called 'config.xml' anywhere
inside the widget's folder structure. The only one that matters is the one
in the root folder. Maybe the CC should warn about the presence of multiple
config.xml documents?
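
A sketch of what such a CC check might look like. The function name stray_config_documents is hypothetical, and comparing the file name case-insensitively is an assumption, not something taken from the spec.

```python
from typing import Iterable, List


def stray_config_documents(entries: Iterable[str]) -> List[str]:
    """Return entries named config.xml that are not in the package root."""
    strays = []
    for entry in entries:
        # Split "folder/sub/name" into the folder part and the file name.
        folder, _, name = entry.rpartition("/")
        # Assumption: the name is matched case-insensitively.
        if name.lower() == "config.xml" and folder:
            strays.append(entry)
    return strays


entries = ["config.xml", "index.html", "locales/en/config.xml"]
for stray in stray_config_documents(entries):
    print("warning: extra configuration document ignored:", stray)
```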

In 8.2 Proprietary Extensions, first para, last sentence: For the sake of
interoperability, extensions to the configuration document are NOT
RECOMMENDED. Two problems: 1) NOT RECOMMENDED is not valid keyword usage;
2) it is OK to extend the configuration document because there is a
well-defined mechanism that uses namespaces. Suggestion: remove this
statement.

Version attribute: in the ABNF, doesn't 1*2DIGIT mean that a version
number like 123 would be invalid? Especially build numbers are often three
or four digits long. Of course, this is only a guideline, but still...
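
In ABNF (RFC 5234), 1*2DIGIT means at least one and at most two digits. The sketch below checks only that repetition, with a regular expression standing in for the ABNF rule; the rest of the version grammar is not reproduced here, and the function name is hypothetical.

```python
import re

# ABNF "1*2DIGIT": at least one and at most two decimal digits.
ONE_OR_TWO_DIGITS = re.compile(r"[0-9]{1,2}")


def matches_1_2_digit(component: str) -> bool:
    """True if the whole component is one or two ASCII digits."""
    return ONE_OR_TWO_DIGITS.fullmatch(component) is not None


for candidate in ("7", "42", "123"):
    print(candidate, matches_1_2_digit(candidate))
```

Under this reading a three-digit component such as 123 is rejected.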

Numeric attribute: apparently all numeric attributes are non-negative. Isn't
there any use case for negative numbers, ever? Also, non-Western numerals
are not acceptable by this definition, but that might not be a showstopper
in this case.

Keyword attribute: Is the set of allowable characters inherited straight
from the XML spec? Is it self-evident that keywords can't contain spaces
because it is the keyword list separator? Couldn't comma also be used as a
separator?

Boolean attribute: it is stated that only true and false are valid
values, but that if the value is something else, the default is false. That
seems like a contradiction. Are values other than literal true and false
supposed to be actual errors, or are they silently coerced into false? (If I
accidentally type required=rue it would come out as false...)
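
The two possible readings can be put side by side. Both function names are hypothetical; neither is claimed to be what the spec intends.

```python
def parse_boolean_lenient(value: str) -> bool:
    """Reading 1: anything other than the literal "true" becomes False."""
    return value == "true"


def parse_boolean_strict(value: str) -> bool:
    """Reading 2: only "true" and "false" are valid; the rest is an error."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid boolean attribute value: {value!r}")


print(parse_boolean_lenient("rue"))  # the typo is silently accepted
try:
    parse_boolean_strict("rue")
except ValueError as error:
    print(error)  # the typo is reported
```

Under the lenient reading the typo goes unnoticed, which is exactly the case a conformance checker would need to flag.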


/5/ Rule for Parsing a Non-negative Integer

Couldn't we just refer to some well-established definition, if there is one
handy? Seems like this must have been done many times before. And what about
those negative numbers, then? Not needed? :-)

Hope this helps,
Jere @ Nokia




Re: [widgets] Base folder and resolution of relative paths

2009-05-13 Thread Jere.Kapyaho
Hi Francois,

the idea of treating the widget package essentially as a local HTTP server
in terms of resources is quite interesting, although like you said there
will be differences. In terms of localized resources, it seems a good
strategy to use the same mechanism as HTTP and Accept-Language to select
localized resources. I don't know if anyone has done it like this before in
the context of an application; maybe not. But somehow I think we don't go
all the way to q values. (?)

The most important concept here is really that the locale (or language tag,
if you will) is specific to any given resource. That does have the risk of
having partially localized applications, as you observed, but that is really
the developer's concern. They just need to make a concerted effort to
localize all that is needed.

Of course there will always be widgets that have nothing localized, so the
mechanism must allow the fallback content to be found as the last step of
the lookup. The drawback there is that if resources are always looked for
according to the UA language list, it might be wasted effort (and a
performance hit) for those widgets that don't have anything localized. The
search would go all the way from the deepest branch to the root in vain
before finding the fallback content.

I'm thinking the widget locale (as derived from the UA language list) should
be organized by primary subtag so that the most specific tag comes first,
like so: "en-us, en, fr-ca, fr-fr, fr". At least Marcos seemed to like this
idea. That would simplify things by making the base folder essentially the
first tag in the widget locale. (Or could this be folded to just en-us,
fr-ca, fr-fr? Content negotiation should know to drop the subtags
successively anyway.)
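
One way to derive such an ordering, as a rough sketch. The function name widget_locales is hypothetical, and the rule used here (group by primary subtag in the user's order, then put tags with more subtags first within each group) is one reading of the idea above, not settled behaviour.

```python
from typing import Dict, List


def widget_locales(ua_languages: List[str]) -> List[str]:
    """Expand and order a UA language list by primary subtag."""
    # primary subtag -> tags seen for it, in first-seen order
    groups: Dict[str, List[str]] = {}
    for language_range in ua_languages:
        subtags = language_range.lower().split("-")
        group = groups.setdefault(subtags[0], [])
        # Add the range itself and every shorter prefix (en-us, then en).
        for n in range(len(subtags), 0, -1):
            tag = "-".join(subtags[:n])
            if tag not in group:
                group.append(tag)
    ordered: List[str] = []
    for group in groups.values():
        # Stable sort: more subtags first, ties keep the user's order.
        ordered.extend(sorted(group, key=lambda tag: -tag.count("-")))
    return ordered


print(", ".join(widget_locales(["en-us", "fr-ca", "fr-fr"])))
```

For the input en-us, fr-ca, fr-fr this yields the list quoted above, which also shows that the shorter list carries the same information.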

Assumptions/premises:
- The content negotiation is applied only to widget: URIs (incidentally,
would a relative URI inside widget content then always be considered a
widget: URI?)
- When referring to content inside the widget package, the
'locales/some-language-tag' part is never included in the URI. Instead, the
content negotiation mechanism should kick in and find the resource given
just a relative URI like 'flag.png'.
- If the content negotiation finds no localized content (i.e. there is no
requested representation for the resource), a resource with the specified
name in the root of the widget is used. If not present, it is probably an
error.

I'm trying to help Marcos hammer out these things in the P&C spec, so I
really appreciate your feedback.

--Jere

-- 
Jere Käpyaho (jere.kapy...@nokia.com)
Specialist, Developer Platforms Standardization
Devices R&D, Nokia Corporation



On 11.5.2009 11.26, ext Francois Daoust f...@w3.org wrote:

 Hi Jere,
 
 Let me try to clarify my thoughts...
 
 Note I'm actually suggesting an amended C3: one locale per resource, and
 one locale for element-based localization.
 
 
 For folder-based localization, my rationale is that it's useful to view
 a widget package as a local HTTP server because it allows one to create
 a widget from an existing server-based web site with limited changes.
 There will always be differences but trying to stick to HTTP rules when
 possible (and mostly harmless, i.e. when there's no strong benefit not
 to follow these rules) sounds like a good idea.
 
 In HTTP exchanges, the locale of a resource is resource-specific: the
 server returns the resource in a locale that best matches the list
 defined in the Accept-Language HTTP header field of the HTTP request.
 Two requests with the same language list sent to two different resources
 may return resources in two different languages. That's up to the
 content provider to make sure that things are consistent. I'm suggesting
 that you use the same mechanism for folder-based localization. In
 practice, this means that there is no one locale for folder-based
 localization, it's rather one locale per resource. It sure makes it
 possible to have half of a widget in Russian while the other half is in
 Chinese, which does not make a lot of sense. But then the same thing is
 possible with server-based web sites, and returned languages are based
 on the list of languages supposedly understood by the end user anyway.
 
 
 For element-based localization, I don't feel strongly one way or the
 other. Here, we're not talking about one resource per language, but
 about one resource that contains information in different languages. It
 seems better to define one locale for the whole resource for consistency
 reasons, and that's what C3 suggests.
 
 Francois.
 
 
 
 jere.kapy...@nokia.com wrote:
 Hi Francois et al,
 
 reading the comments below, I'm puzzled as to why C3 would be preferable. It
 says:
 
 ... two different locales: one specifically for folder-based localization,
 and the other specifically for element-based localization.
 
 What is the rationale for this? I would think that the locale influencing
 the selection of material should be the same for both (and use default
 unlocalized 

Re: I18N issue: case-sensitivity of locale subdirectories

2009-05-07 Thread Jere.Kapyaho
On 5.5.2009 13.16, ext Marcos Caceres marc...@opera.com wrote:
 On Wed, Apr 29, 2009 at 4:16 PM, Robin Berjon ro...@berjon.com wrote:
 Assume we have two localisation subdirectories:
 
  locales/en/
  locales/EN/
 
 What happens? BCP47 (which we reference) is defined to be case-insensitive
 so it doesn't help us much in this respect.
 
 There are multiple options:
 
  a) we define a canonical casing and all others are ignored;
  b) we select an order of priority and we only consider one (the first to
 match);
  c) we select an order of priority and we merge them all (in that order,
 with a given precedence rule);
  d) the device on which the user agent is catches fire.
 
 I think that (a) should be ruled out because as BCP47 tells us, ISO639-1
 recommends lowercase (language codes), ISO3166-1 recommends uppercase
 (country codes), and ISO15924 recommends titlecase (script codes). These are
 different, but likely to be confusing, and I don't think that developers
 should have to worry about that.
 
 Agreed.

Because BCP47 is indeed case-insensitive [1], both en and EN (and also
eN and En) are considered equivalent. While it is probably an oversight
or error to provide several variants of the same language tag with different
character case anyway, they need to be considered somehow because they *are*
equivalent, unless it is made explicit that this is an error in the
packaging.

The path inside the widget's ZIP file is already defined as
case-insensitive, so it is actually already an error to have two or more
folders with names that differ only by character case. Even if some
implementation unzips the content of the widget to a local filesystem, we
have no control over whether filenames in that filesystem are
case-insensitive or case-sensitive.

 I don't have a strong opinion on this, but I do have a preference for a
 rule based on (b): if multiple locale subdirectories have the same
 case-insensitive name, then the one that comes first in ASCII-code order
 (e.g. in order: EN, En, eN, en) is used and the others are ignored.
 
 This seems reasonable. I will add this.

I suggest that the widget packaging rules say that any localized folders
must be unique in terms of a case-insensitive match, otherwise the packaging
is invalid [2]. This also allows us to not talk about ASCII code ordering.
Furthermore, there is then no need to merge the contents of such folders.

For the degenerate (but unfortunately unavoidable) case where someone has
managed to slip in two or more such folders, define a canonical casing
(obvious suggestion: lowercase) and use it, then simply ignore any others.
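
A packaging check along these lines could look roughly like this. The function name colliding_locale_folders is hypothetical, and lowercasing is used as the canonical form as suggested above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def colliding_locale_folders(folder_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lowercased name to the folders that collide on it."""
    by_folded: Dict[str, List[str]] = defaultdict(list)
    for name in folder_names:
        by_folded[name.lower()].append(name)
    # Keep only names that occur more than once after lowercasing.
    return {folded: names for folded, names in by_folded.items()
            if len(names) > 1}


collisions = colliding_locale_folders(["en", "EN", "fr", "zh-Hans"])
for folded, names in collisions.items():
    print(f"invalid package: {names} all mean '{folded}'")
```

An empty result means the localized folders are unique under a case-insensitive match.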

 The argument in favour of only using one is that we already have to merge
 multiple directories, and adding one merge operation for what is in all
 probability a user error seems like too much complexity for little value
 (I'm happy to be contradicted by implementers however). Picking ASCII-code
 order is based on the fact that the directory names must be ASCII here (the
 others must be discarded), and picking the first is arbitrary.
 
 Thoughts?
 
 I support b. Added some of your text above to the spec.

I guess none of a)-d) really fit my observations as such. It's more like
additional packaging rules + shades of a).

Note that for comparisons with the widget locale value you still need to
case-fold [3] everything anyway. There is no guarantee that the widget
locale matches any localized subfolder name as such, because the widget
locale itself could use capitalization that really carries no meaning, but
fails to match any localized folder unless you do a case-insensitive
comparison. In this case the comparison can be also language-insensitive,
because BCP47 language tags consist of US-ASCII characters.
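
As a rough illustration of the comparison described above, here is a small
Python sketch. The helper names are made up for this example. It lowercases
only A-Z, so the result does not depend on any language-specific case rules.

```python
# Translation table that maps only the ASCII letters A-Z to a-z.
ASCII_LOWER = {code: code + 32 for code in range(ord("A"), ord("Z") + 1)}


def ascii_lower(tag):
    """Lowercase ASCII letters only; every other character is left untouched."""
    return tag.translate(ASCII_LOWER)


def find_locale_folder(widget_locale, folder_names):
    """Return the folder whose name matches the widget locale, ignoring ASCII case."""
    wanted = ascii_lower(widget_locale)
    for name in folder_names:
        if ascii_lower(name) == wanted:
            return name
    return None
```

With this, a widget locale of "en-US" would match a folder named "en-us", and
find_locale_folder() returns None when no folder matches.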

Hope this helps,
Jere

[1] http://tools.ietf.org/html/bcp47#section-2.1
[2] http://dev.w3.org/2006/waf/widgets/#invalid-widgets
[3] http://www.w3.org/International/wiki/Case_folding

 [0] http://dev.w3.org/cvsweb/~checkout~/2006/waf/widgets/i18n.html?rev=1.29&content-type=text/html;%20charset=utf-8
 
 --
 Robin Berjon - http://berjon.com/
    Feel like hiring me? Go to http://robineko.com/
 

 --
 Marcos Caceres
 http://datadriven.com.au

-- 
Jere Käpyaho (jere.kapy...@nokia.com)
Specialist, Developer Platforms Standardization
Devices RD, Nokia Corporation




Re: Widgets 1.0 Packaging and Configuration: I18N comments...

2009-04-16 Thread Jere.Kapyaho
Marcos,

thanks for a lucid and thorough widget I18N model proposal. This should really 
help all concerned to come to agreement about how widgets should be 
internationalized and localized. Also in this case I18N turned out to be more 
than many perhaps thought it would, but the effort is worth it, given the 
importance of being able to present widgets in the user's language.

I've taken a first look at the proposal document (rev1.19 as of April 15); 
please accept a few comments.

Wrt PROPOSAL A2, I would strongly recommend that the single language tag be 
non-empty. There is always some default language that can be retrieved or 
derived. Allowing an empty language tag serves no purpose, and really just 
complicates matters.

Wrt PROPOSAL A1, the same thing applies: the language tag list should be 
non-empty, although I'm not convinced that multiple language tags are of value. 
In this context, it should also be noted that the same widget probably 
shouldn't use content labeled with two or more completely disjoint language 
tags. At least the language must be a common denominator, even if subtags 
differ. So there should never be a case where the user's preferred language 
list is used naively when locating resources that are not found. (Example from 
A1 shows en-us,en,fr,* -- in this case if you want resource X but it is not 
found for 'en-us' or 'en', then you shouldn't go looking for an 'fr' version of 
it.)
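
To make the point concrete, here is a small Python sketch of one possible
reading: keep only those entries of the preference list that share the primary
language subtag of the first entry. The function name and the comma-separated
input format are assumptions made for this example, not something the proposal
defines.

```python
def same_language_ranges(preferred):
    """Keep the ranges whose primary language subtag equals that of the first one.

    `preferred` is a comma-separated list in order of preference,
    for example "en-us,en,fr,*".
    """
    ranges = [item.strip().lower() for item in preferred.split(",") if item.strip()]
    if not ranges:
        return []
    primary = ranges[0].split("-")[0]
    return [item for item in ranges if item.split("-")[0] == primary]
```

For the list from A1, this keeps 'en-us' and 'en' and drops 'fr' and '*', so a
resource missing for English would fall back to default content rather than to
a French version.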

In "The widget's locale", the first question strikes me as odd. The localizable 
materials in the configuration document and in the folders seem disjoint 
enough that there is no issue of mixing those two. The widget's locale is the 
same for both, as it is determined from the UA locale. If there are no entries 
for the widget's locale in the config file, or if there is no relevant folder 
content, then a fallback/default is used.

The second question should be simple enough: the widget's locale should be 
represented as a single language tag, based on the UA locale. If there are 
indeed multiple preferred languages, in the order of preference, then it is 
the first one.

IMHO there shouldn't be any overlap between folder content and config document. 
They serve different purposes: the config document has mostly housekeeping 
data, whereas folder content is presented to the user dynamically at runtime. 
This is why I don't like PROPOSAL B3. The locale for both config elements and 
folders must be the same.

My preference is clearly to have one well-defined, non-empty language tag 
represent the widget's locale. If there are no config elements or folder 
content for that, then there must be fallback elements / content that is not 
tagged with anything, and that serves as the default data. Note that this is 
different from what is described in PROPOSAL C1. Unlocalized content can and 
should be used even if there is a derived widget locale, because somebody just 
didn't provide localized content for that particular language tag. I think 
PROPOSAL C3 is the closest to this approach, but it considers the widget locale 
to be more than one language tag.

Neither PROPOSAL D1 nor D2 says exactly what I at least would like. PROPOSAL D2 
comes close, but any resources that are not tagged with a language tag (i.e. 
are default resources) should probably go in the 'locales' folder. However, it 
doesn't really matter where they are found, so if it is OK to place arbitrary 
files in the root of the widget anyway, then the defaults might just as well be 
there.

Regarding xml:base, PROPOSAL E1 is the most sensible one. This is what 
Dashboard must be doing also, although someone more intimate with WebKit might 
want to have a look there. However, the issue of the subtags complicates things 
a bit: if you really want to find resources that are not available for the most 
specific tag, but could exist for a less specific tag, then it is not enough to 
have just a static xml:base, but additional processing rules for resource 
access are needed. If subtags are not honoured, and the language tag must 
always be an exact match, then a static xml:base would be enough, I guess. I 
don't know where we currently stand on the subtag issue, but it seems like an 
important use case.
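
The extra processing could look something like the following Python sketch. It
assumes a 'locales/<tag>/' folder layout and unlocalized defaults in the root
of the widget; both are assumptions for illustration, as are the function
names. The truncation is a simplified form of the lookup scheme in RFC 4647
(it does not treat single-letter subtags specially).

```python
def lookup_candidates(widget_locale):
    """List the tag and its progressively shorter prefixes, most specific first."""
    subtags = widget_locale.lower().split("-")
    return ["-".join(subtags[:count]) for count in range(len(subtags), 0, -1)]


def resolve(path, widget_locale, files):
    """Find `path` for the widget locale, falling back to the unlocalized default.

    `files` is the set of file paths present in the package (hypothetical input).
    Returns the path to use, or None if the resource does not exist at all.
    """
    for tag in lookup_candidates(widget_locale):
        candidate = "locales/" + tag + "/" + path
        if candidate in files:
            return candidate
    return path if path in files else None
```

With a static xml:base only the first candidate would ever be tried, which is
why an exact-match rule and a subtag-aware rule lead to different designs.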

Hope I've added at least some value to the discussion. It might be good to ask 
other WG members to provide their explicit preferences about the different 
proposals, or at least input such as I have. I hope and think you can extract 
some of my preferences from above.

Best regards,
Jere


On 3.4.2009 18.02, ext Marcos Caceres marc...@opera.com wrote:

Hi Addison,

On Fri, Apr 3, 2009 at 4:38 PM, Phillips, Addison addi...@amazon.com wrote:
 Hello Webapps,

 Thanks for the response. Is there a new draft or editor's copy where these 
 changes are made?

Yes, see [1]; but we are still working out the details. As this change
caused some radical changes in the spec, I am still working out how to
make the processing model work. I'm currently 

Re: [widgets] Zip endian issue?

2009-04-03 Thread Jere.Kapyaho
Well,

the ZIP file specification does say that all values are stored in little-endian 
byte order unless otherwise specified. The local file header signature is the 
four bytes 50 4B 03 04, in this order, always. Endianness is not even an issue, 
if you read and compare individual bytes.
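
A short Python sketch of the byte-wise check, for illustration only. The helper
names are hypothetical. It also includes the pair-swapping experiment
suggested in the quoted chat, using an archive built in memory with the
standard zipfile module.

```python
import io
import zipfile

# Local file header signature, compared byte by byte in file order.
ZIP_LOCAL_FILE_HEADER = bytes([0x50, 0x4B, 0x03, 0x04])


def looks_like_zip(data):
    """True if the first four bytes are 50 4B 03 04, in exactly this order."""
    return data[:4] == ZIP_LOCAL_FILE_HEADER


def make_tiny_zip():
    """Build a ZIP archive with one small file, entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hi")
    return buffer.getvalue()


def swap_byte_pairs(data):
    """Swap each pair of adjacent bytes, simulating a 'transposed' file."""
    swapped = bytearray(data)
    for index in range(0, len(swapped) - 1, 2):
        swapped[index], swapped[index + 1] = swapped[index + 1], swapped[index]
    return bytes(swapped)
```

Because the comparison never interprets the four bytes as an integer, the byte
order of the host does not enter into it. The pair-swapped data simply fails
the check, as any other corrupted file would.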

How could the ZIP file be entirely transposed on media? It is a binary file; 
if some entity is transposing it, it's not the same file anymore. A practical 
example of where and how this would be happening is needed.

--Jere


On 3.4.2009 11.54, ext Marcos Caceres marc...@opera.com wrote:

Hi,
I recently chatted with Josh and he pointed out that there might be an
endian issue when checking for magic numbers.
[[
timeless.b...@gmail.com:  zip files are guaranteed not to have ENDIAN issues?
If the first four bytes of the potential Zip archive do not match the
magic numbers for a Zip archive (50 4B 03 04)
 me:  ah, good point
 timeless.b...@gmail.com:  i.e. could you have 4B500403
or whatever the correct endian corruptions are :)
 me:  All values
 are stored in little-endian byte order unless otherwise specified.
 timeless.b...@gmail.com:  yeah, but could the file be entirely
transposed on media
test it.. create a zip file w/a single tiny file and then use some
perl magic to swap each pair of bytes :)
(i'm not awake enough to do that)
 me:  neither am I
]]

Anyone with experience in this area want to propose some text for the
PC spec to fix this issue.

Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au




Re: [widgets] Further argument for making config.xml mandatory

2009-03-22 Thread Jere.Kapyaho

In the context of the discussion about having a mandatory config file, I 
proposed to simplify matters even further and have just one config file, with 
the note that this proposal could be ignored in the interest of time and/or 
effort. There are pros and cons to both approaches, as Marcos has itemized. 
While I think it was great of Marcos to consider this proposal, this issue 
could really be resolved either way, and I certainly don't want this to hold up 
the spec. Then again, not all have expressed sufficient rationale to back up 
their opinion, or expressed any opinion at all.

The configuration file does not seem to contain so much translatable data that 
it would be an issue for localization, and that issue was not raised by the W3C 
I18N WG either (on the contrary, rather). That was why I thought the 
simplification was a good idea, but I'm also OK with multiple config files. For 
the record, you can still count me as backing the idea of a single, mandatory 
config file, with some multiple elements distinguished by an xml:lang attribute.

--Jere


On 22.3.2009 20.06, Art Barstow art.bars...@nokia.com wrote:



On Mar 19, 2009, at 12:06 PM, ext Marcos Caceres wrote:

 On Thu, Mar 19, 2009 at 4:52 PM, Andrew Welch
 andrew.j.we...@gmail.com wrote:
 That's exactly what I was talking about when I said 'even though
 the XML i18n guidelines say it's bad practice'.

 Ahh very sorry, I just saw the email after that containing the code
 sample, and gmail collapses the quoted parts my bad.


 However, Addison Phillips, the
 Chair of i18n core, said the following in the formal feedback
 representing the i18n WG's LC comments for the spec [1]:

 Section 7.4 (Widget) The various language bearing elements such as
 name, description, etc. are of the zero-or-one type. However, it
 is typically better to allow any number of these elements to occur,
 provided that none share the same xml:lang. This allows for
 localization (which is part of the point in allowing xml:lang on the
 element).

 So we have been blessed by them to do this... umm this somewhat
 questionable, yet problem solving thing :)

 [1] http://lists.w3.org/Archives/Public/public-webapps/2009JanMar/
 0259.html

 That's interesting, because xml:lang seems pretty redundant
 otherwise!

 Alright, let's see a show of hands for this approach! Who supports us
 just having a single config.xml with a bunch of repeated elements, but
 with different xml:langs?

 Advantages here are:
   *  we only need to make very small modifications to the parsing
 model.
   *  no more searching for config docs in locale folders
   *  no multiple parsing of config files

 Disadvantages:
  * large, and, if not careful, hard to maintain config files

My experience working with localizers is that separating their data
(concerns) into separate files was a good model and eliminates
potential issues with a localizer accidentally removing or changing
some other part of the config file.

Marcos - what part(s) of the old/current model - i.e. where there is
a root config file plus a config file may also be installed in a
locale-specific directory and the locale-specific config file would
only contain those parts that are localized - are not properly
specified (i.e. needs more spec work)?

All - my take of positions here is as follows. Please let us know if
this data is not correct and if you have not indicated your
preference, please do so ASAP (before the March 26 Voice Conference):

Only one root config file: Jere, Marcos

A root config file plus one per locale directory: Benoit, Josh

-Regards, Art Barstow







Re: [widgets] Further argument for making config.xml mandatory

2009-03-19 Thread Jere.Kapyaho
I still think that more than one config document is the most confusing aspect 
of this. Having just one (mandatory) config document, with the localized parts 
tagged with xml:lang attributes would be the simplest. However, as I understand 
it, the separate config files were recommended by the W3C I18N group.

If this decision were reversed, then anything in the config document that 
could (as per the schema) have an xml:lang attribute would by definition be 
localizable/localized. Others (like id, version etc.) would not be. That would 
also free the implementation from collecting all the various config documents, 
just to create and store an intersection of the elements. If you have two 
values for the same element, then who wins? The most specific (from the config 
in the localized folder), or the least specific (the default/fallback one from 
the root)?

Proposal (feel free to ignore, due to pressure to be feature complete): make 
the config file mandatory, but allow it only in the root, then allow multiple 
elements with unique xml:lang attributes for those elements that are 
localizable.
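
To show what processing a single config document might look like, here is a
Python sketch using the standard xml.etree.ElementTree module. The sample
document, its values, and the function name are made up for illustration (the
sample omits the widget namespace), and the sketch only does an exact,
case-insensitive match on xml:lang.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical single config document with repeated, language-tagged elements.
SAMPLE_CONFIG = """\
<widget id="http://example.org/widget" version="1.0">
  <name>Weather</name>
  <name xml:lang="fi">Sää</name>
  <name xml:lang="fr">Météo</name>
</widget>
"""


def localized_text(root, element_name, widget_locale):
    """Return the text of the element tagged with the widget locale.

    Falls back to the first element without xml:lang, or None if there is none.
    """
    wanted = widget_locale.lower()
    default = None
    for element in root.findall(element_name):
        lang = element.get(XML_LANG)
        if lang is None:
            if default is None:
                default = element.text
        elif lang.lower() == wanted:
            return element.text
    return default
```

Usage would be along the lines of
localized_text(ET.fromstring(SAMPLE_CONFIG), "name", "fi"). Attributes such as
id and version are read once from the root element, so the question of which
config document wins does not arise.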

--Jere


On 19.3.2009 16.24, ext Marcos Caceres marc...@opera.com wrote:

On Thu, Mar 19, 2009 at 1:15 PM, Priestley, Mark, VF-Group
mark.priest...@vodafone.com wrote:
 Hi Marcos, All,

 I would like to raise a comment in support of making the configuration
 document at the root of the widget mandatory.

 The localisation model currently described by [1] allows for multiple
 configuration documents; zero or one at the root of the widget and zero or
 one at the root of each locales folder.

 While we support the approach of allowing localisation of the configuration
 document (with the addition of the fallback mechanism that has been
 previously discussed), one concern we had with such an approach was that it
 doesn't make sense to localise some of the information in the
 configuration document, for example: the feature element, (the replacement
 for) the access element, the license element, the id and version attributes
 (and maybe others?). In fact in some cases, allowing different values
 could present security risks.

 Previously we (Vodafone) had considered an approach of requiring user agents
 to, for example, create a list of all feature elements present in any valid
 configuration document. We had not yet thought how to handle the case in
 which the different configuration documents contain different id attribute
 values.

 However, now that there is a proposal to make the configuration document at
 the root of the widget mandatory, I would like to propose that a better
 (although not pretty) solution would be specify which attributes and
 elements are localisable. The non-localisable attributes / elements would
 only be valid if included in the configuration document at the root of the
 widget.

 Thoughts?

Proposal: not localizable:
widget's id and version attributes.
feature and its options
access

The following elements would be localizable:
 widget (but no id or version, derived from root config, if available)
 name
 description
 author
 license
 icon
 content
 preference
 screenshot

FWIW, I think this will confuse authors... and irritate the poor souls
who need to implement this :)

Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au




Re: Widget Signature update

2009-03-12 Thread Jere.Kapyaho
One (possibly minor) point regarding the filename rule:

At least the Widgets 1.0 PC spec uses ABNF (RFC 5234) and refers to it, maybe 
this would be good also in the DigSig spec?

The rule expressed in ABNF would be something like:

signature-filename = "signature" non-zero-digit *DIGIT ".xml"
non-zero-digit = %x31-39

Here, DIGIT is a core rule defined in RFC 5234. This rule says that in 
between the strings there must be one non-zero digit, followed by zero 
or more ordinary digits.
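
For comparison, the same rule could be checked in code roughly as follows.
This is a Python sketch with hypothetical names. Note that quoted strings in
ABNF are case-insensitive per RFC 5234, whereas this sketch matches
case-sensitively; that is an assumption about the intent, not something the
rule itself says.

```python
import re

# "signature", one non-zero digit, zero or more digits, then ".xml".
SIGNATURE_FILENAME = re.compile(r"signature[1-9][0-9]*\.xml")


def is_distributor_signature(name):
    """True if the whole file name matches the signature-filename rule."""
    return SIGNATURE_FILENAME.fullmatch(name) is not None
```

Under this check signature20.xml is accepted, while signature01.xml and
signature.xml are not.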

The normative reference for ABNF would be (grabbed from the PC spec):

<dt><dfn id="abnf">[ABNF]</dfn></dt>
  <dd>RFC 5234, <a href="http://www.ietf.org/rfc/rfc5234.txt"><cite>Augmented BNF
for Syntax Specifications: <abbr title="Augmented
Backus-Naur Form">ABNF</abbr></cite></a>. D. Crocker and P. Overell.
  January 2008.</dd>

--Jere

On 9.3.2009 22.51, Hirsch Frederick (Nokia-CIC/Boston) 
frederick.hir...@nokia.com wrote:

I updated section 4 to correspond to  this:

If the signatures list is not empty, sort the list of signatures by
the file name field in ascending numerical order (e.g. signature1.xml
followed by signature2.xml followed by signature3.xml etc).
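
A possible reading of "ascending numerical order", sketched in Python. The
function name is hypothetical, and dropping non-matching names before sorting
is taken from the naming convention rather than from this sentence.

```python
import re

# Capture the numeric portion so it can be compared as an integer.
SIGNATURE_FILENAME = re.compile(r"signature([1-9][0-9]*)\.xml")


def sort_signatures(names):
    """Return the matching signature file names in ascending numerical order.

    Names that do not match the naming convention are ignored.
    """
    numbered = []
    for name in names:
        match = SIGNATURE_FILENAME.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

Sorting on the integer value rather than on the string is what puts
signature2.xml before signature11.xml; a plain string sort would reverse them.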


regards, Frederick

Frederick Hirsch
Nokia



On Mar 6, 2009, at 10:07 AM, ext Marcos Caceres wrote:

 Hi Frederick,

 On 3/6/09 3:59 PM, Frederick Hirsch wrote:
 I've updated the widget signature document distributor file naming
 convention to the following after discussing this with Josh (thanks
 Josh):

 Naming convention for a distributor signature:
 |signature [1-9][0-9]* .xml|

  * Each distributor signature /MUST/ have a name consisting of
    the string signature followed by a digit in the range 1-9
    inclusive, followed by zero or more digits in the range 0-9
    inclusive and then .xml, as stated by the BNF. An example is
    signature20.xml.

  * Leading zeros are disallowed in the numbers.

  * Any file name that does not match this BNF /MUST/ be ignored.
    Thus a file named signature01.xml will be ignored. A warning
    /MAY/ be generated.

  * There is no requirement that all the signature file names form
    a contiguous set of numeric values.

  * These signatures /MUST/ be sorted numerically based on the
    numeric portion of the name. Thus signature2.xml precedes
    signature11.xml, for example.


 See draft
 http://dev.w3.org/2006/waf/widgets-digsig/#distributor-signatures

 I also updated the notation section, changed the code format to be
 italic (without color), and updated the body style to not be quite
 so large.

 Please indicate any comment or corrections on the list.


 The changes look good to me! thank you.

 Kind regards,
 Marcos





Re: [widgets] Minutes from 5 March 2009 Voice Conference

2009-03-05 Thread Jere.Kapyaho
Easier on the eye, but to me it's pretty close to the color of RFC 2119 keyword 
style (em.ct).

Seems like the body text font has grown in size somewhat, compared to other 
specs.

--Jere

On 5.3.2009 18.03, Hirsch Frederick (Nokia-CIC/Boston) 
frederick.hir...@nokia.com wrote:

I updated the style for code items in the Digital Signature
specification to brown.

Does this work better? It does not conflict with other color uses as
far as I can tell.

Please look at
http://dev.w3.org/2006/waf/widgets-digsig/  (refresh)


regards, Frederick

Frederick Hirsch
Nokia



On Mar 5, 2009, at 10:11 AM, Barstow Art (Nokia-CIC/Boston) wrote:

JS: re styling, orange doesn't work well for me regarding
readability

MC: I can help with that

FH: I'll take a pass at that