RE: valid characters in user names- esp. compatibility characters

Addison Phillips [wM] Wed, 11 Aug 2004 21:08:21 -0700

Hi Tex,

webMethods has used a (slightly modified) version of punycode for handling generated 
class names in Java in several products very successfully for several years now. The 
slight modification is to substitute the underscore for the dash character (since the 
dash is illegal in Java class names). Punycode has proven to be exceedingly robust for 
this type of application, although the algorithm is very arcane.
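
As a rough illustration of the dash-to-underscore idea (a sketch only, not 
webMethods' actual code), the following uses Python's built-in "punycode" codec. 
The function names to_identifier and from_identifier are hypothetical, and the 
sketch assumes the input name contains no literal '-' or '_', since either would 
make the swap ambiguous when decoding. It also does not guarantee a legal Java 
identifier in every case (for example, names containing spaces or starting with 
a digit would need extra handling).

```python
def to_identifier(name: str) -> str:
    """Encode a Unicode name as ASCII punycode, with '_' in place of '-'."""
    if "-" in name or "_" in name:
        # Assumption of this sketch: a literal '-' or '_' in the input would
        # make the swap ambiguous on the way back, so reject it up front.
        raise ValueError("name must not contain '-' or '_'")
    ace = name.encode("punycode").decode("ascii")
    return ace.replace("-", "_")


def from_identifier(identifier: str) -> str:
    """Reverse to_identifier: restore the dash, then decode the punycode."""
    ace = identifier.replace("_", "-")
    return ace.encode("ascii").decode("punycode")


if __name__ == "__main__":
    for name in ["bücher", "münchen"]:
        ident = to_identifier(name)
        print(name, "->", ident, "->", from_identifier(ident))
```

Note that no "xn--" prefix and no normalization step is involved here; this is 
punycode used purely as a transfer encoding.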


Our ACE coder doesn't directly impose NFKC or any of the stringprep type preparations. 
In our application of ACEs, users create objects visually and we generate Java code 
named after the objects in a process invisible to users. Although NFKC and stringprep 
are reasonable restrictions for IDN, with its peculiar requirements, it doesn't follow 
that they are good for all applications. Punycode and all other ACEs are essentially 
transfer encoding schemes for Unicode code points. The ASCII sequences they generate 
are unique to any particular Unicode scalar sequence. 
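
To make the point concrete, here is a small sketch (again using Python's 
built-in "punycode" codec, with hypothetical helper names) showing that a raw 
ACE keeps compatibility characters distinct, while applying NFKC first folds 
them together. The fullwidth spelling of "Tex" is just an example input chosen 
for illustration.

```python
import unicodedata


def encode_raw(text: str) -> str:
    """Punycode as a pure transfer encoding: no normalization applied."""
    return text.encode("punycode").decode("ascii")


def encode_nfkc(text: str) -> str:
    """Normalize with NFKC first (roughly what IDN-style preparation implies)."""
    return encode_raw(unicodedata.normalize("NFKC", text))


def decode(ace: str) -> str:
    """Recover the exact code point sequence from the ASCII form."""
    return ace.encode("ascii").decode("punycode")


if __name__ == "__main__":
    plain = "Tex"
    fullwidth = "\uff34\uff45\uff58"  # fullwidth compatibility forms of T, e, x

    # Without normalization, the two spellings get different ASCII forms,
    # and each decodes back to exactly what the user typed.
    print(encode_raw(plain), encode_raw(fullwidth))
    print(decode(encode_raw(fullwidth)) == fullwidth)

    # With NFKC applied first, both spellings collapse to the same ASCII form.
    print(encode_nfkc(plain) == encode_nfkc(fullwidth))
```

Whether the collapsed or the distinct behavior is wanted depends on the 
application, which is why the two steps are worth keeping separate.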

It's true that logins have many similarities to IDN in terms of requirements, though. 
Just note that there is no reason why an internal algorithm *has* to do both 
stringprep and punycode or has to do stringprep in the IDN way...

I have a whitepaper on the subject which expands (a tiny amount) on webMethods' use of 
ACEs. It was presented at IUCs twice, the last time at IUC22, and is called "Four 
ACEs: A Survey of ASCII Compatible Encodings". The PDF is on my personal website 
http://www.inter-locale.com. I can't remember, but I think this one was a substitute 
paper at IUC22, so it probably isn't in the program proceedings.

Hope this helps,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf Of Tex Texin
> Sent: 2004-08-11 18:29
> To: Unicoders
> Subject: valid characters in user names- esp. compatibility characters
> 
> 
> hi,
> 
> 1) I am looking at a set of legacy applications that would like to extend
> user IDs to support international characters.
> It is not possible to update all of the applications simultaneously to
> fully support unicode, so I am considering an algorithmic mapping of the
> international IDs to an ASCII-based encoding and a layering similar to how
> domain names were extended to be international.
> 
> However, I am curious as to whether some Users might read/write their names
> using compatibility characters (esp. in ideographic markets) and object to
> the characters being normalized through nfkc. I thought it might be like
> someone spelling their name incorrectly. I don't know enough about
> ideographic names or the compat. characters to evaluate if it would be
> perceived as a problem by users. If any CJK experts would comment on this,
> it would be appreciated.
> 
> 2) I am also getting questions about the robustness and stability of the
> GNU libidn implementations of stringprep and punycode which are being
> considered. I would be glad to hear privately if you have used them and
> what your experience was/is.
> 
> tia
> tex
> 
> -- 
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
> Xen Master                          http://www.i18nGuy.com
>                          
> XenCraft                          http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
>
