Great, thanks!

Best Regards.

On Mon, Dec 13, 2010 at 6:45 PM, Lee Passey <[email protected]> wrote:

> On Sat, December 11, 2010 5:24 pm, Jeulin-L Michael wrote:
>
> [snip]
> > I am now wondering how you guys are managing Unicode characters from the
> > JSON file?
> >
> > For instance, the Unicode characters in "Kha\u0304lid Muh\u0323ammad
> > \u02bbAli\u0304 al-H\u0323a\u0304jj" don't make sense at all.
>
> While JSON text is itself Unicode (normally UTF-8), the OL developers have
> chosen to encode non-ASCII characters using the "\u" escape sequence, which
> is also allowed in JSON. "\u" followed by a four-digit hexadecimal number
> represents a single Unicode character at the specified code point. Thus, the
> acute 'e' that Mr. Millar was complaining about just a few minutes ago should
> be encoded as "\u00e9" (code point 233 in decimal, which is E9 in
> hexadecimal).
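
As a rough illustration of the "\u" escapes described above, here is a minimal Python sketch using the standard json module. The record and its field names ("name", "author") are made up for the example and are not taken from an actual OL dump.

```python
import json

# Hypothetical JSON text. The raw string keeps the backslash escapes
# literal, exactly as they would sit in a file on disk.
raw = r'{"name": "Andr\u00e9", "author": "Kha\u0304lid"}'

record = json.loads(raw)   # the parser turns each \uXXXX into one character
name = record["name"]      # ends with U+00E9, the precomposed acute e
author = record["author"]  # contains 'a' followed by U+0304 (combining macron)

# Show the code points so the result does not depend on the terminal font.
print([hex(ord(ch)) for ch in name])
print([hex(ord(ch)) for ch in author])

# json.dumps escapes non-ASCII by default; ensure_ascii=False writes the
# character itself instead. Both are valid JSON for the same string.
escaped = json.dumps(name)
unescaped = json.dumps(name, ensure_ascii=False)
```

Any conforming JSON parser should do the same decoding, so the escapes only look odd when the file is read as plain text.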
>
> In your case the encoding can be a little confusing, because OL has used the
> "Combining Diacritical Marks" block (range U+0300-U+036F) [1]. These Unicode
> "characters" are designed not to be used as standalone characters, but rather
> as a means of modifying the /preceding/ character. "\u0304" is meaningless on
> its own, but "a\u0304" means "the character 'a' with a macron over it." Most
> accented letters used in Latin-based European languages can be represented
> both as a base letter followed by a combining diacritical mark and as a
> single "precomposed" character. Because both forms exist, it is important to
> perform Unicode normalization [2] before comparing Unicode strings.
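
To make the normalization point concrete, here is a small Python sketch built on the standard unicodedata module. The helper name same_text is my own invention for the example; only unicodedata.normalize comes from the library.

```python
import unicodedata


def same_text(left: str, right: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normal form.

    same_text is a hypothetical helper; form may be "NFC" (composed)
    or "NFD" (decomposed).
    """
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


combining = "a\u0304"    # 'a' followed by the combining macron
precomposed = "\u0101"   # the single precomposed character

print(combining == precomposed)           # False: different code point sequences
print(same_text(combining, precomposed))  # True: equal once both are normalized
```

Either form works for comparison as long as both sides use the same one.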
>
> This is, I believe, evidence of the continuing tension between "things as
> they are" and "things as they appear to be" which plagues our attempts to
> digitize texts. I can't comment on whether or not combining diacritical marks
> are the best way to do Latin transliterations of Arabic names (personally, I
> would have used "\u0101" instead of "a\u0304") but at least now you know what
> OL has done, and can adjust for it if you wish.
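
One possible way to "adjust for it" is to compose the name into precomposed characters with NFC. The sketch below is only that, a sketch: it assumes the string from the original question arrives as a JSON string literal, and the variable names are mine.

```python
import json
import unicodedata

# The name from the original question, as a JSON string literal.
raw = r'"Kha\u0304lid Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj"'

decoded = json.loads(raw)                         # escapes become characters
composed = unicodedata.normalize("NFC", decoded)  # base + mark pairs are merged

# Any combining marks that NFC could not merge into a precomposed character.
leftover = [f"U+{ord(ch):04X}" for ch in composed if unicodedata.combining(ch)]

print(len(decoded), len(composed))
print(leftover)
```

U+02BB is a modifier letter rather than a combining mark, so it stays as it is in both forms.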
>
> [1] http://www.unicode.org/charts/PDF/U0300.pdf
> [2] http://en.wikipedia.org/wiki/Unicode_normalization
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
>