Great, thanks!

Best regards.

On Mon, Dec 13, 2010 at 6:45 PM, Lee Passey <[email protected]> wrote:
> On Sat, December 11, 2010 5:24 pm, Jeulin-L Michael wrote:
>
> [snip]
>
> > I am now wondering how you guys are managing Unicode characters from the
> > JSON file?
> >
> > For instance, the Unicode characters in "Kha\u0304lid Muh\u0323ammad
> > \u02bbAli\u0304 al-H\u0323a\u0304jj" don't make sense at all.
>
> While JSON is technically UTF-8 enabled, the OL developers have chosen to
> encode Unicode characters using the "\u" escape sequence, which is also
> allowed in JSON. Thus, "\u" followed by a four-digit hexadecimal number
> represents a single Unicode character at the specified code point. So the
> acute 'e' that Mr. Millar was complaining about just a few minutes ago
> should be encoded as "\u00e9".
>
> In your case the encoding can be a little confusing, because OL has used
> the "Combining Diacritical Marks" set (range U+0300-U+036F) [1]. These
> Unicode "characters" are designed not to be used as standalone characters,
> but rather as a means of modifying the /preceding/ character. "\u0304" is
> meaningless on its own, but "a\u0304" means "the character 'a' combined
> with a macron over it." Most accented characters used in Latin-based
> European languages can be represented both as an ASCII letter followed by
> a combining diacritical mark and as a single "precomposed" character.
> Because both forms exist, it is important to perform Unicode
> normalization [2] before comparing Unicode strings.
>
> This is, I believe, evidence of the continuing tension between "things as
> they are" and "things as they appear to be" which plagues our attempts to
> digitize texts. I can't comment on whether or not combining diacritical
> marks are the best way to do Latin transliterations of Arabic names
> (personally, I would have used "\u0101" instead of "a\u0304"), but at
> least now you know what OL has done, and can adjust for it if you wish.
>
> [1] http://www.unicode.org/charts/PDF/U0300.pdf
> [2] http://en.wikipedia.org/wiki/Unicode_normalization

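For anyone who wants to try this out, here is a minimal sketch of the decoding and normalization Lee describes. It assumes Python 3 and uses only the standard json and unicodedata modules; the variable names are mine, and the input is just the name from my example above rather than a real OL record.

```python
import json
import unicodedata

# The escaped name from the question, written as raw JSON text.
# The r'' prefix keeps the backslashes literal so json.loads sees them.
raw = r'"Kha\u0304lid Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj"'

# json.loads turns each \uXXXX escape into the corresponding character.
name = json.loads(raw)

# NFC composes a base letter plus combining mark into a single precomposed
# character where one exists; NFD would do the reverse.
nfc = unicodedata.normalize("NFC", name)
print(nfc)

# The two spellings of "a with macron" differ as raw strings...
combining = "a\u0304"
precomposed = "\u0101"
print(combining == precomposed)

# ...but compare equal once both are normalized to the same form.
same_after_nfc = (
    unicodedata.normalize("NFC", combining)
    == unicodedata.normalize("NFC", precomposed)
)
print(same_after_nfc)
```

Whether NFC or NFD is the better target depends on what the strings are compared against; the only point here is that both sides should go through the same form before comparison.
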
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
