On Wed, Jun 27, 2012 at 5:15 AM, Edward Tie <[email protected]> wrote:

> And Chinese , Thais and japanse  language ?


I did some tests with several languages, including Arabic, and it all works
over XEP-0301.
Even the strange Unicode stuff I've seen, such as complex Unicode art,
works too.

Regarding international Unicode text and complex Unicode art:

For XEP-0301, all of it remains reliably real-time editable (inserts and
deletes land at the correct locations) as long as any processing step that
modifies the Unicode content is kept outside the encode-decode chain (e.g.
Unicode normalization, emoticon processing, autotext), and whatever
remaining Unicode modifications the XML processor makes (e.g. linefeed
conversion) comply with w3.org/XML.  Technically, it is not the
responsibility of the XML processor to do normalization; normalization
should be done closer to the GUI end.  Then the indexes (p and n attributes)
used by real-time editing action elements are always accurate.
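
To illustrate why the placement of normalization matters, here is a minimal
Python sketch.  The helper names insert_text and erase_text are made up for
this example and only approximate the <t/> and <e/> action elements (I am
assuming that erase removes n characters to the left of position p, and that
positions count Unicode code points, which is also how Python indexes str).
It is not a full XEP-0301 implementation.

```python
import unicodedata

def insert_text(message, text, p):
    """Hypothetical helper: insert text at code point index p (like <t p='...'>)."""
    return message[:p] + text + message[p:]

def erase_text(message, p, n=1):
    """Hypothetical helper: remove n code points to the left of index p
    (assumed semantics of <e p='...' n='...'/>)."""
    return message[:p - n] + message[p:]

# The sender's text uses a decomposed e-acute: 'e' followed by U+0301.
sender_before = "cafe\u0301 au lait"

# The sender types "!" right after the accented word, at index 5.
sender_after = insert_text(sender_before, "!", 5)

# Recipient A keeps the raw text inside the decode chain and normalizes
# only the copy handed to the GUI.
good_raw = insert_text(sender_before, "!", 5)
good_display = unicodedata.normalize("NFC", good_raw)

# Recipient B normalizes first, which shortens the text by one code point,
# so the same index 5 now points one character too far to the right.
bad_raw = unicodedata.normalize("NFC", sender_before)
bad_display = insert_text(bad_raw, "!", 5)

print(good_display)  # the "!" follows the accented word
print(bad_display)   # the "!" lands after the space instead
```

Recipient A stays in sync with the sender because its stored text has the
same code points as the sender's; recipient B does not, even though both
strings look identical on screen before the edit.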

In my experience, I've thankfully discovered that /most/ XMPP modules and
libraries (the ones that let me process XMPP extensions) don't rudely
perform unwanted, unavoidable Unicode normalization on the strings inside
the XMPP extension that I want to process myself.  The only processing
I've seen is XML-processor related (entity decoding, linefeed conversion,
and conversion to the programming language's Unicode string format such as
UTF-16), which is acceptable and doesn't interfere with real-time message
editing indexes, as long as the XML processor limits its Unicode
processing to what the XML processing standard at w3.org/XML requires.
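
As a rough illustration of the acceptable kind of processing, the sketch
below uses Python's standard xml.etree.ElementTree parser on a made-up <t/>
fragment.  It shows entity decoding and a UTF-16 round trip leaving the
sequence of code points unchanged.  The fragment and variable names are my
own assumptions for the example; the one thing to watch in a UTF-16 based
language is that indexes are counted in code points (my reading of
XEP-0301), not in UTF-16 code units.

```python
import xml.etree.ElementTree as ET

# A made-up action element containing an entity and a character reference
# for U+1F600, which lies outside the Basic Multilingual Plane.
fragment = "<t p='2'>a &amp; b &#x1F600;</t>"

element = ET.fromstring(fragment)
text = element.text              # entities are already decoded here
position = int(element.get("p"))

# Python str is indexed by code point, so len() counts code points.
code_points = len(text)

# UTF-16 needs a surrogate pair for U+1F600, so the unit count is larger.
utf16_units = len(text.encode("utf-16-le")) // 2

# Converting to UTF-16 and back does not alter the code points.
round_trip = text.encode("utf-16-le").decode("utf-16-le")

print(code_points, utf16_units, round_trip == text)
```

The decoded text differs from the raw XML bytes, but every conforming
recipient decodes it the same way, so the indexes still agree on both ends.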

So I've not seen many situations where there was unavoidable Unicode
normalization taking place INSIDE the sender encode/recipient decode
chain.  XML processor bugs can always be fixed, "rude" normalization done
by an XMPP library can be moved to a proper place in the processing chain,
and desperate programmers can count on message resets (<rtt
event='reset'/>) to automatically correct text that has been messed up by
wrong-position inserts/deletes (bad, bad).  At the least, a reset signals a
debugging/fixing opportunity once a discrepancy is noticed between the
contents of the real-time message and the contents of the message reset (or
the final <body/>).
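
The discrepancy check could look roughly like the Python sketch below.  The
function names first_difference and apply_reset are hypothetical, and the
sketch assumes the client already holds both the locally reconstructed
real-time text and the full text carried by the reset (or final body).

```python
from typing import Optional, Tuple

def first_difference(a: str, b: str) -> Optional[int]:
    """Return the first code point index where a and b differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None

def apply_reset(realtime_text: str, reset_text: str) -> Tuple[str, Optional[int]]:
    """Replace the real-time text with the reset text.

    Returns the corrected text plus the index of the first mismatch, so a
    developer can log it as a hint that something in the chain (for
    example, misplaced normalization) is shifting indexes.
    """
    return reset_text, first_difference(realtime_text, reset_text)

corrected, mismatch_at = apply_reset("caf\u00e9 !au lait", "caf\u00e9! au lait")
if mismatch_at is not None:
    print("real-time text diverged at index", mismatch_at)
```

The user always ends up with the correct text, and the logged index points
at roughly where the wrong-position edit happened.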

Thanks
Mark Rejhon
