On Wed, Jun 27, 2012 at 5:15 AM, Edward Tie <[email protected]> wrote:
> And Chinese, Thai and Japanese languages?

I did some tests with several languages, including Arabic, and it all works over XEP-0301. Even the strange Unicode stuff I've seen works too, including complex Unicode art.

Regarding international Unicode text and complex Unicode art: for XEP-0301, all of it remains reliably real-time editable (inserts and deletes land at the correct locations) as long as two conditions hold. First, any processing step that modifies the Unicode content (e.g. Unicode normalization, emoticon processing, autotext) is kept outside the encode-decode chain. Second, whatever Unicode modifications remain (e.g. linefeed conversion) are handled by the XML processor in compliance with w3.org/XML. Technically, it is not the responsibility of the XML processor to do normalization; normalization should be done closer to the GUI end. Then the indexes (p and n attributes) used by real-time editing action elements are always accurate.

In my experience, I've thankfully discovered that /most/ XMPP modules and libraries (the ones that let me process XMPP extensions myself) don't rudely perform unwanted, unavoidable Unicode normalization on the strings inside the XMPP extension that I want to process. The only processing I've seen is XML-processor related (entity decoding, linefeed conversion, and conversion to the programming language's Unicode string format, such as UTF-16), which is acceptable and doesn't interfere with real-time message editing indexes, as long as the XML processor limits its Unicode processing to what the XML standard at w3.org/XML requires. So I've not seen many situations where unavoidable Unicode normalization was taking place INSIDE the sender-encode/recipient-decode chain.

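To illustrate why normalization inside the chain breaks the p index, here is a minimal Python sketch. It is not taken from any XMPP library. It assumes positions are counted in Unicode code points (which is how Python indexes str), and apply_insert is a hypothetical helper that stands in for applying an insert action at position p.

```python
import unicodedata
from typing import Optional


def apply_insert(text: str, inserted: str, p: Optional[int] = None) -> str:
    """Insert `inserted` at code point position p (default: end of text).

    Hypothetical helper for illustration; it models an insert action
    whose position attribute p counts Unicode code points.
    """
    if p is None:
        p = len(text)
    return text[:p] + inserted + text[p:]


# Sender's text uses the decomposed form: 'e' followed by U+0301
# (COMBINING ACUTE ACCENT), so "cafe\u0301" is 5 code points long.
sender_text = "cafe\u0301 bar"
p = 5  # sender inserts "s" right after the accented e

# Correct recipient: applies the action to the untouched string and
# normalizes only afterwards, close to the GUI.
good = unicodedata.normalize("NFC", apply_insert(sender_text, "s", p))

# Misbehaving recipient: something normalized the string to NFC *before*
# the action was applied, so the text is one code point shorter and the
# same p now points one position too far to the right.
normalized_early = unicodedata.normalize("NFC", sender_text)
bad = apply_insert(normalized_early, "s", p)

print(good)  # "s" lands directly after the accented e
print(bad)   # "s" lands after the space instead
```

The only difference between the two recipients is where the normalization happens relative to applying the action, which is the point above about keeping it outside the encode-decode chain.
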
XML processor bugs can always be fixed, "rude" normalization done by an XMPP library can be moved to a proper place in the processing chain, and desperate programmers can count on message resets (<rtt event='reset'/>) to automatically correct text that has been messed up by wrong-position inserts/deletes (bad, bad). At least that signals a debugging/fixing opportunity once a discrepancy is noticed between the contents of the real-time message and the contents of the message reset (or the final <body/>).

Thanks
Mark Rejhon
