To sum up, we would need to agree on and specify an RTF subset which is
Unicode-aware (UTF-8 only?), and implement a Unicode-aware transducer for it.
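
To make the character-boundary concern from the quoted message concrete, here is a minimal Python sketch of what a Unicode-aware front end might look like if we settle on UTF-8. Everything in it is an assumption for illustration: the name `tokenize`, the token tuples, and the rule that only ASCII letters form control words (so any non-ASCII character acts as a delimiter) are not taken from the SWORD sources or from anything agreed on this list. It also ignores groups, escaped characters and the delimiter space of real RTF.

```python
import re

# Assumed rule for this sketch: a control word is a backslash followed by
# ASCII letters only, with an optional signed decimal parameter
# (e.g. \qc or \u8364). [0-9] is used instead of \d so that non-ASCII
# digits never become part of a control word.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?[0-9]+)?")


def tokenize(data: bytes):
    """Split UTF-8 encoded RTF-like input into control and text tokens."""
    # Strict decoding raises UnicodeDecodeError when a multi-byte character
    # is broken up by other bytes, so matching below only ever happens on
    # whole characters.
    text = data.decode("utf-8", errors="strict")

    tokens = []
    pos = 0
    for match in CONTROL_WORD.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("control", match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


if __name__ == "__main__":
    # The two well-formed byte sequences from the quoted message.
    print(tokenize(b"\xe2\x82\xac\\qc"))
    print(tokenize(b"\\qc\xe2\x82\xac"))
    # The interleaved sequence is rejected instead of being passed through.
    try:
        tokenize(b"\xe2\\\x82q\xacc")
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
```

Decoding up front keeps the boundary check in one place; a byte-level transducer would instead have to track how many continuation bytes it still expects before it may treat a 0x5C byte as the start of a control word.
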

On 21.05.2014 15:59, Jaak Ristioja wrote:
> So this means that actually we want non-standard RTF (someone
> should update the wiki). Should we assume UTF-8? Are you sure we
> don't have any modules with ISO-8859-something encoded values?
>
> If we choose any ASCII superset encoding we have to consider at
> least two points:
>
> * Since the RTF control words and delimiters are specified in ASCII
>   only, we need to decide how the bytes of the superset act as
>   delimiters and as parts of "RTF" control words. For example,
>   whether the Unicode letter, number, spacing, punctuation, control
>   etc. characters constitute parts of RTF control words or act as
>   delimiters.
>
> * In case of encodings where characters may consist of multiple
>   bytes (e.g. the variable-length UTF-8) we must consider the
>   character boundaries. We can't just pass through any non-ASCII
>   byte values. For example, the following bit sequence wouldn't
>   make sense:
>
>   11100010 01011100 10000010 01110001 10101100 01100011
>
>   which is a UTF-8 encoded Euro sign, €, interleaved with bytes of
>   the ASCII string "\qc". It just doesn't make sense, whereas the
>   following sequences would be correct:
>
>   11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)
>   01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)
>
> So depending on the encoding it would be correct to detect such
> cases, otherwise we end up with invalid Unicode output.
>
> Blessings,
> Jaak
>
> On 21.05.2014 15:19, Chris Burrell wrote:
>> I believe some conf files have direct Unicode (rather than
>> escaped sequences) in them and that is preferred.
>
>> On 20 May 2014 23:28, "Jaak Ristioja" <j...@ristioja.ee> wrote:
>
>> I've never done BiDi, but I'm not sure I need to take that into
>> account while fixing the RTF parsing. As I currently understand
>> it, this particular piece of code does not support any part of
>> the RTF spec dealing with bidirectional text handling.
>> Hence all BiDi information contained in the configuration file
>> strings (e.g. About=) is contained either in the plain ASCII text
>> or the \u<num> Unicode escapes which this algorithm should pass
>> through unmodified.
>
>> ...except for HTML entities which should actually be escaped.
>> I previously failed to notice this bug in the algorithm.
>> Additionally I forgot that non-ASCII characters in the input
>> string should also lead to parsing failure.
>
>> Jaak
>
>> On 20.05.2014 21:01, David Haslam wrote:
>>> Take care with Right to Left languages such as Hebrew.
>>>
>>> i.e. After any patches to the filter, please include some
>>> testing for BiDi text in the About= field and others.
>>>
>>> David
>>>
>>> --
>>> View this message in context:
>>> http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html
>>> Sent from the SWORD Dev mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel@crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page