-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So this means that we actually want non-standard RTF (someone should update the wiki). Should we assume UTF-8? Are you sure we don't have any modules with ISO-8859-something encoded values?

If we choose any encoding that is a superset of ASCII, we have to consider at least these two points:

* Since the RTF control words and delimiters are specified in ASCII only, we need to decide how the additional bytes of the superset behave: as delimiters or as parts of "RTF" control words. For example, do Unicode letters, numbers, spacing, punctuation, control characters, etc. constitute parts of RTF control words, or do they act as delimiters?

* In encodings where a character may consist of multiple bytes (e.g. the variable-length UTF-8), we must respect the character boundaries. We can't just pass through arbitrary non-ASCII byte values. For example, the following bit sequence wouldn't make sense:

    11100010 01011100 10000010 01110001 10101100 01100011

  It is a UTF-8 encoded Euro sign, €, interleaved with the bytes of the ASCII string "\qc". The following sequences, on the other hand, would be correct:

    11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)
    01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)

So, depending on the encoding, the correct thing would be to detect such cases; otherwise we end up with invalid Unicode output.

Blessings,
Jaak

On 21.05.2014 15:19, Chris Burrell wrote:
> I believe some conf files have direct unicode (rather than escaped
> sequences) in them and that is preferred.
>
> On 20 May 2014 23:28, "Jaak Ristioja" <j...@ristioja.ee> wrote:
>
> I've never done BiDi, but I'm not sure I need to take that into account
> while fixing the RTF parsing. As I currently understand it, this
> particular piece of code does not support any part from the RTF spec
> dealing with bidirectional text handling. Hence all BiDi information
> contained in the configuration file strings (e.g. About=) is contained
> either in the plain ASCII text or the \u<num> Unicode escapes which this
> algorithm should pass through unmodified.
>
> ...except for HTML entities which should actually be escaped.
> This bug in the algorithm I previously failed to notice. Additionally I forgot
> that non-ASCII characters in the input string should also lead to
> parsing failure.
>
> Jaak
>
>
> On 20.05.2014 21:01, David Haslam wrote:
> > Take care with Right to Left languages such as Hebrew.
> >
> > i.e. After any patches to the filter, please include some testing for BiDi
> > text in the About= field and others.
> >
> > David
> >
> > --
> > View this message in context:
> > http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html
> > Sent from the SWORD Dev mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel@crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQgcBAEBAgAGBQJTfKM/AAoJELozJlbjIn79gXpAAMxwoq17dvVzCikAplQUjON0 xDJXlDFfKK14w8xj11NSUvJEPjVWlwTi82WzEplQBKfkxtFY09010ZB5IKotEtSP dcJMjzc4FmuJmPifB7s3gtEOQ81OThMArlnq/aFHvGj6+5D8qjFkQiqOzSJeaORS C8dPobXSnJkJ/g3zKCdJf/k5msphFbmuIQOD4Ovco2ZHHlukL8QNd8pt3RcPN4Hy BMxYx9glw3+YJK5Jj63isdsmOGLeRory3PDcHZoPJzu8zssW78Chlsgoh+xWlfkn zI5PdP1ARhq7K/kUnPp7jXx3LDFiEbmPjrNBi/A03k+n7s2oZWdxm9uBfEEq5VpB DpdCA19msaEE+fOWOyAAvvZstnCxYrrd01j+HxXUGoA4JHBBVQo01H5udfOdbiBu nSI5M0GUKBjSSfLSmrh2oTC0qniVMRw4t+IAIJU1chjfBCsoNAx6xTiDE8x+hpjd
A+s8wvgBU0gNbqeOMvWXkHeOWSu7O0oPEp0vVl+6fUPPFDHGR1+2vPXLnCcbASwj pEJwls9IBis7touUlIt4stlois1Imtw8zKGXXU8h0UmSgRHK0G2Ck8clNptClkMY +9xP+TGXZI0q+WlzA7M4aD2puQAiJ0iJTm/kV+QGF/1RiaWNGWTG7Oxfufz5XdDn xqTrAkYoVw3a+ZRgZPs4YbyK3ysVqncvAOFKuqLcEEwiA4zEYztGxPMAhcypQJFH n6ORlF3/Kmkukj3eapanznmcvoZ+H/APKNWmo2b+TZ10WABCtZVDO+pd1Ed+l2U5 EytGhMYEqNSMqV109k3It9Ll7a8GVQa6k7AX8/BSXlh6/GaaoIzkSgGJBFAU8Zsj dW7u6O7wBOTBmE+lUUrwA3igveDhTDhzjORE7Ek74xkhoNVwh1DmqWwJGZbIGb5R 47yWwxql4pqS4jq3M+TM8SUZaeY/NTjRTn+WLFBGahKVH5Gg/NiB6onfBBRLyYwK iorFYngEhpKDNJBPp8rfSIg4NxhbupwG9B1Bbrdg6Kj+E+kGsXDuDkBWQEgf1Jwv 3XbiDBEjUf2wr4TdbUx9GrwrBNP7q9YW0RmbQGlvIahVwtr3/PJGhiU/kS47fAZf HQMac1US7eYgtW5hzH/YG+41cCI9J0byZBEuSJS2GuSd0LD0Of4bPLxyOxiXqvTU kwSPIQwsBOZpFIA5Qfc35x5KxVqCGUYBvXhglpZtZGlGr8uIPpshc1gz9ukCejuz 754upiYTlCzocKpvPbER9QpMZFYb+iDTdc4bU8whmxkP8ATKSDQmYIqUS2ohLKV8 co5X0741kRaG5oNOBBrM7kn/9nWgFNspFBkJAvGLbD8h6R8S11cu7INrXzJjxv/e bCAxGXb2UQXXUe18FCYeqUvl5VdQOQt3f7gja3XbitCKkJjUA6i7t1+5vjuMQsAY NFliiFxNeNjNE4hIIpvA7G3N+2t0W8IjGsystXm6ONN0lM78eLZLLlsrfkPi8NgR Nydc78zEJfGr8APkiYleIYTi6ftgtDrI9927wNWqgIPqO4vqA1TZngX8wx6YPJou uF8cSnI0PlcOfEKtsBgZedOpbZlqAt61wvMGMW0YUfiL5LhuP95KQekqDMMBDCQX mGMehJHRJ5PvoDt8485lGOWdwXn6T7PlakZ1UCtYeMV0Nx2PfPBfU7bnCwSRFQKg vpUhPCkW5qpvlkBLOpPLwkqcZGiSyLL/YSGp6cVExeeQVHc2hI169zGY9dUHBEMN CaKwI9Wjn5V95bax3gsMlHnY9c1TB/6yLWnVEJAilm5ijgWW5KxstWoJMd/OptY8 QvbsOA7K36HfwOwNCblQCGbUrPjikhXTw8ew1aap4OHqGIKUWCMm3z/eHOPRU5mD Ce2Z86vwYb9T2PcyqUiZOs1WW9TBZx70Hr2JQmRwgMyWpT4DERjofP83IA8vxZdP 9uKT4j+EBUGoI2zGgE2lapLL/VWrzt6OBMv5iUmR4OIFLdnHevAAy5w53c4+tWjs SNmjAz8tW5FWiVFR99FQBN6KWXIjKdJGQl+zccOlE0zBQe2grnqFmUeuuBbPiojb Wch+hqrKDX/VLr/gIP9EErMJ7ZvZ7st+gwPZlFwC7Evf3OCrUnRYIbMI6iLGLoZ6 c9YLbK67hj1Ho+X99XTeoQj8l2V14TSRCFZBmO7Os5L2kXOEiw0yeV8Dn87LJPFp 4VcfgFGLi9FRnI36K4+h5JWoyhrGhNHrHsO60Xs2U3a02fRfeUgn/T1Xf0xXbVMC gX8zJ3aC15pUy/dJaqJ4HIszzPe5ErO7J9GB7AhjVnx8pEE0xayoJkA4VM0YF8Lk b/IF04rm/dNlsLL7zRzdGpr2uo9esMzFJDYcHnhInhaE7t2iGR4+cgUdRJKA7NJW 
ZumxNz3a1EjeZHRLqRxfT8O6Cc55hG4GwVO7JxUnXJtRMx+ENXZslf4ExGdhcTdf ntjsfngGemyKYv8aMJ9pDlLFVyR+91xSpFp8QYRDtcP14y5Dfh/jh4Kmdu0BqTzt Wt0KUUZQlx8Qu8XJbatPiieDmjtQ8HPmhsHQAA+QmLzrhEmakrAjTfpWq5eNYQeQ ei6tawFllPyuNrez2BOP3nfXuSBlfn2+yBfi3H1mJc8urrFwDtt/zqTHdoOtyCNO PVaqMROmVzgdKg7yyXTBek3UBe8TxMWigvepRvxkGlmMZQkW42/5ft0269esY/bw tuy57vDPyvQfrJzpN62y =RNpJ -----END PGP SIGNATURE----- _______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page
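
As a rough illustration of the character-boundary point in the message above, the three byte sequences can be checked with a strict UTF-8 decoder. This is only a sketch in Python, not code from SWORD; the helper name is_valid_utf8 and the variable names are made up for the example.

```python
# Hypothetical helper, not part of SWORD: reports whether a byte string
# decodes as strict UTF-8.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The UTF-8 encoding of the Euro sign (U+20AC): 0xE2 0x82 0xAC.
EURO = "€".encode("utf-8")

# The ASCII bytes of the string "\qc": 0x5C 0x71 0x63.
CONTROL_WORD = b"\\qc"

# The interleaved sequence from the message, written as bit patterns.
interleaved = bytes([
    0b11100010, 0b01011100, 0b10000010,
    0b01110001, 0b10101100, 0b01100011,
])

# The two orderings that keep the three Euro bytes together.
euro_then_word = EURO + CONTROL_WORD
word_then_euro = CONTROL_WORD + EURO

for name, data in [
    ("interleaved", interleaved),
    ("euro_then_word", euro_then_word),
    ("word_then_euro", word_then_euro),
]:
    print(name, data.hex(" "), "valid" if is_valid_utf8(data) else "INVALID")
```

The interleaved sequence is rejected because the lead byte 0xE2 announces a three-byte character and must be followed by two continuation bytes, but the next byte is the ASCII backslash. A filter that assumes UTF-8 input could avoid producing such output by only splitting its input at character boundaries, or could run a check like this on the result.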