> I understand that Mac developers would consider a conversion to unicode
> "lossy" or "non-reversible" if the directionality indicators are not
> preserved somehow (using RLE/LRE or RLO/LRO), and this might constitute
> an "algorithmic" approach that 'enc2xs' would not support.
>
> Is there a work-around that will allow all the MacArabic code points to
> be converted successfully, given that their respective character
> semantics are all well established in unicode? Even a "lossy"
> conversion (ditching the directionality specs) would be better than the
> failures I'm getting now.

(1) If you can accept losing the text-direction information, how about
using fallback mappings? For example, these entries in
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT

    0x2B    <LR>+0x002B     # PLUS SIGN, left-right
    0xAB    <RL>+0x002B     # PLUS SIGN, right-left

can be converted to the following in Encode/ucm/macArabic.ucm:

    <U002B> \x2B |0 # PLUS SIGN
    <U002B> \xAB |3 # PLUS SIGN, right-left

Here |3 marks a fallback used only when decoding, so both bytes decode
to U+002B, but U+002B encodes only to 0x2B.

(2) I have quickly written a module for MacArabic (attached to this
mail) that works with Perl 5.6.1 or later. I hope it can be built on a
Mac, but I have never worked with a Macintosh and I am not well informed
about either the Macintosh or "bidi", so please let me know if something
is wrong. (At the least, the version here does not support embedding or
nesting of directions.)

SADAHIRO Tomoyuki
Lingua-AR-MacArabic-0.00.tar.gz
Description: Binary data
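
The effect of the |0 and |3 flags in suggestion (1) can be sketched in plain Perl. This is only an illustration, not code from Encode or from the attached module: the table and the helper names decode_byte and encode_cp are hypothetical, and only the two PLUS SIGN entries quoted above are covered.

```perl
use strict;
use warnings;

# Hypothetical table holding only the two PLUS SIGN entries from the
# ucm fragment: [ Unicode code point, MacArabic byte, ucm flag ].
my @entries = (
    [ 0x002B, 0x2B, 0 ],   # |0: round-trip mapping
    [ 0x002B, 0xAB, 3 ],   # |3: fallback, byte -> Unicode only
);

my (%decode, %encode);
for my $entry (@entries) {
    my ($cp, $byte, $flag) = @$entry;
    $decode{$byte} = $cp;                  # both |0 and |3 are used when decoding
    $encode{$cp}   = $byte if $flag == 0;  # only |0 is used when encoding
}

# Hypothetical helpers; each returns undef for an unmapped value.
sub decode_byte { my ($byte) = @_; return $decode{$byte} }
sub encode_cp   { my ($cp)   = @_; return $encode{$cp} }

printf "0x%02X -> U+%04X\n", $_, decode_byte($_) for (0x2B, 0xAB);
printf "U+%04X -> 0x%02X\n", 0x002B, encode_cp(0x002B);
```

Both 0x2B and 0xAB decode to U+002B, while U+002B encodes back only to 0x2B; the right-left variant is the information that gets lost.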