Chris, I think what you've done is very interesting, and useful, so what I have to say below is not intended as criticism of your work in any way.

It's curious that the Arabic Presentation Forms got into Unicode at all, and a number of people still think it was a mistake, a sell-out. One of the Fathers of Unicode told me they were deprecated. Even the Unicode specification explains their presence rather apologetically.

According to the True Gospel of Unicode, Arabic text should always be _encoded_ in the basic \u06HH characters, and should not need to be re-encoded. The rendering (BIDI, shaping, ligatures) should be done in a separate rendering engine that does whatever it needs to do internally but does _not_ change the original encoding of the text. The Unicode Presentation Forms are kludges for systems that don't implement a proper rendering engine (and it sounds like you are working within the limitations of such a system). According to the True Gospel, the Arabic Presentation Forms are an unjustified waste of code space.

From a purely aesthetic point of view, rendering each basic Arabic character with one of only four glyphs (isolated, initial, medial, final) yields only marginally acceptable results. It's crude and butt ugly. Before you get defensive, let me point out that I myself use a four-glyph-to-each-char font for rendering Arabic on my own webpage, www.arabic-morphology.com. So I'm not criticising you. My rendered Arabic is crude and butt ugly. I'm just saying that really good Arabic rendering would need to be far more subtle and flexible than any four-glyph font could allow.

Good Arabic rendering also needs to use a lot of ligatures, not just the laam-alif ones. The Arabic Presentation Forms also provide a lot of ligatures, but again a good rendering system would need ligatures not included in the closed set of Arabic Presentation Forms. As an example, the ArabTeX rendering system (a package for TeX and LaTeX for rendering Arabic) uses a font of about 250 basic glyphs, with an auxiliary mechanism to connect the basic shapes where necessary with smooth curves.

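To make the four-glyph approach concrete, here is a rough sketch of the kind of contextual shaping under discussion: each basic \u06HH letter is replaced by its isolated, final, initial, or medial code point from Arabic Presentation Forms-B, depending on whether its neighbours join to it. This is only an illustration in Python, not Chris's Perl module. The names FORMS, joins_forward, and shape are made up for the example, the table covers just six letters, and it ignores diacritics and the laam-alif ligatures entirely.

```python
# Presentation Forms-B code points, in the order:
# isolated, final, initial, medial.
# Right-joining letters such as ALEF have only isolated and final forms.
# Illustrative subset only; a real table would cover the whole alphabet.
FORMS = {
    "\u0627": ("\uFE8D", "\uFE8E"),                      # ALEF
    "\u0628": ("\uFE8F", "\uFE90", "\uFE91", "\uFE92"),  # BEH
    "\u062A": ("\uFE95", "\uFE96", "\uFE97", "\uFE98"),  # TEH
    "\u0643": ("\uFED9", "\uFEDA", "\uFEDB", "\uFEDC"),  # KAF
    "\u0644": ("\uFEDD", "\uFEDE", "\uFEDF", "\uFEE0"),  # LAM
    "\u0645": ("\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"),  # MEEM
}


def joins_forward(ch):
    """True if ch is a dual-joining letter, i.e. it can connect to the next letter."""
    return len(FORMS.get(ch, ())) == 4


def shape(text):
    """Replace each known letter with its contextual presentation form.

    Characters not in FORMS are copied unchanged and break the joining.
    The result stays in logical (reading) order.
    """
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        joined_before = joins_forward(prev_ch)
        joined_after = joins_forward(ch) and next_ch in FORMS
        if joined_before and joined_after:
            index = 3  # medial
        elif joined_after:
            index = 2  # initial
        elif joined_before:
            index = 1  # final
        else:
            index = 0  # isolated
        out.append(forms[index])
    return "".join(out)


# kaf-teh-alef-beh: initial KAF, medial TEH, final ALEF, isolated BEH
shaped = shape("\u0643\u062A\u0627\u0628")
```

Note that the shaped string is still in logical order, so a renderer that performs no BIDI would also need the run reordered for display. The table is also a closed set, so any letter without presentation forms simply falls through unshaped.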
The ultimate Arabic rendering engine is probably that of Thomas Milo, who is inspired by the best of the Turkish calligraphers. His rendering algorithms first draw the basic skeleton of the word according to the best classical proportions, and then fit in the dots and diacritics afterward (this is how real calligraphers do it, and it requires some squeezing and subtle contextual adjustments). A lot of people think that Milo's work is _too_ calligraphic, and unsuitable for plain text like newspapers and even most books, but he's a constant reminder that most computer rendering of Arabic (including my own) is still pretty crude.

Keep up the good work,

Ken

> To: [EMAIL PROTECTED]
> From: "Chris Whiting" <[EMAIL PROTECTED]>
> Subject: Re: Bidirectional (bidi) Support?
> Date: Fri, 24 Oct 2003 22:24:38 -0400
>
> "Bob Hallissy" <[EMAIL PROTECTED]> wrote in message
> news:OFA3E32D4C.D67B7CEA->[EMAIL PROTECTED]
> >
> > On 21/10/2003 01:09:32 "Chris Whiting" wrote:
> >
> > > I have implemented ... an Arabic shaping algorithm in
> > > Perl and was wondering if it would be useful to upload it to cpan.
> >
> > I presume your algorithm depends on the Arabic presentation forms
> > available as separately encoded characters in Unicode. If this is the
> > case, and given that lots of Arabic characters in Unicode do not have
> > all their presentation forms separately encoded, nor will any new
> > presentation forms be added to the standard, it would seem such an
> > algorithm would be of limited, and perhaps, misleading help.
>
> The algorithm, and all that I have seen, convert Arabic characters in the
> \x{06--} range to Arabic Presentation Forms A (starting at \x{FB50}) or B
> (starting at \x{FE70}) characters depending on their medial, isolated,
> initial, and final values per the Unicode standard.
>
> I am not sure that I understand your point. Isn't this the purpose of the
> Arabic Presentation Forms?
>
> I have found several different algorithms that do this but have found none
> on CPAN. Note that I may help someone else (who has a simpler module than
> mine) to upload his files.
>
> I use the modules when rendering text on images using ImageMagick, which
> performs neither shaping nor the bidi algorithm. Without this shaping
> module many of the characters are not rendered correctly. In my case the
> shaping module is not limited nor misleading but required.
>
> Perhaps, you have a better way?
>
> Chris
>
> > Bob

**********************************************************************
Kenneth R. Beesley                [EMAIL PROTECTED]
Xerox Research Centre Europe      Tel from France:  04 76 61 50 64
6, chemin de Maupertuis           Tel from Abroad:  +33 4 76 61 50 64
38240 MEYLAN                      Fax from France:  04 76 61 50 99
France                            Fax from Abroad:  +33 4 76 61 50 99

XRCE page:      http://www.xrce.xerox.com
Personal page:  http://www.xrce.xerox.com/people/beesley/beesley.html
**********************************************************************