I have been developing something similar to this concept. It's just a Perl CGI script that converts from one character set to another. I haven't worked on it for a while, but it works at least from romanized Persian to HTML Unicode (e.g. &#1705; for ک). I think it can currently handle input and output for:

- Romanized Persian (currently only my own personal scheme)
- Unicode HTML decimal
- Unicode hexadecimal
- Unicode name
- ISIRI 3342
- Windows-1256
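To give a rough idea of what those conversions amount to, here is a minimal sketch in Python (not the Perl script itself). The function names and the tiny romanization table are hypothetical, made up for this example; only the `cp1256` codec and the `unicodedata` module are real standard-library pieces. ISIRI 3342 is left out because Python has no built-in codec for it.

```python
import unicodedata

# Hypothetical romanization table for illustration only; a real scheme
# would cover the whole alphabet and handle digraphs and short vowels.
ROMAN_TO_UNICODE = {
    "k": "\u06A9",  # ARABIC LETTER KEHEH
    "t": "\u062A",  # ARABIC LETTER TEH
    "A": "\u0627",  # ARABIC LETTER ALEF
    "b": "\u0628",  # ARABIC LETTER BEH
}

def from_romanized(text):
    """Map each romanized character to its Unicode letter; pass others through."""
    return "".join(ROMAN_TO_UNICODE.get(ch, ch) for ch in text)

def from_win1256(raw):
    """Decode Windows-1256 bytes into a Unicode string."""
    return raw.decode("cp1256")

def to_html_decimal(text):
    """Write every non-ASCII character as a decimal HTML reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def to_hex(text):
    """List the code points in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def to_names(text):
    """List the Unicode character names."""
    return [unicodedata.name(ch, "<unnamed>") for ch in text]

if __name__ == "__main__":
    word = from_romanized("ktAb")
    print(to_html_decimal(word))  # &#1705;&#1578;&#1575;&#1576;
    print(to_hex(word))           # U+06A9 U+062A U+0627 U+0628
    print(to_names(word)[0])      # ARABIC LETTER KEHEH
```

Everything goes through Unicode code points as the common middle step, so adding another input or output format only means adding one more function.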
I personally just use it for romanized -> Unicode decimal. Plans include supporting ISO 8859-6, MacFarsi, and a few more romanization schemes (because everyone thinks their scheme is the best one). BTW the 'Remove Short Vowels' option hasn't been implemented yet, but it will only take a couple of minutes to do.

http://students.cs.byu.edu/~jonsafar/code2code.html

and its corresponding CGI script (change the name first: 'mv code2code{_,.}cgi'):

http://students.cs.byu.edu/~jonsafar/code2code_cgi

It's all GPL'ed. If you can improve upon it, please do so and let me know so everyone benefits.

My main motivation for this script came when I was looking for some texts in Persian and came across an ISIRI 3342-encoded corpus. The hosting web site said that I needed some modified Mosaic <shudder/> browser to read it, or use some crappy Java program. Text is just text, and converting text from one character set to another is usually pretty straightforward once you start looking at everything in hex (and start forgetting higher-level stuff like fonts).

I have similar Perl scripts for other languages. While I originally developed it for Persian, I've modified it for a few other orthographies. Just took a few hours to do.

-Jahan D.

--- Nigel Greenwood <[EMAIL PROTECTED]> wrote:
---------------------------------
Connie Bobroff of Washington University has kindly suggested that I bring our ScriptMaster software products PerScript and PerScribe to the attention of list members who would like to type Farsi using a US/European OS.

For most users, particularly those with Windows XP or Windows 2000, **PerScript** is more appropriate. It allows you to type (more or less!) phonetically right-to-left in a text box. The [Unicode] Persian text can then be saved as HTML or, if your system has Farsi-language support, pasted into Word for further processing.
For more details and an online demo, please see:

http://www.elgin.free-online.co.uk/perscript.htm

If you are running Windows 98 you can still use PerScript to generate HTML -- but you won't be able to use Word (for example, Word 97) for further processing. If you really want to produce a Persian document in Word you will have to use our other product, **PerScribe**, which is a bit more complicated and in effect "tricks" Word into displaying Persian correctly. The .doc file will print out correctly, but may look strange if viewed with a WXP/W2K system. For details, please see:

http://www.elgin.free-online.co.uk/perscribe.htm

I would be happy to answer any questions about these programs.

Nigel

> _______________________________________________
> PersianComputing mailing list
> [EMAIL PROTECTED]
> http://lists.sharif.edu/mailman/listinfo/persiancomputing
