I have been developing something similar to this
concept.  It's just a Perl CGI script that converts
text from one character set to another.  I haven't
worked on it for a while, but it works at least from
romanized Persian to HTML Unicode decimal entities
(e.g. &#1705; for ک).  I think it can currently
handle input and output for:
-Romanized Persian (currently only my own personal
scheme)
-Unicode HTML decimal
-Unicode hexadecimal
-Unicode name
-ISIRI 3342
-Windows-1256
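
To make the romanized -> Unicode decimal path concrete, here is a minimal sketch. It is written in Python rather than the Perl of the actual script, and the four-letter mapping table is purely hypothetical (it is not the romanization scheme code2code uses); the function names are made up for illustration as well.

```python
import unicodedata

# Hypothetical romanization table, for illustration only.
# This is NOT the scheme used by the code2code script.
ROMAN_TO_CODEPOINT = {
    "k": 0x06A9,  # ARABIC LETTER KEHEH (Persian kaf)
    "t": 0x062A,  # ARABIC LETTER TEH
    "A": 0x0627,  # ARABIC LETTER ALEF
    "b": 0x0628,  # ARABIC LETTER BEH
}


def to_html_decimal(romanized):
    """Return romanized text as HTML decimal entities (e.g. &#1705;)."""
    out = []
    for ch in romanized:
        code = ROMAN_TO_CODEPOINT.get(ch)
        # Characters with no mapping are passed through unchanged.
        out.append(f"&#{code};" if code is not None else ch)
    return "".join(out)


def to_html_hex(romanized):
    """Return romanized text as HTML hexadecimal entities (e.g. &#x6A9;)."""
    out = []
    for ch in romanized:
        code = ROMAN_TO_CODEPOINT.get(ch)
        out.append(f"&#x{code:X};" if code is not None else ch)
    return "".join(out)


def to_unicode_names(romanized):
    """Return the Unicode character name for each mapped letter."""
    return [
        unicodedata.name(chr(ROMAN_TO_CODEPOINT[ch]))
        for ch in romanized
        if ch in ROMAN_TO_CODEPOINT
    ]


if __name__ == "__main__":
    print(to_html_decimal("ktAb"))
    print(to_html_hex("ktAb"))
    print(to_unicode_names("k"))
```

Each output format is just a different way of printing the same code point, which is why adding another target format is cheap once the input table exists.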

I personally just use it for romanized -> Unicode
decimal.  Plans include supporting ISO 8859-6,
MacFarsi, and a few more Romanization schemes (because
everyone thinks their scheme is the best one).
BTW the 'Remove Short Vowels' option hasn't been
implemented yet, but will only take a couple minutes
to do.
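
One way such an option could work is sketched below. This is a guess: I am assuming "Remove Short Vowels" means dropping the three short-vowel marks (fatha, damma, kasra) from Arabic-script output. It is in Python, not the script's Perl, and `remove_short_vowels` is a hypothetical name.

```python
# Assumption: "short vowels" means these three Arabic-script marks.
SHORT_VOWEL_MARKS = {
    "\u064E",  # ARABIC FATHA
    "\u064F",  # ARABIC DAMMA
    "\u0650",  # ARABIC KASRA
}


def remove_short_vowels(text):
    """Return text with the short-vowel marks stripped out."""
    return "".join(ch for ch in text if ch not in SHORT_VOWEL_MARKS)


if __name__ == "__main__":
    # KEHEH + KASRA + TEH + ALEF + BEH, written with escapes.
    voweled = "\u06A9\u0650\u062A\u0627\u0628"
    print(remove_short_vowels(voweled) == "\u06A9\u062A\u0627\u0628")
```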

http://students.cs.byu.edu/~jonsafar/code2code.html
and its corresponding cgi script
(change the name first 'mv code2code{_,.}cgi' ):
http://students.cs.byu.edu/~jonsafar/code2code_cgi

It's all GPL'ed.  If you can improve upon it, please
do so and let me know so everyone benefits.

My main motivation for this script came when I was
looking for some texts in Persian and came across
an ISIRI 3342 encoded corpus.  The hosting web site
said that I needed either a modified Mosaic <shudder/>
browser or some crappy Java program to read it.

Text is just text, and converting text from one
character set to another is usually pretty
straightforward once you start looking at everything
in hex (and start forgetting higher-level stuff like
fonts).
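
As a small illustration of the "look at it in hex" point, the sketch below dumps the same three letters as Windows-1256 bytes, as Unicode code points, and as UTF-8 bytes. It is in Python, not the script's Perl, and it uses Windows-1256 because Python ships a codec for it under the name `cp1256`; a set such as ISIRI 3342 would presumably need a hand-written table instead. The helper names are hypothetical.

```python
def hex_dump(data):
    """Return bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


def code_points(text):
    """Return the Unicode code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def convert(data, source, target):
    """Re-encode bytes from one character set to another via Unicode."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    raw = b"\xca\xc7\xc8"  # TEH, ALEF, BEH in Windows-1256
    print(hex_dump(raw))
    print(code_points(raw.decode("cp1256")))
    print(hex_dump(convert(raw, "cp1256", "utf-8")))
```

Nothing here touches fonts or rendering: a conversion is a byte-to-code-point lookup followed by a code-point-to-byte lookup.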

I have similar Perl scripts for other languages.
While I originally developed the script for Persian,
I've modified it for a few other orthographies.  Each
one just took a few hours to do.

-Jahan D.



--- Nigel Greenwood <[EMAIL PROTECTED]> wrote:
---------------------------------
Connie Bobroff of Washington University has kindly
suggested that I bring our ScriptMaster software
products PerScript and PerScribe to the attention of
list members who would like to type Farsi using a
US/European OS.


For most users, particularly those with Windows XP or
Windows 2000, **PerScript** is more appropriate.  It
allows you to type (more or less!) phonetically
right-to-left in a text box.  The [Unicode] Persian
text can then be saved as HTML or, if your system has
Farsi-language support, pasted into Word for further
processing.  For more details and an online demo,
please see:


    http://www.elgin.free-online.co.uk/perscript.htm


If you are running Windows 98 you can still use
PerScript to generate HTML -- but you won't be able to
use Word (for example, Word 97) for further
processing.  If you really want to produce a Persian
document in Word you will have to use our other
product, **PerScribe**, which is a bit more
complicated and in effect "tricks" Word into
displaying Persian correctly. The .doc file will print
out correctly, but may look strange if viewed with a
WXP/W2K system.  For details, please see:


    http://www.elgin.free-online.co.uk/perscribe.htm


I would be happy to answer any questions about these
programs.


Nigel

> _______________________________________________
> PersianComputing mailing list
> [EMAIL PROTECTED]
> http://lists.sharif.edu/mailman/listinfo/persiancomputing


