I have been developing something similar to this
concept. It's just a Perl CGI script that converts
from one character set to another. I haven't worked
on it for a while, but it works at least from
romanized Persian to HTML Unicode character
references (e.g. &#1705;). I
think it can currently handle input and output
I'm not sure if you're already aware of this, but
www.ozodi.org, run by Radio Free Europe, distributes
these Tajik fonts:
http://students.cs.byu.edu/~jonsafar/fonts/xtajmcyr.ttf
http://students.cs.byu.edu/~jonsafar/fonts/xtajtcyr.ttf
-Jon D.
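
For readers unfamiliar with the idea, the kind of conversion
described above (romanized Persian to HTML numeric character
references such as &#1705;) might be sketched roughly as follows.
This is a minimal Python illustration, not the Perl script itself.
The table ROMAN_TO_PERSIAN and the function names are hypothetical,
and the four-letter romanization is a toy assumption rather than the
scheme the script actually uses.

```python
# Toy romanization table: an assumption for illustration only,
# NOT the scheme used by the Perl script discussed in this thread.
ROMAN_TO_PERSIAN = {
    "k": "\u06a9",  # ARABIC LETTER KEHEH
    "t": "\u062a",  # ARABIC LETTER TEH
    "A": "\u0627",  # ARABIC LETTER ALEF
    "b": "\u0628",  # ARABIC LETTER BEH
}


def romanized_to_unicode(text):
    """Map each romanized character through the table; unknown characters pass through."""
    return "".join(ROMAN_TO_PERSIAN.get(ch, ch) for ch in text)


def to_html_entities(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def romanized_to_html(text):
    """Chain the two steps: romanized input -> Unicode -> HTML references."""
    return to_html_entities(romanized_to_unicode(text))


if __name__ == "__main__":
    print(romanized_to_html("k"))     # &#1705;
    print(romanized_to_html("ktAb"))  # &#1705;&#1578;&#1575;&#1576;
```

Splitting the work into a table lookup and a separate entity-encoding
step keeps the output format independent of the input scheme, so a
different romanization only needs a different table.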
--- C Bobroff [EMAIL PROTECTED] wrote:
Peter
For anyone who's interested, new versions of a Persian
stemmer, two-level morphology engine, link-grammar
syntax parser, and character encoding conversion
scripts are available for download. All of it is
licensed under the GNU GPL, version 2.
Web demonstrations for the Persian stemmer and the
syntax
--- Behdad Esfahbod [EMAIL PROTECTED] wrote:
Can you please educate us on how these are supposed
to work? I can't get anything out of them. I choose
UTF-8 and type a verb in the stemmer, and I get back
the verb verbatim.
Sorry about the late reply. The Perl script is run
from the
/orthography.txt
or
http://students.cs.byu.edu/~jonsafar/persian_charmaps.pdf
To romanize Persian texts, download this:
http://students.cs.byu.edu/~jonsafar/perstem.pl
then type:
perl perstem.pl --nostem --input utf8 < myinput.txt > myoutput.txt
Hope this helps,
-Jon D