Re: New versions of Persian stemmer & syntax parser

2005-06-24 Thread Jon D.

--- Behdad Esfahbod [EMAIL PROTECTED] wrote:

> Can you please educate us on how these are supposed to work?  I
> can't get anything out of them.  I choose UTF-8 and type a verb
> in the stemmer, but I get back the verb verbatim.

Sorry about the late reply.  The Perl script is run
from the command line, taking input from STDIN and
writing output to STDOUT under Unix/Linux/Cygwin.

I just finished version 0.7, which natively supports
input from and output to UTF-8, CP-1256 (a.k.a.
Windows-1256), and ISIRI 3342.  The default input and
output is romanized text.  This version also fixes
some bugs (e.g. --root).  A tentative name for the
stemmer is Perstem.  If any of you think of a better
name, please let me know.  Also, I'll try to get the
web page version updated sometime soon, which
hopefully will fix the problem that Behdad mentioned.

http://students.cs.byu.edu/~jonsafar/perstem


Sample usage might include (after removing the single
quotation marks around commands):

Input a UTF-8 webpage and output to CP-1256,
preserving only the roots of words, and remove HTML
tags:
'perstem -i utf8 -o cp1256 --root --noroman < my_utf8.html > my_cp1256.txt'

Input romanized sentence from the command-line, output
to UTF-8, show the morphological links, remove short
vowels, and tokenize punctuation:
'echo man ketAb-hAie tu rA nemi-binam. | perstem -o utf8 --links --unvowel --tokenize > my_utf8.txt'

For a full list of commands, try the -h or --help
option.
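
For anyone who would rather call the stemmer from another program than
from the shell, here is a minimal sketch in Python.  It assumes that an
executable named perstem is on your PATH and uses only the options
mentioned in this message.  The function names build_perstem_command
and run_perstem are made up for illustration and are not part of the
stemmer itself.

```python
import shutil
import subprocess


def build_perstem_command(input_enc=None, output_enc=None, flags=()):
    """Assemble an argument list from options named in this message.

    The executable name "perstem" is an assumption about how the
    script is installed on your system.
    """
    cmd = ["perstem"]
    if input_enc:
        cmd += ["-i", input_enc]
    if output_enc:
        cmd += ["-o", output_enc]
    cmd += list(flags)
    return cmd


def run_perstem(data, **options):
    """Send bytes to the stemmer on STDIN and return its STDOUT bytes."""
    cmd = build_perstem_command(**options)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("perstem was not found on PATH")
    # Bytes are used on both sides because the output may be CP-1256,
    # which cannot be decoded as UTF-8.
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    # Mirrors the first sample command above, without the file redirection.
    print(build_perstem_command("utf8", "cp1256", ["--root", "--noroman"]))
```

Calling run_perstem(open("my_utf8.html", "rb").read(), input_enc="utf8",
output_enc="cp1256", flags=["--root", "--noroman"]) should then be
roughly equivalent to the first sample command, with the result
returned as bytes instead of written to a file.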


The stemmer and syntax parser were well received last
week at the First International Conference on Aspects
of Iranian Linguistics in Leipzig, Germany.

Thanks for all your feedback so far,
-Jon Dehdari





___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: New versions of Persian stemmer & syntax parser

2005-06-09 Thread Behdad Esfahbod

On Tue, 7 Jun 2005, Jon D. wrote:

> For anyone who's interested, new versions of a Persian
> stemmer, two-level morphology engine, link-grammar
> syntax parser, and character encoding conversion
> scripts are available for download.  All of it is
> under the free license GPL v.2.
>
> Web demonstrations for the Persian stemmer and the
> syntax parser are also available:
>
> http://students.cs.byu.edu/~jonsafar/stemmer.html
> http://students.cs.byu.edu/~jonsafar/persianlg.html

Hi Jon,

Can you please educate us on how these are supposed to work?  I
can't get anything out of them.  I choose UTF-8 and type a verb
in the stemmer, but I get back the verb verbatim.

Thanks,


--behdad
http://behdad.org/