Re: New versions of Persian stemmer & syntax parser
--- Behdad Esfahbod <[EMAIL PROTECTED]> wrote:
> Can you please educate us on how these are supposed
> to work? I can't get anything out of them. I choose
> UTF-8, and type a verb in the stemmer, I get back the
> verb verbatim.

Sorry about the late reply. The Perl script is run from the command line under Unix/Linux/Cygwin, taking input from STDIN and writing output to STDOUT.

I just finished version 0.7, which natively supports input and output in UTF-8, CP-1256 (a.k.a. Windows-1256), and ISIRI 3342. The default input and output is romanized text. This version also fixes some bugs (e.g. --root).

A tentative name for the stemmer is "Perstem". If any of you think of a better name, please let me know. I'll also try to update the web page version soon, which will hopefully fix the problem that Behdad mentioned.

http://students.cs.byu.edu/~jonsafar/perstem

Sample usage might include (after removing the single quotation marks around commands):

Input a UTF-8 webpage and output to CP-1256, preserving only the roots of words, and remove HTML tags:
'perstem -i utf8 -o cp1256 --root --noroman < my_utf8.html > my_cp1256.txt'

Input a romanized sentence from the command line, output to UTF-8, show the morphological links, remove short vowels, and tokenize punctuation:
'echo "man ketAb-hAie tu rA nemi-binam." | perstem -o utf8 --links --unvowel --tokenize > my_utf8.txt'

For a full list of options, try -h or --help.

The stemmer and syntax parser were well received last week at the First International Conference on Aspects of Iranian Linguistics in Leipzig, Germany.

Thanks for all your feedback so far,
-Jon Dehdari

___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing
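Because Perstem works as a STDIN-to-STDOUT filter with selectable input and output encodings, the encoding step alone can be illustrated with a small filter. The sketch below is not Perstem's code and does no stemming. It assumes Python 3's built-in "utf-8" and "cp1256" codecs (Python has no built-in ISIRI 3342 codec, so that encoding is not covered). It also shows one pitfall that any UTF-8 to CP-1256 conversion has to handle somehow: Windows-1256 has no code point for FARSI YEH (U+06CC). The folding of that letter to ARABIC YEH (U+064A) is an assumption made for illustration, not necessarily what Perstem does.

```python
import codecs
import sys

# Illustrative assumption (not taken from Perstem): Windows-1256 has no
# code point for FARSI YEH (U+06CC), so fold it to ARABIC YEH (U+064A)
# before encoding; otherwise .encode("cp1256") raises UnicodeEncodeError.
FOLD_FOR_CP1256 = str.maketrans({"\u06cc": "\u064a"})


def convert(data: bytes, src: str = "utf-8", dst: str = "cp1256") -> bytes:
    """Decode bytes from encoding `src` and re-encode them in `dst`."""
    text = data.decode(src)
    # codecs.lookup normalizes aliases such as "windows-1256" to "cp1256".
    if codecs.lookup(dst).name == "cp1256":
        text = text.translate(FOLD_FOR_CP1256)
    return text.encode(dst)


def main() -> None:
    # Filter style: read raw bytes from STDIN, write raw bytes to STDOUT.
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as to_cp1256.py, it would be used the same way as the commands above, e.g. 'python3 to_cp1256.py < my_utf8.txt > my_cp1256.txt'.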
Re: New versions of Persian stemmer & syntax parser
On Tue, 7 Jun 2005, Jon D. wrote:
> For anyone who's interested, new versions of a Persian
> stemmer, two-level morphology engine, link-grammar
> syntax parser, and character encoding conversion
> scripts are available for download. All of it is
> under the Free license GPL v.2
>
> Web demonstrations for the Persian stemmer and the
> syntax parser are available also:
>
> http://students.cs.byu.edu/~jonsafar/stemmer.html
> http://students.cs.byu.edu/~jonsafar/persianlg.html

Hi Jon,

Can you please educate us on how these are supposed to work? I can't get anything out of them. I choose UTF-8 and type a verb into the stemmer, and I get back the verb verbatim.

Thanks,

--behdad
http://behdad.org/
New versions of Persian stemmer & syntax parser
For anyone who's interested, new versions of a Persian stemmer, two-level morphology engine, link-grammar syntax parser, and character encoding conversion scripts are available for download. All of it is released under the GPL v.2, a free software license.

Web demonstrations of the Persian stemmer and the syntax parser are also available:

http://students.cs.byu.edu/~jonsafar/stemmer.html
http://students.cs.byu.edu/~jonsafar/persianlg.html

The full versions are here:

The stemmer (type './stemmer.pl --help' for instructions):
http://students.cs.byu.edu/~jonsafar/stemmer.pl

The morphology engine:
http://students.cs.byu.edu/~jonsafar/persian-pckimmo-0.8.2.tar.gz

The link-grammar syntax parser:
http://students.cs.byu.edu/~jonsafar/persianlg-0.8.2.tar.gz

Persian character set converters:
http://students.cs.byu.edu/~jonsafar/utf8_2_roman_1-7.pl
http://students.cs.byu.edu/~jonsafar/pub/win1256_2_roman_1-5.pl
http://students.cs.byu.edu/~jonsafar/pub/win1256_2_roman.tcl
http://students.cs.byu.edu/~jonsafar/pub/isiri2roman_1-2.pl
http://students.cs.byu.edu/~jonsafar/pub/roman2unicode_1-5.pl

I'm also starting a lexicon project for Persian here:
http://students.cs.byu.edu/~jonsafar/persian_lexicon.html

It's still pretty small, but hopefully it will grow with time. Any helpful feedback or contributions would be appreciated.

-Jon Dehdari