RE: Farsi Stemming Algorithm
Thanks a lot, Jon, for your reply.

> The only one that I'm aware of is found here [1]. But it
> seems hard to get any other information about this stemmer.

Yes, it definitely seems so. The only Farsi stemmer I've been aware of myself is http://www.isri.unlv.edu/publications/isripub/Taghva2003-02.pdf . I had contacted Dr. Taghva some time ago about his stemmer, but didn't hear back from him at all.

> While the aim is a little different from a stemmer, a Persian
> morphological engine is being developed. The one available
> for download [2] is a couple versions behind current
> development, but it still yields decent results. Version 0.5
> is public domain, and newer versions will be under the
> General Public License. A new version will be released in a
> couple of months.

I downloaded this package, and looked into it. It seems to be useful for my job. However, this is the first time I'm hearing of PC-Kimmo, so I was kind of lost when trying to figure out the whole thing. I was wondering if you could provide me with some additional info (or URLs; didn't find any myself) about this software, especially how it can be used on Linux in batch mode. Does PC-Kimmo come with any callable C interface?

Thanks a lot!
- Ehsan Akhgari
Farda Technology (http://www.farda-tech.com/)
List Owner: [EMAIL PROTECTED]
[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]
___
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
FarsiWeb and its mission (was RE: IranL10nInfo)
On Wed, 2004-04-28 at 11:40, Omid K. Rad wrote:
> I was rather disappointed when I was told that FarsiWeb is
> not interested in Microsoft .NET technology at all. Even though I value
> all the great achievements that FarsiWeb has found, I personally believe
> that resolving Persian computing issues should not be selective,
> especially for a group that has nationally accepted this mission

The FarsiWeb Project is a research project funded by Sharif FarsiWeb, Inc. (a private company) and a few sponsors [1], with a very, very limited budget and staff. Why do you think it should resolve *all* Persian computing issues?

Individual members of FarsiWeb also represent the High Council of Informatics of Iran in the Unicode Consortium and are active in a few other national and international organizations. But the group has never been assigned any responsibility beyond its specific, limited contracts. In other words, we have not nationally accepted any mission, and we do not even get any funds from the Iranian government for continuing to represent it in the Unicode Consortium and ISO/IEC JTC1/SC2.

That aside, we would love to contribute to the proper implementation of Persian and Iranian requirements in any piece of software, which is why we are active on the PersianComputing mailing list. We have already shared an internal document with Omid on what we consider the requirements of Persian as used in Iran, and we will try to review his final document and send him our comments. He has suggested that we even support his final proposal, which we may decide to do in the end. But we have lots of other work to do, and we can't take responsibility for everything, especially software that doesn't come with source code.

Roozbeh Pournader
Technical Manager of the FarsiWeb Project
President of Sharif FarsiWeb, Inc.

[1] Current sponsors are Sharif University of Technology and Cyber7 Inc. Previous sponsors included the Science and Arts Foundation and the High Council of Informatics of Iran. FarsiWeb welcomes other sponsors or contractors.
RE: IranL10nInfo
On Wed, 2004-04-28 at 20:05, C Bobroff wrote:
> > About your suggestion, however, we (i.e. our team) have no idea about
> > Afghan and Tajik languages.
>
> It's all one language, different conventions.

For example, Tajiki is written in the Cyrillic alphabet instead of Arabic. ;)

roozbeh
BBC Persian on Internet and the Persian Language
There is a debate story by BBC Persian on the Internet and the Persian Language here:
http://www.bbc.co.uk/persian/interactivity/debate/story/2004/04/040428_mf_bt_weblanguage.shtml

roozbeh
Re: Days of the Week abbreviated
On Wed, 2004-04-28 at 09:06, C Bobroff wrote:
> OK, but kindly don't involve Roozbeh in any flamefests until AFTER he's
> done with the fonts.

Not much has happened with the fonts since last year (1382), and the latest version is 0.4. BTW, we need volunteers for tracking bugs in the fonts.

As for me, I've been busy with the Academy stuff, specifications for Persian locale information and collation, and committee work for the FarsiLinux Technical Committee.

roozbeh
Re: PersianComputing Digest, Vol 11, Issue 15
On Wed, Apr 28, 2004 at 07:00:31AM -0700, C Bobroff wrote:
> The problem here is that you're receiving the Daily Digest form of the
> list so you're mixing and matching two different topics. Possibly
> three with the Outlook question that also crept in.

Not only that, but also you are screwing my mailer's threaded mail reading. Please don't do that.

Masoud
Abbreviations et al.
Nice examples of abbreviations/shorthands/whatever:

* The first page of the Mosahab Persian Encyclopedia (first published in 1345/1966), about the abbreviations used in the encyclopedia, showing different methods of Persian abbreviation (127 KiB):
  http://www.farsiweb.info/misc/mosahab-abbr.png

* A month table from a "sar-resid-naame" (I don't know the English term) published in Iran in 1383/2004, showing the one-letter day headings (37 KiB):
  http://www.farsiweb.info/misc/calendar-abbr.png

roozbeh
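Assuming the one-letter day headings in such calendars are simply the initial letters of the weekday names (an assumption; the linked image is not described in that much detail), a minimal Python sketch of deriving them could look like this. The helper name `one_letter_headings` is hypothetical. It also checks that the seven initials are distinct, since one-letter headings only work if no two days share an initial.

```python
# Persian weekday names, starting with Saturday, as the Iranian week does.
# "\u200c" is ZERO WIDTH NON-JOINER, used in one common spelling of se-shanbe.
WEEKDAYS = [
    "شنبه",          # Saturday
    "یکشنبه",        # Sunday
    "دوشنبه",        # Monday
    "سه\u200cشنبه",  # Tuesday
    "چهارشنبه",      # Wednesday
    "پنجشنبه",       # Thursday
    "جمعه",          # Friday
]


def one_letter_headings(names):
    """Hypothetical helper: use each name's first letter as its heading.

    Assumes initials are the abbreviation convention; raises ValueError if
    two names share an initial, because the headings would be ambiguous.
    """
    headings = [name[0] for name in names]
    if len(set(headings)) != len(headings):
        raise ValueError("duplicate initials; one-letter headings are ambiguous")
    return headings


print(" ".join(one_letter_headings(WEEKDAYS)))
```

For the seven Persian day names the initials happen to be all different, so this naive rule yields an unambiguous set of headings.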
Re: Days of the Week abbreviated
On Thu, 29 Apr 2004, Roozbeh Pournader wrote:
> Not much has happened with the fonts since last year (1382), and the
> latest version is 0.4. BTW, we need volunteers for tracking bugs in the
> fonts.

Sorry to hear that. Can you release the latest version if there have been any improvements? Maybe I could post them on my website and say the "price" is one bug report per download!

On the other hand, one does not like to distribute a lot of beta fonts into the system, which could result in chaos. That's why I usually still just send people to Borna.

-Connie
Re: PersianComputing Digest, Vol 11, Issue 15
On Thu, 29 Apr 2004, Masoud Sharbiani wrote:
> Not only that, but also you are screwing my mailer's threaded mail reading.
> Please don't do that.

I'm sure it was merely a subconscious attempt to seek out the perfect abbreviation for Dushanbe, *Monday* Bazaar and capital of Tajikistan :)

-Connie
RE: IranL10nInfo
On Thu, 29 Apr 2004, Roozbeh Pournader wrote:
> For example, Tajiki is written in the Cyrillic alphabet instead of
> Arabic. ;)

Yeah, well, since I found out you can't actually type it unless you buy those stand-alone programs (without the source code!), I'm going to cite the Tajik [1] example every time people suggest Persian script should be "reformed" and written in Latin characters because that causes fewer headaches. How easy or hard it is to implement seems to depend only on how much interest there is, not on technical hurdles. (Yes, I just read the BBC article Roozbeh mentioned!)

[1] The English word is Tajik (and sometimes Tadzhik) but not Tajiki. (I also only found this out recently!)

-Connie
Re: BBC Persian on Internet and the Persian Language
On Thu, 29 Apr 2004, Roozbeh Pournader wrote:
> http://www.bbc.co.uk/persian/interactivity/debate/story/2004/04/040428_mf_bt_weblanguage.shtml

Can you give an example of "haa-ye havvaz instead of kasra"? I can't think how that situation could come up, although I'm sure it's obvious.

-Connie
RE: Farsi Stemming Algorithm
--- Ehsan Akhgari <[EMAIL PROTECTED]> wrote:
> I downloaded this package, and looked into it. It seems to be useful
> for my job. However, this is the first time I'm hearing of PC-Kimmo,
> so I was kind of lost when trying to figure out the whole thing. I was
> wondering if you could provide me with some additional info (or URLs;
> didn't find any myself) about this software,

It's a two-level morphology engine, so basically it resolves a surface form to a lexical form, or a lexical form to a surface form. For example, if I give it a newspaper word like 'nmiAim' (نميايم, "I am not coming"), it will resolve it to 'n+mi+A+m', taking into account any changes at morpheme boundaries (like the yeh here). More documentation is found here [1].

> especially how it can be used
> on Linux in batch mode.
> Does PC-Kimmo come with any callable C interface?

One of the things that drives me nuts about the software is that it claims to run on Solaris/Sparc, Win/x86, MacOS, or BSD, but apparently not Linux (I have a Sparc box, so I'm lucky :-). The source code is downloadable, but it currently doesn't seem to compile on Linux/x86. It does have a callable C interface, as documented in kimmolib.txt in this file [2]. In fact, I'm working on an AI program that calls PC-Kimmo to do morphology. Batch mode is used via the 'take' command and a .tak file.

Don't be too disappointed with version 0.5 of the Persian implementation; it was released 2 years ago ;-) I've reworked almost every aspect of it since then, so hopefully it will work better.

Have fun.
-Jon D.

[1] http://www.sil.org/pckimmo/
[2] ftp://ftp.sil.org/software/unix/pc-parse-doc-20030321.tgz
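To make the surface/lexical distinction in Jon's example concrete, here is a toy Python sketch. It is not PC-Kimmo and not a real two-level model (which runs its rules as parallel finite-state transducers); it hard-codes a hypothetical lexicon covering only this one word and a single hypothetical rule that makes a yeh ('i') surface between a morpheme ending in 'A' and a following suffix. Generation applies the rule; analysis simply tries every combination the lexicon allows and keeps those that generate the given surface form.

```python
from itertools import product

# Toy lexicon for Jon's example only (hypothetical, not the Persian PC-Kimmo data):
# 'n' = negative prefix, 'mi' = progressive prefix, 'A' = verb stem, 'm' = 1sg suffix.
PREFIXES = ["n", "mi"]  # in the order they must appear
STEMS = ["A"]
SUFFIXES = ["m"]


def generate(lexical: str) -> str:
    """Lexical -> surface, e.g. 'n+mi+A+m' -> 'nmiAim'.

    Hypothetical single rule: a yeh ('i') surfaces at the boundary between
    a morpheme ending in 'A' and a following suffix.
    """
    morphemes = lexical.split("+")
    surface = morphemes[0]
    for prev, nxt in zip(morphemes, morphemes[1:]):
        if prev.endswith("A") and nxt in SUFFIXES:
            surface += "i"  # epenthetic yeh
        surface += nxt
    return surface


def analyze(surface: str) -> list[str]:
    """Surface -> lexical forms, by brute-force generate-and-test over the lexicon."""
    results = []
    for mask in product([False, True], repeat=len(PREFIXES)):
        prefixes = [p for p, keep in zip(PREFIXES, mask) if keep]
        for stem in STEMS:
            for suffix in [None] + SUFFIXES:
                parts = prefixes + [stem] + ([suffix] if suffix else [])
                lexical = "+".join(parts)
                if generate(lexical) == surface:
                    results.append(lexical)
    return results


print(analyze("nmiAim"))  # ['n+mi+A+m']
```

Generate-and-test is only practical for a lexicon this tiny; the point of a two-level engine is to do the same mapping in both directions without enumerating every candidate.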
RE: IranL10nInfo
Dear Behdad, Roozbeh, Connie,

Thanks for your replies and explanations. First of all, I'm sorry if you found my last post antagonistic in any way. I'm not expecting anything more from FarsiWeb than what they are doing (I don't see myself in that position either). All I wanted to say is: don't avoid something just because you guess it might not be to your taste (or maybe that is only my perception). I am not advocating the Microsoft platform as such. I am specifically referring to .NET as a technology that is a world standard right now, and we are noticing mistakes in it pertaining to Persian and Iran.

*Please go on to my next post for my explanations.*

And thank you very much for the locale info you provided to us. I'm sure I could never find people anywhere else as useful for our work as those I'm finding here. I hope we can do a good job with your help.

Regards,
Omid
RE: IranL10nInfo
Hello everybody, especially my friends at FarsiWeb,

I'm trying to point out some things here (even though you might already know them) about .NET and our project.

For your information: the .NET Common Language Infrastructure (CLI) and the C# programming language were submitted to the ECMA and ISO/IEC international standards organizations a couple of years ago. After review, the submissions were ratified as the following standards:

Standard ECMA-334 (C#)
http://www.ecma-international.org/publications/standards/Ecma-334.htm
Standard ECMA-335 (CLI)
http://www.ecma-international.org/publications/standards/Ecma-335.htm
Standard ISO/IEC 23270 (C#)
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=36768
Standard ISO/IEC 23271 (CLI)
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=36769

This gave rise to several new open source efforts around .NET in the ICT community. Among them are three major third-party projects that intend to implement versions of the .NET Framework conforming to the base implementation that Microsoft has completed or still has underway:

Ximian's Mono Project, for Linux and other Unix systems
http://www.go-mono.com
Free Software Foundation's Portable .NET
http://www.dotgnu.org/pnet.html
Corel's Rotor (Microsoft SSCLI) for FreeBSD
http://msdn.microsoft.com/net/sscli

Mono and Portable .NET are released under free/open source licenses, while Rotor is published under a noncommercial shared-source license. This means we will have .NET applications running on a wide range of platforms quite soon, to name a handful: Linux, Windows, Solaris, FreeBSD, HP-UX, and Mac OS X. We also have more than 20 programming languages to choose from: APL, COBOL, Component Pascal, Eiffel, Fortran, Haskell, JScript .NET, Mercury, Oberon, Pascal, Perl, Python, Smalltalk, Visual Basic .NET, C#, Managed C++, etc.

To make applications more interoperable across platforms, all CLI implementations aim to implement the fundamental namespaces of the .NET Framework Class Library, closely following what Microsoft releases. These exclude namespaces such as Microsoft.*, but include the so-called pure .NET namespaces, of which System.Globalization is one. System.Globalization is also available in the .NET Compact Framework, a lighter version of the framework that installs on handheld devices.

In the "Iran Localization Info for Microsoft .NET" project (IranL10nInfo for short), we have chosen to work only on those parts of .NET that are in the System.Globalization namespace (pure .NET). Any changes Microsoft makes to them are indirectly ported to every non-Microsoft implementation of the Class Library. Moreover, this project will produce a good layout of information fields that we can easily reuse for other languages like Tajik and Afghan. So we are trying to resolve locale issues that reach far beyond Microsoft alone, big name though it is.

All the best,
Omid
__
Iran Localization Info for Microsoft .NET
http://www.idevcenter.com/projects/iranl10ninfo/draft/

Other open source developments over ECMA CLI:
Intel Labs' OCL (Open CLI Library) http://sourceforge.net/projects/ocl/
Platform.NET http://sourceforge.net/projects/platformdotnet/

Articles:
Linux World - Bringing the CLI to Open Source http://www.linuxworld.com/story/39216.htm?DE=1
Devx - Peeking under the Lid of Open Source .NET CLI Implementations http://www.devx.com/devx/article/9725

Microsoft Open Source:
MSDN - ECMA Standardization http://msdn.microsoft.com/net/ecma/
MSDN - The Common Language Infrastructure (CLI) http://msdn.microsoft.com/netframework/using/understanding/cli
Microsoft Shared Source Home Page: http://www.microsoft.com/sharedsource/
Re: Days of the Week abbreviated
The main problems with the fonts right now are:

* They lack line-height data.
* Their size does not match that of the MS fonts (and so does not match the Latin fonts either).
* A few of the fonts have bitmaps added. Those bitmaps should be removed.
* There's a known problem on LCDs, but that's perhaps another story.

So, as you can see, the third item is trivial to fix, the fourth is not that important, and the first two are easy to fix. After that we can talk about mark positioning and other fancy features.

behdad

PS, Behnam: So this is the list of font bugs you asked me for. Waiting for the fix.

On Thu, 29 Apr 2004, C Bobroff wrote:
> On Thu, 29 Apr 2004, Roozbeh Pournader wrote:
>
> > Not much has happened with the fonts since last year (1382), and the
> > latest version is 0.4. BTW, we need volunteers for tracking bugs in the
> > fonts.
>
> Sorry to hear that. Can you release the latest if there have been any
> improvements? Maybe I could post them on my website and say the "price"
> is one bug report per download!
> On the other hand, one does not like to distribute a lot of beta
> fonts into the system which could result in chaos. That's why I usually
> just send people to Borna still.
> -Connie

--behdad
behdad.org
RE: IranL10nInfo
On Thu, 29 Apr 2004, C Bobroff wrote:
> On Thu, 29 Apr 2004, Roozbeh Pournader wrote:
>
> > For example, Tajiki is written in the Cyrillic alphabet instead of
> > Arabic. ;)
>
> [1] The English word is Tajik (and sometimes Tadzhik) but not Tajiki. (I
> also only found this out recently!)
> -Connie

I guess Tajik is more correct. Although Tajik is the form listed in Merriam-Webster at m-w.com, their Indo-European languages chart names it Tajiki:
http://m-w.com/mw/table/indoeuro.htm

Perhaps we should add Tajik vs. Tajiki to the list of wars ;).

--behdad
behdad.org
RE: IranL10nInfo
On Thu, 29 Apr 2004, Behdad Esfahbod wrote:
> Perhaps we should add Tajik vs. Tajiki to the list of wars ;).

Good idea! Merriam-Webster even has "Irani" as an English word in case you need more suggestions for your list. I'm sticking with the Oxford English Dictionary...

-Connie
RE: IranL10nInfo
Dear Connie (et al.),

It's very easy to type Tajik using a "Phonetic" (i.e., mnemonic) Cyrillic keyboard. I wrote a Keyman keyboard driver for Kazakh that should include all the fancy Cyrillic characters needed for Tajik. Want to try it?

Best regards,
Peter E. Hauer
Linguasoft
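The idea behind a mnemonic ("phonetic") layout is that familiar Latin keys, or short key sequences, produce the Cyrillic letters that sound alike. As a toy illustration only, the Python sketch below uses a hypothetical mapping (not Peter's Keyman driver and not any standard Tajik layout) with greedy longest-match, so 'ii' yields the Tajik letter ӣ and 'j' yields ҷ. A real keyboard driver works keystroke by keystroke with context rules; this function only shows the mapping idea on a finished string.

```python
# Hypothetical mnemonic mapping from Latin key sequences to Tajik Cyrillic.
# \u escapes are used so look-alike Latin/Cyrillic letters can't be confused.
KEYMAP = {
    "a": "\u0430", "b": "\u0431", "v": "\u0432", "g": "\u0433",
    "gh": "\u0493",  # ғ
    "d": "\u0434", "e": "\u0435", "zh": "\u0436", "z": "\u0437",
    "i": "\u0438",
    "ii": "\u04e3",  # ӣ
    "y": "\u0439", "k": "\u043a",
    "q": "\u049b",   # қ
    "l": "\u043b", "m": "\u043c", "n": "\u043d", "o": "\u043e",
    "p": "\u043f", "r": "\u0440", "s": "\u0441", "t": "\u0442",
    "u": "\u0443",
    "uu": "\u04ef",  # ӯ
    "f": "\u0444", "kh": "\u0445",
    "h": "\u04b3",   # ҳ
    "ch": "\u0447",
    "j": "\u04b7",   # ҷ
    "sh": "\u0448",
}
LONGEST_KEY = max(len(k) for k in KEYMAP)


def type_tajik(keys: str) -> str:
    """Convert Latin key sequences to Cyrillic by greedy longest match.

    Keys with no mapping are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(keys):
        for size in range(LONGEST_KEY, 0, -1):
            chunk = keys[i:i + size]
            if chunk in KEYMAP:
                out.append(KEYMAP[chunk])
                i += len(chunk)
                break
        else:
            out.append(keys[i])  # no mapping: pass the key through
            i += 1
    return "".join(out)


print(type_tajik("tojikii"), type_tajik("forsii"))  # тоҷикӣ форсӣ
```

Whether the result displays correctly still depends on the font covering the Tajik-specific letters (ғ, ӣ, қ, ӯ, ҳ, ҷ), which is a separate problem from the keyboard.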
RE: IranL10nInfo
On Thu, 29 Apr 2004, Linguasoft wrote:
> It's very easy to type Tajik using a "Phonetic" (i.e., mnemonic) Cyrillic
> keyboard.

With which font, though? I could only find hacked fonts.

-Connie