RE: Persian PC-Kimmo 0.8 released
For anyone who's interested, Persian PC-Kimmo version 0.8 has just been released. It's available here: http://home.byu.net/jmd56/download/persian-pckimmo-0.8.tar.gz

Thanks, Jon, for releasing this version. It looks a lot better than the previous one!

The biggest thing holding it back from being a 1.0 is a relatively small lexicon (~1350 words). The morphology engine achieves about two-thirds recognition on a corpus of about 3.5 million words. And of course, it's GPL'ed.

Hmmm, do you have a list of the words in the current lexicon? (I'm not familiar with PC-KIMMO-specific commands, so I can't parse them on my own.) What should I do to help add more words?

Any helpful feedback would be appreciated.

I find the new tree-style recognition very helpful:

n+mi+]+im
NEG+DUR+come.PRES+1P

1:         Top
            |
           Verb
        ____|____
  VNEGPREFIX   VNStem
      n+      ___|___
     NEG+  VPREFIX  VStem
             mi+      |
            DUR+   V1Stem
                  ____|____
               V2Stem  VPSUFFIX
                  |       +im
               V3Stem     +1P
                  |
                  V
                  ]
              come.PRES

Top: [ cat: Top ]
1 parse found

n+mi+]+m
NEG+DUR+come.PRES+1S

1:         Top
            |
           Verb
        ____|____
  VNEGPREFIX   VNStem
      n+      ___|___
     NEG+  VPREFIX  VStem
             mi+      |
            DUR+   V1Stem
                  ____|____
               V2Stem  VPSUFFIX
                  |       +m
               V3Stem     +1S
                  |
                  V
                  ]
              come.PRES

Top: [ cat: Top ]
1 parse found

I was wondering whether there's some way to retrieve the tree-structured data in a format that is easy to parse (the ASCII style is too difficult for a computer program to parse), something like an XML format maybe?

- Ehsan Akhgari
Farda Technology (http://www.farda-tech.com/)
List Owner: [EMAIL PROTECTED]
[ Email: [EMAIL PROTECTED] ] [ WWW: http://www.beginthread.com/Ehsan ]
___
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
RE: Persian PC-Kimmo 0.8 released
Thanks for your reply, Jon.

Thanks for asking. All the words are in tab-separated text files, as in noun.lex, verb.lex, etc. They get converted to a kimmo-usable file such as fa-noun.lex, fa-verb.lex, etc. using the db2lex perl scripts in the scripts directory. The verb and adjective files use a specific script written for them; all others use the plain script. Also see the orthography.txt file for the romanization scheme. It also has some other goodies. I would love to add any additions you might make to the lexicon in the next release.

I suppose I can use roman2unicode to convert the roman encoding into readable plain text (I'm not fast at reading the roman notation). That way, I can import the data into Excel, sort it alphabetically, and start adding new stuff...

As you can see, it needs a little more work on the morphophonemic rules, but it should work fine for stemming purposes.

Yes, it's pretty good at recognizing the stem of the word.

Hans Nelson is the man to talk to. He's working on a Kimmo output to XML program. I don't know much about it, but here's his email: [EMAIL PROTECTED]

Thanks for your hint. I'll try to contact him. In case you're interested, I can send the final result of our discussion to you off-list.

- Ehsan Akhgari
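The "load the tab-separated lexicon and sort it" step described above can also be done without a spreadsheet. The following is only a rough sketch: the column layout (romanized headword, category, gloss) and the sample rows are assumptions made up for illustration, not the actual layout of noun.lex or verb.lex.

```python
import csv
import io


def load_lexicon(text):
    """Parse tab-separated lexicon text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def sorted_by_headword(rows):
    """Sort rows by their first column (assumed here to be the headword)."""
    return sorted(rows, key=lambda row: row[0])


# Hypothetical sample data; the real .lex files may use different columns.
sample = "ketAb\tN\tbook\nAb\tN\twater\n"

for row in sorted_by_headword(load_lexicon(sample)):
    print("\t".join(row))
```

For a real file, the text would come from something like open("noun.lex", encoding="utf-8").read(), with the encoding adjusted to whatever the distribution actually uses.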
Farsi Stemming Algorithm
Hi all,

Does anyone know of a free Farsi stemming algorithm, like the Porter algorithm for English?

Thanks a lot!

- Ehsan Akhgari
RE: Farsi Stemming Algorithm
One of the things that drives me nuts about the software is that it claims to run on Solaris/Sparc, Win/x86, MacOS, or BSD, but apparently not Linux (I have a Sparc box, so I'm lucky :-). The source code is downloadable, but it currently doesn't seem to compile on Linux/x86. It does have a callable C interface, as documented in the kimmolib.txt in this file [2]. In fact, I'm working on an AI program that calls PC-Kimmo to do morphology. Batch mode is used via the 'take' command, and using a .tak file.

Here's an update. I tried to build the whole pc-parse package on Linux (RedHat 9.0) using gcc 3.2.2, and it compiled without a single problem. I also tried running PC-Kimmo, and it was working smoothly. I noticed that in the README, they claim to have tested the build process on the following platforms:

1. Debian GNU/Linux 2.2 (kernel 2.2.17) / gcc 2.95.2, glibc 2.1.3-24
2. Red Hat Linux 7.3 (kernel 2.4.18) / gcc 2.96, glibc 2.2.5-34
3. Red Hat Linux 8.0 (kernel 2.4.18-14) / gcc 3.2-7, glibc 2.2.93-5
4. OpenBSD 3.1 / gcc 2.95.3
5. Mac OS X (10.2) / gcc 3.1
6. cygwin 1.3.10-1 (Windows XP Pro) / gcc 2.95.3-5

Maybe you're trying an older version?

- Ehsan Akhgari
RE: Farsi Stemming Algorithm
(I'll just reply to your other post here) I guess I didn't know about a new pc-parse release. Where did you get the newest source code? That's terrific news for me.

Well, the release I downloaded is approximately one year old, but here's the URL I downloaded it from: ftp://ftp.sil.org/software/unix/pc-parse-src-20030321.tgz

To build it, I just did a typical ./configure; make; make install; there was nothing more to it than that. What compiler version did you use to compile it? Let me know if you still have compilation problems. I might be able to help if I can reproduce them here.

I'm very interested in any work you do, including a PHP extension. Maybe SIL.org might be interested as well.

Actually, what I'm working on is an English/Persian search engine which can be placed on any site with no need to download or install anything. It's nearly finished; I only have to translate the web UI into Persian and implement stemming for Persian in the engine. Originally I planned to implement a stemming algorithm myself, but I figured that I can't be considered an expert in Persian grammar or linguistics at all, so I prefer to use already working solutions, and your work seems to be the *best* choice. The PHP extension would be quite a thin wrapper, but anyway I'll definitely provide you with the source code when I'm finished. You're also welcome to a copy of the search engine's source code itself if you're interested.

Give me a week and I'll email them to the email address in your signature, unless you tell me otherwise.

Thanks a lot! I highly appreciate your great help.

- Ehsan Akhgari
RE: Miscellaneous web issues
An important note: what Notepad does here is only acceptable. It's not even recommended. HTML 4 clearly doesn't allow a UTF-8 BOM to appear before the HTML tag.

Notepad is supposed to be a text editor. A text editor shouldn't insert markup by itself.

BTW, ISIRI 6219 strongly discourages the use of a BOM in UTF-8 files.

The problem here is that web protocols (HTML for example) don't allow the BOM, and Notepad is not an HTML editor, so there's nothing to prevent it from adding the BOM. Check out: http://www.unicode.org/faq/utf_bom.html#28

- Ehsan Akhgari

'I generally take life as it comes my way', said Death.
RE: Miscellaneous web issues
First of all, thank you very much for all the patient and lengthy explanations. Very nice of you to share so many tips! (Thanks to the others too who answered on and off list!)

Happy to help!

[snip]

Now that 2 people have said to change ZWNJ to \u200c, I tried that but it didn't work. I don't think I have the right tool. I couldn't do it in Notepad because as I said, it's WYSIWYG in Persian script so if I do a global replacement and stick \u200c in the middle of Persian script, that's obviously not going to work (and I also tried it for good measure and it didn't work but there may be many reasons it didn't work out using Notepad.)

I don't know what you mean here. Why doesn't it work in Notepad? Note that on Windows XP, you can't type ZWNJ inside the Find/Replace dialog box; you need to copy and paste it from inside the Notepad text editor window. Another reason not to use Notepad.

Then, since you recommended Frontpage, I tried that. Earlier, it had not even occurred to me to attempt to open a .js file in Frontpage (version 2000.) This time I fooled it by changing the extension from .js to .html and so was able to open it in html view where all the unicode was in numeric style. I changed all the &#8204; to \u200c but now I see that also has not worked.

Well, I don't know what the problem is here... BTW, FrontPage 2003 can open the .js file (using File | Open, or drag and drop) and render the UTF-8 characters just fine without converting them to numeric entities. Don't try putting them in an HTML file. I don't know about FrontPage 2000, though.

I think I'm not going to use Notepad for making bidirectional arrays from now on! That is insane to go to such great lengths!

Yeah, it's definitely so.

Not sure what you have in mind here, but at this point, I'll be glad just to make it work with ZWNJ.

In the JS code, try to replace the trailing ZWNJ-raa and ZWNJ-o with nothing using a regex.

HTH,
- Ehsan Akhgari
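The regex suggestion in the message above might look roughly like the following. This is a sketch in Python rather than the JavaScript under discussion, and it assumes that "raa" and "o" are written as the letter sequences U+0631 U+0627 and U+0648; the function name is made up for illustration. The same pattern should carry over to a JavaScript regular expression.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumed suffixes: ZWNJ followed by "raa" (U+0631 U+0627) or "o" (U+0648),
# matched only at the very end of the word.
TRAILING_PARTICLE = re.compile(ZWNJ + r"(?:\u0631\u0627|\u0648)$")


def strip_trailing_particle(word: str) -> str:
    """Remove a trailing ZWNJ+raa or ZWNJ+o, leaving other words untouched."""
    return TRAILING_PARTICLE.sub("", word)
```

A ZWNJ in the middle of a word is deliberately left alone here, since only the trailing particle is meant to be dropped.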
RE: Miscellaneous web issues
You can re-live its creation here in the archives: http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html

[snip]

Thanks for the links. Seems like a very handy keyboard. BTW, why does the Shift-Space combination not work?

Done! Beautiful! I hope the Mozilla users appreciate all this trouble. Thanks again for all your help!

You're welcome! :-)

- Ehsan Akhgari
RE: Miscellaneous web issues
What is notepad? A text editor? Text editors should not insert a UTF-8 BOM either. The problem is that Microsoft sometimes invents non-standard things and then pushes them so hard that Unicode adds them to parts of the standard (or an FAQ). Microsoft conventions for .txt files in the Unicode FAQ look sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to know the encoding of a file without some kind of mark. See, HTTP transfers the character set using the Content-Type response header. In HTML, it's specified with a <meta http-equiv="Content-Type" ...> tag. In XML, the default encoding is UTF-8, and if a document is encoded in another encoding, it must be specified in the <?xml ... ?> PI. Plain text files have no means of identifying the character encoding, so a single text file can be interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if there's nothing to declare the exact character encoding used.

The point here is that the protocols which do not allow a BOM are those that provide other means of specifying the character encoding. A given byte stream can have multiple interpretations depending on which encoding you use to interpret it, and there must be some way to resolve this ambiguity.

YMMV,
- Ehsan Akhgari
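To make the "some kind of mark" argument concrete, here is a small sketch of how a plain-text reader could use a leading BOM to choose an encoding when nothing else declares one. It uses the BOM constants from Python's standard codecs module; the function name and the "no BOM means unknown" policy are illustrative assumptions, not a description of what Notepad actually does.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, payload without BOM), or (None, data) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data
```

When the function returns None, the caller is back to guessing, which is exactly the ambiguity described above for plain text files.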
RE: Miscellaneous web issues
Thanks for the links. Seems like a very handy keyboard. BTW, why does the Shift-Space combination not work?

Bug in Microsoft keyboard layout creation tool. Use Shift-B temporarily.

Thanks. I've not done any work in this arena, so what I propose here might make no sense. Sorry if that's so. But the M$ page on the keyboard layout creation tool says the tool simplifies the process of creating a keyboard layout. Would there be any way to assign ZWNJ to Shift+Space by coding the keyboard layout manually, without the tool? If you can send me the C/C++ source file off-list, I'll try to investigate it further.

If not, I guess Shift+B is not that bad either. The keyboard layout rocks, even without having Shift+Space in place. :-)

- Ehsan Akhgari
RE: Persian-English Dictionary -- Was: Iranian Mac User group
[snip]

I'm sure this dictionary must have been funded by the Iranian government and no profits expected. I'm shocked to see that less than a dozen US universities have purchased it. I should think the author and publisher would be very happy to see it put online and all the efforts go to some use. Surely they will agree if their name is kept with the data! As for the technical part, I no longer have any doubts as to the abilities of the members of this group, especially after hearing the keyboard hack job for the sake of the ZWNJ earlier today! :-)

I did the keyboard job just because I thought it's a lot easier to use Shift+Space instead of Shift+B, and also because I was in the process of typing in a lot of Persian data. It took only about half an hour (not counting the time to download the MSKLC tool, of course) and improved my typing speed considerably.

About your proposal, I'm personally interested in doing the technical part of the job. I volunteer to implement a web interface for the dictionary, and I can also provide the hosting for the web interface. I can provide some amount of web space for the data as well, but I think we'll need other people's help too, because I would guess the whole data set would be *huge*. If the data has to reside on multiple web servers, I can code some sort of distributed query mechanism which fetches the definitions from remote web servers and displays them to the end user transparently.

- Ehsan Akhgari
RE: Misinformation!
Here is a solution (in fact a hack) that, if implemented correctly, can resolve some of the issues till people and Google start using correct software: With a little tweaking, the web servers can translate the correct Unicode to the incorrect Unicode desired so much by the Win9X users. That is, the web server looks at the browser request, and if it can detect Win9X, translates all U+06CC's in the document to U+064A (and all other required translations). The same technique could be used to fool Google into generating correct search results. That is, the web server generates a Win9X-friendly version of the document and appends it to the original document. You can also allocate tags so that the user of the web server can disable or enable some of these features. This may even make one gain some advantage over other web hosting companies.

That solves half of the problem. On Win9x, the key d on the keyboard inserts an ARABIC YEH, and on Win2K+, it inserts FARSI YEH. So, if you use this method, when a user types a word containing yeh into Google's search box on Win9x, they won't find your site. The best hack (or solution, as one might call it) I've found for this is feeding Google a version of the page which contains both forms of each word (using YEH and FARSI YEH), so that the chances of Google finding your page for a certain keyword are maximized. Of course, certain measures must be taken to prevent bad results; for example, the proximity of the words must not be disturbed. Nevertheless, this will cause other problems, such as distorted keyword density, which cannot be solved reliably. The problem must be fixed in the search engine code, really, and such hacks have their own downsides. The search engine project I've been working on, www.ariasearch.com, handles this (and the analogous ARABIC KAF versus KEHEH problem) among other problems for searching in Persian text.

Of course, the solution above is only a transient one, and it is up to people to upgrade their Win9X machines to something that is Unicode-compliant, also it is up to Google to program their systems such that it can understand that both U+06CC and U+064A are the same shape and hence should be regarded the same for searching unless user requests otherwise. This is the same as case-insensitive search that is usually implemented by mapping all upper and lower case characters -- in documents and queries alike -- to uppercase.

Yeah, that's right. Of course great attention must be paid so that it doesn't break Arabic search results.

- Ehsan Akhgari

He who sees the abyss, but with eagle's eyes - he who with eagle's talons grasps the abyss: he has courage. -Thus Spoke Zarathustra, F. W. Nietzsche
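The case-folding analogy in the message above can be sketched as a character fold applied to both indexed documents and queries. This is a minimal illustration, not how Google or AriaSearch actually work: the function name is made up, and the fold covers only the two pairs discussed in this thread.

```python
# Map the Arabic code points onto the Persian ones before indexing and querying.
FOLD = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH  -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF  -> ARABIC LETTER KEHEH
})


def fold_for_search(text: str) -> str:
    """Return text with YEH/KAF variants unified, for matching purposes only."""
    return text.translate(FOLD)
```

Because the fold is applied only to the search key and the stored text is left untouched, the original spelling is still available for display. Whether such folding is acceptable for Arabic-language results is the open question raised above.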
RE: Misinformation!
There's a difference in the case of C++ standard and web standards: Writing non-standard C++ code only produces compile-time problems, but if you happen to compile the code, it works correctly (or supposed to do so).

Well, that's not exactly so. Some non-conformant behavior tends to generate (maybe subtle) runtime behavior differences. But I see what your point here is.

But it's quite a different case in web. 30-40 percent is low enough to get ignored, counting that the other way you are sacrificing the other 60-70% for not being able to find the document by searching in Google. And note that even with Win9x and a recent IE, and updated fonts, there's no problem.

I'd definitely do so if the Google search problem couldn't be solved. But I've been using the method I mentioned in my other post to solve that problem as well. This was the best way of having the best of both worlds that I could think of, but I'm wide open to suggestions for or improvements to this idea.

About using HTML entities, no matter what the encoding of the page is, HTML entities generate Unicode characters.

They do on most browsers, but browsers are not required to do so. Consider a browser which can't handle UTF-8 well (or at all).

It's quite common to see people exporting Persian documents in MS Word, and get an HTML page encoded in MS Arabic encoding, with Persian Yeh and Keh encoded in HTML entities.

Yes, and that will make their document even more difficult for search engines to index. And of course, I'd argue that CP1256/ISO-8859-6 is not suitable for Persian documents, but that's another story perhaps.

PS. BTW, I just found that using Harakat (kasre, fathe, ...) also prevents a hit in Google search :(. That's quite expected, but perhaps I should reconsider my habit of putting those tiny marks everywhere.

That's another sad fact. I really think that Google must seriously consider handling such details in their indexing process. That's also one of the things that AriaSearch.com handles.

---

Hmmm, now that we're here, how about gathering some volunteers who can work with Google to fix some of these problems? In the past, I've contacted Google on a number of occasions about small problems in their services, and they seemed quite willing to fix them. Hopefully we would have a more Persian-friendly Google in the future this way. If you feel that this is a good idea, I'd be pleased to take part in that team.

Comments?

- Ehsan Akhgari
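The Harakat issue mentioned above could be handled at indexing time by removing the short-vowel marks from the search key. The sketch below assumes that "Harakat" means the marks in the range U+064B to U+0652 (fathatan through sukun); it is an illustration only, not a description of what any particular search engine does.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
HARAKAT = re.compile("[\u064b-\u0652]")


def strip_harakat(text: str) -> str:
    """Remove short-vowel and related marks so vocalized and plain spellings match."""
    return HARAKAT.sub("", text)
```

As with other folds, this would be applied to both the indexed text and the query, so an author can keep the marks in the visible page.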
RE: Persian-English Dictionary -- Was: Iranian Mac User group
I volunteer to implement a web interface for the dictionary,

Excellent! You'll have to make it so that whether the user types in bi[ZWNJ]kaar, bikaar, or bi kaar, the word will be found!

Yes, that's right. This is relatively easy to implement.

but I think we'll need other people's help as well, because I would guess the whole data would be *huge*.

Will this require separate dedicated server(s)? (I'm thinking about Behdad and the Persian Digital Library here...)

Hmmm, not necessarily *dedicated*. As long as there's enough web space for some part of the data to reside on the server, and I have access to it to install an application which processes the queries locally, it doesn't really have to be dedicated, unless the server's already fully loaded by other tasks. I don't think we'll need dedicated servers for this job. The process of searching can be done fast enough.

- Ehsan Akhgari
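One simple way to make bi[ZWNJ]kaar, bikaar, and bi kaar all find the same entry is to index and query on a key with ZWNJ and whitespace removed. The sketch below uses the romanized forms from the message; the function name is hypothetical, and a real dictionary would presumably need a more careful scheme.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def lookup_key(word: str) -> str:
    """Collapse ZWNJ and whitespace so the three spellings share one key."""
    return "".join(ch for ch in word if ch != ZWNJ and not ch.isspace())


for spelling in ("bi\u200ckaar", "bikaar", "bi kaar"):
    print(repr(spelling), "->", lookup_key(spelling))
```

The trade-off is that two genuinely different entries which differ only by a space or a ZWNJ would collide, so the lookup may need to return a list of candidates rather than a single entry.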
RE: Persian UTF-8 MySql collation
Ehsan - are you thinking about adding glibc collation to the strings/ctype-MYSET.c file? Or something more fundamental?

Well, to tell you the truth, I'm not really sure, since I've not checked the MySQL source tree yet. But yes, I'm going to see if glibc support can be incorporated into MySQL's charset handling mechanism.

I think you and the team I'm working with are trying to do the same thing - it would be great if we could work together and come up with a solution that anyone else can use too.

I looked around a bit, and it seems like MySQL 4.1.x will be supporting UTF-8. MySQL 4.0.x doesn't have that support (the version I'm using on the production server is 4.0.18-standard.) Because of that, incorporating that support into MySQL might require a lot more work than I currently imagine. Unfortunately, in that case I'll have to leave MySQL as it is and sort the data on the client side (less efficient, but requiring less development time). Since the application I'm working on doesn't store very big chunks of data in the db, I may decide to sacrifice performance for development time.

What's involved in creating a collation file? These pages:
http://dev.mysql.com/doc/mysql/en/Adding_character_set.html
http://dev.mysql.com/doc/mysql/en/Character_arrays.html
http://dev.mysql.com/doc/mysql/en/String_collating.html
seem to say that it's not too difficult, if you know what you're doing? (Which I don't. I'm just a humble PHP programmer)

Well, that seems to be for single-byte code pages. The Persian character coding system used in glibc is UTF-8, and that will require patching the MySQL source code. And like I said, because of MySQL's lack of UTF-8 support, it might require more work than I imagine. I think I can handle it from a technical point of view (I'm good at C/C++), but I'm quite pressed for free time...

... it seems it would be great to create a MySQL Persian collation file rather than changing the source, with all the problems that would lead to of having to re-patch the code every time there's a new MySQL release? Or is that inevitable?

Well, if we decide to change the MySQL source code, we can submit our patches to the MySQL team, and hopefully they will incorporate them into their new releases. Of course, in that case we might have to look into adding that support to MySQL 4.1.x as well (if it doesn't already have it.) So there's no need for re-patching. There's just a need for time! :-)

In case I decide not to spend the time on developing Persian collation support in MySQL, I'll be glad to help your team if they need technical programming help. In that case, I'll let you know off-list (remind me if you don't get any note from me within a week, please.)

- Ehsan Akhgari
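The fallback mentioned above, sorting on the client side instead of in MySQL, could be sketched as a custom sort key. The alphabet order used here is an assumption based on the usual 32-letter Persian alphabet (with alef madda placed first); it ignores diacritics, ZWNJ, and the Arabic variants of yeh and kaf, so it is a starting point rather than a full collation.

```python
# Assumed Persian alphabet order, written as escapes to keep the code points explicit.
PERSIAN_ORDER = (
    "\u0622\u0627\u0628\u067e\u062a\u062b\u062c\u0686\u062d\u062e\u062f"
    "\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637\u0638\u0639"
    "\u063a\u0641\u0642\u06a9\u06af\u0644\u0645\u0646\u0648\u0647\u06cc"
)
RANK = {ch: i for i, ch in enumerate(PERSIAN_ORDER)}


def persian_sort_key(word):
    """Rank known letters by alphabet position; unknown characters sort after them."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


def sort_persian(words):
    """Sort a list of strings using the assumed Persian alphabet order."""
    return sorted(words, key=persian_sort_key)
```

A plain code-point sort would place peh (U+067E) after teh (U+062A), while the key above places it between beh and teh. For small result sets fetched from the database, this kind of sort trades some efficiency for not having to patch the server.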
RE: Persian UTF-8 MySql collation
Right. I was thinking about adding UTF-8 Persian collation to MySql 4.1.x - our project will involve a fairly large amount of data, so we'd like to have the option of sorting at the DB level.

I've never tested MySQL 4.1.x. Have you tried it? How is the UTF-8 support? Have you tried Persian collation in MySQL 4.1.x to see how much better it is compared to 4.0.x? Unfortunately I won't be able to look into 4.1.x at this time, since it's Beta and we don't use Beta products on our production servers, so doing so would do no good for my project.

... which is why we're hoping to use MySql 4.1.x

I'd give it a try if I were in your shoes.

Nope, no Persian collation file for MySql 4.1.x as far as I can see (which is where we came in!)

How does 4.1.x get Persian sorting? Like 4.0.x?

- Ehsan Akhgari
RE: Linux teaching website
BTW Ehsan, I consider this off-topic. This is about Persian support in software and computers, software written to handle Persian text, etc. This is not a list to gather volunteers for a website that happens to be about an operating system and in Persian. Not that I'm not personally interested, but only that it is off-topic.

Oh, I'm sorry for posting off-topic to the list. I'll try not to do so again. :-)

- Ehsan Akhgari
RE: Vi/Emacs editor with RTL support
Not anything really useful. Vim has a rightleft mode (:set rightleft), which is useful for ONLY RIGHT-TO-LEFT text. Emacs, it's worse: there's an emacs-unicode branch, an emacs-bidi branch, and the emacs-head branch. They have been trying to merge the three of them for a few years now!

Thanks for your reply, Behdad. So, is there any editor you would recommend that has good support for bidirectional (Persian and English) text, preferably with HTML support (but an editor without HTML support will also be just fine)? The latest one I'm working with is Bluefish, but it has some minor problems, and I'm looking to see if there's something better available.

TIA,
- Ehsan Akhgari
Learn Linux in Persian: http://www.persian-linux.org/
RE: farsiweb.info
Hi friends, The FarsiWeb Project's website http://farsiweb.info/ is now up-to-date with a new Wiki system.

Congrats on the new site! I took a quick look, and I have a comment regarding the design. It seems to me that you're using a transparent PNG file as the background for the pages. IE doesn't support this feature of PNG files correctly, so the pages render half unreadable on IE. I suggest changing this, and the easiest way would be not to use a transparent PNG (no need for that, anyway - just let the background be white.) Fortunately real browsers (Firefox, and Mozilla) do render it pretty fine!

Other than that, the layout seems very nice. Thanks for your efforts.

- Ehsan Akhgari
RE: farsiweb.info
Ah, that's a good sign, that none of us at FarsiWeb uses IE anymore! BTW, IIRC, 8bit transparent PNG works in IE too.

I'm not sure. What I can say for sure is the image won't render correctly in IE. Hmm, BTW, at a second look, IE fails to render the layout correctly as well! Of course that's not as bad as how the background image looks.

- Ehsan Akhgari
RE: farsiweb.info
Humm, would you check http://farsitex.org/? I think it worked in IE when I designed it.

Done. It looks pretty good; only the non-link items in the left-hand menu might not be very readable (or it might be my lack of perfect sight.)

- Ehsan Akhgari

Light without eyes illuminates nothing.
RE: Miscellaneous web issues
Roozbeh, it has been a long time and I don't remember your answer to this email. What happened to this new dll?

AFAIK, it's still not up on SourceForge. If you're interested, I can mail it to you off-list.

- Ehsan Akhgari
RE: Parsnegar to Unicode conversion AND phonetic Farsi keyboard with English keyboard
Mr. Khazaee misdirected the email to me personally. I thought I'd send it to the whole list.

-----Original Message-----
From: khazaee [mailto:[EMAIL PROTECTED]]
Sent: 2004/12/18 10:27 AM
To: Ehsan Akhgari
Subject: RE: Parsnegar to Unicode conversion AND phonetic Farsi keyboard with English keyboard

Do you want to define a user-defined keyboard for the Linux operating system or not? For Linux, you can refer to the Persian keyboard on farsilinux.org. You can easily change the position of the Persian letters on your keyboard.

regards.

-- Original Message --
From: Ehsan Akhgari [EMAIL PROTECTED]
Date: Fri, 17 Dec 2004 22:50:03 +0330

Also, I was wondering if anyone knows a way of defining a user-defined keyboard to use with Farsi Unicode, similar to Parsnegar which allows to define a phonetic Farsi keyboard with English keyboards, so that, when typing in Microsoft word in Farsi, I could use key J for letter jim, A for letter alef, etc.

You need your own custom keyboard layout. M$ has a tool for that: Microsoft Keyboard Layout Creator. You can use it to create your fully (well, nearly fully) customized keyboard layout for Windows.

- Ehsan Akhgari
RE: openoffice zwnj
That's a famous bug that will happen in applications. KDE also had that bug for quite a time until Behdad fixed it. The bug is because the application or the rendering engine asks the font for a glyph for the character, where it shouldn't. The application or the rendering engine should not pass ZWNJ (and a few other invisible Unicode characters) down.

Great to know it's been fixed. Do you know exactly which version of KDE first included the fix? I've noticed that this bug seriously affects the usability of KDE for Persian computing.

Thanks,
- Ehsan Akhgari
RE: Frasi in MS Powerpoint
Hi, I would like to write Farsi in Microsoft PowerPoint for presentation purposes. Is that possible at all? If yes, how can it be done? What alternatives are available? I appreciate your help. It is possible. You simply switch to a Persian keyboard and type your text. I seem to remember that some versions of MS PowerPoint did not support right-to-left text properly (I don't remember exactly what the problem was). A very good alternative to MS PowerPoint is OpenOffice.org (www.openoffice.org) version 1.1.3. I have used it to create Persian presentations with no problems. - Ehsan Akhgari Farda Technology (http://www.farda-tech.com/) List Owner: MSVC@BeginThread.com [ Email: [EMAIL PROTECTED] ] [ WWW: http://www.beginthread.com/Ehsan ]
RE: Persian in Windows Applications
I'm going to develop a Windows application and I want to use Persian in the user interface. I'm using Windows XP and Unicode in my programming language. Is there any trick or rule to make the application work well on older versions of Windows (98, ME)? Or does just using Unicode make everything fine? Win9x does not support Unicode internally. M$ has developed the so-called MSLU[1], which provides Unicode compatibility at the Windows API level for Win9x. I have used it, and it does work, but be warned that these OSes still do *not* support Unicode natively; all MSLU can do is implement API stubs for the Unicode versions of Win32 functions (such as CreateFileW), which allows you to build your app in Unicode mode in Visual C++. What I've ended up doing in the past is building the whole UI in HTML and embedding an HTML rendering engine in my app. I've used the WebBrowser control (the same control used by IE). This requires you to distribute a customized[2] version of IE with "Arabic" support built in alongside your app, and to write some JavaScript code that lets users type Persian in your application even if they don't have a Persian keyboard installed (you can find several JS snippets on the web as starting points for this purpose). You can also use Gecko, Mozilla's great HTML rendering engine. If you decide to use the WebBrowser control, check out http://www.beginthread.com/Article/Ehsan/WebBrowser%20Goodies/ for some articles about customizations of the control that you may need in your own applications. All of this, of course, applies to Visual C++. If you use some other programming tool, you'll have to research it on your own, though I think that few support MSLU. [1] You can download it from http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm. [2] You can deploy a customized IE install using the IE Administration Kit (IEAK).
- Ehsan Akhgari www.farda-tech.com List Owner: MSVC@BeginThread.com [ Email: [EMAIL PROTECTED] ] [ WWW: http://www.beginthread.com/Ehsan ]
Re: problem in myql data display
mzz wrote: Hi everyone, I have a problem with my MySQL database: when I review the Persian data in my tables in phpMyAdmin, I can see and edit it correctly, but when I query the tables from my own PHP script, the data is displayed as '?' (question marks). I am using MySQL server 4.1, PHP 4.xx, and UTF-8 encoding in my pages. OS: Win2000 Server. Regards, zarbizade. Can you dump the table into a file from the PHP script and then make sure the data in the file is correct (and in UTF-8 encoding)? Ehsan
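One way to carry out the check suggested above, once the PHP script has written its query result to a file, is to inspect the raw bytes. The following Python sketch is only an illustration; the function name `diagnose_dump` and the wording of its result strings are made up for this example, and the U+0600 to U+06FF range is used as a rough stand-in for "contains Persian letters".

```python
def diagnose_dump(raw: bytes) -> str:
    """Classify the raw bytes that the script dumped from the table."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are in some other encoding (for example a legacy code page).
        return "not valid UTF-8"
    # U+0600..U+06FF is the basic Arabic block, which holds the Persian letters.
    if any("\u0600" <= ch <= "\u06ff" for ch in text):
        return "valid UTF-8 with Arabic-script letters"
    if "?" in text:
        # Literal question marks mean the text was already lost
        # before the script wrote it out.
        return "question marks already in the data"
    return "valid UTF-8 but no Arabic-script letters"
```

If the dump already contains literal question marks, the damage happened before the script received the data rather than in the page output, which narrows down where to look.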
Re: problem in myql data display
Sadeq Naqashzade wrote: Salaam, one of my friends has the same problem (but I don't). I'm using the mysqli extension and he is using the mysql extension. Try mysqli; it may help you. - Sadeq Thanks, but I wasn't the one who asked the question! I'm CCing the OP as well as the list. Ehsan
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
Dear Ehsan, You suggested a creative solution. Thank you. My application consists of a database and two user interfaces. The first UI is used for data entry, where I parse a given XML file, extract and "Romanize" its data based on a "Persian-Roman Conversion Map", and then insert it into the DB. Luckily, PHP provides a very fast function for such conversions, named strtr(). Now I have a "Roman DB". The second UI is used for data retrieval (searching), where I "Romanize" the given search argument and look for it through the DB records. The results are decoded and converted back to Persian before being sent to stdout. I've actually implemented this approach in a project. I have not yet published the code, but if you want, I can make it available under the GPL. Ehsan
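The romanize-then-search round trip described above can be sketched in a few lines. The message uses PHP's strtr(); the sketch below uses Python's `str.translate` as a rough equivalent. The four-letter map is hypothetical and exists only for illustration, since the actual "Persian-Roman Conversion Map" is not shown in the message, and the function names are made up as well.

```python
# Hypothetical, deliberately tiny Persian-to-Roman map for illustration only;
# the real conversion map used in the project is not shown in the message.
PERSIAN_TO_ROMAN = {"س": "s", "ل": "l", "ا": "A", "م": "m"}

# The map has to be one-to-one so that search results can be decoded again.
ROMAN_TO_PERSIAN = {roman: persian for persian, roman in PERSIAN_TO_ROMAN.items()}

_ENCODE = str.maketrans(PERSIAN_TO_ROMAN)
_DECODE = str.maketrans(ROMAN_TO_PERSIAN)


def romanize(text: str) -> str:
    """Convert Persian text to its Roman form before storing or searching."""
    return text.translate(_ENCODE)


def deromanize(text: str) -> str:
    """Convert a stored Roman string back to Persian for display."""
    return text.translate(_DECODE)
```

A search for a Persian substring then becomes a plain ASCII substring search on the stored data, for example `romanize("لا") in romanize("سلام")`. One limitation of a naive map like this is that genuine Latin text in the data would also be rewritten by `deromanize`, so a real map would need romanizations that cannot collide with it.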
Re: IranSystem to Unicode (UTF-8) converter
Hello, I don't know whether you have this software or not; if you do, please send it to me. I just wrote a PHP script to do just that a couple of days ago at work. It's relatively simple, using Roozbeh Pournader's conversion table. All you have to do is read the input string byte by byte and output the appropriate UTF-8 codes in reverse order. The only gotcha I faced was that Latin characters (or numbers) in the middle of the text should not be reversed. This is caused by the way IranSystem encodes strings. Ehsan
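The procedure described in the message above (map each byte, reverse the result, but leave Latin and digit runs in their original order) might look roughly like this in Python. This is not the PHP script mentioned in the message. The byte values in `TOY_TABLE` are invented purely so the example runs; they are NOT real IranSystem code points, and a real converter would plug in the full conversion table instead. The sketch also assumes that bytes below 0x80 are plain ASCII.

```python
import re

# Invented byte values for illustration only; NOT the real IranSystem table.
TOY_TABLE = {0xF0: "ب", 0xF1: "ا"}


def visual_to_logical(data: bytes, table: dict) -> str:
    """Convert visually ordered bytes to a logically ordered Unicode string."""
    # Step 1: map every byte to a character (assumption: < 0x80 is ASCII).
    chars = [chr(b) if b < 0x80 else table[b] for b in data]
    # Step 2: reverse the whole line to get reading order.
    reversed_text = "".join(reversed(chars))
    # Step 3: Latin letters and digits were stored left to right already,
    # so flip each such run back after the global reversal.
    return re.sub(r"[A-Za-z0-9]+", lambda m: m.group(0)[::-1], reversed_text)
```

For instance, the byte sequence `b"PHP5 \xf1\xf0"` comes out with the Persian letters first and `PHP5` intact, rather than as `5PHP`. The result is a Python string; calling `.encode("utf-8")` on it gives the UTF-8 bytes to write out.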