Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
Behdad Esfahbod wrote:

> That's the tricky part, or where the runtime-hell comes in. What I did was to write a small Java program based on the samples in Lucene to connect to my database and feed the data into Lucene. At search time, I have another little Java program that takes the query string from the command line and prints out search results to standard output. My PHP script then just fires up a shell script that in turn runs the Java program, piping the output into PHP...

Knowledge is Power. (Alvin Toffler)

That's a wonderful architecture. It seems that I was blind before reading your e-mail. I had never thought about the power of the shell before, or about using it as an interface to talk to Java. I like your point of view. Very interesting!

Thank you very much for sharing the source code!

Behzad

___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
On Wed, 30 Nov 2005, AmirBehzad Eslami wrote:

> Dear Behdad,
>
> On 25 Nov 2005, you wrote:
>
> > Another option is to get yourself a real search engine, like Apache Lucene. I've written up my experience using it here:
> > http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html
>
> You always offer the most brilliant solutions!! Unfortunately, I have no experience with this method. But I'm still eager. I read your weblog and visited the Apache Lucene homepage. I'm impressed. Would you tell us how you have integrated this Java-driven package with PHP at http://rira.ir/ ?!! It works really fast.

That's the tricky part, or where the runtime-hell comes in. What I did was to write a small Java program based on the samples in Lucene to connect to my database and feed the data into Lucene. At search time, I have another little Java program that takes the query string from the command line and prints out search results to standard output. My PHP script then just fires up a shell script that in turn runs the Java program, piping the output into PHP...

I don't have access to the Java code at this time, but the PHP code involved is available here:
http://cvs.sourceforge.net/viewcvs.py/rira/rira/php/page/search.php?rev=1.1.1.1&view=log

If you are developing in .NET, there is a functional port of Lucene to .NET too. There is even a port of an older version of it to Python.

BTW, you need to make sure you compile it with Unicode turned on. I don't quite remember the details, but there were some. I also have a Persian class written for it, but it didn't do much anyway.

In a few weeks I will get access to the rira.ir server and hopefully move the site to the above sf.net project, so you can see what's inside.

> Thanks in advance,
> Behzad

Cheers,
--behdad
http://behdad.org/

Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill
-- Dan Bern, New American Language
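For readers following the thread, the hand-off described above (a front-end script starts a separate search program, passes it the query as a command-line argument, and reads the hits from its standard output) can be sketched roughly as follows. This is not the rira.ir code: it is written in Python rather than PHP, `run_search` and `FAKE_SEARCHER` are hypothetical names, and a tiny child Python process stands in for the Java/Lucene searcher so the sketch runs on its own.

```python
import subprocess
import sys


def run_search(command, query, timeout=10.0):
    """Run an external search program and return its non-empty stdout lines.

    `command` is the program plus any fixed arguments (for example a wrapper
    shell script); the query is appended as the final argument.
    """
    completed = subprocess.run(
        [*command, query],    # list form: the query is never parsed by a shell
        capture_output=True,
        encoding="utf-8",
        timeout=timeout,
        check=True,           # raise if the search program exits non-zero
    )
    return [line for line in completed.stdout.splitlines() if line.strip()]


# Stand-in for the real searcher (assumption): a child Python process that
# prints two tab-separated "rank<TAB>text" lines built from its argument.
FAKE_SEARCHER = [
    sys.executable,
    "-c",
    "import sys; q = sys.argv[1]; print('1\\t' + q); print('2\\t' + q.upper())",
]

if __name__ == "__main__":
    for hit in run_search(FAKE_SEARCHER, "lucene"):
        print(hit)
```

Passing the query as a separate list element, instead of pasting it into a shell command string, keeps user input from being interpreted by the shell. Starting a new process per search is simple but pays the start-up cost of the searcher on every query.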
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
Dear Behdad,

On 25 Nov 2005, you wrote:

> Another option is to get yourself a real search engine, like Apache Lucene. I've written up my experience using it here:
> http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html

You always offer the most brilliant solutions!!

Unfortunately, I have no experience with this method. But I'm still eager. I read your weblog and visited the "Apache Lucene" homepage. I'm impressed. Would you tell us how you have integrated this Java-driven package with PHP at http://rira.ir/ ?!! It works really fast.

Thanks in advance,
Behzad
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
Dear Ehsan,

You suggested a creative solution. Thank you.

My application consists of a database and two user interfaces.

The first UI is used for data entry, where I parse a given XML file, extract its data, "Romanize" it based on a "Persian-Roman Conversion Map", and then insert it into the DB. Luckily, PHP provides a very fast function for such conversions, named strtr(). Now I have a "Roman DB".

The second UI is used for data retrieval (searching), where I "Romanize" the given search argument and look for it through the DB records. The results are decoded and converted back to Persian before being sent to stdout.

> I've actually implemented this approach in a project. I have not yet published the code, but if you want, I can make it available under the GPL.
>
> Ehsan
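The romanize-then-search round trip described in this message might look roughly like the sketch below. It is written in Python, with `str.translate` as a per-character stand-in for PHP's strtr(). The four-letter map, the function names, and the in-memory list standing in for the "Roman DB" are all assumptions made for illustration, not the poster's actual conversion map or schema.

```python
# Hypothetical four-letter map, for illustration only; a real conversion map
# would have to cover the whole alphabet and stay one-to-one to be reversible.
PERSIAN_TO_ROMAN = {"س": "s", "ل": "l", "ا": "a", "م": "m"}
ROMAN_TO_PERSIAN = {roman: persian for persian, roman in PERSIAN_TO_ROMAN.items()}

_TO_ROMAN = str.maketrans(PERSIAN_TO_ROMAN)
_TO_PERSIAN = str.maketrans(ROMAN_TO_PERSIAN)


def romanize(text):
    """Replace each mapped Persian letter with its Roman counterpart."""
    return text.translate(_TO_ROMAN)


def deromanize(text):
    """Reverse romanize(); only valid while the map is one-to-one."""
    return text.translate(_TO_PERSIAN)


def search(roman_records, query):
    """Romanize the query, match it against romanized records, decode the hits."""
    needle = romanize(query)
    return [deromanize(record) for record in roman_records if needle in record]


if __name__ == "__main__":
    # Data entry side: store romanized text (a list stands in for the DB).
    roman_db = [romanize("سلام"), romanize("ماس")]
    # Retrieval side: results come back in Persian.
    print(search(roman_db, "لام"))
```

One caveat of this design is that genuine Latin text stored alongside the romanized data would also be "decoded" on the way out, so the map's output alphabet has to be kept distinct from anything else in the records.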
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
AmirBehzad Eslami [EMAIL PROTECTED] wrote on 24/11/2005 17:48:29:

> Dear list,
>
> I'm considering programming a simple search engine for a website, to find Arabic/Persian data within a MySQL database. This database contains a huge amount of data, encoded in Unicode (UTF-8). The main goal is to ** reduce the response time ** for end users.
>
> My first solution is to create an index and use the FULL-TEXT searching method. Luckily, MySQL provides FULL-TEXT indexing support in MyISAM tables. But unfortunately, it doesn't support multi-byte charsets (e.g. Unicode). [1]
>
> Technically, MySQL creates indexes over words. A "word" is any sequence of characters consisting of letters and numbers [2]. Assuming this, I tried to save the records as Unicode character references (&#NNNN;), but the search failed again :-(
>
> Any suggestions? I would appreciate any solution to this problem.
>
> Thanks in advance,
> Behzad
>
> [1] MySQL Manual - 6.8.3 Full-text Search TODO
> [2] MySQL Manual - 6.8 MySQL Full-text Search
>
> P.S. *** I use MySQL 4.0 ***

I think this is your problem: MySQL does not properly support Unicode until version 4.1. I am successfully using FullText with MySQL 4.1 to sort UTF-8 encoded Japanese text. I see no reason why it should not work for Arabic - if you upgrade.

Alec
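As a side note on the character-reference attempt in the quoted question: the small Python sketch below shows what a Persian word looks like once stored as numeric character references, and how it splits if a "word" is taken to be a run of letters and digits, as the quoted definition says. The regular expression only approximates that definition and is an assumption; it is not MySQL's actual parser, and `to_char_refs` and `words` are hypothetical helper names.

```python
import re


def to_char_refs(text):
    """Encode every non-ASCII character as a decimal reference such as &#1587;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def words(text):
    """Split into runs of ASCII letters and digits (rough stand-in for a word parser)."""
    return re.findall(r"[A-Za-z0-9]+", text)


if __name__ == "__main__":
    stored = to_char_refs("سلام")
    print(stored)          # the references as they would sit in the table
    print(words(stored))   # the "words" an indexer of this kind would see
```

Under that reading, each letter becomes its own short numeric token, so the index would hold code-point numbers rather than words. That is one plausible reason the attempt did not help, offered here as a guess rather than a diagnosis.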