Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-12-09 Thread AmirBehzad Eslami
Behdad Esfahbod wrote:

  That's the tricky part, or where the runtime-hell comes in.  What
  I did was to write a small Java program based on the samples in
  Lucene to connect to my database and feed the data into Lucene.
  At search time, I have another little Java program that takes the
  query string from command line and prints out search results to
  standard output.  My PHP script then just fires up a shell script
  that in turn runs the Java program, piping the output into PHP...

"Knowledge is Power." (Alvin Toffler)

That's a wonderful architecture. It seems that I was blind before
reading your e-mail. I had never thought about the power of the shell
before, or about using it as an interface to talk to Java. I like your
point of view. Very interesting!

Thank you very much for sharing the source code!

Behzad
	
___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-12-04 Thread Behdad Esfahbod
On Wed, 30 Nov 2005, AmirBehzad Eslami wrote:

 Dear Behdad,

   On 25 Nov 2005, you wrote:

  Another option is to get yourself a real search engine, like
  Apache Lucene. I've written my experience using that here:
   
  http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html

 You always offer the most brilliant solutions!!
 Unfortunately, I have no experience with this method. But I'm still eager.
 I read your weblog and visited the Apache Lucene homepage.

   I'm impressed. Would you tell us how you have integrated this
 Java-driven package with PHP at http://rira.ir/ ?!!  It works
 really fast.

That's the tricky part, or where the runtime-hell comes in.  What
I did was to write a small Java program based on the samples in
Lucene to connect to my database and feed the data into Lucene.
At search time, I have another little Java program that takes the
query string from command line and prints out search results to
standard output.  My PHP script then just fires up a shell script
that in turn runs the Java program, piping the output into PHP...
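
A minimal sketch of that glue layer, written in Python rather than the PHP used on the site. The function name `run_search`, the stand-in command `FAKE_SEARCHER`, and the one-result-per-line output format are all assumptions made for illustration; the real Java program and shell script are not shown in this thread.

```python
import subprocess
import sys


def run_search(command, query, timeout=10):
    """Run an external search program and return its non-empty output lines.

    `command` is the argv prefix of the search program (hypothetical here).
    The query is appended as one argument, so no shell quoting is involved.
    """
    completed = subprocess.run(
        command + [query],
        capture_output=True,  # collect the program's standard output
        timeout=timeout,      # do not let a stuck searcher hang the web page
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    text = completed.stdout.decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Stand-in for the Java searcher: echoes the query back as one "result" line.
# A real deployment would put the actual search command line here instead.
FAKE_SEARCHER = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write("
    "('hit for: ' + sys.argv[1] + '\\n').encode('utf-8'))",
]

if __name__ == "__main__":
    for line in run_search(FAKE_SEARCHER, "lucene"):
        print(line)
```

Passing the query as a separate argument instead of building a shell string avoids quoting problems with user input, at the cost of starting one process per search.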

I don't have access to the Java code at this time, but the PHP
code involved is available here:

  
  http://cvs.sourceforge.net/viewcvs.py/rira/rira/php/page/search.php?rev=1.1.1.1&view=log


If you are developing in .NET, there is a functional port of
Lucene to .NET too.  There is even a port of an older version of
it to Python.

BTW, you need to make sure you compile it with Unicode turned on.
I don't quite remember the details, but there were some.  I also
have a Persian class written for it, but it didn't do much
anyway.  In a few weeks I will get access to the rira.ir server and
hopefully move the site to the above sf.net project, so you can
see what's inside.

 Thanks in advance,
   Behzad

Cheers,

--behdad
http://behdad.org/

Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill
-- Dan Bern, New American Language


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-30 Thread AmirBehzad Eslami
Dear Behdad,

On 25 Nov 2005, you wrote:

  Another option is to get yourself a real search engine, like
  Apache Lucene. I've written my experience using that here:

  http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html

You always offer the most brilliant solutions!!
Unfortunately, I have no experience with this method. But I'm still eager.
I read your weblog and visited the "Apache Lucene" homepage.

I'm impressed. Would you tell us how you have integrated this
Java-driven package with PHP at http://rira.ir/ ?!! It works
really fast.

Thanks in advance,
  Behzad


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-28 Thread Ehsan Akhgari





  
  Dear Ehsan,

  You suggested a creative solution. Thank you.

  My application consists of a database and two user interfaces.

  The first UI is used for data entry, where I parse a given XML file,
  extract and "Romanize" its data - based on a "Persian-Roman
  Conversion Map" - and then insert it into the DB. Luckily, PHP
  provides a very fast function for such conversions, named strtr().
  Now I have a "Roman DB".

  The second UI is used for data retrieval (searching), where I
  "Romanize" the given search argument and look for it through the DB
  records. The results are decoded and converted back to Persian
  before being sent to stdout.

I've actually implemented this approach in a project. I have not yet
published the code, but if you want, I can make it available under the GPL.
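
The Romanizing round trip quoted above can be sketched roughly as follows, in Python rather than PHP. This is not the unpublished project code: the six-letter map is a hypothetical excerpt, and `romanize` / `deromanize` are names invented for the illustration. For single-character keys, `str.translate` plays the role that `strtr()` plays in PHP.

```python
# Hypothetical excerpt of a Persian-to-Roman conversion map; a real map would
# cover the whole alphabet. Every target is a distinct ASCII letter, so the
# mapping can be inverted without ambiguity.
PERSIAN_TO_ROMAN = {
    "ا": "A",
    "ب": "b",
    "ت": "t",
    "س": "s",
    "ل": "l",
    "م": "m",
}
ROMAN_TO_PERSIAN = {roman: persian for persian, roman in PERSIAN_TO_ROMAN.items()}

# Translation tables are built once and reused for every record or query.
_ENCODE = str.maketrans(PERSIAN_TO_ROMAN)
_DECODE = str.maketrans(ROMAN_TO_PERSIAN)


def romanize(text):
    """Convert Persian text to its ASCII form before storing or searching."""
    return text.translate(_ENCODE)


def deromanize(text):
    """Convert stored ASCII text back to Persian before display."""
    return text.translate(_DECODE)
```

Both the stored data and the search argument go through `romanize`, so the database only ever compares ASCII strings. The sketch assumes the data contains no genuine Latin text, since `deromanize` would convert those letters as well.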

Ehsan


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-27 Thread Alec . Cawley
AmirBehzad Eslami [EMAIL PROTECTED] wrote on 24/11/2005 17:48:29:

 Dear list,

   I'm considering programming a simple search engine for a website,
   to find Arabic/Persian data within a MySQL database.
   This database contains a huge amount of data, encoded in
   Unicode (UTF-8).

   The main goal is to ** reduce the response time ** for end users.

   My first solution is to create an index and use the FULL-TEXT
   searching method.

   Luckily, MySQL provides FULL-TEXT indexing support in MyISAM tables.
   But unfortunately, it doesn't support multi-byte charsets (e.g.
   Unicode). [1]
   Technically, MySQL creates indexes over words.
   A "word" is any sequence of characters consisting of letters and
   numbers [2].

   Assuming this, I tried to save the records as Unicode Character
   References (&#NNNN;), but the search failed again :-(

   Any suggestions?
   I would appreciate any solution to this problem.

   Thanks in advance,
   Behzad


   [1] MySQL Manual - 6.8.3 Full-text Search TODO
   [2] MySQL Manual - 6.8 MySQL Full-text Search


   P.S.

***
   I use MySQL 4.0
***

I think this is your problem: MySQL does not properly support Unicode 
until version 4.1. I am successfully using FullText with MySQL 4.1 to sort 
UTF-8 encoded Japanese text. I see no reason why it should not work for 
Arabic - if you upgrade.
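
One possible reading of why the character-reference attempt quoted above did not help, taking the manual's definition of a word as a run of letters and numbers at face value. The tokenizer below is a deliberately simplified stand-in, not MySQL's actual parser, and the names `to_char_refs` and `ascii_words` are invented for this illustration.

```python
import re


def to_char_refs(text):
    """Encode every non-ASCII character as a decimal character reference."""
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)


def ascii_words(text):
    """Simplified stand-in tokenizer: runs of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


if __name__ == "__main__":
    encoded = to_char_refs("سلام")
    print(encoded)               # the stored form of one Persian word
    print(ascii_words(encoded))  # the "words" such a tokenizer would see
```

Under these assumptions a single Persian word is split at every `&`, `#` and `;`, so the index would hold one numeric token per letter instead of one token per word.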

Alec




Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-27 Thread AmirBehzad Eslami
Mohsen wrote:

  But he solved his problem himself
  with: mysql_query("SET NAMES utf8");
  Even 4.0.x

Wrong. I decided to prepare two different versions of my software:

  - A MySQL 4.0-friendly version using the Romanizing method (hats off to you, Ehsan)
  - A MySQL 4.1-compatible version.

The code you mentioned belongs to the second version.

  "SET NAMES indicates what is in the SQL statements that the client
  sends. Thus, SET NAMES 'cp1251' tells the server “future incoming
  messages from this client are in character set cp1251.” It also
  specifies the character set for results that the server sends back to
  the client. (For example, it indicates what character set column values
  are if you use a SELECT statement.)"

  MySQL Manual 4.1 - 10.3.6. Connection Character Sets and Collations.

Kind Regards,
  Behzad