Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-12-09 Thread AmirBehzad Eslami
Behdad Esfahbod wrote:

> That's the tricky part, or where the runtime-hell comes in.  What
> I did was to write a small Java program based on the samples in
> Lucene to connect to my database and feed the data into Lucene.
> At search time, I have another little Java program that takes the
> query string from command line and prints out search results to
> standard output.  My PHP script then just fires up a shell script
> that in turn runs the Java program, piping the output into PHP...

Knowledge is Power. (Alvin Toffler)

That's a wonderful architecture. It seems that I was blind before
reading your e-mail. I had never thought about the power of the shell
before, or about using it as an interface to talk to Java. I like your
point of view. Very interesting!

Thank you very much for sharing the source code!

Behzad
	
___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-12-04 Thread Behdad Esfahbod
On Wed, 30 Nov 2005, AmirBehzad Eslami wrote:

> Dear Behdad,
>
>   On 25 Nov 2005, you wrote:
>
> > Another option is to get yourself a real search engine, like
> > Apache Lucene. I've written my experience using that here:
> >
> > http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html
>
> You always offer the most brilliant solutions!!
> Unfortunately, I have no experience with this method. But I'm still eager.
> I read your weblog and visited the "Apache Lucene" homepage.
>
>   I'm impressed. Would you tell us how you have integrated this
> Java-driven package with PHP at http://rira.ir/ ?!!  It works
> really fast.

That's the tricky part, or where the runtime-hell comes in.  What
I did was to write a small Java program based on the samples in
Lucene to connect to my database and feed the data into Lucene.
At search time, I have another little Java program that takes the
query string from command line and prints out search results to
standard output.  My PHP script then just fires up a shell script
that in turn runs the Java program, piping the output into PHP...

I don't have access to the Java code at this time, but the PHP
code involved is available here:

  
http://cvs.sourceforge.net/viewcvs.py/rira/rira/php/page/search.php?rev=1.1.1.1&view=log
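
The flow described above (a web script runs a command-line search program and reads what it prints to standard output) can be sketched roughly as follows. This is a minimal Python illustration, not the rira.ir code: the `run_search` helper and the tab-separated output format are assumptions, and a tiny inline Python program stands in for the Java/Lucene search program so that the sketch runs on its own.

```python
import subprocess
import sys

# Stand-in for the command-line search program (hypothetical). It takes the
# query as its first argument and prints one "id<TAB>title" line per hit,
# much as the Java program in the message prints results to standard output.
FAKE_SEARCHER = [
    sys.executable, "-c",
    "import sys; q = sys.argv[1]; "
    "docs = {1: 'lucene and php', 2: 'mysql full-text', 3: 'php and mysql'}; "
    "[print(f'{i}\\t{t}') for i, t in docs.items() if q in t]",
]

def run_search(query, command=FAKE_SEARCHER):
    """Run the external searcher and parse its standard output."""
    # Passing an argument list (no shell=True) keeps the query from being
    # interpreted by a shell, which matters for user-supplied input.
    result = subprocess.run(
        command + [query],
        capture_output=True, text=True, check=True, encoding="utf-8",
    )
    hits = []
    for line in result.stdout.splitlines():
        doc_id, title = line.split("\t", 1)
        hits.append((int(doc_id), title))
    return hits

if __name__ == "__main__":
    for doc_id, title in run_search("php"):
        print(doc_id, title)
```

Starting a new process for every query has a cost (in the original setup, a JVM start per search), so this design trades some latency for simplicity.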


If you are developing in .NET, there is a functional port of
Lucene to .NET too.  There is even a port of an older version of
it to Python.

BTW, you need to make sure you compile it with Unicode turned on.
I don't quite remember the details, but there were some.  I also
have a Persian class written for it, but it didn't do much
anyway.  In a few weeks I will get access to the rira.ir server and
hopefully move the site to the above sf.net project, so you can
see what's inside.

> Thanks in advance,
>   Behzad

Cheers,

--behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
-- Dan Bern, "New American Language"


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-30 Thread AmirBehzad Eslami
Dear Behdad,

On 25 Nov 2005, you wrote:

> Another option is to get yourself a real search engine, like
> Apache Lucene. I've written my experience using that here:
>
> http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html

You always offer the most brilliant solutions!!
Unfortunately, I have no experience with this method. But I'm still eager.
I read your weblog and visited the "Apache Lucene" homepage.

I'm impressed. Would you tell us how you have integrated this
Java-driven package with PHP at http://rira.ir/ ?!!  It works
really fast.

Thanks in advance,
Behzad


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-30 Thread AmirBehzad Eslami
Dear Ehsan,

On Nov 28, 2005, you wrote:

> I've actually implemented this approach in a project.  I have not yet
> published the code, but if you want, I can make it available under the GPL.

Yes! I would appreciate it.
Thank you very much for your kindness.

Behzad


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-28 Thread Ehsan Akhgari

> Dear Ehsan,
>
> You suggested a creative solution. Thank you.
>
> My application consists of a database and two user interfaces.
>
> The first UI is used for data entry, where I parse a given XML file,
> extract and "Romanize" its data - based on a "Persian-Roman Conversion
> Map" - and then insert them into the DB. Luckily, PHP provides a very
> fast function for such conversions, named strtr().
>
> Now I have a "Roman DB".
>
> The second UI is used for data retrieval (searching), where I "Romanize"
> the given search argument and look for it through the DB records. The
> results will be decoded and converted to Persian before being sent to
> stdout.

I've actually implemented this approach in a project.  I have not yet
published the code, but if you want, I can make it available under the GPL.
 
Ehsan


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-27 Thread AmirBehzad Eslami
Mohsen wrote:

> But he solved the problem himself,
> with: mysql_query("SET NAMES utf8");
> even on 4.0.x.

Wrong. I decided to prepare two different versions of my software:

  - A MySQL 4.0-friendly version using the Romanizing method (hats off to you, Ehsan)
  - A MySQL 4.1-compatible version.

The code you mentioned belongs to the second version.

"SET NAMES indicates what is in the SQL statements that the client
sends. Thus, SET NAMES 'cp1251' tells the server “future incoming
messages from this client are in character set cp1251.” It also
specifies the character set for results that the server sends back to
the client. (For example, it indicates what character set column values
are if you use a SELECT statement.)"

MySQL Manual 4.1 -> 10.3.6. Connection Character Sets and Collations.

Kind Regards,
Behzad


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-27 Thread Mohsen Pahlevanzadeh

[EMAIL PROTECTED] wrote:

> AmirBehzad Eslami <[EMAIL PROTECTED]> wrote on 24/11/2005 17:48:29:
>
> > Dear list,
> >
> > I'm considering programming a simple "Search Engine" for a website,
> > to find Arabic/Persian data within a MySQL database.
> > This database contains a huge amount of data, encoded with
> > Unicode (UTF-8).
> >
> > The big deal is to ** reduce the response time ** to end-users.
> >
> > My first solution is to create an Index and use the "FULL-TEXT
> > Searching" method.
> >
> > Luckily, MySQL provides FULL-TEXT Indexing support in MyISAM tables.
> > But unfortunately, it doesn't support multi-byte charsets (e.g.
> > Unicode). [1]
> >
> > Technically, MySQL creates Indexes over words.
> > A "word" is any sequence of characters consisting of letters and
> > numbers [2].
> >
> > Assuming this, I tried to save the records as Unicode Character
> > References (&#;), but the search failed again :-(
> >
> > Any suggestion?
> > I appreciate any solution to solve this problem.
> >
> > Thanks in Advance,
> > Behzad
> >
> > [1] MySQL Manual -> 6.8.3 Full-text Search TODO
> > [2] MySQL Manual -> 6.8 MySQL Full-text Search
> >
> > P.S.
>
> ***
> > I use MySQL 4.0
> ***
>
> I think this is your problem: MySQL does not properly support Unicode
> until version 4.1. I am successfully using FullText with MySQL 4.1 to sort
> UTF-8 encoded Japanese text. I see no reason why it should not work for
> Arabic - if you upgrade.
>
> Alec

But he solved the problem himself,
with: mysql_query("SET NAMES utf8");
even on 4.0.x.


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-27 Thread Alec . Cawley
AmirBehzad Eslami <[EMAIL PROTECTED]> wrote on 24/11/2005 17:48:29:

> Dear list,
> 
>   I'm considering programming a simple "Search Engine" for a website,
>   to find Arabic/Persian data within a MySQL database.
>   This database contains a huge amount of data, encoded with
> Unicode (UTF-8).
> 
> 
>   The big deal is to ** reduce the response time ** to end-users.
> 
>   My first solution is to create an Index and use the "FULL-TEXT 
> Searching" method.
> 
>   Luckily, MySQL provides FULL-TEXT Indexing support in MyISAM tables.
>   But unfortunately, it doesn't support multi-byte charsets (e.g.
> Unicode). [1]
>   Technically, MySQL creates Indexes over words.
>   A "word" is any sequence of characters consisting of letters and
> numbers [2].
> 
>   Assuming this, I tried to save the records as Unicode Character 
> References (&#;), but the search failed again :-(
> 
>   Any suggestion?
>   I appreciate any solution to solve this problem.
> 
>   Thanks in Advance,
>   Behzad
> 
> 
>   [1] MySQL Manual -> 6.8.3 Full-text Search TODO
>   [2] MySQL Manual -> 6.8 MySQL Full-text Search
> 
> 
>   P.S.

*** 
>   I use MySQL 4.0
***

I think this is your problem: MySQL does not properly support Unicode 
until version 4.1. I am successfully using FullText with MySQL 4.1 to sort 
UTF-8 encoded Japanese text. I see no reason why it should not work for 
Arabic - if you upgrade.

Alec




Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-25 Thread AmirBehzad Eslami
Ehsan Akhgari wrote:

> Another solution is to make the db believe your text is English.
> This could be done by "romanizing" the text before inserting it to the db,
> and converting it back to Unicode after reading it from the db and before
> displaying it to the user.  This can be done by choosing a Roman letter for
> each Persian letter, and reading Persian characters one by one and looking
> them up in a conversion table and writing the equivalent Roman characters to
> the output.  However, this has the downside that IIRC MySQL's full-text search
> is case-insensitive, and if I'm right in that you'd have to choose Roman
> characters all from one case (upper or lower.)  In addition to that, the data
> stored in the db might be difficult/impossible to use without such a conversion.
> It's you who should judge the tradeoffs before choosing to use this method or
> not.

Dear Ehsan,

You suggested a creative solution. Thank you.

My application consists of a database and two user interfaces.

The first UI is used for data entry, where I parse a given XML file,
extract and "Romanize" its data - based on a "Persian-Roman Conversion
Map" - and then insert them into the DB. Luckily, PHP provides a very
fast function for such conversions, named strtr().

Now I have a "Roman DB".

The second UI is used for data retrieval (searching), where I "Romanize"
the given search argument and look for it through the DB records. The
results will be decoded and converted to Persian before being sent to
stdout.

There are two disadvantages concerning this method:

- Firstly, as you pointed out, it is impossible to use the data without
  the conversion. However, I can extend "phpMyAdmin" to handle this and
  simplify data manipulation for the client.

- Secondly, Romanizing adds some overhead to the system. But since
  only 10 records are retrieved and displayed each time, this
  overhead is negligible. In addition, PHP's strtr() function works
  fast enough. :-D

I think your solution is the only MySQL 4.0-friendly way to
implement FULL-TEXT searching for Persian (well, that's not Persian,
but Roman ;-) )

Once again, thank you for sharing your knowledge.

Behzad
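
The round trip described in this message (romanize on the way in, romanize the search argument, decode on the way out) can be sketched as below. This is a minimal Python sketch under stated assumptions: the four-letter `PERSIAN_TO_ROMAN` map is a made-up fragment rather than the actual conversion map, Python's `str.translate` stands in for PHP's strtr(), and a plain list with a substring test stands in for the MySQL table and its full-text query.

```python
# Hypothetical fragment of a "Persian-Roman Conversion Map". Every target is a
# single lowercase ASCII letter, so the mapping stays reversible even when
# the database compares text case-insensitively.
PERSIAN_TO_ROMAN = {"س": "s", "و": "v", "ا": "a", "ل": "l"}
ROMAN_TO_PERSIAN = {roman: persian for persian, roman in PERSIAN_TO_ROMAN.items()}

TO_ROMAN = str.maketrans(PERSIAN_TO_ROMAN)
TO_PERSIAN = str.maketrans(ROMAN_TO_PERSIAN)

def romanize(text):
    """Replace each mapped Persian letter with its Roman stand-in."""
    return text.translate(TO_ROMAN)

def deromanize(text):
    """Reverse the mapping before showing results to the user."""
    return text.translate(TO_PERSIAN)

# Data-entry side: only romanized text is stored (the "Roman DB").
stored_rows = [romanize("سوال"), romanize("سال")]

def search(query):
    """Search side: romanize the query, match, then decode the hits."""
    needle = romanize(query)
    return [deromanize(row) for row in stored_rows if needle in row]
```

A real map would also have to decide what to do with digits, punctuation, and any Latin text already present in the data, since those would be corrupted by a naive reverse conversion.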


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-25 Thread AmirBehzad Eslami
On Nov 24, 2005, Medi Montaseri wrote:

> One solution would be to augment a DB capability
> at the application level. That is instead of the search
> or select qualified by a SQL where clause, simply get
> everything (select *) and then let the application filter
> what you want. Then when your given DB provides
> that operation by itself, simplify your application
> and delegate that to DB (Query Engine).

Actually, the client asked me to write a PHP-driven search engine to
locate words in HTML resources. I'm considering MySQL as an "indexing"
tool to store the plain-text data and speed up this search.

The solution you explained requires that I write my own indexer in
PHP. I'm looking for a faster and easier way.

> I'm not sure about PHP support of unicode, but I know
> Perl is pretty strong on Regular Expressions with
> support for Unicode as well...

With the aid of the "mbstring" extension, PHP supports multi-byte
characters. In case you're interested, take a look at:

1) Toppa, Michael, "Solving the Unicode Puzzle," php|arch Magazine, May 2005.
   Available online at http://www.phparch.com/issuedata/articles/article_179.pdf
2) http://www.php.net/ref.mbstring

BTW, if you code in Perl, I have something for you:
http://www.dataparksearch.org/

If you know a PHP-driven search engine like this, please let me know.

Thanks in advance,
Behzad


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Behdad Esfahbod
On Fri, 25 Nov 2005, Ehsan Akhgari wrote:

>
>   One solution would be to augment a DB capability
>   at the application level. That is instead of the search
>   or select qualified by a SQL where clause, simply get
>   everything (select *) and then let the application filter
>   what you want. Then when your given DB provides
>   that operation by itself, simplify your application
>   and delegate that to DB (Query Engine).
>
> Another solution is to make the db believe your text is English.
> This could be done by "romanizing" the text before inserting it
> to the db, and converting it back to Unicode after reading it
> from the db and before displaying it to the user.  This can be
> done by choosing a Roman letter for each Persian letter, and
> reading Persian characters one by one and looking them up in a
> conversion table and writing the equivalent Roman characters to
> the output.  However, this has the downside that IIRC MySQL's
> full-text search is case-insensitive, and if I'm right in that
> you'd have to choose Roman characters all from one case (upper
> or lower.)  In addition to that, the data stored in the db
> might be difficult/impossible to use without such a conversion.
> It's you who should judge the tradeoffs before choosing to use
> this method or not.
>
> For some good romanizing scripts, check out 
> http://home.byu.net/jmd56/download.html.

Another option is to get yourself a real search engine, like
Apache Lucene.  I've written my experience using that here:

  http://mces.blogspot.com/2005/04/on-lucene-and-its-decency.html


> Ehsan

--behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
-- Dan Bern, "New American Language"


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Ehsan Akhgari

> One solution would be to augment a DB capability at the application
> level. That is instead of the search or select qualified by a SQL where
> clause, simply get everything (select *) and then let the application
> filter what you want. Then when your given DB provides that operation by
> itself, simplify your application and delegate that to DB (Query Engine).
  
Another solution is to make the db believe your text 
is English.  This could be done by "romanizing" the text before inserting 
it to the db, and converting it back to Unicode after reading it from the db and 
before displaying it to the user.  This can be done by choosing a Roman 
letter for each Persian letter, and reading Persian characters one by one and 
looking them up in a conversion table and writing the equivalent Roman 
characters to the output.  However, this has the downside that IIRC MySQL's 
full-text search is case-insensitive, and if I'm right in that you'd have to 
choose Roman characters all from one case (upper or lower.)  In addition to 
that, the data stored in the db might be difficult/impossible to use without 
such a conversion.  It's you who should judge the tradeoffs before choosing 
to use this method or not.
 
For some good romanizing scripts, check out http://home.byu.net/jmd56/download.html.
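
The single-case constraint mentioned above can be checked mechanically before a conversion table is put to use. The sketch below is only an illustration in Python: `find_case_collisions` and both sample maps are hypothetical, and it assumes, as the message does, that the full-text comparison ignores case.

```python
from collections import defaultdict

def find_case_collisions(conversion_map):
    """Return groups of source letters whose Roman targets differ only by case."""
    groups = defaultdict(list)
    for source, target in conversion_map.items():
        groups[target.lower()].append(source)
    return {target: sources for target, sources in groups.items() if len(sources) > 1}

# Hypothetical map that relies on case to tell two letters apart.
mixed_case_map = {"ت": "t", "ط": "T", "س": "s"}

# Same letters, but every target is lowercase and distinct.
single_case_map = {"ت": "t", "ط": "u", "س": "s"}

if __name__ == "__main__":
    print(find_case_collisions(mixed_case_map))
    print(find_case_collisions(single_case_map))
```

With a case-insensitive index, the first map would make the two letters indistinguishable at search time, which is why all targets need to come from one case.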
 
Ehsan


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Medi Montaseri
One solution would be to augment a DB capability
at the application level. That is instead of the search
or select qualified by a SQL where clause, simply get
everything (select *) and then let the application filter
what you want. Then when your given DB provides
that operation by itself, simplify your application
and delegate that to DB (Query Engine). 
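
As a rough illustration of this suggestion, the sketch below fetches every row and filters in the application instead of in a WHERE clause. It is written in Python with an in-memory SQLite database standing in for MySQL, purely so that it runs on its own; the `search_in_application` helper and the sample rows are assumptions, and the column names are borrowed from the table posted at the start of the thread.

```python
import sqlite3

def search_in_application(conn, needle):
    """Fetch every row and keep only those whose title contains `needle`."""
    rows = conn.execute(
        "SELECT article_id, article_title FROM articles ORDER BY article_id"
    )
    # The database does no text matching at all; Python compares the decoded
    # Unicode strings, so multi-byte text is not a problem here.
    return [(article_id, title) for article_id, title in rows if needle in title]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (article_title) VALUES (?)",
    [("سوال و جواب",), ("خبر",), ("سوال دوم",)],
)

if __name__ == "__main__":
    print(search_in_application(conn, "سوال"))
```

Every query scans the whole table, so the cost grows with the amount of data; that is the price of not using an index.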

I'm not sure about PHP support of unicode, but I know
Perl is pretty strong on Regular Expressions with 
support for Unicode as well...

Medi

On 11/24/05, Behnam Esfahbod <[EMAIL PROTECTED]> wrote:
> AmirBehzad Eslami wrote:
> > 2) Find another Web Hosting Company with PHP and MySQL 4.1 support.
> >
> > Would you (or anyone else in the list) recommend a reliable Web Hosting
> > Company with such services?!
>
> You may like to see www.1and1.com.  It's been our web hosting for 2
> years now.
>
> --
> '
> ' Behnam Esfahbod
>    '
>   *  ..   http://zwnj.org
>  *  `  *  http://zwnj.info
>   * o *   http://behnam.esfahbod.info



Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Behnam Esfahbod

AmirBehzad Eslami wrote:

> 2) Find another Web Hosting Company with PHP and MySQL 4.1 support.
>
> Would you (or anyone else in the list) recommend a reliable Web Hosting
> Company with such services?!

You may like to see www.1and1.com.  It's been our web hosting for 2
years now.



--
'
' Behnam Esfahbod
   '
  *  ..   http://zwnj.org
 *  `  *  http://zwnj.info
  * o *   http://behnam.esfahbod.info


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread AmirBehzad Eslami
Mohsen wrote:

> Please use MySQL 4.1 or higher.

Dear Mohsen,

Nice to e-meet(!) you here, on the PersianComputing mailing list!

Thanks for your advice. I just heard the same message from the MySQL
geeks at [EMAIL PROTECTED]

I already know that MySQL 4.1 supports Unicode [1], and I can install
and use it on my own computer. So why am I bothering you here?

Here's the problem:
HostRocket.com - my preferred web hosting company - has not
installed MySQL 4.1 yet. They still use MySQL 4.0.2 and they won't
install MySQL 4.1 :-(

What Can I Do Now?
===
1) Find a "MySQL 4.0-Friendly" method to perform quick searches.
   That's why I'm here, asking people to help me.

2) Find another Web Hosting Company with PHP and MySQL 4.1 support.
   Would you (or anyone else in the list) recommend a reliable Web
   Hosting Company with such services?!

Thanks in advance,
Behzad

[1] http://lists.mysql.com/mysql/155039


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Mohsen Pahlevanzadeh

AmirBehzad Eslami wrote:

> Dear list,
>
> I'm considering programming a simple "Search Engine" for a website,
> to find Arabic/Persian data within a MySQL database.
> This database contains a huge amount of data, encoded with Unicode
> (UTF-8).
>
> The big deal is to ** reduce the response time ** to end-users.
>
> My first solution is to create an Index and use the "FULL-TEXT
> Searching" method.
>
> Luckily, MySQL provides FULL-TEXT Indexing support in MyISAM tables.
> But unfortunately, it doesn't support multi-byte charsets (e.g.
> Unicode). [1]
>
> Technically, MySQL creates Indexes over words.
> A "word" is any sequence of characters consisting of letters and
> numbers [2].
>
> Assuming this, I tried to save the records as Unicode Character
> References (&#;), but the search failed again :-(
>
> Any suggestion?
> I appreciate any solution to solve this problem.
>
> Thanks in Advance,
> Behzad
>
> [1] MySQL Manual -> 6.8.3 Full-text Search TODO
> [2] MySQL Manual -> 6.8 MySQL Full-text Search
>
> P.S.
>
> I use MySQL 4.0
>
> 1) Table Structure
>
> CREATE TABLE `articles` (
>   `article_id` int(10) unsigned NOT NULL auto_increment,
>   `article_title` NATIONAL varchar(255) NOT NULL default '',
>   `article_text` text NOT NULL,
>   PRIMARY KEY (`article_id`),
>   FULLTEXT (`article_title`,`article_text`)
> ) TYPE=MyISAM ;
>
> ALTER TABLE `articles` CHARACTER SET utf8;
>
> 2) SQL-Query to Perform a Full-text search
>
> SELECT * FROM articles WHERE MATCH(article_title, article_text)
> AGAINST('سوال')
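
One possible reason the character-reference attempt did not help, offered here as an assumption rather than a confirmed diagnosis: once each letter is written as a numeric reference, a rule that treats only letters and numbers as word characters splits on the `&`, `#`, and `;` signs. The Python sketch below approximates that rule with a regular expression; it does not model MySQL's actual parser or its minimum word length setting, and both helper names are hypothetical.

```python
import re

def to_char_refs(text):
    """Encode every non-ASCII character as a decimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)

def split_words(text):
    """Approximate the 'letters and numbers' word rule quoted from the manual."""
    return re.findall(r"[A-Za-z0-9]+", text)

encoded = to_char_refs("سوال")
words = split_words(encoded)

if __name__ == "__main__":
    print(encoded)
    print(words)
```

Under this approximation the four-letter search term becomes four unrelated numeric tokens, one per letter, so it can no longer be matched as a single word.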



Please use MySQL 4.1 or higher.