RE: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'? [SOLVED]
-Original Message- From: Daevid Vincent [mailto:[EMAIL PROTECTED] Sent: Friday, May 04, 2007 1:22 AM To: mysql@lists.mysql.com Subject: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'? I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV This is the one I have now, but (as you probably noticed), it will return many rows, I expect that most of the time 1 row will be returned, but I'm getting a grip more than I want (or the customer would want), and also rows that have nothing to do with the search terms. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') OR (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') OR (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') OR (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') OR (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) ORDER BY categories.name DESC, products.name ASC, companies.name ASC; (and that just gets uglier the more words in the search) ++--+--+--+--- + | id | name | model| company_name | category_name | ++--+--+--+--- + | 1 | 20 TV | STV20-KVR-HD | Sony | Tube | --- | 2 | 36 TV | STV36-KVR-HD | Sony | Tube | | 4 | Ultra-Plasma 62 | UP62F900 | Sony | Plasma| | 5 | Value Plasma 38 | VPR542_38| Sony | Plasma| | 6 | Power-MP3 5gb| 09834wuw34 | Sony | MP3 Players | | 3 | Super-LCD 42| SLCD42hd002 | Sony | LCD | | 7 | Super-Player 1gb | SP1gb| Sony | Flash | | 8 | Porta CD | pcd500 | Sony | CD Players| .. ++--+--+--+--- + Obviously the person wanted id = 1 in this case. Unrelated, is there any speed improvement using JOIN instead of LEFT JOIN ? Think millions of products. Thanks for help and suggestions... Daevid. I'll attach a .php file, but this list server may strip it off, so I'll also paste it below, sorry for any formatting issues in advance... ?php if ($_POST['keywords']) { $_POST['keywords'] = stripslashes($_POST['keywords']); $words = preg_split(/\s+/,$_POST['keywords'], -1, PREG_SPLIT_NO_EMPTY); } $sql = 'SELECT products.* FROM product_table WHERE 1 '; $sql .= keyword_filter($words, array('products.model%', 'products.upc', '%products.name%', 'companies.name%', '%categories.name%'), true); $sth = SQL_QUERY($sql); /** * Builds the WHERE portion of a SQL statement using the keywords in various columns with wildcard support. * * @return string SQL statement fragment * @parammixed $words either a string of words space deliminated or an array of words * @paramarray $columns an array of table.column names to search the $words in. Use % as a wildcard for example pass in 'username%' or '%username%'. * @paramboolean $and (true) whether the words have to be ANDed or ORed together. * @author Daevid Vincent [EMAIL PROTECTED] * @since1.0 * @version 1.4 * @date 05/10/07 * @todo This should handle +, - and just like google or yahoo or other search engines do. */ function keyword_filter($words, $columns, $and = true) { // this maybe useful // http://wiki.ittoolbox.com/index.php/Code:Translate_Boolean_Query_to_SQL_select_statement // http://www.ibiblio.org/adriane/queries/ // http://www.zend.com/zend/tut/tutorial-ferrara1.php?article=tutorial-ferrara1kind=tid=8238open=1anc=0view=1 // http://evolt.org/article/Boolean_Fulltext_Searching_with_PHP_and_MySQL/18/15665/index.html // http://www.databasejournal.com/features/mysql/article.php/3512461 // this would be great, but the dumb-asses don't work with InnoDB tables. GRRR! // http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
On 4 May 2007 at 1:21, Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. [snip] WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') OR (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') OR (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') OR (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') OR (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) [snip] It seems to me that the logic you're looking for is something more like (products.model LIKE 'sony%' OR products.upc LIKE 'sony' OR products.name LIKE '%sony%' OR companies.name LIKE 'sony%' OR categories.name LIKE '%sony%' ) AND ( products.model LIKE '20%' OR products.upc LIKE '20' OR products.name LIKE '20%' OR companies.name LIKE '20%' OR categories.name LIKE '20%' ) AND ( products.model LIKE 'tv%' OR products.upc LIKE 'tv' OR products.name LIKE '%tv%' OR companies.name LIKE 'tv%' OR categories.name LIKE '%tv%' ) so that each of the search terms appears in at least one of the relevant columns. -- Iain Alexander [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
-Original Message- From: Iain Alexander [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 09, 2007 3:11 PM To: mysql@lists.mysql.com Subject: Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'? On 4 May 2007 at 1:21, Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. [snip] WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') OR (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') OR (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') OR (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') OR (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) [snip] It seems to me that the logic you're looking for is something more like (products.model LIKE 'sony%' OR products.upc LIKE 'sony' OR products.name LIKE '%sony%' OR companies.name LIKE 'sony%' OR categories.name LIKE '%sony%' ) AND ( products.model LIKE '20%' OR products.upc LIKE '20' OR products.name LIKE '20%' OR companies.name LIKE '20%' OR categories.name LIKE '20%' ) AND ( products.model LIKE 'tv%' OR products.upc LIKE 'tv' OR products.name LIKE '%tv%' OR companies.name LIKE 'tv%' OR categories.name LIKE '%tv%' ) so that each of the search terms appears in at least one of the relevant columns. OMG! I think you are on to something. I just tried this, and I got one row. Exactly what I wanted. I'll have to poke at this some more, and tweak my PHP that autogenerates the SQL, but I may just be naming my first born Iain. :) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
Daevid, It occurs to me that what you were explaining was multi-column keyword search which MySQL does not support as of now. You may even face scenarios where you need multi-table and multi-column keyword search (even in your example, this may come up). In brief, you have a set of keywords and you don't know in which column of a table they may appear (multi-column kw-search), or you don't know in which (joined) tables they may appear (multi-table kw-search). I am working on a generic module to do this, however not sure when that would be available. Meanwhile what you can do, given that you're doing PHP+MySQL is to use free-text indexes on multiple textual columns, then write a php module that would iterate over the columns, runs keyword search on single column, then integrates the results. This will not be efficient, but as of now, this type of logical keyword search must be handled outside the database, I guess ... --mayssam On 5/4/07, Daevid Vincent [EMAIL PROTECTED] wrote: -Original Message- From: Baron Schwartz [mailto:[EMAIL PROTECTED] Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV I think you're describing full-text indexing. MySQL supports it but only on MyISAM tables. If you don't want to use MyISAM, full-text search engines like Lucene or Sphinx may be worth looking at. I don't think I am. While full-text indexing might help since the indexes would be faster. I think this is a logic issue. The full-text index would be useful on a TEXT or BLOB or some long varchar field, but it doesn't solve that I'm trying to pull from two different tables, Product and Company and mapping the free-form string to fields that could be one of several. I think my first attempt is close, but it's something to do with all the AND and OR combinations that's not right. My version gives many results because it matches (SONY OR TV OR 20). I need it to match (SONY AND TV AND 20) But this isn't it either (returns 0 results) because some fields, like the categories.name, products.upc and products.model don't match so the entire condition fails. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') AND (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') AND (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') AND (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') AND (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) ORDER BY categories.name DESC, products.name ASC, companies.name ASC; Also, the 'SONY' is really the companies.name, while the '20' _AND_ 'TV' together form '20 TV' to make the products.name. +--+--+--+--+ | name | model| upc | company_name | +--+--+--+--+ | 20 TV | STV20-KVR-HD | 097855008633 | Sony | | 36 TV | STV36-KVR-HD | 087452047023 | Sony | +--+--+--+--+ One way might be to do three separate queries, one for each word. Then store them in an array and compare the overlaps, removing any that aren't shared. Then a final query where product.id IN(array) That seems extremely inefficient and hackish though. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV This is the one I have now, but (as you probably noticed), it will return many rows, I expect that most of the time 1 row will be returned, but I'm getting a grip more than I want (or the customer would want), and also rows that have nothing to do with the search terms. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') OR (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') OR (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') OR (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') OR (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) ORDER BY categories.name DESC, products.name ASC, companies.name ASC; (and that just gets uglier the more words in the search) ++--+--+--+---+ | id | name | model| company_name | category_name | ++--+--+--+---+ | 1 | 20 TV | STV20-KVR-HD | Sony | Tube | --- | 2 | 36 TV | STV36-KVR-HD | Sony | Tube | | 4 | Ultra-Plasma 62 | UP62F900 | Sony | Plasma| | 5 | Value Plasma 38 | VPR542_38| Sony | Plasma| | 6 | Power-MP3 5gb| 09834wuw34 | Sony | MP3 Players | | 3 | Super-LCD 42| SLCD42hd002 | Sony | LCD | | 7 | Super-Player 1gb | SP1gb| Sony | Flash | | 8 | Porta CD | pcd500 | Sony | CD Players| .. ++--+--+--+---+ Obviously the person wanted id = 1 in this case. Unrelated, is there any speed improvement using JOIN instead of LEFT JOIN ? Think millions of products. Thanks for help and suggestions... Daevid. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
On Fri, May 4, 2007 10:21, Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV How about: select soundex('Sony 20 TV' ),soundex('20 Sony TV'); 'S531', 'S531' -- Later Mogens Melander +45 40 85 71 38 +66 870 133 224 -- This message has been scanned for viruses and dangerous content by OpenProtect(http://www.openprotect.com), and is believed to be clean. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
Hi Daevid, Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV I think you're describing full-text indexing. MySQL supports it but only on MyISAM tables. If you don't want to use MyISAM, full-text search engines like Lucene or Sphinx may be worth looking at. Baron -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
-Original Message- From: Baron Schwartz [mailto:[EMAIL PROTECTED] Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV I think you're describing full-text indexing. MySQL supports it but only on MyISAM tables. If you don't want to use MyISAM, full-text search engines like Lucene or Sphinx may be worth looking at. I don't think I am. While full-text indexing might help since the indexes would be faster. I think this is a logic issue. The full-text index would be useful on a TEXT or BLOB or some long varchar field, but it doesn't solve that I'm trying to pull from two different tables, Product and Company and mapping the free-form string to fields that could be one of several. I think my first attempt is close, but it's something to do with all the AND and OR combinations that's not right. My version gives many results because it matches (SONY OR TV OR 20). I need it to match (SONY AND TV AND 20) But this isn't it either (returns 0 results) because some fields, like the categories.name, products.upc and products.model don't match so the entire condition fails. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') AND (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') AND (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') AND (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') AND (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) ORDER BY categories.name DESC, products.name ASC, companies.name ASC; Also, the 'SONY' is really the companies.name, while the '20' _AND_ 'TV' together form '20 TV' to make the products.name. +--+--+--+--+ | name | model| upc | company_name | +--+--+--+--+ | 20 TV | STV20-KVR-HD | 097855008633 | Sony | | 36 TV | STV36-KVR-HD | 087452047023 | Sony | +--+--+--+--+ One way might be to do three separate queries, one for each word. Then store them in an array and compare the overlaps, removing any that aren't shared. Then a final query where product.id IN(array) That seems extremely inefficient and hackish though. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: How do I find products when a user types freeform strings like 'Sony 20 TV' or '20 Sony TV'?
Hi Daevid, Daevid Vincent wrote: -Original Message- From: Baron Schwartz [mailto:[EMAIL PROTECTED] Daevid Vincent wrote: I'm having trouble figuring out the logic/query I want. I know that all those ORs are not right. I'm doing this in PHP and mySQL (of course), so if it can't be done with a single query, I can split it up. Here's the challenge, given a text field search box, someone enters: Sony 20 TV How do I search for that, not knowing which fields are which? For example, they could have also entered: 20 Sony TV I think you're describing full-text indexing. MySQL supports it but only on MyISAM tables. If you don't want to use MyISAM, full-text search engines like Lucene or Sphinx may be worth looking at. I don't think I am. While full-text indexing might help since the indexes would be faster. I think this is a logic issue. The full-text index would be useful on a TEXT or BLOB or some long varchar field, but it doesn't solve that I'm trying to pull from two different tables, Product and Company and mapping the free-form string to fields that could be one of several. I think my first attempt is close, but it's something to do with all the AND and OR combinations that's not right. My version gives many results because it matches (SONY OR TV OR 20). I need it to match (SONY AND TV AND 20) But this isn't it either (returns 0 results) because some fields, like the categories.name, products.upc and products.model don't match so the entire condition fails. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' OR products.model LIKE '20%' OR products.model LIKE 'tv%') AND (products.upc LIKE 'sony' OR products.upc LIKE '20' OR products.upc LIKE 'tv') AND (products.name LIKE '%sony%' OR products.name LIKE '20%' OR products.name LIKE '%tv%') AND (companies.name LIKE 'sony%' OR companies.name LIKE '20%' OR companies.name LIKE 'tv%') AND (categories.name LIKE '%sony%' OR categories.name LIKE '20%' OR categories.name LIKE '%tv%') ) ORDER BY categories.name DESC, products.name ASC, companies.name ASC; Also, the 'SONY' is really the companies.name, while the '20' _AND_ 'TV' together form '20 TV' to make the products.name. +--+--+--+--+ | name | model| upc | company_name | +--+--+--+--+ | 20 TV | STV20-KVR-HD | 097855008633 | Sony | | 36 TV | STV36-KVR-HD | 087452047023 | Sony | +--+--+--+--+ One way might be to do three separate queries, one for each word. Then store them in an array and compare the overlaps, removing any that aren't shared. Then a final query where product.id IN(array) That seems extremely inefficient and hackish though. I misunderstood what you meant at first. So, you want rows where all words appear in the row, rather than where all words appear in any one column. How about this: every time you get a match on a term in any column, count it as 1. Then add all these and compare to the number of words in your search input, which you can determine either in SQL with a little text wrangling, or probably more easily in whatever the client code is with a split() or similar. SELECT products.*, companies.name AS company_name, categories.name AS category_name FROM products LEFT JOIN companies ON company_id = companies.id LEFT JOIN categories ON category_id = categories.id WHERE products.enabled = 1 AND( (products.model LIKE 'sony%' + products.model LIKE '20%' products.model LIKE 'tv%') + (products.upc LIKE 'sony' + products.upc LIKE '20' + products.upc LIKE 'tv') + (products.name LIKE '%sony%' + products.name LIKE '20%' + products.name LIKE '%tv%') + (companies.name LIKE 'sony%' + companies.name LIKE '20%' + companies.name LIKE 'tv%') + (categories.name LIKE '%sony%' + categories.name LIKE '20%' + categories.name LIKE '%tv%') ) = [$number_of_words_in_input] ORDER BY categories.name DESC, products.name ASC, companies.name ASC; This will be ugly and inefficient though. It might be better to build and maintain a separate table with the concatenation of all the fields, and fulltext index that. Baron -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]