Re: Use SOLR like the "MySQL LIKE"
On Tue, 18 Nov 2008 14:26:02 +0100
"Aleksander M. Stensby" <[EMAIL PROTECTED]> wrote:

> Well, then I suggest you index the field in two different ways if you want
> both possible ways of searching. One, where you treat the entire name as
> one token (in lowercase) (then you can search for avera* and match on for
> instance "average joe" etc.) And then another field where you tokenize on
> whitespace for instance, if you want/need that possibility as well. Look at
> the solr copy fields and try it out, it works like a charm :)

You should also make extensive use of analysis.jsp to see how data in your field (1) is tokenized, filtered and indexed, and how your search terms are tokenized, filtered and matched against (1).

Hint 1: check all the checkboxes ;)

Hint 2: you don't need to reindex all your data, just enter test data in the form and give it a go. You will of course have to tweak schema.xml and restart your service when you do this.

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"Intellectual: 'Someone who has been educated beyond his/her intelligence'"
Arthur C. Clarke, from "3001, The Final Odyssey", Sources.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Use SOLR like the "MySQL LIKE"
Ah, okay! Well, then I suggest you index the field in two different ways if you want both possible ways of searching. One, where you treat the entire name as one token (in lowercase), so that you can search for avera* and match, for instance, "average joe". And then another field where you tokenize on whitespace, for instance, if you want or need that possibility as well. Look at the Solr copy fields and try it out, it works like a charm :)

Cheers,
Aleksander

On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L <[EMAIL PROTECTED]> wrote:

> Thanks for the quick reply!
>
> It is supposed to work a little like the Google Suggest or field
> autocompletion.
>
> I know I mentioned email and userid, but the problem lies with the name
> field, because of the whitespaces in combination with the wildcard.
>
> I looked at the solr.WordDelimiterFilterFactory, but it does not mention
> anything about whitespaces - or wildcards.
>
> A quick brushup:
> I would like to mimic the LIKE functionality from MySQL using the
> wildcards in the end of the searchquery.
> In MySQL whitespaces are treated as characters, not "splitters".

--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
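To see why the two-field suggestion helps, here is a minimal sketch in plain Python. It only models the idea and is not Solr code: the function names are hypothetical, and lowercasing the query on the client side is an assumption. One function treats the whole name as a single lowercased token, the other tokenizes on whitespace, and a trailing-wildcard query is modelled as a prefix test against each indexed token.

```python
def index_as_single_token(value):
    """Model of a field where the entire lowercased value is one token."""
    return [value.lower()]


def index_on_whitespace(value):
    """Model of a field that is lowercased and tokenized on whitespace."""
    return value.lower().split()


def prefix_match(tokens, query):
    """Model of a trailing-wildcard query: 'avera*' matches any token
    that starts with 'avera'. The query is lowercased here by assumption."""
    prefix = query.rstrip("*").lower()
    return any(token.startswith(prefix) for token in tokens)


name = "Average Joe"

# A prefix containing a space can only match when the space is part of the token.
print(prefix_match(index_as_single_token(name), "average j*"))  # True
print(prefix_match(index_on_whitespace(name), "average j*"))    # False

# A prefix of the second word only matches the whitespace-tokenized field.
print(prefix_match(index_as_single_token(name), "jo*"))         # False
print(prefix_match(index_on_whitespace(name), "jo*"))           # True
```

The single-token field behaves like LIKE 'average j%', while the tokenized field still finds "joe" on its own. That is the reason for keeping both fields and filling one from the other with a copy field.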
Re: Use SOLR like the "MySQL LIKE"
Thanks for the quick reply!

It is supposed to work a little like the Google Suggest or field autocompletion.

I know I mentioned email and userid, but the problem lies with the name field, because of the whitespaces in combination with the wildcard.

I looked at the solr.WordDelimiterFilterFactory, but it does not mention anything about whitespaces - or wildcards.

A quick brushup:
I would like to mimic the LIKE functionality from MySQL using the wildcards in the end of the searchquery.
In MySQL whitespaces are treated as characters, not "splitters".

Aleksander M. Stensby wrote:

> Hi there,
>
> You should use LowerCaseTokenizerFactory as you point out yourself. As far
> as I know, the StandardTokenizer "recognizes email addresses and internet
> hostnames as one token". In your case, I guess you want an email, say
> "[EMAIL PROTECTED]" to be split into four tokens: average joe apache
> org, or something like that, which would indeed allow you to search for
> "joe" or "average j*" and match. To do so, you could use the
> WordDelimiterFilterFactory and split on intra-word delimiters (I think the
> defaults here are non-alphanumeric chars).
>
> Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> for more info on tokenizers and filters.
>
> cheers,
> Aleks

--
View this message in context: http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20556271.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Use SOLR like the "MySQL LIKE"
Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer "recognizes email addresses and internet hostnames as one token". In your case, I guess you want an email, say "[EMAIL PROTECTED]", to be split into four tokens: average joe apache org, or something like that, which would indeed allow you to search for "joe" or "average j*" and match. To do so, you could use the WordDelimiterFilterFactory and split on intra-word delimiters (I think the defaults here are non-alphanumeric chars).

Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more info on tokenizers and filters.

cheers,
Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote:

> Hello.
>
> The data:
> I have a dataset containing ~500.000 documents.
> In each document there is an email, a name and a user ID.
>
> The problem:
> I would like to be able to search in it, but it should be like the "MySQL
> LIKE".
>
> So when a user enters the search term: "carsten", then the query looks
> like:
> "name:(carsten) OR name:(carsten*) OR email:(carsten) OR
> email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"
>
> Then it should match:
> carsten l
> carsten larsen
> Carsten Larsen
> Carsten
> CARSTEN
> etc.
>
> And when the user enters the term: "carsten l" the query looks like:
> "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
> email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"
>
> Then it should match:
> carsten l
> carsten larsen
> Carsten Larsen
>
> Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR
> `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."
>
> I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
> and email field, to ensure case-insensitive behavior.
> The problem seems to be the wildcards and the whitespaces.

--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
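The split described above can be approximated in a few lines of Python. This is only a rough model of splitting on non-alphanumeric delimiters, not the WordDelimiterFilterFactory itself, whose real behaviour depends on how it is configured. The address "Average.Joe@apache.org" and the function name are hypothetical, chosen to reproduce the four tokens mentioned in the message.

```python
import re


def split_on_delimiters(text):
    """Split on runs of non-alphanumeric characters and lowercase each part.
    Empty strings produced by leading or trailing delimiters are dropped."""
    return [part.lower() for part in re.split(r"[^A-Za-z0-9]+", text) if part]


# Hypothetical address; the real one is redacted in the archive.
tokens = split_on_delimiters("Average.Joe@apache.org")
print(tokens)  # ['average', 'joe', 'apache', 'org']
```

With tokens like these, a term such as "joe" or a prefix such as "apa*" can match an individual part of the address, which it could not do if the whole address were kept as one token.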
Use SOLR like the "MySQL LIKE"
Hello.

The data:
I have a dataset containing ~500.000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the "MySQL LIKE".

So when a user enters the search term: "carsten", then the query looks like:
"name:(carsten) OR name:(carsten*) OR email:(carsten) OR email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l" the query looks like:
"name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."

I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name and email field, to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.

--
View this message in context: http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20554732.html
Sent from the Solr - User mailing list archive at Nabble.com.
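For reference, the query strings quoted in this message could be produced on the client side by something like the following sketch. The function name is hypothetical, and lowercasing the term before building the query is an assumption rather than something stated in the message. No escaping of query-syntax characters is attempted.

```python
FIELDS = ("name", "email", "userid")


def build_like_query(term, fields=FIELDS):
    """Build an 'exact OR trailing wildcard' clause pair for every field,
    joined with OR, in the shape shown in the message above."""
    term = term.strip().lower()  # assumption: normalise case on the client
    clauses = []
    for field in fields:
        clauses.append(f"{field}:({term})")
        clauses.append(f"{field}:({term}*)")
    return " OR ".join(clauses)


print(build_like_query("carsten"))
print(build_like_query("carsten l"))
```

For "carsten l" this yields clauses such as name:(carsten l*), where the space sits inside the parentheses. That is the case the thread identifies as the problem, because the analyzer and the query parser treat whitespace as a separator rather than as a character.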