Re: Use SOLR like the "MySQL LIKE"

2008-11-19 Thread Norberto Meijome
On Tue, 18 Nov 2008 14:26:02 +0100
"Aleksander M. Stensby" <[EMAIL PROTECTED]> wrote:

> Well, then I suggest you index the field in two different ways if you want
> both possible ways of searching. One, where you treat the entire name as
> one token (in lowercase) (then you can search for avera* and match on, for
> instance, "average joe" etc.) And then another field where you tokenize on
> whitespace, for instance, if you want/need that possibility as well. Look at
> the Solr copy fields and try it out, it works like a charm :)

You should also make extensive use of analysis.jsp to see how the data in your
field is tokenized, filtered and indexed, and how your search terms are
tokenized, filtered and matched against that field.
Hint 1: check all the checkboxes ;)
Hint 2: you don't need to reindex all your data; just enter test data in the
form and give it a go. You will of course have to tweak schema.xml and restart
your service each time you change the analysis setup.

good luck,
B
_
{Beto|Norberto|Numard} Meijome



Re: Use SOLR like the "MySQL LIKE"

2008-11-18 Thread Aleksander M. Stensby

Ah, okay!
Well, then I suggest you index the field in two different ways if you want
both ways of searching. In one field, treat the entire name as a single
lowercased token (then you can search for avera* and match, for instance,
"average joe"). In another field, tokenize on whitespace, if you want or need
that possibility as well. Look at the Solr copy fields (copyField in
schema.xml) and try it out, it works like a charm :)
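
To make the difference between the two fields concrete, here is a small toy
model in Python. It is not Solr code: the function names are made up for
illustration, and it only mimics "one lowercased token" versus "lowercased
and split on whitespace", with a trailing wildcard treated as a prefix match
on a single token.

```python
def index_as_single_token(name):
    """Toy model: keep the whole value as one lowercased token."""
    return [name.lower()]


def index_on_whitespace(name):
    """Toy model: lowercase the value and split it on whitespace."""
    return name.lower().split()


def prefix_matches(tokens, prefix):
    """A trailing-wildcard query matches if any indexed token starts with the prefix."""
    return any(token.startswith(prefix) for token in tokens)


name = "Average Joe"

# "average j*" contains a space, so only the single-token field can match it.
print(prefix_matches(index_as_single_token(name), "average j"))  # True
print(prefix_matches(index_on_whitespace(name), "average j"))    # False

# "jo*" starts in the middle of the value, so only the tokenized field matches.
print(prefix_matches(index_as_single_token(name), "jo"))         # False
print(prefix_matches(index_on_whitespace(name), "jo"))           # True
```

Under these assumptions the single-token field behaves like LIKE 'prefix%',
while the whitespace-tokenized field lets you find a word anywhere in the
name, which is why keeping both fields can be useful.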


Cheers,
 Aleksander

On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L <[EMAIL PROTECTED]> wrote:

> Thanks for the quick reply!
>
> It is supposed to work a little like the Google Suggest or field
> autocompletion.
>
> I know I mentioned email and userid, but the problem lies with the name
> field, because of the whitespace in combination with the wildcard.
>
> I looked at the solr.WordDelimiterFilterFactory, but it does not mention
> anything about whitespace or wildcards.
>
> A quick recap:
> I would like to mimic the LIKE functionality from MySQL, using a wildcard
> at the end of the search query.
> In MySQL, whitespace is treated as ordinary characters, not as a "splitter".


--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Use SOLR like the "MySQL LIKE"

2008-11-18 Thread Carsten L

Thanks for the quick reply!

It is supposed to work a little like the Google Suggest or field
autocompletion.

I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespace in combination with the wildcard.

I looked at the solr.WordDelimiterFilterFactory, but it does not mention
anything about whitespace or wildcards.

A quick recap:
I would like to mimic the LIKE functionality from MySQL, using a wildcard
at the end of the search query.
In MySQL, whitespace is treated as ordinary characters, not as a "splitter".


Aleksander M. Stensby wrote:
> 
> Hi there,
> 
> You should use LowerCaseTokenizerFactory as you point out yourself. As far  
> as I know, the StandardTokenizer "recognizes email addresses and internet  
> hostnames as one token". In your case, I guess you want an email, say  
> "[EMAIL PROTECTED]" to be split into four tokens: average joe apache  
> org, or something like that, which would indeed allow you to search for  
> "joe" or "average j*" and match. To do so, you could use the  
> WordDelimiterFilterFactory and split on intra-word delimiters (I think the  
> defaults here are non-alphanumeric chars).
> 
> Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
> for more info on tokenizers and filters.
> 
> cheers,
>   Aleks
> 
> On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote:
> 
>>
>> Hello.
>>
>> The data:
>> I have a dataset containing ~500,000 documents.
>> In each document there is an email, a name and a user ID.
>>
>> The problem:
>> I would like to be able to search in it, but it should be like the "MySQL
>> LIKE".
>>
>> So when a user enters the search term: "carsten", then the query looks  
>> like:
>> "name:(carsten) OR name:(carsten*) OR email:(carsten) OR
>> email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"
>>
>> Then it should match:
>> carsten l
>> carsten larsen
>> Carsten Larsen
>> Carsten
>> CARSTEN
>> etc.
>>
>> And when the user enters the term: "carsten l" the query looks like:
>> "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
>> email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"
>>
>> Then it should match:
>> carsten l
>> carsten larsen
>> Carsten Larsen
>>
>> Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR
>> `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%' ..."
>>
>> I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
>> and email fields to ensure case-insensitive behavior.
>> The problem seems to be the wildcards and the whitespace.
> 
> 
> 
> -- 
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20556271.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Use SOLR like the "MySQL LIKE"

2008-11-18 Thread Aleksander M. Stensby

Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As far  
as I know, the StandardTokenizer "recognizes email addresses and internet  
hostnames as one token". In your case, I guess you want an email, say  
"[EMAIL PROTECTED]" to be split into four tokens: average joe apache  
org, or something like that, which would indeed allow you to search for  
"joe" or "average j*" and match. To do so, you could use the  
WordDelimiterFilterFactory and split on intra-word delimiters (I think the  
defaults here are non-alphanumeric chars).


Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
for more info on tokenizers and filters.
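
As a rough illustration of the splitting described above, here is a toy
Python sketch. It is not the WordDelimiterFilterFactory itself (the real
filter has more options than this models); it simply lowercases a value and
splits it on runs of non-alphanumeric characters. The address and the
function names are hypothetical and used only for the example.

```python
import re


def split_on_delimiters(value):
    """Toy model: lowercase, then split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def clause_matches(tokens, clause):
    """Match one query clause: 'abc*' is a prefix query, anything else an exact term."""
    if clause.endswith("*"):
        return any(token.startswith(clause[:-1]) for token in tokens)
    return clause in tokens


# Hypothetical address, used only for illustration.
tokens = split_on_delimiters("Jane.Doe@example.org")
print(tokens)                           # ['jane', 'doe', 'example', 'org']
print(clause_matches(tokens, "doe"))    # True
print(clause_matches(tokens, "exam*"))  # True
print(clause_matches(tokens, "smith"))  # False
```

Each clause is matched against individual tokens, so a search for a single
word or a single-word prefix can hit any part of the address.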


cheers,
 Aleks


--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Use SOLR like the "MySQL LIKE"

2008-11-17 Thread Carsten L

Hello.

The data:
I have a dataset containing ~500,000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the "MySQL
LIKE".

So when a user enters the search term: "carsten", then the query looks like:
"name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l" the query looks like:
"name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%' ..."

I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
and email fields to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespace.
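
For reference, a client-side sketch in Python of how such a prefix query
string could be built. It rests on assumptions that are not confirmed in this
thread: that each field is indexed as a single lowercased token, that
wildcard terms are not analyzed at query time (hence the lowercasing here),
and that a backslash-escaped space keeps the input together as one term. The
function names are made up for illustration, and the resulting query should
be checked against your own setup before relying on it.

```python
import re

# Characters with special meaning in the Lucene query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|\s])')


def escape_term(text):
    """Backslash-escape query syntax characters and whitespace in user input."""
    return _SPECIAL.sub(r"\\\1", text)


def build_prefix_query(user_input, fields=("name", "email", "userid")):
    """Build 'field:term*' clauses joined with OR, one clause per field."""
    term = escape_term(user_input.strip().lower())
    return " OR ".join(f"{field}:{term}*" for field in fields)


print(build_prefix_query("Carsten L"))
# name:carsten\ l* OR email:carsten\ l* OR userid:carsten\ l*
```

A prefix clause such as name:carsten* also matches the exact value
"carsten", so under these assumptions the separate non-wildcard clauses from
the queries above would not be needed.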
-- 
View this message in context: 
http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20554732.html
Sent from the Solr - User mailing list archive at Nabble.com.