Re: Strip special chars like "-"

2011-08-12 Thread roySolr
Erick, you're right. It's working. My schema looks like this:

[field type XML not preserved in the archive]

Thanks for helping me!!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3248545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strip special chars like "-"

2011-08-09 Thread Erick Erickson
That's not what I get. This is for Solr 3.3, but there's no
reason that I know of that other versions should give
different results.


Here's the field def from the 3.3 example; this is just
the standard implementation.

[field type XML not preserved in the archive]


At index time, it produces these tokens for manchester-united:

pos 1        pos 2
manchester   united
             manchesterunited

At query time, manchesterunited isn't transformed and matches
the second row. manchester united and manchester-united
both parse to
manchester united
and match the first row.
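
To make the index/query behaviour above concrete, here is a small Python sketch. It is not Solr code: word_delimiter, analyze and matches are hypothetical helpers that only imitate generateWordParts and catenateWords for ASCII text, with lowercasing folded in. It assumes catenateWords is on at index time and off at query time (inferred from the token rows above, since the field definition itself is not shown), and it reduces matching to "every query term is in the index", ignoring positions and phrase handling.

```python
import re

def word_delimiter(token, generate_word_parts, catenate_words):
    """Rough imitation of two WordDelimiterFilter options (not Solr code)."""
    # Lowercase, then split on anything that is not an ASCII letter or digit.
    parts = [p for p in re.split(r"[^a-z0-9]+", token.lower()) if p]
    if len(parts) <= 1:
        return parts  # nothing to split, the token passes through
    terms = []
    if generate_word_parts:
        terms.extend(parts)            # "manchester", "united"
    if catenate_words:
        terms.append("".join(parts))   # "manchesterunited"
    return terms

def analyze(text, generate_word_parts, catenate_words):
    """Whitespace-tokenize, then run each token through word_delimiter."""
    terms = []
    for token in text.split():
        terms.extend(word_delimiter(token, generate_word_parts, catenate_words))
    return terms

def matches(indexed_text, query_text):
    """Assumed setup: catenateWords on at index time, off at query time."""
    index_terms = set(analyze(indexed_text, True, True))
    query_terms = analyze(query_text, True, False)
    return bool(query_terms) and all(t in index_terms for t in query_terms)

if __name__ == "__main__":
    print(analyze("manchester-united", True, True))
    for query in ("Manchester-united", "Manchester united", "Manchesterunited"):
        print(query, matches("manchester-united", query))
```

In this simplified model all three query forms match the indexed value. The admin/analysis page remains the authoritative check for a real schema.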


So somehow we're not doing the same thing. Try
attaching &debugQuery=on to your query and post the results.
Also try looking at the admin/analysis page and see what
that tells you.

Best
Erick

P.S. Did you re-index after your schema changes?


On Tue, Aug 9, 2011 at 11:03 AM, roySolr  wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is "manchester-united".
>
>
> generateWordParts fixes two of these possibilities:
>
> "Manchester-united" => "manchester","united"
>
> I can search for "Manchester-united" and "manchester" "united". When I
> search for "manchesterunited" I get no results.
>
> To fix this I could use catenateWords:
>
> "Manchester-united" => "manchesterunited"
>
> In this situation I can search for "Manchester-united" and
> "manchesterunited". When I search for "manchester united" I get no results.
> The catenateWords option also fixes only two situations.


Re: Strip special chars like "-"

2011-08-09 Thread Sujit Pal
I have done this using a custom token filter that (among other things)
detects hyphenated words and converts them to the three variations, using a
regex match on the incoming token:
(\w+)-(\w+)

that runs the following regex transform:

s/(\w+)-(\w+)/$1$2__$1 $2/

and then splits by "__" and passes the original token plus the one-word and
two-word versions through a SynonymFilter further down the chain (see
Lucene in Action, 2nd Edition for code).
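
As a rough illustration of the regex step described above (not the actual Lucene token filter, which is Java), the transform can be sketched in Python. The function name hyphen_variants is made up for this example, and the use of a full match on the token plus the guard against tokens that already contain "__" are assumptions added here; the SynonymFilter stage is not modelled.

```python
import re

# Same pattern as in the message: exactly two word runs joined by one hyphen.
HYPHENATED = re.compile(r"(\w+)-(\w+)")

def hyphen_variants(token):
    """Return the original token plus its one-word and two-word forms."""
    # Assumption: tokens containing "__" are left alone, because "__" is
    # used as the temporary separator below (\w also matches "_").
    if "__" in token or not HYPHENATED.fullmatch(token):
        return [token]
    # Equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/
    transformed = HYPHENATED.sub(r"\1\2__\1 \2", token)
    one_word, two_words = transformed.split("__")
    return [token, one_word, two_words]

if __name__ == "__main__":
    print(hyphen_variants("manchester-united"))
    print(hyphen_variants("arsenal"))
```

Tokens with more than one hyphen do not fully match the pattern and are returned unchanged in this sketch.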

-sujit

On Tue, 2011-08-09 at 06:27 -0700, roySolr wrote:
> Hello,
> 
> I have some terms in my index with special characters. An example is
> "manchester-united". I want a user to be able to search for
> "manchester-united", "manchester united" and "manchesterunited". What's the
> best way to fix this? I have used the patternReplaceFilter and some
> tokenizers, but they couldn't fix the last situation (manchesterunited). Can
> someone help me?



Re: Strip special chars like "-"

2011-08-09 Thread lee carroll
Hi, I might be wrong, as I haven't tried it out, but from the wiki docs:

These parameters may be combined in any way.

Example of generateWordParts="1" and catenateWords="1":
"PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot"
(where 0,1,1 are token positions)

Does that fit the bill?

On 9 August 2011 16:03, roySolr  wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is "manchester-united".
>
>
> generateWordParts fixes two of these possibilities:
>
> "Manchester-united" => "manchester","united"
>
> I can search for "Manchester-united" and "manchester" "united". When I
> search for "manchesterunited" I get no results.
>
> To fix this I could use catenateWords:
>
> "Manchester-united" => "manchesterunited"
>
> In this situation I can search for "Manchester-united" and
> "manchesterunited". When I search for "manchester united" I get no results.
> The catenateWords option also fixes only two situations.


Re: Strip special chars like "-"

2011-08-09 Thread roySolr
OK, there are three query possibilities:

Manchester-united
Manchester united
Manchesterunited

The original name of the club is "manchester-united".


generateWordParts fixes two of these possibilities:

"Manchester-united" => "manchester","united"

I can search for "Manchester-united" and "manchester" "united". When I
search for "manchesterunited" I get no results.

To fix this I could use catenateWords:

"Manchester-united" => "manchesterunited"

In this situation I can search for "Manchester-united" and
"manchesterunited". When I search for "manchester united" I get no results.
The catenateWords option also fixes only two situations.
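
The two single-option cases above can be reproduced with a rough Python simulation. This is only a sketch under stated assumptions, not Solr's implementation: analyze and coverage are hypothetical helpers, the same settings are assumed at index and query time, only ASCII letters and digits are handled, and a query is counted as a hit when every one of its terms is present in the index.

```python
import re

QUERIES = ("Manchester-united", "Manchester united", "Manchesterunited")

def analyze(text, generate_word_parts, catenate_words):
    """Lowercase, whitespace-tokenize, then imitate the two options."""
    terms = []
    for token in text.lower().split():
        parts = [p for p in re.split(r"[^a-z0-9]+", token) if p]
        if len(parts) <= 1:
            terms.extend(parts)              # no delimiter: keep as is
            continue
        if generate_word_parts:
            terms.extend(parts)              # "manchester", "united"
        if catenate_words:
            terms.append("".join(parts))     # "manchesterunited"
    return terms

def coverage(generate_word_parts, catenate_words):
    """Map each query form to whether it finds "manchester-united"."""
    index_terms = set(analyze("manchester-united",
                              generate_word_parts, catenate_words))
    hits = {}
    for query in QUERIES:
        query_terms = analyze(query, generate_word_parts, catenate_words)
        hits[query] = bool(query_terms) and all(
            term in index_terms for term in query_terms)
    return hits

if __name__ == "__main__":
    print("generateWordParts only:", coverage(True, False))
    print("catenateWords only:    ", coverage(False, True))
    print("both options:          ", coverage(True, True))
```

In this model each option alone covers two of the three query forms, while enabling both covers all three.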






Re: Strip special chars like "-"

2011-08-09 Thread Erick Erickson
OK, what are the other possibilities that it doesn't fix? Just saying
"it won't work" without some examples doesn't leave much to
go on...

Best
Erick

On Tue, Aug 9, 2011 at 10:41 AM, roySolr  wrote:
> Yes, I understand the difference between generateWordParts and catenateWords.
> But I can't fix my problem with these options; they don't cover all the
> possibilities.


Re: Strip special chars like "-"

2011-08-09 Thread roySolr
Yes, I understand the difference between generateWordParts and catenateWords.
But I can't fix my problem with these options; they don't cover all the
possibilities.



Re: Strip special chars like "-"

2011-08-09 Thread Jayendra Patil
The option is catenateWords; it would join the two words, as in the example
at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

catenateWords="1" causes maximum runs of word parts to be catenated:
"wi-fi" => "wifi"

Regards,
Jayendra

On Tue, Aug 9, 2011 at 10:25 AM, roySolr  wrote:
> The catenateWordParts option has the following effect:
>
> manchester-united => "manchester","united"
>
> The query "manchesterunited" will not match "manchester","united".
> Maybe I'm wrong, but I have tested something similar in the past.


Re: Strip special chars like "-"

2011-08-09 Thread roySolr
The catenateWordParts option has the following effect:

manchester-united => "manchester","united"

The query "manchesterunited" will not match "manchester","united".
Maybe I'm wrong, but I have tested something similar in the past.



Re: Strip special chars like "-"

2011-08-09 Thread Markus Jelsma
Use the catenateWords option

On Tuesday 09 August 2011 16:02:47 roySolr wrote:
> With the WordDelimiterFilter I can only fix the first two
> situations ("manchester-united" and "manchester united").
> 
> I can use something like generateWordParts, but I think this doesn't fix
> the problem with "manchesterunited".

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Strip special chars like "-"

2011-08-09 Thread roySolr
With the WordDelimiterFilter I can only fix the first two
situations ("manchester-united" and "manchester united").

I can use something like generateWordParts, but I think this doesn't fix the
problem with "manchesterunited".



Re: Strip special chars like "-"

2011-08-09 Thread Jayendra Patil
Use WordDelimiterFilterFactory
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory),
which can generate the tokens you need to match the search patterns.

Regards,
Jayendra

On Tue, Aug 9, 2011 at 9:27 AM, roySolr  wrote:
> Hello,
>
> I have some terms in my index with special characters. An example is
> "manchester-united". I want a user to be able to search for
> "manchester-united", "manchester united" and "manchesterunited". What's the
> best way to fix this? I have used the patternReplaceFilter and some
> tokenizers, but they couldn't fix the last situation (manchesterunited). Can
> someone help me?