RE: commas in synonyms.txt are not escaping

2011-08-29 Thread Moore, Gary
Hah, I knew it was something simple. :)  Thanks.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, August 28, 2011 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Turns out this isn't a bug - I was just tripped up by the analysis
changes to the example server.

Gary, you are probably just hitting the same thing.
The "text" fieldType is no longer used by any fields by default - for
example the "text" field uses the "text_general" fieldType.
This fieldType uses the standard tokenizer, which discards stuff like
commas (hence the synonym will never match).

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-28 Thread Yonik Seeley
Turns out this isn't a bug - I was just tripped up by the analysis
changes to the example server.

Gary, you are probably just hitting the same thing.
The "text" fieldType is no longer used by any fields by default - for
example the "text" field uses the "text_general" fieldType.
This fieldType uses the standard tokenizer, which discards stuff like
commas (hence the synonym will never match).
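A minimal Java sketch of why the rule can never fire. Nothing here is Lucene code: `splitOnPunctuation` is a hypothetical stand-in for a tokenizer that drops punctuation (the real StandardTokenizer follows more detailed Unicode word-break rules), and `keyword` mimics a keyword tokenizer that keeps the whole input as one token. Because the synonyms file is parsed with `tokenizerFactory="solr.KeywordTokenizerFactory"`, the left-hand side `2,4-D-butotyl` is a single token, so it can only match if the field's own tokenizer also emits it as one token.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerMismatchSketch {

    // Hypothetical stand-in for a punctuation-splitting tokenizer: emits runs of
    // letters/digits and drops everything else (',' and '-' included).
    // This is NOT Lucene's StandardTokenizer, only an illustration.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Mimics a keyword tokenizer: the entire input becomes a single token.
    static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // A single-token synonym key can only fire if that exact token is in the stream.
    static boolean synonymFires(String key, List<String> tokenStream) {
        return tokenStream.contains(key);
    }

    public static void main(String[] args) {
        String key = "2,4-D-butotyl"; // left-hand side of the rule, after unescaping
        System.out.println(splitOnPunctuation(key));                    // [2, 4, D, butotyl]
        System.out.println(synonymFires(key, splitOnPunctuation(key))); // false
        System.out.println(synonymFires(key, keyword(key)));            // true
    }
}
```

Editing the analyzer chain of the unused "text" fieldType therefore changes nothing; what matters is the chain on the fieldType the field actually uses (here "text_general").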

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Alexei,
Yes, but it made no difference. This is apparently an issue introduced in 3.*.
Thanks for your help.
-Gary

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, isn't your wordDelimiter removing your commas at query time? Have
you tried it in the analyzer?


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Thanks, Yonik.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 26, 2011 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Actually, I think I've tracked it to LUCENE-3233 where the parsing
rules were moved from Solr to Lucene (and changed the functionality in
the process).
I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in 
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be escaped, i.e.
>> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
>> not being escaped. If I paste in 2,4-D-butotyl, I get no mappings. If I
>> paste in 2\,4-D-butotyl, the mappings are done.
>
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233 where the parsing
rules were moved from Solr to Lucene (and changed the functionality in
the process).
I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>
> I have a number of chemical names containing commas which I'm mapping in 
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be escaped, i.e.
> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
> not being escaped. If I paste in 2,4-D-butotyl, I get no mappings. If I
> paste in 2\,4-D-butotyl, the mappings are done.


I can confirm that this works in 1.4, but no longer works in 3x or
trunk.  Can you open an issue?

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, isn't your wordDelimiter removing your commas at query time? Have
you tried it in the analyzer?

2011/8/26 Moore, Gary 

> Here you go -- I'm just hacking the text field at the moment.  Thanks,
> Gary
>
> 
>  
>
>  synonyms="index_synonyms.txt"
> tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
> expand="true"/>
> 
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>  
>
>   
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>
>
> -Original Message-
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here
>
>
>




RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Here you go -- I'm just hacking the text field at the moment.  Thanks,
Gary

[field declaration not preserved in the archive]


-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, please post the entire field declaration so I can try to reproduce
here




Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, please post the entire field declaration so I can try to reproduce
here

2011/8/26 Moore, Gary 

>
> I have a number of chemical names containing commas which I'm mapping in
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be escaped, i.e.
> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
> not being escaped. If I paste in 2,4-D-butotyl, I get no mappings. If I
> paste in 2\,4-D-butotyl, the mappings are done. This is verified by there
> being no mappings in the index. I assume there would be if 2\,4-D-butotyl
> actually appeared in a document.
>
> The filter I'm declaring in the index analyzer looks like this:
>
>   <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
> expand="true"/>
>
> It doesn't seem to matter which tokenizer I use. This must be something
> simple that I'm not doing, but I'm a bit stumped at the moment and would
> appreciate any tips.
> Thanks
> Gary
>
>
>




commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary

I have a number of chemical names containing commas which I'm mapping in 
index_synonyms.txt thusly:

2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562

According to the sample synonyms.txt, the comma above should be escaped, i.e.
a\,a=>b\,b. The problem is that according to analysis.jsp the commas are not
being escaped. If I paste in 2,4-D-butotyl, I get no mappings. If I paste in
2\,4-D-butotyl, the mappings are done. This is verified by there being no
mappings in the index. I assume there would be if 2\,4-D-butotyl actually
appeared in a document.
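The escaping convention described above (a backslash before a comma makes the comma part of the term instead of a separator, as in `a\,a=>b\,b`) can be sketched in Java. This is a hypothetical parser for illustration only, not the Solr or Lucene synonym parser; `splitUnescaped` and `parseRule` are made-up names, and the sketch assumes `=>` itself is never escaped and splits at its first occurrence.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymRuleSketch {

    // Splits on every separator that is not preceded by a backslash and trims each part.
    // A backslash escapes the next character: "\," yields a literal comma.
    static List<String> splitUnescaped(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                current.append(s.charAt(++i)); // keep the escaped char, drop the backslash
            } else if (c == sep) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    // Parses "lhs=>rhs" into [lhs terms, rhs terms], splitting at the first "=>".
    static List<List<String>> parseRule(String rule) {
        int arrow = rule.indexOf("=>");
        if (arrow < 0) {
            throw new IllegalArgumentException("expected '=>' in: " + rule);
        }
        List<List<String>> sides = new ArrayList<>();
        sides.add(splitUnescaped(rule.substring(0, arrow), ','));
        sides.add(splitUnescaped(rule.substring(arrow + 2), ','));
        return sides;
    }

    public static void main(String[] args) {
        String rule = "2\\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,"
                + "Brush killer 64,Butoxy-D 3,CCRIS 8562";
        List<List<String>> sides = parseRule(rule);
        System.out.println(sides.get(0));        // [2,4-D-butotyl]
        System.out.println(sides.get(1).size()); // 6
    }
}
```

Under this convention the rule's left-hand side is the unescaped term `2,4-D-butotyl`, so the analyzed text has to reach the synonym filter as that same single token.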

The filter I'm declaring in the index analyzer looks like this:
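
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>
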
It doesn't seem to matter which tokenizer I use. This must be something simple
that I'm not doing, but I'm a bit stumped at the moment and would appreciate any
tips.
Thanks
Gary