Re: Strange search behaviour when upgrading to 4.10.3
Thanks Shawn.

I just ran the analysis on both 4.6 and 4.10. The only difference between the outputs seems to be that a positionLength value is set in 4.10. Does that mean anything?

Version 4.10

SF  text     raw_bytes               start  end  positionLength  type   position
    message  [6d 65 73 73 61 67 65]  0      7    1               ALNUM  1

Version 4.6

SF  text     raw_bytes               type   start  end  position
    message  [6d 65 73 73 61 67 65]  ALNUM  0      7    1

Thanks,
Rishi.

-Original Message-
From: Shawn Heisey
To: solr-user
Sent: Fri, Feb 20, 2015 6:51 pm
Subject: Re: Strange search behaviour when upgrading to 4.10.3

On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
> Also, the tokenizer we use is very similar to the following.
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
>
> From the looks of it the text is being indexed as a single token and not broken across whitespace.

I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the "verbose" flag and type the actual field contents into the "index" side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time.

Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer.

Thanks,
Shawn
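For anyone repeating this comparison, here is a minimal sketch of diffing the two sets of token attributes programmatically instead of by eye. The values are copied from the analysis outputs quoted in this message; the function name diff_attributes is only an illustration. As far as I know, a positionLength of 1 is Lucene's default for a token that spans a single position, so that difference on its own probably does not explain the search problem.

```python
# Token attributes copied from the two analysis outputs in this message.
solr_4_10 = {
    "text": "message",
    "raw_bytes": "[6d 65 73 73 61 67 65]",
    "start": 0,
    "end": 7,
    "positionLength": 1,
    "type": "ALNUM",
    "position": 1,
}
solr_4_6 = {
    "text": "message",
    "raw_bytes": "[6d 65 73 73 61 67 65]",
    "type": "ALNUM",
    "start": 0,
    "end": 7,
    "position": 1,
}

_MISSING = object()  # sentinel so an absent attribute never equals a real value


def diff_attributes(old, new):
    """Return {attribute: (old_value, new_value)} for every attribute that
    differs between the two outputs. A side that lacks the attribute is
    reported as None."""
    differences = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key, _MISSING) != new.get(key, _MISSING):
            differences[key] = (old.get(key), new.get(key))
    return differences


differences = diff_attributes(solr_4_6, solr_4_10)
print(differences)
```

With the values above, the only entry reported is positionLength, present in 4.10 and absent in 4.6.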
Re: Strange search behaviour when upgrading to 4.10.3
On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
> Also, the tokenizer we use is very similar to the following.
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
>
> From the looks of it the text is being indexed as a single token and not
> broken across whitespace.

I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the "verbose" flag and type the actual field contents into the "index" side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time.

Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer.

Thanks,
Shawn
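The same old-versus-new check can also be scripted. The sketch below builds request URLs for Solr's field analysis handler, which is what I understand the admin analysis page to call. It assumes the handler is registered at /analysis/field in solrconfig.xml; the host names and the core name collection1 are made up for illustration, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def field_analysis_url(base_url, core, field_name, field_value):
    """Build a field analysis request URL for one Solr instance.

    Assumes the FieldAnalysisRequestHandler is mapped to /analysis/field
    in that core's solrconfig.xml.
    """
    params = urlencode({
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": field_value,
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


def fetch_analysis(url):
    """Fetch and decode the JSON analysis response (needs a running Solr)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical hosts: one still on 4.6, one upgraded to 4.10.3.
old_url = field_analysis_url(
    "http://old-host:8983/solr", "collection1", "content", "This is a test message")
new_url = field_analysis_url(
    "http://new-host:8983/solr", "collection1", "content", "This is a test message")
print(old_url)
print(new_url)
```

Calling fetch_analysis on both URLs and comparing the token lists at each analysis stage should show where the two versions start to diverge.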
Re: Strange search behaviour when upgrading to 4.10.3
Hi Shawn,

Also, the tokenizer we use is very similar to the following.
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex

From the looks of it the text is being indexed as a single token and not broken across whitespace.

Thanks,
Rishi.

-Original Message-
From: Shawn Heisey
To: solr-user
Sent: Fri, Feb 20, 2015 11:52 am
Subject: Re: Strange search behaviour when upgrading to 4.10.3

On 2/20/2015 9:37 AM, Rishi Easwaran wrote:
> We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search 4.10.3 search results are not being returned, actually looks like only the first word in a sentence is getting indexed.
> Ex: inserting "This is a test message" only returns results when searching
> for content:this*. searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads to me to believe there is something wrong with behaviour of our analyzer and tokenizers
>
> Looking at the release notes from solr and lucene
> http://lucene.apache.org/solr/4_10_1/changes/Changes.html
> http://lucene.apache.org/core/4_10_1/changes/Changes.html
> Nothing really sticks out, atleast to me. Any help to get it working with 4.10 would be great.

The links you provided lead to zero-byte files when I try them, so I could not look deeper.

Have you recompiled your custom analysis components against the newer versions of the Solr/Lucene libraries? Anytime you're dealing with custom components, you cannot assume that a component compiled to work with one version of Solr will work with another version. The internal API does change, and there is less emphasis on avoiding API breaks in minor Solr releases than there is with Lucene, because the vast majority of Solr users are not writing their own code that uses the Solr API.

Recompiling against the newer libraries may cause compiler errors that reveal places in your code that require changes.

Thanks,
Shawn
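To illustrate why a single unbroken token would produce exactly the search results described above, here is a small simulation. It is only a sketch: Python's fnmatchcase stands in for term-level wildcard matching and is not Lucene's implementation, and it assumes the analyzer lowercases the text and drops nothing.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern, indexed_terms):
    """Return the indexed terms that the wildcard pattern matches as a whole.

    A wildcard or prefix query is compared against each indexed term,
    not against the original field text.
    """
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


# What the index would hold if the whole sentence became one token ...
single_token_index = ["this is a test message"]
# ... versus what it would hold if the text were split on whitespace.
tokenized_index = ["this", "is", "a", "test", "message"]

for pattern in ["this*", "test*", "message*", "*message*"]:
    single_hit = bool(matching_terms(pattern, single_token_index))
    split_hit = bool(matching_terms(pattern, tokenized_index))
    print(f"{pattern:10} single token: {single_hit}  split tokens: {split_hit}")
```

In the single-token case only this* and *message* find anything, which is the behaviour reported on 4.10. With properly split tokens all four patterns match.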
Re: Strange search behaviour when upgrading to 4.10.3
Yes, the analyzers and tokenizers were recompiled against the new version of Solr/Lucene. There were some errors, most of them related to switching to BytesRefBuilder, which I did.

Can you try these links?
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java

-Original Message-
From: Shawn Heisey
To: solr-user
Sent: Fri, Feb 20, 2015 11:52 am
Subject: Re: Strange search behaviour when upgrading to 4.10.3

On 2/20/2015 9:37 AM, Rishi Easwaran wrote:
> We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search 4.10.3 search results are not being returned, actually looks like only the first word in a sentence is getting indexed.
> Ex: inserting "This is a test message" only returns results when searching
> for content:this*. searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads to me to believe there is something wrong with behaviour of our analyzer and tokenizers
>
> Looking at the release notes from solr and lucene
> http://lucene.apache.org/solr/4_10_1/changes/Changes.html
> http://lucene.apache.org/core/4_10_1/changes/Changes.html
> Nothing really sticks out, atleast to me. Any help to get it working with 4.10 would be great.

The links you provided lead to zero-byte files when I try them, so I could not look deeper.

Have you recompiled your custom analysis components against the newer versions of the Solr/Lucene libraries? Anytime you're dealing with custom components, you cannot assume that a component compiled to work with one version of Solr will work with another version. The internal API does change, and there is less emphasis on avoiding API breaks in minor Solr releases than there is with Lucene, because the vast majority of Solr users are not writing their own code that uses the Solr API.

Recompiling against the newer libraries may cause compiler errors that reveal places in your code that require changes.

Thanks,
Shawn
Re: Strange search behaviour when upgrading to 4.10.3
On 2/20/2015 9:37 AM, Rishi Easwaran wrote:
> We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search 4.10.3
> search results are not being returned, actually looks like only the first
> word in a sentence is getting indexed.
> Ex: inserting "This is a test message" only returns results when searching
> for content:this*. searching for content:test* or content:message* does not
> work with 4.10. Only searching for content:*message* works. This leads to me
> to believe there is something wrong with behaviour of our analyzer and
> tokenizers
>
> required="false" multiValued="true" />
>
> Looking at the release notes from solr and lucene
> http://lucene.apache.org/solr/4_10_1/changes/Changes.html
> http://lucene.apache.org/core/4_10_1/changes/Changes.html
> Nothing really sticks out, atleast to me. Any help to get it working with
> 4.10 would be great.

The links you provided lead to zero-byte files when I try them, so I could not look deeper.

Have you recompiled your custom analysis components against the newer versions of the Solr/Lucene libraries? Anytime you're dealing with custom components, you cannot assume that a component compiled to work with one version of Solr will work with another version. The internal API does change, and there is less emphasis on avoiding API breaks in minor Solr releases than there is with Lucene, because the vast majority of Solr users are not writing their own code that uses the Solr API.

Recompiling against the newer libraries may cause compiler errors that reveal places in your code that require changes.

Thanks,
Shawn
Strange search behaviour when upgrading to 4.10.3
Hi,

We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search on 4.10.3, results are not being returned; it looks like only the first word in a sentence is getting indexed.

Ex: inserting "This is a test message" only returns results when searching for content:this*. Searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads me to believe there is something wrong with the behaviour of our analyzer and tokenizers.

A little bit of background: we have had our own analyzer and tokenizer since before Solr 1.4, and it has been regularly updated. The analyzer works with Solr 4.6, which we have running in production (I also tested that search works with Solr 4.9.1). It is very similar to the tokenizers and analyzers located here:
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/
but with modifications to work with the latest Solr/Lucene code, e.g. overriding createComponents.

The schema of the field being analyzed is as follows

Looking at the release notes from Solr and Lucene
http://lucene.apache.org/solr/4_10_1/changes/Changes.html
http://lucene.apache.org/core/4_10_1/changes/Changes.html
nothing really sticks out, at least to me. Any help to get it working with 4.10 would be great.

Thanks,
Rishi.