Re: Strange search behaviour when upgrading to 4.10.3

2015-02-23 Thread Rishi Easwaran
Thanks Shawn.
I just ran the analysis on both 4.6 and 4.10. The only difference between the 
outputs seems to be that a positionLength value is set in 4.10. Does that mean 
anything?

Version 4.10

SF
text     raw_bytes               start  end  positionLength  type   position
message  [6d 65 73 73 61 67 65]  0      7    1               ALNUM  1

Version 4.6

SF
text     raw_bytes               type   start  end  position
message  [6d 65 73 73 61 67 65]  ALNUM  0      7    1
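
As a cross-check on the two outputs, here is a minimal Java sketch that diffs the token attributes copied from the analysis results. The class and method names are made up for illustration and are not part of Solr. As far as I know, positionLength defaults to 1 in Lucene's PositionLengthAttribute, so a value of 1 most likely just means the newer analysis page displays an attribute the older one left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Solr class: compares two tokens' attribute maps.
class AnalysisDiff {

    /** Returns the attribute names that are missing on one side or differ in value. */
    public static Set<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        Set<String> differing = new TreeSet<>();
        for (String name : names) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }

    /** Attributes of the "message" token as reported by the 4.6 analysis page. */
    public static Map<String, String> token46() {
        Map<String, String> t = new LinkedHashMap<>();
        t.put("text", "message");
        t.put("raw_bytes", "[6d 65 73 73 61 67 65]");
        t.put("type", "ALNUM");
        t.put("start", "0");
        t.put("end", "7");
        t.put("position", "1");
        return t;
    }

    /** Same token as reported by 4.10: identical values plus positionLength. */
    public static Map<String, String> token410() {
        Map<String, String> t = new LinkedHashMap<>(token46());
        t.put("positionLength", "1");
        return t;
    }

    public static void main(String[] args) {
        System.out.println(differingKeys(token46(), token410())); // prints [positionLength]
    }
}
```

Both versions report the same text, offsets, type and position for this token, so the comparison would need to be repeated with the full multi-word input to see where the outputs diverge.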







Thanks,
Rishi.


 



 


Re: Strange search behaviour when upgrading to 4.10.3

2015-02-20 Thread Shawn Heisey
On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
> Also, the tokenizer we use is very similar to the following.
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
>
>
> From the looks of it the text is being indexed as a single token and not 
> broken across whitespace. 

I can't claim to know how analyzer code works.  I did manage to see the
code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface.  On
that page, select the field or fieldType, set the "verbose" flag and
type the actual field contents into the "index" side of the page.  When
you click the Analyze Values button, it will show you what Solr does
with the input at index time.
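
The same index-time analysis can usually be requested over HTTP, which makes it easier to capture the output from both versions and compare them. Below is a minimal Java sketch that only builds the request URL. It assumes the field analysis handler is registered at /analysis/field (as in the stock example solrconfig.xml); the host, port, core name "collection1" and class name are placeholders, not details from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a URL for Solr's field analysis request handler.
class FieldAnalysisUrl {

    /**
     * baseUrl and core are assumptions about the local setup;
     * analysis.fieldname and analysis.fieldvalue are the handler's parameters.
     */
    public static String build(String baseUrl, String core, String fieldName, String fieldValue) {
        try {
            return baseUrl + "/" + core + "/analysis/field"
                    + "?analysis.fieldname=" + URLEncoder.encode(fieldName, "UTF-8")
                    + "&analysis.fieldvalue=" + URLEncoder.encode(fieldValue, "UTF-8")
                    + "&wt=json&indent=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "collection1",
                "content", "This is a test message"));
    }
}
```

Fetching that URL from the 4.6 and the 4.10 instance with the same field value should give two responses that can be compared side by side.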

Do you still have access to any machines (dev or otherwise) running the
old version with the custom component? If so, do the same things on the
analysis page for that version that you did on the new version, and see
whether it does something different.  If it does do something different,
then you will need to track down the problem in the code for your custom
analyzer.

Thanks,
Shawn



Re: Strange search behaviour when upgrading to 4.10.3

2015-02-20 Thread Rishi Easwaran
Hi Shawn,
Also, the tokenizer we use is very similar to the following.
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex


From the looks of it the text is being indexed as a single token and not broken 
across whitespace. 
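
To illustrate why a single indexed token would produce exactly the reported symptoms, here is a small self-contained Java sketch. It is not Lucene code: it models prefix queries (content:test*) as "some indexed term starts with the prefix" and the double-wildcard query (content:*message*) as "some indexed term contains the text". Lowercasing and the helper names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical model of term matching; not how Lucene is implemented internally.
class SingleTokenDemo {

    /** Terms as expected: lowercased and split on whitespace. */
    public static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Terms as suspected on 4.10: the whole value indexed as one lowercased token. */
    public static List<String> singleTerm(String text) {
        return Collections.singletonList(text.trim().toLowerCase(Locale.ROOT));
    }

    /** Models a prefix query such as content:test* */
    public static boolean prefixMatch(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Models a query such as content:*message* */
    public static boolean containsMatch(List<String> terms, String infix) {
        for (String term : terms) {
            if (term.contains(infix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "This is a test message";
        List<String> one = singleTerm(doc);
        List<String> many = whitespaceTerms(doc);
        System.out.println("single token, this*     : " + prefixMatch(one, "this"));      // true
        System.out.println("single token, test*     : " + prefixMatch(one, "test"));      // false
        System.out.println("single token, message*  : " + prefixMatch(one, "message"));   // false
        System.out.println("single token, *message* : " + containsMatch(one, "message")); // true
        System.out.println("split tokens, test*     : " + prefixMatch(many, "test"));     // true
    }
}
```

With one token per document, only the prefix of the whole value and the contains-style query match, which is the pattern seen on 4.10.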

Thanks,
Rishi. 

 

 



 


Re: Strange search behaviour when upgrading to 4.10.3

2015-02-20 Thread Rishi Easwaran
Yes, the analyzers and tokenizers were recompiled against the new version of 
solr/lucene. There were some errors, most of them related to switching to 
BytesRefBuilder, and I made that change. 

Can you try these links?
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java

 

 

 



 


Re: Strange search behaviour when upgrading to 4.10.3

2015-02-20 Thread Shawn Heisey
On 2/20/2015 9:37 AM, Rishi Easwaran wrote:
> We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search on 4.10.3, 
> search results are not being returned; it actually looks like only the first 
> word in a sentence is getting indexed. 
> Ex: inserting "This is a test message" only returns results when searching 
> for content:this*. Searching for content:test* or content:message* does not 
> work with 4.10. Only searching for content:*message* works. This leads me 
> to believe there is something wrong with the behaviour of our analyzer and 
> tokenizers.
>
>  required="false" multiValued="true" />
>
> Looking at the release notes from solr and lucene
> http://lucene.apache.org/solr/4_10_1/changes/Changes.html
> http://lucene.apache.org/core/4_10_1/changes/Changes.html
> Nothing really sticks out, at least to me.  Any help to get it working with 
> 4.10 would be great.

The links you provided lead to zero-byte files when I try them, so I
could not look deeper.

Have you recompiled your custom analysis components against the newer
versions of the Solr/Lucene libraries?  Anytime you're dealing with
custom components, you cannot assume that a component compiled to work
with one version of Solr will work with another version.  The internal
API does change, and there is less emphasis on avoiding API breaks in
minor Solr releases than there is with Lucene, because the vast majority
of Solr users are not writing their own code that uses the Solr API. 
Recompiling against the newer libraries may cause compiler errors that
reveal places in your code that require changes.

Thanks,
Shawn



Strange search behaviour when upgrading to 4.10.3

2015-02-20 Thread Rishi Easwaran
Hi,

We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search on 4.10.3, 
search results are not being returned; it actually looks like only the first word 
in a sentence is getting indexed. 
Ex: inserting "This is a test message" only returns results when searching for 
content:this*. Searching for content:test* or content:message* does not work 
with 4.10. Only searching for content:*message* works. This leads me to 
believe there is something wrong with the behaviour of our analyzer and tokenizers.

A little bit of background. 

We have had our own analyzer and tokenizer since before Solr 1.4, and it has been 
regularly updated. The analyzer works with Solr 4.6, which we have running in 
production (I also tested that search works with Solr 4.9.1).
It is very similar to the tokenizers and analyzers located here.
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java
ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/
But with modifications to work with the latest solr/lucene code, e.g. overriding 
createComponents.

The schema of the field being analyzed is as follows:

 

  




 
Looking at the release notes from solr and lucene
http://lucene.apache.org/solr/4_10_1/changes/Changes.html
http://lucene.apache.org/core/4_10_1/changes/Changes.html
Nothing really sticks out, at least to me.  Any help to get it working with 4.10 
would be great.

Thanks,
Rishi.