RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
Ok, I think I'm getting it. At index/query time the analyzers fire and "do 
stuff". Ex: "the sheep jumped over the MOON" could be tokenized on spaces, 
lowercased, etc., and that is what gets stored in the inverted index, something 
you probably can't really see.

In Solr, the string above is what you see in its original form. When you search 
for "sheep", that would come back because the inverted index has it stored in 
that form, as separate words split on spaces, right? Further, if I searched for 
moon (lowercase), it would be found because the analyzer also stores the 
lowercase form in the inverted index, right?
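
A minimal Python sketch of the idea described above, assuming an analyzer that only splits on whitespace and lowercases each token. This is an illustration, not Solr's actual implementation; the function names are hypothetical.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical analyzer: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {1: "the sheep jumped over the MOON"}
index = build_inverted_index(docs)


def search(query):
    """Analyze the query the same way, then look up each resulting term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


print(search("sheep"))
print(search("moon"))
print(search("MOON"))
```

Because the same analysis runs at index time and at query time, all three queries find document 1, even though the inverted index only ever holds the term "moon" in lowercase.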

I'm getting closer I think. Ok so if I want to physically lowercase the URL and 
store it that way, I need to do it before it gets to the Index as you stated. 
Ok got it, Thanks!

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
  Public Production Support
  Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org 


-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, March 14, 2019 10:57 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldTypes and LowerCase

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you recognize the sender and know the content 
is safe.


On 3/14/2019 8:49 AM, Moyer, Brett wrote:
> Thanks Shawn. "Analysis only happens to indexed data": that being the case, 
> when the data gets indexed, wouldn't the analyzer kick off and lowercase the 
> URL? The analyzer I have defined is not set for index or query specifically, 
> so as I understand it, it will fire during both events. If that is the case, 
> I still don't get why the lowercase doesn't fire when the data is being indexed.

It does happen for both index and query.

It sounds like you are assuming that when index analysis happens,
what you get back in search results will be affected by that analysis.

What you get back in search results is stored data -- that is never
affected by analysis.

What gets affected by analysis is indexed data -- the data that is
searched by queries.  Not the data that comes back in search results.

Thanks,
Shawn
*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA
*


Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey

On 3/14/2019 8:49 AM, Moyer, Brett wrote:

> Thanks Shawn. "Analysis only happens to indexed data": that being the case, 
> when the data gets indexed, wouldn't the analyzer kick off and lowercase the 
> URL? The analyzer I have defined is not set for index or query specifically, 
> so as I understand it, it will fire during both events. If that is the case, 
> I still don't get why the lowercase doesn't fire when the data is being indexed.


It does happen for both index and query.

It sounds like you are assuming that when index analysis happens, 
what you get back in search results will be affected by that analysis.


What you get back in search results is stored data -- that is never 
affected by analysis.


What gets affected by analysis is indexed data -- the data that is 
searched by queries.  Not the data that comes back in search results.
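
The stored/indexed split can be sketched in a few lines of Python. This is a toy model, not Solr code: the class name is hypothetical, and the analyzer is assumed to lowercase the whole value as a single term, since the field type definition from the original question did not survive in the archive.

```python
class TinyIndex:
    """Toy index that keeps stored values and indexed terms separately."""

    def __init__(self):
        self.stored = {}   # doc id -> original value, never analyzed
        self.indexed = {}  # analyzed term -> set of doc ids

    @staticmethod
    def analyze(value):
        # Assumed analysis: treat the whole value as one lowercased term.
        return value.lower()

    def add(self, doc_id, value):
        self.stored[doc_id] = value
        self.indexed.setdefault(self.analyze(value), set()).add(doc_id)

    def search(self, query):
        # Matching uses the analyzed term; results come from stored values.
        hits = self.indexed.get(self.analyze(query), set())
        return [self.stored[doc_id] for doc_id in sorted(hits)]


idx = TinyIndex()
idx.add(1, "http://connect.rightprospectus.com/RSVP/TADF")
print(idx.search("http://connect.rightprospectus.com/rsvp/tadf"))
```

The lowercase query matches because the indexed term is lowercase, but the value returned is the stored original with `/RSVP/TADF` still in upper case.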


Thanks,
Shawn


RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
Thanks Shawn. "Analysis only happens to indexed data": that being the case, 
when the data gets indexed, wouldn't the analyzer kick off and lowercase the 
URL? The analyzer I have defined is not set for index or query specifically, 
so as I understand it, it will fire during both events. If that is the case, 
I still don't get why the lowercase doesn't fire when the data is being indexed.

Brett Moyer

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, March 14, 2019 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldTypes and LowerCase


On 3/14/2019 7:47 AM, Moyer, Brett wrote:
> I'm using the below FieldType/Field but when I index my documents, the URL is 
> not being lower case. Any ideas? Do I have the below wrong?
>
> Example: http://connect.rightprospectus.com/RSVP/TADF
> Expect: http://connect.rightprospectus.com/rsvp/tadf
>
> [fieldType and field XML definitions lost in the archive; only the
> fragments omitNorms="true" and stored="true" survive]

Analysis only happens to indexed data.

The data that you get back from Solr (stored data) is *always* EXACTLY
what Solr receives for indexing, before analysis.

You'll need to lowercase the data before it reaches analysis.  This is
how it is designed to work ... that will not be changing.

If you were to configure an Update Processor chain that did the
lowercasing, that would affect stored data as well as indexed data.

Thanks,
Shawn



Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey

On 3/14/2019 7:47 AM, Moyer, Brett wrote:

> I'm using the below FieldType/Field but when I index my documents, the URL is 
> not being lower case. Any ideas? Do I have the below wrong?
>
> Example: http://connect.rightprospectus.com/RSVP/TADF
> Expect: http://connect.rightprospectus.com/rsvp/tadf
>
> [fieldType and field XML definitions lost in the archive]


Analysis only happens to indexed data.

The data that you get back from Solr (stored data) is *always* EXACTLY 
what Solr receives for indexing, before analysis.


You'll need to lowercase the data before it reaches analysis.  This is 
how it is designed to work ... that will not be changing.


If you were to configure an Update Processor chain that did the 
lowercasing, that would affect stored data as well as indexed data.
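
One way to lowercase the value before it reaches Solr at all is to do it in the indexing client. The Python sketch below is that client-side alternative, not an Update Processor chain; the helper name and the `url` field name are assumptions for illustration.

```python
def lowercase_field(doc, field):
    """Return a copy of doc with the given string field lowercased, if present."""
    updated = dict(doc)
    if isinstance(updated.get(field), str):
        updated[field] = updated[field].lower()
    return updated


# Hypothetical document; "url" is an assumed field name.
doc = {"id": "1", "url": "http://connect.rightprospectus.com/RSVP/TADF"}
prepared = lowercase_field(doc, "url")
print(prepared["url"])
```

Sending `prepared` instead of `doc` means the value Solr receives is already lowercase, so both the stored copy and the indexed terms are lowercase.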


Thanks,
Shawn