Re: case preserving for data but not for indexing

2008-08-07 Thread Ian Connor
Thanks for the feedback and corrections - the definition was wrong
indeed. So, I have settled on this definition:

[field type definition not preserved by the list archive]

and it seems to keep the names intact, as well as searching in a case
insensitive way. That is good to know about the meaning of "stored" - it
might serve to cut down my index sizes.

On Thu, Aug 7, 2008 at 6:58 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Wed, 6 Aug 2008 21:35:47 -0700 (PDT)
> Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
>> 
>> 
>>
>>
>> 2 Tokenizers?
>
> I wondered about that too, but didn't have the time to test...
> B
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Always listen to experts.  They'll tell you what can't be done, and why.
> Then do it."
>  Robert A. Heinlein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.
>



-- 
Regards,

Ian Connor


Re: case preserving for data but not for indexing

2008-08-07 Thread Norberto Meijome
On Wed, 6 Aug 2008 21:35:47 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> 
> 
> 
> 
> 2 Tokenizers?

I wondered about that too, but didn't have the time to test...
B

_
{Beto|Norberto|Numard} Meijome


Re: case preserving for data but not for indexing

2008-08-06 Thread Norberto Meijome
On Wed, 6 Aug 2008 20:21:28 -0400
"Ian Connor" <[EMAIL PROTECTED]> wrote:

> In order to preserve case for the data, but not for indexing, I have
> created two fields. One is type Author that is defined as:
> 
>  sortMissingLast="true" omitNorms="true">
>   
>   
>   
>   
>   
> 
> 
> and the other is just string:
> 
>  sortMissingLast="true" omitNorms="true"/>

Hi Ian,
The analyzers and filters apply to the data that is indexed (and to queries on
the field, of course), NOT to what is stored. IOW, you don't have to do anything
to have Solr return the data in your fields untouched.
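
A minimal sketch of what that means in schema.xml, assuming a field called
author and an analyzed type called author_text (both names are hypothetical,
not taken from Ian's schema):

```xml
<!-- Hypothetical single field: searched through its analyzer, returned as submitted. -->
<!-- indexed="true": the analyzed (e.g. lowercased) tokens go into the index.       -->
<!-- stored="true":  the original value is kept verbatim and returned in results.   -->
<field name="author" type="author_text" indexed="true" stored="true"
       omitNorms="true" multiValued="true"/>
```

With something like this, a value such as "McDonald" should come back with its
case intact, whatever the analyzer does to the indexed tokens.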

> this is used then for the author lists:
> omitNorms="true" multiValued="true"/>
> stored="true" omitNorms="true" multiValued="true"/>
> 
> Is there any other way than to have two fields like this? One for
> searching and one for displaying. 

Of course, you can do this but, for the reason I explained above, it isn't
needed. As a matter of fact, you will be indexing and storing both fields. If
you wanted to have one field for indexing/searching and the other for
retrieving, you'd have to set the indexed and stored properties of each field
accordingly.
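
If you did want to split them, it could look roughly like the sketch below.
The names author_search, author_display and author_text are made up for
illustration, and it assumes a "string" type (solr.StrField) is already defined
in the schema:

```xml
<!-- Hypothetical search-only field: analyzed and indexed, never returned. -->
<field name="author_search" type="author_text" indexed="true" stored="false"
       multiValued="true"/>

<!-- Hypothetical display-only field: kept verbatim, not searchable. -->
<field name="author_display" type="string" indexed="false" stored="true"
       multiValued="true"/>

<!-- Feed both fields from one input value so the client sends the name once. -->
<copyField source="author_display" dest="author_search"/>
```

Compared with a single indexed+stored field, this mostly buys you separate
names for searching and displaying, so it is only worth it if you need that.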

> People's names can be very case
> sensitive for display purposes (e.g. McDonald, DeBros) but I don't want
> people to miss results because they search for "lee" instead of "Lee".

Your definition of the fieldType Author:

>  sortMissingLast="true" omitNorms="true">
>   
>   
>   
>   
>   
> 

should do that: it tells Solr (Lucene?) to tokenize each piece of data in a
field of this type and then lowercase the tokens, both at indexing and at
query time.
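
For reference, a field type along those lines might be sketched as follows.
This is an assumption-laden illustration, not Ian's actual definition: the name
author_text and the class solr.TextField are guesses, and only the two
attributes and StandardTokenizerFactory appear in this thread. An analyzer chain
takes a single tokenizer, followed by any number of filters:

```xml
<!-- Hypothetical field type: tokenize, then lowercase, at index and query time. -->
<fieldType name="author_text" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <!-- One analyzer with no "type" attribute is used for both indexing and queries. -->
  <analyzer>
    <!-- Exactly one tokenizer splits the value into tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Filters then transform the tokens; this one lowercases them. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same chain runs on the query, a search for "lee" and the indexed
name "Lee" both end up as the token "lee".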

> 
> Also, can anyone see any danger in using StandardTokenizerFactory for
> people's names?

I don't know, give it a try :) You can use the analysis page in /admin/ to see
how your data would be treated both at index and query time...

good luck,
B

_
{Beto|Norberto|Numard} Meijome

"As far as the laws of mathematics refer to reality, they are not certain, and
as far as they are certain, they do not refer to reality." Albert Einstein



Re: case preserving for data but not for indexing

2008-08-06 Thread Otis Gospodnetic
Maybe I'm missing something (it's late) but why not just index+store?  The 
stored value will be the original and indexing can lowercase (as you set it), 
so it's case-insensitive.

Also, does this actually work for you:





2 Tokenizers?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
> From: Ian Connor <[EMAIL PROTECTED]>
> To: solr 
> Sent: Wednesday, August 6, 2008 8:21:28 PM
> Subject: case preserving for data but not for indexing
> 
> In order to preserve case for the data, but not for indexing, I have
> created two fields. One is type Author that is defined as:
> 
> 
> sortMissingLast="true" omitNorms="true">
> 
> 
> 
> 
> 
> 
> 
> and the other is just string:
> 
> 
> sortMissingLast="true" omitNorms="true"/>
> 
> this is used then for the author lists:
>   
> omitNorms="true" multiValued="true"/>
>   
> stored="true" omitNorms="true" multiValued="true"/>
> 
> Is there any other way than to have two fields like this? One for
> searching and one for displaying. People's names can be very case
> sensitive for display purposes (e.g. McDonald, DeBros) but I don't want
> people to miss results because they search for "lee" instead of "Lee".
> 
> Also, can anyone see any danger in using StandardTokenizerFactory for
> people's names?
> -- 
> Regards,
> 
> Ian Connor



case preserving for data but not for indexing

2008-08-06 Thread Ian Connor
In order to preserve case for the data, but not for indexing, I have
created two fields. One is type Author that is defined as:









and the other is just string:



this is used then for the author lists:
   
   

Is there any other way than to have two fields like this? One for
searching and one for displaying. People's names can be very case
sensitive for display purposes (e.g. McDonald, DeBros) but I don't want
people to miss results because they search for "lee" instead of "Lee".

Also, can anyone see any danger in using StandardTokenizerFactory for
people's names?
-- 
Regards,

Ian Connor