RE: Re: uniqueKey and custom fieldType

Markus Jelsma Sun, 15 Aug 2010 11:41:52 -0700

Using copyField to copy it into an analyzed field will do the trick.
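
A minimal sketch of that setup in schema.xml, not a tested configuration. The field name "id_search" and the type name "idParts" are hypothetical, and the analysis chain is only one plausible choice. Note that generateNumberParts is set to "1" here so that the numeric part is emitted as its own token.

```xml
<!-- Hypothetical analyzed type used only for searching parts of the id -->
<fieldType name="idParts" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits "sometexthere-1234567" at the hyphen into word and number parts -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The unique key stays a non-analyzed string, indexed as a single term -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- Hypothetical search-only copy of the id -->
<field name="id_search" type="idParts" indexed="true" stored="false"/>

<uniqueKey>id</uniqueKey>
<copyField source="id" dest="id_search"/>
```

With something like this, a query such as id_search:1234567 should match the document, while updates are still matched on the exact value of id.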
 
-----Original message-----
From: j <jta...@gmail.com>
Sent: Sun 15-08-2010 20:30
To: solr-user@lucene.apache.org; 
Subject: Re: uniqueKey and custom fieldType

Hi Erick, thanks, your explanation makes sense. But how, then, do I
make my unique field useful in terms of searching? If I have a unique
column id with value:

sometexthere-1234567

and want it to match the query '1234567', I need to use an analyzer to
split up the parts around the hyphen/dash. I guess I could make a copy
of that field in another field which gets analyzed?

Thanks for any advice.

The short answer is that unique keys should be a single
term. String types are guaranteed to be single terms, since they
aren't analyzed. Your SplitUpStuff type *does* analyze
terms, and can make multiple tokens out of single strings
via WordDelimiterFilterFactory.

A common error when thinking about the "string" type is
not understanding that it is NOT analyzed. It's indexed as
a single term. So when you define UniqueKey of type string,
it behaves as you expect. That is, documents are updated if
the ID field matches exactly: case, spaces, order and all.

If you introduce your "SplitUpStuff" type as UniqueKey, well,
I don't even know what behavior I'd expect. And whatever
behavior I happened to observe would not be guaranteed to
be the behavior of the next release.

Consider what you're asking for and you can see why you
don't want to analyze your uniqueKey field. Consider
the following simple text type (where each word is a term).
You have two values from two different docs:
doc1: "this is a nice unique key"
doc2: "My Keys are Unique and Nice"

It's quite possible, with combinations of analyzers and stemmers,
to index the exact same tokens, namely "nice", "unique" and "key"
for each document. Are these equivalent? Does order count?
Capitalization? It'd just be a nightmare to try to
explain/predict/implement.
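
As an illustration only, here is a sketch of an analysis chain that could collapse both values above to the same set of terms. The type name "simpleText" is hypothetical, and the sketch assumes a stopwords.txt that lists "this", "is", "a", "my", "are" and "and". The exact stemmed forms depend on the stemmer.

```xml
<!-- Hypothetical analyzed type; NOT suitable for a uniqueKey field -->
<fieldType name="simpleText" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- assumption: stopwords.txt contains this, is, a, my, are, and -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- "Nice" and "nice", "Unique" and "unique" become identical -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming reduces "keys" and "key" to the same term -->
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

Under those assumptions, doc1 and doc2 would index the same three terms in a different order, which is exactly the ambiguity described above.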

Likely whatever behavior you do get is just whatever falls out of the
code. I'm not even sure any attempt is made to enforce uniqueness
on an analyzed field.

HTH
Erick

On Sun, Aug 15, 2010 at 11:59 AM, j <jta...@gmail.com> wrote:

> I guess another way to pose the question is- what could cause
> <uniqueKey>id</uniqueKey>   to no longer be respected?
>
>
> The last change I made since I noticed the problem of non-unique docs
> was changing field "title" from "string" to "SplitUpStuff". But I
> don't understand how that could affect the uniqueness of a different
> field called "id".
>
> <fieldType name="splitUpStuff" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="0" c
>         <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                 enablePositionIncrements="false"
>                />
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
> </fieldType>
>
>
>
>
>
>
> In order to make even a guess, we'd have to see your new
> field type. Particularly its field definitions and the analysis
> chain...
>
> Best
> Erick
>
> On Fri, Aug 13, 2010 at 5:16 PM, j <jta...@gmail.com> wrote:
>
> > Does fieldType have any effect on the thing that I specify should be
> > unique?
> >
> > uniqueKey had been working for me up until recently. I changed the
> > field that is unique from type "string" to a fieldType that I have
> > defined. Now when I do an update I get a newly created document (so
> > that I have duplicates).
> >
> > Has anyone else had this problem before?
> >
>
