Re: Solr 5.0 - uniqueKey case insensitive ?

Daniel Collins Wed, 06 May 2015 00:45:10 -0700

Ah, I remember seeing this when we first started using Solr (which was 4.0,
because we needed SolrCloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field as a normal
string (like Bruno, we had tried to lowercase it, and that failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss's and Erick's explanations make perfect sense now!


Since shard routing is (basically) done on a hash of the raw unique key
value, two documents that are the "same" but carry the values "HELLO" and
"hello" might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all, then?  You said in your example
that your client application always supplies "pn" and that it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in? Is it only for searching?  If so, couldn't you add
a search field (called id) and update your app to search using that (or
make that your default search field; I guess it depends on whether your
calling app explicitly uses the pn field name in its searches)?
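
For what it's worth, a minimal sketch of that kind of arrangement might look
like the schema fragment below. It assumes pn stays the uniqueKey as a plain
string, and it reuses the string_ci type from Bruno's original post. The
field name pn_ci is hypothetical (any name would do), and searches would
have to target it, or it would have to be made the default search field.

```xml
<!-- Case-insensitive type, used for searching only (taken from Bruno's original post) -->
<fieldType name="string_ci" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The uniqueKey stays a plain, unanalyzed string, so routing and
     overwrites always see the raw value supplied by the client -->
<field name="pn" type="string" multiValued="false" indexed="true"
       required="true" stored="true"/>

<!-- Hypothetical search-only copy; the name pn_ci is an assumption -->
<field name="pn_ci" type="string_ci" multiValued="false" indexed="true"
       stored="false"/>

<uniqueKey>pn</uniqueKey>

<!-- Populate the case-insensitive copy from the key at index time -->
<copyField source="pn" dest="pn_ci"/>
```

With something like this, a query on pn_ci should match regardless of case,
because the same analyzer lowercases both the indexed value and the query
term, while adds and updates keep using the raw pn value.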


On 6 May 2015 at 01:55, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, "working fine" may be a bit of an overstatement. That has never
> been officially supported, so it "just happened" to work in 3.6.
>
> As Chris points out, if you're using SolrCloud then this will _not_
> work, as routing happens early in the process, i.e. before the analysis
> chain gets the token, so various copies of the doc will exist on
> different shards.
>
> Best,
> Erick
>
> On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina <bmann...@free.fr> wrote:
> > Hello Chris,
> >
> > Yes, I confirm that on my Solr 3.6 it has worked fine for several years,
> > and each doc added with the same code is updated, not added.
> >
> > To be clearer, I receive docs with a field named "pn"; it is the
> > uniqueKey, and it is always in uppercase.
> >
> > So I must define in my schema.xml:
> >
> >     <field name="id" type="string" multiValued="false" indexed="true"
> >            required="true" stored="true"/>
> >     <field name="pn" type="text_general" multiValued="true" indexed="true"
> >            stored="false"/>
> >     ...
> >     <uniqueKey>id</uniqueKey>
> >     ...
> >     <copyField source="id" dest="pn"/>
> >
> > But the application that uses Solr already exists, so it queries the pn
> > field, not id, and I cannot change that.
> > Also, the docs I receive have no id field, just a pn field, and I
> > cannot change that either.
> >
> > So there is a problem, no? I must import an id field and query a pn field,
> > but the import only gives me a pn field...
> >
> >
> >
> > On 05/05/2015 01:00, Chris Hostetter wrote:
> >>
> >> : On SOLR3.6, I defined a string_ci field like this:
> >> :
> >> : <fieldType name="string_ci" class="solr.TextField"
> >> : sortMissingLast="true" omitNorms="true">
> >> :     <analyzer>
> >> :       <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> :       <filter class="solr.LowerCaseFilterFactory"/>
> >> :     </analyzer>
> >> :     </fieldType>
> >> :
> >> : <field name="pn" type="string_ci" multiValued="false" indexed="true"
> >> : required="true" stored="true"/>
> >>
> >>
> >> I'm really surprised that field would have worked for you (reliably) as a
> >> uniqueKey field even in Solr 3.6.
> >>
> >> the best practice for something like what you describe has always (going
> >> back to Solr 1.x) been to use a copyField to create a case insensitive
> >> copy of your uniqueKey for searching.
> >>
> >> if, for some reason, you really want case insensitive *updates* (so a doc
> >> with id "foo" overwrites a doc with id "FOO"), then the only reliable way
> >> to make something like that work is to do the lowercasing in an
> >> UpdateProcessor to ensure it happens *before* the docs are distributed to
> >> the correct shard, and so the correct existing doc is overwritten (even
> >> if you aren't using solr cloud)
> >>
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
> >>
> >
> >
>
