Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Daniel Collins
Ah, I remember seeing this when we first started using Solr (which was 4.0,
because we needed SolrCloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field as a normal
string (like Bruno, we had tried to lowercase it, which failed).
We didn't really know Solr in those days and hadn't thought about
it since, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the same but have key values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all, then? You said in your example
that your client application always supplies pn and that it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing). Where does the case
insensitivity come in? Is that only for searching? If so, couldn't you add
a search field (called id) and update your app to search using that (or
make that your default search field; I guess it depends on whether your
calling app explicitly uses the pn field name in its searches)?





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina

Yes, thanks, it's clear to me now too.

Daniel, my pn is always uppercase, and I always index it in uppercase.
The problem (solved now after all your answers, thanks) was with querying:
if users searched in lowercase, Solr returned no results, which was not
acceptable.

But the problem is now solved: in my source files I renamed the pn field
to id, and in my schema I use a copyField destination named pn. It works
perfectly.

Thanks a lot !!!

---
This email contains no viruses or malware because avast! Antivirus
protection is active.
http://www.avast.com



Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-05 Thread Erick Erickson
Well, "working fine" may be a bit of an overstatement. That has never
been officially supported, so it just happened to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work, as routing happens early in the process, i.e. before the analysis
chain gets the token, so various copies of the doc will exist on
different shards.

Best,
Erick




Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Dear Solr users,

I have a problem with Solr 5.0 (which I did not have with Solr 3.6).

What kind of field type can I use for my uniqueKey field named "code" if I
want it to be case-insensitive?

On Solr 3.6, I defined a string_ci field type like this:

<fieldType name="string_ci" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

<field name="pn" type="string_ci" multiValued="false" indexed="true"
required="true" stored="true"/>

and it works fine:
- If I add a document with the same code, the existing doc is updated.
- If I search for a document in lower or upper case, the doc is found.


But in Solr 5.0, if I use this definition, then:
- I can search in lower/upper case; that's OK.
- BUT if I add a doc with the same code, the doc is added, not updated!?

I read that the problem could be that the field type is tokenized
instead of being a plain string.

If I change from string_ci to string, then:
- I lose the ability to search in lower/upper case,
- but updating the doc works fine.

So, could you help me find the right field type to:

- search case-insensitively
- update the old doc when I add a document with the same code

Thanks a lot !





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Chris Hostetter

: On SOLR3.6, I defined a string_ci field like this:
: 
: <fieldType name="string_ci" class="solr.TextField"
: sortMissingLast="true" omitNorms="true">
: <analyzer>
:   <tokenizer class="solr.KeywordTokenizerFactory"/>
:   <filter class="solr.LowerCaseFilterFactory"/>
: </analyzer>
: </fieldType>
:
: <field name="pn" type="string_ci" multiValued="false" indexed="true"
: required="true" stored="true"/>


I'm really surprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

The best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case-insensitive
copy of your uniqueKey for searching.
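
A minimal schema.xml sketch of that copyField approach might look like the following. The field name id_ci and the schema name are assumptions made for illustration; the string_ci type reuses the lowercasing analyzer from the quoted definition.

```xml
<schema name="example" version="1.5">
  <!-- Plain string type: the key is stored and compared exactly as supplied -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

  <!-- Whole value kept as one token, then lowercased (index and query time) -->
  <fieldType name="string_ci" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- uniqueKey field: no analysis, so updates overwrite on exact match -->
  <field name="id" type="string" indexed="true" stored="true"
         required="true" multiValued="false"/>

  <!-- Hypothetical search-only copy of the key (name is an assumption) -->
  <field name="id_ci" type="string_ci" indexed="true" stored="false"
         multiValued="false"/>

  <uniqueKey>id</uniqueKey>

  <!-- Every added document gets its id copied into the lowercased field -->
  <copyField source="id" dest="id_ci"/>
</schema>
```

With a layout like this, searches would target id_ci (for example q=id_ci:abc123), while adds and updates keep using the exact value in id.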

If, for some reason, you really want case-insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO"), then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor, to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using SolrCloud).
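
One way such a chain could be declared in solrconfig.xml is sketched below. The thread does not name a specific processor, so the choice of solr.StatelessScriptUpdateProcessorFactory is an assumption, as are the chain name and the script file lowercase-id.js (a hypothetical script, not shown, that would replace the id value with its lowercase form). The point of the sketch is the ordering: the lowercasing step sits before the distributed step.

```xml
<updateRequestProcessorChain name="lowercase-id">
  <!-- Assumed lowercasing step: runs on the node that receives the update -->
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <!-- Hypothetical script file placed in the core's conf directory -->
    <str name="script">lowercase-id.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Routing to the owning shard happens here, after the key was changed -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Final step: actually applies the add or delete to the index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A chain like this would be selected per request with the update.chain=lowercase-id parameter, or made the default by adding default="true" to the chain element.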



-Hoss
http://www.lucidworks.com/


Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Hello Chris,

Yes, I confirm that on my Solr 3.6 it has worked fine for several years,
and each doc added with the same code is updated, not added.


To be clearer, I receive docs with a field named pn, which is the
uniqueKey, and it is always in uppercase.


So I must define in my schema.xml:

<field name="id" type="string" multiValued="false" indexed="true"
required="true" stored="true"/>
<field name="pn" type="text_general" multiValued="true"
indexed="true" stored="false"/>

...
   <uniqueKey>id</uniqueKey>
...
  <copyField source="id" dest="pn"/>

But the application that uses Solr already exists, and it queries the pn
field, not id; I cannot change that.
Also, the docs I receive have no id field, just a pn field, and I
cannot change that either.


So there is a problem, no? I must import an id field and query a pn
field, but I only have a pn field at import...


