Re: Different ids for the same document in different replicas.

2014-11-13 Thread Garth Grimm
OK.  So it sounds like doctorURL is a good key, but you don’t like the special
characters.  I’ve used MD5 hashes of URLs before as a way to convert unique
URLs into unique alphanumeric strings in a repeatable way.  I think most
programming languages have libraries for doing that as you feed the data to
Solr (Java certainly does).  Other hashing mechanisms would work too, but a
hash is one-way: if you want to be able to programmatically convert from the
doctorURL to the key string and back again, use a reversible encoding instead.
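A minimal Python sketch of the two options, assuming the key is computed on the client before the document is sent to Solr (the function names are hypothetical). `hashed_key` uses MD5 and cannot be turned back into the URL. `reversible_key` uses URL-safe Base64 with the padding stripped, so the key contains only letters, digits, `-` and `_`, and `url_from_key` recovers the original URL.

```python
import base64
import hashlib


def hashed_key(url: str) -> str:
    """One-way: the same URL always yields the same 32-character hex string."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def reversible_key(url: str) -> str:
    """Reversible: URL-safe Base64 with the '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def url_from_key(key: str) -> str:
    """Inverse of reversible_key: restore the padding, then decode."""
    padded = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    url = "http://example.com/doctors?name=Jane+Doe&city=Austin"  # made-up sample URL
    print(hashed_key(url))
    print(reversible_key(url))
    print(url_from_key(reversible_key(url)) == url)  # True
```

MD5 keys are always 32 hex characters; Base64 keys grow with the URL (roughly 4/3 of its UTF-8 length).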

Anyway, the point is that you’d have a repeatable unique key that is derived
directly from the data you’re storing, not a random ID value that will be
different every time you feed the same thing in.
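To make the difference concrete, here is a toy model (plain Python, not Solr code) in which a dict keyed by the uniqueKey stands in for the index, so adding a document with an existing key replaces it. The `doctorUrl` field and the helper names are hypothetical. Feeding the same two documents twice leaves two entries with a derived key and four with a random UUID.

```python
import hashlib
import uuid


def derived_id(doc: dict) -> str:
    # Repeatable: computed from the document's own data.
    return hashlib.md5(doc["doctorUrl"].encode("utf-8")).hexdigest()


def random_id(doc: dict) -> str:
    # Not repeatable: a fresh value on every feed.
    return str(uuid.uuid4())


def feed(index: dict, docs: list, make_id) -> None:
    # Toy index: adding a document whose uniqueKey already exists replaces
    # the old one, mirroring Solr's overwrite-by-uniqueKey behavior.
    for doc in docs:
        index[make_id(doc)] = doc


if __name__ == "__main__":
    docs = [{"doctorUrl": "http://example.com/dr/1"},
            {"doctorUrl": "http://example.com/dr/2"}]
    for make_id in (derived_id, random_id):
        index = {}
        feed(index, docs, make_id)
        feed(index, docs, make_id)  # re-feed the same data
        print(make_id.__name__, len(index))
    # derived_id 2
    # random_id 4
```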

BTW, you can certainly use a custom field type to do the hashing work, but I’d
suggest you do the hashing before feeding the data to SolrCloud.  If you do it outside
of SolrCloud, then SolrCloud can use it for routing to the correct shard.  If 
you try to do it solely in a field type, the field type output won’t be 
available until the indexing is actually occurring, which is too late for 
routing purposes.  And that means you can’t ensure that subsequent re-feeds of 
the same thing will overwrite the old values since you can’t make sure they get 
routed to the same shard.
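The routing argument can be sketched as follows. This is a simplified, hypothetical stand-in for SolrCloud's default router: it hashes the id into a 32-bit space split into one range per shard. Solr's compositeId router uses MurmurHash3 rather than the CRC32 used here, so only the principle carries over: the shard is a pure function of the id, which therefore has to exist before the document is routed.

```python
import hashlib
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    # Hash the uniqueKey into 0 .. 2**32 - 1 and pick the range it falls in.
    # CRC32 is a stand-in; Solr's compositeId router uses MurmurHash3.
    h = zlib.crc32(doc_id.encode("utf-8"))
    range_size = 2**32 // num_shards
    return min(h // range_size, num_shards - 1)


def route(doc: dict, num_shards: int) -> int:
    # The id must already be on the document when it arrives; a value that
    # only appears later (e.g. inside a field type during indexing) cannot
    # influence which shard the document is sent to.
    if "id" not in doc:
        raise ValueError("no uniqueKey at routing time")
    return shard_for(doc["id"], num_shards)


if __name__ == "__main__":
    url = "http://example.com/dr/1"
    doc = {"id": hashlib.md5(url.encode("utf-8")).hexdigest(), "doctorUrl": url}
    print(route(doc, 4) == route(dict(doc), 4))  # True: a re-feed lands on the same shard
```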

 On Nov 12, 2014, at 7:50 PM, Meraj A. Khan mera...@gmail.com wrote:
 
 Sorry, it's actually doctorUrl. I don't want to use doctorUrl as a lookup
 mechanism because URLs can have special characters that can cause issues
 with Solr lookups.
 
 I guess I should rephrase my question: how do I auto-generate the unique
 keys in the id field when using SolrCloud?

Re: Different ids for the same document in different replicas.

2014-11-13 Thread Meraj A. Khan
Thanks. I also noticed that the mandatory _version_ field is also
uniquely generated for every document in the collection. Can this be
used as a unique value instead of generating the hashcode for the
urlField?

I want to avoid creating a custom unique field if the _version_ field,
which is mandatory in schema.xml, already does that for me.



On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm
garthgr...@averyranchconsulting.com wrote:
 OK.  So it sounds like doctorURL is a good key, but you don’t like the
 special characters.  I’ve used MD5 hashes of URLs before as a way to convert
 unique URLs into unique alphanumeric strings in a repeatable way.  I think
 most programming languages have libraries for doing that as you feed the
 data to Solr (Java certainly does).  Other hashing mechanisms would work too,
 but a hash is one-way: if you want to be able to programmatically convert
 from the doctorURL to the key string and back again, use a reversible
 encoding instead.

 Anyway, the point is that you’d have a repeatable unique key that is
 derived directly from the data you’re storing, not a random ID value that
 will be different every time you feed the same thing in.

 BTW, you can certainly use a custom field type to do the hashing work, but
 I’d suggest you do the hashing before feeding the data to SolrCloud.  If you do it
 outside of SolrCloud, then SolrCloud can use it for routing to the correct 
 shard.  If you try to do it solely in a field type, the field type output 
 won’t be available until the indexing is actually occurring, which is too 
 late for routing purposes.  And that means you can’t ensure that subsequent 
 re-feeds of the same thing will overwrite the old values since you can’t make 
 sure they get routed to the same shard.

Re: Different ids for the same document in different replicas.

2014-11-13 Thread Erick Erickson
bq:  can this be used as a unique value instead of generating the
hashcode for the urlField

Don't do this. The _version_ field is used internally for optimistic
locking etc. I'd be _very_ cautious about co-opting this for anything else.

Best,
Erick
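One way to see why `_version_` can't stand in for a key: Solr assigns it a new value every time a document is written, and it is used for optimistic concurrency, where an update carrying a `_version_` greater than 1 is rejected unless it matches the stored value. The class below is a toy model of just those two behaviors, not Solr's implementation; the counter is a hypothetical stand-in for Solr's version clock, and the other `_version_` values Solr treats specially are not modelled.

```python
import itertools


class VersionConflict(Exception):
    """Raised when an update's _version_ does not match the stored one."""


class ToyIndex:
    """Toy model of two _version_ behaviors; not Solr's implementation."""

    def __init__(self):
        self._docs = {}
        # Stand-in for Solr's version clock: strictly increasing numbers.
        self._clock = itertools.count(1000)

    def add(self, doc: dict) -> int:
        key = doc["id"]
        expected = doc.get("_version_", 0)
        # Optimistic-concurrency check: a _version_ > 1 must match the stored value.
        if expected > 1:
            current = self._docs.get(key, {}).get("_version_")
            if current != expected:
                raise VersionConflict(f"{key}: expected {expected}, found {current}")
        # Every successful write gets a brand-new _version_.
        stored = dict(doc, _version_=next(self._clock))
        self._docs[key] = stored
        return stored["_version_"]


if __name__ == "__main__":
    index = ToyIndex()
    v1 = index.add({"id": "doc-1", "name": "Dr. A"})
    v2 = index.add({"id": "doc-1", "name": "Dr. A"})  # the same document, fed again
    print(v1, v2)  # 1000 1001: the value is not stable across re-feeds
```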

On Thu, Nov 13, 2014 at 8:14 AM, Meraj A. Khan mera...@gmail.com wrote:
 Thanks. I also noticed that the mandatory _version_ field is also
 uniquely generated for every document in the collection. Can this be
 used as a unique value instead of generating the hashcode for the
 urlField?

 I want to avoid creating a custom unique field if the _version_ field,
 which is mandatory in schema.xml, already does that for me.



Re: Different ids for the same document in different replicas.

2014-11-12 Thread S.L
Thanks.

So the issue here is that I already have <uniqueKey>doctorId</uniqueKey>
defined in my schema.xml.

If, along with that, I also want the id field to be automatically
generated for each document, do I have to declare it as a uniqueKey as
well? I just tried the following settings without the uniqueKey for
id and it's only generating blank ids for me.

*schema.xml*

<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false" />

*solrconfig.xml*

  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
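The thread doesn't pin down why this produced blank ids. One thing worth checking, offered as a possibility rather than a diagnosis: a named chain such as `uuid` only runs for requests that select it, either with the `update.chain` request parameter or by declaring the chain with `default="true"` (or naming it in the update handler's defaults). A minimal Python sketch that builds such a request; the host, collection name and helper are hypothetical, and the body would need to be POSTed with `Content-Type: application/json`.

```python
import json
from urllib.parse import urlencode


def build_update_request(base_url: str, collection: str, docs: list, chain: str = "uuid"):
    """Return (url, body) for a JSON update that selects a processor chain.

    update.chain names the updateRequestProcessorChain to run; if it is
    omitted and the chain is not the default, the UUID processor never runs.
    """
    params = urlencode({"update.chain": chain, "commit": "true"})
    url = f"{base_url.rstrip('/')}/{collection}/update?{params}"
    body = json.dumps(docs)
    return url, body


if __name__ == "__main__":
    # Hypothetical host and collection; the documents deliberately omit "id"
    # so that UUIDUpdateProcessorFactory in the chain can fill it in.
    url, body = build_update_request(
        "http://localhost:8983/solr", "doctors",
        [{"doctorId": "123", "doctorUrl": "http://example.com/dr/123"}],
    )
    print(url)
    # http://localhost:8983/solr/doctors/update?update.chain=uuid&commit=true
    print(body)
```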


On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm 
garthgr...@averyranchconsulting.com wrote:

 Looking a little deeper, I did find this about UUIDField


 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

 “NOTE: Configuring a UUIDField instance with a default value of NEW is
 not advisable for most users when using SolrCloud (and not possible if the
 UUID value is configured as the unique key field) since the result will be
 that each replica of each document will get a unique UUID value. Using
 UUIDUpdateProcessorFactory
 (http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html)
 to generate UUID values when documents are added is recommended instead.”

 That might describe the behavior you saw.  And the use of
 UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well
 here:


 http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

 Though I’ve not actually tried that process before.
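A toy model of the difference the note describes, using six replicas as in the original report. It is plain Python, not Solr code: with a field default each replica fills in the missing id independently, while an update processor that runs before the document is distributed fills it in once and forwards the completed document. That UUIDUpdateProcessorFactory runs before distribution when it sits ahead of the distributed update step is an assumption here, based on how SolrCloud update chains are generally described.

```python
import uuid


def replicate_with_field_default(doc: dict, n_replicas: int) -> list:
    # UUIDField default="NEW": each replica indexes the document on its own,
    # so each one generates its own value for the missing id.
    return [dict(doc, id=doc.get("id") or str(uuid.uuid4())) for _ in range(n_replicas)]


def replicate_with_update_processor(doc: dict, n_replicas: int) -> list:
    # Update processor ahead of distribution: the id is generated once, on
    # the node that receives the request, and that completed document is
    # what every replica indexes.
    completed = dict(doc, id=doc.get("id") or str(uuid.uuid4()))
    return [dict(completed) for _ in range(n_replicas)]


if __name__ == "__main__":
    doc = {"doctorId": "123"}
    print(len({r["id"] for r in replicate_with_field_default(doc, 6)}))     # 6
    print(len({r["id"] for r in replicate_with_update_processor(doc, 6)}))  # 1
```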

 On Nov 11, 2014, at 7:39 PM, Garth Grimm
 garthgr...@averyranchconsulting.com wrote:

 “uuid” isn’t an out-of-the-box field type that I’m familiar with.

 Generally, I’d stick with the out-of-the-box advice in the schema.xml
 file, which includes things like…

   <!-- Only remove the id field if you have a very good reason to. While
        not strictly required, it is highly recommended. A <uniqueKey> is
        present in almost all Solr installations. See the <uniqueKey>
        declaration below where <uniqueKey> is set to id.
   -->
   <field name="id" type="string" indexed="true" stored="true"
          required="true" multiValued="false" />

 and…

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a
      required field
   -->
 <uniqueKey>id</uniqueKey>

 If you’re creating some key/value pair with uuid as the key as you feed
 documents in, and you know that the uuid values you’re creating are unique,
 just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
 change the key name you send in from ‘uuid’ to ‘id’.
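A small client-side sketch of the second suggestion, renaming the `uuid` key to `id` before sending. `prepare` is a hypothetical helper. If no value is present it generates one on the client, which at least gives every replica the same id; a random value will still change on every re-feed.

```python
import uuid


def prepare(doc: dict) -> dict:
    """Send the client-generated UUID under the schema's uniqueKey name, "id"."""
    out = dict(doc)
    if "uuid" in out:
        out["id"] = out.pop("uuid")
    # Generated once, on the client, so all replicas receive the same value.
    out.setdefault("id", str(uuid.uuid4()))
    return out


if __name__ == "__main__":
    print(prepare({"uuid": "3f2c0a4e-0000-4000-8000-000000000000", "name": "Dr. A"}))
```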

 On Nov 11, 2014, at 7:18 PM, S.L
 simpleliving...@gmail.com wrote:

 Hi All,

 I am seeing interesting behavior on the replicas. I have a single
 shard and 6 replicas on SolrCloud 4.10.1. I only have a small
 number of documents (~375) that are replicated across the six replicas.

 The interesting thing is that the same document has a different id in
 each one of those replicas.

 This is causing the fq(id:xyz) type queries to fail, depending on
 which replica the query goes to.

 I have specified the id field in the following manner in schema.xml.
 Is this the right way to specify an auto-generated id in SolrCloud?

   <field name="id" type="uuid" indexed="true" stored="true"
          required="true" multiValued="false" />


 Thanks.





Re: Different ids for the same document in different replicas.

2014-11-12 Thread S.L
I just tried adding <uniqueKey>id</uniqueKey> while keeping the id field
as type="string", and only blank ids are being generated. It looks like the
id is auto-generated only if the id field is set to type uuid, but in the
case of SolrCloud this id will be unique per replica.

Is there a way to generate a unique id in SolrCloud without using the
uuid type, or at least without getting a different id on each replica?

The uuid type in question is defined as:

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />


On Wed, Nov 12, 2014 at 6:20 PM, S.L simpleliving...@gmail.com wrote:

 Thanks.

 So the issue here is that I already have <uniqueKey>doctorId</uniqueKey>
 defined in my schema.xml.

 If, along with that, I also want the id field to be automatically
 generated for each document, do I have to declare it as a uniqueKey as
 well? I just tried the following settings without the uniqueKey for
 id and it's only generating blank ids for me.

 *schema.xml*

 <field name="id" type="string" indexed="true" stored="true"
        required="true" multiValued="false" />

 *solrconfig.xml*

   <updateRequestProcessorChain name="uuid">
     <processor class="solr.UUIDUpdateProcessorFactory">
       <str name="fieldName">id</str>
     </processor>
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>



Re: Different ids for the same document in different replicas.

2014-11-12 Thread Garth Grimm
You mention you already have a unique key identified for the data you’re
storing in Solr:

 <uniqueKey>doctorId</uniqueKey>

If that’s the field you’re using to uniquely identify each thing you’re storing
in the Solr index, why do you want to have an id field that is populated with
some random value?  You’ll be using the doctorId field as the key, and the id
field will have no real meaning in your data model.

If doctorId actually isn’t unique to each item you plan on storing in Solr, is 
there any other field that is?  If so, use that field as your unique key.

Remember, uniqueKeys are usually used for routing documents to shards in
SolrCloud, and are used to ensure that later updates of the same “thing”
overwrite the old one, rather than generating multiple copies.  So the keys
really should be something derived from the data you’re storing.  I’m not sure
I understand why you would want to have the key randomly generated.
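To answer the “is there any other field that is unique?” question against real data, a quick check over a sample of records can help. A minimal sketch; `unique_key_candidates` and the sample records are made up for illustration, and values are compared as strings, as a string-typed key field would compare them.

```python
def unique_key_candidates(records: list) -> list:
    """Return the fields present in every record whose values are all distinct."""
    if not records:
        return []
    # Fields that appear in every record.
    common = set(records[0]).intersection(*records[1:])
    candidates = []
    for field in sorted(common):
        # Compare as strings, as a string-typed uniqueKey would.
        values = {str(r[field]) for r in records}
        if len(values) == len(records):
            candidates.append(field)
    return candidates


if __name__ == "__main__":
    sample = [  # made-up records
        {"doctorId": "1", "doctorUrl": "http://example.com/dr/a", "city": "Austin"},
        {"doctorId": "1", "doctorUrl": "http://example.com/dr/b", "city": "Austin"},
        {"doctorId": "2", "doctorUrl": "http://example.com/dr/c", "city": "Dallas"},
    ]
    print(unique_key_candidates(sample))  # ['doctorUrl']
```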

 On Nov 12, 2014, at 6:39 PM, S.L simpleliving...@gmail.com wrote:
 
 I just tried adding <uniqueKey>id</uniqueKey> while keeping the id field
 as type="string", and only blank ids are being generated. It looks like the
 id is auto-generated only if the id field is set to type uuid, but in the
 case of SolrCloud this id will be unique per replica.
 
 Is there a way to generate a unique id in SolrCloud without using the
 uuid type, or at least without getting a different id on each replica?
 
 The uuid type in question is defined as:
 
 <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
 
 

Re: Different ids for the same document in different replicas.

2014-11-12 Thread Meraj A. Khan
Sorry, it's actually doctorUrl. I don't want to use doctorUrl as a lookup
mechanism because URLs can have special characters that can cause issues
with Solr lookups.

I guess I should rephrase my question: how do I auto-generate the unique
keys in the id field when using SolrCloud?
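For comparison, URLs can still be queried if their special characters are escaped. SolrJ provides `ClientUtils.escapeQueryChars` for this in Java; the Python function below is modelled on it, with its character set assumed to match, so check it against the SolrJ version you use. Solr's term query parser (for example `{!term f=doctorUrl}http://example.com/dr/1`) is another way to avoid query-syntax escaping entirely.

```python
# Characters with special meaning in Lucene/Solr query syntax (assumed to
# match SolrJ's ClientUtils.escapeQueryChars); whitespace is escaped as well.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(s: str) -> str:
    """Backslash-escape query syntax characters and whitespace."""
    return "".join("\\" + c if c in SPECIAL or c.isspace() else c for c in s)


if __name__ == "__main__":
    url = "http://a.com/x?y=1"
    print("doctorUrl:" + escape_query_chars(url))
    # doctorUrl:http\:\/\/a.com\/x\?y=1
```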
 On Nov 12, 2014 7:28 PM, Garth Grimm garthgr...@averyranchconsulting.com
wrote:

 You mention you already have a unique key identified for the data you’re
 storing in Solr:

  <uniqueKey>doctorId</uniqueKey>

 If that’s the field you’re using to uniquely identify each thing you’re
 storing in the Solr index, why do you want to have an id field that is
 populated with some random value?  You’ll be using the doctorId field as
 the key, and the id field will have no real meaning in your data model.

 If doctorId actually isn’t unique to each item you plan on storing in
 Solr, is there any other field that is?  If so, use that field as your
 unique key.

 Remember, uniqueKeys are usually used for routing documents to shards
 in SolrCloud, and are used to ensure that later updates of the same “thing”
 overwrite the old one, rather than generating multiple copies.  So the keys
 really should be something derived from the data you’re storing.  I’m not
 sure I understand why you would want to have the key randomly generated.


Different ids for the same document in different replicas.

2014-11-11 Thread S.L
Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard and 6 replicas on SolrCloud 4.10.1. I only have a small
number of documents (~375) that are replicated across the six replicas.

The interesting thing is that the same document has a different id in
each one of those replicas.

This is causing fq=id:xyz type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml.
Is it the right way to specify an auto-generated id in SolrCloud?

<field name="id" type="uuid" indexed="true" stored="true"
required="true" multiValued="false" />


Thanks.


Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
“uuid” isn’t an out-of-the-box field type that I’m familiar with.

Generally, I’d stick with the out-of-the-box advice of the schema.xml file, 
which includes things like:

   <!-- Only remove the "id" field if you have a very good reason to. While not
        strictly required, it is highly recommended. A <uniqueKey> is present in
        almost all Solr installations. See the <uniqueKey> declaration below where
        <uniqueKey> is set to "id".
   -->
   <field name="id" type="string" indexed="true" stored="true" required="true"
          multiValued="false" />

and:

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required
      field
   -->
 <uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.
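If the ids are created on the client as the documents are fed in, as described above, every replica receives the same value because it is generated exactly once, before SolrCloud sees the document. Below is a minimal Python sketch. It assumes the schema's uniqueKey is `id` and that documents are plain dicts about to be serialized to JSON. The helper name `with_client_side_ids` is hypothetical.

```python
import uuid


def with_client_side_ids(docs, key="id"):
    """Return copies of docs, assigning a random UUID under `key` where it is missing.

    The value is created once, on the client, so the leader and every replica
    index exactly the same id for a given document.
    """
    result = []
    for doc in docs:
        doc = dict(doc)  # copy so the caller's dicts are left untouched
        if not doc.get(key):
            doc[key] = str(uuid.uuid4())
        result.append(doc)
    return result


batch = with_client_side_ids([{"doctorId": "d-1"}, {"doctorId": "d-2", "id": "fixed-id"}])
# batch[0]["id"] is a fresh UUID string; batch[1]["id"] stays "fixed-id".
```

A random UUID differs on every re-feed of the same source record. Re-feeding therefore adds a second copy instead of overwriting the first, so a value derived from the record itself is preferable when documents are re-sent.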

On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

 Hi All,
 
 I am seeing interesting behavior on the replicas. I have a single
 shard and 6 replicas on SolrCloud 4.10.1. I only have a small
 number of documents (~375) that are replicated across the six replicas.
 
 The interesting thing is that the same document has a different id in
 each one of those replicas.
 
 This is causing fq=id:xyz type queries to fail, depending on
 which replica the query goes to.
 
 I have specified the id field in the following manner in schema.xml.
 Is it the right way to specify an auto-generated id in SolrCloud?
 
<field name="id" type="uuid" indexed="true" stored="true"
required="true" multiValued="false" />
 
 
 Thanks.



Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
Looking a little deeper, I did find this about UUIDField:

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

“NOTE: Configuring a UUIDField instance with a default value of "NEW" is not 
advisable for most users when using SolrCloud (and not possible if the UUID 
value is configured as the unique key field) since the result will be that each 
replica of each document will get a unique UUID value. Using 
UUIDUpdateProcessorFactory (http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html)
to generate UUID values when documents are added is recommended instead.”

That might describe the behavior you saw. And the use of 
UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well here:

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

Though I’ve not actually tried that process before.
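The following toy Python model (not Solr code) shows why the two approaches in the note behave differently. A field default is evaluated separately by each replica as it indexes the document. An update processor that runs before the document is distributed fills the value in once, and that filled-in document is what gets copied to the replicas. `REPLICAS = 6` mirrors the single-shard, six-replica setup from the original question. The function names are hypothetical.

```python
import uuid

REPLICAS = 6  # single shard, six replicas, as in the original question


def index_with_field_default(doc):
    """Toy model of a UUIDField default of "NEW": every replica fills in
    the missing id on its own when it indexes the document."""
    return [dict(doc, id=doc.get("id") or str(uuid.uuid4())) for _ in range(REPLICAS)]


def index_with_update_processor(doc):
    """Toy model of UUIDUpdateProcessorFactory running before distribution:
    the id is filled in once, then the same document is copied to each replica."""
    filled = dict(doc, id=doc.get("id") or str(uuid.uuid4()))
    return [dict(filled) for _ in range(REPLICAS)]


doc = {"doctorId": "d-1"}
print(len({r["id"] for r in index_with_field_default(doc)}))     # 6 (one id per replica)
print(len({r["id"] for r in index_with_update_processor(doc)}))  # 1 (shared id)
```

The first function reproduces the symptom from the original question: a query such as fq=id:xyz matches on only one of the six replicas.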

On Nov 11, 2014, at 7:39 PM, Garth Grimm 
garthgr...@averyranchconsulting.com wrote:

“uuid” isn’t an out-of-the-box field type that I’m familiar with.

Generally, I’d stick with the out-of-the-box advice of the schema.xml file, 
which includes things like:

  <!-- Only remove the "id" field if you have a very good reason to. While not
       strictly required, it is highly recommended. A <uniqueKey> is present in
       almost all Solr installations. See the <uniqueKey> declaration below where
       <uniqueKey> is set to "id".
  -->
  <field name="id" type="string" indexed="true" stored="true" required="true"
         multiValued="false" />

and:

<!-- Field to use to determine and enforce document uniqueness.
     Unless this field is marked with required="false", it will be a required
     field
  -->
<uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L 
simpleliving...@gmail.com wrote:

Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard and 6 replicas on SolrCloud 4.10.1. I only have a small
number of documents (~375) that are replicated across the six replicas.

The interesting thing is that the same document has a different id in
each one of those replicas.

This is causing fq=id:xyz type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml.
Is it the right way to specify an auto-generated id in SolrCloud?

  <field name="id" type="uuid" indexed="true" stored="true"
  required="true" multiValued="false" />


Thanks.