Re: Different ids for the same document in different replicas.
OK. So it sounds like doctorURL is a good key, but you don’t like the special characters.

I’ve used MD5 hashes of URLs before as a way to convert unique URLs into unique alphanumeric strings in a repeatable way. I think most programming languages have libraries for doing that as you feed the data to Solr (Java certainly does). Other hashing or encoding mechanisms could be used if you want to be able to programmatically convert from the doctorURL to the string you want to use and back again.

Anyway, the point is that you have a repeatable unique key that is derived directly from the data you’re storing, not a random ID value that will be different every time you feed the same thing in.

BTW, you can certainly use a custom field type to do the hashing work, but I’d suggest you do that before feeding the data to SolrCloud. If you do it outside of SolrCloud, then SolrCloud can use it for routing to the correct shard. If you try to do it solely in a field type, the field type output won’t be available until the indexing is actually occurring, which is too late for routing purposes. And that means you can’t ensure that subsequent re-feeds of the same thing will overwrite the old values, since you can’t make sure they get routed to the same shard.

On Nov 12, 2014, at 7:50 PM, Meraj A. Khan mera...@gmail.com wrote:

> Sorry, it’s actually doctorUrl, so I don’t want to use doctorUrl as a lookup mechanism because URLs can have special characters that can cause issues with Solr lookup.
>
> I guess I should rephrase my question to: how do I auto generate the unique keys in the id field when using SolrCloud?
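
A minimal sketch of the MD5 suggestion in this reply, assuming the documents are prepared in Python before being sent to Solr. The function names and the sample URL are hypothetical; the hex encoding stands in for the "other encoding mechanisms" that can be converted back to the URL.

```python
import hashlib


def md5_key(url: str) -> str:
    """Return a repeatable 32-character hex key derived from the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def reversible_key(url: str) -> str:
    """Hex-encode the URL: alphanumeric only, and it can be decoded again."""
    return url.encode("utf-8").hex()


def url_from_key(key: str) -> str:
    """Invert reversible_key()."""
    return bytes.fromhex(key).decode("utf-8")


# Hypothetical URL containing characters that are awkward in a Solr query.
doctor_url = "http://example.com/doctors?name=Dr. Smith&id=42"

# The key is computed before feeding, so the same URL always yields the same id.
doc = {"id": md5_key(doctor_url), "doctorUrl": doctor_url}
print(doc["id"])
```

The MD5 key has a fixed length but cannot be turned back into the URL; the hex-encoded key can, at the cost of being twice as long as the URL's UTF-8 bytes.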
Re: Different ids for the same document in different replicas.
Thanks. I also noticed that the mandatory _version_ field is also uniquely generated for every document in the collection. Can this be used as a unique value instead of generating the hash code for the urlField? I want to avoid creating a custom unique field if the _version_ field, which is mandated for schema.xml, actually does that for me.

On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm garthgr...@averyranchconsulting.com wrote:

> OK. So it sounds like doctorURL is a good key, but you don’t like the special characters. I’ve used MD5 hashes of URLs before as a way to convert unique URLs into unique alphanumeric strings in a repeatable way.
> [...]
Re: Different ids for the same document in different replicas.
bq: can this be used as an unique value instead of generating the hashcode for the urlField

Don't do this. The _version_ field is used internally for optimistic locking etc. I'd be _very_ cautious about co-opting this for anything else.

Best,
Erick

On Thu, Nov 13, 2014 at 8:14 AM, Meraj A. Khan mera...@gmail.com wrote:

> Thanks. I also noticed that the mandatory _version_ field is also uniquely generated for every document in the collection. Can this be used as a unique value instead of generating the hash code for the urlField?
> [...]
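
To illustrate why a version used for optimistic locking makes a poor document key, here is a toy in-memory model. It is not Solr's implementation; `ToyStore` and `VersionConflict` are hypothetical names, and the only behavior assumed is that every successful write gets a new version value.

```python
import itertools


class VersionConflict(Exception):
    """Raised when an update carries a stale version."""


class ToyStore:
    """In-memory stand-in for a versioned document store (not Solr itself)."""

    def __init__(self):
        self._docs = {}
        self._clock = itertools.count(1)

    def put(self, doc_id, fields, expected_version=None):
        """Write a document; reject it if expected_version is out of date."""
        current = self._docs.get(doc_id)
        if expected_version is not None:
            if current is None or current["_version_"] != expected_version:
                raise VersionConflict(doc_id)
        version = next(self._clock)  # every write gets a fresh version
        self._docs[doc_id] = {**fields, "id": doc_id, "_version_": version}
        return version

    def get(self, doc_id):
        return self._docs[doc_id]


store = ToyStore()
v1 = store.put("doc-1", {"name": "first"})
v2 = store.put("doc-1", {"name": "second"}, expected_version=v1)
print(v1 != v2)  # the same document now carries a different version
```

In this model the version changes on every update of the same document, so a lookup by an old version value would stop matching after the next re-feed, whereas the id stays stable.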
Re: Different ids for the same document in different replicas.
Thanks. So the issue here is that I already have a <uniqueKey>doctorId</uniqueKey> defined in my schema.xml. If, along with that, I also want the id field to be automatically generated for each document, do I have to declare it as a uniqueKey as well? I just tried the following settings without the uniqueKey for id, and it’s only generating blank ids for me.

*schema.xml*

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

*solrconfig.xml*

    <updateRequestProcessorChain name="uuid">
      <processor class="solr.UUIDUpdateProcessorFactory">
        <str name="fieldName">id</str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote:

> Looking a little deeper, I did find this about UUIDField
> [...]
> And the use of UUIDUpdateProcessorFactory to auto generate IDs seems to be covered well here:
> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
> Though I’ve not actually tried that process before.
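
One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: an update chain that is not marked as the default only runs when the update request selects it with Solr's `update.chain` parameter. The sketch below only builds such a request URL; the host and the collection name `doctors` are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host
COLLECTION = "doctors"                    # hypothetical collection name


def update_url(chain: str, commit: bool = True) -> str:
    """Build an update URL that asks Solr to run the named processor chain."""
    params = {"update.chain": chain}
    if commit:
        params["commit"] = "true"
    return f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(params)}"


# "uuid" is the chain name from the solrconfig.xml quoted in this message.
print(update_url("uuid"))
```

Documents posted to a URL built this way would pass through the chain named uuid; whether that resolves the blank ids in this particular setup is not established in the thread.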
Re: Different ids for the same document in different replicas.
I just tried adding <uniqueKey>id</uniqueKey> while keeping the id field as type="string", and only blank ids are being generated. It looks like the id is auto generated only if the id field is set to type uuid, but in the case of SolrCloud this id will be unique per replica.

Is there a way to generate a unique id in SolrCloud without using the uuid type, or without ending up with a per-replica unique id?

The uuid type in question is defined as:

    <fieldType name="uuid" class="solr.UUIDField" indexed="true" />

On Wed, Nov 12, 2014 at 6:20 PM, S.L simpleliving...@gmail.com wrote:

> Thanks. So the issue here is that I already have a <uniqueKey>doctorId</uniqueKey> defined in my schema.xml.
> [...]
Re: Different ids for the same document in different replicas.
You mention you already have a unique key identified for the data you’re storing in Solr:

    <uniqueKey>doctorId</uniqueKey>

If that’s the field you’re using to uniquely identify each thing you’re storing in the Solr index, why do you want to have an id field that is populated with some random value? You’ll be using the doctorId field as the key, and the id field will have no real meaning in your data model.

If doctorId actually isn’t unique to each item you plan on storing in Solr, is there any other field that is? If so, use that field as your unique key.

Remember, uniqueKeys are usually used for routing documents to shards in SolrCloud, and are used to ensure that later updates of the same “thing” overwrite the old one, rather than generating multiple copies. So the keys really should be something derived from the data you’re storing. I’m not sure I understand why you would want to have the key randomly generated.

On Nov 12, 2014, at 6:39 PM, S.L simpleliving...@gmail.com wrote:

> I just tried adding <uniqueKey>id</uniqueKey> while keeping the id field as type="string", and only blank ids are being generated.
> [...]
> Is there a way to generate a unique id in SolrCloud without using the uuid type, or without ending up with a per-replica unique id?
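
The point about routing and overwriting can be sketched with a toy index. This is only an illustration: `zlib.crc32` with a modulo is a stand-in for Solr's real hash-range routing, and the shard count, field values, and helper names are hypothetical.

```python
import uuid
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str) -> int:
    """Toy router: the same key always maps to the same shard."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def feed(index: dict, key: str, doc: dict) -> None:
    """Store doc in the shard chosen by its key; an equal key overwrites."""
    index.setdefault(shard_for(key), {})[key] = doc


def copies(index: dict) -> int:
    """Count the documents stored across all shards."""
    return sum(len(shard) for shard in index.values())


doc = {"doctorId": "dr-42", "name": "Dr. Example"}  # hypothetical document

by_key = {}
by_random = {}
for _ in range(3):  # feed the same document three times
    feed(by_key, doc["doctorId"], doc)   # key derived from the data
    feed(by_random, str(uuid.uuid4()), doc)  # new random key on every feed

print(copies(by_key), copies(by_random))  # 1 3
```

With a key derived from the data, each re-feed lands on the same shard and replaces the earlier copy; with a random key, every feed creates another copy that may sit on a different shard.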
Re: Different ids for the same document in different replicas.
Sorry, it’s actually doctorUrl, so I don’t want to use doctorUrl as a lookup mechanism because URLs can have special characters that can cause issues with Solr lookup.

I guess I should rephrase my question to: how do I auto generate the unique keys in the id field when using SolrCloud?

On Nov 12, 2014 7:28 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote:

> You mention you already have a unique key identified for the data you’re storing in Solr:
>
>     <uniqueKey>doctorId</uniqueKey>
>
> If that’s the field you’re using to uniquely identify each thing you’re storing in the Solr index, why do you want to have an id field that is populated with some random value?
> [...]
Different ids for the same document in different replicas.
Hi All,

I am seeing interesting behavior on the replicas. I have a single shard and 6 replicas on SolrCloud 4.10.1, with only a small number of documents (~375) replicated across the six replicas.

The interesting thing is that the same document has a different id in each one of those replicas. This is causing fq(id:xyz) type queries to fail, depending on which replica the query goes to.

I have specified the id field in the following manner in schema.xml. Is it the right way to specify an auto generated id in SolrCloud?

    <field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />

Thanks.
Re: Different ids for the same document in different replicas.
“uuid” isn’t an out-of-the-box field type that I’m familiar with. Generally, I’d stick with the out-of-the-box advice in the schema.xml file, which includes things like:

<!-- Only remove the id field if you have a very good reason to. While not strictly required, it is highly recommended. A uniqueKey is present in almost all Solr installations. See the uniqueKey declaration below where uniqueKey is set to id. -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

and:

<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed documents in, and you know that the uuid values you’re creating are unique, just change the field name and unique key name from ‘id’ to ‘uuid’. Or change the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

Hi All,

I am seeing interesting behavior on the replicas. I have a single shard with six replicas on SolrCloud 4.10.1, and only a small number of documents (~375) replicated across the six replicas. The interesting thing is that the same document has a different id in each one of those replicas. This causes fq(id:xyz)-type queries to fail, depending on which replica the query goes to.

I have specified the id field in schema.xml in the following manner. Is this the right way to specify an auto-generated id in SolrCloud?

<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />

Thanks.
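For illustration, the first option in the reply above (renaming both the field and the unique key from ‘id’ to ‘uuid’) might look roughly like this in schema.xml. This is only a sketch: it assumes the field stays a plain string type and that the uuid values are supplied by whatever client feeds the documents, as the reply describes.

```xml
<!-- Sketch only: the out-of-the-box id field, renamed to uuid.
     Assumes the uuid values are generated by the feeding client, not by Solr. -->
<field name="uuid" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<!-- The unique key declaration must name the same field. -->
<uniqueKey>uuid</uniqueKey>
```

The second option needs no schema change at all, since the client simply sends its value under the existing ‘id’ key.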
Re: Different ids for the same document in different replicas.
Looking a little deeper, I did find this about UUIDField:
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

“NOTE: Configuring a UUIDField instance with a default value of NEW is not advisable for most users when using SolrCloud (and not possible if the UUID value is configured as the unique key field) since the result will be that each replica of each document will get a unique UUID value. Using UUIDUpdateProcessorFactory (http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html) to generate UUID values when documents are added is recommended instead.”

That might describe the behavior you saw. And the use of UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well here:
http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

Though I’ve not actually tried that process before.

On Nov 11, 2014, at 7:39 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote:

“uuid” isn’t an out-of-the-box field type that I’m familiar with. Generally, I’d stick with the out-of-the-box advice in the schema.xml file, which includes things like:

<!-- Only remove the id field if you have a very good reason to. While not strictly required, it is highly recommended. A uniqueKey is present in almost all Solr installations. See the uniqueKey declaration below where uniqueKey is set to id. -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

and:

<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed documents in, and you know that the uuid values you’re creating are unique, just change the field name and unique key name from ‘id’ to ‘uuid’. Or change the key name you send in from ‘uuid’ to ‘id’.
On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

Hi All,

I am seeing interesting behavior on the replicas. I have a single shard with six replicas on SolrCloud 4.10.1, and only a small number of documents (~375) replicated across the six replicas. The interesting thing is that the same document has a different id in each one of those replicas. This causes fq(id:xyz)-type queries to fail, depending on which replica the query goes to.

I have specified the id field in schema.xml in the following manner. Is this the right way to specify an auto-generated id in SolrCloud?

<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />

Thanks.
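For reference, the UUIDUpdateProcessorFactory approach recommended in the javadoc note might be configured in solrconfig.xml roughly as follows. This is a sketch along the lines of the solr.pl article linked above, not a configuration tested against this cluster. The chain name uuid and the choice to make it the default chain for the /update handler are assumptions. A named chain is only applied to requests that select it, either through the update.chain request parameter or through a handler default like the one shown.

```xml
<!-- Sketch only: fills in the id field with a generated UUID
     when an incoming document does not already carry a value for it. -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Assumption: make the chain above the default for the standard update handler,
     so that plain /update requests pass through it. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```

As far as I understand it, processors placed ahead of the distributed update step run once, on the node that first receives the document, so every replica should end up with the same generated value. That is the difference from a field-level default of NEW, which is evaluated separately on each replica.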