Some good info on unique IDs for Lucene/Solr can be found here:
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
-- 
Mark Miller
about.me/markrmiller

On July 24, 2014 at 9:51:28 PM, He haobo (haob...@gmail.com) wrote:

Hi,  

In our Solr collection (Solr 4.8), we have the following unique key  
definition.  
<field name="id" type="string" indexed="true" stored="true"  
required="true" multiValued="false" />  

<uniqueKey>id</uniqueKey>  


In our external Java program, we first generate a UUID with
UUID.randomUUID().toString(). Then we apply a cryptographic hash to
produce a 32-byte text string and use that as the id.
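
A rough sketch of the per-document scheme described above. The hash algorithm is not named in this message, so MD5 is assumed here only because its 16-byte digest prints as 32 hex characters; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.UUID;

final class PerDocIdSketch {
    // Assumption: MD5 stands in for the unspecified cryptographic hash.
    static String newId() {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            // Render the 16 digest bytes as 32 uppercase hex characters.
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format(Locale.ROOT, "%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newId());
    }
}
```

Each call pays for one random UUID and one digest, which is the per-document cost in question.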

For now, we might need to post more than 20k Solr docs per second, so
UUID.randomUUID() or the cryptographic hash might take too much time. A
simple workaround would be to share one cryptographic hash across many
Solr docs. Namely, we want to append a sequence number to the hash, such
as 9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY000000,
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY000001,
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY000002, etc.
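
The shared-prefix idea could look roughly like the sketch below. To stay short it skips the hashing step and takes the 32-character prefix directly from a random UUID with the hyphens removed; that substitution, the class name, and the six-digit rollover limit are assumptions for illustration, not part of the setup described here.

```java
import java.util.Locale;
import java.util.UUID;

final class BatchIdSketch {
    private static final int MAX_SEQ = 999_999; // largest six-digit suffix

    private String prefix = newPrefix();
    private int seq = 0;

    // Hypothetical stand-in for the hashed prefix: 32 uppercase hex characters.
    private static String newPrefix() {
        return UUID.randomUUID().toString().replace("-", "").toUpperCase(Locale.ROOT);
    }

    // Returns prefix + zero-padded sequence (38 characters in total);
    // a fresh prefix is drawn once the six-digit sequence is exhausted.
    synchronized String nextId() {
        if (seq > MAX_SEQ) {
            prefix = newPrefix();
            seq = 0;
        }
        return prefix + String.format(Locale.ROOT, "%06d", seq++);
    }

    public static void main(String[] args) {
        BatchIdSketch gen = new BatchIdSketch();
        System.out.println(gen.nextId());
        System.out.println(gen.nextId());
    }
}
```

With this layout the random-UUID cost is paid once per million ids rather than once per document, and uniqueness then rests on the prefix being unique across all writers.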


What we want to know: if we use a 38-byte id, is there any
performance impact on Solr inserts or queries? Or, if we use Solr's
default automatically generated id implementation, would that be more
efficient?



Thanks,  
Eternal  
