You're right, Erick. For the Hash.murmurhash3_x86_32 method, I didn't know whether I should pass my ID directly or in a specific format like '1874f9aa-4cad-4839-a282-d624fe2c40c6!document_id', so I used a predefined method that returns the shard name directly.
The createCollection method doesn't physically create a collection on SolrCloud; it only builds a local DocCollection object that describes the collection's shards and their hash ranges. Also, CloudSolrClient doesn't have a method called "getCollection"; it may be related to the SolrRequest class, which is not used in my code.

I used the following code to target my shards:

    String id = document.getFieldValue("document_id").toString();
    Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
    String shard = slice.getName();
    if (targetShards.contains(shard)) {
        bufferDocuments.add(document);
    }

Thanks for your help,
Mahmoud

On Fri, Dec 14, 2018 at 11:20 PM Erick Erickson <erickerick...@gmail.com> wrote:
> Why do you need to create a collection? That's probably just there in
> the test code to have something to test against.
>
> WARNING: I haven't verified this, but it should be something like the
> following. What you need is the hash range for the shard (slice) you're
> trying to update, then send each doc ID through the hash function and,
> if the result falls in the range of your target shard, index the doc.
>
> CloudSolrClient cloudSolrClient = .....
>
> DocCollection coll = cloudSolrClient.getCollection(collName);
> Slice slice = coll.getSlice("shard_name_you_care_about"); // you can get all the slices and iterate BTW.
> DocRouter.Range range = slice.getRange();
>
> for (each doc) {
>     int hash = Hash.murmurhash3_x86_32(whatever_your_unique_key_is, 0, id.length(), 0);
>     if (range.includes(hash)) {
>         index it to Solr
>     }
> }
>
> "Hash" is in org.apache.solr.common.util, in
> solr-solrj-######.jar, part of the normal distro.
>
> Best,
> Erick
> On Fri, Dec 14, 2018 at 11:53 AM Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> >
> > Thanks Erick,
> >
> > I got it from TestHashPartitioner.java
> >
> > https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/test/org/apache/solr/cloud/TestHashPartitioner.java
> >
> > Here is some sample code:
> >
> > router = DocRouter.getDocRouter(CompositeIdRouter.NAME);
> > int shardsCount = 12;
> > solrCollection = createCollection(shardsCount, router);
> >
> > // need to implement this method to get a SolrInputDocument
> > SolrInputDocument document = getSolrDocument(item);
> >
> > String id = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
> > Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
> > String shardName = slice.getName(); // shard1, shard2, ... etc
> >
> > //Helper methods from
> > DocCollection createCollection(int nSlices, DocRouter router) {
> >     List<DocRouter.Range> ranges = router.partitionRange(nSlices, router.fullRange());
> >
> >     Map<String,Slice> slices = new HashMap<>();
> >     for (int i=0; i<ranges.size(); i++) {
> >         DocRouter.Range range = ranges.get(i);
> >         Slice slice = new Slice("shard"+(i+1), null, map("range",range));
> >         slices.put(slice.getName(), slice);
> >     }
> >
> >     DocCollection coll = new DocCollection("collection1", slices, null, router);
> >     return coll;
> > }
> >
> > public static Map map(Object... params) {
> >     LinkedHashMap ret = new LinkedHashMap();
> >     for (int i=0; i<params.length; i+=2) {
> >         Object o = ret.put(params[i], params[i+1]);
> >         // TODO: handle multi-valued map?
> >     }
> >     return ret;
> > }
> >
> > Mahmoud
> >
> > On Fri, Dec 14, 2018 at 7:06 PM Mahmoud Almokadem <prog.mahm...@gmail.com>
> > wrote:
> >
> > > Thanks Erick,
> > >
> > > Do you know how to use this method, or do I need to dive into the code?
> > >
> > > I have document_id as the string uniqueKey and have 12 shards.
> > >
> > > On Fri, Dec 14, 2018 at 5:58 PM Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Sure. Of course you have to make sure you use the exact same hashing
> > >> algorithm on the <uniqueKey>.
> > >>
> > >> See CompositeIdRouter.sliceHash
> > >>
> > >> Best,
> > >> Erick
> > >> On Fri, Dec 14, 2018 at 3:36 AM Mahmoud Almokadem
> > >> <prog.mahm...@gmail.com> wrote:
> > >> >
> > >> > Hello,
> > >> >
> > >> > Some of the shards in my collection are corrupted, I have the full
> > >> > dataset in my database, and I'm using CompositeId for routing documents.
> > >> >
> > >> > Can I traverse the whole dataset and hash each document_id to identify
> > >> > which shard the document belongs to, so that I send only the desired
> > >> > documents instead of reindexing the whole dataset?
> > >> >
> > >> > Sincerely,
> > >> > Mahmoud
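
For anyone following the thread, the "hash each ID and keep it only if it falls in the target shard's range" idea can be sketched without SolrJ on the classpath. This is a rough illustration, not Solr's own code: the class ShardFilterSketch and its methods are hypothetical names, the range bounds in main are made up, and it assumes that a plain ID (no "!" separator) is hashed as MurmurHash3 x86_32 over its UTF-8 bytes with seed 0. IDs that contain "!" are, as far as I know, hashed differently by CompositeIdRouter and are not covered here, so the results should be checked against router.getTargetSlice before relying on them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardFilterSketch {

    // Standard MurmurHash3 x86_32 over raw bytes.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & ~3; // length rounded down to a multiple of 4

        // Body: consume the input four bytes at a time (little-endian).
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        int tail = len & 3;
        if (tail >= 3) k1 ^= (data[roundedEnd + 2] & 0xff) << 16;
        if (tail >= 2) k1 ^= (data[roundedEnd + 1] & 0xff) << 8;
        if (tail >= 1) {
            k1 ^= (data[roundedEnd] & 0xff);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: a plain ID is hashed over its UTF-8 bytes with seed 0.
    static int hashId(String id) {
        return murmur3(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    // Inclusive check on signed 32-bit bounds.
    static boolean inRange(int hash, int min, int max) {
        return hash >= min && hash <= max;
    }

    // Keep only the IDs whose hash falls inside [min, max].
    static List<String> idsForRange(List<String> ids, int min, int max) {
        List<String> selected = new ArrayList<>();
        for (String id : ids) {
            if (inRange(hashId(id), min, max)) {
                selected.add(id);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "1874f9aa-4cad-4839-a282-d624fe2c40c6", "doc-2", "doc-3");
        // Hypothetical bounds; in practice take them from the target shard's range.
        int min = Integer.MIN_VALUE;
        int max = -1;
        for (String id : idsForRange(ids, min, max)) {
            System.out.println("would reindex: " + id);
        }
    }
}
```

With real data, the loop in main would read IDs from the database and buffer the matching documents for indexing, the same way the targetShards check does in the message at the top of the thread.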