Why do you need to create a collection? That's probably just there in
the test code to have something to test against.

WARNING: I haven't verified this, but it should be something like the
following. What you need is the hash range for the shard (slice) you're
trying to update; then run each doc's uniqueKey through the same hash
function Solr uses and, if the result falls in the range of your target
shard, index the doc.

CloudSolrClient cloudSolrClient = ..... // however you build your client

// roughly: cluster state -> DocCollection
DocCollection coll = cloudSolrClient.getZkStateReader()
    .getClusterState().getCollection(collName);
Slice slice = coll.getSlice("shard_name_you_care_about"); // you can also
    // get all the slices via coll.getSlices() and iterate, BTW
DocRouter.Range range = slice.getRange();

for (each doc) {
  String id = the doc's uniqueKey value;
  // same hash CompositeIdRouter.sliceHash uses for ids without a "!"
  int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
  if (range.includes(hash)) {
      index it to Solr
  }
}

"Hash" is in org.apache.solr.common.util, in

solr-solrj-######.jar, part of the normal distro.
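
Also untested, but a sketch of another route that would let you skip the
test-style createCollection entirely: grab the real DocCollection from the
live cluster state and let its own router pick the slice. The collection
and shard names below are just placeholders, and getSolrDocument() is the
builder method from your own code:

DocCollection coll = cloudSolrClient.getZkStateReader()
    .getClusterState().getCollection("collection1"); // your collection
DocRouter router = coll.getRouter();

SolrInputDocument doc = getSolrDocument(item);          // your own method
String id = (String) doc.getFieldValue("document_id");  // your <uniqueKey>
Slice target = router.getTargetSlice(id, doc, null, null, coll);
if ("shard3".equals(target.getName())) { // the shard you're repairing
  cloudSolrClient.add("collection1", doc);
}

Since that reuses the same router Solr itself uses, it should stay correct
even if some of your ids ever carry a "!" route key.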

Best,
Erick
On Fri, Dec 14, 2018 at 11:53 AM Mahmoud Almokadem
<prog.mahm...@gmail.com> wrote:
>
> Thanks Erick,
>
> I got it from TestHashPartitioner.java
>
> https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/test/org/apache/solr/cloud/TestHashPartitioner.java
>
> Here is a sample code
>
> router = DocRouter.getDocRouter(CompositeIdRouter.NAME);
> int shardsCount = 12;
> solrCollection = createCollection(shardsCount, router);
>
> SolrInputDocument document = getSolrDocument(item); //need to implement
> this method to get SolrInputDocument
>
> String id = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
> Slice slice = router.getTargetSlice(id, document, null, null,
> solrCollection );
> String shardName = slice.getName(); // shard1, shard2, ... etc
>
> // Helper methods from TestHashPartitioner.java
> DocCollection createCollection(int nSlices, DocRouter router) {
>         List<DocRouter.Range> ranges = router.partitionRange(nSlices,
> router.fullRange());
>
>         Map<String,Slice> slices = new HashMap<>();
>         for (int i=0; i<ranges.size(); i++) {
>             DocRouter.Range range = ranges.get(i);
>             Slice slice = new Slice("shard"+(i+1), null,
> map("range",range));
>             slices.put(slice.getName(), slice);
>         }
>
>         DocCollection coll = new DocCollection("collection1", slices, null,
> router);
>         return coll;
>     }
>
>
>     public static Map map(Object... params) {
>         LinkedHashMap ret = new LinkedHashMap();
>         for (int i=0; i<params.length; i+=2) {
>             Object o = ret.put(params[i], params[i+1]);
>             // TODO: handle multi-valued map?
>         }
>         return ret;
>     }
>
>
> Mahmoud
>
> On Fri, Dec 14, 2018 at 7:06 PM Mahmoud Almokadem <prog.mahm...@gmail.com>
> wrote:
>
> > Thanks Erick,
> >
> > Do you know how to use this method, or do I need to dive into the code?
> >
> > I have the document_id as a string uniqueKey and have 12 shards.
> >
> > On Fri, Dec 14, 2018 at 5:58 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Sure. Of course you have to make sure you use the exact same hashing
> >> algorithm on the <uniqueKey>.
> >>
> >> See CompositeIdRouter.sliceHash
> >>
> >> Best,
> >> Erick
> >> On Fri, Dec 14, 2018 at 3:36 AM Mahmoud Almokadem
> >> <prog.mahm...@gmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I have corruption on some of the shards in my collection, I have the
> >> > full dataset in my database, and I'm using CompositeId for routing
> >> > documents.
> >> >
> >> > Can I traverse the whole dataset and do something like hashing the
> >> > document_id to identify which shard each document belongs to, so I can
> >> > send only the desired documents instead of reindexing the whole dataset?
> >> >
> >> > Sincerely,
> >> > Mahmoud
> >>
> >
