Re: Reindex single shard on solr

Mahmoud Almokadem Sat, 15 Dec 2018 03:23:12 -0800

You're right Erick.

For the Hash.murmurhash3_x86_32 method, I don't know whether I should pass my id
directly or in a specific format like
'1874f9aa-4cad-4839-a282-d624fe2c40c6!document_id', so I used a predefined
method that gets the shard name directly.
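
For reference, CompositeIdRouter.sliceHash appears to hash an id without a
'!' separator as a whole, while a two-part "prefix!rest" id takes its upper
16 bits from the hash of the prefix and its lower 16 bits from the hash of
the rest. The plain-Java sketch below illustrates that behaviour under those
assumptions. It is not SolrJ code: the class and method names are made up for
illustration, murmur3 is a standalone MurmurHash3 x86_32 over the UTF-8 bytes
of the string with seed 0, three-part ids and "/bits" suffixes are ignored,
and a router.field setting would change which value gets hashed. The results
should be checked against router.getTargetSlice on a few known documents
before relying on them.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical and this is not SolrJ code.
class CompositeIdHashSketch {

    // Standard MurmurHash3 x86_32 over the UTF-8 bytes of s, seed 0.
    static int murmur3(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        int h1 = 0; // seed
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int j = i * 4;
            // Read four bytes as a little-endian int.
            int k1 = (data[j] & 0xff)
                    | ((data[j + 1] & 0xff) << 8)
                    | ((data[j + 2] & 0xff) << 16)
                    | ((data[j + 3] & 0xff) << 24);
            k1 *= 0xcc9e2d51;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= 0x1b873593;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // Mix in the remaining 1 to 3 bytes, if any.
        int tail = nblocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k1 ^= (data[tail] & 0xff);
                k1 *= 0xcc9e2d51;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= 0x1b873593;
                h1 ^= k1;
        }
        // Finalization mix.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumed two-level composite rule: upper 16 bits from the prefix hash,
    // lower 16 bits from the hash of the rest. Plain ids are hashed whole.
    static int sliceHash(String id) {
        int sep = id.indexOf('!');
        if (sep < 0) {
            return murmur3(id);
        }
        int prefixHash = murmur3(id.substring(0, sep));
        int restHash = murmur3(id.substring(sep + 1));
        return (prefixHash & 0xffff0000) | (restHash & 0x0000ffff);
    }

    public static void main(String[] args) {
        String plain = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
        String composite = plain + "!document_id";
        System.out.println(Integer.toHexString(sliceHash(plain)));
        System.out.println(Integer.toHexString(sliceHash(composite)));
    }
}
```

If the uniqueKey values contain no '!', the two forms are not interchangeable:
the plain id and the "prefix!rest" form generally hash to different values,
so the id should be passed exactly as it is stored in the index.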


The createCollection method doesn't physically create a collection on
SolrCloud; it only builds a local reference describing the collection's shard
layout.

Also, CloudSolrClient doesn't have a method called "getCollection"; it may
be related to the SolrRequest class, which is not used in my code.

I used the following code to target my shards:

    String id = document.getFieldValue("document_id").toString();
    Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
    String shard = slice.getName();
    if (targetShards.contains(shard)) {
        bufferDocuments.add(document);
    }
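
The same filter can also be expressed against hash ranges, as suggested in
the quoted reply below: take the target shard's range and keep only the
documents whose hash falls inside it. The sketch below assumes the range is
available as the "min-max" hex string that state.json shows for each shard
(for example "80000000-ffffffff"), and that the hash has already been
computed elsewhere as a signed 32-bit int. ShardRangeSketch and its methods
are hypothetical names for illustration, not SolrJ classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; ShardRangeSketch is a hypothetical name, not a SolrJ class.
class ShardRangeSketch {
    final int min;
    final int max;

    ShardRangeSketch(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Parses the "min-max" hex form, e.g. "80000000-ffffffff".
    // Values above 7fffffff wrap to negative ints, so the comparison
    // in includes() is a plain signed comparison.
    static ShardRangeSketch parse(String range) {
        int dash = range.indexOf('-');
        int min = Integer.parseUnsignedInt(range.substring(0, dash), 16);
        int max = Integer.parseUnsignedInt(range.substring(dash + 1), 16);
        return new ShardRangeSketch(min, max);
    }

    // True when the signed 32-bit hash lies within [min, max].
    boolean includes(int hash) {
        return hash >= min && hash <= max;
    }

    // Keeps only the hashes that belong to this range.
    List<Integer> filter(List<Integer> hashes) {
        List<Integer> kept = new ArrayList<>();
        for (int hash : hashes) {
            if (includes(hash)) {
                kept.add(hash);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        ShardRangeSketch target = parse("80000000-ffffffff");
        List<Integer> hashes = new ArrayList<>();
        hashes.add(-5);
        hashes.add(42);
        hashes.add(Integer.MIN_VALUE);
        System.out.println(target.filter(hashes));
    }
}
```

With several corrupted shards, one range object per target shard can be
parsed up front and a document kept when any of them includes its hash.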

Thanks for your help,
Mahmoud

On Fri, Dec 14, 2018 at 11:20 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Why do you need to create a collection? That's probably just there in
> the test code to have something to test against.
>
> WARNING: I haven't verified this, but it should be something like the
> following. What you need is the hash range for the shard (slice) you're
> trying to update, then send each doc ID through the hash function and, if
> the result falls in the range of your target shard, index the doc.
>
> CloudSolrClient cloudSolrClient = .....
>
> DocCollection coll = cloudSolrClient.getCollection(collName);
> // you can get all the slices and iterate, BTW
> Slice slice = coll.getSlice("shard_name_you_care_about");
> DocRouter.Range range = slice.getRange();
>
> for (each doc) {
>   String id = whatever_your_unique_key_is;
>   int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
>   if (range.includes(hash)) {
>       index it to Solr
>   }
> }
>
> "Hash" is in org.apache.solr.common.util, in solr-solrj-######.jar, part of
> the normal distro.
>
> Best,
> Erick
> On Fri, Dec 14, 2018 at 11:53 AM Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> >
> > Thanks Erick,
> >
> > I got it from TestHashPartitioner.java
> >
> >
> > https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/test/org/apache/solr/cloud/TestHashPartitioner.java
> >
> > Here is some sample code:
> >
> > router = DocRouter.getDocRouter(CompositeIdRouter.NAME);
> > int shardsCount = 12;
> > solrCollection = createCollection(shardsCount, router);
> >
> > // need to implement this method to get a SolrInputDocument
> > SolrInputDocument document = getSolrDocument(item);
> >
> > String id = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
> > Slice slice = router.getTargetSlice(id, document, null, null,
> >         solrCollection);
> > String shardName = slice.getName(); // shard1, shard2, ... etc
> >
> > //Helper methods from
> > DocCollection createCollection(int nSlices, DocRouter router) {
> >     List<DocRouter.Range> ranges = router.partitionRange(nSlices,
> >             router.fullRange());
> >
> >     Map<String,Slice> slices = new HashMap<>();
> >     for (int i=0; i<ranges.size(); i++) {
> >         DocRouter.Range range = ranges.get(i);
> >         Slice slice = new Slice("shard"+(i+1), null, map("range",range));
> >         slices.put(slice.getName(), slice);
> >     }
> >
> >     DocCollection coll = new DocCollection("collection1", slices, null,
> >             router);
> >     return coll;
> > }
> >
> >
> >     public static Map map(Object... params) {
> >         LinkedHashMap ret = new LinkedHashMap();
> >         for (int i=0; i<params.length; i+=2) {
> >             Object o = ret.put(params[i], params[i+1]);
> >             // TODO: handle multi-valued map?
> >         }
> >         return ret;
> >     }
> >
> >
> > Mahmoud
> >
> > On Fri, Dec 14, 2018 at 7:06 PM Mahmoud Almokadem <prog.mahm...@gmail.com>
> > wrote:
> >
> > > Thanks Erick,
> > >
> > > Do you know how to use this method, or do I need to dive into the code?
> > >
> > > I have document_id as a string uniqueKey and have 12 shards.
> > >
> > > On Fri, Dec 14, 2018 at 5:58 PM Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Sure. Of course you have to make sure you use the exact same hashing
> > >> algorithm on the <uniqueKey>.
> > >>
> > >> See CompositeIdRouter.sliceHash
> > >>
> > >> Best,
> > >> Erick
> > >> On Fri, Dec 14, 2018 at 3:36 AM Mahmoud Almokadem
> > >> <prog.mahm...@gmail.com> wrote:
> > >> >
> > >> > Hello,
> > >> >
> > >> > I have corruption on some of the shards in my collection, and I have
> > >> > the full dataset in my database. I'm using CompositeId for routing
> > >> > documents.
> > >> >
> > >> > Can I traverse the whole dataset and do something like hashing the
> > >> > document_id to identify which shard each document belongs to, so that
> > >> > I send only the desired documents instead of reindexing the whole
> > >> > dataset?
> > >> >
> > >> > Sincerely,
> > >> > Mahmoud
> > >>
> > >
>
