Re: Reindex single shard on solr

2018-12-15 Thread Mahmoud Almokadem
You're right, Erick.

For the Hash.murmurhash3_x86_32 method, I don't know whether I should pass
my ID directly or in a specific format like
'1874f9aa-4cad-4839-a282-d624fe2c40c6!document_id', so I used an existing
method that returns the shard name directly.
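
A small sketch may help show why the ID format matters. It assumes, without
having been checked against any particular Solr release, that the compositeId
router hashes a plain ID as a whole, and that for a two-part "prefix!rest" ID
it takes the upper 16 bits from the hash of the prefix and the lower 16 bits
from the hash of the rest. The names CompositeHashSketch, murmur3, hash and
sliceHash are hypothetical and self-contained; they are not SolrJ APIs, and
the "/bits" syntax and three-part IDs are ignored here.

```java
import java.nio.charset.StandardCharsets;

class CompositeHashSketch {

    // Standard MurmurHash3 x86_32 over a byte array.
    public static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & 0xfffffffc; // length rounded down to a multiple of 4

        // Body: process 4 bytes at a time, little-endian.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes (intentional fall-through).
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hash the UTF-8 bytes of a string with seed 0.
    public static int hash(String s) {
        return murmur3(s.getBytes(StandardCharsets.UTF_8), 0);
    }

    // ASSUMPTION: simplified model of composite routing, see the note above.
    public static int sliceHash(String id) {
        int idx = id.indexOf('!');
        if (idx < 0) {
            return hash(id); // plain ID: hash the whole string
        }
        int upper = hash(id.substring(0, idx)) & 0xFFFF0000;
        int lower = hash(id.substring(idx + 1)) & 0x0000FFFF;
        return upper | lower;
    }

    public static void main(String[] args) {
        String plain = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
        String composite = plain + "!document_id";
        System.out.printf("plain     -> %08x%n", sliceHash(plain));
        System.out.printf("composite -> %08x%n", sliceHash(composite));
    }
}
```

Under these assumptions the bare ID and the "!" form generally hash to
different values, so the ID has to be hashed in exactly the form in which it
was indexed.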

The createCollection method doesn't physically create a collection on
SolrCloud; it only builds a reference object describing the collection's
shards.

Also, CloudSolrClient doesn't have a method called "getCollection". It may
be related to the SolrRequest class, which is not used in my code.

I used the following code to target my shards:

String id = document.getFieldValue("document_id").toString();
Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
String shard = slice.getName();
if (targetShards.contains(shard)) {
  bufferDocuments.add(document);
}

Thanks for your help,
Mahmoud

On Fri, Dec 14, 2018 at 11:20 PM Erick Erickson 
wrote:

> Why do you need to create a collection? That's probably just there in
> the test code to have something to test against.
>
> WARNING: I haven't verified this, but it should be something like the
> following. What you need is the hash range for the shard (slice) you're
> trying to update. Then send each doc ID through the hash function and, if
> the result falls in the range of your target shard, index the doc.
>
> CloudSolrClient cloudSolrClient = .
>
> DocCollection coll = cloudSolrClient.getCollection(collName);
> // You can get all the slices and iterate, BTW.
> Slice slice = coll.getSlice("shard_name_you_care_about");
> DocRouter.Range range = slice.getRange();
>
> for (each doc) {
>   // id is whatever your unique key is
>   int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
>   if (range.includes(hash)) {
>     index it to Solr
>   }
> }
>
> "Hash" is in org.apache.solr.common.util, in solr-solrj-##.jar, part of
> the normal distro.
>
> Best,
> Erick


Re: Reindex single shard on solr

2018-12-14 Thread Erick Erickson
Why do you need to create a collection? That's probably just there in
the test code to have something to test against.

WARNING: I haven't verified this, but it should be something like the
following. What you need is the hash range for the shard (slice) you're
trying to update. Then send each doc ID through the hash function and, if
the result falls in the range of your target shard, index the doc.

CloudSolrClient cloudSolrClient = .

DocCollection coll = cloudSolrClient.getCollection(collName);
// You can get all the slices and iterate, BTW.
Slice slice = coll.getSlice("shard_name_you_care_about");
DocRouter.Range range = slice.getRange();

for (each doc) {
  // id is whatever your unique key is
  int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
  if (range.includes(hash)) {
    index it to Solr
  }
}

"Hash" is in org.apache.solr.common.util, in solr-solrj-##.jar, part of
the normal distro.
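
The range check described above can be sketched without a running cluster.
This sketch assumes the shard range is written as two hexadecimal bounds
joined by a dash (the value 80000000-9554ffff is made up for illustration)
and that hashes are compared as signed 32-bit integers. HashRangeSketch is a
hypothetical stand-in, not the DocRouter.Range class itself.

```java
class HashRangeSketch {
    final int min;
    final int max;

    HashRangeSketch(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Parse "minHex-maxHex". Going through long lets values of 80000000 and
    // above wrap into negative ints instead of failing to parse.
    public static HashRangeSketch fromHex(String s) {
        int dash = s.indexOf('-');
        int min = (int) Long.parseLong(s.substring(0, dash), 16);
        int max = (int) Long.parseLong(s.substring(dash + 1), 16);
        return new HashRangeSketch(min, max);
    }

    // Inclusive on both ends, using signed comparison.
    public boolean includes(int hash) {
        return hash >= min && hash <= max;
    }

    public static void main(String[] args) {
        HashRangeSketch range = fromHex("80000000-9554ffff"); // illustrative value
        int[] hashes = {0x80000000, 0x9554ffff, 0x95550000, 0};
        for (int h : hashes) {
            System.out.printf("%08x in range? %b%n", h, range.includes(h));
        }
    }
}
```

Because the comparison is signed, 80000000 is the smallest possible hash and
7fffffff the largest, which is why a range can start at 80000000.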

Best,
Erick
On Fri, Dec 14, 2018 at 11:53 AM Mahmoud Almokadem
 wrote:
>
> Thanks Erick,
>
> I got it from TestHashPartitioner.java
>
> https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/test/org/apache/solr/cloud/TestHashPartitioner.java
>
> Here is some sample code:
>
> router = DocRouter.getDocRouter(CompositeIdRouter.NAME);
> int shardsCount = 12;
> solrCollection = createCollection(shardsCount, router);
>
> // getSolrDocument needs to be implemented to build the SolrInputDocument
> SolrInputDocument document = getSolrDocument(item);
>
> String id = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
> Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
> String shardName = slice.getName(); // shard1, shard2, ... etc
>
> //Helper methods from
> DocCollection createCollection(int nSlices, DocRouter router) {
>   List<DocRouter.Range> ranges = router.partitionRange(nSlices, router.fullRange());
>
>   Map<String, Slice> slices = new HashMap<>();
>   for (int i = 0; i < ranges.size(); i++) {
>     DocRouter.Range range = ranges.get(i);
>     Slice slice = new Slice("shard" + (i + 1), null, map("range", range));
>     slices.put(slice.getName(), slice);
>   }
>
>   DocCollection coll = new DocCollection("collection1", slices, null, router);
>   return coll;
> }
>
>
> public static Map map(Object... params) {
>   LinkedHashMap ret = new LinkedHashMap();
>   for (int i = 0; i < params.length; i += 2) {
>     Object o = ret.put(params[i], params[i+1]);
>     // TODO: handle multi-valued map?
>   }
>   return ret;
> }
>
>
> Mahmoud


Re: Reindex single shard on solr

2018-12-14 Thread Mahmoud Almokadem
Thanks Erick,

I got it from TestHashPartitioner.java

https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/test/org/apache/solr/cloud/TestHashPartitioner.java

Here is some sample code:

router = DocRouter.getDocRouter(CompositeIdRouter.NAME);
int shardsCount = 12;
solrCollection = createCollection(shardsCount, router);

// getSolrDocument needs to be implemented to build the SolrInputDocument
SolrInputDocument document = getSolrDocument(item);

String id = "1874f9aa-4cad-4839-a282-d624fe2c40c6";
Slice slice = router.getTargetSlice(id, document, null, null, solrCollection);
String shardName = slice.getName(); // shard1, shard2, ... etc
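
To illustrate the idea behind building slices from partitioned ranges, here
is a simplified sketch that splits the signed 32-bit hash space into equal
contiguous ranges and names them shard1..shardN. It is an assumption that
this matches any real collection: Solr may place the boundaries differently,
so for a live cluster the ranges should be read from the cluster state.
PartitionSketch, partition and shardFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {

    // Split [Integer.MIN_VALUE, Integer.MAX_VALUE] into n contiguous
    // {min, max} pairs, both bounds inclusive.
    public static List<int[]> partition(int n) {
        List<int[]> ranges = new ArrayList<>();
        long min = Integer.MIN_VALUE;
        long total = 1L << 32; // number of distinct 32-bit hash values
        for (int i = 0; i < n; i++) {
            long start = min + total * i / n;
            long end = min + total * (i + 1) / n - 1;
            ranges.add(new int[] {(int) start, (int) end});
        }
        return ranges;
    }

    // Return the 1-based shard name whose range contains the hash.
    public static String shardFor(int hash, List<int[]> ranges) {
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            if (hash >= r[0] && hash <= r[1]) {
                return "shard" + (i + 1);
            }
        }
        throw new IllegalStateException("no range covers hash " + hash);
    }

    public static void main(String[] args) {
        List<int[]> ranges = partition(12);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.printf("shard%d: %08x-%08x%n",
                    i + 1, ranges.get(i)[0], ranges.get(i)[1]);
        }
        System.out.println("hash 0 -> " + shardFor(0, ranges));
    }
}
```

With 12 equal ranges, a hash of 0 falls at the start of the seventh range,
since the ranges begin at the most negative value.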

//Helper methods from
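DocCollection createCollection(int nSlices, DocRouter router) {
  List<DocRouter.Range> ranges = router.partitionRange(nSlices, router.fullRange());

  Map<String, Slice> slices = new HashMap<>();
  for (int i = 0; i < ranges.size(); i++) {
    DocRouter.Range range = ranges.get(i);
    Slice slice = new Slice("shard" + (i + 1), null, map("range", range));
    slices.put(slice.getName(), slice);
  }

  DocCollection coll = new DocCollection("collection1", slices, null, router);
  return coll;
}


public static Map map(Object... params) {
  LinkedHashMap ret = new LinkedHashMap();
  for (int i = 0; i < params.length; i += 2) {
    Object o = ret.put(params[i], params[i+1]);
    // TODO: handle multi-valued map?
  }
  return ret;
}


Mahmoud

On Fri, Dec 14, 2018 at 7:06 PM Mahmoud Almokadem
wrote: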

> Thanks Erick,
>
> Do you know how to use this method, or do I need to dive into the code?
>
> I have document_id as a string uniqueKey and 12 shards.


Re: Reindex single shard on solr

2018-12-14 Thread Mahmoud Almokadem
Thanks Erick,

Do you know how to use this method, or do I need to dive into the code?

I have document_id as a string uniqueKey and 12 shards.

On Fri, Dec 14, 2018 at 5:58 PM Erick Erickson 
wrote:

> Sure. Of course you have to make sure you use the exact same hashing
> algorithm on the .
>
> See CompositeIdRouter.sliceHash
>
> Best,
> Erick


Re: Reindex single shard on solr

2018-12-14 Thread Erick Erickson
Sure. Of course you have to make sure you use the exact same hashing
algorithm on the .

See CompositeIdRouter.sliceHash

Best,
Erick
On Fri, Dec 14, 2018 at 3:36 AM Mahmoud Almokadem
 wrote:
>
> Hello,
>
> Some of the shards in my collection are corrupted. I have the full
> dataset in my database, and I'm using CompositeId for routing documents.
>
> Can I traverse the whole dataset and hash each document_id to determine
> which shard the document belongs to, so that I send only the documents for
> the affected shards instead of reindexing the whole dataset?
>
> Sincerely,
> Mahmoud


Reindex single shard on solr

2018-12-14 Thread Mahmoud Almokadem
Hello,

Some of the shards in my collection are corrupted. I have the full
dataset in my database, and I'm using CompositeId for routing documents.

Can I traverse the whole dataset and hash each document_id to determine
which shard the document belongs to, so that I send only the documents for
the affected shards instead of reindexing the whole dataset?

Sincerely,
Mahmoud