OK, I see what Erick meant now. Thanks.

The original index I'm working on contains about 120k documents. Since I have 
no access to the code that pushes documents into the index, I made four copies 
of the same index.

The master node contains no data at all; it simply uses the data available in 
its four shards. Knowing that I have 1000 documents matching the keyword "java" 
on each shard, I was expecting to receive 4000 documents out of my sharded 
setup. Only a few documents are not accounted for (the result count is about 
3996, which is close but not exact).
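
As a rough illustration only, the distributed request and the expected total 
described above can be sketched as follows. The host names, port and /solr/select 
path are assumptions made for the example (the real ones are not given in this 
thread); only the "shards" parameter and the "java" keyword come from the setup 
described here.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; the actual hosts are not stated in this thread.
SHARDS = [f"shard{i}:8983/solr" for i in range(1, 5)]


def distributed_query_url(master, keyword, shards=SHARDS, rows=10):
    """Build a select URL asking `master` to fan the query out to `shards`."""
    params = {
        "q": keyword,
        # Comma-separated list of shard addresses, without the http:// scheme.
        "shards": ",".join(shards),
        "rows": rows,
    }
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"http://{master}/solr/select?{urlencode(params, safe=':/,')}"


def expected_total(per_shard_counts):
    """Naive expectation: the sum of the hit counts reported by each shard."""
    return sum(per_shard_counts)


url = distributed_query_url("master:8983", "java")
total = expected_total([1000, 1000, 1000, 1000])
```

With 1000 matches on each of the four shards, the naive expectation computed 
here is 4000, which is the number the observed count of about 3996 is being 
compared against.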

Right now the index is static, so there is no need for replication and the 
polling interval has no effect.
Later this week, I will configure replication and have the indexing process 
modified to distribute the documents to each shard using a simple ID modulo 4 
rule.
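
A minimal sketch of that ID modulo 4 rule, assuming the document IDs are plain 
integers (the thread does not say what the IDs look like). The function names 
are made up for the illustration and are not part of Solr.

```python
NUM_SHARDS = 4  # four shards, as in the setup described above


def shard_for(doc_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the index (0..num_shards-1) of the shard that should hold doc_id."""
    return doc_id % num_shards


def partition(doc_ids, num_shards: int = NUM_SHARDS):
    """Group document IDs into one list per shard using the modulo rule."""
    buckets = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        buckets[shard_for(doc_id, num_shards)].append(doc_id)
    return buckets


# Assumption for the example: sequential IDs covering about 120k documents.
buckets = partition(range(120_000))
sizes = [len(b) for b in buckets]
```

Under that assumption of sequential IDs, each shard ends up with 30000 
documents, and every document lives on exactly one shard.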

Were my expectations about the number of documents wrong?

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: January-15-13 9:21 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

He was referring to a master/slave setup, where a slave will poll the master 
periodically asking for index updates. That frequency is configured in 
solrconfig.xml on the slave.

So, you are saying that you have, say, 1m documents in your master index.
You then copy your index to four other boxes. At that point you have 1m 
documents on each of those four. Eventually, you'll delete some docs, so you'd 
have 250k on each. You're wondering why, before the deletes, you're not seeing 
1m docs on each of your instances.

Or are you wondering why you're not seeing 1m docs when you do a distributed 
query across all four of these boxes?

Is that correct? 

Upayavira

On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote:
> Hi Erick,
> 
> Thanks for your comments, but I am migrating an existing index (single
> instance) to a sharded setup and currently I have no access to the 
> code involved in the indexing process. That's why I made a simple 
> copy of the index on each shard.
> 
> In the end, the data will be distributed among all shards.
> 
> I was just curious to know why I did not get the expected number of 
> documents with my four shards.
> 
> Can you elaborate on this "polling interval" thing? I am pretty sure 
> I never heard about this...
> 
> Regards
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: January-15-13 8:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOlr 3.5 and sharding
> 
> You're confusing shards and slaves here. Shards are splitting a 
> logical index amongst N machines, where each machine contains a 
> portion of the index. In that setup, you have to configure the slaves 
> to know about the other shards, and the incoming query has to be 
> distributed amongst all the shards to find all the docs.
> 
> In your case, since you're really replicating (rather than sharding), 
> you only have to query _one_ slave, the query doesn't need to be distributed.
> 
> So pull all the sharding stuff out of your config files, put a load 
> balancer in front of your slaves and only send the request to one of 
> them would be the place I'd start.
> 
> Also, don't be at all surprised if the number of hits from the 
> _master_ (which you shouldn't be searching, BTW) is different than the 
> slaves, there's the polling interval to consider.
> 
> Best
> Erick
> 
> 
> On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon < 
> jean-sebastien.vac...@wantedanalytics.com> wrote:
> 
> > Hi,
> >
> > I'm setting up a small Solr setup consisting of 1 master node and 4 
> > shards. For now, all four shards contain the exact same data. When 
> > I perform a query on each individual shard for the word 'java', I 
> > receive the same number of docs (as expected). However, when I 
> > go through the master node using the shards parameter, the 
> > number of results is off by a few documents. There is 
> > nothing special in my setup, so I'm looking for hints on why I am 
> > getting this problem.
> >
> > Thanks
> >
> 
> -----
> No virus found in this message.
> Checked by AVG - www.avg.fr
> Version: 2013.0.2890 / Virus database: 2638/6032 - Date:
> 14/01/2013
