Re: Sharding configuration

Anca Kopetz Thu, 30 Oct 2014 03:35:02 -0700

Hi,

We did some tests with 4 shards / 4 different tomcat instances on the
same server and the average latency was smaller than the one when having
only one shard.
We tested also é shards on different servers and the performance results
were also worse.


It seems that the sharding does not make any difference for our index in
terms of latency gains.

Thanks for your response,
Anca

On 10/28/2014 08:44 PM, Ramkumar R. Aiyengar wrote:

As far as the second option goes, unless you are using a large amount of
memory and you reach a point where a JVM can't sensibly deal with a GC
load, having multiple JVMs wouldn't buy you much. With a 26GB index, you
probably haven't reached that point. There are also other shared resources
at an instance level like connection pools and ZK connections, but those
are tunable and you probably aren't pushing them as well (I would imagine
you are just trying to have only a handful of shards given that you aren't
sharded at all currently).

That leaves single vs multiple machines. Assuming the network isn't a
bottleneck, and given the same amount of resources overall (number of
cores, amount of memory, IO bandwidth times number of machines), it
shouldn't matter between the two. If you are procuring new hardware, I
would say buy more, smaller machines, but if you already have the hardware,
you could serve as much as possible off a machine before moving to a
second. There's nothing which limits the number of shards as long as the
underlying machine has the sufficient amount of parallelism.

Again, this advice is for a small number of shards, if you had a lot more
(hundreds) of shards and significant volume of requests, things start to
become a bit more fuzzy with other limits kicking in.
On 28 Oct 2014 09:26, "Anca Kopetz"<anca.kop...@kelkoo.com>  wrote:

Hi,

We have a SolrCloud configuration of 10 servers, no sharding, 20
millions of documents, the index has 26 GB.
As the number of documents has increased recently, the performance of
the cluster decreased.

We thought of sharding the index, in order to measure the latency. What
is the best approach ?
- to use shard splitting and have several sub-shards on the same server
and in the same tomcat instance
- having several shards on the same server but on different tomcat
instances
- having one shard on each server (for example 2 shards / 5 replicas on
10 servers)

What's the impact of these 3 configuration on performance ?

Thanks,
Anca

--

Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à
l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
destinataire de ce message, merci de le détruire et d'en avertir
l'expéditeur.


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.

Re: Sharding configuration

Reply via email to