java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

java: file format elf64-x86-64

Including the -d64 switch.
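
If there is any doubt, the data model can also be checked from inside the
running JVM; a tiny sketch, relying on the HotSpot-specific
sun.arch.data.model system property (it may be absent on other VMs):

public class CheckJvmBits {
    public static void main(String[] args) {
        // "sun.arch.data.model" reports 32 or 64 on HotSpot; not guaranteed elsewhere.
        System.out.println("data model: " + System.getProperty("sun.arch.data.model"));
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
    }
}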


On 04.08.2011 14:40, Bob Sandiford wrote:
Dumb question time - you are using a 64 bit Java, and not a 32 bit Java?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Thursday, August 04, 2011 2:39 AM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Hi Shawn,

the 0.05 seconds search time at peak times (3 qps) is my target for
Solr.
The numbers for Solr are from Solr's statistics report page. So 39.5
seconds average per request is definitely too long and I have to change
to sharding.

For FAST system the numbers for the search dispatcher are:
       0.042 sec elapsed per normal search, on avg.
       0.053 sec average uncached normal search time (last 100 queries).
       99.898% of searches using < 1 sec
       99.999% of searches using < 3 sec
       0.000% of all requests timed out
       22454567.577 sec time up (that is 259 days)

Is there a report page for those numbers for Solr?

About the RAM: the 32GB RAM is physical for each VM and the 20GB is the
-Xmx for Java.
Yesterday I noticed that we are running out of heap during replication,
so I have to increase -Xmx to about 22g.

The reported 0.6 average requests per second seems right to me because
the Solr system isn't under full load yet. The FAST system is still
taking most of the load. I plan to switch completely to Solr after
sharding is up and running stably. So there will be an additional 3 qps
to Solr at peak times.

I don't know if a controlling master like FAST makes any sense for
Solr.
The small VMs with heartbeat and haproxy sound great; that must go on my
todo list.

But the biggest problem currently is how to configure the DIH to split
up the content across several indexers. Is there an indexing
distributor?
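
One workaround, since Solr 3.x has no built-in indexing distributor, is
to partition on the client side; a minimal SolrJ sketch, assuming the
uniqueKey field is called "id" and hashing it to pick one of several
masters (class names are from the SolrJ 3.x client and should be checked
against the installed version):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Minimal client-side "indexing distributor": hash the unique key and
// send each document to one of several shard masters.
public class ShardIndexer {
    private final SolrServer[] masters;

    public ShardIndexer(String... masterUrls) throws Exception {
        masters = new SolrServer[masterUrls.length];
        for (int i = 0; i < masterUrls.length; i++) {
            masters[i] = new CommonsHttpSolrServer(masterUrls[i]);
        }
    }

    public void add(SolrInputDocument doc) throws Exception {
        String id = doc.getFieldValue("id").toString();  // uniqueKey field (assumed name)
        // Mask the sign bit so the modulo result is never negative.
        int shard = (id.hashCode() & Integer.MAX_VALUE) % masters.length;
        masters[shard].add(doc);
    }

    public void commitAll() throws Exception {
        for (SolrServer s : masters) {
            s.commit();
        }
    }
}

The same modulo idea can also be pushed into the DIH configuration
itself, e.g. a WHERE clause on the numeric key modulo the number of
shards, so that each master only pulls its own slice from the database.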

Regards,
Bernd


On 03.08.2011 16:33, Shawn Heisey wrote:
Replies inline.

On 8/3/2011 2:24 AM, Bernd Fehling wrote:
To show that I am comparing apples and oranges, here is my previous FAST
Search setup:
- one master server (controlling, logging, search dispatcher)
- six index servers (4.25 million docs per server, 5 slices per index)
(searching and indexing at the same time, indexing once per week
during the weekend)
- each server has 4GB RAM, all servers are physical on separate
machines
- RAM usage controlled by the processes
- total of 25.5 million docs (mainly metadata) from 1500 databases
worldwide
- index size is about 67GB per indexer --> about 402GB total
- about 3 qps at peak times
- with an average search time of 0.05 seconds at peak times

An average query time of 50 milliseconds isn't too bad. If the number
from your Solr setup below (39.5) is the QTime, then Solr thinks it is
performing better, but Solr's QTime does not include absolutely
everything that has to happen. Do you by chance have 95th and 99th
percentile query times for either system?
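
One rough way to get such percentiles is to pull the QTime values out of
the Solr request log and compute them offline; a minimal sketch, assuming
the log lines carry QTime=<millis> as in the default request logging (the
log file is passed as an argument):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extract QTime values from a Solr request log and print percentiles.
public class QTimePercentiles {
    public static void main(String[] args) throws Exception {
        Pattern qtime = Pattern.compile("QTime=(\\d+)");
        List<Integer> times = new ArrayList<Integer>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = qtime.matcher(line);
            if (m.find()) {
                times.add(Integer.valueOf(m.group(1)));
            }
        }
        in.close();
        Collections.sort(times);
        System.out.println("queries: " + times.size());
        System.out.println("95th percentile (ms): " + percentile(times, 0.95));
        System.out.println("99th percentile (ms): " + percentile(times, 0.99));
    }

    private static int percentile(List<Integer> sorted, double p) {
        if (sorted.isEmpty()) return 0;
        int idx = (int) Math.ceil(p * sorted.size()) - 1;
        return sorted.get(Math.min(Math.max(idx, 0), sorted.size() - 1));
    }
}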

And here now is my current Solr setup:
- one master server (indexing only)
- two slave servers (search only), but only one is online; the second
is a fallback
- each server has 32GB RAM, all servers are virtual
(master on a separate physical machine, both slaves together on one
physical machine)
- RAM usage is currently 20GB for the Java heap
- total of 31 million docs (all metadata) from 2000 databases worldwide
- index size is 156GB total
- the search handler statistics report 0.6 average requests per second
- average time per request 39.5 (is that seconds?)
- building the index from scratch takes about 20 hours

I can't tell whether you mean that each physical host has 32GB or
each VM has 32GB. You want to be sure that you are not oversubscribing
your
memory. If you can get more memory in your machines, you really
should. Do you know whether that 0.6 seconds is most of the delay that
a user
sees when making a search request, or are there other things going on
that contribute more delay? In our webapp, the Solr request time is
usually small compared with everything else the server and the user's
browser are doing to render the results page. As much as I hate being
the
tall pole in the tent, I look forward to the day when the developers
can change that balance.

The good thing is I have the ability to compare a commercial product and
enterprise system with open source.

I started with my simple Solr setup because of KISS (keep it simple and
stupid).
Actually it is doing an excellent job as a single index on a single
virtual server.
But the average time per request should be reduced now; that's why I
started this discussion.
While searches with a smaller Solr index (3 million docs) showed that it
can keep up with FAST Search, it now shows that it's time to go with
sharding.
I think we are already far beyond the point of search performance
crossover.

What I hope to get with sharding:
- reduce time for building the index
- reduce average time per request

You will probably achieve both of these things by sharding,
especially if you have a lot of CPU cores available. Like mine, your
query volume is
very low, so the CPU cores are better utilized distributing the
search.

What I fear with sharding:
- I currently have master/slave; do I then have e.g. 3 masters and 3
slaves?
- the query changes because of sharding (is there a search
distributor?)
- how to distribute the content to the indexers with DIH on 3 servers?
- anything else to think about while changing to sharding?

I think sharding is probably a good idea for you, as long as you don't
lose redundancy. You can duplicate the FAST concept of a master server
in a Solr core with no index of its own. The solrconfig.xml for that
core needs to include the shards parameter. That core combined with
those shards will make up one complete index chain, and you need to have
at least two complete chains, running on separate physical hardware. A
load balancer will be critical. I use two small VMs on separate hosts
with heartbeat and haproxy for mine.
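
For illustration, the same fan-out can also be requested per query from
SolrJ instead of being baked into solrconfig.xml; a minimal sketch, with
the host names and the query as placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Query the aggregator core; the shards parameter tells it which shard
// cores to fan the request out to and merge results from.
public class DistributedSearch {
    public static void main(String[] args) throws Exception {
        // Aggregator core (e.g. the "no index" core described above);
        // the host names here are placeholders.
        SolrServer solr = new CommonsHttpSolrServer("http://search-lb:8983/solr");

        SolrQuery q = new SolrQuery("title:lucene");
        // Same effect as a default shards parameter in solrconfig.xml.
        q.set("shards", "shard1:8983/solr,shard2:8983/solr,shard3:8983/solr");
        q.setRows(10);

        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}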

Thanks,
Shawn
