For what it's worth, Google has done some pretty interesting research into 
coping with the idea that particular shards might very well be busy doing 
something else when your query comes in.

Check out this slide deck: http://research.google.com/people/jeff/latency.html
There are lots of interesting ideas in it, but in particular, around slide 39
Jeff Dean talks about "backup requests": you wait for something like your
typical response time and then issue a second request to a different replica of
the same shard. You take whichever answer you get first and cancel the other.
The initial wait plus the cancellation means your extra cluster load is minimal,
and you still get the benefit of reducing your p95+ response times when the
first request was slow because of something unrelated to the query (say, GC).
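
To make the backup-request idea concrete, here is a minimal Java sketch. It is
only an illustration under my own assumptions: BackupRequestSketch, withBackup
and the two supplier parameters are made-up names, not anything that exists in
Solr or SolrJ, and each "request" is simply whatever returns a
CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Solr or SolrJ.
final class BackupRequestSketch {

    /**
     * Sends the primary request at once and a backup request only if the
     * primary is still outstanding after backupDelayMs. The returned future
     * completes with whichever answer arrives first; the loser is cancelled.
     */
    static <T> CompletableFuture<T> withBackup(
            Supplier<CompletableFuture<T>> primary,
            Supplier<CompletableFuture<T>> backup,
            long backupDelayMs,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();

        // Send the primary request immediately.
        CompletableFuture<T> first = primary.get();
        first.whenComplete((value, error) -> settle(result, failures, value, error));

        // Send the backup only if nothing has answered by the time the delay expires.
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            if (result.isDone()) {
                return;
            }
            CompletableFuture<T> second = backup.get();
            second.whenComplete((value, error) -> settle(result, failures, value, error));
            result.whenComplete((value, error) -> second.cancel(true));
        }, backupDelayMs, TimeUnit.MILLISECONDS);

        // Once there is an outcome, drop the timer and cancel the primary.
        // Note: cancelling a CompletableFuture only marks it as cancelled; it
        // does not stop work on the server unless the transport supports that.
        result.whenComplete((value, error) -> {
            timer.cancel(false);
            first.cancel(true);
        });
        return result;
    }

    // First success wins; the result fails only after both requests have failed.
    private static <T> void settle(CompletableFuture<T> result, AtomicInteger failures,
                                   T value, Throwable error) {
        if (error == null) {
            result.complete(value);
        } else if (failures.incrementAndGet() == 2) {
            result.completeExceptionally(error);
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for a replica that is stuck (e.g. in a GC pause) and never answers.
        CompletableFuture<String> stalled = new CompletableFuture<>();
        String answer = withBackup(
                () -> stalled,
                () -> CompletableFuture.completedFuture("answer from backup replica"),
                50, scheduler).join();
        System.out.println(answer);
        scheduler.shutdown();
    }
}
```

The delay is the knob that matters: set it near your typical response time and
only the slow tail of requests ever triggers a second query.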

Of course, a central principle of this approach is being able to cancel a query
and have it stop consuming resources. I'd love to be corrected, but I don't
think Solr allows this. You can stop waiting for a response, but even the
timeAllowed param doesn't seem to stop resource usage after the allotted time.
That means a few exceptionally long-running queries can take out your
high-throughput cluster by tying up entire CPUs for long periods.
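
To show what "cancel and stop consuming resources" would have to look like, here
is a minimal Java sketch of cooperative cancellation. It is not Solr code and
says nothing about how Solr searches internally: CancellableScanSketch and scan
are hypothetical names, and the loop is just a stand-in for an expensive query.
The point is that the worker itself has to check for cancellation, otherwise
the caller merely stops waiting while the CPU stays busy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a stand-in for a long-running query, not Solr internals.
final class CancellableScanSketch {

    /** Scans totalDocs "documents", giving up promptly if the thread is interrupted. */
    static long scan(long totalDocs, AtomicLong progress) throws InterruptedException {
        long hits = 0;
        for (long doc = 0; doc < totalDocs; doc++) {
            // Check for cancellation every 1024 documents so the check stays cheap.
            if ((doc & 1023) == 0 && Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("cancelled at doc " + doc);
            }
            if (doc % 7 == 0) {
                hits++; // pretend every seventh document matches
            }
            progress.set(doc + 1);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicLong progress = new AtomicLong();
        Callable<Long> task = () -> scan(Long.MAX_VALUE, progress);
        Future<Long> query = pool.submit(task);

        Thread.sleep(100);      // let the "query" run for a while
        query.cancel(true);     // interrupts the worker thread
        pool.shutdown();
        boolean stopped = pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("worker stopped: " + stopped
                + ", documents scanned: " + progress.get());
    }
}
```

Without the interrupt check inside the loop, cancel(true) would still return
straight away, but the worker would keep a core busy until the scan finished,
which is the behaviour described above.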

Let me know the JIRA number; I'd love to see work in this area.


-----Original Message-----
From: Phil Hoy [mailto:p...@brightsolid.com] 
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks. I have read the blogs you cited and found them very interesting, and we
have tuned the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup: we index a lot of small documents
on servers with SSDs and 128 GB of RAM, in a sharded setup with replicas, and
our queries rely heavily on query filters and faceting, with minimal
free-text-style searching. For that reason we rely heavily on the filter cache
to improve query latency, and therefore we assign a large percentage of the
available RAM to the JVM hosting Solr.

Anyhow, we are happy with the current configuration and performance profile,
aside from the odd GC pause, that is. As we have index replicas, it seems to me
that we should be able to cope, hence my willingness to tweak how the load
balancer behaves.

Thanks,
Phil



-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's 
a great place to start:

http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to
multiple nodes in your cluster, you'll be effectively doubling (at least) the
load on your system, perhaps leading to more memory issues in a "non-virtuous
cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy <p...@brightsolid.com> wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query
> latency in the face of long GC pauses and the odd time-consuming query that
> we need to be able to support. At the moment, setting the socket timeout via
> the HttpShardHandlerFactory does help, but of course it cannot be set any
> lower than the duration of the most time-consuming query we are likely to
> receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently 
> to all/some replicas and only keeps the first response might be effective. Or 
> maybe a load balancer which takes account of the frequency of timeouts would 
> be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having
> to hack Solr directly, I would need to be able to make the existing
> LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension. I
> can then override the default load balancer using Solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more 
> pluggable, is this something that would be acceptable and if so what do I do 
> next?
>
> Phil
