Facets and distributed search

Aleksander Stensby Mon, 04 Jan 2010 01:16:31 -0800

Hi everyone! I've posted a similar question earlier, but in a thread related
to facets in general, so I thought I'd repost it here as a separate thread.


I have a faceted search that is very fast when I execute the query on a
single Solr server, but is significantly slower when executed in a
distributed environment.
The setback seems to be in the sharding of our data, and that puzzles me a
little bit. I can't really see why Solr is so slow at doing this.

The scenario:
Let's say we have two servers (s1 and s2).
If I run the following query:
q=threadid:33&facet=true&facet.field=author&limit=-1&facet.mincount=0&rows=0
directly on either server, the response is lightning fast (<10ms).

So, in theory, I could query both servers directly, merge the results myself,
and get that done pretty fast.
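
A minimal sketch of that manual merge, in Python. It assumes each server
returns the author facet in Solr's default flat JSON layout (a list that
alternates term and count). The helper name merge_facet_counts is
hypothetical, and the sample counts are made up for illustration, not real
output from s1 or s2.

```python
from collections import Counter


def merge_facet_counts(*shard_facets):
    """Sum per-term counts from several flat [term, count, ...] lists."""
    totals = Counter()
    for flat in shard_facets:
        # Pair up the alternating term/count entries of one shard.
        for term, count in zip(flat[0::2], flat[1::2]):
            totals[term] += count
    # Sort by descending count, then by term, to get a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative data only (assumed shape of facet_fields["author"]).
s1_authors = ["alice", 3, "bob", 1]
s2_authors = ["bob", 2, "carol", 0]
merged = merge_facet_counts(s1_authors, s2_authors)
```

Summing is only exact here because every term is requested from every
server; with a capped facet limit, terms missing from one server's list
would be undercounted.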

But if I introduce the shards parameter, the response time balloons to
between 15000ms and 20000ms!
shards=s1:8983/solr,s2:8983/solr
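
To make the comparison concrete, here is a small Python sketch that builds
the two request URLs being compared. It assumes the standard /select handler
and plain HTTP. The select_url helper is hypothetical, and the parameters are
copied unchanged from the query above.

```python
from urllib.parse import urlencode

# Parameters exactly as used in the query above.
BASE_PARAMS = {
    "q": "threadid:33",
    "facet": "true",
    "facet.field": "author",
    "limit": "-1",
    "facet.mincount": "0",
    "rows": "0",
}


def select_url(host, shards=None):
    """Build a /select URL for host, optionally adding a shards parameter."""
    params = dict(BASE_PARAMS)
    if shards:
        params["shards"] = ",".join(shards)
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return "http://" + host + "/select?" + urlencode(params, safe=":/,")


direct = select_url("s1:8983/solr")
distributed = select_url("s1:8983/solr", ["s1:8983/solr", "s2:8983/solr"])
```

Timing a fetch of each URL (for example with urllib.request.urlopen) should
reproduce the difference described here.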

My initial thought is that I MUST be doing something wrong here.

So I tried the following:
Run the query on server s1, with the shards parameter set to
shards=s1:8983/solr. The response time goes from under 10ms to between
5000ms and 10000ms!
I get the same results if I run the query on s2, and the same if I use
shards=s2:8983/solr.

Is there really that much overhead in running a distributed facet field
query with Solr? Has anyone else experienced this?

On the other hand, running regular distributed queries without facets is
lightning fast, so I can't really see that this is a network problem or
anything like that. As mentioned, a facet query on s1 with s1 itself as the
shards parameter is just as slow as when the shards parameter points to a
different server.

Any insight into this would be greatly appreciated! (Would like to avoid
having to hack together our own solution concatenating results...)

Cheers,
 Aleks
