I'd like to thank you all for lending a hand with my SolrCloud query-time problem. By switching to a single-shard setup with replicas, I've reduced my query time to 18 msec. My full ingestion of 300k+ documents went down from 2 hours 50 minutes to 1 hour 40 minutes. Some code changes are also going in that should help a bit more. Big thanks to everyone who offered suggestions.
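The thread does not say exactly how the single-shard layout was created. One common way in SolrCloud is the Collections API `CREATE` action with `numShards=1`. The sketch below only builds the request URL. The base URL, collection name (`products`) and config-set name (`products_conf`) are hypothetical placeholders, and `single_shard_create_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def single_shard_create_url(base_url: str, name: str, replicas: int, config_name: str) -> str:
    """Build a (hypothetical) Collections API CREATE URL for one shard with N copies."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,                  # one shard: each query is answered by a single core
        "replicationFactor": replicas,   # copies of that shard (the leader counts as one)
        "collection.configName": config_name,  # config set assumed to be uploaded to ZooKeeper
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and names; adjust for a real cluster.
    print(single_shard_create_url("http://localhost:8983/solr", "products", 3, "products_conf"))
```

With a small index, every replica holds the whole index, so adding replicas spreads query load without adding the cross-shard merge step.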
On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> I suspect faceting is the issue here. The actual query you have shown
> seems to bring back a single document (or a single set of documents for
> a product):
> fq=id:(320403401)
>
> On the other hand, you are asking for 4 field facets:
> facet.field=q_virtualCategory_ss
> facet.field=q_brand_s
> facet.field=q_color_s
> facet.field=q_category_ss
> AND 2 range facets, both clustered/grouped:
> facet.range=daysSinceStart_i
> facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000)
>
> And for all facets you have asked to bring back ALL of the results:
> facet.limit=-1
>
> Plus, you are doing a complex sort:
> sort=popularity_i desc,popularity_i desc
>
> So, you are probably spending quite a bit of time counting (especially
> in a sharded setup) and then quite a bit more sending the response
> back.
>
> I would check the size of the result document (HTTP result) and see
> how large it is. Maybe you don't need all of the stuff that's coming
> back. I assume you are not actually querying Solr from the client's
> machine (that is, I hope it is inside your data centre close to your
> web server); otherwise I would say to look at automatic content
> compression as well to minimize on-wire document size.
>
> Finally, if your documents have many stored fields (stored="true" in
> schema.xml) but you only return small subsets of them during search,
> you could look into using the enableLazyFieldLoading flag in
> solrconfig.xml.
>
> Regards,
> Alex.
> P.S. As others said, you don't seem to have too many documents.
> Perhaps you want replication instead of sharding for improved
> performance.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)
>
>
> On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin
> <alexey_kozhemia...@epam.com> wrote:
> > Btw, "timing" for distributed requests is broken at the moment: it
> > doesn't combine values from requests to shards. I'm working on a patch.
> >
> > https://issues.apache.org/jira/browse/SOLR-3644
> >
> > -----Original Message-----
> > From: Jack Krupansky [mailto:j...@basetechnology.com]
> > Sent: Tuesday, February 04, 2014 22:00
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lowering query time
> >
> > Add the debug=true parameter to some test queries and look at the
> > "timing" section to see which search components are taking the time.
> > Traditionally, highlighting for large documents was a top culprit.
> >
> > Are you returning a lot of data or field values? Sometimes reducing the
> > amount of data processed can help. Any multivalued fields with lots of
> > values?
> >
> > -- Jack Krupansky
> >
> > -----Original Message-----
> > From: Joel Cohen
> > Sent: Tuesday, February 4, 2014 1:43 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lowering query time
> >
> > 1. We are faceting. I'm not a developer, so I'm not quite sure how
> > we're doing it. How can I measure?
> > 2. I'm not sure how we'd force this kind of document partitioning. I can
> > see how my shards are partitioned by looking at clusterstate.json in
> > ZooKeeper, but I don't have a clue how to get documents into specific
> > shards.
> >
> > Would I be better off with fewer shards, given the small size of my
> > indexes?
> >
> >
> > On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley <yo...@heliosearch.com>
> > wrote:
> >
> >> On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen <joel.co...@bluefly.com>
> >> wrote:
> >> > I'm trying to get the query time down to ~15 msec. Anyone have any
> >> > tuning recommendations?
> >>
> >> I guess it depends on what the slowest part of the query currently is.
> >> If you are faceting, it's often that.
> >>
> >> Also, it's often a big win if you can somehow partition documents such
> >> that requests can normally be serviced from a single shard.
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for
> >> solr
> >
> >
> > --
> >
> > joel cohen, senior system engineer
> >
> > e joel.co...@bluefly.com p 212.944.8000 x276 bluefly, inc. 42 w. 39th
> > st. new york, ny 10018 www.bluefly.com
> > <http://www.bluefly.com/?referer=autosig> | *fly since 2013...*
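To answer "How can I measure?" along the lines Jack suggested, the debug output can be ranked by component once it is fetched. The sketch below assumes the JSON response (`wt=json`) carries a `debug.timing` object with `prepare` and `process` phases, each listing per-component `time` values in milliseconds. That layout and the `SAMPLE` numbers are illustrative assumptions, not output from this thread, and `slowest_components` is a hypothetical helper. Per Alexey's note (SOLR-3644), timing for distributed requests did not combine per-shard values at the time, so figures from a sharded collection may be incomplete.

```python
import json
from typing import Any, Dict, List, Tuple


def slowest_components(response: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Flatten debug.timing into (phase, component, ms) rows, slowest first."""
    timing = response.get("debug", {}).get("timing", {})
    rows: List[Tuple[str, str, float]] = []
    for phase in ("prepare", "process"):
        for component, stats in timing.get(phase, {}).items():
            if component == "time":  # the phase total, not a search component
                continue
            rows.append((phase, component, float(stats.get("time", 0.0))))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical debug section shaped like a Solr wt=json response; numbers are made up.
SAMPLE = json.loads("""
{"debug": {"timing": {"time": 42.0,
  "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0}},
  "process": {"time": 41.0, "query": {"time": 6.0}, "facet": {"time": 33.0},
              "highlight": {"time": 0.0}, "debug": {"time": 2.0}}}}}
""")

if __name__ == "__main__":
    for phase, component, ms in slowest_components(SAMPLE):
        print(f"{phase:8} {component:10} {ms:6.1f} ms")
```

If `facet` dominates the `process` phase, that supports Alex's suggestion to drop unneeded `facet.field` entries and replace `facet.limit=-1` with a bounded limit.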