Re: Improving performance to return 2000+ documents

2013-07-01 Thread Utkarsh Sengar
Thanks Erick/Jagdish. Just to give some background on my queries: 1. All my queries are unique. A query can be "ipod" or "ipod 8gb" (but each query is unique). There are about 1.2M queries in total. So I assume setting a high queryResultCache, queryResultWindowSize and queryResultMaxDocsCached won't help.

Re: Improving performance to return 2000+ documents

2013-06-30 Thread Utkarsh Sengar
Thanks Erick/Peter. This is an offline process, used by a relevancy engine implemented around Solr. The engine computes boost scores for related keywords based on clickstream data. I.e.: say the clickstream has: ipad=upc1,upc2,upc3. I query Solr with the keyword "ipad" (to get 2000 documents) and then make

Re: Improving performance to return 2000+ documents

2013-06-30 Thread Erick Erickson
50M documents, depending on a bunch of things, may not be unreasonable for a single node; only testing will tell. But the question I have is whether you should be using standard Solr queries for this or building a custom component that goes at the base Lucene index and does the right thing. Or

Re: Improving performance to return 2000+ documents

2013-06-30 Thread Jagdish Nomula
Solrconfig.xml has entries which you can tweak for your use case. One of them is queryResultWindowSize. You can try using a value of 2000 and see if it helps improve performance. Please make sure you have enough memory allocated for the queryResultCache. A combination of sharding and
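The settings mentioned above live in the `<query>` section of solrconfig.xml. A minimal sketch, with illustrative values only (the right sizes depend on your heap and query mix):

```xml
<!-- solrconfig.xml fragment: illustrative values, not recommendations -->
<query>
  <!-- caches ordered result sets; size = number of cached result sets -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  <!-- doc IDs are cached in windows of this size, so a request for
       rows=2000 can be served from a single cached window -->
  <queryResultWindowSize>2000</queryResultWindowSize>
  <!-- upper bound on documents cached for any single result entry -->
  <queryResultMaxDocsCached>2000</queryResultMaxDocsCached>
</query>
```

Note that, as Utkarsh points out later in the thread, these caches only pay off when queries repeat; with 1.2M unique queries the hit rate will be near zero.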

Re: Improving performance to return 2000+ documents

2013-06-29 Thread Erick Erickson
Well, depending on how many docs get served from the cache, the time will vary. But this is just ugly; if you can avoid this use-case it would be a Good Thing. The problem here is that each and every shard must assemble the list of 2,000 documents (just ID and sort criteria, usually score). Then the
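The coordinator-side merge Erick describes can be sketched as follows. This is an illustration of the cost model, not Solr's actual implementation: every shard ships its full top-`rows` list of (score, id) pairs, and the coordinator merges `shards * rows` candidates to pick the global top `rows`:

```python
import heapq
import random

def merge_shard_results(shard_results, rows):
    """Coordinator-side merge: each shard sends its top `rows`
    (score, doc_id) pairs; the coordinator merges all of them to
    pick the global top `rows`. Work grows with shards * rows,
    which is why deep rows= values get expensive."""
    return heapq.nlargest(
        rows, (hit for shard in shard_results for hit in shard)
    )

# Simulate 3 shards, each returning its local top 2000 hits.
random.seed(0)
shards = [
    sorted(((random.random(), f"s{s}_doc{i}") for i in range(10000)),
           reverse=True)[:2000]
    for s in range(3)
]
top = merge_shard_results(shards, 2000)
assert len(top) == 2000
```

With 3 shards and rows=2000 the coordinator sorts through 6,000 candidates per query, before it even starts fetching the stored fields for the 2,000 winners.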

Re: Improving performance to return 2000+ documents

2013-06-29 Thread Peter Sturge
Hello Utkarsh, This may or may not be relevant for your use-case, but the way we deal with this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time (user selectable). We can then page the results, changing the start parameter to return the next set. This allows us to 'retrieve'

Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Hello, I have a use case where I need to retrieve the top 2000 documents matching a query. What are the parameters (in the query, solrconfig, schema) I should look at to improve this? I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards, with 30GB RAM, 8 vCPUs and a 7GB JVM heap per node. I have

Re: Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Also, I don't see consistent response times from Solr. I ran ab again and I get this: ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json" Benchmarking x.amazonaws.com (be patient) Completed 100