Thanks Erick/Jagdish.
Just to give some background on my queries.
1. All my queries are unique. A query can be 'ipod' or 'ipod 8gb' (but
these count as distinct queries). There are about 1.2M queries in total.
So, I assume setting a high queryResultCache, queryResultWindowSize and
queryResultMaxDocsCached won't help.
Thanks Erick/Peter.
This is an offline process used by a relevancy engine built around
Solr. The engine computes boost scores for related keywords based on
clickstream data.
i.e.: say clickstream has: ipad=upc1,upc2,upc3
I query solr with keyword: ipad (to get 2000 documents) and then make
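The boost computation described above might be sketched roughly as below. The function name, the flat `weight`, and the data shapes are all hypothetical (the original message does not describe how the scores are actually derived); this only illustrates intersecting Solr results with clicked UPCs per keyword:

```python
def boost_scores(clickstream, solr_results, weight=1.0):
    """Hypothetical sketch: compute boosts for documents that both
    match a keyword in Solr and were clicked for that keyword.

    clickstream:  keyword -> set of clicked UPCs (e.g. "ipad" -> {"upc1", ...})
    solr_results: keyword -> ranked list of UPCs returned by a Solr query
    """
    boosts = {}
    for keyword, clicked in clickstream.items():
        docs = solr_results.get(keyword, [])
        # Only documents present in both the Solr result set and the
        # clickstream get a boost; the flat weight is a placeholder.
        boosts[keyword] = {upc: weight for upc in docs if upc in clicked}
    return boosts
```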
50M documents, depending on a bunch of things,
may not be unreasonable for a single node, only
testing will tell.
But the question I have is whether you should be
using standard Solr queries for this or building a custom
component that goes at the base Lucene index
and does the right thing. Or
solrconfig.xml has entries you can tweak for your use case. One of
them is queryResultWindowSize. You can try a value of 2000 and see if
it improves performance. Please make sure you have enough memory
allocated for the queryResultCache.
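For reference, these settings live in the `<query>` section of solrconfig.xml. A sketch of the relevant entries follows; the sizes shown are illustrative for this thread's scenario, not general recommendations:

```xml
<query>
  <!-- Fetch and cache result ids in windows of this size, so paging
       within the window does not re-execute the query -->
  <queryResultWindowSize>2000</queryResultWindowSize>

  <!-- Cache of ordered document-id lists keyed by (query, sort) -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>

  <!-- Upper bound on how many doc ids a single cache entry may hold -->
  <queryResultMaxDocsCached>2000</queryResultMaxDocsCached>
</query>
```

Note that with 1.2M unique queries (as described earlier in the thread), cache hit rates will likely be low regardless of these sizes.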
A combination of sharding and
Well, depending on how many docs get served
from the cache the time will vary. But this is
just ugly, if you can avoid this use-case it would
be a Good Thing.
Problem here is that each and every shard must
assemble the list of 2,000 documents (just ID and
sort criteria, usually score).
Then the
Hello Utkarsh,
This may or may not be relevant to your use-case, but the way we deal with
this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
(user selectable). We can then page the results, changing the start
parameter to return the next set. This allows us to 'retrieve'
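The paging approach described above just varies Solr's standard `start` and `rows` parameters. A small sketch that builds the sequence of select URLs for one query (the base URL and collection name are placeholders):

```python
from urllib.parse import urlencode

def page_urls(base, q, total, rows=100):
    """Yield Solr /select URLs that page through `total` results,
    `rows` documents at a time, by advancing the start parameter."""
    for start in range(0, total, rows):
        params = {"q": q, "start": start, "rows": rows, "wt": "json"}
        yield f"{base}/select?{urlencode(params)}"
```

For example, fetching 2,000 documents 500 at a time issues four requests with start=0, 500, 1000 and 1500.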
Hello,
I have a use case where I need to retrieve the top 2000 documents matching a
query.
What parameters (in the query, solrconfig, schema) should I look at to
improve this?
I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards; each
node has 30GB RAM, 8 vCPUs and a 7GB JVM heap.
I have
Also, I don't see consistent response times from Solr. I ran ab again and
got this:
ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500
"http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
Benchmarking x.amazonaws.com (be patient)
Completed 100