All our queries return the total count as well, and on average a query matches about 10% of the total documents. The index I am talking about has around 13 million documents, so roughly 1.3 million documents match an average query. Of course the matches won't all be overlapping, so I am guessing that around 30-50% of the documents get matched by the daily queries.
I tried hard to find out whether you can tell Solr to stop searching after a certain count - I don't mean the number of rows, but something like MySQL's LIMIT - so that it doesn't have to spend time calculating the total count when it is only returning a few rows to the UI. We are OK with showing the count as "1000+" (if it's more than 1000), but I couldn't find any way to do this. A rough sketch of the kind of workaround I have in mind at the Lucene level is included below.
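The usual trick at the raw Lucene level seems to be a Collector that throws an exception once it has seen N matches, with the caller catching it. This is a minimal, untested sketch against the Lucene 3.x Collector API - the class and exception names are mine, not anything that ships with Lucene, and wiring it into Solr itself would still need a custom search component:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Wraps a real collector and aborts the search after maxHits matches,
// so the count can be reported as "maxHits+" instead of being computed
// exactly across all the matching documents.
public class CappedCollector extends Collector {

    // Unchecked so it can escape collect(); caught by the caller.
    public static final class CapExceededException extends RuntimeException {}

    private final Collector delegate;
    private final int maxHits;
    private int hitCount;

    public CappedCollector(Collector delegate, int maxHits) {
        this.delegate = delegate;
        this.maxHits = maxHits;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }

    @Override
    public void collect(int doc) throws IOException {
        if (++hitCount > maxHits) {
            throw new CapExceededException(); // we know it's "maxHits+"; stop here
        }
        delegate.collect(doc);
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        delegate.setNextReader(reader, docBase);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}

Used roughly like this (searcher and query assumed to exist already):

TopScoreDocCollector top = TopScoreDocCollector.create(20, true);
try {
    searcher.search(query, new CappedCollector(top, 1000));
} catch (CappedCollector.CapExceededException e) {
    // more than 1000 matches - display the count as "1000+"
}
TopDocs firstPage = top.topDocs();

The obvious catch: once the search bails out early, the collected docs are only the best of the first 1000 in index order rather than the global top hits, so this is only fine if approximate ranking is acceptable for these queries.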
On Sat, Feb 5, 2011 at 7:45 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com>
wrote:

> Heh, I'm not sure if this is valid thinking. :)
>
> By *matching* doc distribution I meant: what proportion of your millions of
> documents actually ever get matched and then how many of those make it to
> the UI.
> If you have 1000 queries in a day and they all end up matching only 3 of
> your docs, the system will need less RAM than a system where 1000 queries
> match 50000 different docs.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
> > From: Salman Akram <salman.ak...@northbaysolutions.net>
> > To: solr-user@lucene.apache.org
> > Sent: Fri, February 4, 2011 3:38:55 PM
> > Subject: Re: Performance optimization of Proximity/Wildcard searches
> >
> > Well I assume many people out there would have indexes larger than 100GB
> > and I don't think so normally you will have more RAM than 32GB or 64!
> >
> > As I mentioned the queries are mostly phrase, proximity, wildcard and
> > combination of these.
> >
> > What exactly do you mean by distribution of documents? On this index our
> > documents are not more than few hundred KB's on average (file system
> > size) and there are around 14 million documents. 80% of the index size
> > is taken up by position file. I am not sure if this is what you asked?
> >
> > On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com> wrote:
> >
> > > Hi,
> > >
> > > > Sharding is an option too but that too comes with limitations so
> > > > want to keep that as a last resort but I think there must be other
> > > > things coz 150GB is not too big for one drive/server with 32GB Ram.
> > >
> > > Hmm.... what makes you think 32 GB is enough for your 150 GB index?
> > > It depends on queries and distribution of matching documents, for
> > > example. What's yours like?
> > >
> > > Otis
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > >
> > >
> > >
> > > ----- Original Message ----
> > > > From: Salman Akram <salman.ak...@northbaysolutions.net>
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Tue, January 25, 2011 4:20:34 AM
> > > > Subject: Performance optimization of Proximity/Wildcard searches
> > > >
> > > > Hi,
> > > >
> > > > I am facing performance issues in three types of queries (and their
> > > > combination). Some of the queries take more than 2-3 mins. Index
> > > > size is around 150GB.
> > > >
> > > > - Wildcard
> > > > - Proximity
> > > > - Phrases (with common words)
> > > >
> > > > I know CommonGrams and Stop words are a good way to resolve such
> > > > issues but they don't fulfill our functional requirements (Common
> > > > Grams seem to have issues with phrase proximity, stop words have
> > > > issues with exact match etc).
> > > >
> > > > Sharding is an option too but that too comes with limitations so
> > > > want to keep that as a last resort but I think there must be other
> > > > things coz 150GB is not too big for one drive/server with 32GB Ram.
> > > >
> > > > Cache warming is a good option too but the index get updated every
> > > > hour so not sure how much would that help.
> > > >
> > > > What are the other main tips that can help in performance
> > > > optimization of the above queries?
> > > >
> > > > Thanks
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Salman Akram
> >
> > --
> > Regards,
> >
> > Salman Akram
>

--
Regards,

Salman Akram
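P.S. Since CommonGrams came up again in the quoted thread: for anyone who wants to see what it actually does to the token stream, here is a small self-contained sketch using Lucene's CommonGramsFilter. It is untested and written against a recent analyzers-common API (packages and constructors have moved around between versions), and the field name and common-words list are made up for illustration. In Solr the same behaviour is configured in schema.xml via solr.CommonGramsFilterFactory (index side) and solr.CommonGramsQueryFilterFactory (query side).

import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.commongrams.CommonGramsFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CommonGramsDemo {
    public static void main(String[] args) throws Exception {
        // The handful of very frequent words that make phrase/proximity
        // queries crawl, because their position lists are enormous.
        final CharArraySet commonWords =
                new CharArraySet(Arrays.asList("the", "of", "and"), true);

        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                // Emits bigrams like "the_house" alongside the plain terms,
                // so a phrase query can hit the rare bigram instead of
                // scanning the huge postings for "the".
                TokenStream sink = new CommonGramsFilter(source, commonWords);
                return new TokenStreamComponents(source, sink);
            }
        };

        // Should print roughly: the, the_house, house, house_of, of, of_cards, cards
        try (TokenStream ts = analyzer.tokenStream("body", "the house of cards")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term);
            }
            ts.end();
        }
    }
}

The point of the bigrams is that a phrase query like "house of cards" can match the comparatively rare "house_of"/"of_cards" terms instead of walking the enormous position list for "of" - which is exactly the 80%-of-index position-file problem mentioned in the quoted thread above - though, as I said, in our tests it interacts badly with sloppy phrase (proximity) queries.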