It depends on many factors: how big those docs are (compare a tweet to a news article to a book chapter), whether you store the data or just index it, whether you compress it, how and how much you analyze the data, etc.
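To make that concrete, here is a rough back-of-envelope sketch. All numbers in it (bytes of index per document, usable cache per box) are hypothetical placeholders, not measurements; you would substitute figures from your own test index:

```python
# Back-of-envelope sizing sketch. The per-doc and per-server numbers
# below are illustrative assumptions only, not benchmarks.
def servers_needed(total_docs, index_bytes_per_doc, usable_bytes_per_server):
    """Estimate how many servers keep the whole index in OS cache."""
    total_index_bytes = total_docs * index_bytes_per_doc
    # Ceiling division: you can't run a fraction of a server.
    return -(-total_index_bytes // usable_bytes_per_server)

# Hypothetical: 500M docs, ~2 KB of index per doc,
# ~40 GB of a 48 GB box left for the page cache.
n = servers_needed(500_000_000, 2 * 1024, 40 * 1024**3)
print(n)  # -> 24 under these assumed numbers
```

The point of the exercise is the sensitivity: halve the per-document index size (e.g. by not storing large fields) and the server count roughly halves too, which is why the storage/analysis choices above matter so much.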
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Jean-Sebastien Vachon <js.vac...@videotron.ca>
> To: solr-user@lucene.apache.org
> Sent: Wed, February 24, 2010 8:57:21 AM
> Subject: Index size
>
> Hi All,
>
> I'm currently looking at integrating Solr and I'd like to have some hints on
> the size of the index (number of documents) I could possibly host on a server
> running a Double-Quad server (16 cores) with 48 GB of RAM running Linux.
> Basically, I need to determine how many of these servers would be required to
> host about half a billion documents. Should I set up multiple Solr instances
> (in Virtual Machines or not) or should I run a single instance (with
> multicores or not) using all available memory as the cache?
>
> I also made some tests with sharding on this same server and I could not see
> any improvement (at least not with 4.5 million documents). Should all the
> shards be hosted on different servers? I shall try with more documents in the
> following days.
>
> Thx