Hi,

Each document can be up to 10K. Most of it comes from a single field which
is both indexed and stored.
The data is stored uncompressed because compression would eat up too much CPU
considering the volume we have. We have around 30 fields in all.
We also need to compute some facets, collapse the documents forming the
result set, and be able to sort them on any field.
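For what it's worth, here is a minimal SolrJ sketch of the kind of query this
implies (faceting, collapsing via result grouping, and sorting). It uses a
later SolrJ API than what shipped at the time of this thread, the collection
URL and the field names (category, source_id, price) are placeholders, and
result grouping assumes a Solr version that actually ships it:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CollapsedFacetedQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point it at your own core/collection.
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build();

        SolrQuery q = new SolrQuery("*:*");

        // Facets on an assumed "category" field.
        q.setFacet(true);
        q.addFacetField("category");

        // Collapse the result set on an assumed "source_id" field using
        // result grouping; group.main=true keeps the response a flat doc list.
        q.set("group", "true");
        q.set("group.field", "source_id");
        q.set("group.main", "true");

        // Sort on any field, e.g. an assumed "price" field.
        q.setSort("price", SolrQuery.ORDER.asc);

        QueryResponse rsp = client.query(q);
        System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
        client.close();
    }
}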

Thx

On 2010-02-25, at 5:50 PM, Otis Gospodnetic wrote:

> It depends on many factors - how big those docs are (compare a tweet to a
> news article to a book chapter), whether you store the data or just index
> it, whether you compress it, how and how much you analyze the data, etc.
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> ----- Original Message ----
>> From: Jean-Sebastien Vachon <js.vac...@videotron.ca>
>> To: solr-user@lucene.apache.org
>> Sent: Wed, February 24, 2010 8:57:21 AM
>> Subject: Index size
>> 
>> Hi All,
>> 
>> I'm currently looking at integrating Solr and I'd like to have some hints
>> on the size of the index (number of documents) I could possibly host on a
>> double-quad server (16 cores) with 48 GB of RAM running Linux.
>> Basically, I need to determine how many of these servers would be required
>> to host about half a billion documents. Should I set up multiple Solr
>> instances (in virtual machines or not), or should I run a single instance
>> (with multicore or not) using all available memory as the cache?
>> 
>> I also made some tests with sharding on this same server and I could not
>> see any improvement (at least not with 4.5 million documents). Should all
>> the shards be hosted on different servers? I shall try with more documents
>> in the following days.
>> 
>> Thx 
> 
