On 3/20/2015 10:08 PM, Jack Krupansky wrote:
> 1. With 1000 fields, you may only get 10 to 25 million rows per node. So, a
> single date may take 15 to 50 nodes.
> 2. How many of the fields need to be indexed for reference in a query?
> 3. Are all the fields populated for each row?
> 4. Maybe you could split each row, so that one Solr collection would have a
> slice of the fields. Then separate Solr clusters could be used for each of
> the slices.
> 
> -- Jack Krupansky
> 
> On Fri, Mar 20, 2015 at 7:12 AM, varun sharma <mechanism_...@yahoo.co.in>
> wrote:
> 
>> Requirements of the system that we are trying to build: for each date
>> we need to create a SOLR index containing about 350-500 million
>> documents, where each document is a single structured record having
>> about 1000 fields. Then we query based on index keys & date; for
>> instance, we will try to search records related to a particular user
>> where the date is between Jan-1-2015 and Jan-31-2015. This query
>> should load only the indexes within this date range into memory and
>> return the rows corresponding to the search pattern. Please suggest
>> how this can be implemented using SOLR/Lucene. Thank you, Varun.
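
Before getting to the scale problem: the query side itself is the easy
part.  A minimal sketch, assuming a user key field named user_id and a
date field named event_date (both names are placeholders, not anything
from your actual schema):

  # filter to one user and to January 2015 (upper bound exclusive)
  curl -G 'http://localhost:8983/solr/yourcollection/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'fq=user_id:12345' \
    --data-urlencode 'fq=event_date:[2015-01-01T00:00:00Z TO 2015-02-01T00:00:00Z}' \
    --data-urlencode 'rows=100'

Filter queries like these are cached and cheap to reuse.  The real
difficulty here is the data volume, not the query syntax.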

If you literally have 350-500 million documents for every single day in
your index, that's like the hamburger count at McDonald's ... billions
and billions.  With 1000 fields per document, the amount of disk space
required is going to be huge ... and if you care at all about
performance, you're going to need a lot of machines with a lot of memory.

Keeping that much hardware tamed will require SolrCloud.  Jack may be
right that you'll need to create entirely separate collections, each of
which would be sharded and replicated across multiple servers.  You
could also use a single collection with manual sharding, where a new
shard is created every few hours to keep the document count in each
shard low.  I'm not sure which approach would give the best results.
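
Either way, the Collections API does the heavy lifting.  A rough sketch
of the collection-per-day idea (the names, shard count, replication
factor, and config name are made up for illustration):

  # create a collection for one day, 12 shards with 2 replicas each
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=records_20150101&numShards=12&replicationFactor=2&collection.configName=recordsconf'

  # a date-range query then only has to touch the collections in range
  curl -G 'http://localhost:8983/solr/records_20150101/select' \
    --data-urlencode 'collection=records_20150101,records_20150102,records_20150103' \
    --data-urlencode 'q=user_id:12345' \
    --data-urlencode 'rows=100'

The manual-sharding variant would instead use router.name=implicit at
CREATE time and CREATESHARD to add a shard for each new time slice.
The appeal of either layout is that only the slices covering the
requested date range are queried, and therefore only those indexes get
pulled into memory, which is what you described wanting.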

Thanks,
Shawn
