In addition to Shawn's comments...

bq: we're close to beta release, so I can't upgrade right now
WHOAAAA! You say you're close to release but you haven't successfully crawled the data even once? Upgrading to 4.5.1 is a trivial risk compared to that statement! This is setting itself up for a really rocky launch.

Frankly, I'd use independent clients running SolrJ to parse the files on the client side (and perhaps run a bunch of clients). You can use Tika, exactly what's used in Solr. Plus you offload moving 1T of data across the wire. Plus you relieve your (single?) Solr node from doing all the work.

See: http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Oct 29, 2013 at 1:19 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 10/29/2013 10:44 AM, eShard wrote:
>> Offhand, how do I control how much of the index is held in RAM?
>> Can you point me in the right direction?
>
> This is handled automatically by the operating system. For quite some
> time, Solr (Lucene) has by default used the MMap functionality provided
> by all modern operating systems to access the index files. The OS
> transparently handles caching with any available RAM, no configuration
> or limits required. If the memory is needed for other purposes, the OS
> gives it up and the cache gets smaller.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Thanks,
> Shawn
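To make the "bunch of clients" idea concrete, here is a minimal sketch of how you might fan the crawl list out across N independent indexing clients. Only the batch-splitting logic is real code; the Tika parsing and SolrJ calls are shown as comments because they need a running Solr node and the solr-solrj/tika jars, and all names there (e.g. `solrClient`) are hypothetical placeholders, not anything from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelIndexSketch {

    // Round-robin the crawl list into one batch per client, so each
    // independent SolrJ client parses and indexes only its own share.
    static List<List<String>> partition(List<String> files, int clients) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < clients; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            batches.get(i % clients).add(files.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.pdf", "b.doc", "c.html", "d.txt", "e.xml");
        List<List<String>> batches = partition(files, 2);

        // Each batch would then be handed to its own client process, which
        // would do roughly (sketch only, requires solr-solrj and tika jars):
        //   - parse the file with Tika's AutoDetectParser to get plain text
        //   - build a SolrInputDocument and addField("text", parsedText)
        //   - send it via an HttpSolrClient pointed at the Solr node
        System.out.println(batches);
        // → [[a.pdf, c.html, e.xml], [b.doc, d.txt]]
    }
}
```

The point of the split is the one made above: the heavy parsing work (and the 1T of raw bytes) stays on the client machines, and the Solr node only receives the already-extracted documents.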