In addition to Shawn's comments...

bq: we're close to beta release, so I can't upgrade right now
WHOAAAA! You say you're close to release but you haven't successfully crawled the data even once? Upgrading to 4.5.1 is a trivial risk compared to that statement! This is setting itself up for a really rocky launch.

Frankly, I'd use independent clients running SolrJ to parse the files on the client side (and perhaps run a bunch of clients). You can use Tika, exactly what's used in Solr. Plus you offload moving 1T of data across the wire. Plus you relieve your (single?) Solr node from doing all the work.

See: http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Oct 29, 2013 at 1:19 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 10/29/2013 10:44 AM, eShard wrote:
>> Offhand, how do I control how much of the index is held in RAM?
>> Can you point me in the right direction?
>
> This is handled automatically by the operating system. For quite some
> time, Solr (Lucene) has by default used the MMap functionality provided
> by all modern operating systems to access the index files. The OS
> transparently handles caching with any available RAM, no configuration
> or limits required. If the memory is needed for other purposes, the OS
> gives it up and the cache gets smaller.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Thanks,
> Shawn
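To make the "bunch of clients" idea concrete, here is a minimal sketch of how you might fan the crawl list out across N independent indexing clients. Only the batch-splitting logic is real code; the Tika parsing and SolrJ calls are shown as comments because they need a running Solr node and the solr-solrj/tika jars, and all names there (e.g. `solrClient`) are hypothetical placeholders, not anything from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelIndexSketch {

    // Round-robin the crawl list into one batch per client, so each
    // independent SolrJ client parses and indexes only its own share.
    static List<List<String>> partition(List<String> files, int clients) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < clients; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            batches.get(i % clients).add(files.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.pdf", "b.doc", "c.html", "d.txt", "e.xml");
        List<List<String>> batches = partition(files, 2);

        // Each batch would then be handed to its own client process, which
        // would do roughly (sketch only, requires solr-solrj and tika jars):
        //   - parse the file with Tika's AutoDetectParser to get plain text
        //   - build a SolrInputDocument and addField("text", parsedText)
        //   - send it via an HttpSolrClient pointed at the Solr node
        System.out.println(batches);
        // → [[a.pdf, c.html, e.xml], [b.doc, d.txt]]
    }
}
```

The point of the split is the one made above: the heavy parsing work (and the 1T of raw bytes) stays on the client machines, and the Solr node only receives the already-extracted documents.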