>> As far as I know, Solr will never produce a segment file larger than 2GB,
>> so this shouldn't be a problem.

Solr can easily create a file larger than 2GB; it depends on how much data 
you index and on your particular Solr configuration, including your 
ramBufferSizeMB, your mergeFactor, and whether you optimize.  For example, we 
index about a terabyte of full text and optimize our indexes, so we have a 300GB 
*.prx file.  If you really have a filesystem limit of 2GB, there is a parameter 
called maxMergeMB in Solr 3.1 that you can set.  Unfortunately, it is the 
maximum size of a segment that will be merged, rather than the maximum size of 
the resulting segment.  So if you have a mergeFactor of 10, you could probably 
set it somewhere around 2GB / 10 = 200MB.  To be cautious, you might want 
to set it to 100.

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
        <double name="maxMergeMB">200</double>
</mergePolicy>

In the flexible indexing branch/trunk there is a new merge policy and parameter 
that allows you to set the maximum size of the merged segment: 
https://issues.apache.org/jira/browse/LUCENE-854. 
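
For reference, here is a sketch of what that might look like in solrconfig.xml 
once it lands.  I'm assuming the parameter is exposed under the name 
maxMergedSegmentMB, following the issue; check the released version before 
relying on it:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <double name="maxMergedSegmentMB">1900</double>
</mergePolicy>

The value is set a bit under 2GB (2048MB) to leave some headroom, since it is a 
target size rather than a hard cap.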


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

-----Original Message-----
From: Juan Grande [mailto:juan.gra...@gmail.com] 
Sent: Friday, April 15, 2011 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: QUESTION: SOLR INDEX BIG FILE SIZES

Hi John,

> How can I split the file of the solr index into multiple files?

Actually, the index is organized in a set of files called segments. It's not
just a single file, unless you tell Solr to do so.

> That's because some file systems only support a maximum
> size for a single file; for example, some UNIX file systems only support
> a maximum of 2GB per file.

As far as I know, Solr will never produce a segment file larger than 2GB,
so this shouldn't be a problem.

> What is the recommended storage strategy for big solr index files?

I guess that it depends on the indexing/querying performance that you're
currently getting, the performance that you want, and what "big" exactly means
for you. If your index is so big that individual queries take too long,
sharding may be what you're looking for.
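
For example, once the index is split across two Solr instances, a distributed
query is just a normal request with a shards parameter listing the cores to
search (the host names below are placeholders):

http://host1:8983/solr/select?q=solr&shards=host1:8983/solr,host2:8983/solr

Each shard returns its matches and the coordinating node merges the results
before responding.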

To better understand the index format, you can see
http://lucene.apache.org/java/3_1_0/fileformats.html

Also, you can take a look at my blog (http://juanggrande.wordpress.com); in
my latest post I talk about segment merging.

Regards,

*Juan*


2011/4/15 JOHN JAIRO GÓMEZ LAVERDE <jjai...@hotmail.com>

>
> SOLR
> USER SUPPORT TEAM
>
> I have a question about the "maximum file size of the solr index",
> when I have a "lot of data in the solr index".
>
> - How can I split the file of the solr index into multiple files?
>
> That's because some file systems only support a maximum
> size for a single file; for example, some UNIX file systems only support
> a maximum of 2GB per file.
>
> - What is the recommended storage strategy for big solr index files?
>
> Thanks for the reply.
>
> JOHN JAIRO GÓMEZ LAVERDE
> Bogotá - Colombia - South America
