>> As far as I know, Solr will never arrive at a segment file greater than 2GB,
>> so this shouldn't be a problem.

Solr can easily create a file over 2GB. It just depends on how much data you index and on your particular Solr configuration, including your ramBufferSizeMB, your mergeFactor, and whether you optimize. For example, we index about a terabyte of full text and optimize our indexes, so we have a 300GB *.prx file.

If you really have a filesystem limit of 2GB, there is a parameter called maxMergeMB in Solr 3.1 that you can set. Unfortunately, it is the maximum size of a segment that will be merged rather than the maximum size of the resulting segment. So if you have a mergeFactor of 10, you could probably set it somewhere around (2GB / 10) ≈ 200MB, since merging 10 segments of up to 200MB each gives a segment of at most about 2GB. Just to be cautious, you might want to set it to 100.

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
  <double name="maxMergeMB">200</double>
</mergePolicy>

In the flexible indexing branch/trunk there is a new merge policy and parameter that allows you to set the maximum size of the merged segment: https://issues.apache.org/jira/browse/LUCENE-854.

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

-----Original Message-----
From: Juan Grande [mailto:juan.gra...@gmail.com]
Sent: Friday, April 15, 2011 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: QUESTION: SOLR INDEX BIG FILE SIZES

Hi John,

> How can I split the file of the Solr index into multiple files?

Actually, the index is organized in a set of files called segments. It's not just a single file, unless you tell Solr to do so.

> That's because some file systems only support a limited maximum size
> for a single file; for example, some UNIX file systems only support
> a maximum of 2GB per file.

As far as I know, Solr will never arrive at a segment file greater than 2GB, so this shouldn't be a problem.

> What is the recommended storage strategy for big Solr index files?

I guess that it depends on the indexing/querying performance that you're having, the performance that you want, and what "big" exactly means for you.
If your index is so big that individual queries take too long, sharding may be what you're looking for.

To better understand the index format, you can see http://lucene.apache.org/java/3_1_0/fileformats.html

Also, you can take a look at my blog (http://juanggrande.wordpress.com); in my last post I talk about segment merging.

Regards,

*Juan*

2011/4/15 JOHN JAIRO GÓMEZ LAVERDE <jjai...@hotmail.com>

>
> SOLR
> USER SUPPORT TEAM
>
> I have a question about the maximum file size of a Solr index
> when I have a lot of data in the index.
>
> - How can I split the file of the Solr index into multiple files?
>
> That's because some file systems only support a limited maximum size
> for a single file; for example, some UNIX file systems only support
> a maximum of 2GB per file.
>
> - What is the recommended storage strategy for big Solr index files?
>
> Thanks for the reply.
>
> JOHN JAIRO GÓMEZ LAVERDE
> Bogotá - Colombia - South America
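
For anyone who wants to redo the maxMergeMB estimate from the reply at the top of this thread with a different file size limit or mergeFactor, here is a minimal sketch of the arithmetic. It assumes the worst case, in which mergeFactor segments that are each just under maxMergeMB get merged into one segment, and it ignores any per-file overhead. The function name max_merge_mb and the safety parameter are hypothetical, made up for illustration; they are not part of Solr or Lucene.

```python
def max_merge_mb(file_limit_mb: float, merge_factor: int, safety: float = 1.0) -> int:
    """Estimate a maxMergeMB value (hypothetical helper, not a Solr API).

    Assumption: a merge combines at most `merge_factor` segments, each no
    larger than maxMergeMB, so the merged segment is at most
    merge_factor * maxMergeMB. `safety` (0 < safety <= 1) shrinks the
    target to leave headroom below the file system limit.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    # Floor division so the estimate never exceeds the budget.
    return int(file_limit_mb * safety // merge_factor)


if __name__ == "__main__":
    # 2GB limit expressed in MB, mergeFactor of 10.
    print(max_merge_mb(2048, 10))       # roughly the "around 200" figure
    print(max_merge_mb(2048, 10, 0.5))  # roughly the cautious "100" figure
```

The result is only an upper-bound estimate on the merged segment under the stated assumption; an optimize or a different merge policy may behave differently.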