Re: When not to use NRTCachingDirectory and what to use instead.

2014-04-30 Thread Jeff Wartes


On 4/19/14, 6:51 AM, Ken Krugler kkrugler_li...@transpac.com wrote:

The code I see seems to be using an FSDirectory, or is there another
layer of wrapping going on here?

return new NRTCachingDirectory(FSDirectory.open(new File(path)),
maxMergeSizeMB, maxCachedMB);


I was also curious about this subject. Not enough to test anything, but
enough to look at the code too.

FSDirectory.open picks one of MMapDirectory, SimpleFSDirectory and
NIOFSDirectory in that order of preference based on what it thinks your
system will support.
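
A rough sketch of that selection order, as I read the 4.x source. This is a simplified model, not Lucene's code: the class DirectoryChoice and the method choose are hypothetical names, the checks are reduced to three booleans, and some 4.x releases also restrict the mmap branch to particular operating systems.

```java
// Simplified model of the choice FSDirectory.open makes in Lucene 4.x.
// DirectoryChoice and choose() are hypothetical; only the three returned
// class names come from Lucene.
final class DirectoryChoice {
    private DirectoryChoice() {}

    static String choose(boolean jreIs64Bit, boolean unmapSupported, boolean isWindows) {
        // Memory mapping is preferred when the JVM is 64-bit and can unmap buffers.
        if (jreIs64Bit && unmapSupported) {
            return "MMapDirectory";
        }
        // On Windows without mmap, fall back to the simple implementation.
        if (isWindows) {
            return "SimpleFSDirectory";
        }
        // Everything else gets the NIO-based implementation.
        return "NIOFSDirectory";
    }

    public static void main(String[] args) {
        boolean is64 = System.getProperty("os.arch", "").contains("64");
        boolean windows = System.getProperty("os.name", "").startsWith("Windows");
        // Assumption: unmapping is supported; the real check is JVM-specific.
        System.out.println(choose(is64, true, windows));
    }
}
```

So on a typical 64-bit server JVM the delegate inside NRTCachingDirectory would end up being MMapDirectory anyway.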

There's still the possibility that the added caching functionality slows
down bulk index operations, but setting that aside, it does look like
NRTCachingDirectoryFactory is almost always the best choice.
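
To make the caching concern a bit more concrete, here is a minimal model of the kind of size check NRTCachingDirectory applies before it keeps a newly written file in RAM. It is a sketch under assumptions, not Lucene's code: the class NrtCacheModel and its methods are hypothetical, and the 4 MB and 48 MB limits are only illustrative values for maxMergeSizeMB and maxCachedMB.

```java
// Minimal model of the "should this new file be cached in RAM?" decision.
// NrtCacheModel is a hypothetical class; the real directory also tracks
// file names and evicts cached files when they are synced.
final class NrtCacheModel {
    private static final long MB = 1024L * 1024L;

    private final long maxMergeSizeBytes;
    private final long maxCachedBytes;
    private long cachedBytes = 0;

    NrtCacheModel(double maxMergeSizeMB, double maxCachedMB) {
        this.maxMergeSizeBytes = (long) (maxMergeSizeMB * MB);
        this.maxCachedBytes = (long) (maxCachedMB * MB);
    }

    // A flush or merge is cached only if it is small on its own AND
    // still fits under the overall RAM budget.
    boolean wouldCache(long estimatedBytes) {
        return estimatedBytes <= maxMergeSizeBytes
            && estimatedBytes + cachedBytes <= maxCachedBytes;
    }

    // Record that a file of the given size now lives in the RAM cache.
    void recordCached(long bytes) {
        cachedBytes += bytes;
    }

    public static void main(String[] args) {
        NrtCacheModel model = new NrtCacheModel(4.0, 48.0);
        System.out.println(model.wouldCache(1 * MB));   // true: tiny NRT-style flush
        System.out.println(model.wouldCache(512 * MB)); // false: large batch segment
    }
}
```

If the real check behaves like this model, the large segments produced by batch indexing would go straight to the delegate directory, which suggests the wrapper adds little overhead in that case. That is a reading of the code, not a measurement.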



Re: When not to use NRTCachingDirectory and what to use instead.

2014-04-21 Thread Tom Burton-West
Hi Ken,

Given the comments, which seemed to describe using NRT for the opposite of
our use case, I just set our Solr 4 to use solr.MMapDirectoryFactory.
I didn't bother to test whether NRT would be better for our use case, mostly
because it didn't sound like there was an advantage and I've been focused
on other things relating to Solr 4. I'd love to hear any results from
someone who has a batch indexing use case and has tested
various xxxDirectoryFactory implementations. Please let me know your
results if you do end up doing some testing.

Tom


On Sat, Apr 19, 2014 at 9:51 AM, Ken Krugler kkrugler_li...@transpac.com wrote:


 Tom - did you ever get any useful results from testing here? I'm also
 curious about the impact of various xxxDirectoryFactory implementations for
 batch indexing.

 Thanks,

 -- Ken

 --
 Ken Krugler
 +1 530-210-6378
 http://www.scaleunlimited.com
 custom big data solutions & training
 Hadoop, Cascading, Cassandra & Solr








Re: When not to use NRTCachingDirectory and what to use instead.

2014-04-19 Thread Ken Krugler

On Jul 10, 2013, at 9:16am, Shawn Heisey s...@elyograg.org wrote:

 On 7/10/2013 9:59 AM, Tom Burton-West wrote:
 The Javadoc for NRTCachingDirectory (
 http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
  says:
 
  This class is likely only useful in a near real-time context, where
 indexing rate is lowish but reopen rate is highish, resulting in many tiny
 files being written...
 
 It seems like we have exactly the opposite use case, so we would like
 advice on what directory implementation to use instead.
 
 We are doing offline batch indexing, so no searches are being done.  So we
 don't need NRT.  We also have a high indexing rate as we are trying to
 index 3 billion pages as quickly as possible.
 
 I am not clear what determines the reopen rate.   Is it only related to
 searching or is it involved in indexing as well?
 
  Does the NRTCachingDirectory have any benefit for indexing under the use
 case noted above?
 
 I'm guessing we should just use solr.StandardDirectoryFactory instead.
  Is this correct?
 
 The NRT directory object in Solr uses the MMap implementation as its default 
 delegate.  

The code I see seems to be using an FSDirectory, or is there another layer of 
wrapping going on here?

return new NRTCachingDirectory(FSDirectory.open(new File(path)), 
maxMergeSizeMB, maxCachedMB);

 I would use MMapDirectoryFactory (the default for most of the 3.x releases) 
 for testing whether you can get any improvement from moving away from the 
 default.  The advantages of memory mapping are not something you'd want to 
 give up.

Tom - did you ever get any useful results from testing here? I'm also curious 
about the impact of various xxxDirectoryFactory implementations for batch 
indexing.

Thanks,

-- Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: When not to use NRTCachingDirectory and what to use instead.

2013-07-10 Thread Shawn Heisey

On 7/10/2013 9:59 AM, Tom Burton-West wrote:

The Javadoc for NRTCachingDirectory (
http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
  says:

  This class is likely only useful in a near real-time context, where
indexing rate is lowish but reopen rate is highish, resulting in many tiny
files being written...

It seems like we have exactly the opposite use case, so we would like
advice on what directory implementation to use instead.

We are doing offline batch indexing, so no searches are being done.  So we
don't need NRT.  We also have a high indexing rate as we are trying to
index 3 billion pages as quickly as possible.

I am not clear what determines the reopen rate.   Is it only related to
searching or is it involved in indexing as well?

  Does the NRTCachingDirectory have any benefit for indexing under the use
case noted above?

I'm guessing we should just use solr.StandardDirectoryFactory instead.
  Is this correct?


The NRT directory object in Solr uses the MMap implementation as its 
default delegate.  I would use MMapDirectoryFactory (the default for 
most of the 3.x releases) for testing whether you can get any 
improvement from moving away from the default.  The advantages of memory 
mapping are not something you'd want to give up.


Thanks,
Shawn