Re: throttle segment merging

2012-10-30 Thread Otis Gospodnetic
Hi Radim,

To address your comment about JIRA and search: perhaps this is better, and
it finds you:
http://search-lucene.com/?q=throttle+merge&fc_project=Solr

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html



On Mon, Oct 29, 2012 at 6:57 PM, Radim Kolar <h...@filez.com> wrote:
> On 29.10.2012 12:18, Michael McCandless wrote:
>
>> With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
>> (FSDirectory.setMaxMergeWriteMBPerSec): it rate-limits the max
>> bytes/sec load on the IO system due to merging.
>>
>> Not sure if it's been exposed in Solr / ElasticSearch yet ...
>
> It's not available in Solr. Also, Solr's class hierarchy for directory
> providers is a bit different from Lucene's. In Solr, MMapDirectoryFactory
> and NIOFSDirectoryFactory need to be subclasses of StandardDirectoryFactory;
> then add a write-limit property to StandardDirectoryFactory and it will be
> inherited by the others, like in Lucene.
>
> Solr:
> http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/core/CachingDirectoryFactory.html
> Lucene:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/FSDirectory.html


Re: throttle segment merging

2012-10-29 Thread Radim Kolar

On 29.10.2012 0:09, Lance Norskog wrote:

> 1) Do you use compound files (CFS)? This adds a lot of overhead to merging.

I do not know. What's the Solr configuration statement for turning them on/off?

> 2) Does ES use the same merge policy code as Solr?

ES rate limiting:

http://www.elasticsearch.org/guide/reference/index-modules/store.html
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/indices/store/IndicesStore.java
http://rajish.github.com/api/elasticsearch/0.20.0.Beta1-SNAPSHOT/org/apache/lucene/store/StoreRateLimiting.html




Re: throttle segment merging

2012-10-29 Thread Radim Kolar
Is there a JIRA ticket dedicated to throttling segment merges? I could not
find any, but JIRA search kinda sucks.

It should be ported from ES because it's not much code.


Re: throttle segment merging

2012-10-29 Thread Tomás Fernández Löbbe

>>> Is there a way to set up logging to output something when segment merging
>>> runs?
>>
>> I think segment merging is logged when you enable infoStream logging (you
>> should see it commented out in solrconfig.xml)
>
> No, segment merging is not logged at INFO level. It needs a customized log
> config.


INFO level is not the same as infoStream. See solrconfig.xml: there is a
commented-out section that talks about it, and if you uncomment it, it will
generate a file with low-level Lucene logging. This file will include
segment information, including merges.
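Once infoStream output is being written to a file, pulling out the merge-related lines is a plain text filter. The Java sketch below is hypothetical (`InfoStreamMergeFilter` is not a Lucene or Solr class) and assumes, without relying on Lucene's exact message format, that merge activity shows up on lines containing the word "merge" in any letter case; the caller supplies the file path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; assumes merge activity appears on lines mentioning "merge".
class InfoStreamMergeFilter {

    /** Keeps only the lines that mention "merge", ignoring letter case. */
    static List<String> mergeLines(Stream<String> lines) {
        return lines
                .filter(line -> line.toLowerCase(Locale.ROOT).contains("merge"))
                .collect(Collectors.toList());
    }

    /** Reads the infoStream file line by line and filters it. */
    static List<String> mergeLines(Path infoStreamFile) throws IOException {
        // try-with-resources closes the underlying file handle.
        try (Stream<String> lines = Files.lines(infoStreamFile)) {
            return mergeLines(lines);
        }
    }
}
```

Filtering a stream rather than loading the whole file keeps memory flat, which matters because infoStream output from a large index can grow quickly.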




>>> Can segment merges be throttled?
>>
>> You can change when and how segments are merged with the merge policy;
>> maybe changing the initial settings (mergeFactor, for example) is enough
>> for you?
>
> I am now researching Elasticsearch. It can do it, and it's based on Lucene 3.6.



I don't know if this is what you are looking for, but TieredMergePolicy
(the default) allows you to set the maximum number of segments to be merged
at once and the maximum size of segments created during normal merging.
Another option is, as you said, to create a JIRA issue for a new merge policy.
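Below is a deliberately simplified illustration of how those two limits bound a single merge. TieredMergePolicy's real selection scores candidate merges across size tiers and is more involved; the helper `pickMerge` is hypothetical. In Lucene's API the corresponding knobs are `TieredMergePolicy.setMaxMergeAtOnce` and `TieredMergePolicy.setMaxMergedSegmentMB`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; NOT TieredMergePolicy's actual selection algorithm.
class MergeSelectionSketch {

    /**
     * Picks segment sizes (in MB) to merge: smallest first, at most maxMergeAtOnce
     * of them, and never letting the merged size exceed maxMergedSegmentMB.
     */
    static List<Double> pickMerge(List<Double> segmentSizesMB, int maxMergeAtOnce, double maxMergedSegmentMB) {
        List<Double> sorted = new ArrayList<>(segmentSizesMB);
        sorted.sort(null); // natural (ascending) order
        List<Double> chosen = new ArrayList<>();
        double total = 0.0;
        for (double size : sorted) {
            // Stop when either limit would be violated.
            if (chosen.size() == maxMergeAtOnce || total + size > maxMergedSegmentMB) {
                break;
            }
            chosen.add(size);
            total += size;
        }
        // Merging fewer than two segments would not reduce the segment count.
        if (chosen.size() < 2) {
            return new ArrayList<>();
        }
        return chosen;
    }
}
```

Lowering either limit makes individual merges smaller, which spreads merge I/O into smaller bursts, at the cost of leaving more segments around.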

Tomás


Re: throttle segment merging

2012-10-29 Thread Michael McCandless
With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
(FSDirectory.setMaxMergeWriteMBPerSec): it rate-limits the max
bytes/sec load on the IO system due to merging.
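To make the idea concrete, here is a minimal, stdlib-only Java sketch of byte-rate throttling. It is not Lucene's implementation: the class `MergeWriteThrottle` and its methods are hypothetical. The merging code reports each chunk it writes, and the throttle returns how long to sleep so that the cumulative write rate stays at or below the configured MB/sec.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of merge write throttling; not Lucene's actual rate limiter code.
class MergeWriteThrottle {
    private final double bytesPerSec; // allowed write rate in bytes per second
    private long targetNanos;         // time by which the bytes written so far may finish

    MergeWriteThrottle(double mbPerSec, long startNanos) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be positive");
        }
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.targetNanos = startNanos;
    }

    /**
     * Records that {@code bytes} were just written at time {@code nowNanos} and returns
     * how many nanoseconds the caller should sleep to stay within the configured rate.
     */
    long pauseNanos(long bytes, long nowNanos) {
        // How long writing these bytes is allowed to take at the configured rate.
        long costNanos = (long) (bytes * 1_000_000_000.0 / bytesPerSec);
        // Never schedule from the past: idle time does not earn credit for bursts.
        targetNanos = Math.max(targetNanos, nowNanos) + costNanos;
        return Math.max(0L, targetNanos - nowNanos);
    }

    /** Same as pauseNanos, but reads the clock and actually sleeps. */
    void pause(long bytes) throws InterruptedException {
        long nanos = pauseNanos(bytes, System.nanoTime());
        if (nanos > 0) {
            TimeUnit.NANOSECONDS.sleep(nanos);
        }
    }
}
```

A merge thread would construct it with `System.nanoTime()` as the start time and call `pause(bytesJustWritten)` after each buffer it flushes. Advancing the schedule from the later of the previous target and the current time keeps the limit conservative: an idle period does not bank credit for a later burst.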

Not sure if it's been exposed in Solr / ElasticSearch yet ...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Oct 29, 2012 at 7:07 AM, Tomás Fernández Löbbe
<tomasflo...@gmail.com> wrote:

>>>> Is there a way to set up logging to output something when segment merging
>>>> runs?
>>>
>>> I think segment merging is logged when you enable infoStream logging (you
>>> should see it commented out in solrconfig.xml)
>>
>> No, segment merging is not logged at INFO level. It needs a customized log
>> config.
>
> INFO level is not the same as infoStream. See solrconfig.xml: there is a
> commented-out section that talks about it, and if you uncomment it, it will
> generate a file with low-level Lucene logging. This file will include
> segment information, including merges.
>
>>>> Can segment merges be throttled?
>>>
>>> You can change when and how segments are merged with the merge policy;
>>> maybe changing the initial settings (mergeFactor, for example) is enough
>>> for you?
>>
>> I am now researching Elasticsearch. It can do it, and it's based on Lucene 3.6.
>
> I don't know if this is what you are looking for, but TieredMergePolicy
> (the default) allows you to set the maximum number of segments to be merged
> at once and the maximum size of segments created during normal merging.
> Another option is, as you said, to create a JIRA issue for a new merge policy.
>
> Tomás


Re: throttle segment merging

2012-10-29 Thread Radim Kolar

On 29.10.2012 12:18, Michael McCandless wrote:

> With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
> (FSDirectory.setMaxMergeWriteMBPerSec): it rate-limits the max
> bytes/sec load on the IO system due to merging.
>
> Not sure if it's been exposed in Solr / ElasticSearch yet ...

It's not available in Solr. Also, Solr's class hierarchy for directory
providers is a bit different from Lucene's. In Solr, MMapDirectoryFactory
and NIOFSDirectoryFactory need to be subclasses of StandardDirectoryFactory;
then add a write-limit property to StandardDirectoryFactory and it will be
inherited by the others, like in Lucene.


Solr:
http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/core/CachingDirectoryFactory.html
Lucene:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/FSDirectory.html
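Radim's proposal is essentially a class-hierarchy change. The sketch below models it with plain Java classes; the names `StandardDirectoryFactorySketch`, `MMapDirectoryFactorySketch`, `NIOFSDirectoryFactorySketch` and the `maxWriteMBPerSec` parameter are hypothetical and are not Solr's actual classes or configuration keys. The point it shows is that a property parsed once in the base class is inherited by every subclass.

```java
import java.util.Map;

// Hypothetical model of the proposed hierarchy; these are NOT Solr's real factory classes.
class StandardDirectoryFactorySketch {
    // Optional write limit in MB/sec; null means "no throttling".
    private Double maxWriteMBPerSec;

    /** Reads the hypothetical "maxWriteMBPerSec" parameter from the factory's arguments. */
    void init(Map<String, String> args) {
        String value = args.get("maxWriteMBPerSec");
        maxWriteMBPerSec = (value == null) ? null : Double.valueOf(value);
    }

    Double getMaxWriteMBPerSec() {
        return maxWriteMBPerSec;
    }

    /** Names the kind of directory this factory would create. */
    String directoryKind() {
        return "FSDirectory";
    }
}

// Subclasses override only what differs; the write limit is inherited unchanged.
class MMapDirectoryFactorySketch extends StandardDirectoryFactorySketch {
    @Override
    String directoryKind() {
        return "MMapDirectory";
    }
}

class NIOFSDirectoryFactorySketch extends StandardDirectoryFactorySketch {
    @Override
    String directoryKind() {
        return "NIOFSDirectory";
    }
}
```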


Re: throttle segment merging

2012-10-28 Thread Lance Norskog
1) Do you use compound files (CFS)? This adds a lot of overhead to merging.
2) Does ES use the same merge policy code as Solr?

In solrconfig.xml, here are the lines that control segment merging. You can 
probably set mergeFactor to 20 and cut the amount of disk I/O.

<!-- Expert: Merge Policy
     The Merge Policy in Lucene controls how merging of segments is done.
     The default since Solr/Lucene 3.3 is TieredMergePolicy.
     The default since Lucene 2.3 was the LogByteSizeMergePolicy.
     Even older versions of Lucene used LogDocMergePolicy.
  -->
<!--
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
  -->

<!-- Merge Factor
     The merge factor controls how many segments will get merged at a time.
     For TieredMergePolicy, mergeFactor is a convenience parameter which
     will set both MaxMergeAtOnce and SegmentsPerTier at once.
     For LogByteSizeMergePolicy, mergeFactor decides how many new segments
     will be allowed before they are merged into one.
     Default is 10 for both merge policies.
  -->
<!--
<mergeFactor>10</mergeFactor>
  -->

<!-- Expert: Merge Scheduler
     The Merge Scheduler in Lucene controls how merges are
     performed.  The ConcurrentMergeScheduler (Lucene 2.3 default)
     can perform merges in the background using separate threads.
     The SerialMergeScheduler (Lucene 2.2 default) does not.
  -->
<!--
   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  -->
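A rough model of why a larger mergeFactor can cut merge I/O: if each merge combines mergeFactor equally sized segments (the idealized LogByteSizeMergePolicy behaviour the comments describe), every byte is rewritten about once per merge level, and the number of levels grows like the logarithm, base mergeFactor, of the number of flushed segments. The Java sketch below uses hypothetical helper names and ignores deletions, maximum segment sizes and TieredMergePolicy's different selection, so its numbers are back-of-the-envelope only.

```java
// Idealized model only: assumes every merge combines exactly mergeFactor equal-sized segments.
class MergeFactorEstimate {

    /** Number of merge levels needed before flushedSegments collapse into one segment. */
    static int mergeLevels(long flushedSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        int levels = 0;
        long capacity = 1; // flushed segments held by one segment at the current level
        while (capacity < flushedSegments) {
            capacity *= mergeFactor;
            levels++;
        }
        return levels;
    }

    /** Approximate bytes written by merging: the whole index is rewritten once per level. */
    static long mergeBytesWritten(long indexBytes, long flushBytes, int mergeFactor) {
        if (flushBytes <= 0) {
            throw new IllegalArgumentException("flushBytes must be positive");
        }
        long flushedSegments = (indexBytes + flushBytes - 1) / flushBytes; // ceiling division
        return indexBytes * mergeLevels(flushedSegments, mergeFactor);
    }
}
```

For example, a 15 GB index flushed in (assumed) 64 MB segments gives 240 flushed segments: mergeFactor 10 needs 3 levels (about 45 GB of merge writes), while mergeFactor 20 needs 2 (about 30 GB). The cost of a larger mergeFactor is that more segments exist between merges, so searches have more segments to visit.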


- Original Message -
| From: Radim Kolar <h...@filez.com>
| To: solr-user@lucene.apache.org
| Sent: Saturday, October 27, 2012 7:44:46 PM
| Subject: Re: throttle segment merging
| 
| On 26.10.2012 3:47, Tomás Fernández Löbbe wrote:
| >> Is there a way to set up logging to output something when segment
| >> merging runs?
| >
| > I think segment merging is logged when you enable infoStream
| > logging (you should see it commented out in solrconfig.xml)
| No, segment merging is not logged at INFO level. It needs a customized
| log config.
| 
| >> Can segment merges be throttled?
| > You can change when and how segments are merged with the merge
| > policy; maybe changing the initial settings (mergeFactor, for
| > example) is enough for you?
| 
| I am now researching Elasticsearch. It can do it, and it's based on
| Lucene 3.6.
| 


Re: throttle segment merging

2012-10-27 Thread Radim Kolar

On 26.10.2012 3:47, Tomás Fernández Löbbe wrote:

>> Is there a way to set up logging to output something when segment merging
>> runs?
>
> I think segment merging is logged when you enable infoStream logging (you
> should see it commented out in solrconfig.xml)

No, segment merging is not logged at INFO level. It needs a customized log
config.

>> Can segment merges be throttled?
>
> You can change when and how segments are merged with the merge
> policy; maybe changing the initial settings (mergeFactor, for example)
> is enough for you?

I am now researching Elasticsearch. It can do it, and it's based on Lucene 3.6.


throttle segment merging

2012-10-25 Thread Radim Kolar
I have problems with very low indexing speed as soon as the core size grows
over 15 GB. I suspect that it may be due to I/O-intensive segment merging.

Is there a way to set up logging to output something when segment merging
runs?

Can segment merges be throttled?


Re: throttle segment merging

2012-10-25 Thread Tomás Fernández Löbbe

> Is there a way to set up logging to output something when segment merging
> runs?

I think segment merging is logged when you enable infoStream logging (you
should see it commented out in solrconfig.xml).

> Can segment merges be throttled?

You can change when and how segments are merged with the merge policy;
maybe changing the initial settings (mergeFactor, for example) is enough
for you?

Tomás