Re: Defaults Merge Policy

2020-04-27 Thread Kayak28
Thank you for responding.
I will keep your words in mind.

Thank you again.



On Thu, Apr 23, 2020 at 20:38, Erick Erickson wrote:


-- 

Sincerely,
Kaya
github: https://github.com/28kayak


Re: Defaults Merge Policy

2020-04-23 Thread Erick Erickson
Glad those articles helped, I remember them well ;)

Do note that 30% (well, actually 33%) is usually the ceiling.
But as I mentioned, it’s soft, not absolute. So your index
might have a higher percentage temporarily.

Best,
Erick

> On Apr 23, 2020, at 4:01 AM, Kayak28  wrote:



Re: Defaults Merge Policy

2020-04-23 Thread Kayak28
Hello, Erick Erickson:

Thank you for answering my questions.

Deleted docs in Solr 8.3.0 have not reached 30% of the entire index,
so I will monitor it for now.
Thank you again for your response.

Actually, the articles below helped me a lot.
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/


 Sincerely,
Kaya Ota

On Thu, Apr 16, 2020 at 2:41, Erick Erickson wrote:


-- 

Sincerely,
Kaya
github: https://github.com/28kayak


Re: Defaults Merge Policy

2020-04-15 Thread Erick Erickson
The number of deleted documents will bounce around.
The default TieredMergePolicy has a rather complex
algorithm that decides which segments to 
merge, and the percentage of deleted docs in any
given segment is a factor, but not the sole determinant.

Merging is not really based on the raw number of segments, but rather
on the number of segments of similar size.
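A toy sketch of that idea (not Lucene's actual algorithm; `segments_per_tier` here just borrows the standard knob name): segments are bucketed by order of magnitude of size, and a bucket only triggers a merge once it holds more than `segments_per_tier` segments, so a lone large segment is never merged with tiny ones.

```python
import math
from collections import defaultdict

def find_merges(segment_sizes_mb, segments_per_tier=10):
    """Toy model: group segments into tiers by order of magnitude of
    their size; any tier holding more than segments_per_tier segments
    yields one merge of its smallest members."""
    tiers = defaultdict(list)
    for size in segment_sizes_mb:
        tiers[int(math.log10(max(size, 1)))].append(size)
    merges = []
    for _, sizes in sorted(tiers.items()):
        if len(sizes) > segments_per_tier:
            # merge the smallest segments in the overfull tier together
            merges.append(sorted(sizes)[:segments_per_tier])
    return merges

# Twelve ~1 MB flush segments plus one big 5 GB segment:
merges = find_merges([1.0] * 12 + [5000.0])
print(merges)  # one merge of small segments; the 5 GB segment is left alone
```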

But the short answer is “no, you don’t have to configure
anything explicitly”. The percentage of deleted docs
should max out at around 30% or so, although that’s a
soft number, it’s usually lower.
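For reference, spelling these defaults out explicitly in solrconfig.xml would look roughly like this (a sketch; the element and parameter names are the standard TieredMergePolicyFactory ones, and 5000 MB is the usual max segment size):

```xml
<indexConfig>
  <!-- Roughly what Solr uses when nothing is configured explicitly -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <double name="maxMergedSegmentMB">5000</double>
  </mergePolicyFactory>
</indexConfig>
```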

Unless you have some provable performance problem,
I wouldn’t worry about it. And don’t infer anything
until you’ve indexed a _lot_ of docs.

Oh, and I kind of dislike maxDocs as the commit trigger; I tend to
use time instead, on the theory that it’s easier to explain, since
with maxDocs the timing of commits varies depending on the
throughput rate.

Best,
Erick

> On Apr 15, 2020, at 1:28 PM, Kayak28  wrote:



Defaults Merge Policy

2020-04-15 Thread Kayak28
Hello, Solr Community:

I would like to ask about the default merge policy for Solr 8.3.0.
My client (SolrJ) issues a commit every 10,000 docs.
I have not explicitly configured a merge policy via solrconfig.xml.
During each indexing run, some documents are updated or deleted.
I understand the default merge policy will merge segments automatically
if there are too many segments,
but the number of deleted documents keeps increasing.

Is there a default merge policy configuration?
Or do I have to configure it myself?

Sincerely,
Kaya Ota



-- 

Sincerely,
Kaya
github: https://github.com/28kayak


Re: Reindex Required for Merge Policy Changes?

2020-02-25 Thread Zimmermann, Thomas
Thanks so much Erick. Sounds like this should be a perfect approach to helping 
resolve our current issue.

On 2/24/20, 6:48 PM, "Erick Erickson"  wrote:





Re: Reindex Required for Merge Policy Changes?

2020-02-24 Thread Erick Erickson
Thomas:
Yes, upgrading to 7.5+ will automagically take advantage of the improvements, 
eventually... No, you don’t have to reindex.

The “eventually” part. As you add, and particularly replace, existing
documents, TMP will make decisions based on the new policy. If you’ve optimized
in the past and have a very large segment (i.e. > 5G), it’ll be rewritten when
the number of deleted docs exceeds the threshold; I don’t remember what the
exact number is. The point is that it’ll recover from having an over-large
segment over time, and _eventually_ the largest segment will be < 5G.

Absent a previous optimize making a large segment, I’d just consider optimizing 
after you’ve upgraded. The TMP revisions respect the max segment size, so that 
should purge all deleted documents from your index without creating a too-large 
one. Thereafter the number of deleted docs should remain < about 33%. It only 
really approaches that percentage when you’re updating lots of existing docs.

Finally, expungeDeletes is less expensive than optimize because it doesn’t
rewrite segments with less than 10% deleted docs, so that’s an alternative to
optimizing after upgrading.


Best,
Erick

> On Feb 24, 2020, at 5:42 PM, Zimmermann, Thomas  
> wrote:


Reindex Required for Merge Policy Changes?

2020-02-24 Thread Zimmermann, Thomas
Hi Folks –

Few questions before I tackle an upgrade here. Looking to go from 7.4 to
7.7.2 to take advantage of the improved TieredMergePolicy and segment cleanup;
we are dealing with some high (45%) deleted-doc counts in a few cores. Would
simply upgrading Solr and setting the cores to use Lucene 7.7.2 take advantage
of these features? Would I need to reindex to get existing segments merged more
efficiently? Does it depend on the size of my current segments vs. the
configuration of the merge policy, or would upgrading simply allow Solr to do
its own thing and help mitigate this issue?

Also – I noticed that 7.5+ defaults to Autoscaling for replication, and 8.0
defaults to legacy. Would I potentially need to make changes to my existing
configs to ensure they stay on legacy replication?

Thanks much!
TZ





Re: merge policy & autocommit

2019-10-28 Thread Shawn Heisey

On 10/28/2019 7:23 AM, Danilo Tomasoni wrote:

> We have a solr instance with around 40MLN docs.

> In the bulk import phase we noticed a high IO and CPU load and it looks
> like it's related to autocommit because if I disable autocommit the load
> of the system is very low.


> I know that disabling autocommit is not recommended, but I'm wondering
> if there is a minimum hardware requirement to make this suggestion
> effective.


What are your settings for autoCommit and autoSoftCommit?  If the 
settings are referring to system properties, have you defined those 
system properties?  Would you be able to restart Solr and then share a 
solr.log file that goes back to that start?


The settings that Solr has shipped with for quite a while are to enable 
autoCommit with a 15 second maxTime, no maxDoc, and openSearcher set to 
false.  The autoSoftCommit setting is not enabled by default.


These settings work well, though I personally think 15 seconds is 
perhaps too frequent, and like to set it to something like one minute 
instead.


With openSearcher set to false, autoCommit will not affect document 
visibility.  If automatically making index changes visible is desired, 
it is better to configure autoSoftCommit in addition to autoCommit ... 
and super short intervals are not recommended.
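The setup Shawn describes (hard commits that don't open a searcher, plus an optional soft commit for visibility) would look something like this in solrconfig.xml — the intervals shown are illustrative, not prescriptive:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every 60 s -->
    <openSearcher>false</openSearcher> <!-- does not affect visibility -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>          <!-- make changes visible every 5 min -->
  </autoSoftCommit>
</updateHandler>
```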


> Our system is not very powerful in terms of IO read/write speed (around
> 100 Mbyte/s) is it possible that this relatively low IO performance
> combined with


100MB/sec is not what I would call low I/O.  It's the minimum that you 
can expect from modern commodity SATA hard drives, and some of those can 
go even faster.  It's also roughly equivalent to the maximum real-world 
achievable throughput of a gigabit network connection with TCP-based 
protocols.


> autocommit will slow down incredibly our solr instance to the point of
> making it not responsive?


If it's configured correctly, autoCommit should have very little effect 
on performance.  Hard commits that do not open a new searcher should 
happen VERY quickly.  It seems very strange to me that disabling a 
correctly configured autoCommit would substantially affect indexing speeds.


> The same can be true also for the merge policy? how the IO speed can
> affect the merge policy parameters?


> I kept the default merge policy configuration but it looks like it never
> merges segments. How can I know if a merge is happening?


If you have segments that are radically different sizes, then merging is 
happening.  With default settings, merges from the first level should 
produce segments roughly ten times the size of the ones created by 
indexing.  Second level merges will probably produce segments roughly 
100 times the size of the smallest ones.  Segment merging is a normal 
part of Lucene operation, it would be very unusual for it to not occur.
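As a back-of-the-envelope model of those merge levels (an illustration of the description above, not Lucene's real behavior):

```python
def merged_segment_size_mb(flush_mb, merge_factor=10, level=1):
    """Each merge level multiplies segment size by roughly the merge
    factor (10 by default), per the description above."""
    return flush_mb * merge_factor ** level

# Segments flushed at ~1 MB:
print(merged_segment_size_mb(1.0, level=1))  # first-level merges -> ~10 MB
print(merged_segment_size_mb(1.0, level=2))  # second-level merges -> ~100 MB
```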


Merging will affect I/O, but it is extremely rare for merging to happen 
super-quickly.  The fastest I have ever seen merging on a single Solr 
core proceed is about 30 megabytes per second, though usually that 
system achieved about 20 megabytes per second.  Merging involves 
considerable computational work, it's not just a straight data copy.


Thanks,
Shawn


merge policy & autocommit

2019-10-28 Thread Danilo Tomasoni

Hello all,

We have a solr instance with around 40MLN docs.

In the bulk import phase we noticed a high IO and CPU load and it looks 
like it's related to autocommit because if I disable autocommit the load 
of the system is very low.


I know that disabling autocommit is not recommended, but I'm wondering 
if there is a minimum hardware requirement to make this suggestion 
effective.


Our system is not very powerful in terms of IO read/write speed (around
100 Mbyte/s). Is it possible that this relatively low IO performance combined
with autocommit will slow our solr instance down incredibly, to the point of
making it not responsive?


Can the same also be true for the merge policy? How can the IO speed
affect the merge policy parameters?

I kept the default merge policy configuration, but it looks like it never
merges segments. How can I know if a merge is happening?



Thank you

Danilo

--
Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu
 



Re: Default merge policy

2018-10-12 Thread Shawn Heisey

On 10/12/2018 8:32 AM, root23 wrote:

> We are on solr 6.  and as per the documentation i think solr 6 uses
> TieredMergePolicyFactory.
> However we have not specified it in the following way
>
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicyFactory>
>
> We still use <mergeFactor>25</mergeFactor>. which i understand is not used
> by TieredMergePolicyFactory.


Supplementing what Erick said.  Erick's info is completely valid.

For the version you are on, specifying a mergeFactor of 25 with no other 
merge-policy related config effectively results in this config:



<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">25</int>
  <int name="segmentsPerTier">25</int>
  <int name="maxMergeAtOnceExplicit">30</int>
</mergePolicyFactory>

I would recommend replacing mergeFactor with an explicit merge policy 
config.  Since you are going with numbers higher than default, you 
should set maxMergeAtOnceExplicit to three times the value of the other 
two settings.  In this case, that would be 75.
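The explicit replacement Shawn describes would look something like this (a sketch using the standard TieredMergePolicyFactory element names):

```xml
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">25</int>
  <int name="segmentsPerTier">25</int>
  <int name="maxMergeAtOnceExplicit">75</int>
</mergePolicyFactory>
```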


Thanks,
Shawn



Re: Default merge policy

2018-10-12 Thread Erick Erickson
bq. However we have not specified it in the following way

Is that a typo and you mean "have now specified"?

There's code in SolrIndexConfig:

if (policy instanceof TieredMergePolicy) {
  if (mergeFactor != -1) {
tieredMergePolicy.setMaxMergeAtOnce(mergeFactor);
tieredMergePolicy.setSegmentsPerTier(mergeFactor);
  }
}

So TieredMergePolicy is the default and sets these two parameters to
your mergeFactor. This support is removed in Solr 7, so I'd recommend
you update your configs.

Best,
Erick
On Fri, Oct 12, 2018 at 8:53 AM root23  wrote:


Default merge policy

2018-10-12 Thread root23
Hi all,
I am a little bit confused.
We are on solr 6, and as per the documentation I think solr 6 uses
TieredMergePolicyFactory.
However we have not specified it in the following way:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>

We still use <mergeFactor>25</mergeFactor>, which I understand is not used
by TieredMergePolicyFactory.

So my confusion is: which MergePolicy is being used by solr, and what
settings are being applied?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Changing merge policy config on production

2017-12-16 Thread alexpusch
Thanks Erick, good point on maxMergedSegmentMB; many of my segments really
are maxed out.
My index isn't 800G, but it's not far from it - it's about 250G per server.
I have high confidence in Solr and my EC2 i3-2xl instances; so far I've gotten
pretty good results.
pretty good results.





Re: Changing merge policy config on production

2017-12-16 Thread Erick Erickson
So I'm guessing you have something on the order of an 800G index?

The max segment size is roughly 5G (by default) and assuming all your segments
are close to the max size I get 160 * 5G = 800G, but that may be off.

I think you're barking up the wrong tree if these numbers are close to
correct. This is a _very_ large index; segmentsPerTier and maxMergeAtOnce
aren't going to help you here.

You could possibly set maxMergedSegmentMB to something other than the
default 5000, but I seriously doubt your problem is the number of segments.
If my numbers are in the right ballpark, you need more shards/hardware to
have a serious impact on performance.

Best,
Erick

On Sat, Dec 16, 2017 at 12:35 AM, alexpusch  wrote:


Re: Changing merge policy config on production

2017-12-16 Thread alexpusch
To be clear - I'm talking about query performance, not indexing performance.





Re: Changing merge policy config on production

2017-12-16 Thread alexpusch
Thanks for the quick answer Erick,

I'm hoping to improve performance by reducing the number of segments.

Currently I have ~160 segments. Am I wrong in thinking it might improve
performance?





Re: Changing merge policy config on production

2017-12-15 Thread Erick Erickson
The merge rate will be limited by the number of merge threads. You'll merge
more often though so the load will change. That said, I wouldn't be
concerned unless you have a very high indexing rate.

Why do you want to change anyway? Unless you've tried the new settings in a
Dev environment, the biggest risk seems to me to be whether the new
settings are just plain bad in your situation rather than what the short
term effects are.

On Dec 15, 2017 4:20 PM, "alexpusch"  wrote:



Changing merge policy config on production

2017-12-15 Thread alexpusch
Hi,
Is it safe to change the mergePolicyFactory config on production servers?
Specifically maxMergeAtOnce and segmentsPerTier. How will solr reconcile the
current state of the segments with the new config? In case of setting
segmentsPerTier to a lower number - will subsequent merges be particularly
heavy, and might they cause performance issues?

Thanks,
Alex.





Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread Erick Erickson
1> merging takes place up until the max segment size is reached (5G in
the default TieredMergePolicy).

2> there are a couple of options, again config changes for TieredMergePolicy
10
might help.

You could also try upping this (the default is 5G):
<int name="maxMergedSegmentMB">5000</int>

Best,
Erick


On Mon, Oct 23, 2017 at 10:34 AM, chandrushanmugasundaram
 wrote:


Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread chandrushanmugasundaram
Thanks Erick.

(Beginner in solr). Few questions.

1. Does merging take place only when we have deleted docs?
When my segments reach a count of 35+, the search is getting slow. Only on
performing a force merge of the index is the search efficient.

2. Is there any way we can reduce the number of segments in solr
automatically, without any cron job, by just altering some configuration in
solrconfig.xml?












Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread chandrushanmugasundaram
Amrit,

Thanks for your reply. I have removed that 


1000
1
15
false
1024

  2
  2

hdfs

1
0










Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread Erick Erickson
And please define what you mean by "merging is not working". One
parameter is max segments size, which defaults to 5G. Segments
at or near that size are not eligible for merging unless they have
around 50% deleted docs.
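A toy restatement of that eligibility rule (a rough sketch of the heuristic described above, not Lucene's actual logic; the 90% "near max" cutoff is an arbitrary assumption):

```python
def eligible_for_merge(segment_bytes, deleted_fraction,
                       max_segment_bytes=5 * 1024**3):
    """Toy rule of thumb from the discussion: a segment at or near the
    max size (default ~5G) is only reconsidered for merging once roughly
    half its docs are deleted. Not Lucene's real algorithm."""
    near_max = segment_bytes >= 0.9 * max_segment_bytes  # "near" is an assumption
    return (not near_max) or deleted_fraction >= 0.5

print(eligible_for_merge(5 * 1024**3, 0.2))  # max-sized, few deletes: not eligible
print(eligible_for_merge(5 * 1024**3, 0.6))  # enough deletes: eligible
print(eligible_for_merge(1 * 1024**3, 0.0))  # small segment: eligible
```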

Best,
Erick

On Mon, Oct 23, 2017 at 3:11 AM, Amrit Sarkar  wrote:


Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread Amrit Sarkar
Chandru,

Didn't try the above config, but why have you defined both "mergePolicy" and
"mergePolicyFactory", and passed different values for the same parameters?



> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">1</int>
> </mergePolicy>
>
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicyFactory>
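If the intent is the TieredMergePolicy settings quoted above, keeping only one definition — the mergePolicyFactory form — avoids the conflict (a sketch; parameter names are the standard ones):

```xml
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
```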


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Oct 23, 2017 at 11:00 AM, Chandru Shanmugasundaram <
chandru.shanmugasunda...@exterro.com> wrote:



Merging is not taking place with tiered merge policy

2017-10-22 Thread Chandru Shanmugasundaram
The following is my solrconfig.xml


1000
1
15
false
1024

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">1</int>
</mergePolicy>
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>

hdfs

1
0

  

Please let me know if should I tweak something above


-- 
Thanks,
Chandru.S


Re: Merge policy

2016-10-28 Thread Walter Underwood
25% overhead is pretty good. It is easy for a merge to need almost double the 
space of a minimum sized index. It is possible to use 3X the space.

Don’t try use the least possible disk space. If there isn’t enough free space 
on the disk, Solr cannot merge the big indexes. Ever. That may be what has 
happened here.

Make sure the nodes have at least 100 GB of free space on the volumes, maybe
150. That space is not “wasted” or “unused”. It is necessary for merges.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 28, 2016, at 12:20 AM, Arkadi Colson  wrote:
> 
> The index size of 1 shard is about 125GB and we are running 11 shards with 
> replication factor 2 so it's a lot of data. The deletions percentage at the 
> bottom of the segment page is around 25%. So it's quite some space which we 
> could recover. That's why I was looking for an optimize.
> 
> Do you have any idea why the merge policy does not merge away the deletions? 
> Should I tweak some parameters somehow? It's a default installation using the 
> default settings and parameters. If you need more info, just let me know...
> 
> Thx!
> 
> On 27-10-16 17:40, Erick Erickson wrote:
>> Why do you think you need to get rid of the deleted data? During normal
>> indexing, these will be "merged away". Optimizing has some downsides
>> for continually changing indexes, in particular since the default 
>> tieredmergepolicy tries to merge "like size" segments, deletions will
>> accumulate in your one large segment and the percentage of
>> deleted documents may get even higher.
>> 
>> Unless there's some measurable performance gain that the users
>> will notice, I'd just leave this alone.
>> 
>> The exception here is if you have, say, an index that changes rarely
>> in which case optimizing then makes more sense.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson <ark...@smartbit.be> wrote:
>> Thanks for the answer!
>> Do you know if there is a way to trigger an optimize for only 1 shard and 
>> not the whole collection at once?
>> 
>> On 27-10-16 15:30, Pushkar Raste wrote:
>>> Try commit with expungeDeletes="true"
>>> 
>>> I am not sure if it will merge old segments that have deleted documents.
>>> 
>>> In the worst case you can 'optimize' your index which should take care of 
>>> removing deleted document
>>> 
>>> 
>>> On Oct 27, 2016 4:20 AM, "Arkadi Colson" <ark...@smartbit.be> wrote:
>>> Hi
>>> 
>>> As you can see in the screenshot above in the oldest segments there are a 
>>> lot of deletions. In total the shard has about 26% deletions. How can I get 
>>> rid of them so the index will be smaller again?
>>> Can this only be done with an optimize or does it also depend on the merge 
>>> policy? If it also depends also on the merge policy which one should I 
>>> choose then?
>>> 
>>> Thanks!
>>> 
>>> BR,
>>> Arkadi
>> 
>> 
> 


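Walter's headroom rule of thumb above (a big merge can temporarily need almost the index size again, and up to ~3x the total footprint) can be sketched as a quick arithmetic check. This is an illustrative worst-case estimate, not an exact Solr calculation; the thresholds and sizes are assumptions:

```java
public class MergeHeadroom {
    // Worst case per the advice above: a large merge can temporarily need
    // up to ~2x the index size in *additional* free space (3x total
    // footprint) while old and new segments coexist.
    static boolean enoughHeadroom(long indexBytes, long freeBytes) {
        return freeBytes >= 2 * indexBytes; // conservative worst-case check
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // A 125 GB shard with only 100 GB free cannot safely merge its
        // biggest segments; with 250 GB free it can. Numbers illustrative.
        System.out.println(enoughHeadroom(125 * gb, 100 * gb)); // prints false
        System.out.println(enoughHeadroom(125 * gb, 250 * gb)); // prints true
    }
}
```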

Re: Merge policy

2016-10-28 Thread Emir Arnautovic

I got some notification from mailer, so not sure if my reply reached you:

"If you are using TieredMergePolicy, you can try setting reclaimDeletesWeight."


HTH,
Emir


On 28.10.2016 09:20, Arkadi Colson wrote:


The index size of 1 shard is about 125GB and we are running 11 shards 
with replication factor 2 so it's a lot of data. The deletions 
percentage at the bottom of the segment page is around 25%. So it's 
quite some space which we could recover. That's why I was looking for 
an optimize.


Do you have any idea why the merge policy does not merge away the 
deletions? Should I tweak some parameters somehow? It's a default 
installation using the default settings and parameters. If you need 
more info, just let me know...


Thx!


On 27-10-16 17:40, Erick Erickson wrote:

Why do you think you need to get rid of the deleted data? During normal
indexing, these will be "merged away". Optimizing has some downsides
for continually changing indexes, in particular since the default
tieredmergepolicy tries to merge "like size" segments, deletions will
accumulate in your one large segment and the percentage of
deleted documents may get even higher.

Unless there's some measurable performance gain that the users
will notice, I'd just leave this alone.

The exception here is if you have, say, an index that changes rarely
in which case optimizing then makes more sense.

Best,
Erick

On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson <ark...@smartbit.be> wrote:


Thanks for the answer!
Do you know if there is a way to trigger an optimize for only 1
shard and not the whole collection at once?


On 27-10-16 15:30, Pushkar Raste wrote:


Try commit with expungeDeletes="true"

I am not sure if it will merge old segments that have deleted
documents.

In the worst case you can 'optimize' your index which should
take care of removing deleted document


On Oct 27, 2016 4:20 AM, "Arkadi Colson" <ark...@smartbit.be> wrote:

Hi

As you can see in the screenshot above in the oldest
segments there are a lot of deletions. In total the shard
has about 26% deletions. How can I get rid of them so the
index will be smaller again?
Can this only be done with an optimize or does it also
depend on the merge policy? If it also depends also on the
merge policy which one should I choose then?

Thanks!

BR,
Arkadi








--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

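For reference, Emir's reclaimDeletesWeight suggestion above would look roughly like this in solrconfig.xml. This is a sketch for the Solr 4/5-era `<mergePolicy>` syntax (newer releases use `<mergePolicyFactory>` instead); the weight value is illustrative — the Lucene default is 2.0, and higher values bias merging toward segments with many deletes:

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- default 2.0; raise to favor merging segments with deletions -->
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>
```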


Re: Merge policy

2016-10-28 Thread Arkadi Colson
It's a default installation using the default settings and parameters. 
Should I perhaps change the segment size or so? Is it possible to do 
live without re-indexing? If you need more info, just let me know...


Thx!


On 27-10-16 19:03, Walter Underwood wrote:

That distribution of segment sizes seems odd. Why so many medium-large segments?

Are there custom settings for merge policy? I think the default policy would 
avoid so many segments that are mostly deleted documents.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Oct 27, 2016, at 9:40 AM, Shawn Heisey  wrote:

On 10/27/2016 9:50 AM, Yonik Seeley wrote:

On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson 
wrote:

Thanks for the answer! Do you know if there is a way to trigger an
optimize for only 1 shard and not the whole collection at once?

Adding a "distrib=false" parameter should work I think.

Last time I checked, which I admit has been a little while, optimize
ignored distrib and proceeded with a sequential optimize of every core
in the collection.

Thanks,
Shawn







Re: Merge policy

2016-10-28 Thread Arkadi Colson
The index size of 1 shard is about 125GB and we are running 11 shards 
with replication factor 2 so it's a lot of data. The deletions 
percentage at the bottom of the segment page is around 25%. So it's 
quite some space which we could recover. That's why I was looking for an 
optimize.


Do you have any idea why the merge policy does not merge away the 
deletions? Should I tweak some parameters somehow? It's a default 
installation using the default settings and parameters. If you need more 
info, just let me know...


Thx!


On 27-10-16 17:40, Erick Erickson wrote:

Why do you think you need to get rid of the deleted data? During normal
indexing, these will be "merged away". Optimizing has some downsides
for continually changing indexes, in particular since the default
tieredmergepolicy tries to merge "like size" segments, deletions will
accumulate in your one large segment and the percentage of
deleted documents may get even higher.

Unless there's some measurable performance gain that the users
will notice, I'd just leave this alone.

The exception here is if you have, say, an index that changes rarely
in which case optimizing then makes more sense.

Best,
Erick

On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson <ark...@smartbit.be> wrote:


Thanks for the answer!
Do you know if there is a way to trigger an optimize for only 1
shard and not the whole collection at once?


On 27-10-16 15:30, Pushkar Raste wrote:


Try commit with expungeDeletes="true"

I am not sure if it will merge old segments that have deleted
documents.

In the worst case you can 'optimize' your index which should take
care of removing deleted document


On Oct 27, 2016 4:20 AM, "Arkadi Colson" <ark...@smartbit.be> wrote:

Hi

As you can see in the screenshot above in the oldest segments
there are a lot of deletions. In total the shard has about
26% deletions. How can I get rid of them so the index will be
smaller again?
    Can this only be done with an optimize or does it also depend
on the merge policy? If it also depends also on the merge
policy which one should I choose then?

Thanks!

BR,
Arkadi








Re: Merge policy

2016-10-27 Thread Walter Underwood
That distribution of segment sizes seems odd. Why so many medium-large segments?

Are there custom settings for merge policy? I think the default policy would 
avoid so many segments that are mostly deleted documents.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 27, 2016, at 9:40 AM, Shawn Heisey  wrote:
> 
> On 10/27/2016 9:50 AM, Yonik Seeley wrote:
>> On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson 
>> wrote:
>>> Thanks for the answer! Do you know if there is a way to trigger an
>>> optimize for only 1 shard and not the whole collection at once? 
>> Adding a "distrib=false" parameter should work I think. 
> 
> Last time I checked, which I admit has been a little while, optimize
> ignored distrib and proceeded with a sequential optimize of every core
> in the collection.
> 
> Thanks,
> Shawn
> 



Re: Merge policy

2016-10-27 Thread Shawn Heisey
On 10/27/2016 9:50 AM, Yonik Seeley wrote:
> On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson 
> wrote:
>> Thanks for the answer! Do you know if there is a way to trigger an
>> optimize for only 1 shard and not the whole collection at once? 
> Adding a "distrib=false" parameter should work I think. 

Last time I checked, which I admit has been a little while, optimize
ignored distrib and proceeded with a sequential optimize of every core
in the collection.

Thanks,
Shawn



Re: Merge policy

2016-10-27 Thread Yonik Seeley
On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson  wrote:

> Thanks for the answer!
> Do you know if there is a way to trigger an optimize for only 1 shard and
> not the whole collection at once?
>

Adding a "distrib=false" parameter should work I think.

-Yonik

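As a sketch of what that single-shard request can look like: `distrib=false` goes on the request URL, and the optimize itself can be sent as a standard XML update message. The host and core names below are hypothetical:

```xml
<!-- POST to one core's update handler, e.g.
     http://localhost:8983/solr/collection1_shard1_replica1/update?distrib=false
     (host and core name are placeholders, not from this thread) -->
<optimize/>
```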

Re: Merge policy

2016-10-27 Thread Erick Erickson
Why do you think you need to get rid of the deleted data? During normal
indexing, these will be "merged away". Optimizing has some downsides
for continually changing indexes, in particular since the default
tieredmergepolicy tries to merge "like size" segments, deletions will
accumulate in your one large segment and the percentage of
deleted documents may get even higher.

Unless there's some measurable performance gain that the users
will notice, I'd just leave this alone.

The exception here is if you have, say, an index that changes rarely
in which case optimizing then makes more sense.

Best,
Erick

On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson  wrote:

> Thanks for the answer!
> Do you know if there is a way to trigger an optimize for only 1 shard and
> not the whole collection at once?
>
> On 27-10-16 15:30, Pushkar Raste wrote:
>
> Try commit with expungeDeletes="true"
>
> I am not sure if it will merge old segments that have deleted documents.
>
> In the worst case you can 'optimize' your index which should take care of
> removing deleted document
>
> On Oct 27, 2016 4:20 AM, "Arkadi Colson"  wrote:
>
>> Hi
>>
>> As you can see in the screenshot above in the oldest segments there are a
>> lot of deletions. In total the shard has about 26% deletions. How can I get
>> rid of them so the index will be smaller again?
>> Can this only be done with an optimize or does it also depend on the
>> merge policy? If it also depends also on the merge policy which one should
>> I choose then?
>>
>> Thanks!
>>
>> BR,
>> Arkadi
>>
>
>


Re: Merge policy

2016-10-27 Thread Arkadi Colson

Thanks for the answer!
Do you know if there is a way to trigger an optimize for only 1 shard 
and not the whole collection at once?



On 27-10-16 15:30, Pushkar Raste wrote:


Try commit with expungeDeletes="true"

I am not sure if it will merge old segments that have deleted documents.

In the worst case you can 'optimize' your index which should take care 
of removing deleted document



On Oct 27, 2016 4:20 AM, "Arkadi Colson" <ark...@smartbit.be> wrote:


Hi

As you can see in the screenshot above in the oldest segments
there are a lot of deletions. In total the shard has about 26%
deletions. How can I get rid of them so the index will be smaller
again?
Can this only be done with an optimize or does it also depend on
the merge policy? If it also depends also on the merge policy
which one should I choose then?

Thanks!

BR,
Arkadi





Re: Merge policy

2016-10-27 Thread Pushkar Raste
Try commit with expungeDeletes="true"

I am not sure if it will merge old segments that have deleted documents.

In the worst case you can 'optimize' your index which should take care of
removing deleted document

On Oct 27, 2016 4:20 AM, "Arkadi Colson"  wrote:

> Hi
>
> As you can see in the screenshot above in the oldest segments there are a
> lot of deletions. In total the shard has about 26% deletions. How can I get
> rid of them so the index will be smaller again?
> Can this only be done with an optimize or does it also depend on the merge
> policy? If it also depends also on the merge policy which one should I
> choose then?
>
> Thanks!
>
> BR,
> Arkadi
>

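For reference, the expungeDeletes commit Pushkar describes can be issued as a Solr XML update message (a sketch; the endpoint path depends on your collection name):

```xml
<!-- POST to /solr/<your-collection>/update -->
<commit expungeDeletes="true"/>
```

Note that expungeDeletes only merges away segments whose deleted-document ratio exceeds a threshold, so it is lighter-weight than a full optimize but may not reclaim everything.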

Merge policy

2016-10-27 Thread Arkadi Colson

Hi

As you can see in the screenshot above in the oldest segments there are 
a lot of deletions. In total the shard has about 26% deletions. How can 
I get rid of them so the index will be smaller again?
Can this only be done with an optimize or does it also depend on the 
merge policy? If it also depends also on the merge policy which one 
should I choose then?


Thanks!

BR,
Arkadi



Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-16 Thread Dmitry Kan
Hi,

I can confirm similar behaviour, but for solr 4.3.1. We use default values
for merge related settings. Even though mergeFactor=10
by default, there are 13 segments in one core and 30 segments in another. I
am not sure it proves there is a bug in the merging, because it depends on
the TieredMergePolicy. Relevant discussion from the past:
http://lucene.472066.n3.nabble.com/TieredMergePolicy-reclaimDeletesWeight-td4071487.html
Apart from other policy parameters, you could play with reclaimDeletesWeight,
in case you'd like to influence merging of the segments with deletes in them.
See
http://stackoverflow.com/questions/18361300/informations-about-tieredmergepolicy


Regarding your attachment: I believe it got cut by the mailing list system,
could you share it via a file sharing system?

On Sat, Mar 14, 2015 at 7:36 AM, Summer Shire  wrote:

> Hi All,
>
> Did anyone get a chance to look at my config and the InfoStream File ?
>
> I am very curious to see what you think
>
> thanks,
> Summer
>
> > On Mar 6, 2015, at 5:20 PM, Summer Shire  wrote:
> >
> > Hi All,
> >
> > Here’s more update on where I am at with this.
> > I enabled infoStream logging and quickly figured that I need to get rid
> of maxBufferedDocs. So Erick you
> > were absolutely right on that.
> > I increased my ramBufferSize to 100MB
> > and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
> > My config looks like this
> >
> > 
> > <indexConfig>
> >   <useCompoundFile>false</useCompoundFile>
> >   <ramBufferSizeMB>100</ramBufferSizeMB>
> >   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> >     <int name="maxMergeAtOnce">3</int>
> >     <int name="segmentsPerTier">3</int>
> >   </mergePolicy>
> >   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
> >   <infoStream>true</infoStream>
> > </indexConfig>
> >
> > I am attaching a sample infostream log file.
> > In the infoStream logs, though, you can see how the segments keep on adding
> > and it shows (just an example )
> > allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0
> >
> > I looked at TieredMergePolicy.java to see how allowedSegmentCount is
> getting calculated
> > // Compute max allowed segs in the index
> >long levelSize = minSegmentBytes;
> >long bytesLeft = totIndexBytes;
> >double allowedSegCount = 0;
> >while(true) {
> >  final double segCountLevel = bytesLeft / (double) levelSize;
> >  if (segCountLevel < segsPerTier) {
> >allowedSegCount += Math.ceil(segCountLevel);
> >break;
> >  }
> >  allowedSegCount += segsPerTier;
> >  bytesLeft -= segsPerTier * levelSize;
> >  levelSize *= maxMergeAtOnce;
> >}
> >int allowedSegCountInt = (int) allowedSegCount;
> > and the minSegmentBytes is calculated as follows
> > // Compute total index bytes & print details about the index
> >long totIndexBytes = 0;
> >long minSegmentBytes = Long.MAX_VALUE;
> >for(SegmentInfoPerCommit info : infosSorted) {
> >  final long segBytes = size(info);
> >  if (verbose()) {
> >String extra = merging.contains(info) ? " [merging]" : "";
> >if (segBytes >= maxMergedSegmentBytes/2.0) {
> >  extra += " [skip: too large]";
> >} else if (segBytes < floorSegmentBytes) {
> >  extra += " [floored]";
> >}
> >message("  seg=" + writer.get().segString(info) + " size=" +
> String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
> >  }
> >
> >  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
> >  // Accum total byte size
> >  totIndexBytes += segBytes;
> >}
> >
> >
> > any input is welcome.
> >
> > 
> >
> >
> > thanks,
> > Summer
> >
> >
> >> On Mar 5, 2015, at 8:11 AM, Erick Erickson 
> wrote:
> >>
> >> I would, BTW, either just get rid of the <maxBufferedDocs> all together or
> >> make it much higher, i.e. 10. I don't think this is really your
> >> problem, but you're creating a lot of segments here.
> >>
> >> But I'm kind of at a loss as to what would be different about your
> setup.
> >> Is there _any_ chance that you have some secondary process looking at
> >> your index that's maintaining open searchers? Any custom code that's
> >> perhaps failing to close searchers? Is this a Unix or Windows system?
> >>
> >> And just to be really clear, you _only_ seeing more segments being
> >> added, right? If you're only counting files in the index directory, it's
> >> _possible_ that merging is happening, you're just seeing new files take the place of old ones.

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-13 Thread Summer Shire
Hi All,

Did anyone get a chance to look at my config and the InfoStream File ?

I am very curious to see what you think

thanks,
Summer

> On Mar 6, 2015, at 5:20 PM, Summer Shire  wrote:
> 
> Hi All,
> 
> Here’s more update on where I am at with this.
> I enabled infoStream logging and quickly figured that I need to get rid of 
> maxBufferedDocs. So Erick you 
> were absolutely right on that.
> I increased my ramBufferSize to 100MB
> and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
> My config looks like this 
> 
> 
> <indexConfig>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>100</ramBufferSizeMB>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">3</int>
>     <int name="segmentsPerTier">3</int>
>   </mergePolicy>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>   <infoStream>true</infoStream>
> </indexConfig>
> 
> I am attaching a sample infostream log file.
> In the infoStream logs, though, you can see how the segments keep on adding
> and it shows (just an example )
> allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0
> 
> I looked at TieredMergePolicy.java to see how allowedSegmentCount is getting 
> calculated
> // Compute max allowed segs in the index
>long levelSize = minSegmentBytes;
>long bytesLeft = totIndexBytes;
>double allowedSegCount = 0;
>while(true) {
>  final double segCountLevel = bytesLeft / (double) levelSize;
>  if (segCountLevel < segsPerTier) {
>allowedSegCount += Math.ceil(segCountLevel);
>break;
>  }
>  allowedSegCount += segsPerTier;
>  bytesLeft -= segsPerTier * levelSize;
>  levelSize *= maxMergeAtOnce;
>}
>int allowedSegCountInt = (int) allowedSegCount;
> and the minSegmentBytes is calculated as follows
> // Compute total index bytes & print details about the index
>long totIndexBytes = 0;
>long minSegmentBytes = Long.MAX_VALUE;
>for(SegmentInfoPerCommit info : infosSorted) {
>  final long segBytes = size(info);
>  if (verbose()) {
>String extra = merging.contains(info) ? " [merging]" : "";
>if (segBytes >= maxMergedSegmentBytes/2.0) {
>  extra += " [skip: too large]";
>} else if (segBytes < floorSegmentBytes) {
>  extra += " [floored]";
>}
>message("  seg=" + writer.get().segString(info) + " size=" + 
> String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
>  }
> 
>  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
>  // Accum total byte size
>  totIndexBytes += segBytes;
>}
> 
> 
> any input is welcome. 
> 
> 
> 
> 
> thanks,
> Summer
> 
> 
>> On Mar 5, 2015, at 8:11 AM, Erick Erickson  wrote:
>> 
>> I would, BTW, either just get rid of the <maxBufferedDocs> all together or
>> make it much higher, i.e. 10. I don't think this is really your
>> problem, but you're creating a lot of segments here.
>> 
>> But I'm kind of at a loss as to what would be different about your setup.
>> Is there _any_ chance that you have some secondary process looking at
>> your index that's maintaining open searchers? Any custom code that's
>> perhaps failing to close searchers? Is this a Unix or Windows system?
>> 
>> And just to be really clear, you _only_ seeing more segments being
>> added, right? If you're only counting files in the index directory, it's
>> _possible_ that merging is happening, you're just seeing new files take
>> the place of old ones.
>> 
>> Best,
>> Erick
>> 
>> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey  wrote:
>>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>>>> I _think_, but don't know for sure, that the merging stuff doesn't get
>>>> triggered until you commit, it doesn't "just happen".
>>>> 
>>>> Shot in the dark...
>>> 
>>> I believe that new segments are created when the indexing buffer
>>> (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
>>> anytime a new segment is created, the merge policy is checked to see
>>> whether a merge is needed.
>>> 
>>> Thanks,
>>> Shawn
>>> 
> 



Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-06 Thread Summer Shire
Hi All,

Here’s more update on where I am at with this.
I enabled infoStream logging and quickly figured that I need to get rid of 
maxBufferedDocs. So Erick you 
were absolutely right on that.
I increased my ramBufferSize to 100MB
and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
My config looks like this 


<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">3</int>
    <int name="segmentsPerTier">3</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <infoStream>true</infoStream>
</indexConfig>

I am attaching a sample infostream log file.
In the infoStream logs, though, you can see how the segments keep on adding
and it shows (just an example )
allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0

I looked at TieredMergePolicy.java to see how allowedSegmentCount is getting 
calculated
// Compute max allowed segs in the index
long levelSize = minSegmentBytes;
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while(true) {
  final double segCountLevel = bytesLeft / (double) levelSize;
  if (segCountLevel < segsPerTier) {
allowedSegCount += Math.ceil(segCountLevel);
break;
  }
  allowedSegCount += segsPerTier;
  bytesLeft -= segsPerTier * levelSize;
  levelSize *= maxMergeAtOnce;
}
int allowedSegCountInt = (int) allowedSegCount;
and the minSegmentBytes is calculated as follows
 // Compute total index bytes & print details about the index
long totIndexBytes = 0;
long minSegmentBytes = Long.MAX_VALUE;
for(SegmentInfoPerCommit info : infosSorted) {
  final long segBytes = size(info);
  if (verbose()) {
String extra = merging.contains(info) ? " [merging]" : "";
if (segBytes >= maxMergedSegmentBytes/2.0) {
  extra += " [skip: too large]";
} else if (segBytes < floorSegmentBytes) {
  extra += " [floored]";
}
message("  seg=" + writer.get().segString(info) + " size=" + 
String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
  }

  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
  // Accum total byte size
  totIndexBytes += segBytes;
}


any input is welcome. 




thanks,
Summer


> On Mar 5, 2015, at 8:11 AM, Erick Erickson  wrote:
> 
> I would, BTW, either just get rid of the <maxBufferedDocs> all together or
> make it much higher, i.e. 10. I don't think this is really your
> problem, but you're creating a lot of segments here.
> 
> But I'm kind of at a loss as to what would be different about your setup.
> Is there _any_ chance that you have some secondary process looking at
> your index that's maintaining open searchers? Any custom code that's
> perhaps failing to close searchers? Is this a Unix or Windows system?
> 
> And just to be really clear, you _only_ seeing more segments being
> added, right? If you're only counting files in the index directory, it's
> _possible_ that merging is happening, you're just seeing new files take
> the place of old ones.
> 
> Best,
> Erick
> 
> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey  wrote:
>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>>> I _think_, but don't know for sure, that the merging stuff doesn't get
>>> triggered until you commit, it doesn't "just happen".
>>> 
>>> Shot in the dark...
>> 
>> I believe that new segments are created when the indexing buffer
>> (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
>> anytime a new segment is created, the merge policy is checked to see
>> whether a merge is needed.
>> 
>> Thanks,
>> Shawn
>> 


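To see how the allowedSegCount loop quoted above behaves, here is a standalone re-implementation (not the actual Lucene class) runnable with made-up segment sizes. All numbers are illustrative, not taken from Summer's index:

```java
import java.util.Arrays;

public class AllowedSegCount {
    // Standalone sketch of TieredMergePolicy's "allowed segment count"
    // computation: each tier holds segsPerTier segments, and the
    // per-segment level size grows by maxMergeAtOnce at each tier.
    static int allowedSegCount(long[] segBytes, int segsPerTier, int maxMergeAtOnce) {
        long minSegmentBytes = Arrays.stream(segBytes).min().orElse(1L);
        long totIndexBytes = Arrays.stream(segBytes).sum();
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowed = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowed += Math.ceil(segCountLevel);
                break;
            }
            allowed += segsPerTier;
            bytesLeft -= (long) segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;
        }
        return (int) allowed;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] segs = new long[9];   // nine hypothetical segments...
        Arrays.fill(segs, 10 * mb);  // ...of 10 MB each
        // With segsPerTier=3 and maxMergeAtOnce=3, only 5 segments are
        // allowed, so 9 equal-sized segments would be merge-eligible.
        System.out.println(allowedSegCount(segs, 3, 3)); // prints 5
    }
}
```

When the log says `allowedSegmentCount=10 vs count=9`, the actual count is below the allowed count, which is why no merge is triggered.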

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-05 Thread Erick Erickson
I would, BTW, either just get rid of the <maxBufferedDocs> all together or
make it much higher, i.e. 10. I don't think this is really your
problem, but you're creating a lot of segments here.

But I'm kind of at a loss as to what would be different about your setup.
Is there _any_ chance that you have some secondary process looking at
your index that's maintaining open searchers? Any custom code that's
perhaps failing to close searchers? Is this a Unix or Windows system?

And just to be really clear, you _only_ seeing more segments being
added, right? If you're only counting files in the index directory, it's
_possible_ that merging is happening, you're just seeing new files take
the place of old ones.

Best,
Erick

On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey  wrote:
> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>> I _think_, but don't know for sure, that the merging stuff doesn't get
>> triggered until you commit, it doesn't "just happen".
>>
>> Shot in the dark...
>
> I believe that new segments are created when the indexing buffer
> (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
> anytime a new segment is created, the merge policy is checked to see
> whether a merge is needed.
>
> Thanks,
> Shawn
>


Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Shawn Heisey
On 3/4/2015 4:12 PM, Erick Erickson wrote:
> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit, it doesn't "just happen".
> 
> Shot in the dark...

I believe that new segments are created when the indexing buffer
(ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
anytime a new segment is created, the merge policy is checked to see
whether a merge is needed.

Thanks,
Shawn



Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Summer Shire
actually after every commit a new segment gets created. I don't see them
merging down.

What all could I do to debug this better? Hasn't anyone else tried to merge
their segments down to a specific range :) ?

On Wed, Mar 4, 2015 at 3:12 PM, Erick Erickson 
wrote:

> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit, it doesn't "just happen".
>
> Shot in the dark...
>
> Erick
>
> On Wed, Mar 4, 2015 at 1:15 PM, Summer Shire 
> wrote:
> > Hi All,
> >
> > I am using Solr 4.7.2; is there a bug with respect to merging the segments down?
> >
> > I recently added the following to my solrConfig.xml
> >
> >   
> > <indexConfig>
> >   <useCompoundFile>false</useCompoundFile>
> >   <ramBufferSizeMB>100</ramBufferSizeMB>
> >   <maxBufferedDocs>1000</maxBufferedDocs>
> >   <mergeFactor>5</mergeFactor>
> > </indexConfig>
> >
> >
> > But I do not see any merging of the segments happening. I saw some other
> > people have
> > the same issue but there wasn’t much info. except one suggesting to use
> > 
> > <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> >   <int name="maxMergeAtOnce">5</int>
> >   <int name="segmentsPerTier">5</int>
> > </mergePolicy>
> > <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
> >
> > instead of mergeFactor.
> >
> > Thanks,
> > Summer
>


Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Erick Erickson
I _think_, but don't know for sure, that the merging stuff doesn't get
triggered until you commit, it doesn't "just happen".

Shot in the dark...

Erick

On Wed, Mar 4, 2015 at 1:15 PM, Summer Shire  wrote:
> Hi All,
>
> I am using Solr 4.7.2; is there a bug with respect to merging the segments down?
>
> I recently added the following to my solrConfig.xml
>
>   
> <indexConfig>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>100</ramBufferSizeMB>
>   <maxBufferedDocs>1000</maxBufferedDocs>
>   <mergeFactor>5</mergeFactor>
> </indexConfig>
>
>
> But I do not see any merging of the segments happening. I saw some other
> people have
> the same issue but there wasn’t much info. except one suggesting to use
> 
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">5</int>
>   <int name="segmentsPerTier">5</int>
> </mergePolicy>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Summer Shire
Hi All,

I am using Solr 4.7.2; is there a bug with respect to merging the segments down?

I recently added the following to my solrConfig.xml

  
<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <mergeFactor>5</mergeFactor>
</indexConfig>


But I do not see any merging of the segments happening. I saw some other
people have
the same issue but there wasn’t much info. except one suggesting to use

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
</mergePolicy>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

Re: lucene merge policy in solr

2013-03-07 Thread Erick Erickson
I think you're on a slightly wrong track. In Solr 4.1, merging is
done as a background task. In 3.x, an incoming indexing
request would block until the merge completed. In 4.1, all
your indexing requests should return immediately, any merging
will be carried out by background threads so you don't have to do
anything to get this functionality.

Here's a writeup:

http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/

Best
Erick


On Tue, Mar 5, 2013 at 9:18 PM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:

> Hi,
>
> In earlier lucene version it merges segments periodically
> according to merge policy, when it reached merge time, indexing
> request may take longer time to finish (in my test it may delay
> 10-30 seconds, depending on indexed data size).
>
> I read solr 3.6 - 4.1 doc and we have entries in solrconfig.xml
> to control segment merge. I am wondering if someone gives me
> a very high-level confirmation: in solr 3.6 - 4.1, indexing could
> be delayed also when big merge happens, and before merging finishes
> we cannot index (since collection is locked)?
>
> Thanks very much for helps, Lisheng
>


lucene merge policy in solr

2013-03-05 Thread Zhang, Lisheng
Hi,

In earlier Lucene versions, segments are merged periodically
according to the merge policy; when merge time is reached, an indexing
request may take longer to finish (in my tests it could be delayed
10-30 seconds, depending on indexed data size).

I read the Solr 3.6 - 4.1 docs, and we have entries in solrconfig.xml
to control segment merging. I am wondering if someone could give me
a very high-level confirmation: in Solr 3.6 - 4.1, could indexing
be delayed when a big merge happens, so that we cannot index before
merging finishes (since the collection is locked)?

Thanks very much for helps, Lisheng


Re: Merge Policy Recommendation for 3.6.1

2012-09-29 Thread Sujatha Arun
Thanks Shawn, that helps a lot. Our current OS limit is set to 300,000+,
which I heard is the maximum for the OS. I'm not sure of the soft and
hard limits; I will check this.

Regards,
Sujatha



On Fri, Sep 28, 2012 at 8:14 PM, Shawn Heisey  wrote:

> On 9/28/2012 12:43 AM, Sujatha Arun wrote:
>
>> Hello,
>>
>> In the case where there are over 200+ cores on a single node , is it
>> recommended to go with Tiered MP with segment size of 4 ? Our Index size
>> vary from a few MB to 4 GB .
>>
>> Will there be any issue with "Too many open files " and the number of
>> indexes with respect to MP ?  At the moment we are thinking of going with
>> Tiered MP ..
>>
>> Os file limit has been set to maximum.
>>
>
> Whether or not to deviate from the standard TieredMergePolicy depends
> heavily on many factors which we do not know, but I can tell you that it's
> probably not a good idea.  That policy typically produces the best results
> in all scenarios.
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> On the subject of open files:  With its default configuration, a Solr 3.x
> index will have either 8 or 11 files per segment, depending on whether you
> are using termvectors.  I am completely unsure about 4.0, because I've
> never used it, but it is probably similar.  The following calculations are
> based on my experience with 3.x.
>
> With a segment limit of 4, you might expect to have only six segments
> around at any one time - the four that are being merged, the new merged
> segment, and a segment where new data is being written.  If your system
> indexes data slow enough for merges to complete before another new segment
> is created, this is indeed the most you will ever see.  If your system
> indexes data fast enough, you might actually have short-lived moments with
> 10 or 14 segments, and possibly more.
>
> Assuming some things, which lead to using the 13 segment figure:
> simultaneous indexing to multiple cores at once, with termvectors turned
> on.  With these assumptions, a 200 core Solr installation using 4 segments
> might potentially have nearly 37000 files open, but is more likely to have
> significantly less.  If you increase your merge policy segment limit, the
> numbers will go up from there.
>
> I have configured my Linux servers with a soft file limit of 49152 and a
> hard limit of 65536.  My segment limit is set to 35, and each server has a
> maximum of four active cores, which means that during heavy indexing, I can
> see over 8000 open files.
>
> What does "maximum" on the OS file limit actually mean?  Does your OS have
> a way to specify unlimited? My personal feeling is that it's a bad idea to
> run with no limits at all.  I would imagine that you need to go with a
> minimum soft limit of 65536.  Your segment limit of 4 is probably
> reasonable, unless you will be doing a lot of indexing in a very short
> amount of time.  If you are, you may want a larger limit, and a larger
> number of maximum open files.
>
> Thanks,
> Shawn
>
>


Re: Merge Policy Recommendation for 3.6.1

2012-09-28 Thread Shawn Heisey

On 9/28/2012 12:43 AM, Sujatha Arun wrote:

Hello,

In the case where there are over 200 cores on a single node, is it
recommended to go with TieredMergePolicy with a segment size of 4? Our
index sizes vary from a few MB to 4 GB.

Will there be any issue with "Too many open files" and the number of
indexes with respect to the merge policy?  At the moment we are thinking
of going with TieredMergePolicy.

The OS file limit has been set to maximum.


Whether or not to deviate from the standard TieredMergePolicy depends 
heavily on many factors which we do not know, but I can tell you that 
it's probably not a good idea.  That policy typically produces the best 
results across a wide range of scenarios.


http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

On the subject of open files:  With its default configuration, a Solr 
3.x index will have either 8 or 11 files per segment, depending on 
whether you are using termvectors.  I am completely unsure about 4.0, 
because I've never used it, but it is probably similar.  The following 
calculations are based on my experience with 3.x.


With a segment limit of 4, you might expect to have only six segments 
around at any one time: the four that are being merged, the new merged 
segment, and a segment where new data is being written.  If your system 
indexes data slowly enough for merges to complete before another new 
segment is created, this is indeed the most you will ever see.  If your 
system indexes data quickly enough, you might actually have short-lived 
moments with 10 or 14 segments, and possibly more.


Those figures assume simultaneous indexing to multiple cores at once, 
with termvectors turned on, which leads to an estimate of roughly 13 
segments per core.  Under those assumptions, a 200 core Solr 
installation using a segment limit of 4 might potentially have nearly 
37000 files open, but is more likely to have significantly fewer.  If 
you increase your merge policy segment limit, the numbers will go up 
from there.
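A back-of-the-envelope sketch of that arithmetic (the 13-segments-per-core and files-per-segment figures are the rough estimates from the paragraphs above, not hard numbers; 14 files per segment here is an assumption of 11 termvector-enabled files plus a little overhead):

```python
def estimate_open_files(cores, segments_per_core, files_per_segment):
    """Rough worst-case count of index files a Solr node might hold open."""
    return cores * segments_per_core * files_per_segment

# ~13 segments per core during heavy merging, ~14 files per segment
# (11 for a Solr 3.x segment with termvectors, plus assumed overhead).
worst_case = estimate_open_files(cores=200, segments_per_core=13,
                                 files_per_segment=14)
print(worst_case)  # prints 36400, in the ballpark of "nearly 37000"
```

Plugging in your own core count and segment limit gives a quick sanity check against your OS file limit.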


I have configured my Linux servers with a soft file limit of 49152 and a 
hard limit of 65536.  My segment limit is set to 35, and each server has 
a maximum of four active cores, which means that during heavy indexing, 
I can see over 8000 open files.
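For reference, limits like those are typically set in /etc/security/limits.conf on Linux. A sketch, assuming Solr runs as a user named "solr" (the username is illustrative; the values match the ones I use):

```shell
# /etc/security/limits.conf -- raise open-file limits for the solr user
solr  soft  nofile  49152
solr  hard  nofile  65536
```

After logging in again as that user, verify with "ulimit -Sn" and "ulimit -Hn".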


What does "maximum" on the OS file limit actually mean?  Does your OS 
have a way to specify unlimited? My personal feeling is that it's a bad 
idea to run with no limits at all.  I would imagine that you need to go 
with a minimum soft limit of 65536.  Your segment limit of 4 is probably 
reasonable, unless you will be doing a lot of indexing in a very short 
amount of time.  If you are, you may want a larger limit, and a larger 
number of maximum open files.


Thanks,
Shawn



Merge Policy Recommendation for 3.6.1

2012-09-27 Thread Sujatha Arun
Hello,

In the case where there are over 200 cores on a single node, is it
recommended to go with TieredMergePolicy with a segment size of 4? Our
index sizes vary from a few MB to 4 GB.

Will there be any issue with "Too many open files" and the number of
indexes with respect to the merge policy?  At the moment we are thinking
of going with TieredMergePolicy.

The OS file limit has been set to maximum.

Regards
Sujatha


Re: Merge Policy

2009-07-21 Thread Jason Rutherglen
I am referring to setting properties on the *existing* policy
available in Lucene such as LogByteSizeMergePolicy.setMaxMergeMB

On Tue, Jul 21, 2009 at 5:11 PM, Chris Hostetter wrote:
>
> : SolrIndexConfig accepts a mergePolicy class name, however how does one
> : inject properties into it?
>
> At the moment you can't.
>
> If you look at the history of MergePolicy, users have never been
> encouraged to implement their own (the API actively discourages it,
> without going so far as to make it impossible).
>
>
> -Hoss
>
>


Re: Merge Policy

2009-07-21 Thread Chris Hostetter

: SolrIndexConfig accepts a mergePolicy class name; however, how does one
: inject properties into it?

At the moment you can't.  

If you look at the history of MergePolicy, users have never been 
encouraged to implement their own (the API actively discourages it, 
without going so far as to make it impossible).


-Hoss



Merge Policy

2009-07-13 Thread Jason Rutherglen
SolrIndexConfig accepts a mergePolicy class name; however, how does one
inject properties into it?
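For readers finding this thread later: subsequent Solr releases did add a way to set merge-policy properties directly in solrconfig.xml, where nested typed elements are mapped onto the policy's setters. A sketch of the later syntax, using the setMaxMergeMB setter Jason mentions (element placement and supported properties should be checked against the release you actually run):

```xml
<indexConfig>
  <!-- Nested elements map onto setters, e.g. setMaxMergeMB(double) -->
  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
    <double name="maxMergeMB">1024.0</double>
  </mergePolicy>
</indexConfig>
```

At the time of this thread, as Hoss says, this was not yet possible.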