Re: Massively unbalanced CPU by different SOLR Nodes

2020-10-26 Thread Jonathan Tan
Hi Shalin,

Moving to 8.6.3 fixed it!

Thank you very much for that. :)
We'd considered an upgrade - just because - but we wouldn't have done so so
quickly without your information.

Cheers

On Sat, Oct 24, 2020 at 11:37 PM Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Jonathan,
>
> Are you using the "shards.preference" parameter by any chance? There is a
> bug that causes uneven request distribution during fan-out. Can you check
> the number of requests using the /admin/metrics API? Look for the /select
> handler's distrib and local request times for each core in the node.
> Compare those across different nodes.
>
> The bug I refer to is https://issues.apache.org/jira/browse/SOLR-14471 and
> it is fixed in Solr 8.5.2
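>
> For example (just a sketch; exact metric names can differ a little between
> versions), something like this on each node lists the per-core /select
> timers so you can compare the distrib and local entries across nodes:
>
> curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select&wt=json&indent=true'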
>
> On Fri, Oct 23, 2020 at 9:05 AM Jonathan Tan  wrote:
>
> > Hi,
> >
> > We've got a 3 node SolrCloud cluster running on GKE, each on their own
> > kube node (which is in itself, relatively empty of other things).
> >
> > Our collection has ~18m documents of 36gb in size, split into 6 shards
> > with 2 replicas each, and they are evenly distributed across the 3 nodes.
> > Our JVMs are currently sized to ~14gb min & max , and they are running on
> > SSDs.
> >
> >
> > [image: Screen Shot 2020-10-23 at 2.15.48 pm.png]
> >
> > Graph also available here: https://pasteboard.co/JwUQ98M.png
> >
> > Under perf testing of ~30 requests per second, we start seeing really bad
> > response times (around 3s in the 90th percentile), and *one* of the nodes
> > would be fully maxed out on CPU. At about 15 requests per second, our
> > response times are reasonable enough for our purposes (~0.8-1.1s), but as
> > is visible in the graph, it's definitely *not* an even distribution of
> the
> > CPU load. One of the nodes is running at around 13 cores, whilst the
> other 2
> > are running at ~8 cores and 6 cores respectively.
> >
> > We've tracked in our monitoring tools that the 3 nodes *are* getting an
> > even distribution of requests, and we're using a Kube service which is in
> > itself a fairly well known tool for load balancing pods. We've also used
> > kube services heaps for load balancing of other apps and haven't seen
> such
> > a problem, so we doubt it's the load balancer that is the problem.
> >
> > All 3 nodes are built from the same kubernetes statefulset deployment so
> > they'd all have the same configuration & setup. Additionally, over the
> > course of the day, it may suddenly change so that an entirely different
> > node is the one that is majorly overloaded on CPU.
> >
> > All this is happening only under queries, and we are doing no indexing at
> > that time.
> >
> > We'd initially thought it might be the overseer that is being majorly
> > overloaded when under queries (although we were surprised) until we did
> > more testing and found that even the nodes that weren't overseer would
> > sometimes have that disparity. We'd also tried using the `ADDROLE` API to
> > force an overseer change in the middle of a test, and whilst the tree
> > updated to show that the overseer had changed, it made no difference to
> the
> > highest CPU load.
> >
> > Directing queries directly to the non-busy nodes does actually give us back
> > decent response times.
> >
> > We're quite puzzled by this and would really like some help figuring out
> > *why* the CPU on one is so much higher. I did try to get the jaeger
> tracing
> > working (we already have jaeger in our cluster), but we just kept getting
> > errors on startup with solr not being able to load the main function...
> >
> >
> > Thank you in advance!
> > Cheers
> > Jonathan
> >
> >
> >
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Duplicate entries for request handlers in Solr metric reporter

2020-10-26 Thread gnandre
Hi,

I have hooked up Grafana dashboards with Solr 8.5.2 Prometheus exporter.
For some reason, some dashboards like Requests, Timeouts are not showing
any data. When I took a look at corresponding data from Prometheus
exporter, it showed two entries per search request handler, first with
count of 0 and the second with the correct count. I am not sure why the
entry with count 0 is appearing for all search request handlers. I checked
the configuration and there is no duplication of request handlers in
solrconfig.xml. My guess is that Grafana is picking up this first entry and
therefore does not show any data.

E.g.

solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="
http://localhost:8983/solr",} 0.0

solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="
http://localhost:8983/solr",} 4534446.0


disallowing delete through security.json

2020-10-26 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I am interested in disallowing delete through security.json

After seeing the "method" section in 
lucene.apache.org/solr/guide/8_4/rule-based-authorization-plugin.html my first 
attempt was as follows:

{"set-permission":{
"name":"NO_delete",
"path":["/update/*","/update"],
"collection":col_name,
"role":"NoSuchRole",
"method":"DELETE",
"before":4}}

I found, however, that this did not disallow deletes: I could still run
curl -u ... "http://.../solr/col_name/update?commit=true" --data 
"<delete><query>id:11</query></delete>"

After further experimentation, I seemed to have success with
{"set-permission":
{"name":"NO_delete6",
"path":"/update/*",
"collection":"col_name",
"role":"NoSuchRole",
"method":["REGEX:(?i)DELETE"],
"before":4}}

My initial impression was that this did what I wanted; but now I find that this 
disallows *any* updates to this collection (which had previously been allowed). 
Other attempts to tweak this strategy, such as granting permissions for 
"/update/*" for methods other than DELETE to a role which is granted to the 
desired user, have not yet been successful.
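
For concreteness, that last attempt was shaped roughly like this (it has not worked as hoped yet; "update_role" is just a placeholder role name):

{"set-permission":{
"name":"allow_nondelete_updates",
"path":["/update/*","/update"],
"collection":"col_name",
"role":"update_role",
"method":["GET","POST","PUT"],
"before":4}}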

Does anyone have an example of security.json disallowing a delete while still 
allowing an update?

Thanks


Question on solr metrics

2020-10-26 Thread yaswanth kumar
Can we get the metrics for a particular time range? I know metrics history
was not enabled, so I will only have data from when the Solr node last came
up, but even so, can we query a date range, for example to see CPU usage
over a particular time window?

Note: Solr version: 8.2

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


List Collections API output as CSV causes NullPointerException

2020-10-26 Thread Michael Belt
I'm trying to get a simple text listing of my collections (so that I can do 
some shell scripting loops [for example calling SOLR Cloud Backup on each]).  
I'm getting an exception when I simply append "wt=csv" to the end of the 
collection's LIST API (eg:  
http://localhost:8983/solr/admin/collections?action=LIST&wt=csv  ).  Does 
anyone know if this is a known bug?

Problem accessing /solr/admin/collections. Reason:
{trace=java.lang.NullPointerException
   at 
org.apache.solr.response.TabularResponseWriter.getFields(TabularResponseWriter.java:71)
   at 
org.apache.solr.response.CSVWriter.writeResponse(CSVResponseWriter.java:241)
   at 
org.apache.solr.response.CSVResponseWriter.write(CSVResponseWriter.java:57)
   at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
   at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
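
In the meantime, as a fallback for the shell-scripting use case, a plain list of collection names can be pulled from the JSON output instead (a sketch, assuming jq is available):

curl -s 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json' | jq -r '.collections[]'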


Thanks.


Re: Performance issues with CursorMark

2020-10-26 Thread Erick Erickson
8.6 still has uninvertible=true, so this should go ahead and create an on-heap 
docValues structure. That’s going to consume 38M ints of heap. Still, that 
shouldn’t require 500M additional space, and this would have been happening in 
your old system anyway so I’m at a loss to explain…

Unless you’re committing frequently or something like that, in which case I 
guess you could have multiple uninverted structures, but that’s a stretch. And 
you can’t get away from sorting on the ID, since it’s required for CursorMark…

I’m wondering whether it’s on the fetch or push phase. What happens if you 
disable firing the docs off to be indexed? It’s vaguely possible that 
CursorMark is a red herring and the slowdown is in the indexing side, at least 
it’s worth checking.
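
If it helps, a fetch-only test can be a simple shell loop (a rough sketch; the collection name "logs" and the rows value are placeholders, it assumes jq is installed and that responses come back without errors):

CURSOR='*'
while true; do
  # fetch one page; -G with --data-urlencode keeps the cursorMark URL-safe
  RESP=$(curl -s -G 'http://localhost:8983/solr/logs/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'sort=id asc' \
    --data-urlencode 'rows=1000' \
    --data-urlencode 'wt=json' \
    --data-urlencode "cursorMark=$CURSOR")
  NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
  # done when the cursor stops advancing
  [ "$NEXT" = "$CURSOR" ] && break
  CURSOR=$NEXT
done

If that alone is already slow, the indexing side is off the hook.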

This is puzzling, IDK what would be causing it.

It would be good to get to the bottom of this, but I wanted to mention the 
Collections API REINDEXCOLLECTION command as something possibly worth exploring 
as an alternative. That said, understanding why things have changed is 
important… 

Best,
Erick

> On Oct 26, 2020, at 12:29 PM, Markus Jelsma  
> wrote:
> 
> Hello Anshum,
> 
> Good point! We sort on the collection's uniqueKey, our id field and this one 
> does not have docValues enabled for it. It could be a contender but is it the 
> problem? I cannot easily test it at this scale.
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:Anshum Gupta 
>> Sent: Monday 26th October 2020 17:00
>> To: solr-user@lucene.apache.org
>> Subject: Re: Performance issues with CursorMark
>> 
>> Hey Markus,
>> 
>> What are you sorting on? Do you have docValues enabled on the sort field ?
>> 
>> On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma 
>> wrote:
>> 
>>> Hello,
>>> 
>>> We have been using a simple Python tool for a long time that eases
>>> movement of data between Solr collections, it uses CursorMark to fetch
>>> small or large pieces of data. Recently it stopped working when moving data
>>> from a production collection to my local machine for testing, the Solr
>>> nodes began to run OOM.
>>> 
>>> I added 500M to the 3G heap and now it works again, but slow (240docs/s)
>>> and costing 3G of the entire heap just to move 32k docs out of 76m total.
>>> 
>>> Solr 8.6.0 is running with two shards (1 leader+1 replica), each shard has
>>> 38m docs almost no deletions (0.4%) taking up ~10.6g disk space. The
>>> documents are very small, they are logs of various interactions of users
>>> with our main text search engine.
>>> 
>>> I monitored all four nodes with VisualVM during the transfer, all four
>>> went up to 3g heap consumption very quickly. After the transfer it took a
>>> while for two nodes to (forcefully) release the no longer for the transfer
>>> needed heap space. The two other nodes, now, 17 minutes later, still think
>>> they have to hang on to their heap consumption. When i start the same
>>> transfer again, the nodes that already have high memory consumption just
>>> seem to reuse that, not consuming additional heap. At least the second time
>>> it went 920docs/s. While we are used to transfer these tiny documents at
>>> light speed of multiple thousands per second.
>>> 
>>> What is going on? We do not need additional heap, Solr is clearly not
>>> asking for more and GC activity is minimal. Why did it become so slow?
>>> Regular queries on the collection are still going fast, but CursorMarking
>>> even through a tiny portion is taking time and memory.
>>> 
>>> Many thanks,
>>> Markus
>>> 
>> 
>> 
>> -- 
>> Anshum Gupta
>> 



Re: Question on metric values

2020-10-26 Thread Andrzej Białecki
The “requests” metric is a simple counter. Please see the documentation in the 
Reference Guide on the available metrics and their meaning. This counter is 
initialised when the replica starts up, and it’s not persisted (so if you 
restart this Solr node it will reset to 0).


If by “frequency” you mean rate of requests over a time period then the 1-, 5- 
and 15-min rates are available from “QUERY./select.requestTimes”
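
For example (core/collection names as appropriate), something like:

curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes&wt=json&indent=true'

returns the count together with meanRate and the 1-, 5- and 15-minute rates for each core.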

—

Andrzej Białecki

> On 26 Oct 2020, at 17:25, yaswanth kumar  wrote:
> 
> I am new to metrics api in solr , when I try to do
> solr/admin/metrics?prefix=QUERY./select.requests its throwing numbers
> against each collection that I have, I can understand those are the
> requests coming in against each collection, but for how much frequencies??
> Like are those numbers from the time the collection went live or are those
> like last n minutes or any config based?? also what's the default
> frequencies when we don't configure anything??
> 
> Note: I am using solr 8.2
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



RE: Performance issues with CursorMark

2020-10-26 Thread Markus Jelsma
Hello Anshum,

Good point! We sort on the collection's uniqueKey, our id field and this one 
does not have docValues enabled for it. It could be a contender but is it the 
problem? I cannot easily test it at this scale.
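
For reference, turning it on would just be a schema change along these lines (a sketch of the field definition; it would of course need a full reindex to take effect):

<field name="id" type="string" indexed="true" stored="true" docValues="true" required="true"/>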

Thanks,
Markus
 
-Original message-
> From:Anshum Gupta 
> Sent: Monday 26th October 2020 17:00
> To: solr-user@lucene.apache.org
> Subject: Re: Performance issues with CursorMark
> 
> Hey Markus,
> 
> What are you sorting on? Do you have docValues enabled on the sort field ?
> 
> On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma 
> wrote:
> 
> > Hello,
> >
> > We have been using a simple Python tool for a long time that eases
> > movement of data between Solr collections, it uses CursorMark to fetch
> > small or large pieces of data. Recently it stopped working when moving data
> > from a production collection to my local machine for testing, the Solr
> > nodes began to run OOM.
> >
> > I added 500M to the 3G heap and now it works again, but slow (240docs/s)
> > and costing 3G of the entire heap just to move 32k docs out of 76m total.
> >
> > Solr 8.6.0 is running with two shards (1 leader+1 replica), each shard has
> > 38m docs almost no deletions (0.4%) taking up ~10.6g disk space. The
> > documents are very small, they are logs of various interactions of users
> > with our main text search engine.
> >
> > I monitored all four nodes with VisualVM during the transfer, all four
> > went up to 3g heap consumption very quickly. After the transfer it took a
> > while for two nodes to (forcefully) release the no longer for the transfer
> > needed heap space. The two other nodes, now, 17 minutes later, still think
> > they have to hang on to their heap consumption. When i start the same
> > transfer again, the nodes that already have high memory consumption just
> > seem to reuse that, not consuming additional heap. At least the second time
> > it went 920docs/s. While we are used to transfer these tiny documents at
> > light speed of multiple thousands per second.
> >
> > What is going on? We do not need additional heap, Solr is clearly not
> > asking for more and GC activity is minimal. Why did it become so slow?
> > Regular queries on the collection are still going fast, but CursorMarking
> > even through a tiny portion is taking time and memory.
> >
> > Many thanks,
> > Markus
> >
> 
> 
> -- 
> Anshum Gupta
> 


Question on metric values

2020-10-26 Thread yaswanth kumar
I am new to the metrics API in Solr. When I try
solr/admin/metrics?prefix=QUERY./select.requests it returns numbers
against each collection that I have. I understand those are the
requests coming in against each collection, but over what time window?
Are those numbers from the time the collection went live, or are they
for the last n minutes, or is it config based? Also, what's the default
window when we don't configure anything?

Note: I am using solr 8.2

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


Re: Performance issues with CursorMark

2020-10-26 Thread Anshum Gupta
Hey Markus,

What are you sorting on? Do you have docValues enabled on the sort field ?

On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma 
wrote:

> Hello,
>
> We have been using a simple Python tool for a long time that eases
> movement of data between Solr collections, it uses CursorMark to fetch
> small or large pieces of data. Recently it stopped working when moving data
> from a production collection to my local machine for testing, the Solr
> nodes began to run OOM.
>
> I added 500M to the 3G heap and now it works again, but slow (240docs/s)
> and costing 3G of the entire heap just to move 32k docs out of 76m total.
>
> Solr 8.6.0 is running with two shards (1 leader+1 replica), each shard has
> 38m docs almost no deletions (0.4%) taking up ~10.6g disk space. The
> documents are very small, they are logs of various interactions of users
> with our main text search engine.
>
> I monitored all four nodes with VisualVM during the transfer, all four
> went up to 3g heap consumption very quickly. After the transfer it took a
> while for two nodes to (forcefully) release the no longer for the transfer
> needed heap space. The two other nodes, now, 17 minutes later, still think
> they have to hang on to their heap consumption. When i start the same
> transfer again, the nodes that already have high memory consumption just
> seem to reuse that, not consuming additional heap. At least the second time
> it went 920docs/s. While we are used to transfer these tiny documents at
> light speed of multiple thousands per second.
>
> What is going on? We do not need additional heap, Solr is clearly not
> asking for more and GC activity is minimal. Why did it become so slow?
> Regular queries on the collection are still going fast, but CursorMarking
> even through a tiny portion is taking time and memory.
>
> Many thanks,
> Markus
>


-- 
Anshum Gupta


Re: TieredMergePolicyFactory question

2020-10-26 Thread Moulay Hicham
Thanks Shawn and Erick.

So far I haven't noticed any performance issues before and after the change.

My concern all along is COST. We could have left the configuration as is -
keeping the deleted documents in the index - but we would have to scale up our
Solr cluster.  This will double our Solr Cluster Cost. And the additional
COST is what we are trying to avoid.

I will test the expungeDeletes and revert the max segment size back to 5G.
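
Concretely (a sketch; "mycollection" is a placeholder), reverting means putting the merge policy back to the default cap in the indexConfig section of solrconfig.xml:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <double name="maxMergedSegmentMB">5000</double>
</mergePolicyFactory>

and then, when the big segments pass ~10% deleted docs, issuing:

curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'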

Thanks again,

Moulay

On Mon, Oct 26, 2020 at 5:49 AM Erick Erickson 
wrote:

> "Some large segments were merged into 12GB segments and
> deleted documents were physically removed.”
> and
> “So with the current natural merge strategy, I need to update
> solrconfig.xml
> and increase the maxMergedSegmentMB often"
>
> I strongly recommend you do not continue down this path. You’re making a
> mountain out of a mole-hill. You have offered no proof that removing the
> deleted documents is noticeably improving performance. If you replace
> docs randomly, deleted docs will be removed eventually with the default
> merge policy without you doing _anything_ special at all.
>
> The fact that you think you need to continuously bump up the size of
> your segments indicates your understanding is incomplete. When
> you start changing settings basically at random in order to “fix” a
> problem,
> especially one that you haven’t demonstrated _is_ a problem, you
> invariably make the problem worse.
>
> By making segments larger, you’ve increased the work Solr (well Lucene) has
> to do in order to merge them since the merge process has to handle these
> larger segments. That’ll take longer. There are a fixed number of threads
> that do merging. If they’re all tied up, incoming updates will block until
> a thread frees up. I predict that if you continue down this path,
> eventually
> your updates will start to misbehave and you’ll spend a week trying to
> figure
> out why.
>
> If you insist on worrying about deleted documents, just expungeDeletes
> occasionally. I’d also set the segments size back to the default 5G. I
> can’t
> emphasize strongly enough that the way you’re approaching this will lead
> to problems, not to mention maintenance that is harder than it needs to
> be. If you do set the max segment size back to 5G, your 12G segments will
> _not_ merge until they have lots of deletes, making your problem worse.
> Then you’ll spend time trying to figure out why.
>
> Recovering from what you’ve done already has problems. Those large segments
> _will_ get rewritten (we call it “singleton merge”) when they’ve
> accumulated a
> lot of deletes, but meanwhile you’ll think that your problem is getting
> worse and worse.
>
> When those large segments have more than 10% deleted documents,
> expungeDeletes
> will singleton merge them and they’ll gradually shrink.
>
> So my prescription is:
>
> 1> set the max segment size back to 5G
>
> 2> monitor your segments. When you see your large segments  > 5G have
> more than 10% deleted documents, issue an expungeDeletes command (not
> optimize).
> This will recover your index from the changes you’ve already made.
>
> 3> eventually, all your segments will be under 5G. When that happens, stop
> issuing expungeDeletes.
>
> 4> gather some performance statistics and prove one way or another that as
> deleted
> docs accumulate over time, it impacts performance. NOTE: after your last
> expungeDeletes, deleted docs will accumulate over time until they reach a
> plateau and
> shouldn’t continue increasing after that. If you can _prove_ that
> accumulating deleted
> documents affects performance, institute a regular expungeDeletes.
> Optimize, but
> expungeDeletes is less expensive and on a changing index expungeDeletes is
> sufficient. Optimize is only really useful for a static index, so I’d
> avoid it in your
> situation.
>
> Best,
> Erick
>
> > On Oct 26, 2020, at 1:22 AM, Moulay Hicham 
> wrote:
> >
> > Some large segments were merged into 12GB segments and
> > deleted documents were physically removed.
>
>


Re: json.facet floods the filterCache

2020-10-26 Thread Michael Gibney
Damien, I gathered that you're using "nested facet"; but there are a
lot of different ways to do that, with different implications. e.g.,
nesting terms facet within terms facet, query facet within terms,
terms within query, different stats, sorting, overrequest/overrefine
(and for that matter, refine:none|simple, or even distributed vs.
non-distributed), etc. I was wondering if you could share an example
of an actual json facet specification.

Pending more information, I can say that I've been independently
looking into this also. I think high filterCache usage can result if
you're using terms faceting that results in a lot of refinement
requests (either a high setting for overrefine, or
low/unevenly-distributed facet counts (as might happen with
high-cardinality fields)). I think nested terms could also magnify the
effect of high-cardinality fields, increasing the number of buckets
needing refinement. You could see if setting refine:none helps (though
of course it could have undesirable effects on the actual results).
But afaict every term specified in a refinement request currently hits
the filterCache:
https://github.com/apache/lucene-solr/blob/40e2122/solr/core/src/java/org/apache/solr/search/facet/FacetProcessor.java#L418
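
Just to make the refine suggestion concrete, a nested terms facet with refinement disabled at both levels would look roughly like this (field names here are made up):

json.facet={
  "top_category": {
    "type": "terms",
    "field": "category",
    "limit": 10,
    "refine": "none",
    "facet": {
      "top_brand": {"type": "terms", "field": "brand", "limit": 5, "refine": "none"}
    }
  }
}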

A word of caution regarding the JSON facet `cacheDf` param: although
it's currently undocumented in the refGuide, I believe it's only
respected at all in FacetFieldProcessorByEnumTermsStream, which is
only invoked under certain circumstances (and only when sort=index).
So this is unlikely to help (though it's impossible to say without
more specific information about the actual requests you're trying to
run).

Michael

On Fri, Oct 23, 2020 at 12:52 AM  wrote:
>
> Im dong a nested facet (
> https://lucene.apache.org/solr/guide/8_6/json-facet-api.html#nested-facets)
> or sub-facets, and am using the 'terms' facet.
>
> Digging around more looks like I can set 'cacheDf=-1' to disable the use of
> the cache.
>
> On Fri, 23 Oct 2020 at 00:14, Michael Gibney 
> wrote:
>
> > Damien,
> > Are you able to share the actual json.facet request that you're using
> > (at least just the json.facet part)? I'm having a hard time being
> > confident that I'm correctly interpreting when you say "a json.facet
> > query on nested facets terms".
> > Michael
> >
> > On Thu, Oct 22, 2020 at 3:52 AM Christine Poerschke (BLOOMBERG/
> > LONDON)  wrote:
> > >
> > > Hi Damien,
> > >
> > > You mention about JSON term facets, I haven't explored w.r.t. that but
> > we have observed what you describe for JSON range facets and I've started
> > https://issues.apache.org/jira/browse/SOLR-14939 about it.
> > >
> > > Hope that helps.
> > >
> > > Regards,
> > > Christine
> > >
> > > From: solr-user@lucene.apache.org At: 10/22/20 01:07:59To:
> > solr-user@lucene.apache.org
> > > Subject: json.facet floods the filterCache
> > >
> > > Hi,
> > >
> > > I'm using a json.facet query on nested facets terms and am seeing very
> > high
> > > filterCache usage. Is it possible to somehow control this? With a fq it's
> > > possible to specify fq={!cache=false}... but I don't see a similar thing
> > > json.facet.
> > >
> > > Kind regards,
> > > Damien
> > >
> > >
> >


Re: TieredMergePolicyFactory question

2020-10-26 Thread Erick Erickson
"Some large segments were merged into 12GB segments and
deleted documents were physically removed.”
and
“So with the current natural merge strategy, I need to update solrconfig.xml
and increase the maxMergedSegmentMB often"

I strongly recommend you do not continue down this path. You’re making a
mountain out of a mole-hill. You have offered no proof that removing the
deleted documents is noticeably improving performance. If you replace
docs randomly, deleted docs will be removed eventually with the default
merge policy without you doing _anything_ special at all.

The fact that you think you need to continuously bump up the size of
your segments indicates your understanding is incomplete. When
you start changing settings basically at random in order to “fix” a problem,
especially one that you haven’t demonstrated _is_ a problem, you 
invariably make the problem worse.

By making segments larger, you’ve increased the work Solr (well Lucene) has
to do in order to merge them since the merge process has to handle these
larger segments. That’ll take longer. There are a fixed number of threads
that do merging. If they’re all tied up, incoming updates will block until
a thread frees up. I predict that if you continue down this path, eventually
your updates will start to misbehave and you’ll spend a week trying to figure
out why.

If you insist on worrying about deleted documents, just expungeDeletes
occasionally. I’d also set the segments size back to the default 5G. I can’t
emphasize strongly enough that the way you’re approaching this will lead
to problems, not to mention maintenance that is harder than it needs to
be. If you do set the max segment size back to 5G, your 12G segments will
_not_ merge until they have lots of deletes, making your problem worse. 
Then you’ll spend time trying to figure out why.

Recovering from what you’ve done already has problems. Those large segments
_will_ get rewritten (we call it “singleton merge”) when they’ve accumulated a
lot of deletes, but meanwhile you’ll think that your problem is getting worse 
and worse.

When those large segments have more than 10% deleted documents, expungeDeletes
will singleton merge them and they’ll gradually shrink.

So my prescription is:

1> set the max segment size back to 5G

2> monitor your segments. When you see your large segments  > 5G have 
more than 10% deleted documents, issue an expungeDeletes command (not optimize).
This will recover your index from the changes you’ve already made.

3> eventually, all your segments will be under 5G. When that happens, stop
issuing expungeDeletes.

4> gather some performance statistics and prove one way or another that as 
deleted
docs accumulate over time, it impacts performance. NOTE: after your last
expungeDeletes, deleted docs will accumulate over time until they reach a 
plateau and
shouldn’t continue increasing after that. If you can _prove_ that accumulating 
deleted
documents affects performance, institute a regular expungeDeletes. Optimize, but
expungeDeletes is less expensive and on a changing index expungeDeletes is
sufficient. Optimize is only really useful for a static index, so I’d avoid it 
in your
situation.
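
(For the monitoring in step 2, per-segment sizes and deleted-doc counts are visible in the admin UI's Segments Info page, or via something like:

curl 'http://localhost:8983/solr/yourCore/admin/segments?wt=json&indent=true'

where "yourCore" is whichever core you're looking at.)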

Best,
Erick

> On Oct 26, 2020, at 1:22 AM, Moulay Hicham  wrote:
> 
> Some large segments were merged into 12GB segments and
> deleted documents were physically removed.



Performance issues with CursorMark

2020-10-26 Thread Markus Jelsma
Hello,

We have been using a simple Python tool for a long time that eases movement of 
data between Solr collections; it uses CursorMark to fetch small or large 
pieces of data. Recently it stopped working when moving data from a production 
collection to my local machine for testing: the Solr nodes began to run OOM.

I added 500M to the 3G heap and now it works again, but slowly (240 docs/s) and 
costing 3G of the entire heap just to move 32k docs out of 76m total.

Solr 8.6.0 is running with two shards (1 leader+1 replica), each shard has 38m 
docs almost no deletions (0.4%) taking up ~10.6g disk space. The documents are 
very small, they are logs of various interactions of users with our main text 
search engine.

I monitored all four nodes with VisualVM during the transfer; all four went up 
to 3g heap consumption very quickly. After the transfer it took a while for two 
nodes to (forcefully) release the heap space no longer needed for the transfer. 
The two other nodes, now, 17 minutes later, still think they have to hang on to 
their heap consumption. When I start the same transfer again, the nodes that 
already have high memory consumption just seem to reuse that, not consuming 
additional heap. At least the second time it went at 920 docs/s, while we are 
used to transferring these tiny documents at light speed, multiple thousands 
per second.

What is going on? We do not need additional heap, Solr is clearly not asking 
for more and GC activity is minimal. Why did it become so slow? Regular queries 
on the collection are still going fast, but CursorMarking even through a tiny 
portion is taking time and memory.

Many thanks,
Markus


Re: When are the score values evaluated?

2020-10-26 Thread Taisuke Miyazaki
This was my mistake.
Thank you.

Taisuke

2020年10月23日(金) 15:02 Taisuke Miyazaki :

> Thanks.
>
> I analyzed it as explain=true and this is what I found.
> Why does this behave this way?
>
> fq=foo:1
> bq=foo:(1)^1
> bf=sum(200)
>
> If you do this, the score will be boosted by bq.
> However, if you remove fq, the score will not be boosted by bq.
> However, if you change the boost value of bq to 2, bq will be boosted
> regardless of whether you have fq or not.
>
> This behavior seems very strange to me. (I'm not familiar with the
> internals of Solr or Lucene).
>
> By the way, this doesn't happen if you change the sum number to a value
> that doesn't need to be expressed as an exponent. (20,000,000 is marked as
> 2.0E7 on EXPLAIN.)
>
> Regards,
> Taisuke
>
> 2020年10月22日(木) 21:41 Erick Erickson :
>
>> You’d get a much better idea of what goes on
>> if you added debug=true and analyzed the
>> output. That’d show you exactly what is
>> calculated when.
>>
>> Best,
>> Erick
>>
>> > On Oct 22, 2020, at 4:05 AM, Taisuke Miyazaki <
>> miyazakitais...@lifull.com> wrote:
>> >
>> > Hi,
>> >
>> > If you use a high value for the score, the values on the smaller scale
>> are
>> > ignored.
>> >
>> > Example :
>> > bq = foo:(1.0)^1.0
>> > bf = sum(200)
>> >
>> > When I do this, the additional score for "foo" at 1.0 does not affect
>> the
>> > sort order.
>> >
>> > I'm assuming this is an issue with the precision of the score floating
>> > point, is that correct?
>> >
>> > As a test, if we change the query as follows, the order will change as
>> you
>> > would expect, reflecting the additional score of "foo" when it is 1.0
>> > bq = foo:(1.0)^10
>> > bf = sum(200)
>> >
>> > How can I avoid this?
>> > The idea I'm thinking of at the moment is to divide the whole thing by
>> an
>> > appropriate number, such as bf= div(sum(200),100).
>> > However, this may or may not work as expected depending on when the
>> > floating point operations are done and rounded off.
>> >
>> > At what point are score's floats rounded?
>> >
>> > 1. when sorting
>> > 2. when calculating the score
>> > 3. when evaluating each function for each bq and bf
>> >
>> > Regards,
>> > Taisuke
>>
>>


Re: Backup fails despite allowPaths=* being set

2020-10-26 Thread Jan Høydahl
According to the source code here

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/core/src/java/org/apache/solr/core/SolrPaths.java#L134

your allowPaths value is NOT equal to «*» (which is stored as _ALL_) (parsed 
here 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/core/src/java/org/apache/solr/core/SolrXmlConfig.java#L311)


Please check your solr.xml file, it needs to contain this line
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/server/solr/solr.xml#L33
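
For reference, the line in the stock 8.6.2 solr.xml looks like this (if your own solr.xml was carried over from an older version it is probably missing):

<str name="allowPaths">${solr.allowPaths:}</str>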

Jan

> 22. okt. 2020 kl. 15:57 skrev Philipp Trulson :
> 
> I'm sure that this is not the case. On the Java Properties page it says
> "solr.allowPaths  *", on the dashboard I can verify that the
> "-Dsolr.allowPaths=*" option is present.
> 
> Am Mi., 21. Okt. 2020 um 19:10 Uhr schrieb Jan Høydahl <
> jan@cominvent.com>:
> 
>> Are you sure the * is not eaten by the shell since it’s a special char?
>> You can view the sys props in admin UI to check.
>> 
>> Jan Høydahl
>> 
>>> 16. okt. 2020 kl. 19:39 skrev Philipp Trulson :
>>> 
>>> Hello everyone,
>>> 
>>> we are having problems with our backup script since we upgraded to Solr
>>> 8.6.2 on kubernetes. To be more precise the message is
>>> *Path /data/backup/2020-10-16/collection must be relative to SOLR_HOME,
>>> SOLR_DATA_HOME coreRootDirectory. Set system property 'solr.allowPaths'
>> to
>>> add other allowed paths.*
>>> 
>>> I executed the script by calling this endpoint
>>> curl 'http://solr.default.svc.cluster.local/solr/admin/collections?action=BACKUP&name=collection&collection=collection&location=/data/backup/2020-10-16&async=1114'
>>> 
>>> The strange thing is that all 5 nodes are started with -Dsolr.allowPaths=*,
>>> so in theory it should work. The folder is an AWS EFS share, that's the
>>> only reason I can imagine. Or can I check any other options?
>>> 
>>> Thank you for your help!
>>> Philipp
>>> 
>> 
> 
> 
> -- 
> 
> Philipp Trulson
> 
> Platform Engineer
> mail: p.trul...@rebuy.com · web: www.reBuy.de 
> 



Re: TieredMergePolicyFactory question

2020-10-26 Thread Shawn Heisey

On 10/25/2020 11:22 PM, Moulay Hicham wrote:

> I am wondering about 3 other things:
>
> 1 - You mentioned that I need free disk space. Just to make sure that we
> are talking about disc space here. RAM can still remain at the same size?
> My current RAM size is  Index size < RAM < 1.5 Index size


You must always have enough disk space available for your indexes to 
double in size.  We recommend having enough disk space for your indexes 
to *triple* in size, because there is a real-world scenario that will 
require that much disk space.



> 2 - When the merge is happening, it happens in disc and when it's
> completed, then the data is sync'ed with RAM. I am just guessing here ;-).
> I couldn't find a good explanation online about this.


If you have enough free memory, then the OS will make sure that the data 
is available in RAM.  All modern operating systems do this 
automatically.  Note that I am talking about memory that is not 
allocated to programs.  Any memory assigned to the Solr heap (or any 
other program) will NOT be available for caching index data.


If you want ideal performance in typical situations, you must have as 
much free memory as the space your indexes take up on disk.  For ideal 
performance in ALL situations, you'll want enough free memory to be able 
to hold both the original and optimized copies of your index data at the 
same time.  We have seen that good performance can be achieved without 
going to this extreme, but if you have little free memory, Solr 
performance will be terrible.


I wrote a wiki page that covers this in some detail:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems


> 3 - Also I am wondering what recommendation you have for continuously
> purging deleted documents. optimize? expungeDeletes? Natural Merge?
> Here are more details about the need to purge documents.


The only way to guarantee that all deleted docs are purged is to 
optimize.   You could use the expungeDeletes action ... but this might 
not get rid of all the deleted documents, and depending on how those 
documents are distributed across the whole index, expungeDeletes might 
not do anything at all.  These operations are expensive (require a lot 
of time and system resources) and will temporarily increase the size of 
your index, up to double the starting size.


Before you go down the road of optimizing regularly, you should 
determine whether freeing up the disk space for deleted documents 
actually makes a substantial difference in performance.  In very old 
Solr versions, optimizing the index did produce major performance 
gains... but current versions have much better performance on indexes 
that have deleted documents.  Because performance is typically 
drastically reduced while the optimize is happening, the tradeoff may 
not be worthwhile.


Thanks,
Shawn