Re: Question on metric values

2020-10-26 Thread Andrzej Białecki
The “requests” metric is a simple counter. Please see the documentation in the 
Reference Guide on the available metrics and their meaning. This counter is 
initialised when the replica starts up, and it’s not persisted (so if you 
restart this Solr node it will reset to 0).


If by “frequency” you mean rate of requests over a time period then the 1-, 5- 
and 15-min rates are available from “QUERY./select.requestTimes”
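For example, a request along these lines (handler and core names are just whatever your own
deployment exposes, so adjust accordingly):

solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes

should return, per core, a “QUERY./select.requestTimes” object whose keys - as far as I
remember the serialisation - include “count”, “meanRate”, “1minRate”, “5minRate” and
“15minRate”, the rates being expressed in requests per second.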

—

Andrzej Białecki

> On 26 Oct 2020, at 17:25, yaswanth kumar  wrote:
> 
> I am new to metrics api in solr , when I try to do
> solr/admin/metrics?prefix=QUERY./select.requests it's throwing numbers
> against each collection that I have, I can understand those are the
> requests coming in against each collection, but for how much frequencies??
> Like are those numbers from the time the collection went live or are those
> like last n minutes or any config based?? also what's the default
> frequencies when we don't configure anything??
> 
> Note: I am using solr 8.2
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



Re: Is metrics api enabled by default in solr 8.2

2020-10-14 Thread Andrzej Białecki
SOLR-14914 (scheduled for 8.7) adds a boolean property (either by modifying 
solr.xml:/metrics element or via “metricsEnabled” system property) to almost 
completely turn off the metrics collection and processing. The “almost” part 
means that the instrumentation still remains in place, but the cost is reduced 
to empty method calls.
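Once you are on a version that has it, turning metrics off should look more or less like this
(the exact solr.xml attribute name is my assumption based on the issue description, so please
double-check SOLR-14914 / the Ref Guide once 8.7 is out):

bin/solr start -c -DmetricsEnabled=false

or, in solr.xml:

<metrics enabled="false">
  ...
</metrics>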

> On 14 Oct 2020, at 10:03, Radu Gheorghe  wrote:
> 
> Hi,
> 
> Yes, the API works by default on 8.2: 
> https://lucene.apache.org/solr/guide/8_2/metrics-reporting.html
> 
> I don’t know of a way to disable it, but the configuration is described on the 
> page above (i.e. how to configure different reporters).
> 
> Best regards,
> Radu
> --
> Sematext Cloud - Full Stack Observability - https://sematext.com
> Solr and Elasticsearch Consulting, Training and Production Support
> 
>> On 14 Oct 2020, at 06:05, yaswanth kumar  wrote:
>> 
>> Can I get some info on where to disable or enable metrics api on solr 8.2 ?
>> 
>> I believe it's enabled by default on solr 8.2, where can I check the
>> configurations? and also how can I disable if I want to disable it
>> 
>> -- 
>> Thanks & Regards,
>> Yaswanth Kumar Konathala.
>> yaswanth...@gmail.com
> 



Re: Non Deterministic Results from /admin/luke

2020-10-06 Thread Andrzej Białecki
You may want to check the COLSTATUS collection command added in 8.1 
(https://lucene.apache.org/solr/guide/8_6/collection-management.html#colstatus 
).

This reports much of the information returned by /admin/luke but can also 
report this for all shard leaders in a collection.
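For example, a call along these lines (collection name and optional flags are illustrative -
the Ref Guide page linked above lists the exact parameters):

solr/admin/collections?action=COLSTATUS&collection=mycollection&fieldInfo=true&segments=true

returns segment and field information for every shard leader in one response, instead of
querying /admin/luke core by core.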

> On 2 Oct 2020, at 01:06, Shawn Heisey  wrote:
> 
> On 10/1/2020 4:24 AM, Nussbaum, Ronen wrote:
>> We are using the Luke API in order to get all dynamic field names from our 
>> collection:
>> /solr/collection/admin/luke?wt=csv=0
>> This worked fine in 6.2.1 but it's non deterministic anymore (8.6.1) - looks 
>> like it queries a random single shard.
>> I've tried using /solr/collection/select?q=*:*=csv=0 but it 
>> behaves the same.
>> Can it be configured to query all shards?
>> Is there another way to achieve this?
> 
> The Luke handler (usually at /admin/luke) is not SolrCloud aware.  It is 
> designed to operate on a single core.  So if you send the request to the 
> collection and not a specific core, Solr must forward the request to a core 
> in order for you to get ANY result.  The core selection will be random.
> 
> The software called Luke (which is where the Luke handler gets its name) 
> operates on a Lucene index -- each Solr core is based around a Lucene index.  
> It would be a LOT of work to make the handler SolrCloud aware.
> 
> Depending on how your collection is set up, you may need to query the Luke 
> handler on multiple cores in order to get a full picture of all fields 
> present in the Lucene indexes.  I am not aware of any other way to do it.
> 
> Thanks,
> Shawn
> 



Re: Solr waitForMerges() causing leaderless shard during shutdown

2020-09-28 Thread Andrzej Białecki
Hi Ramsey,

This is an interesting scenario - I vaguely remember someone (Cao Manh Dat?) working on 
a similar issue. I’m not sure if newer versions of Solr have already fixed it, but 
it would be helpful to create a Jira issue to investigate and verify that 
it’s indeed fixed in a more recent Solr release.


> On 16 Sep 2020, at 13:42, Ramsey Haddad (BLOOMBERG/ LONDON) 
>  wrote:
> 
> Hi Solr community,
> 
> We have been investigating an issue in our solr (7.5.0) setup where the 
> shutdown of our solr node takes quite some time (3-4 minutes) during which we 
> are effectively leaderless.
> After investigating and digging deeper we were able to track it down to 
> segment merges which happen before a solr core is closed.
> 
>  stack trace when killing the node 
> 
> 
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode):
> 
> "Attach Listener" #150736 daemon prio=9 os_prio=0 tid=0x7f6da4002000 
> nid=0x13292 waiting on condition [0x]
> java.lang.Thread.State: RUNNABLE
> 
> "coreCloseExecutor-22-thread-1" #150733 prio=5 os_prio=0 
> tid=0x7f6d54020800 nid=0x11b61 in Object.wait() [0x7f6c98564000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> ~at java.lang.Object.wait(Native Method)
> ~at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4672)
> ~- locked <0x0005499908c0> (a org.apache.solr.update.SolrIndexWriter)
> ~at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2559)
> ~- locked <0x0005499908c0> (a org.apache.solr.update.SolrIndexWriter)
> ~at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1036)
> ~at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1078)
> ~at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:286)
> ~at 
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:892)
> ~at 
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:105)
> ~at 
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:399)
> ~- locked <0x00054e150cc0> (a org.apache.solr.update.DefaultSolrCoreState)
> ~at 
> org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:83)
> ~at org.apache.solr.core.SolrCore.close(SolrCore.java:1574)
> ~at org.apache.solr.core.SolrCores.lambda$close$0(SolrCores.java:141)
> ~at org.apache.solr.core.SolrCores$$Lambda$443/1058423472.call(Unknown Source)
> ~at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> 
> 
> 
> The situation is as follows -
> 
> 1. The first thing that happens is the request handlers being closed at -
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1588
> 
> 2. Then it tries to close the index writer via -
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1610
> 
> 3. When closing the index writer, it waits for any pending merges to finish 
> at -
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1236
> 
> Now, if this waitForMerges() takes a long time (3-4 minutes), the instance 
> won't shut down for the whole of that time, but because of *Step 1* it will 
> stop
> accepting any requests.
> 
> This becomes a problem when this node has a leader replica and it is stuck on 
> waitForMerges() after closing its reqHandlers. We are in a situation where
> the leader is not accepting requests but has not given away the leadership, 
> so we are in a leaderless phase.
> 
> 
> This issue triggers when we turnaround our nodes which causes a brief period 
> of leaderless shards which leads to potential data losses.
> 
> My question is -
> 1. How to avoid this situation given that we have big segment sizes and 
> merging the largest segments is going to take some time.
> We do not want to reduce the segment size as it will impact our search 
> performance which is crucial.
> 2. Should Solr ideally not do the waitForMerges() step before closing the 
> request handlers?
> 
> 
> Merge Policy config and segment size -
> 
> 
> time_of_arrival desc
> inner
> org.apache.solr.index.TieredMergePolicyFactory
> 
> 16
> 20480
> 
> 
> 



Re: SegmentsInfoRequestHandler does not release IndexWriter

2020-04-23 Thread Andrzej Białecki
Hi Tiziano,

Indeed, this looks like a bug - good catch! Please file a Jira issue, I’ll get 
to it soon.

> On 23 Apr 2020, at 00:19, Tiziano Degaetano  
> wrote:
> 
> Hello,
> 
> I’m digging in an issue getting timeouts doing a managed schema change using 
> the schema api.
> The call  hangs reloading the cores (does not recover until restarting the 
> node):
> 
> sun.misc.Unsafe.park​(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos​(Unknown Source)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos​(Unknown 
> Source)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos​(Unknown
>  Source)
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock​(Unknown 
> Source)
> org.apache.solr.update.DefaultSolrCoreState.lock​(DefaultSolrCoreState.java:179)
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter​(DefaultSolrCoreState.java:230)
> org.apache.solr.core.SolrCore.reload​(SolrCore.java:696)
> org.apache.solr.core.CoreContainer.reload​(CoreContainer.java:1558)
> org.apache.solr.schema.SchemaManager.doOperations​(SchemaManager.java:133)
> org.apache.solr.schema.SchemaManager.performOperations​(SchemaManager.java:92)
> org.apache.solr.handler.SchemaHandler.handleRequestBody​(SchemaHandler.java:90)
> org.apache.solr.handler.RequestHandlerBase.handleRequest​(RequestHandlerBase.java:211)
> org.apache.solr.core.SolrCore.execute​(SolrCore.java:2596)
> org.apache.solr.servlet.HttpSolrCall.execute​(HttpSolrCall.java:802)
> org.apache.solr.servlet.HttpSolrCall.call​(HttpSolrCall.java:579)
> 
> After a while I realized it was only deadlocked, after I used the AdminUI to 
> view the segments info of the core.
> 
> So my question: is this line correct? If withCoreInfo is false iwRef.decref() 
> will not be called to release the reader lock, preventing any further writer 
> locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> 
> Regards,
> Tiziano
> 



Re: How to compute index size

2020-02-04 Thread Andrzej Białecki
If you’re using Solr 8.2 or newer there’s a built-in index analysis tool that 
gives you a better understanding of what kind of data in your index occupies 
the most disk space, so that you can tweak your schema accordingly: 
https://lucene.apache.org/solr/guide/8_2/collection-management.html#colstatus 


Which is another way of saying that you have to try and see ;)
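If you want to try it, the entry point is the COLSTATUS command mentioned above; something
along these lines (the rawSize flags are from memory, so please verify the parameter names on
that page):

solr/admin/collections?action=COLSTATUS&collection=mycollection&rawSize=true&rawSizeDetails=true

The report estimates how the on-disk size breaks down per field and per data type (stored
fields, doc values, terms, etc.), which is the information you need when deciding what to trim
from the schema.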

> On 3 Feb 2020, at 18:02, David Hastings  wrote:
> 
> Yup, I find the right calculation to be as much ram as the server can take,
> and as much SSD space as it will hold, when you run out, buy another server
> and repeat.  machines/ram/SSD's are cheap.  just get as much as you can.
> 
> On Mon, Feb 3, 2020 at 11:59 AM Walter Underwood 
> wrote:
> 
>> What he said.
>> 
>> But if you must have a number, assume that the index will be as big as
>> your (text) data. It might be 2X bigger or 2X smaller. Or 3X or 4X, but
>> that is a starting point. Once you start updating, the index might get as
>> much as 2X bigger before merges.
>> 
>> Do NOT try to get by with the smallest possible RAM or disk.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 3, 2020, at 5:28 AM, Erick Erickson 
>> wrote:
>>> 
>>> I’ve always had trouble with that advice, that RAM size should be JVM +
>> index size. I’ve seen 300G indexes (as measured by the size of the
>> data/index directory) run in 128G of memory.
>>> 
>>> Here’s the long form:
>> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>> 
>>> But the short form is “stress test and see”.
>>> 
>>> To answer your question, though, when people say “index size” they’re
>> usually referring to the size on disk as I mentioned above.
>>> 
>>> Best,
>>> Erick
>>> 
 On Feb 3, 2020, at 4:24 AM, Mohammed Farhan Ejaz 
>> wrote:
 
 Hello All,
 
 I want to size the RAM for my Solr cloud instance. The thumb rule is
>> your
 total RAM size should be = (JVM size + index size)
 
 Now I have a simple question, How do I know my index size? A simple
>> method,
 perhaps from the Solr cloud admin UI or an API?
 
 My assumption so far is the total segment info size is the same as the
 index size.
 
 Thanks & Regards
 Farhan
>>> 
>> 
>> 



Re: [EXTERNAL] Autoscaling simulation error

2019-12-19 Thread Andrzej Białecki
Hi,

Thanks for the data. I see the problem now - it’s a bug in the simulator. I 
filed a Jira issue to track and fix it: SOLR-14122.

> On 16 Dec 2019, at 19:13, Cao, Li  wrote:
> 
>> I am using solr 8.3.0 in cloud mode. I have collection level autoscaling 
>> policy and the collection name is “entity”. But when I run autoscaling 
>> simulation all the steps failed with this message:
>> 
>>   "error":{
>> "exception":"java.io.IOException: 
>> java.util.concurrent.ExecutionException: 
>> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
>> Could not find collection : entity/shards",
>> "suggestion":{
>>   "type":"repair",
>>   "operation":{
>> "method":"POST",
>> "path":"/c/entity/shards",
>> "command":{"add-replica":{
>> "shard":"shard2",
>> "node":"my_node:8983_solr",
>> "type":"TLOG",
>> "replicaInfo":null}}},



Re: "No value present" when set cluster policy for autoscaling in solr cloud mode

2019-12-19 Thread Andrzej Białecki
Hi,

For some strange reason global tags (such as “cores”) don’t support the 
“nodeset” syntax. For “cores” the only supported attribute is “node”, and then 
you’re only allowed to use #ANY or a single specific node name (with optional 
“!" NOT operand), or a JSON array containing node names to indicate the IN 
operand.

The Ref Guide indeed is not very clear on that…
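In other words, payloads along these lines should be accepted (node names are placeholders):

{ "set-cluster-policy": [{"cores": "<3", "node": "#ANY"}] }
{ "set-cluster-policy": [{"cores": "<3", "node": "!node1:8983_solr"}] }
{ "set-cluster-policy": [{"cores": "<3", "node": ["node1:8983_solr", "node2:8983_solr"]}] }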


> On 17 Dec 2019, at 21:20, Cao, Li  wrote:
> 
> Hi!
> 
> I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no 
> collection added). I got this error when adding such a cluster policy
> 
> { 
> "set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]}
> 
> Basically I want to limit the number of cores for certain machines with a 
> special environmental variable value.
> 
> But I got this error response:
> 
> {
>  "responseHeader":{
>"status":400,
>"QTime":144},
>  "result":"failure",
>  "WARNING":"This response format is experimental.  It is likely to change in 
> the future.",
>  "error":{
>"metadata":[
>  "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
>  "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
>"details":[{
>"set-cluster-policy":[{
>"cores":"<3",
>"nodeset":{"sysprop.rex.node.type":"tlog"}}],
>"errorMessages":["No value present"]}],
>"msg":"Error in command payload",
>"code":400}}
> 
> However, this works:
> 
> { "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]}
> 
> I read the autoscaling policy documentations and cannot figure out why. Could 
> someone help me on this?
> 
> Thanks!
> 
> Li



Re: Autoscaling simulation error

2019-12-15 Thread Andrzej Białecki
Could you please provide the exact command-line? It would also help if you 
could provide an autoscaling snapshot of the cluster (bin/solr autoscaling 
-save ) or at least the autoscaling diagnostic info.

(Please note that the mailing list removes all attachments, so just provide a 
link to the snapshot).


> On 15 Dec 2019, at 18:42, Cao, Li  wrote:
> 
> Hi!
> 
> I am using solr 8.3.0 in cloud mode. I have collection level autoscaling 
> policy and the collection name is “entity”. But when I run autoscaling 
> simulation all the steps failed with this message:
> 
>"error":{
>  "exception":"java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
> Could not find collection : entity/shards",
>  "suggestion":{
>"type":"repair",
>"operation":{
>  "method":"POST",
>  "path":"/c/entity/shards",
>  "command":{"add-replica":{
>  "shard":"shard2",
>  "node":"my_node:8983_solr",
>  "type":"TLOG",
>  "replicaInfo":null}}},
> 
> Does anyone know how to fix this? Is this a bug?
> 
> Thanks!
> 
> Li



Re: Icelandic support in Solr

2019-11-27 Thread Andrzej Białecki
If I’m not mistaken Hunspell supports Icelandic (see here: 
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/is 
) and Lucene 
HunspellStemFilter should be able to use these dictionaries.
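A minimal field type wiring this up could look like the sketch below - the dictionary and
affix file names are assumptions, use whatever the Icelandic dictionary you download actually
ships with, and put those files in the collection’s config directory:

<fieldType name="text_is" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="is_IS.dic" affix="is_IS.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>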

> On 27 Nov 2019, at 10:10, Charlie Hull  wrote:
> 
> On 26/11/2019 16:35, Mikhail Ibraheem wrote:
>> Hi,Does Solr supports Icelandic language out of the box? If not, can you 
>> please let me know how to add that with custom analyzers?
>> Thanks
> 
> The Snowball stemmer project which is used by Solr 
> (https://snowballstem.org/algorithms/ - co-created by Martin Porter, author 
> of the famous stemmer) doesn't support Icelandic unfortunately. I can't find 
> any other stemmers that you could use in Solr.
> 
> Basis Technology offer various commercial software for language processing 
> that can work with Solr and other engines, not sure if they support Icelandic.
> 
> So, not very positive I'm afraid: you could look into creating your own 
> stemmer using Snowball, or some heuristic approaches, but you'd need a good 
> grasp of the structure of the language.
> 
> 
> Best
> 
> 
> Charlie
> 
> 
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
> 



Re: Possible bug in cluster status - > solr 8.3

2019-11-21 Thread Andrzej Białecki
AFAIK these collection properties are not tracked that faithfully and can get 
out of sync, mostly because they are used only during collection CREATE and 
BACKUP / RESTORE and not during other collection operations or during searching 
/ indexing. SPLITSHARD doesn’t trust them, instead it checks the actual counts 
of existing replicas.

These out-of-sync counts may actually cause problems in BACKUP / RESTORE, which 
is worth checking.

There are also conceptual issues here, eg. “replicationFactor” becomes 
meaningless as soon as we have different counts of NRT / TLOG / PULL replicas.

> On 21 Nov 2019, at 13:40, Jason Gerlowski  wrote:
> 
> It seems like an issue to me.  Can you open a JIRA with these details?
> 
> On Fri, Nov 15, 2019 at 10:51 AM Jacek Kikiewicz  wrote:
>> 
>> I found interesting situation, I've created a collection with only one 
>> replica.
>> Then I scaled solr-cloud cluster, and run  'addreplica' call to add 2 more.
>> So I have a collection with 3 tlog replicas, cluster status page shows
>> them but shows also this:
>>  "core_node2":{
>>"core":"EDITED_NAME_shard1_replica_t1",
>>"base_url":"http://EDITED_NODE:8983/solr;,
>>"node_name":"EDITED_NODE:8983_solr",
>>"state":"active",
>>"type":"TLOG",
>>"force_set_state":"false",
>>"leader":"true"},
>>  "core_node5":{
>>"core":"EDITED_NAME_shard1_replica_t3",
>>"base_url":"http://EDITED_NODE:8983/solr;,
>>"node_name":"EDITED_NODE:8983_solr",
>>"state":"active",
>>"type":"TLOG",
>>"force_set_state":"false"},
>>  "core_node6":{
>>"core":"EDITED_NAME_shard1_replica_t4",
>>"base_url":"http://EDITED_NODE:8983/solr;,
>>"node_name":"EDITED_NODE:8983_solr",
>>"state":"active",
>>"type":"TLOG",
>>"force_set_state":"false",
>>"router":{"name":"compositeId"},
>>"maxShardsPerNode":"1",
>>"autoAddReplicas":"false",
>>"nrtReplicas":"1",
>>"tlogReplicas":"1",
>>"znodeVersion":11,
>> 
>> 
>> As you can see I have 3 replicas but then I have also: "tlogReplicas":"1"
>> 
>> If I create collection with tlogReplicas=3 then cluster status shows
>> "tlogReplicas":"3"
>> IS that a bug or somehow 'works as it should' ?
>> 
>> Regards,
>> Jacek
> 



Re: Metrics avgRequestsPerSecond and avgRequestsPerSecond from documentation gone?

2019-11-20 Thread Andrzej Białecki
Hi,

Yes, the documentation needs to be fixed, these attributes have been removed or 
replaced.

* avgRequestsPerSecond -> requestTimes:meanRate. Please note that this is a 
non-decaying simple average based on the total wall clock time elapsed since 
the handler was started until NOW, and the total number of requests the handler 
processed in this time.

* avgTimePerRequest = totalTime / requests (in nanoseconds). Please note that 
the “totalTime” metric represents the aggregated elapsed time when the handler 
was processing requests (ie. not including all other elapsed time when the 
handler was just sitting idle). Perhaps a better name for this metric would be 
“totalProcessingTime”. 
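Putting the two together, a small sketch of how both values can be derived from the metrics
API (registry and handler names below are illustrative, and the response layout is from
memory - adjust to what your node actually returns):

import requests

resp = requests.get("http://localhost:8983/solr/admin/metrics",
                    params={"group": "core", "prefix": "QUERY./select."}).json()

# pick the registry of the core you care about, e.g. "solr.core.mycore"
registry = next(iter(resp["metrics"]))
core = resp["metrics"][registry]

request_count = core["QUERY./select.requests"]    # simple counter
total_time_ns = core["QUERY./select.totalTime"]   # aggregated processing time, nanoseconds
timer = core["QUERY./select.requestTimes"]

avg_time_ms = (total_time_ns / request_count) / 1e6 if request_count else 0.0
avg_rps = timer["meanRate"]                       # the old avgRequestsPerSecond

print(f"avgTimePerRequest={avg_time_ms:.2f} ms, avgRequestsPerSecond={avg_rps:.2f}")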

> On 19 Nov 2019, at 17:35, Koen De Groote  wrote:
> 
> Greetings,
> 
> I'm using Solr 7.6 and have enabled JMX metrics.
> 
> I ran into this page:
> https://lucene.apache.org/solr/guide/7_6/performance-statistics-reference.html#commonly-used-stats-for-request-handlers
> 
> Which mentions "avgRequestsPerSecond" and "avgTimePerRequest" and some
> other attributes, which do not exist anymore in this version. I have an
> older version(4) I spun up to have a look and they do exist in that version.
> 
> When getting info on a QUERY or UPDATE bean with name `requestTimes`, I get
> this:
> 
> # attributes
>  %0   - 50thPercentile (double, r)
>  %1   - 75thPercentile (double, r)
>  %2   - 95thPercentile (double, r)
>  %3   - 98thPercentile (double, r)
>  %4   - 999thPercentile (double, r)
>  %5   - 99thPercentile (double, r)
>  %6   - Count (long, r)
>  %7   - DurationUnit (java.lang.String, r)
>  %8   - FifteenMinuteRate (double, r)
>  %9   - FiveMinuteRate (double, r)
>  %10  - Max (double, r)
>  %11  - Mean (double, r)
>  %12  - MeanRate (double, r)
>  %13  - Min (double, r)
>  %14  - OneMinuteRate (double, r)
>  %15  - RateUnit (java.lang.String, r)
>  %16  - StdDev (double, r)
>  %17  - _instanceTag (java.lang.String, r)
> # operations
>  %0   - javax.management.ObjectName objectName()
>  %1   - [J values()
> #there's no notifications
> 
> And it seems that none of the current values are actually a proper
> replacement for the functionality these values used to offer.
> 
> How shall I go about getting this info now? Do I need to combine several
> other metrics?
> 
> For completeness sake, my solr.xml, where I enabled JMX, is just the
> default example from the documentation, with JMX added:
> 
> 
> 
>
>${host:}
>${jetty.port:8983}
>${hostContext:solr}
>${zkClientTimeout:15000}
> name="genericCoreNodeNames">${genericCoreNodeNames:true}
>
> class="HttpShardHandlerFactory">
>${socketTimeout:0}
>${connTimeout:0}
>
>
>
>javax.net.ssl.keyStorePassword
>javax.net.ssl.trustStorePassword
>basicauth
>zkDigestPassword
>zkDigestReadonlyPassword
>
> class="org.apache.solr.metrics.reporters.SolrJmxReporter">
> name="rootName">very_obvious_name_for_easy_reading_${jetty.port:8983}
>
>
> 
> 
> 
> Kind regards,
> Koen De Groote



Re: daily SolrCloud collection wipes

2019-11-18 Thread Andrzej Białecki
This default autoscaling config helps to keep some aspects of SolrCloud clean - 
specifically:
* Inactive shard plan: it periodically checks whether there are old shards in 
INACTIVE state that can be removed. Shards in this state are left-over parent 
shards remaining after a *successful* SPLITSHARD operation (i.e. the SPLITSHARD 
has completed successfully and the new sub-shards are ACTIVE and in use, and 
the parent shards are no longer in use). That’s likely not your case.
* inactive markers plan has to do with Overseer state recovery when an overseer 
leader crashes. Again, this likely has nothing to do with your case.

As Shawn said, logs should be able to tell you what’s really happening. For 
example, there could be some wild external process in your setup that 
periodically cleans up the collections :)

> On 14 Nov 2019, at 18:25, Shawn Heisey  wrote:
> 
> On 11/14/2019 9:17 AM, Werner Detter wrote:
>> first, thanks for your response. By "reset" I mean: collection still exists
>> but documents have been dropped (from actually round 50k to 0). It happened
>> twice within the same timeframe early in the morning the last two days so I
>> was wondering if something within Solr like this:
>> ".scheduled_maintenance":{
>>   "name":".scheduled_maintenance",
>>   "event":"scheduled",
>>   "startTime":"NOW",
>>   "every":"+1DAY",
>>   "enabled":true,
>>   "actions"
>> {
>>   "name":"inactive_shard_plan",
>>   "class":"solr.InactiveShardPlanAction"},
>> {
>>   "name":"inactive_markers_plan",
>>   "class":"solr.InactiveMarkersPlanAction"},
>> {
>>   "name":"execute_plan",
>>   "class":"solr.ExecutePlanAction"}]}},
>> could be the reason for the resets due to $something =) But I'm not sure 
>> about those
>> Solr maintenance things, that's why I initially asked on the mailinglist 
>> here. But
>> you said Solr doesn't contain any internal scheduling capability which means 
>> this
>> is probably something else. There are no crons on the operating system 
>> itself that do
>> any kind of solr maintenance.
> 
> I was unaware of that config.  Had to look it up.  I have never looked at the 
> autoscaling feature.  I'm not even sure what that config will actually do.  
> To me, it doesn't look like it's configured to do much.
> 
> Someone who is familiar with that feature will need to chime in and 
> confirm/refute my thoughts, but as far as I know, it is only capable of 
> things like adding or removing replicas, not deleting the data or the index.
> 
> Seeing the logs, with them set to the defaults that Solr ships, might reveal 
> something.
> 
> Thanks,
> Shawn
> 



Re: Metrics API - Documentation

2019-10-15 Thread Andrzej Białecki
We keep all essential user documentation (and some dev docs) in the Ref Guide.

The source for the Ref Guide is checked-in under solr/solr-ref-guide, it uses a 
simple ASCII markup so adding some content should be easy. You should follow 
the same workflow as with the code (create a JIRA, and then either add a patch 
or create a PR).

> On 15 Oct 2019, at 17:33, Richard Goodman  wrote:
> 
> Many thanks both for your responses, they've been helpful.
> 
> @Andrzej - Sorry I wasn't clear on the "A latency of 1mil" as I wasn't
> aware the image wouldn't come through. But following your bullet points
> helped me present a better unit for measurement in the axis.
> 
> In regards to contributing, would absolutely love to help there, just not
> sure what the correct direction is? I wasn't sure if the web page source
> code / contributions are in the apache-lucene repository?
> 
> Thanks,
> 
> 
> On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki  wrote:
> 
>> Hi,
>> 
>> Starting with Solr 7.0 all JMX metrics are actually internally driven by
>> the metrics API - JMX (or Prometheus) is just a way of exposing them.
>> 
>> I agree that we need more documentation on metrics - contributions are
>> welcome :)
>> 
>> Regarding your specific examples (btw. our mailing lists aggressively
>> strip all attachments - your graphs didn’t make it):
>> 
>> * time units in time-based counters are in nanoseconds. This is just a
>> unit of value, not necessarily precision. In this specific example
>> `ADMIN./admin/collections.totalTime` (and similarly named metrics for all
>> other request handlers) represents the total elapsed time spent processing
>> requests.
>> * time-based histograms are expressed in milliseconds, where it is
>> indicated by the “_ms” suffix.
>> * 1-, 5- and 15-min rates represent an exponentially weighted moving
>> average over that time window, expressed in events/second.
>> * handlerStart is initialised with System.currentTimeMillis() when this
>> instance of request handler is first created.
>> * details on GC, memory buffer pools, and similar JVM metrics are
>> documented in JDK documentation on Management Beans. For example:
>> 
>> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
>> <
>> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
>>> 
>> * "A latency of 1mil” - no idea what that is, I don’t think Solr API uses
>> this abbreviation anywhere.
>> 
>> Hope this helps.
>> 
>> —
>> 
>> Andrzej Białecki
>> 
>>> On 7 Oct 2019, at 13:41, Emir Arnautović 
>> wrote:
>>> 
>>> Hi Richard,
>>> We do not use API to collect metrics but JMX, but I believe that those
>> are the same (did not verify it in code). You can see how we handled those
>> metrics into reports/charts or even use our agent to send data to
>> Prometheus:
>> https://github.com/sematext/sematext-agent-integrations/tree/master/solr <
>> https://github.com/sematext/sematext-agent-integrations/tree/master/solr>
>>> 
>>> You can also see some links to Solr metric related blog posts in this
>> repo. If you find out that managing your own monitoring stack is
>> overwhelming, you can try our Solr integration.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 7 Oct 2019, at 12:40, Richard Goodman 
>> wrote:
>>>> 
>>>> Hi there,
>>>> 
>>>> I'm currently working on using the prometheus exporter to provide some
>> detailed insights for our Solr Cloud clusters.
>>>> 
>>>> Using the provided template killed our prometheus server, as well as
>> the exporter due to the size of our clusters (each cluster is around 96
>> nodes, ~300 collections with 3way replication and 16 shards), so you can
>> imagine the amount of data that comes through /admin/metrics and not
>> filtering it down first.
>>>> 
>>>> I've began working on writing my own template to reduce the amount of
>> data being requested and it's working fine, and I'm starting to build some
>> nice graphs in Grafana.
>>>> 
>>>> The only difficulty I'm having with this, is I'm struggling to find
>> decent documentation on the metrics themselves. I was using t

Re: Metrics API - Documentation

2019-10-08 Thread Andrzej Białecki
Hi,

Starting with Solr 7.0 all JMX metrics are actually internally driven by the 
metrics API - JMX (or Prometheus) is just a way of exposing them.

I agree that we need more documentation on metrics - contributions are welcome 
:)

Regarding your specific examples (btw. our mailing lists aggressively strip all 
attachments - your graphs didn’t make it):

* time units in time-based counters are in nanoseconds. This is just a unit of 
value, not necessarily precision. In this specific example 
`ADMIN./admin/collections.totalTime` (and similarly named metrics for all other 
request handlers) represents the total elapsed time spent processing requests.
* time-based histograms are expressed in milliseconds, where it is indicated by 
the “_ms” suffix.
* 1-, 5- and 15-min rates represent an exponentially weighted moving average 
over that time window, expressed in events/second.
* handlerStart is initialised with System.currentTimeMillis() when this 
instance of request handler is first created.
* details on GC, memory buffer pools, and similar JVM metrics are documented in 
JDK documentation on Management Beans. For example:
https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
 
<https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true>
* "A latency of 1mil” - no idea what that is, I don’t think Solr API uses this 
abbreviation anywhere.

Hope this helps.

—

Andrzej Białecki

> On 7 Oct 2019, at 13:41, Emir Arnautović  wrote:
> 
> Hi Richard,
> We do not use API to collect metrics but JMX, but I believe that those are 
> the same (did not verify it in code). You can see how we handled those 
> metrics into reports/charts or even use our agent to send data to Prometheus: 
> https://github.com/sematext/sematext-agent-integrations/tree/master/solr 
> <https://github.com/sematext/sematext-agent-integrations/tree/master/solr>
> 
> You can also see some links to Solr metric related blog posts in this repo. 
> If you find out that managing your own monitoring stack is overwhelming, you 
> can try our Solr integration.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 7 Oct 2019, at 12:40, Richard Goodman  wrote:
>> 
>> Hi there,
>> 
>> I'm currently working on using the prometheus exporter to provide some 
>> detailed insights for our Solr Cloud clusters.
>> 
>> Using the provided template killed our prometheus server, as well as the 
>> exporter due to the size of our clusters (each cluster is around 96 nodes, 
>> ~300 collections with 3way replication and 16 shards), so you can imagine 
>> the amount of data that comes through /admin/metrics and not filtering it 
>> down first.
>> 
>> I've began working on writing my own template to reduce the amount of data 
>> being requested and it's working fine, and I'm starting to build some nice 
>> graphs in Grafana.
>> 
>> The only difficulty I'm having with this, is I'm struggling to find decent 
>> documentation on the metrics themselves. I was using the resources metrics 
>> reporting - metrics-api 
>> <https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
>>  and monitoring solr with prometheus and grafana 
>> <https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
>>  but there is a lack of information on most metrics. 
>> 
>> For example:
>> "ADMIN./admin/collections.totalTime":6715327903,
>> I understand this is a counter, however, I'm not sure what unit this would 
>> be represented when displaying it, for example:
>> 
>> 
>> 
>> A latency of 1mil, not sure if this means milliseconds, million, etc., 
>> Another example would be the GC metrics:
>>  "gc.ConcurrentMarkSweep.count":7,
>>  "gc.ConcurrentMarkSweep.time":1247,
>>  "gc.ParNew.count":16759,
>>  "gc.ParNew.time":884173,
>> Which when displayed, doesn't give the clearest insight as to what the unit 
>> is:
>> 
>> 
>> If anyone has any advice / guidance, that would be greatly appreciated. If 
>> there isn't documentation for the API, then this would also be something 
>> I'll look into help contributing with too.
>> 
>> Thanks,
>> -- 
>> Richard Goodman
> 



Re: HDFS Shard Split

2019-09-17 Thread Andrzej Białecki
SplitShardCmd assumes that its main phase (when the Lucene index is being 
split) always executes on the local file system of the shard leader, and indeed 
the SplitShardCmd.checkDiskSpace() checks the local file system’s free disk 
space - even though in reality in your case the actual data is written to the 
HDFS Directory so it (almost) doesn’t affect the local FS…

Please file a JIRA request to improve this. For now you simply have to make 
sure that you have at least 2x the index size of free disk space available on 
the shard leader.

> On 16 Sep 2019, at 18:15, Joe Obernberger  
> wrote:
> 
> Hi All - added a couple more solr nodes to an existing solr cloud cluster 
> where the index is in HDFS.  When I try to a split a shard, I get an error 
> saying there is not enough disk space.  It looks like it is looking on the 
> local file system, and not in HDFS.
> 
> "Operation splitshard casued 
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>  not enough free disk space to perform index split on node
> 
> -Joe
> 



Re: Is shard split operation multithreaded?

2019-09-17 Thread Andrzej Białecki
If I understand your question correctly .. it’s single-threaded with regard to 
a specific shard - but you can run multiple shard splitting operations in 
parallel IFF they affect different shards (or different collections).

See SplitShardCmd for the details of locking and how the new sub-shards are 
initialised and replicated. Part of the shard splitting always runs 
asynchronously, namely the part where the new sub-shards are replicated and 
recovered, and only once this part is done the new sub-shards become active and 
the old parent shard becomes inactive. You can find this code in 
ReplicaMutator.checkAndCompleteShardSplit.
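So, for example, two requests like these (collection, shard and request ids are placeholders)
can run at the same time, and their progress can be polled with REQUESTSTATUS:

/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1&async=split-shard1
/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard2&async=split-shard2
/admin/collections?action=REQUESTSTATUS&requestid=split-shard1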

> On 17 Sep 2019, at 09:51, Antczak, Lukasz  wrote:
> 
> Hello,
> I have short question to Solr experts.
> Is shard split operation single- or multi- threaded?
> 
> Regards
> Łukasz Antczak
> 
> -- 
> *Łukasz Antczak*
> Senior IT Professional
> GS Data Frontiers Team 
> 
> *Planned absences:*
> *11th August - 18th August*
> *26th August - 1st September*
> *Roche Polska Sp. z o.o.*
> ADMD Group Services - Business Intelligence Team
> HQ: ul. Domaniewska 39B, 02-672 Warszawa
> Office: ul. Abpa Baraniaka 88D, 61-131 Poznań
> 
> Mobile: +48 519 515 010
> mailto: lukasz.antc...@roche.com
> 
> *Confidentiality Note:* This message is intended only for the use of the
> named recipient(s) and may contain confidential and/or proprietary
> information. If you are not the intended recipient, please contact the
> sender and delete this message. Any unauthorized use of the information
> contained in this message is prohibited.



Re: Solr 7.7.2 - Autoscaling in new cluster ignoring sysprop rules, possibly all rules

2019-06-28 Thread Andrzej Białecki
Andrew, please create a JIRA issue - in my opinion this is a bug not a feature, 
or at least something that needs clarification.

> On 27 Jun 2019, at 23:56, Andrew Kettmann  
> wrote:
> 
> I found the issue. Autoscaling seems to silently ignore rules (at least 
> sysprop rules). Example rule:
> 
> 
> {'set-policy': {'sales-uat': [{'node': '#ANY',
>   'replica': '<2',
>   'strict': 'false'},
>  {'replica': '#ALL',
>   'strict': 'true',
>   'sysprop.HELM_CHART': 'foo'}]}}
> 
> 
> Two cases will get the sysprop rule ignored:
> 
>  1.  No nodes have a HELM_CHART system property defined
>  2.  No nodes have the value "foo" for the HELM_CHART system property
> 
> 
> If you have SOME nodes that have -DHELM_CHART=foo, then it will fail if it 
> cannot satisfy another strict rule. So sysprop autoscaling rules appear to be 
> unable to be strict on their own.
> 
> 
> Hopefully this can solve some issues for other people as well.
> 
> 
> From: Andrew Kettmann
> Sent: Tuesday, June 25, 2019 1:04:21 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 7.7.2 - Autoscaling in new cluster ignoring sysprop rules, 
> possibly all rules
> 
> 
> Using docker 7.7.2 image
> 
> 
> Solr 7.7.2 on new Znode on ZK. Created the chroot using solr zk mkroot.
> 
> 
> Created a policy:
> 
> {'set-policy': {'banana': [{'replica': '#ALL',
>'sysprop.HELM_CHART': 'notbanana'}]}}
> 
> 
> No errors on creation of the policy.
> 
> 
> I have no nodes that have that value for the system property "HELM_CHART", I 
> have nodes that contain "banana" and "rulesos" for that value only.
> 
> 
> I create the collection with a call to the /admin/collections:
> 
> {'action': 'CREATE',
> 'collection.configName': 'project-solr-7',
> 'name': 'banana',
> 'numShards': '2',
> 'policy': 'banana',
> 'replicationFactor': '2'}
> 
> 
> and it creates the collection without an error, whereas what I expected was for the 
> collection creation to fail. That is the behavior I had seen in the past, but 
> after tearing down and recreating the cluster in a higher environment, it 
> no longer appears to work.
> 
> 
> Is there some prerequisite before policies will be respected? The .system 
> collection is in place as expected, and I am not seeing anything in the logs 
> on the overseer to suggest any problems.
> 
>  Andrew Kettmann
> DevOps Engineer
> P: 1.314.596.2836
> 
> 
> evolve24 Confidential & Proprietary Statement: This email and any attachments 
> are confidential and may contain information that is privileged, confidential 
> or exempt from disclosure under applicable law. It is intended for the use of 
> the recipients. If you are not the intended recipient, or believe that you 
> have received this communication in error, please do not read, print, copy, 
> retransmit, disseminate, or otherwise use the information. Please delete this 
> email and attachments, without reading, printing, copying, forwarding or 
> saving them, and notify the Sender immediately by reply email. No 
> confidentiality or privilege is waived or lost by any transmission in error.



Re: Solr 7.7.2 - SolrCloud - SPLITSHARD - Using LINK method fails on disk usage checks

2019-06-19 Thread Andrzej Białecki
Hi Andrew,

Please create a JIRA issue and attach this patch, I’ll look into fixing this. 
Thanks!


> On 18 Jun 2019, at 23:19, Andrew Kettmann  
> wrote:
> 
> Attached the patch, but that isn't sent out on the mailing list, my mistake. 
> Patch below:
> 
> 
> 
> ### START
> 
> diff --git 
> a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java 
> b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
> index 24a52eaf97..e018f8a42f 100644
> --- 
> a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
> +++ 
> b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
> @@ -135,7 +135,9 @@ public class SplitShardCmd implements 
> OverseerCollectionMessageHandler.Cmd {
> }
> 
> RTimerTree t = timings.sub("checkDiskSpace");
> -checkDiskSpace(collectionName, slice.get(), parentShardLeader);
> +if (splitMethod != SolrIndexSplitter.SplitMethod.LINK) {
> +  checkDiskSpace(collectionName, slice.get(), parentShardLeader);
> +}
> t.stop();
> 
> // let's record the ephemeralOwner of the parent leader node
> 
> ### END
> 
> 
> From: Andrew Kettmann
> Sent: Tuesday, June 18, 2019 3:05:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.7.2 - SolrCloud - SPLITSHARD - Using LINK method fails on 
> disk usage checks
> 
> 
> Looks like the disk check here is the problem, I am no Java developer, but 
> this patch ignores the check if you are using the link method for splitting. 
> Attached the patch. This is off of the commit for 7.7.2, d4c30fc285 . The 
> modified version only has to be run on the overseer machine, so there is that 
> at least.
> 
> 
> From: Andrew Kettmann
> Sent: Tuesday, June 18, 2019 11:32:43 AM
> To: solr-user@lucene.apache.org
> Subject: Solr 7.7.2 - SolrCloud - SPLITSHARD - Using LINK method fails on 
> disk usage checks
> 
> 
> Using Solr 7.7.2 Docker image, testing some of the new autoscale features, 
> huge fan so far. Tested with the link method on a 2GB core and found that it 
> took less than 1MB of additional space. Filled the core quite a bit larger, 
> 12GB of a 20GB PVC, and now splitting the shard fails with the following 
> error message on my overseer:
> 
> 
> 2019-06-18 16:27:41.754 ERROR 
> (OverseerThreadFactory-49-thread-5-processing-n:10.0.192.74:8983_solr) 
> [c:test_autoscale s:shard1  ] o.a.s.c.a.c.OverseerCollectionMessageHandler 
> Collection: test_autoscale operation: splitshard 
> failed:org.apache.solr.common.SolrException: not enough free disk space to 
> perform index split on node 10.0.193.23:8983_solr, required: 
> 23.35038321465254, available: 7.811378479003906
>at 
> org.apache.solr.cloud.api.collections.SplitShardCmd.checkDiskSpace(SplitShardCmd.java:567)
>at 
> org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:138)
>at 
> org.apache.solr.cloud.api.collections.SplitShardCmd.call(SplitShardCmd.java:94)
>at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:294)
>at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>at java.base/java.lang.Thread.run(Thread.java:834)
> 
> 
> 
> I attempted sending the request to the node itself to see if it did anything 
> different, but no luck. My parameters are (Note Python formatting as that is 
> my language of choice):
> 
> 
> 
> splitparams = {'action':'SPLITSHARD',
>   'collection':'test_autoscale',
>   'shard':'shard1',
>   'splitMethod':'link',
>   'timing':'true',
>   'async':'shardsplitasync'}
> 
> 
> And this is confirmed by the log message from the node itself:
> 
> 
> 2019-06-18 16:27:41.730 INFO  (qtp1107530534-16) [c:test_autoscale   ] 
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections 
> params={async=shardsplitasync=true=SPLITSHARD=test_autoscale=shard1=link}
>  status=0 QTime=20
> 
> 
> While it is true I do not have enough space if I were using the rewrite 
> method, the link method on a 2GB core used an additional less than 1MB of 
> space. Is there something I am missing here? is there an option to disable 
> the disk space check that I need to pass? I can't find anything in the 
> documentation at this point.
> 
> 
> Andrew Kettmann
> DevOps Engineer
> P: 1.314.596.2836

Re: Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Andrzej Białecki
Hi,

This has to do with the new JVM flags that optimise performance, they were 
added roughly at the same time when Solr switched to G1GC.

In ‘bin/solr’ please comment out this flag: '-XX:+PerfDisableSharedMem'.

> On 30 May 2019, at 14:59, Markus Jelsma  wrote:
> 
> Hello,
> 
> Slight correction, SolrCLI does become visible in the local applications 
> view. I just missed it before.
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:Markus Jelsma 
>> Sent: Thursday 30th May 2019 14:47
>> To: solr-user 
>> Subject: Solr 8.1.1, JMX and VisualVM
>> 
>> Hello,
>> 
>> While upgrading from 7.7 to 8.1.1, i noticed start.jar and SolrCLI no longer 
>> pop up in the local applications view of VisualVM! I CTRL-F'ed my way 
>> through the changelog for Solr 8.0.0 to 8.1.1 but could not find anything 
>> related. I am clueless!
>> 
>> Using OpenJDK 11.0.3 2019-04-16 and Solr 8, how can i attach my VisualVM to 
>> it?
>> 
>> Many thanks,
>> Markus
>> 
> 



[ANNOUNCE] Apache Solr 8.1.1 released

2019-05-28 Thread Andrzej Białecki
## 28 May 2019, Apache Solr™ 8.1.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.1.1

Solr is the popular, blazing fast, open source NoSQL search platform from the
Apache Lucene project. Its major features include powerful full-text search,
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document
(e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, 
providing fault
tolerant distributed search and indexing, and powers the search and navigation 
features of
many of the world's largest internet sites.

Solr 8.1.1 is available for immediate download at:
  

Please read CHANGES.txt for a full list of new features and changes:

  

### Solr 8.1.1 Release Highlights
* Fix for a Null Pointer Exception when querying collection through collection 
alias.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Re: Distributed IDF in Alias

2019-05-18 Thread Andrzej Białecki
Yes, the IDFs will be different. You could probably implement a custom 
component that would take term statistics from the previous collections to 
pre-populate the stats of the current collection, but this is an uncharted 
area, there’s a lot that could go wrong. Eg. if there’s a genuine shift in the 
term distribution in more recent documents then you probably would not want the 
old statistics to skew the more recent results, at least you would want to use 
some weighting factor - and at this point predicting the final term IDFs (and 
consequently document rankings) becomes quite complicated.

> On 18 May 2019, at 08:14, SOLR4189  wrote:
> 
> I ask my question because I want to use TRA (Time Routed Aliases). Let's say
> Solr will open a new collection every month. At the beginning of the month the new
> collection will be almost empty. 
> So will IDF be different between the new collection and the collection of the previous
> month? 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 



Re: Distributed IDF in Alias

2019-05-17 Thread Andrzej Białecki
Both descriptions are correct, but in their context. The description in the Ref 
Guide in the section about ExactStatsCache is correct in the sense that it uses 
collection-wide IDF values for terms when calculating scores for different 
SHARDS (and merging partial per-shard lists). This means that even if local IDF 
(for documents in a particular shard) is biased the scores will be still 
comparable across shards and the documents coming from these partial lists can 
be merged using their absolute scores - and their rank (ordering) will be the 
same as if they all came from one big shard..

There’s no such mechanism for adjusting scores across two or more different 
COLLECTIONS. Usually IDFs for the same terms will be different in different 
collections - which means the absolute values of scores for the same terms 
won’t be comparable. Still, if you insist and you use a multi-collection alias 
Solr will obey ;) and it will merge these partial lists as if their scores were 
comparable. The end result will be that some or most of the results will be 
incorrectly ranked, depending on how different the IDFs were in these 
collections.
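
For completeness, enabling it is a one-line addition to solrconfig.xml of each collection
involved (this is the standard statsCache hook, nothing alias-specific):

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>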

> On 17 May 2019, at 16:37, SOLR4189  wrote:
> 
> Hi all,
> 
> Can somebody explain to me this Solr tip, from here
> 
>  
> :
> /"Any alias (standard or routed) that references multiple collections may
> complicate relevancy. By default, SolrCloud scores documents on a per shard
> basis. With multiple collections in an alias this is always a problem, so if
> you have a use case for which BM25 or TF/IDF relevancy is important you will
> want to turn on one of the ExactStatsCache implementations"/
> 
> But there is / "This implementation uses global values (across the
> collection) for document frequency" / in ExactStatsCache documentation (from 
> here
> 
>  
> )
> 
> So what does it mean "across the collection"? Does it mean that distributed
> IDF is inside the same collection (across shards)? If yes, how it will help
> in the alias case?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 



Re: Solr 8.1 issue with collection aliases

2019-05-16 Thread Andrzej Białecki
Yes, I can work on 8.1.1 release - I’ll announce this shortly.

> On 16 May 2019, at 13:51, Ishan Chattopadhyaya  
> wrote:
> 
> Absolutely. This is a critical feature.
> Andrzej, would you have time to do a 8.1.1 release? We also need to
> coordinate with Jan, since he's doing a 7.7.2 release right now.
> 
> On Thu, May 16, 2019 at 5:18 PM Jörn Franke  wrote:
>> 
>> For the specific client the Solr 8.1 is not usable with this bug.
>> 
>> Collection aliases are also a crucial feature for doing “zero-downtime” 
>> reindexing or changing the Schema of a collection or for switching back to 
>> an old Index if the new Index structure has bugs etc.
>> 
>> However  I also understand that there are other considerations by other 
>> people.
>> 
>>> Am 16.05.2019 um 11:55 schrieb Ishan Chattopadhyaya 
>>> :
>>> 
>>> Does this warrant a 8.1.1 release? I think this is serious enough.
>>> 
 On Thu, May 16, 2019 at 12:03 PM Jörn Franke  wrote:
 
 SOLR-13475
 
> Am 16.05.2019 um 05:24 schrieb Ishan Chattopadhyaya 
> :
> 
> Please open a JIRA.
> 
>> On Thu, 16 May, 2019, 8:09 AM Jörn Franke,  wrote:
>> Sorry, autocorrection. It is not only an admin UI issue. I described in my 
>> previous email that access through the collection alias does not work. I 
>> cannot even do execute the select query handler if I use the collection 
>> alias instead of the collection name.
>> So it is maybe more problematic.
>> 
>>> Am 16.05.2019 um 04:36 schrieb Jörn Franke :
>>> 
>>> Not only an admin UI issue. Accessing collections via their alias does 
>>> not work.
>>> 
 Am 15.05.2019 um 15:47 schrieb Mikhail Khludnev :
 
 It seems creating alias in Solr Admin UI is broken. It's a minor issue 
 for 8.1.0
 I've created an alias via the REST call 
 http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=testalias&collections=gettingstarted
   successfully.
 Jörn, thanks for reporting.
 
> On Tue, May 14, 2019 at 11:03 PM Jörn Franke  
> wrote:
> Hi,
> 
> I tried to upgrade from 8.0 to 8.1. I noticed that there is an issue 
> with
> collection aliases, but I am not 100% sure it is due to the upgrade.
> 
> Situation:
> I have a collection called c_testcollection.
> I have an alias called testcollection.
> Alias "testcollection" points to "c_testcollection".
> On Solr 8.0 no issue
> 
> After upgrade to Solr 8.1:
> When I do a query on c_testcollection then there is no issue:
> http://localhost:8983/solr/c_testcollection/select?q=test
> When I do a query on testcollection then I receive the stacktrace 
> below
> http://localhost:8983/solr/testcollection/select?q=test
> 
> Additionally I observe a strange behavior in the admin ui. When I try 
> to
> create an alias (e.g. new) for a new collection (e.g. c_new) then it
> creates two aliases:
> new => c_new
> c_new => c_new
> if i then do a query on the alias new it works without issues. If I 
> remove
> the alias from c_new to c_new then I get the same error. Is this 
> desired
> behaviour?
> It is rather annoying to have unnecessary aliases, because I need to 
> filter
> them out in my application when retrieving all aliases.
> Is there a related issue.
> 
> Here the stacktrace:
> {
> "error":{
>   "trace":"java.lang.NullPointerException\n\tat
> java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:351)\n\tat
> org.apache.solr.common.cloud.Aliases.resolveAliasesGivenAliasMap(Aliases.java:258)\n\tat
> org.apache.solr.common.cloud.Aliases.resolveAliases(Aliases.java:181)\n\tat
> org.apache.solr.servlet.HttpSolrCall.resolveCollectionListOrAlias(HttpSolrCall.java:385)\n\tat
> org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:273)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:486)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
> 

Re: The parent shard will never be delete/clean?

2019-01-23 Thread Andrzej Białecki
Solr 7.4.0 added a periodic maintenance task that cleans up old inactive parent 
shards left after the split. “Old” means 2 days by default.

> On 22 Jan 2019, at 15:31, Jason Gerlowski  wrote:
> 
> Hi,
> 
> You might want to check out the documentation, which goes over
> split-shard in a bit more detail:
> https://lucene.apache.org/solr/guide/7_6/collections-api.html#CollectionsAPI-splitshard
> 
> To answer your question directly though, no.  Split-shard creates two
> new subshards, but it doesn't do anything to remove or cleanup the
> original shard.  The original shard remains with its data and will
> delegate future requests to the result shards.
> 
> Hope that helps,
> 
> Jason
> 
> On Tue, Jan 22, 2019 at 4:17 AM zhenyuan wei  wrote:
>> 
>> Hi,
>>   If I split shard1 into shard1_0 and shard1_1, will the parent shard1
>> ever be cleaned up?
>> 
>> 
>> Best,
>> Tinswzy
> 



Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Andrzej Białecki
I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5 comes
with an alternative strategy for SPLITSHARD that doesn’t consume as much memory
and uses almost no additional disk space on the leader. This strategy can be
turned on with the “splitMethod=link” parameter.
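
For example (a sketch — the host, collection and shard names are just
placeholders):

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=testcollection&shard=shard1&splitMethod=link&async=2000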

> On 4 Oct 2018, at 10:23, Atita Arora  wrote:
> 
> Hi Edwin,
> 
> Thanks for following up on this.
> 
> So here are the configs :
> 
> Memory - 30G - 20 G to Solr
> Disk - 1TB
> Index = ~ 500G
> 
> I think this could be happening because, during a shard split, the unsplit
> index and the split index both persist on the instance.
> I actually tried splitshard on another instance with index size 64G and it
> went through without any issues.
> 
> I would appreciate if you have additional information to enlighten me on
> this issue.
> 
> Thanks again.
> 
> Regards,
> 
> Atita
> 
> On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi Atita,
>> 
>> What is the amount of memory that you have in your system?
>> And what is your index size?
>> 
>> Regards,
>> Edwin
>> 
>> On Tue, 25 Sep 2018 at 22:39, Atita Arora  wrote:
>> 
>>> Hi,
>>> 
>>> I am working on a test setup with Solr 6.1.0 cloud with 1 collection
>>> sharded across 2 shards with no replication. When I trigger a SPLITSHARD
>>> command it throws "java.lang.OutOfMemoryError: Java heap space"
>>> every time.
>>> I tried this with multiple heap settings of 8, 12 & 20G but every time it
>>> does create 2 sub-shards but then fails eventually.
>>> I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has
>> been
>>> resolved but the trace looked very similar to this one.
>>> Also just to ensure that I do not run into exceptions due to merge as
>>> reported in this ticket, I also tried running optimize before proceeding
>>> with splitting the shard.
>>> I issued the following commands :
>>> 
>>> 1.
>>> 
>>> 
>>> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD
>>> 
>>> This threw java.lang.OutOfMemoryError: Java heap space
>>> 
>>> 2.
>>> 
>>> 
>>> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000
>>> 
>>> Then I ran with async=1000 and checked the status. Every time it's
>>> creating the sub shards, but not splitting the index.
>>> 
>>> Is there something that I am not doing correctly?
>>> 
>>> Please guide.
>>> 
>>> Thanks,
>>> Atita
>>> 
>> 

—

Andrzej Białecki



Re: Solr 7.4.0 - bug in JMX cache stats?

2018-09-18 Thread Andrzej Białecki
Hi Bojan,

This will be fixed in the upcoming 7.5.0 release. Thank you for reporting this!
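
In the meantime, the same values are available in their expanded form from the
Metrics API, which may work as an interim alternative to JMX (a sketch — adjust
the host; "CACHE.searcher.filterCache" is the metric prefix for the filter
cache in the "core" group):

curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.filterCache"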

> On 6 Sep 2018, at 18:16, Bojan Šmid  wrote:
> 
> Hi,
> 
>  it seems the format of cache mbeans changed with 7.4.0. From what I can
> see, a similar change wasn't made for other mbeans, which may mean it was
> accidental and may be a bug.
> 
>  In Solr 7.3.* format was (each attribute on its own, numeric type):
> 
> mbean:
> solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache
> 
> attributes:
>  lookups java.lang.Long = 0
>  hits java.lang.Long = 0
>  cumulative_evictions java.lang.Long = 0
>  size java.lang.Long = 0
>  hitratio java.lang.Float = 0.0
>  evictions java.lang.Long = 0
>  cumulative_lookups java.lang.Long = 0
>  cumulative_hitratio java.lang.Float = 0.0
>  warmupTime java.lang.Long = 0
>  inserts java.lang.Long = 0
>  cumulative_inserts java.lang.Long = 0
>  cumulative_hits java.lang.Long = 0
> 
> 
>  With 7.4.0 there is a single attribute "Value" (java.lang.Object):
> 
> mbean:
> solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache
> 
> attributes:
>  Value java.lang.Object = {lookups=0, evictions=0,
> cumulative_inserts=0, cumulative_hits=0, hits=0, cumulative_evictions=0,
> size=0, hitratio=0.0, cumulative_lookups=0, cumulative_hitratio=0.0,
> warmupTime=0, inserts=0}
> 
> 
>  So the question is - was this intentional change or a bug?
> 
>  Thanks,
> 
>Bojan



—

Andrzej Białecki



Re: Autoscaling and inactive shards

2018-06-18 Thread Andrzej Białecki


> On 18 Jun 2018, at 14:02, Jan Høydahl  wrote:
> 
> Is there still a valid reason to keep the inactive shards around?
> If shard splitting is robust, could not the split operation delete the 
> inactive shard once the new shards are successfully loaded, just like what 
> happens during an automated merge of segments?
> 


Shard splitting is not robust :) There are some interesting partial failure 
scenarios in SplitShardCmd that still need fixing - most likely a complete 
rewrite of SplitShardCmd is required to improve error handling, perhaps also to 
use a more efficient index splitting algorithm.

Until this is done, shard splitting leaves the original shard in place for a
while, and then InactiveShardPlanAction removes it after its TTL expires (the
default is 2 days).

> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> On 18 Jun 2018, at 12:12, Andrzej Białecki wrote:
>> 
>> If I’m not mistaken the weird accounting of “inactive” shard cores is caused 
>> also by the fact that individual cores that constitute replicas in the 
>> inactive shard are still loaded, so they still affect the number of active 
>> cores. If that’s the case then we should probably fix this to prevent 
>> loading the cores from inactive (but still present) shards.
>> 
>>> On 14 Jun 2018, at 04:27, Shalin Shekhar Mangar  
>>> wrote:
>>> 
>>> Yes, I believe Noble is working on this. See
>>> https://issues.apache.org/jira/browse/SOLR-11985
>>> 
>>> On Wed, Jun 13, 2018 at 1:35 PM Jan Høydahl  wrote:
>>> 
>>>> Ok, get the meaning of preferences.
>>>> 
>>>> Would there be a way to write a generic rule that would suggest moving
>>>> shards to obtain balance, without specifying absolute core counts? I.e. if
>>>> you have three nodes
>>>> A: 3 cores
>>>> B: 5 cores
>>>> C: 3 cores
>>>> 
>>>> Then that rule would suggest two moves to end up with 4 cores on all three
>>>> (unless that would violate disk space or load limits)?
>>>> 
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com
>>>> 
>>>>> On 12 Jun 2018, at 08:10, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>>>> 
>>>>> Hi Jan,
>>>>> 
>>>>> Comments inline:
>>>>> 
>>>>> On Tue, Jun 12, 2018 at 2:19 AM Jan Høydahl  wrote:
>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> I'm trying to have Autoscaling move a shard to another node after
>>>> manually
>>>>>> splitting.
>>>>>> We have two nodes, one has a shard1 and the other node is empty.
>>>>>> 
>>>>>> After SPLITSHARD you have
>>>>>> 
>>>>>> * shard1 (inactive)
>>>>>> * shard1_0
>>>>>> * shard1_1
>>>>>> 
>>>>>> For autoscaling we have the {"minimize" : "cores"} cluster preference
>>>>>> active. Because of that I'd expect that Autoscaling would suggest to
>>>> move
>>>>>> e.g. shard1_1 to the other (empty) node, but it doesn't. Then I create a
>>>>>> rule just to test {"cores": "<2", "node": "#ANY"}, but still no
>>>>>> suggestions. Not until I delete the inactive shard1, then it suggests to
>>>>>> move one of the two remaining shards to the other node.
>>>>>> 
>>>>>> So my two questions are
>>>>>> 1. Is it by design that inactive shards "count" wrt #cores?
>>>>>> I understand that it consumes disk but it is not active otherwise,
>>>>>> so one could argue that it should not be counted in core/replica
>>>> rules?
>>>>>> 
>>>>> 
>>>>> Today, inactive slices also count towards the number of cores -- though
>>>>> technically correct, it is probably an oversight.
>>>>> 
>>>>> 
>>>>>> 2. Why is there no suggestion to move a shard due to the "minimize
>>>> cores"
>>>>>> reference itself?
>>>>>> 
>>>>> 
>>>>> The /autoscaling/suggestions end point only suggests if there are policy
>>>>> violations. Preferences such as minimize:cores are more of a sorting
>>>> order
>>>>> so they aren't really being violated. After you add the rule, the
>>>> framework
>>>>> still cannot give a suggestion that satisfies your rule. This is because
>>>>> even if shard1_1 is moved to node2, node1 still has shard1 and shard1_0.
>>>> So
>>>>> the system ends up not suggesting anything. You should get a suggestion
>>>> if
>>>>> you add a third node to the cluster though.
>>>>> 
>>>>> Also see SOLR-11997 <https://issues.apache.org/jira/browse/SOLR-11997> which
>>>>> will tell users that a suggestion could not be returned because we cannot
>>>>> satisfy the policy. There are a slew of other improvements to suggestions
>>>>> planned that will return suggestions even when there are no policy
>>>>> violations.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Jan Høydahl, search solution architect
>>>>>> Cominvent AS - www.cominvent.com <http://www.cominvent.com/>
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
>>>> 
>>>> 
>>> 
>>> -- 
>>> Regards,
>>> Shalin Shekhar Mangar.
>> 
> 



Re: Autoscaling and inactive shards

2018-06-18 Thread Andrzej Białecki
If I’m not mistaken the weird accounting of “inactive” shard cores is caused 
also by the fact that individual cores that constitute replicas in the inactive 
shard are still loaded, so they still affect the number of active cores. If 
that’s the case then we should probably fix this to prevent loading the cores 
from inactive (but still present) shards.
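
For reference, the preference and the rule discussed below can be set, and the
suggestions checked, like this (a sketch — the values are simply the examples
from this thread):

curl -X POST http://localhost:8983/solr/admin/autoscaling \
  -H 'Content-Type: application/json' -d '{
  "set-cluster-preferences": [{"minimize": "cores"}],
  "set-cluster-policy": [{"cores": "<2", "node": "#ANY"}]
}'
curl http://localhost:8983/solr/admin/autoscaling/suggestions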

> On 14 Jun 2018, at 04:27, Shalin Shekhar Mangar  
> wrote:
> 
> Yes, I believe Noble is working on this. See
> https://issues.apache.org/jira/browse/SOLR-11985
> 
> On Wed, Jun 13, 2018 at 1:35 PM Jan Høydahl  wrote:
> 
>> Ok, get the meaning of preferences.
>> 
>> Would there be a way to write a generic rule that would suggest moving
>> shards to obtain balance, without specifying absolute core counts? I.e. if
>> you have three nodes
>> A: 3 cores
>> B: 5 cores
>> C: 3 cores
>> 
>> Then that rule would suggest two moves to end up with 4 cores on all three
>> (unless that would violate disk space or load limits)?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 12 Jun 2018, at 08:10, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>> 
>>> Hi Jan,
>>> 
>>> Comments inline:
>>> 
>>> On Tue, Jun 12, 2018 at 2:19 AM Jan Høydahl > > wrote:
>>> 
 Hi
 
 I'm trying to have Autoscaling move a shard to another node after
>> manually
 splitting.
 We have two nodes, one has a shard1 and the other node is empty.
 
 After SPLITSHARD you have
 
 * shard1 (inactive)
 * shard1_0
 * shard1_1
 
 For autoscaling we have the {"minimize" : "cores"} cluster preference
 active. Because of that I'd expect that Autoscaling would suggest to
>> move
 e.g. shard1_1 to the other (empty) node, but it doesn't. Then I create a
 rule just to test {"cores": "<2", "node": "#ANY"}, but still no
 suggestions. Not until I delete the inactive shard1, then it suggests to
 move one of the two remaining shards to the other node.
 
 So my two questions are
 1. Is it by design that inactive shards "count" wrt #cores?
  I understand that it consumes disk but it is not active otherwise,
  so one could argue that it should not be counted in core/replica
>> rules?
 
>>> 
>>> Today, inactive slices also count towards the number of cores -- though
>>> technically correct, it is probably an oversight.
>>> 
>>> 
 2. Why is there no suggestion to move a shard due to the "minimize
>> cores"
 reference itself?
 
>>> 
>>> The /autoscaling/suggestions end point only suggests if there are policy
>>> violations. Preferences such as minimize:cores are more of a sorting
>> order
>>> so they aren't really being violated. After you add the rule, the
>> framework
>>> still cannot give a suggestion that satisfies your rule. This is because
>>> even if shard1_1 is moved to node2, node1 still has shard1 and shard1_0.
>> So
>>> the system ends up not suggesting anything. You should get a suggestion
>> if
>>> you add a third node to the cluster though.
>>> 
>>> Also see SOLR-11997 <https://issues.apache.org/jira/browse/SOLR-11997> which
>>> will tell users that a suggestion could not be returned because we cannot
>>> satisfy the policy. There are a slew of other improvements to suggestions
>>> planned that will return suggestions even when there are no policy
>>> violations.
>>> 
>>> 
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com 
 
 
>>> 
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>> 
>> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Andrzej Białecki
Hi S G,

This looks useful, and it should be easy to add to the existing metrics in 
ReplicationHandler, probably somewhere around ReplicationHandler:856 .
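
Until then, a rough view of the progress is already available from the
replication handler's "details" command (a sketch — the core name is just the
example from your mail, and the exact field names may differ between versions):

curl "http://localhost:8983/solr/my_collection_shard3_replica2/replication?command=details"

While a full fetch is in progress the response includes the current file, the
bytes downloaded and the total bytes, which gives roughly the same percentage
as the du-based check below.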

> On 14 Mar 2018, at 20:16, S G  wrote:
> 
> Hi,
> 
> Solr does full recoveries very frequently; sometimes, even for seemingly
> simple cases like adding a field to the schema, a couple of nodes go into
> recovery.
> It would be nice if it did not do such full recoveries so frequently but
> since that may require a lot of fixing, can we have a metric that reports
> how much a core has recovered already?
> 
> Example:
> 
> $ cd data
> $ du -h . | grep  my_collection | grep -w index
> 77G   ./my_collection_shard3_replica2/data/index.20180314184942993
> 145G ./my_collection_shard3_replica2/data/index.20180112001943687
> 
> This shows that the shard3-replica2 core is doing a full recovery and has
> only copied 77G out of 145G
> That is about 50% recovery done.
> 
> 
> It would be very nice if we can have this as a JMX metric and we can then
> plot it somewhere instead of having to keep running the same command in a
> loop and guessing how much is left to be copied.
> 
> A metric like the following would be great:
> {
>"my_collection_shard3_replica2": {
> "recovery": {
>  "currentSize": "77 gb",
>  "expectedSize": "145 gb",
>  "percentRecovered": "50",
>  "startTimeEpoch": "361273126317"
>  }
>}
> }
> 
> If it looks useful, I will open a JIRA for the same.
> 
> Thanks
> SG



Re: Heads up: SOLR-10130, Performance issue in Solr 6.4.1

2017-02-13 Thread Andrzej Białecki

> On 13 Feb 2017, at 13:46, Ere Maijala  wrote:
> 
> Hi all,
> 
> this is just a quick heads-up that we've stumbled on serious performance 
> issues after upgrading to Solr 6.4.1 apparently due to the new metrics 
> collection causing a major slowdown. I've filed an issue 
> (https://issues.apache.org/jira/browse/SOLR-10130) about it, but decided to 
> post this just so that anyone else doesn't need to encounter this unprepared. 
> It seems to me that metrics would need to be explicitly disabled altogether 
> in the index config to avoid the issue.
> 
> --Ere


Unfortunately this bug is present in both 6.4.0 and 6.4.1 and needs a patch,
i.e. config changes won't solve it.

It’s a pity that Solr doesn’t have a continuous performance benchmark setup, 
like Lucene does.

--
Best regards,
Andrzej Bialecki

--=# http://www.lucidworks.com #=--