Re: Metrics API - Documentation

2019-10-08 Thread Andrzej Białecki
Hi,

Starting with Solr 7.0 all JMX metrics are actually internally driven by the 
metrics API - JMX (or Prometheus) is just a way of exposing them.
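
For example, the same numbers that JMX exposes can be pulled (and filtered)
straight from the API - host and port here are placeholders:

  curl 'http://localhost:8983/solr/admin/metrics?group=jvm&prefix=gc'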

I agree that we need more documentation on metrics - contributions are welcome 
:)

Regarding your specific examples (btw. our mailing lists aggressively strip all 
attachments - your graphs didn’t make it):

* time units in time-based counters are nanoseconds. This is just the unit of 
the value, not necessarily its precision. In this specific example, 
`ADMIN./admin/collections.totalTime` (and the similarly named metric on all 
other request handlers) represents the total elapsed time spent processing 
requests.
* time-based histograms are expressed in milliseconds, as indicated by the 
“_ms” suffix.
* 1-, 5- and 15-min rates represent an exponentially weighted moving average 
over that time window, expressed in events/second.
* handlerStart is initialised with System.currentTimeMillis() when this 
instance of the request handler is first created.
* details on GC, memory buffer pools, and similar JVM metrics are documented in 
JDK documentation on Management Beans. For example:
https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
 

* "A latency of 1mil” - no idea what that is, I don’t think Solr API uses this 
abbreviation anywhere.
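
To make the units concrete, a quick worked example with the totalTime value 
from your mail (the request count here is invented; the `requests` counter 
lives next to `totalTime` on the same handler):

  totalTime = 6715327903 ns, i.e. roughly 6.7 s of cumulative processing time
  if requests = 1000, average cost = 6715327903 / 1000 / 1e6 ≈ 6.7 ms/request

So divide by 1e6 to graph milliseconds, or by 1e9 for seconds.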

Hope this helps.

—

Andrzej Białecki

> On 7 Oct 2019, at 13:41, Emir Arnautović  wrote:
> 
> Hi Richard,
> We do not use the API to collect metrics but JMX; I believe those are the 
> same (I did not verify it in the code). You can see how we turned those 
> metrics into reports/charts, or even use our agent to send data to Prometheus: 
> https://github.com/sematext/sematext-agent-integrations/tree/master/solr 
> 
> 
> You can also see links to some Solr metrics-related blog posts in this repo. 
> If you find that managing your own monitoring stack is overwhelming, you can 
> try our Solr integration.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 7 Oct 2019, at 12:40, Richard Goodman  wrote:
>> 
>> Hi there,
>> 
>> I'm currently working on using the Prometheus exporter to provide some 
>> detailed insights into our Solr Cloud clusters.
>> 
>> Using the provided template killed our Prometheus server, as well as the 
>> exporter, due to the size of our clusters (each cluster is around 96 nodes, 
>> ~300 collections with 3-way replication and 16 shards) - so you can imagine 
>> the amount of data that comes through /admin/metrics when it isn't filtered 
>> down first.
>> 
>> I've begun working on writing my own template to reduce the amount of data 
>> being requested; it's working fine, and I'm starting to build some nice 
>> graphs in Grafana.
>> 
>> The only difficulty I'm having with this is that I'm struggling to find 
>> decent documentation on the metrics themselves. I was using the "Metrics 
>> Reporting" (metrics-api) and "Monitoring Solr with Prometheus and Grafana" 
>> resources, but there is a lack of information on most metrics.
>> 
>> For example:
>> "ADMIN./admin/collections.totalTime":6715327903,
>> I understand this is a counter; however, I'm not sure what unit it would be 
>> represented in when displayed. For example:
>> 
>> [graph attachment stripped by the mailing list]
>> 
>> A latency of 1mil - not sure if this means milliseconds, million, etc.
>> Another example would be the GC metrics:
>>  "gc.ConcurrentMarkSweep.count":7,
>>  "gc.ConcurrentMarkSweep.time":1247,
>>  "gc.ParNew.count":16759,
>>  "gc.ParNew.time":884173,
>> Which, when displayed, doesn't give the clearest insight as to what the 
>> unit is:
>> 
>> [graph attachment stripped by the mailing list]
>> 
>> If anyone has any advice or guidance, that would be greatly appreciated. If 
>> there isn't documentation for the API, this is also something I'd be happy 
>> to help contribute.
>> 
>> Thanks,
>> -- 
>> Richard Goodman
> 



Re: How to block expensive solr queries

2019-10-08 Thread Mikhail Khludnev
It's worth raising an issue for supporting timeAllowed for stats. Until
that's done, something like a Jetty filter is the only option.
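
Something along these lines could work, as a rough, untested sketch (class
name is illustrative; it uses only the standard javax.servlet API and would
still need to be packaged and registered in the webapp descriptor):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: drop any request whose query string asks for
// stats.calcdistinct before Solr ever sees it.
public class BlockExpensiveQueriesFilter implements Filter {
  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    String qs = ((HttpServletRequest) req).getQueryString();
    if (qs != null && qs.contains("stats.calcdistinct")) {
      ((HttpServletResponse) resp).sendError(
          HttpServletResponse.SC_FORBIDDEN, "stats.calcdistinct is disabled");
      return;
    }
    chain.doFilter(req, resp); // everything else passes through unchanged
  }

  @Override public void init(FilterConfig config) {}
  @Override public void destroy() {}
}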

On Tue, Oct 8, 2019 at 12:34 AM Wei  wrote:

> Hi Mikhail,
>
> Yes, I have the timeAllowed parameter configured; still, in this case it
> doesn't seem to prevent the stats request from blocking other normal
> queries. Is it possible to drop the request before Solr executes it, maybe
> at the Jetty request filter?
>
> Thanks,
> Wei
>
> On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev  wrote:
>
> > Hello, Wei.
> >
> > Have you tried abandoning heavy queries with
> >
> >
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
> >  ?
> > It may or may not be able to stop stats.
> >
> >
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
> > can clarify it.
> >
> > On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:
> >
> > > Hi,
> > >
> > > Recently we encountered a problem where Solr Cloud query latency
> > > suddenly increased and many simple queries with small recall timed out.
> > > After digging a bit I found that the root cause was some stats queries
> > > happening at the same time, such as
> > >
> > >
> > >
> > >
> > > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
> > >
> > >
> > >
> > > I see unique_ids is a high-cardinality field, so this query is quite
> > > expensive. But why does a small volume of such queries block other
> > > queries and make simple queries time out? I checked the Solr thread pool
> > > and see there are plenty of idle threads available. We are using Solr
> > > 7.6.2 with a 10-shard cloud setup.
> > >
> > > Is there a way to block certain Solr queries based on URL pattern, i.e.
> > > ignore the stats.calcdistinct request in this case?
> > >
> > >
> > > Thanks,
> > >
> > > Wei
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to block expensive solr queries

2019-10-08 Thread Toke Eskildsen
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain Solr queries based on URL pattern,
> i.e. ignore the stats.calcdistinct request in this case?

It sounds like it is possible for users to issue arbitrary queries
against your Solr installation. As you have noticed, that makes it easy
to perform a Denial of Service (intentional or not). Filtering out
stats.calcdistinct won't help with the next expensive request -
group.ngroups=true, a facet on the high-cardinality unique_id field, a
huge rows value, or something else entirely.

I recommend you flip your logic: only allow specific types of requests,
and put limits on those. To my knowledge that is not a built-in feature
of Solr.
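
For illustration only, an untested sketch of that flipped logic as a
servlet filter in front of Solr (the parameter allowlist and the rows cap
are made-up examples, not recommendations):

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: allow only known parameters, and cap the ones allowed.
public class AllowlistQueryFilter implements Filter {
  private static final Set<String> ALLOWED =
      new HashSet<>(Arrays.asList("q", "fq", "sort", "start", "rows", "fl", "wt"));

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    if (isAcceptable((HttpServletRequest) req)) {
      chain.doFilter(req, resp);
    } else {
      ((HttpServletResponse) resp).sendError(
          HttpServletResponse.SC_FORBIDDEN, "request not on the allowlist");
    }
  }

  private boolean isAcceptable(HttpServletRequest req) {
    // Reject anything that is not explicitly allowed ...
    for (String param : req.getParameterMap().keySet()) {
      if (!ALLOWED.contains(param)) return false;
    }
    // ... and put limits on what is allowed, e.g. cap the result window.
    String rows = req.getParameter("rows");
    if (rows == null) return true;
    try {
      return Integer.parseInt(rows) <= 100;
    } catch (NumberFormatException e) {
      return false; // non-numeric rows: reject
    }
  }

  @Override public void init(FilterConfig config) {}
  @Override public void destroy() {}
}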

- Toke Eskildsen, Royal Danish Library




Re: Protecting Tokens from Any Analysis

2019-10-08 Thread David Hastings
Another thing to add to the above:
>
> IT:ibm. In this case, we would want to maintain the colon and the
> capitalization (otherwise “it” would be taken out as a stopword).
>
Stopwords are a thing of the past at this point. There is no benefit to
using them now that hardware is so cheap.

On Tue, Oct 8, 2019 at 12:43 PM Alexandre Rafalovitch 
wrote:

> If you don't want it to be touched by a tokenizer, how would the
> protection step know that the sequence of characters you want to
> protect is "IT:ibm" and not "this is an IT:ibm term I want to
> protect"?
>
> What it sounds like to me is that you may want to:
> 1) copyField to a second field
> 2) Apply a much lighter (whitespace?) tokenizer to that second field
> 3) Run the results through something like KeepWordFilterFactory
> 4) Search both fields with a boost on the second, higher-signal field
>
> The other option is to run a CharFilter
> (PatternReplaceCharFilterFactory), which runs pre-tokenizer, to map known
> complex acronyms to non-tokenizable substitutions. E.g. "IT:ibm ->
> term365". As long as it is done on both indexing and query, they will
> still match. You may have to have a bunch of them or write some sort
> of lookup map.
>
> Regards,
>Alex.
>
> On Tue, 8 Oct 2019 at 12:10, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
> >
> > Hi All,
> >
> > This is likely a rudimentary question, but I can’t seem to find a
> straight-forward answer on forums or the documentation…is there a way to
> protect tokens from ANY analysis? I know things like the
> KeywordMarkerFilterFactory protect tokens from stemming, but we have some
> terms we don’t even want our tokenizer to touch. Mostly, these are
> IBM-specific acronyms, such as IT:ibm. In this case, we would want to
> maintain the colon and the capitalization (otherwise “it” would be taken
> out as a stopword).
> >
> > Any advice is appreciated!
> >
> > Thank you,
> > Audrey
> >
> > --
> > Audrey Lorberfeld
> > Data Scientist, w3 Search
> > IBM
> > audrey.lorberf...@ibm.com
> >
>


Protecting Tokens from Any Analysis

2019-10-08 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All,

This is likely a rudimentary question, but I can’t seem to find a 
straight-forward answer on forums or the documentation…is there a way to 
protect tokens from ANY analysis? I know things like the 
KeywordMarkerFilterFactory protect tokens from stemming, but we have some terms 
we don’t even want our tokenizer to touch. Mostly, these are IBM-specific 
acronyms, such as IT:ibm. In this case, we would want to maintain the colon and 
the capitalization (otherwise “it” would be taken out as a stopword).

Any advice is appreciated!

Thank you,
Audrey

--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com



Re: Protecting Tokens from Any Analysis

2019-10-08 Thread Alexandre Rafalovitch
If you don't want it to be touched by a tokenizer, how would the
protection step know that the sequence of characters you want to
protect is "IT:ibm" and not "this is an IT:ibm term I want to
protect"?

What it sounds like to me is that you may want to:
1) copyField to a second field
2) Apply a much lighter (whitespace?) tokenizer to that second field
3) Run the results through something like KeepWordFilterFactory
4) Search both fields with a boost on the second, higher-signal field

The other option is to run a CharFilter
(PatternReplaceCharFilterFactory), which runs pre-tokenizer, to map known
complex acronyms to non-tokenizable substitutions. E.g. "IT:ibm ->
term365". As long as it is done on both indexing and query, they will
still match. You may have to have a bunch of them or write some sort
of lookup map.
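
To make that second option concrete, a rough sketch of what it could look
like in a fieldType definition (the type name and the "term365" substitution
are just the illustrations from above; you would add one mapping, or a
lookup file, per acronym):

<fieldType name="text_protected_acronyms" class="solr.TextField">
  <analyzer>
    <!-- runs before the tokenizer, so the acronym is never split -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="IT:ibm" replacement="term365"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Since the same analyzer chain applies at both index and query time, the
substitution stays consistent on both sides.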

Regards,
   Alex.

On Tue, 8 Oct 2019 at 12:10, Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:
>
> Hi All,
>
> This is likely a rudimentary question, but I can’t seem to find a 
> straight-forward answer on forums or the documentation…is there a way to 
> protect tokens from ANY analysis? I know things like the 
> KeywordMarkerFilterFactory protect tokens from stemming, but we have some 
> terms we don’t even want our tokenizer to touch. Mostly, these are 
> IBM-specific acronyms, such as IT:ibm. In this case, we would want to 
> maintain the colon and the capitalization (otherwise “it” would be taken out 
> as a stopword).
>
> Any advice is appreciated!
>
> Thank you,
> Audrey
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>


Re: Solr 7.7 restore issue

2019-10-08 Thread Natarajan, Rajeswari
I am also facing the same issue. With Solr 7.6, restore fails with the rule 
below, which is meant to place one replica per node:
"set-cluster-policy": [{
"replica": "<2",
"shard": "#EACH",
"node": "#ANY"
}]

Without the rule the restore works, but we need it. Any suggestions to 
overcome this issue?

Thanks,
Rajeswari

On 7/12/19, 11:00 AM, "Mark Thill"  wrote:

I have a 4-node cluster.  My goal is to have 2 shards with two replicas
each, allowing only 1 core on each node.  I have a cluster policy set to:

[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

I then manually create a collection with:

name: test
config set: test
numShards: 2
replicationFact: 2

This works, and I get a collection that looks like what I expect.  I then
back up this collection.  But when I try to restore the collection, it fails
and says:

"Error getting replica locations : No node can satisfy the rules"
[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

If I set my cluster-policy rules back to [] and try to restore, it then
successfully restores my collection exactly how I expect it to be.  It
appears that having any cluster-policy rules in place is affecting my
restore, but the "error getting replica locations" is strange.

Any suggestions?

mark 




Wild-card query behavior

2019-10-08 Thread Paresh
Hi All,

I am trying wild-card queries in the query and in the filter query, with and
without {!join}, and finding it difficult to understand the Solr behavior.

(-) wild-card like 12* in query: field:12* works well
(-) wild-card like 12* in query with {!join to=... from=...}field:12* --> works well
(-) wild-card like (12*) in query with {!join to=... from=...}field:(12*) --> doesn't work
(-) wild-card like (12*) in filter query with fq={!join to=... from=...}field:12* --> doesn't work
(-) wild-card like (12*) in filter query with fq={!join to=... from=...}field:"12*" --> doesn't work
(-) wild-card like (12*) in filter query with fq={!join to=... from=...}field:(12*) --> works well

Why does the wild-card query not work with {!join}?

Regards,
Paresh



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [CAUTION] Re: Solr 7.7 restore issue

2019-10-08 Thread Natarajan, Rajeswari
It looks like the rule created before was wrong.

From the Solr documentation below:
https://lucene.apache.org/solr/guide/7_6/rule-based-replica-placement.html

For a given shard, keep less than 2 replicas on any node
For this rule, we use the shard condition to define any shard, the replica 
condition with operators for "less than 2", and finally a pre-defined tag named 
node to define nodes with any name.

shard:*,replica:<2,node:*

The above rule works fine with the restore.
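
For reference, that rule-based syntax is normally passed when the collection
is created; a sketch, with the collection name and counts as placeholders:

  /admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,node:*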

Thanks,
Rajeswari

On 10/8/19, 9:34 PM, "Natarajan, Rajeswari"  
wrote:

I am also facing the same issue. With Solr 7.6, restore fails with the rule 
below, which is meant to place one replica per node:
"set-cluster-policy": [{
"replica": "<2",
"shard": "#EACH",
"node": "#ANY"
}]

Without the rule the restore works, but we need it. Any suggestions to 
overcome this issue?

Thanks,
Rajeswari

On 7/12/19, 11:00 AM, "Mark Thill"  wrote:

I have a 4-node cluster.  My goal is to have 2 shards with two replicas
each, allowing only 1 core on each node.  I have a cluster policy set to:

[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

I then manually create a collection with:

name: test
config set: test
numShards: 2
replicationFact: 2

This works, and I get a collection that looks like what I expect.  I then
back up this collection.  But when I try to restore the collection, it fails
and says:

"Error getting replica locations : No node can satisfy the rules"
[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

If I set my cluster-policy rules back to [] and try to restore, it then
successfully restores my collection exactly how I expect it to be.  It
appears that having any cluster-policy rules in place is affecting my
restore, but the "error getting replica locations" is strange.

Any suggestions?

mark 






Re: SOLR version 8.1 compatibility with OS OEL 7

2019-10-08 Thread Jörn Franke
The best way is always to test it yourself. I have used Solr 8.1/8.2 with 
OpenJDK 11 on RHEL 7. OpenJDK 11 was chosen as it will be the minimum 
compatible version in Solr 9.0, and because older JDKs are already out of 
support. However, I don't know to what extent this is comparable with your 
setup.

> On 9 Oct 2019, at 14:12, Abburi, Susnigdha wrote:
> 
> Hi Support,
> 
> We are looking at upgrading Solr from version 7.2 to version 8.1. Could you 
> please confirm whether Solr 8.1 is compatible with Oracle Enterprise Linux 
> 7?
> 
> Thank you!
> 
> Kind Regards,
> Susnigdha.


Re: ant precommit fails on .adoc files

2019-10-08 Thread Chris Hostetter


This is strange -- I can't reproduce it, and I can't see any evidence of a 
change to explain why this might have been failing 8 days ago but not 
anymore.

Are you still seeing this error?

The lines in question are XML comments inside (example) code blocks (in 
the ref-guide source), which is valid, and the 
'checkForUnescapedSymbolSubstitutions' Groovy function that generates the 
error below already has allowances for this possibility.

(Normally, putting '->' in Asciidoctor source files is a bad idea and renders 
as gibberish, which is why we have this check.)


I wonder if it's possible that something in the local environment where you 
are running ant is causing the Groovy regex patterns to be evaluated 
differently (e.g. mismatched Unix/Windows line endings, a LANG that doesn't 
use UTF-8, etc.)?
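
If you want to rule those out, a few quick (Unix-ish) checks:

  locale                                        # LANG/LC_ALL should be a UTF-8 locale
  file solr/solr-ref-guide/src/analytics.adoc   # reports CRLF line terminators if present
  git config core.autocrlf                      # 'true' can introduce CRLF endings on checkout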




: I've checked out the lucene-solr project, branch "branch_8x"

: When I run "ant precommit" at the project root, I get these validation 
: errors on the "analytics.adoc" file.  Has anyone seen these before, and if 
: so, do you know of a fix?

: validate-source-patterns:
: 
: [source-patterns] Unescaped symbol "->" on line #46: 
solr/solr-ref-guide/src/analytics.adoc
: 
: [source-patterns] Unescaped symbol "->" on line #55: 
solr/solr-ref-guide/src/analytics.adoc


-Hoss
http://www.lucidworks.com/


SOLR version 8.1 compatibility with OS OEL 7

2019-10-08 Thread Abburi, Susnigdha
Hi Support,

We are looking at upgrading Solr from version 7.2 to version 8.1. Could you 
please confirm whether Solr 8.1 is compatible with Oracle Enterprise Linux 7?

Thank you!

Kind Regards,
Susnigdha.