[jira] [Created] (SOLR-15106) Thread in OverseerTaskProcessor should not "return"

2021-01-25 Thread Mathieu Marie (Jira)
Mathieu Marie created SOLR-15106:


 Summary: Thread in OverseerTaskProcessor should not "return"
 Key: SOLR-15106
 URL: https://issues.apache.org/jira/browse/SOLR-15106
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 8.6, master (9.0)
Reporter: Mathieu Marie


I have encountered a scenario where ZK was not accessible for a long time (due 
to a _jute.maxbuffer_ issue, which is unrelated to the rest of this report).
During that time, the ClusterStateUpdater and OC queues from the Overseer got 
filled with 1200+ messages.

Once we restored ZK availability, the ClusterStateUpdater queue got emptied, 
but not the OC one.

The Overseer stopped dequeuing from the OC queue.

After some digging in the code it seems that a *return* from the overseer 
thread starting the runners could be the issue.

Code in OverseerTaskProcessor.java 
(https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L357)
The lines of code that immediately follow should also be reviewed carefully, as 
they also return or interrupt the thread that is responsible for executing the 
runners.
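
To illustrate the suspected failure mode, here is a minimal, self-contained 
sketch. It is not the actual OverseerTaskProcessor code: the class 
QueueLoopSketch, its methods and the simulated failure are all hypothetical. 
It only contrasts a dispatcher loop that returns on the first transient error 
(so later messages are never dequeued) with one that keeps going.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class QueueLoopSketch {

    /** Hypothetical stand-in for a transient failure such as a lost ZK connection. */
    static class TransientException extends Exception {
        TransientException(String message) {
            super(message);
        }
    }

    /**
     * Drains the queue and returns the messages that were processed.
     * With exitOnError = true the loop returns on the first failure, which is
     * the behavior suspected in this report. With exitOnError = false the
     * failed message is skipped and the loop continues.
     */
    static List<String> drain(Queue<String> queue, boolean exitOnError) {
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            String message = queue.poll();
            try {
                process(message);
                processed.add(message);
            } catch (TransientException e) {
                if (exitOnError) {
                    // The loop ends here: everything still queued stays queued.
                    return processed;
                }
                // Otherwise keep looping. A real implementation would probably
                // log, back off and retry instead of dropping the message.
            }
        }
        return processed;
    }

    /** Simulated work: any message starting with "fail" raises a transient error. */
    static void process(String message) throws TransientException {
        if (message.startsWith("fail")) {
            throw new TransientException("simulated failure for " + message);
        }
    }

    public static void main(String[] args) {
        // Returning on error leaves "b" unprocessed.
        System.out.println(drain(new ArrayDeque<>(List.of("a", "fail-1", "b")), true));  // [a]
        // Continuing on error still reaches "b".
        System.out.println(drain(new ArrayDeque<>(List.of("a", "fail-1", "b")), false)); // [a, b]
    }
}
```

Whether the real code should retry, back off or re-check overseer leadership 
before continuing is a design decision this sketch does not try to answer.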

Anyhow, if anybody hits the same issue, the quick workaround is to bump the 
overseer instance so that a new overseer is elected on another node.







--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15083) prometheus-exporter metric solr_metrics_jvm_os_cpu_time_seconds is misnamed

2021-01-15 Thread Mathieu Marie (Jira)
Mathieu Marie created SOLR-15083:


 Summary: prometheus-exporter metric 
solr_metrics_jvm_os_cpu_time_seconds is misnamed
 Key: SOLR-15083
 URL: https://issues.apache.org/jira/browse/SOLR-15083
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - prometheus-exporter
Affects Versions: 8.6, master (9.0)
Reporter: Mathieu Marie


The *solr_metrics_jvm_os_cpu_time_seconds* metric exported by prometheus-exporter 
has seconds in its name, but the exported value appears to be in microseconds.

This name can create confusion when one wants to report it in a dashboard.
 That metric is defined in 
[https://github.com/apache/lucene-solr/blob/branch_8_5/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml#L247]

 {code}
  
.metrics["solr.jvm"] | to_entries | .[] | select(.key == "os.processCpuTime") as $object |
($object.value / 1000.0) as $value |
{
  name         : "solr_metrics_jvm_os_cpu_time_seconds",
  type         : "COUNTER",
  help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
  label_names  : ["item"],
  label_values : ["processCpuTime"],
  value        : $value
}
  
{code}

In the above config we see that the metric comes from *os.processCpuTime*, 
which itself comes from the JMX call 
[getProcessCpuTime()|https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuTime()].

That javadoc says

{code}
long getProcessCpuTime()
Returns the CPU time used by the process on which the Java virtual machine is 
running in nanoseconds. The returned value is of nanoseconds precision but not 
necessarily nanoseconds accuracy. This method returns -1 if the the platform 
does not support this operation.
Returns:
the CPU time used by the process in nanoseconds, or -1 if this operation is not 
supported.
{code}

Nanoseconds / 1000 is microseconds.
Either the name or the computation should be updated.
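
The unit arithmetic can be checked with a small sketch. The class name 
CpuTimeUnits and its helper methods are hypothetical. The only library call 
used is getProcessCpuTime() from com.sun.management.OperatingSystemMXBean, 
which the javadoc quoted above documents as returning nanoseconds. The sketch 
assumes a JVM that exposes that extension interface.

```java
import java.lang.management.ManagementFactory;

public class CpuTimeUnits {

    /** Nanoseconds to seconds: divide by 10^9. */
    static double nanosToSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    /** Nanoseconds to microseconds: divide by 10^3, as the exporter config does today. */
    static double nanosToMicros(long nanos) {
        return nanos / 1_000.0;
    }

    public static void main(String[] args) {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        // The extension interface is not guaranteed on every JVM, so check first.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long nanos = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
            if (nanos >= 0) { // -1 means the platform does not support the operation
                System.out.println("raw (ns):       " + nanos);
                System.out.println("value / 1000:   " + nanosToMicros(nanos) + " (microseconds)");
                System.out.println("value / 10^9:   " + nanosToSeconds(nanos) + " (seconds)");
            }
        }
    }
}
```

So for the metric to match its name, the divisor in the config would have to 
be 1000000000.0 rather than 1000.0.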








[jira] [Updated] (SOLR-15074) ClassCastException when repeating the same query param twice in Prometheus exporter config file

2021-01-07 Thread Mathieu Marie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu Marie updated SOLR-15074:
-
Description: 
I am using the prometheus-exporter to monitor my service, and I wish to scale 
down the amount of data exchanged between Solr and Prometheus.
I can do that by updating the queries in the solr-exporter.xml file.
I wanted to use a query that looks like this:


{noformat}
http://localhost:8983/solr/admin/metrics?group=solr.node&regex=.*metrics.*requestTimes&property=p95_ms&property=count
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "metrics":{
    "solr.node":{
      "ADMIN./admin/metrics.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "ADMIN./admin/metrics.local.requestTimes":{
        "count":238,
        "p95_ms":1.984835},
      "ADMIN./admin/metrics.requestTimes":{
        "count":238,
        "p95_ms":1.995053},
      "QUERY./admin/metrics/collector.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/collector.local.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/collector.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.local.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.requestTimes":{
        "count":0,
        "p95_ms":0.0}}}}
{noformat}

In that query I repeated the param property to provide two values:
{{property=p95_ms&property=count}}

My config file looks like this:

 {noformat}
  

  /admin/metrics
  
solr.core
.*metrics.*requestTimes
count
p95_ms
  

     ...
   
{noformat}
 

Then when the prometheus-exporter starts, it produces a ClassCastException.


{noformat}
Exception in thread "main" java.lang.RuntimeException: 
java.lang.ClassCastException: class java.util.ArrayList cannot be cast to class 
java.lang.String (java.util.ArrayList and java.lang.String are in module 
java.base of loader 'bootstrap')
 at 
org.apache.solr.prometheus.exporter.SolrExporter.loadMetricsConfiguration(SolrExporter.java:231)
 at org.apache.solr.prometheus.exporter.SolrExporter.main(SolrExporter.java:213)
Caused by: java.lang.ClassCastException: class java.util.ArrayList cannot be 
cast to class java.lang.String (java.util.ArrayList and java.lang.String are in 
module java.base of loader 'bootstrap')
 at org.apache.solr.prometheus.exporter.MetricsQuery.from(MetricsQuery.java:109)
 at 
org.apache.solr.prometheus.exporter.MetricsConfiguration.toMetricQueries(MetricsConfiguration.java:91)
 at 
org.apache.solr.prometheus.exporter.MetricsConfiguration.from(MetricsConfiguration.java:80)
 at 
org.apache.solr.prometheus.exporter.SolrExporter.loadMetricsConfiguration(SolrExporter.java:228)
 ... 1 more
{noformat}
 

This comes from a bad cast in the following code:
{noformat}

  NamedList query = (NamedList) request.get("query");
  NamedList queryParameters = (NamedList) query.get("params");
  String path = (String) query.get("path");
  String core = (String) query.get("core");
  String collection = (String) query.get("collection");
  List jsonQueries = (ArrayList) request.get("jsonQueries");

  ModifiableSolrParams params = new ModifiableSolrParams();
  if (queryParameters != null) {
    for (Map.Entry<String, String> entrySet : (Set<Map.Entry<String, String>>) queryParameters.asShallowMap().entrySet()) {
      params.add(entrySet.getKey(), entrySet.getValue());
    }
  }

{noformat}
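
One possible direction for a fix, shown as a self-contained sketch rather than 
a patch: judging from the exception, the shallow map holds a list when a 
parameter name is repeated, so each value should be inspected instead of being 
cast to String. The class RepeatedParamSketch and the method toMultiValued are 
hypothetical, and plain java.util maps stand in for NamedList and 
ModifiableSolrParams to keep the example runnable without Solr on the 
classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatedParamSketch {

    /**
     * Normalizes a map whose values are either a single value or a collection
     * of values (assumed shape for a repeated parameter) into name -> list of
     * strings, without casting any value to String.
     */
    static Map<String, List<String>> toMultiValued(Map<String, Object> raw) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            List<String> values = params.computeIfAbsent(entry.getKey(), k -> new ArrayList<>());
            Object value = entry.getValue();
            if (value instanceof Iterable) {
                // Repeated parameter: keep every value, in order.
                for (Object item : (Iterable<?>) value) {
                    values.add(String.valueOf(item));
                }
            } else if (value != null) {
                // Single-valued parameter.
                values.add(String.valueOf(value));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("group", "solr.core");
        raw.put("property", List.of("count", "p95_ms")); // the repeated parameter
        System.out.println(toMultiValued(raw));
        // {group=[solr.core], property=[count, p95_ms]}
    }
}
```

In the exporter itself, the equivalent would presumably be to add each value 
of a repeated parameter to the request params one by one.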

 

 


[jira] [Created] (SOLR-15074) ClassCastException when repeating the same query param twice in Prometheus exporter config file

2021-01-07 Thread Mathieu Marie (Jira)
Mathieu Marie created SOLR-15074:


 Summary: ClassCastException when repeating the same query param 
twice in Prometheus exporter config file
 Key: SOLR-15074
 URL: https://issues.apache.org/jira/browse/SOLR-15074
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - prometheus-exporter
Affects Versions: 8.6.2
Reporter: Mathieu Marie


I am using the prometheus-exporter to monitor my service, and I wish to scale 
down the amount of data exchanged between Solr and Prometheus.
I can do that by updating the queries in the solr-exporter.xml file.
I wanted to use a query that looks like this:



{noformat}
http://localhost:8983/solr/admin/metrics?group=solr.node&regex=.*metrics.*requestTimes&property=p95_ms&property=count
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "metrics":{
    "solr.node":{
      "ADMIN./admin/metrics.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "ADMIN./admin/metrics.local.requestTimes":{
        "count":238,
        "p95_ms":1.984835},
      "ADMIN./admin/metrics.requestTimes":{
        "count":238,
        "p95_ms":1.995053},
      "QUERY./admin/metrics/collector.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/collector.local.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/collector.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.distrib.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.local.requestTimes":{
        "count":0,
        "p95_ms":0.0},
      "QUERY./admin/metrics/history.requestTimes":{
        "count":0,
        "p95_ms":0.0}}}}
{noformat}


In that query I repeated the param property to provide two values:
{{property=p95_ms&property=count}}

My config file looks like this:

 

{{ }}
{{  }}
{{    /admin/metrics}}
{{    }}
{{      solr.core}}
{{      .*metrics.*requestTimes}}
{color:#FF}{{      count}}{color}
{color:#FF}{{      p95_ms}}{color}
{{    }}
{{   }}

{{   ...}}

{{}}

 

Then when the prometheus-exporter starts, it produces a ClassCastException.

 


Exception in thread "main" java.lang.RuntimeException: 
java.lang.ClassCastException: class java.util.ArrayList cannot be cast to class 
java.lang.String (java.util.ArrayList and java.lang.String are in module 
java.base of loader 'bootstrap')
 at 
org.apache.solr.prometheus.exporter.SolrExporter.loadMetricsConfiguration(SolrExporter.java:231)
 at org.apache.solr.prometheus.exporter.SolrExporter.main(SolrExporter.java:213)
Caused by: java.lang.ClassCastException: class java.util.ArrayList cannot be 
cast to class java.lang.String (java.util.ArrayList and java.lang.String are in 
module java.base of loader 'bootstrap')
 at org.apache.solr.prometheus.exporter.MetricsQuery.from(MetricsQuery.java:109)
 at 
org.apache.solr.prometheus.exporter.MetricsConfiguration.toMetricQueries(MetricsConfiguration.java:91)
 at 
org.apache.solr.prometheus.exporter.MetricsConfiguration.from(MetricsConfiguration.java:80)
 at 
org.apache.solr.prometheus.exporter.SolrExporter.loadMetricsConfiguration(SolrExporter.java:228)
 ... 1 more

 

This comes from a bad cast in the following code:

{noformat}
  NamedList query = (NamedList) request.get("query");
  NamedList queryParameters = (NamedList) query.get("params");
  String path = (String) query.get("path");
  String core = (String) query.get("core");
  String collection = (String) query.get("collection");
  List jsonQueries = (ArrayList) request.get("jsonQueries");

  ModifiableSolrParams params = new ModifiableSolrParams();
  if (queryParameters != null) {
    for (Map.Entry<String, String> entrySet : (Set<Map.Entry<String, String>>) queryParameters.asShallowMap().entrySet()) {
      params.add(entrySet.getKey(), entrySet.getValue());
    }
  }
{noformat}

 

 

 






[jira] [Created] (SOLR-15007) Aggregate core handler=/select and /update metrics at the node level metric too

2020-11-17 Thread Mathieu Marie (Jira)
Mathieu Marie created SOLR-15007:


 Summary: Aggregate core handler=/select and /update metrics at the 
node level metric too
 Key: SOLR-15007
 URL: https://issues.apache.org/jira/browse/SOLR-15007
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
  Components: metrics
Affects Versions: master (9.0)
Reporter: Mathieu Marie


At my company, we anticipate a huge number of cores and would like to report 
an aggregated view at the node level instead of at the core level, where the 
volume of metrics grows with the number of cores.



Right now, we're aggregating all of the solr.cores metrics to compute 
per-cluster dashboards.
But given that there are many admin handlers already reporting metrics at the 
node level, I wonder if we could aggregate _/update_, _/select_ and all the 
other handler counters in solr and expose them at the solr.node level too.

It would require (a lot) less data to transport, store and aggregate later, 
while still giving access to per-core metrics.

 






[jira] [Commented] (SOLR-14699) Solr request logs should escape names, values (SolrQueryResponse.getToLogAsString)

2020-08-03 Thread Mathieu Marie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170050#comment-17170050
 ] 

Mathieu Marie commented on SOLR-14699:
--

We should not encode depending on the content. Otherwise it becomes too 
difficult to tell whether some values were already encoded or were encoded 
because they contained a special character. It looks like a global parameter 
would be preferred in that case. 
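
As a rough illustration of encoding unconditionally, here is a self-contained 
sketch. The class LogEncodingSketch and its methods are hypothetical and are 
not part of SolrQueryResponse. The choice of URL-style encoding via 
java.net.URLEncoder is only an assumption made for the example, and the 
two-argument overload taking a Charset needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class LogEncodingSketch {

    /** Encodes every name and every value, whether or not it contains special characters. */
    static String toLogString(Map<String, String> pairs) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            joiner.add(encode(pair.getKey()) + "=" + encode(pair.getValue()));
        }
        return joiner.toString();
    }

    /** URL-style encoding: a space becomes '+', '=' becomes %3D. */
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("q", "a b=c"); // contains both separators used by the log format
        pairs.put("rows", "10");
        System.out.println(toLogString(pairs)); // q=a+b%3Dc rows=10
    }
}
```

Because every value goes through the same encoder, a parser can always decode 
each token and never has to guess whether a given value was encoded.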

> Solr request logs should escape names, values 
> (SolrQueryResponse.getToLogAsString)
> --
>
> Key: SOLR-14699
> URL: https://issues.apache.org/jira/browse/SOLR-14699
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: logging
>Reporter: David Smiley
>Priority: Minor
>
> {{SolrQueryResponse.getToLogAsString}} encodes the NamedList into a String 
> with simple space-separated pairs with name=value.  However, it does no 
> escaping/encoding, and as-such a value might itself contain spaces and 
> equals.  This is a problem if these logs are being parsed, and we'd like to 
> ensure we do so correctly.  Note that SolrLogPostTool (aka "postlogs") parses 
> these logs.


