Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Andrzej Białecki
Hi S G,

This looks useful, and it should be easy to add to the existing metrics in
ReplicationHandler, probably somewhere around ReplicationHandler:856.

> On 14 Mar 2018, at 20:16, S G wrote:
> 
> Hi,
> 
> Solr does full recoveries very frequently: even for seemingly simple
> changes, such as adding a field to the schema, a couple of nodes go into
> recovery.
> It would be nice if it did not do full recoveries so often, but since
> that may require a lot of fixing, can we have a metric that reports how
> much of a core has been recovered so far?
> 
> Example:
> 
> $ cd data
> $ du -h . | grep my_collection | grep -w index
> 77G     ./my_collection_shard3_replica2/data/index.20180314184942993
> 145G    ./my_collection_shard3_replica2/data/index.20180112001943687
> 
> This shows that the my_collection_shard3_replica2 core is doing a full
> recovery and has copied only 77G of 145G, so the recovery is roughly half
> done (77/145 is about 53%).
> 
> 
> It would be very nice to have this as a JMX metric, so that we could plot
> it instead of repeatedly running the same command in a loop and guessing
> how much is left to copy.
> 
> A metric like the following would be great:
> {
>   "my_collection_shard3_replica2": {
>     "recovery": {
>       "currentSize": "77 gb",
>       "expectedSize": "145 gb",
>       "percentRecovered": "50",
>       "startTimeEpoch": "361273126317"
>     }
>   }
> }
> 
> If this looks useful, I will open a JIRA issue for it.
> 
> Thanks
> SG



Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Rick Leir
S
Were there errors in the logs just before the recoveries?
Rick

Expose a metric for percentage-recovered during full recoveries

2018-03-14 Thread S G
Hi,

Solr does full recoveries very frequently: even for seemingly simple
changes, such as adding a field to the schema, a couple of nodes go into
recovery.
It would be nice if it did not do full recoveries so often, but since
that may require a lot of fixing, can we have a metric that reports how
much of a core has been recovered so far?

Example:

$ cd data
$ du -h . | grep my_collection | grep -w index
77G     ./my_collection_shard3_replica2/data/index.20180314184942993
145G    ./my_collection_shard3_replica2/data/index.20180112001943687

This shows that the my_collection_shard3_replica2 core is doing a full
recovery and has copied only 77G of 145G, so the recovery is roughly half
done (77/145 is about 53%).
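Until such a metric exists, the du comparison can be scripted. What follows is a minimal Python sketch under a hypothetical heuristic read off the listing: the index directory with the newest timestamp suffix is the one being downloaded, and the largest older index directory stands in for the expected size. The leader's actual index may be larger or smaller than the old local copy, so the percentage is only an estimate. The function names are illustrative, not part of Solr.

```python
import os
import re

# Matches "index" and "index.<digits>", e.g. index.20180314184942993.
INDEX_DIR = re.compile(r"^index(?:\.(\d+))?$")


def dir_size_bytes(path):
    """Sum the sizes of the files under path (roughly what du reports)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear mid-scan while a recovery is running.
                continue
    return total


def index_age_key(name):
    """Order index directories oldest-first; a plain "index" sorts first."""
    suffix = INDEX_DIR.match(name).group(1)
    return -1 if suffix is None else int(suffix)


def estimate_recovery(data_dir):
    """Return (current_bytes, expected_bytes, percent), or None if no estimate is possible.

    Hypothetical heuristic: the newest index directory is the one being
    downloaded, and the largest older one approximates its final size.
    """
    names = sorted(
        (n for n in os.listdir(data_dir)
         if INDEX_DIR.match(n) and os.path.isdir(os.path.join(data_dir, n))),
        key=index_age_key,
    )
    if len(names) < 2:
        return None  # a single index directory: no full recovery visible
    current = dir_size_bytes(os.path.join(data_dir, names[-1]))
    expected = max(dir_size_bytes(os.path.join(data_dir, n)) for n in names[:-1])
    if expected == 0:
        return None
    return current, expected, 100.0 * current / expected
```

Pointed at the core's data directory (./my_collection_shard3_replica2/data in the listing), it would report about 53% for 77G copied out of 145G. Sorting on the integer suffix rather than the raw name keeps the ordering correct even if suffixes ever differ in width.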


It would be very nice to have this as a JMX metric, so that we could plot
it instead of repeatedly running the same command in a loop and guessing
how much is left to copy.

A metric like the following would be great:
{
  "my_collection_shard3_replica2": {
    "recovery": {
      "currentSize": "77 gb",
      "expectedSize": "145 gb",
      "percentRecovered": "50",
      "startTimeEpoch": "361273126317"
    }
  }
}
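For illustration only, here is a small Python sketch that assembles a payload with the field names above. The function name is hypothetical, and this is not an existing Solr or JMX API. One deliberate deviation from the example: sizes, percentage and start time are plain numbers (bytes and epoch milliseconds) rather than strings like "77 gb", because graphing tools can plot numbers directly.

```python
import json
import time


def recovery_metric(core_name, current_bytes, expected_bytes, start_time_ms):
    """Build a per-core recovery metric using the field names proposed above.

    Hypothetical layout: sizes are raw byte counts and the start time is
    epoch milliseconds, so a monitoring system needs no string parsing.
    """
    if expected_bytes > 0:
        # Clamp at 100: the new index can end up larger than the estimate.
        percent = min(100, round(100 * current_bytes / expected_bytes))
    else:
        percent = 0
    return {
        core_name: {
            "recovery": {
                "currentSize": current_bytes,
                "expectedSize": expected_bytes,
                "percentRecovered": percent,
                "startTimeEpoch": start_time_ms,
            }
        }
    }


metric = recovery_metric(
    "my_collection_shard3_replica2",
    77 * 2**30,               # 77G, as reported by du -h
    145 * 2**30,              # 145G
    int(time.time() * 1000),  # placeholder start time
)
print(json.dumps(metric, indent=2))
```

With the sizes from the du listing, percentRecovered comes out as 53.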

If this looks useful, I will open a JIRA issue for it.

Thanks
SG