[prometheus-users] Re: NTP Metrics.

2020-05-19 Thread Yagyansh S. Kumar
Thanks for the response Brian.

I have already enabled the NTP collector on all my servers, but I still 
cannot see the *node_ntp_drift_seconds* metric. Apart from that, I have a 
couple of questions. 
Firstly, why are we checking the target clock against the Prometheus server's 
clock? What if the server itself becomes unsynchronized? The whole idea of 
alerting goes out of the window in that case. Also, what does node_ntp_sanity 
check? How much clock variation does it take to set sanity to 0 (I know other 
factors can also make sanity 0, but what is the criterion for calling the 
clock *unsynchronized*)? Same question for node_ntp_leap: if the leap value 
becomes 3, the clock is unsynchronized. Again, how large a difference in 
clock time does it take to call the clock *unsynchronized*?

Secondly, which one do you think is better for keeping track of clock 
sync: timex or NTP? 

On Tuesday, May 19, 2020 at 2:32:06 PM UTC+5:30, Brian Candler wrote:
>
> The ntp collector is disabled by default: you 
> can turn it on with a command-line flag. However, the timex collector is 
> enabled by default (e.g. node_timex_sync_status, 
> node_timex_estimated_error_seconds)
>
> For a rough idea of how the target clock compares to the prometheus 
> server's clock, you can also just do:
>
> node_time_seconds - timestamp(node_time_seconds)
>
>
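
As a rough sketch, the two checks mentioned in the quoted reply could be 
turned into alerting rules like the following. The group name, alert names, 
the 1 second threshold and the "for" durations are assumptions chosen for 
illustration; they do not come from the thread.

```yaml
groups:
  - name: clock-sync              # hypothetical group name
    rules:
      # Rough comparison of the target's clock with the scrape timestamp
      # assigned by the Prometheus server. Scrape latency is included in the
      # difference, so the threshold should not be too tight.
      - alert: ClockSkewVersusPrometheus
        expr: abs(node_time_seconds - timestamp(node_time_seconds)) > 1
        for: 10m
        labels:
          severity: warning
      # From the timex collector: the value is 0 when the kernel does not
      # report the clock as synchronised.
      - alert: ClockNotSynchronised
        expr: node_timex_sync_status == 0
        for: 10m
        labels:
          severity: warning
```

The first rule depends on the Prometheus server's own clock being correct, 
while the second relies only on what the target's kernel reports.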

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ac52c604-53cd-4a20-a7fd-7b46e2a8bf65%40googlegroups.com.


[prometheus-users] How to generate a GUID or current time for my annotations in Prometheus alerting rule templates

2020-05-19 Thread zichen chuh
I went through the documentation on the Prometheus website and didn't find a clue.

From the alerting_rules documentation, 
only 3 variables are available: $labels, $externalLabels, $value.

Thanks in advance.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d5b9244a-1582-4816-a314-9b7c9917aa10%40googlegroups.com.


Re: [prometheus-users] Error shutting down ActiveMQ with JMX Exporter

2020-05-19 Thread Harald Koch

On Tue, May 19, 2020, at 18:00, Brad Pridgeon wrote:
> I'm testing this JMX Exporter  
> with ActiveMQ to get metrics. Per instructions in this post, I setup the jar 
> as a Java agent. 
> 

Just FYI - unless your traffic volumes are low, I don't recommend this. 
Accessing the ActiveMQ JMX endpoints causes high CPU usage and measurably 
affects performance.

-- 
Harald

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/9483dbad-0c82-4d9e-aebd-ca72f36ea51c%40www.fastmail.com.


[prometheus-users] Re: Monitor specific application process in Linux

2020-05-19 Thread Juan Rosero
Thanks everyone! I'll explore the options suggested and see what works 
best. Thanks again and have a great day/evening!

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/60e4b002-30cb-44b4-ba6e-5164bc8e4b63%40googlegroups.com.


[prometheus-users] Does Prometheus cloudwatch exporter supports multiple AWS regions in a single instance

2020-05-19 Thread Jayaprakash Rangaswamy
Hello Team,

I want to export AWS CloudWatch metrics from resources running in multiple AWS 
regions. 
According to the Prometheus cloudwatch exporter documentation (
https://github.com/prometheus/cloudwatch_exporter), the AWS region 
must be defined in the YAML file. 
To export from multiple AWS regions, does a comma-separated list of region 
names work?


Regards,
Jayaprakash

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/4a401ecf-70ed-4b11-a5be-b20631525b89%40googlegroups.com.


[prometheus-users] Error shutting down ActiveMQ with JMX Exporter

2020-05-19 Thread Brad Pridgeon
I'm testing this JMX Exporter 
with ActiveMQ to get metrics. Per the instructions in this post, I set up the 
jar as a Java agent. 


Here are the changes in the activemq/bin/env file:

if [ -z "$ACTIVEMQ_OPTS" ] ; then
    ACTIVEMQ_OPTS="$ACTIVEMQ_OPTS_MEMORY -Djava.util.logging.config.file=logging.properties -Djava.security.auth.login.config=$ACTIVEMQ_CONF/login.config"
fi

ACTIVEMQ_OPTS="$ACTIVEMQ_OPTS -javaagent:/apps/jmx_exporter/jmx_prometheus_javaagent-0.13.0.jar=localhost:8042:/apps/jmx_exporter/config.yml"

It starts correctly, and the metrics are accessible over the configured 
port, but when I stop ActiveMQ, I get the exception below. Does 
anyone know why this is occurring?

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
    at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
    at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
    at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
    at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
    at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:176)
    at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/c0df773c-e02b-4b03-8181-ff432d615188%40googlegroups.com.


[prometheus-users] Diff bw MetricRegistry and Mbean Server

2020-05-19 Thread Thomas Will
Hello fellas,

I have searched a lot before asking here but didn't find an answer. What 
is the difference between MetricRegistry and the MBean server, and in which 
cases should we use each of them?

Have a good day.
Thomas Will.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/6f7ba3c7-33cb-498b-a3e4-8d83d1e516b1%40googlegroups.com.


Re: [prometheus-users] derive alert severity from other labels

2020-05-19 Thread Christian Hoffmann
Hi Roland,

On 5/19/20 10:25 AM, Roland Mieslinger wrote:
> we are using the same set of alert rules for both, our production and qa
> environment, with the severity label set to a value based on what is
> appropriate for production.
> As a consequence, alert severity is too high for most alerts in our qa
> environment.
> 
> The environment is available as a label on every metric. I could of
> course duplicate all alert rules, filter by environment label, and set
> the appropriate severity label; very tedious in the long run, but so far
> the only solution that came to my mind.
> 
> Are there better ways to achieve this, what am I missing?
> 
> Something like the ternary operator would be helpful in this case, e. g.:
>   labels:
>     severity: environment=="qa" ? "warn" : "page"
> 
> Alternatively, some kind of "functional if" could solve this as well:
>   labels:
>     severity: iff(environment=="qa", "warn", "page")
> note: depending on the implementation this could cause performance
> issues if the expression engine requires
>   evaluation of all parameters passed to the function

alert_relabel_configs might be another option to override the severity
label of alerts with specific labels (e.g. environment="qa").
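
A rough sketch of what such an override could look like in prometheus.yml, 
assuming the label names and values from the example above 
(environment="qa", severity "page" downgraded to "warn"):

```yaml
alerting:
  alert_relabel_configs:
    # Only rewrite alerts that come from qa AND would otherwise page.
    # The regex is matched against the source label values joined by ";".
    - source_labels: [environment, severity]
      separator: ;
      regex: qa;page
      target_label: severity
      replacement: warn
```

Alert relabelling is applied to alerts before they are sent to 
Alertmanager, so the rule files themselves stay identical for production 
and qa.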

I would also go for handling this in alert routing though, I agree.

Kind regards,
Christian

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/cae39359-a4f5-3d05-f0dd-e35bd9c5a8ad%40hoffmann-christian.info.


Re: [prometheus-users] Monitor specific application process in Linux

2020-05-19 Thread Christian Hoffmann
Hi Juan,
On 5/19/20 9:11 AM, Juan Rosero wrote:
> I've been reading a lot on different sites and this User Group as well,
> but have not come up with a clear answer. I need to monitor a specific
> application process in Linux and verify if it's running and I've been
> reading about *--collector.processes* and enabling it on Node Exporter.
> Ideally, I would like to narrow down to that specific process instead of
> getting info on all running processes on the system. What's the best
> approach for this and correct syntax?

Besides the other replies, there are also existing process exporters
which may fit your use case.
I've been using this one:
https://github.com/ncabatoff/process-exporter

Just expect that the exporter itself may accumulate some CPU time over time.
That is not necessarily a problem with that implementation: older monitoring
tools might run "ps" for such checks, and their CPU time is never tracked
because each check is a short-lived process, whereas process-exporter is a
long-running one.
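
As a minimal sketch, a process-exporter config file that only tracks one 
application could look like this. The "myapp" pattern is a placeholder for 
whatever appears on your application's command line:

```yaml
process_names:
  # Group processes by their command name, but only those whose command
  # line matches the regular expression below. "myapp" is a placeholder.
  - name: "{{.Comm}}"
    cmdline:
      - 'myapp'
```

With a config like this, the namedprocess_namegroup_num_procs metric can be 
used to check whether at least one matching process is running.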

Kind regards,
Christian

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/de5dd02a-064c-e4fa-2eb7-5a2bd93f0ecc%40hoffmann-christian.info.


Re: [prometheus-users] Prometheus setting off Checkpoint firewalls

2020-05-19 Thread Andy Kruta
Although I wish it was, unfortunately, it's not an option.  The good news 
is that I don't have to deal with the checkpoints much longer.  The bad 
news is that until I get rid of them, I have to silence the noise.

On Tuesday, May 19, 2020 at 10:53:38 AM UTC-5, Brian Brazil wrote:
>
> On Tue, 19 May 2020 at 16:02, Andy Kruta wrote:
>
>> My apologies if this has been answered already, but I've looked through 
>> the configs for a setting that would allow me to define how many targets 
>> can be scraped at once and came up empty.  Essentially, what I've got going 
>> on here is my prometheus is being blocked by my checkpoint firewalls (for 
>> between 10-20 minutes) due to the number of targets that it's scraping at 
>> once ( because of the Suspicious Activity Monitoring module.)  
>>
>> My configuration:
>>
>>
>>    - Central Prometheus server
>>    - Multiple Data Centers
>>       - SNMP monitored by SNMP Exporters local to each datacenter
>>       - Windows / Linux boxes monitored via Telegraf scraping
>>       - Various other exporters (generally on the Prometheus server 
>>       itself unless there is a large number of targets in the remote datacenter)
>>
>>
>> Unfortunately, I've already talked to Checkpoint and made all of the 
>> changes they recommend without any improvement.  I've also already 
>> increased the scrape interval (currently sitting at 4m) but the scrapes 
>> appear to all be happening within say a minute of each other.  This results 
>> in the checkpoints blocking the activity and the targets appearing to be 
>> down.  
>>
>> My only other idea to resolve this is to increase the time in the alert 
>> configuration to give additional time so that while the firewall is still 
>> blocking the traffic, we don't get the alerts.  This feels moronic though, 
>> and I'm holding it back as a "just keep my mailbox empty" route. 
>>
>> Has anyone come up with a clever way to work around this?
>>
>
> Prometheus already spreads the scrapes across time, this is fundamentally 
> an issue with your firewall blocking scrapes.  The generally recommended 
> architecture would be to have a Prometheus inside each datacenter, rather 
> than trying to scrape everything across datacenters.
>
> -- 
> Brian Brazil
> www.robustperception.io
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/fa25acb2-43ec-4deb-8b53-b2c841c1b83d%40googlegroups.com.


[prometheus-users] Re: Alertmanager Pod is failing - CrashLoopBackOff

2020-05-19 Thread vikram yerneni
It worked... Thanks a lot Brian...

On Tuesday, May 19, 2020 at 1:48:26 AM UTC-5, Brian Candler wrote:
>
> What the error is saying is you tried to add a setting "teams" under 
> opsgenie_configs, but opsgenie_configs does not recognise such an option.  
> The set of allowed options is defined here:
> https://prometheus.io/docs/alerting/configuration/#opsgenie_config
>
> Maybe you wanted something like:
>
> opsgenie_configs:
>   - name: saas-ops
>     responders:
>       - name: blah
>         type: team
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/8bb469bd-140e-4292-9777-b30f96fab717%40googlegroups.com.


[prometheus-users] Re: Alertmanager Pod is failing - CrashLoopBackOff

2020-05-19 Thread vikram yerneni
Sure Brian... I was not sure about the config changes earlier. Let me try it out.
Thanks

On Tuesday, May 19, 2020 at 1:48:26 AM UTC-5, Brian Candler wrote:
>
> What the error is saying is you tried to add a setting "teams" under 
> opsgenie_configs, but opsgenie_configs does not recognise such an option.  
> The set of allowed options is defined here:
> https://prometheus.io/docs/alerting/configuration/#opsgenie_config
>
> Maybe you wanted something like:
>
> opsgenie_configs:
>   - name: saas-ops
>     responders:
>       - name: blah
>         type: team
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b820d6ea-d7da-44e3-b685-4f0d4dd44197%40googlegroups.com.


Re: [prometheus-users] How to get data from SYNTAX section of MIB file

2020-05-19 Thread Denis Trunov
It helped with moduleIDsType metrics - instead of

moduleIDsType{moduleIDsIndex="1"} 296


with

overrides:
  moduleIDsType:
    type: EnumAsInfo

I can get

moduleIDsType_info{moduleIDsIndex="1",moduleIDsType="moduleDigitalVideo12PortIO"} 1


But in other metrics I still have

moduleConfigsPowerStatus{moduleConfigsIndex="1",moduleIDsType="296"} 2

On Tuesday, 19 May 2020 at 18:54:39 UTC+3, Brian Brazil wrote:
>
> On Tue, 19 May 2020 at 16:51, Denis Trunov wrote:
>
>> Hi,
>> I have a very simple generator.yml file with just a walk section - it works 
>> fine, and I get metrics that look like
>>
>> moduleConfigsPowerStatus{moduleConfigsIndex="1",moduleIDsType="296"} 2
>>
>>
>> moduleIDsType is described in MIB
>>
>> moduleIDsType OBJECT-TYPE
>> SYNTAX  Integer32 { moduleUnknown(0), 
>> moduleDigitalVideoInput(272), moduleDigitalVideoOutput(288), 
>> moduleDigitalVideo12PortIO(296), moduleAnalogueVideoInput(304),
>> moduleAnalogueVideoOutput(320), 
>> moduleDigitalAudioInput(336), moduleDigitalAudioOutput(352), 
>> moduleFibreInput(368),
>> moduleFibreOutput(384), 
>> moduleTimeCode(400), moduleRS422(402), moduleVideoCrosspoint(416),
>> moduleAudioCrosspoint(418), 
>> moduleDigitalVideoInputXpntOutput(432), 
>> moduleAnalogueVideoInputXpntOutput(433),
>> moduleDigitalAudioInputXpntOutput(434), 
>> moduleAnalogueAudioInputXpntOutput(435), 
>> moduleRS422InputCrosspointOutput(440),
>> moduleMonitor(464), 
>> modulePowerSupply(466), moduleFanAlarm(468), moduleControl(448), 
>> moduleControlExpansion(449),
>> moduleDigitalVideoInputVariant2(528), 
>> moduleDigitalVideoOutputVariant2(544), moduleVideoCrosspointVariant2(672),
>> moduleTimecodeInputXpntOutput(441), 
>> moduleCATSIIControl(470), moduleReferenceControl(471), 
>> moduleCATSIIControlVariant2(726),
>> moduleMV830Input(278), 
>> moduleMV830Output(294), moduleIPInput(276), moduleIPOutput(292) }
>> MAX-ACCESS  read-only
>> STATUS  current
>> DESCRIPTION "None"
>> ::= { moduleIDsEntry 12 }
>>
>> How can I replace "296" with the corresponding text value? I.e. 296 
>> -> moduleDigitalVideo12PortIO
>>
>
> Presuming that doesn't vary over time, add a metric override for 
> EnumAsInfo in the generator.yml.
>
> -- 
> Brian Brazil
> www.robustperception.io
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/69d93de3-b937-4314-993b-e647a201b356%40googlegroups.com.


Re: [prometheus-users] Re: How to optimize High cardinality labels in Prometheus

2020-05-19 Thread Stuart Clark
If you are doing large queries which touch a lot of timeseries you will need 
lots of memory and CPU.

Ideally you would minimise such queries, or use pre-aggregated metrics (created 
with recording rules) to simplify what is being requested.
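
A minimal sketch of such a recording rule, assuming a hypothetical counter 
called http_requests_total that carries high-cardinality labels the 
dashboards do not need (the group name and interval are also assumptions):

```yaml
groups:
  - name: preaggregation          # hypothetical group name
    interval: 1m
    rules:
      # Pre-compute the per-job request rate so that dashboards read one
      # series per job instead of touching every underlying series at
      # query time.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

Dashboards and alerts would then query job:http_requests:rate5m rather than 
the raw metric.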

I'd suggest looking at what you are trying to achieve. Are you looking over a long 
period? Could you aggregate?

Scaling horizontally will quite likely not help if you are just asking 
Prometheus to process huge amounts of data. 

On 19 May 2020 15:29:26 BST, Dinesh Nithyanandam wrote:
>Can someone please help here
>
>On Tuesday, May 19, 2020 at 2:25:01 AM UTC+5:30, Dinesh N wrote:
>>
>> Hi Team,
>>
>> I have been using a Thanos-Prometheus stack and am running into high 
>> cardinality issues: CPU climbs to ~80% and then drops. This happens when 
>> firing high cardinality queries, which results in an "Http superfluous" 
>> exception, and then the Prometheus instance goes down.
>>
>> We are trying the following things - 
>>
>> 1) We are running only 2 instances of Prometheus with the Thanos 
>> querier and need guidance on where we can scale up to handle huge queries
>>
>> 2) Can a front-end cache like the Cortex cache help here for high 
>> cardinality queries?
>>
>> 3) We are looking for optimal Linux parameters, such as hugepages, that 
>> would help with high cardinality issues 
>>
>>
>> RCA: so far, I have observed that CPU was climbing to ~80% and the 
>> Prometheus server was going down, and I also see a lot of memory sitting in 
>> cache memory.
>> Even with the above options we are not sure whether we are looking in the 
>> right direction, so we need your pair of eyes; pointers would be greatly 
>> appreciated here, Brian.
>>
>>
>> Thanks and Regards
>> Dinesh
>>
>
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CDEBB67B-2F8F-4B00-B8AA-69350108003A%40Jahingo.com.


Re: [prometheus-users] How to get data from SYNTAX section of MIB file

2020-05-19 Thread Brian Brazil
On Tue, 19 May 2020 at 16:51, Denis Trunov  wrote:

> Hi,
> I have a very simple generator.yml file with just a walk section - it works
> fine, and I get metrics that look like
>
> moduleConfigsPowerStatus{moduleConfigsIndex="1",moduleIDsType="296"} 2
>
>
> moduleIDsType is described in MIB
>
> moduleIDsType OBJECT-TYPE
> SYNTAX  Integer32 { moduleUnknown(0),
> moduleDigitalVideoInput(272), moduleDigitalVideoOutput(288),
> moduleDigitalVideo12PortIO(296), moduleAnalogueVideoInput(304),
> moduleAnalogueVideoOutput(320),
> moduleDigitalAudioInput(336), moduleDigitalAudioOutput(352),
> moduleFibreInput(368),
> moduleFibreOutput(384),
> moduleTimeCode(400), moduleRS422(402), moduleVideoCrosspoint(416),
> moduleAudioCrosspoint(418),
> moduleDigitalVideoInputXpntOutput(432),
> moduleAnalogueVideoInputXpntOutput(433),
> moduleDigitalAudioInputXpntOutput(434),
> moduleAnalogueAudioInputXpntOutput(435),
> moduleRS422InputCrosspointOutput(440),
> moduleMonitor(464),
> modulePowerSupply(466), moduleFanAlarm(468), moduleControl(448),
> moduleControlExpansion(449),
> moduleDigitalVideoInputVariant2(528),
> moduleDigitalVideoOutputVariant2(544), moduleVideoCrosspointVariant2(672),
> moduleTimecodeInputXpntOutput(441),
> moduleCATSIIControl(470), moduleReferenceControl(471),
> moduleCATSIIControlVariant2(726),
> moduleMV830Input(278),
> moduleMV830Output(294), moduleIPInput(276), moduleIPOutput(292) }
> MAX-ACCESS  read-only
> STATUS  current
> DESCRIPTION "None"
> ::= { moduleIDsEntry 12 }
>
> How can I replace "296" with the corresponding text value? I.e. 296
> -> moduleDigitalVideo12PortIO
>

Presuming that doesn't vary over time, add a metric override for
EnumAsInfo in the generator.yml.
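
For reference, a minimal generator.yml along these lines might look like the 
sketch below. The module name is a placeholder, and the walk entries are 
simply the two metric names quoted above:

```yaml
modules:
  my_module:                      # hypothetical module name
    walk:
      - moduleIDsType
      - moduleConfigsPowerStatus
    overrides:
      # Expose the enum as an info metric carrying the text value as a
      # label, instead of a gauge holding the raw integer.
      moduleIDsType:
        type: EnumAsInfo
```

After editing generator.yml, the generator has to be re-run to produce a new 
snmp.yml for the exporter.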

-- 
Brian Brazil
www.robustperception.io

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLr9umgevKX5yPzKiKPcF3NTta4RoFaNtDOVx1j0BFghrA%40mail.gmail.com.


Re: [prometheus-users] Prometheus setting off Checkpoint firewalls

2020-05-19 Thread Brian Brazil
On Tue, 19 May 2020 at 16:02, Andy Kruta  wrote:

> My apologies if this has been answered already, but I've looked through
> the configs for a setting that would allow me to define how many targets
> can be scraped at once and came up empty.  Essentially, what I've got going
> on here is my prometheus is being blocked by my checkpoint firewalls (for
> between 10-20 minutes) due to the number of targets that it's scraping at
> once ( because of the Suspicious Activity Monitoring module.)
>
> My configuration:
>
>
>    - Central Prometheus server
>    - Multiple Data Centers
>       - SNMP monitored by SNMP Exporters local to each datacenter
>       - Windows / Linux boxes monitored via Telegraf scraping
>       - Various other exporters (generally on the Prometheus server
>       itself unless there is a large number of targets in the remote datacenter)
>
>
> Unfortunately, I've already talked to Checkpoint and made all of the
> changes they recommend without any improvement.  I've also already
> increased the scrape interval (currently sitting at 4m) but the scrapes
> appear to all be happening within say a minute of each other.  This results
> in the checkpoints blocking the activity and the targets appearing to be
> down.
>
> My only other idea to resolve this is to increase the time in the alert
> configuration to give additional time so that while the firewall is still
> blocking the traffic, we don't get the alerts.  This feels moronic though,
> and I'm holding it back as a "just keep my mailbox empty" route.
>
> Has anyone come up with a clever way to work around this?
>

Prometheus already spreads the scrapes across time, this is fundamentally
an issue with your firewall blocking scrapes.  The generally recommended
architecture would be to have a Prometheus inside each datacenter, rather
than trying to scrape everything across datacenters.
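
One possible way to combine per-datacenter servers with a central view is 
federation, sketched below for the central server's prometheus.yml. This is 
only an illustration: the job name and target hostname are hypothetical, and 
the match[] selector assumes that the datacenter-local server exposes 
aggregated series named with a "job:" prefix.

```yaml
scrape_configs:
  - job_name: federate-dc1        # hypothetical job name
    honor_labels: true            # keep the labels set by the local server
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'  # pull only pre-aggregated series
    static_configs:
      - targets:
          - prometheus.dc1.example.com:9090   # hypothetical local Prometheus
```

With this layout the central server makes one connection per datacenter 
instead of one per target, and alerting on individual targets can stay on 
the local server.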

-- 
Brian Brazil
www.robustperception.io

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLr%2BZ3VLX7dnQa09tOn9qc8wY2MhCQzRyeDKY%3D1i0jzMiQ%40mail.gmail.com.


[prometheus-users] How to get data from SYNTAX section of MIB file

2020-05-19 Thread Denis Trunov
Hi,
I have a very simple generator.yml file with just a walk section - it works 
fine, and I get metrics that look like

moduleConfigsPowerStatus{moduleConfigsIndex="1",moduleIDsType="296"} 2


moduleIDsType is described in MIB

moduleIDsType OBJECT-TYPE
    SYNTAX      Integer32 {
                    moduleUnknown(0),
                    moduleDigitalVideoInput(272), moduleDigitalVideoOutput(288),
                    moduleDigitalVideo12PortIO(296), moduleAnalogueVideoInput(304),
                    moduleAnalogueVideoOutput(320),
                    moduleDigitalAudioInput(336), moduleDigitalAudioOutput(352),
                    moduleFibreInput(368), moduleFibreOutput(384),
                    moduleTimeCode(400), moduleRS422(402),
                    moduleVideoCrosspoint(416), moduleAudioCrosspoint(418),
                    moduleDigitalVideoInputXpntOutput(432),
                    moduleAnalogueVideoInputXpntOutput(433),
                    moduleDigitalAudioInputXpntOutput(434),
                    moduleAnalogueAudioInputXpntOutput(435),
                    moduleRS422InputCrosspointOutput(440),
                    moduleMonitor(464), modulePowerSupply(466),
                    moduleFanAlarm(468), moduleControl(448),
                    moduleControlExpansion(449),
                    moduleDigitalVideoInputVariant2(528),
                    moduleDigitalVideoOutputVariant2(544),
                    moduleVideoCrosspointVariant2(672),
                    moduleTimecodeInputXpntOutput(441),
                    moduleCATSIIControl(470), moduleReferenceControl(471),
                    moduleCATSIIControlVariant2(726),
                    moduleMV830Input(278), moduleMV830Output(294),
                    moduleIPInput(276), moduleIPOutput(292)
                }
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION "None"
    ::= { moduleIDsEntry 12 }

How can I replace "296" with the corresponding text value? I.e. 296 
-> moduleDigitalVideo12PortIO

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d95c83d6-5bc7-419c-abb9-2126d48ea0f7%40googlegroups.com.


[prometheus-users] Prometheus setting off Checkpoint firewalls

2020-05-19 Thread Andy Kruta
My apologies if this has been answered already, but I've looked through the 
configs for a setting that would allow me to define how many targets can be 
scraped at once and came up empty.  Essentially, what I've got going on 
here is my prometheus is being blocked by my checkpoint firewalls (for 
between 10-20 minutes) due to the number of targets that it's scraping at 
once (because of the Suspicious Activity Monitoring module).

My configuration:


   - Central Prometheus server
   - Multiple Data Centers 
      - SNMP monitored by SNMP Exporters local to each datacenter
      - Windows / Linux boxes monitored via Telegraf scraping
      - Various other exporters (generally on the Prometheus server itself 
      unless there is a large number of targets in the remote datacenter)
   

Unfortunately, I've already talked to Checkpoint and made all of the 
changes they recommend without any improvement.  I've also already 
increased the scrape interval (currently sitting at 4m) but the scrapes 
appear to all be happening within say a minute of each other.  This results 
in the checkpoints blocking the activity and the targets appearing to be 
down.  

My only other idea to resolve this is to increase the time in the alert 
configuration to give additional time so that while the firewall is still 
blocking the traffic, we don't get the alerts.  This feels moronic though, 
and I'm holding it back as a "just keep my mailbox empty" route. 

Has anyone come up with a clever way to work around this?

Thanks,

Andy

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1d24070e-eda2-4c1a-b5f3-e747e920ba82%40googlegroups.com.


[prometheus-users] How to scale Prometheus in an organization?

2020-05-19 Thread Juergen Etzlstorfer
Hi everyone,

I’ve blogified some of our learnings when it comes to scaling Prometheus in 
an organization. The article should help you understand the most common 
challenges and give guidance on how to overcome them. I also briefly 
discuss an open-source project called Keptn https://keptn.sh that might 
help you get started more easily.

Happy to hear your feedback! 
https://medium.com/keptn/overcoming-scalability-issues-in-your-prometheus-ecosystem-4430cea6472f?sk=edf9a96ff81721c290e005fc7b4b9bfb

What would you expect from an article like this? Are there any crucial 
parts missing? Is the discussed solution something that might be of 
interest to you?

Cheers,
Jürgen.



To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/cf13b181-9fc0-4378-843b-36c94c59f696%40googlegroups.com.


[prometheus-users] Re: How to optimize High cardinality labels in Prometheus

2020-05-19 Thread Dinesh Nithyanandam
Can someone please help here?

On Tuesday, May 19, 2020 at 2:25:01 AM UTC+5:30, Dinesh N wrote:
>
> Hi Team,
>
> I have been using a Thanos-Prometheus stack and am running into high
> cardinality issues: CPU goes up to ~80% and then drops. This happens when
> firing high cardinality queries, which results in an "Http superfluous"
> exception, and then the Prometheus instance goes down.
>
> We are trying the following:
>
> 1) We are running only 2 instances of Prometheus behind the Thanos
> querier and need guidance on what to scale up to handle huge queries.
>
> 2) Can a front-end cache such as the Cortex cache help with high
> cardinality queries?
>
> 3) We are looking for Linux tuning parameters, such as hugepages, that
> would help with high cardinality issues.
>
>
> RCA so far: I observed CPU climbing to ~80% and the Prometheus server
> going down, and I also see a lot of memory sitting in cache memory.
> Even with the options above, we are not sure whether we are looking in the
> right direction, so another pair of eyes and any pointers would be greatly
> appreciated here, Brian.
>
>
> Thanks and Regards
> Dinesh
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/7aeb7ca5-3f81-4670-b80f-6b9030d913df%40googlegroups.com.


[prometheus-users] codahale and jmx instrumentation

2020-05-19 Thread aditya garg
Hello guys, I saw that there are two ways to instrument a Java application:

1) Using JMX, in which case we need to manage the MBean server.
2) Using codahale, where a metric registry does the same work.

Is my understanding correct? Secondly, how do we do instrumentation when
the application is not Java-based?

Do both the MBean server and the metric registry live inside the JVM?

In both cases, how are the metrics exposed on a port so that Prometheus
can collect them?

Regards,
Aditya Garg

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b1326870-b3e2-4823-b799-527412a60c1b%40googlegroups.com.


Re: [prometheus-users] Suggested limit for max number of series per Prometheus instance?

2020-05-19 Thread Brian Brazil
On Tue, 19 May 2020 at 14:23, Al  wrote:

> I'm currently on-boarding many more metrics and hosts to our Prometheus
> infrastructure and I wanted to know when is it advised to shard out metrics
> into a separate instance?  I have multiple shards (or groups) of Prometheus
> instances although one of the groups of instances has over 3 million total
> series in it, even though its ingestion rate doesn't really go above a
> comfortable 100k samples/second.  I know that a single instance can
> typically handle up to 1 million samples per second, although I can't seem
> to find clear guidelines with regard to the total number of series per
> Prometheus host.   Some of the metrics have a relatively higher
> cardinality, thus causing the higher number of series.   I'd appreciate any
> advice you can provide me with.
>

The limit currently appears to be somewhere in the low tens of millions of
head series.
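
To see how close an instance is to that range, the head series count can be
watched directly. A minimal sketch of an alerting rule, assuming Prometheus
scrapes its own /metrics endpoint; the group name, alert name and the 8e6
threshold are placeholders, not recommendations:

```yaml
groups:
  - name: prometheus-capacity            # hypothetical group name
    rules:
      - alert: PrometheusHeadSeriesHigh  # hypothetical alert name
        # prometheus_tsdb_head_series is exported by Prometheus about itself.
        # The threshold is an assumption; choose one that fits your hardware.
        expr: prometheus_tsdb_head_series > 8e6
        for: 30m
        labels:
          severity: warn
        annotations:
          summary: '{{ $labels.instance }} holds {{ $value }} head series'
```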

Brian


>
> Thanks!
>
>


-- 
Brian Brazil
www.robustperception.io

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLqiD1X-vUtHQysK%2BDh6q23YyQub-%2B_V4AtRNTNTEjUbwQ%40mail.gmail.com.


[prometheus-users] Suggested limit for max number of series per Prometheus instance?

2020-05-19 Thread Al
I'm currently on-boarding many more metrics and hosts to our Prometheus 
infrastructure and I wanted to know when is it advised to shard out metrics 
into a separate instance?  I have multiple shards (or groups) of Prometheus 
instances although one of the groups of instances has over 3 million total 
series in it, even though its ingestion rate doesn't really go above a
comfortable 100k samples/second.  I know that a single instance can
typically handle up to 1 million samples per second, although I can't seem
to find clear guidelines with regard to the total number of series per
Prometheus host.   Some of the metrics have a relatively higher 
cardinality, thus causing the higher number of series.   I'd appreciate any 
advice you can provide me with.

Thanks!

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/608224ac-f7fc-4dbb-9b18-0e7ed4f6e604%40googlegroups.com.


[prometheus-users] server returned HTTP status 500 Internal Server Error

2020-05-19 Thread Valliappan RM
Hi,
I'm trying to monitor a FortiGate firewall and am getting this error:
server returned HTTP status 500 Internal Server Error
My setup is based on this dashboard:
https://grafana.com/grafana/dashboards/7567

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/44feee19-502c-4344-9335-3a61ed30c1e1%40googlegroups.com.


[prometheus-users] Hosted Prometheus with Grafana Cloud

2020-05-19 Thread Colton Conor
We are exploring the option of paying for Grafana Cloud's service. In
addition to hosting Grafana, it comes with the ability to store metrics
from Prometheus and Graphite. The documentation says:

To send data using Prometheus you need the following:

   - A running instance of Prometheus.
   - In your Prometheus configuration, add a remote_write section.

So it doesn't really sound like this Cloud option actually hosts
Prometheus itself, but instead just collects metrics from an on-site
Prometheus server, which sends them to the cloud using remote_write.
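
For reference, the remote_write section the documentation refers to would
look roughly like the sketch below. The URL and credentials are placeholders
I made up, not Grafana Cloud's actual values:

```yaml
# prometheus.yml (fragment)
remote_write:
  - url: https://metrics.example.com/api/prom/push   # placeholder endpoint
    basic_auth:
      username: "123456"          # placeholder instance/user id
      password: "<api key>"       # placeholder secret
```

As far as I understand, remote_write forwards samples as they are ingested,
while how long the local copy is kept is controlled separately by the
--storage.tsdb.retention.time command-line flag.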

My question is: how beneficial is this compared to just running our own
on-site, assuming we are going to have to install and run Prometheus anyway?
Would the on-site Prometheus server simply not need much storage space, since
it would remote write to the cloud and then delete the on-site metrics?

Also, since Prometheus also supports exporting to Graphite, which is also
supported by the cloud service, would it be better to send the metrics
using Prometheus with remote_write, or Prometheus export to
Graphite using carbon-relay-ng?

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAMDdSzOYazWpaTuZY21Ob0quWSPoqQgNKD0HvH_GU9PwjA%3DDDw%40mail.gmail.com.


Re: [prometheus-users] derive alert severity from other labels

2020-05-19 Thread Brian Brazil
On Tue, 19 May 2020 at 10:25, Roland Mieslinger  wrote:

> On Tuesday, May 19, 2020 at 10:46:32 UTC+2, Brian Brazil wrote:
>>
>> On Tue, 19 May 2020 at 09:25, Roland Mieslinger  wrote:
>>
>>> Hi,
>>>
>>> we are using the same set of alert rules for both, our production and qa
>>> environment, with the severity label set to a value based on what is
>>> appropriate for production.
>>> As a consequence, alert severity is too high for most alerts in our qa
>>> environment.
>>>
>>
>>> The environment is available as a label on every metric. I could of
>>> course duplicate all alert rules, filter by environment label, and set the
>>> appropriate severity label; very tedious in the long run, but so far the
>>> only solution that came to my mind.
>>>
>>
>> The usual way I'd handle this is via routing alerts differently in the
>> alertmanager for dev/qa environments.
>>
>
> But this would leave the severity at the same level, or am I missing a way
> to change it this way?
>

It would, however the purpose of severity is to be used for routing in the
alertmanager.
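
A minimal sketch of what such routing could look like, assuming alerts carry
the `environment` label mentioned in the thread; the receiver names are
hypothetical and their notification settings are left out:

```yaml
# alertmanager.yml (fragment)
route:
  receiver: default
  routes:
    # QA alerts are caught first, whatever severity the rule assigned.
    - match:
        environment: qa
      receiver: qa-mail          # hypothetical receiver
    # Remaining alerts with severity "page" go to the on-call receiver.
    - match:
        severity: page
      receiver: oncall-pager     # hypothetical receiver
receivers:
  - name: default
  - name: qa-mail
  - name: oncall-pager
```

Routes are tried in order and the first match wins, so the qa route has to
come before the severity route. Newer Alertmanager releases prefer the
`matchers` list over `match`.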

-- 
Brian Brazil
www.robustperception.io

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLrAataOVgxZkpKV6mKPs2yOAM3wRSiJXbOtXH7YESKDYw%40mail.gmail.com.


[prometheus-users] Re: Expose java metrics to prometheus

2020-05-19 Thread Vu Tuan Dat
Create a metric with the name you want and add labels to it:
https://github.com/prometheus/client_java#labels

On Tuesday, May 19, 2020 at 4:17:00 PM UTC+7, Nidhi Sharma wrote:
>
> Thanks for replying. Can we do this with MBeans and JMX? I can create
> MBeans and register them in my application, but I don't know how to capture
> the output in Prometheus time series format. For example:
> Product_version{product="SomeProduct",version="2.9.1"}
>
>
> On Tuesday, May 19, 2020 at 2:09:51 PM UTC+5:30, Vu Tuan Dat wrote:
>>
>> this can help https://github.com/prometheus/client_java
>>
>> On Tuesday, May 19, 2020 at 1:32:10 PM UTC+7, Nidhi Sharma wrote:
>>>
>>> Hi, I have a web app running on Tomcat. There is a hello resource (REST
>>> API) to check the health of the app. The output of this resource gives the
>>> version of the app. I want the output of this resource to be captured as a
>>> metric and pulled by Prometheus. Please advise on how I can proceed.
>>>
>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/5c106729-690f-43bf-ad88-abfbc1ba775a%40googlegroups.com.


Re: [prometheus-users] derive alert severity from other labels

2020-05-19 Thread Roland Mieslinger
On Tuesday, May 19, 2020 at 10:46:32 UTC+2, Brian Brazil wrote:
>
> On Tue, 19 May 2020 at 09:25, Roland Mieslinger wrote:
>
>> Hi,
>>
>> we are using the same set of alert rules for both, our production and qa 
>> environment, with the severity label set to a value based on what is 
>> appropriate for production.
>> As a consequence, alert severity is too high for most alerts in our qa 
>> environment. 
>>
>
>> The environment is available as a label on every metric. I could of
>> course duplicate all alert rules, filter by environment label, and set the
>> appropriate severity label; very tedious in the long run, but so far the
>> only solution that came to my mind.
>>
>
> The usual way I'd handle this is via routing alerts differently in the 
> alertmanager for dev/qa environments.
>

But this would leave the severity at the same level, or am I missing a way 
to change it this way?

 --
Roland

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/faffee64-98cf-4144-9cf2-74cbcb105c5c%40googlegroups.com.


[prometheus-users] Re: derive alert severity from other labels

2020-05-19 Thread Roland Mieslinger
On Tuesday, May 19, 2020 at 10:33:20 UTC+2, Vu Tuan Dat wrote:
>
> you can try: 
> severity: '{{ if eq $labels.environment "qa" }}warn{{ else }}page{{ end }}'
>
>>
>>
Nice hack, I hadn't thought of (ab)using the templating engine for that.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0b6a0aba-69af-45bc-aeee-8afd4d70aff9%40googlegroups.com.


[prometheus-users] Re: Expose java metrics to prometheus

2020-05-19 Thread Nidhi Sharma
Thanks for replying. Can we do this with MBeans and JMX? I can create MBeans
and register them in my application, but I don't know how to capture the
output in Prometheus time series format. For example:
Product_version{product="SomeProduct",version="2.9.1"}


On Tuesday, May 19, 2020 at 2:09:51 PM UTC+5:30, Vu Tuan Dat wrote:
>
> this can help https://github.com/prometheus/client_java
>
> On Tuesday, May 19, 2020 at 1:32:10 PM UTC+7, Nidhi Sharma wrote:
>>
>> Hi, I have a web app running on Tomcat. There is a hello resource (REST
>> API) to check the health of the app. The output of this resource gives the
>> version of the app. I want the output of this resource to be captured as a
>> metric and pulled by Prometheus. Please advise on how I can proceed.
>>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/2e4c7d5e-8222-4b26-861a-01b451af5480%40googlegroups.com.


[prometheus-users] Re: Monitor specific application process in Linux

2020-05-19 Thread Brian Candler
You can use a little script to write metrics to a textfile and pick them up 
with node_exporter's textfile_collector, and run it periodically (e.g. from 
cron).

The textfile_collector also exposes the timestamp when the file was last 
modified, so you can alert if it stops being updated.
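
As a sketch of the alerting side: the textfile collector exports
node_textfile_mtime_seconds per file, so rules along these lines can catch
both a dead process and a stalled script. The metric name myapp_process_up,
the file name myapp_process.prom and the thresholds are hypothetical:

```yaml
groups:
  - name: myapp-process                  # hypothetical group name
    rules:
      # The cron script is assumed to write a line such as
      # "myapp_process_up 1" (or 0) into myapp_process.prom.
      - alert: MyAppProcessDown
        expr: myapp_process_up == 0
        for: 5m
      # Fires when the file has not been rewritten for more than 10 minutes,
      # i.e. the script itself has stopped running.
      - alert: MyAppProcessCheckStale
        expr: time() - node_textfile_mtime_seconds{file=~".*myapp_process.prom"} > 600
        for: 5m
```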

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1a49fda8-3446-4583-b32e-4f3ba7400331%40googlegroups.com.


[prometheus-users] Re: NTP Metrics.

2020-05-19 Thread Brian Candler
The ntp collector is disabled by default: you can turn it on with a
command-line flag. However, the timex collector is enabled by default
(e.g. node_timex_sync_status, node_timex_estimated_error_seconds).

For a rough idea of how the target clock compares to the prometheus 
server's clock, you can also just do:

node_time_seconds - timestamp(node_time_seconds)
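
Put into alerting rules, that could look like the sketch below. The group
and alert names and the thresholds are placeholders, and the offset
expression includes scrape latency, so it is only a rough measure:

```yaml
groups:
  - name: clock                          # hypothetical group name
    rules:
      # The kernel reports that the clock is not synchronised.
      - alert: ClockNotSynchronised
        expr: node_timex_sync_status == 0
        for: 10m
      # The target clock differs from the Prometheus server's clock by more
      # than 1 second (assumed threshold).
      - alert: ClockOffsetFromPrometheus
        expr: abs(node_time_seconds - timestamp(node_time_seconds)) > 1
        for: 10m
```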

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a32e52e0-6c26-4a7b-9766-ac10daede241%40googlegroups.com.


Re: [prometheus-users] derive alert severity from other labels

2020-05-19 Thread Brian Brazil
On Tue, 19 May 2020 at 09:25, Roland Mieslinger  wrote:

> Hi,
>
> we are using the same set of alert rules for both, our production and qa
> environment, with the severity label set to a value based on what is
> appropriate for production.
> As a consequence, alert severity is too high for most alerts in our qa
> environment.
>

> The environment is available as a label on every metric. I could of
> course duplicate all alert rules, filter by environment label, and set the
> appropriate severity label; very tedious in the long run, but so far the
> only solution that came to my mind.
>

The usual way I'd handle this is via routing alerts differently in the
alertmanager for dev/qa environments.

Brian


>
> Are there better ways to achieve this, what am I missing?
>
> Something like the ternary operator would be helpful in this case, e.g.:
>   labels:
>     severity: environment=="qa" ? "warn" : "page"
>
> Alternatively, some kind of "functional if" could solve this as well:
>   labels:
>     severity: iff(environment=="qa", "warn", "page")
> Note: depending on the implementation, this could cause performance issues
> if the expression engine requires evaluation of all parameters passed to
> the function.
>
>
>
>
>


-- 
Brian Brazil
www.robustperception.io

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLqzd%2B-aK3f0W6GLXkAhcjv3n059i%3DwDbG01%3DkMozVUYdg%40mail.gmail.com.


[prometheus-users] Re: Expose java metrics to prometheus

2020-05-19 Thread Vu Tuan Dat
this can help https://github.com/prometheus/client_java

On Tuesday, May 19, 2020 at 1:32:10 PM UTC+7, Nidhi Sharma wrote:
>
> Hi, I have a web app running on Tomcat. There is a hello resource (REST
> API) to check the health of the app. The output of this resource gives the
> version of the app. I want the output of this resource to be captured as a
> metric and pulled by Prometheus. Please advise on how I can proceed.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/13d47de8-d318-4960-9306-df6a4a4306a1%40googlegroups.com.


[prometheus-users] Re: Monitor specific application process in Linux

2020-05-19 Thread Vu Tuan Dat
Ideally, write an exporter for your specific process; it's not that hard.

Or instrument directly into your app.

On Tuesday, May 19, 2020 at 2:11:47 PM UTC+7, Juan Rosero wrote:
>
> Hello,
>
> I've been reading a lot on different sites and this User Group as well, 
> but have not come up with a clear answer. I need to monitor a specific 
> application process in Linux and verify if it's running and I've been 
> reading about *--collector.processes* and enabling it on Node Exporter. 
> Ideally, I would like to narrow down to that specific process instead of 
> getting info on all running processes on the system. What's the best 
> approach for this and correct syntax?
>
> Many thanks!
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/3a18e299-4f97-4446-9d3d-c5755ae47d71%40googlegroups.com.


[prometheus-users] Re: derive alert severity from other labels

2020-05-19 Thread Vu Tuan Dat
You can try:
severity: '{{ if eq $labels.environment "qa" }}warn{{ else }}page{{ end }}'
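
In context, a minimal sketch of a complete rule using such a template. The
group name, alert name and expression are placeholders, and it assumes the
alerting series carries an `environment` label:

```yaml
groups:
  - name: example                        # hypothetical group name
    rules:
      - alert: InstanceDown              # hypothetical alert name
        # Assumes targets are scraped with an "environment" label attached.
        expr: up == 0
        for: 5m
        labels:
          # No spaces inside the branches, so the label value is exactly
          # "warn" or "page".
          severity: '{{ if eq $labels.environment "qa" }}warn{{ else }}page{{ end }}'
```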

On Tuesday, May 19, 2020 at 3:25:01 PM UTC+7, Roland Mieslinger wrote:
>
> Hi,
>
> we are using the same set of alert rules for both, our production and qa 
> environment, with the severity label set to a value based on what is 
> appropriate for production.
> As a consequence, alert severity is too high for most alerts in our qa 
> environment.
>
> The environment is available as a label on every metric. I could of
> course duplicate all alert rules, filter by environment label, and set the
> appropriate severity label; very tedious in the long run, but so far the
> only solution that came to my mind.
>
> Are there better ways to achieve this, what am I missing?
>
> Something like the ternary operator would be helpful in this case, e.g.:
>   labels:
>     severity: environment=="qa" ? "warn" : "page"
>
> Alternatively, some kind of "functional if" could solve this as well:
>   labels:
>     severity: iff(environment=="qa", "warn", "page")
> Note: depending on the implementation, this could cause performance issues
> if the expression engine requires evaluation of all parameters passed to
> the function.
>
>
>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b1a54c2e-22dc-4ba2-807b-04bb62ccd952%40googlegroups.com.


[prometheus-users] derive alert severity from other labels

2020-05-19 Thread Roland Mieslinger
Hi,

we are using the same set of alert rules for both, our production and qa 
environment, with the severity label set to a value based on what is 
appropriate for production.
As a consequence, alert severity is too high for most alerts in our qa 
environment.

The environment is available as a label on every metric. I could of
course duplicate all alert rules, filter by environment label, and set the
appropriate severity label; very tedious in the long run, but so far the
only solution that came to my mind.

Are there better ways to achieve this, what am I missing?

Something like the ternary operator would be helpful in this case, e.g.:
  labels:
    severity: environment=="qa" ? "warn" : "page"

Alternatively, some kind of "functional if" could solve this as well:
  labels:
    severity: iff(environment=="qa", "warn", "page")
Note: depending on the implementation, this could cause performance issues
if the expression engine requires evaluation of all parameters passed to
the function.




-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/19c8b790-aa37-435c-a055-a1e41fb6033d%40googlegroups.com.


[prometheus-users] NTP Metrics.

2020-05-19 Thread Yagyansh S. Kumar
Hi. I have my own NTP server configured at x.x.x.x. Now I want to check
whether my 10 other servers are synchronized with my NTP server. I have
gone through a lot of threads and found different opinions and different
answers. Also, I guess node_ntp_drift_seconds is an old metric that no
longer exists in the current node_exporter version. What query or
combination of queries should I use to check the sync? If a server is not in
sync, I also want to know the deviation. I am totally confused about what to
use for this and what is reliable.

Can someone help?
Thanks in advance!

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/bfd13ec3-c799-490d-91b3-64f7c1396800%40googlegroups.com.


[prometheus-users] Alerts in Alertmanager cannot be cleared

2020-05-19 Thread Vu Tuan Dat
Hello,


I have an issue with Alertmanager and Prometheus synchronization.

My system has a cluster of two Alertmanager nodes serving multiple Prometheus
clusters.

Yesterday, I updated the Prometheus config (targets) using the `/-/reload`
endpoint, and then I got several unresolved alerts which were definitely NOT
in Prometheus.

At first, I thought it was some kind of Alertmanager bug, so I tried:
- Restarting the AM container
- Deleting the nflog
- Deleting the nflog and recreating the AM container

However, none of those worked for me.

Has anyone run into the same situation, or does anyone have a solution for
deleting those annoying alerts?


Best Regards,
Dat

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/616181a6-be35-4b5e-a589-5714dcbab503%40googlegroups.com.


[prometheus-users] Monitor specific application process in Linux

2020-05-19 Thread Juan Rosero
Hello,

I've been reading a lot on different sites and this User Group as well, but 
have not come up with a clear answer. I need to monitor a specific 
application process in Linux and verify if it's running and I've been 
reading about *--collector.processes* and enabling it on Node Exporter. 
Ideally, I would like to narrow down to that specific process instead of 
getting info on all running processes on the system. What's the best 
approach for this and correct syntax?

Many thanks!

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/00a0c828-6047-4cb7-a42a-2d6fa7b42d33%40googlegroups.com.


[prometheus-users] How to get the count or sum by hour in day or daywise in a month from PromQL in Grafana

2020-05-19 Thread Rajesh Reddy Nachireddi
Hi,

How can I get the count or sum per hour of a day, or per day of a month,
from PromQL in Grafana?

We want to get the following:

1. when we select daily report -
ex: on Monday
12 AM to 1AM - 100
1AM - 2AM - 200
2AM - 3AM - 400


on Tuesday
12 AM to 1AM - 100
1AM - 2AM - 200
2AM - 3AM - 400


similarly for monthly reports

Jan
1/1 - 100
1/2 - 300
1/3 -0
1/4 -234
..
1/31 - 456

Please suggest a way to get this kind of report using recording rules or
PromQL instead of external scripts.
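
To make the request concrete, this is the kind of recording rule I have in
mind. It is only a sketch: it assumes the underlying metric is a counter,
and the name myapp_requests_total and the rule names are made up:

```yaml
groups:
  - name: report-windows                 # hypothetical group name
    rules:
      # Increase of the counter over the trailing hour.
      - record: job:myapp_requests:increase1h
        expr: sum by (job) (increase(myapp_requests_total[1h]))
      # Increase of the counter over the trailing day.
      - record: job:myapp_requests:increase1d
        expr: sum by (job) (increase(myapp_requests_total[1d]))
```

In Grafana, the hourly series would then be queried with a 1h step (and the
daily one with a 1d step) so that each point covers one bucket. The windows
are trailing, so whether buckets line up with calendar hours or days depends
on how the query step is aligned.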

Thanks,

Rajesh

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAEyhnp%2Bt6u8HL%3DG1cAQRYTsexFCSkGdiUpmT3Giy6piSwNP0CQ%40mail.gmail.com.


Re: [prometheus-users] Re: Pushgateway or StatsD

2020-05-19 Thread 'Albert Aleksandrov' via Prometheus Users
Thanks for the idea

On Sunday, May 17, 2020 at 0:24:24 UTC+3, Matthias Rampke wrote:
>
> If you care about the individual event to the extent that you want to see 
> it individually, you are probably better off using an event tracking system 
> like the ELK stack.
>
> Prometheus shines when you only need to track aggregates, such as the 
> number of uploads, and the total time of uploads. It can do math to get the 
> average duration and such from that, but it cannot track "this upload in 
> particular took that long". From this perspective, the metrics staying 
> constant when there are no new uploads makes sense: the *total* does not 
> stop existing just because it did not increase in the last minute.
>
>
>
> /MR
>
> On Sat, May 16, 2020, 19:06 'Albert Aleksandrov' via Prometheus Users <
> promethe...@googlegroups.com > wrote:
>
>> One minute later I thought about deleting the metrics from the registry
>> after they have been scraped and registering them again when a new upload
>> appears.
>>
>> On Saturday, May 16, 2020 at 19:59:28 UTC+3, Albert Aleksandrov wrote:
>>>
>>> Hi all!
>>>
>>> (*Django app*)
>>>
>>> We have a business entity called *upload*. It has parameters (labels)
>>> such as:
>>> 1. *series* (series1, series2, etc.),
>>> 2. *processing_duration* (in seconds),
>>> 3. *status* (success, running, terminated),
>>> 4. some other labels.
>>>
>>> With Prometheus we would like to count:
>>> 1. uploads summed by status,
>>> 2. average duration by series,
>>> 3. something else.
>>>
>>> To achieve our goals, I picture the metrics looking like this:
>>>
>>> upload{series="series1", status="terminated"} 1  # actually the value is always 1
>>> upload{series="series2", status="terminated"} 1
>>> upload{series="series2", status="success"} 1
>>>
>>> upload_processing_duration{series="series1"} 20
>>> upload_processing_duration{series="series2"} 30
>>>
>>> With such metrics, the queries would look like this:
>>>
>>> sum(upload{status="terminated"}) or sum(upload{series="series1"})
>>> avg(upload_processing_duration{series="series1"})
>>>
>>> To have such raw (plain, atomic) data in Prometheus, one has to either
>>> push each data point as it appears, or keep them one by one (without
>>> overriding) behind a /metrics endpoint that Prometheus scrapes at some
>>> interval.
>>>
>>> I tried Pushgateway, but pushed metrics stay there with the same values
>>> until they are overridden or deleted. So Prometheus scrapes the same
>>> values again and again, instead of scraping them once and then forgetting
>>> them.
>>>
>>> [image: wefwe.jpg]
>>>
>>> Could you please tell me how I can achieve such behaviour?
>>>
>>>
>>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d33dc8f2-d8cd-4529-805d-a8ba031a53cf%40googlegroups.com.


[prometheus-users] Re: Alertmanager Pod is failing - CrashLoopBackOff

2020-05-19 Thread Brian Candler
What the error is saying is you tried to add a setting "teams" under 
opsgenie_configs, but opsgenie_configs does not recognise such an option.  
The set of allowed options is defined here:
https://prometheus.io/docs/alerting/configuration/#opsgenie_config

Maybe you wanted something like:

opsgenie_configs:
  - name: saas-ops
    responders:
      - name: blah
        type: team
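
Placed in a full receiver, and going by the reference linked above, it would
look roughly like this sketch. Here I am assuming that "saas-ops" is meant
as the receiver name, since the opsgenie_config reference does not list a
name field of its own; the API key is a placeholder:

```yaml
# alertmanager.yml (fragment)
receivers:
  - name: saas-ops                       # assumed to be the receiver name
    opsgenie_configs:
      - api_key: "<opsgenie api key>"    # placeholder secret
        responders:
          - name: blah                   # team name as used in the example above
            type: team
```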

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/52a29dad-d098-4365-ae11-c298f6f9e1e5%40googlegroups.com.


[prometheus-users] Expose java metrics to prometheus

2020-05-19 Thread Nidhi Sharma
Hi, I have a web app running on Tomcat. There is a hello resource (REST
API) to check the health of the app. The output of this resource gives the
version of the app. I want the output of this resource to be captured as a
metric and pulled by Prometheus. Please advise on how I can proceed.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d20c7fa9-5d70-4472-a044-6f73a2dde4e1%40googlegroups.com.