[prometheus-users] Difference between two timestamps

2020-06-30 Thread 'Metrics Searcher' via Prometheus Users
Hello.

I'm trying to collect the start and stop timestamps of a cronjob via the push 
gateway.
Everything works fine when my job runs every 5 minutes.
However, when the job runs at 5 minutes past each hour, the result of 
the difference looks weird:

0:Array[1593561885,20]
1:Array[1593561900,20]
2:Array[1593561915,-3580]
3:Array[1593561930,9]

Could someone give me a clue about how to solve this? I am probably 
collecting the metrics the wrong way.
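A common pattern for this (just a sketch, with hypothetical metric, job, and host names) is to push a single "last run" Unix timestamp from the cron job and compute ages in PromQL, instead of subtracting two separately pushed gauges, which can go negative when the two series are updated at different times:

# at the end of the cron job
echo "cronjob_last_success_timestamp_seconds $(date +%s)" \
  | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/my_cronjob

# age of the last successful run, usable in queries and alerts
time() - cronjob_last_success_timestamp_seconds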



[prometheus-users] node_exporter support to read from libraries

2020-06-30 Thread Ranganath Sunku
Hi Node_exporter maintainers,
Node_exporter today supports reading metrics from file systems such as proc 
and sys. 
Would it be acceptable for a collector within node_exporter to rely on an 
external library to receive hardware metrics?

For example, one can obtain RedFish metrics in-band using the libredfish 
library.
Would it be ok for node_exporter to accept such a collector that relies on 
a library?

Thanks



[prometheus-users] Re: empty external labels are not allowed for Thanos block

2020-06-30 Thread Thomas Will
Can anyone please help me out with this issue? It's a blocker for me.

Thanks,

On Tuesday, June 30, 2020 at 11:18:18 PM UTC+5:30, Thomas Will wrote:
>
> [image: Screenshot 1942-04-09 at 8.00.08 PM.png]
> I can see external_labels on prometheus UI.
>
> On Tuesday, June 30, 2020 at 8:09:06 AM UTC+5:30, Thomas Will wrote:
>>
>>
>>
>> "ulid": "01EC1E4WSJVFQWCFH0J0CKRQTQ",
>> "minTime": 1593480811780,
>> "maxTime": 159348090,
>> "stats": {
>> "numSamples": 25450,
>> "numSeries": 5046,
>> "numChunks": 5046
>> },
>> "compaction": {
>> "level": 1,
>> "sources": [
>> "01EC1E4WSJVFQWCFH0J0CKRQTQ"
>> ]
>> },
>> "version": 1
>> This is the Prometheus meta.json file. I think external_labels should be 
>> present in this?
>>
>> On Tuesday, June 30, 2020 at 6:39:00 AM UTC+5:30, Thomas Will wrote:
>>>
>>> Hello guys, while setting up sidecar, I am getting this error.
>>>
>>> level=warn ts=2020-06-30T00:59:28.739042462Z caller=sidecar.go:131 
>>> msg="failed 
>>> to fetch initial external labels. Is Prometheus running? Retrying" 
>>> err="request 
>>> config against http://ec2-54-166-121-208.compute-
>>> level=info ts=2020-06-30T00:59:28.786279597Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC194NMFNVXJV3P3ZJV711FX
>>> level=error ts=2020-06-30T00:59:28.788272247Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC194NMFNVXJV3P3ZJV711FX err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.797210843Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC19C00DSNBYDF0QY6SJ0RCJ
>>> level=error ts=2020-06-30T00:59:28.798609749Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC19C00DSNBYDF0QY6SJ0RCJ err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.80781908Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC19N4ZDHV3A9ZAV4CW0W50Z
>>> level=error ts=2020-06-30T00:59:28.809984074Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC19N4ZDHV3A9ZAV4CW0W50Z err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.871277989Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC19Y9XM4EVA6C5XQB6QHA1H
>>> level=error ts=2020-06-30T00:59:28.873172562Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC19Y9XM4EVA6C5XQB6QHA1H err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.941585234Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1A7EXED6X31J8AVMME60N2
>>> level=error ts=2020-06-30T00:59:28.943638781Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1A7EXED6X31J8AVMME60N2 err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.959223519Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1AGKWE20FEFYPG2EVQZ973
>>> level=error ts=2020-06-30T00:59:28.960929593Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1AGKWE20FEFYPG2EVQZ973 err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.984357407Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1ASRVEPMF6YDVHWCZ3F9HG
>>> level=error ts=2020-06-30T00:59:28.986786419Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1ASRVEPMF6YDVHWCZ3F9HG err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:28.18447Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1B2XTFF24K5QZZCW4FECRW
>>> level=error ts=2020-06-30T00:59:29.00133826Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1B2XTFF24K5QZZCW4FECRW err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:29.020650171Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1BC2SHK8Z9HX3DNAAAPCVM
>>> level=error ts=2020-06-30T00:59:29.024475442Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1BC2SHK8Z9HX3DNAAAPCVM err="empty external labels are 
>>> not allowed for Thanos block."
>>> level=info ts=2020-06-30T00:59:29.032014206Z caller=shipper.go:200 
>>> msg="upload 
>>> new block" id=01EC1BN7RFWV6C93ZNG9BAYZ5M
>>> level=error ts=2020-06-30T00:59:29.033591482Z caller=shipper.go:165 
>>> msg="shipping 
>>> failed" block=01EC1BN7RFWV6C93ZNG9BAYZ5M err="empty external labels are 
>>> not allowed for Thanos block."
>>>
>>> sidecar config :-
>>>
>>> # thanos sidecar
>>> ./thanos sidecar \
>>> --prometheus.url 
>>> "http://ec2-42-685-242-86.compute-1.amazonaws.com:9090" \
>>> --tsdb.path "data" \
>>> --cluster.address  "localhost:20056" \
>>> --http-address "localhost:20057" \
>>> --grpc-address "localhost:20058" \

Re: [prometheus-users] Number of metrics per job

2020-06-30 Thread Julien Pivotto
On 30 Jun 13:39, Tomer Leibovich wrote:
> Can anyone share a promql query that can display the number of metrics per 
> job?


sum by(job) (scrape_samples_post_metric_relabeling)
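
Depending on what is meant by "metrics", these variants (untested sketches) may also be useful:

# number of time series currently present per job
count by (job) ({__name__!=""})

# number of distinct metric names per job
count by (job) (count by (job, __name__) ({__name__!=""}))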



-- 
Julien Pivotto
@roidelapluie



[prometheus-users] Not able to receive emails from alert manager

2020-06-30 Thread Ali Shiekh
Hi guys,

I want to configure my Alertmanager to send alerts via email.
I have a rule set up so that if an instance goes down, an alert fires.

I am able to see the alerts in the web UI:
[image: 1.JPG]

But I am not able to receive any emails.

*Here is my alertmanager.yml file:*

global:

route:
  group_by: [Alertname]
  receiver: email-me

receivers:
  - name: email-me
email_configs:
- to: 'x...@gmail.com'
  from: 'x...@gmail.com'
  smarthost: 'smtp.gmail.com:587'
  auth_username: "x...@gmail.com"
  auth_password: "xx"


And I've also checked alertmanager logs and it gives me this:

Jun 30 17:17:17 localhost.localdomain alertmanager[41098]: level=error 
ts=2020-06-30T21:17:17.150686531Z caller=notify.go:332 
component==dispatcher msg="Error on notify" err="Post 
http://127.0.0.1:5001/: dial tcp 127.0.0.1:5001: connect: connection 
refused"
Jun 30 17:17:17 localhost.localdomain alertmanager[41098]: level=error 
ts=2020-06-30T21:17:17.150754532Z 
caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" 
num_alerts=1 err="Post http://127.0.0.1:5001/: dial tcp 127.0.0.1:5001: 
connect: connection refused"

I've searched a lot online and couldn't find anything.
Can someone please help me with this? Thanks in advance!
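
A side note on the logs: the "Post http://127.0.0.1:5001/" errors suggest the running Alertmanager is still using a webhook receiver (as in the shipped example config) rather than the email receiver above, so it is worth double-checking which file is passed via --config.file and validating it with amtool check-config. For comparison, a minimal email receiver of this shape (a sketch only; addresses and password are placeholders) would be:

route:
  group_by: ['alertname']
  receiver: email-me

receivers:
  - name: email-me
    email_configs:
      - to: 'you@example.com'
        from: 'you@example.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'you@example.com'
        auth_identity: 'you@example.com'
        auth_password: 'app-password'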





Re: [prometheus-users] Number of metrics per job

2020-06-30 Thread Tomer Leibovich
Can anyone share a promql query that can display the number of metrics per job?



Re: [prometheus-users] Division between two queries not working?

2020-06-30 Thread Cole Beasley
Awesome thanks for the help! Still learning the basics.

On Tuesday, June 30, 2020 at 11:06:11 AM UTC-6, Julius Volz wrote:
>
> By default, binary operators match series from the left and right side on 
> exactly identical label sets. But you keep the log_type label on the left, 
> while aggregating it away on the right. You can either aggregate it away on 
> both sides:
>
>   sum(type_counter{log_type="ERROR"}) / sum(type_counter)
>
> ...or ignore that label for matching:
>
>   type_counter{log_type="ERROR"} / ignoring(log_type) sum(type_counter)
>
> Btw., I assume type_counter is a counter metric. You should typically 
> never care about the absolute value of a counter metric and instead always 
> rate(...) it (or increase(), etc.) before using.
>
> On Tue, Jun 30, 2020 at 6:11 PM Cole Beasley  > wrote:
>
>> Hi there,
>>
>> This might be a very basic question but I'm trying to get the ration of 
>> error log messages / total error messages. Using grok_exporter I have the 
>> metric type_counter which returns 
>> ElementValue
>> type_counter{instance="localhost:9144",job="grok",log_type="AUTH"} 45
>> type_counter{instance="localhost:9144",job="grok",log_type="ERROR"} 55
>> type_counter{instance="localhost:9144",job="grok",log_type="IN"} 17603
>> type_counter{instance="localhost:9144",job="grok",log_type="NESTED"} 971
>> I simply then want (type_counter{log_type="ERROR"}) / 
>> (sum(type_counter)), but this does not return a value, this is odd to me as 
>> both type_counter{log_type="ERROR"} and sum(type_counter) return their 
>> expected values and will divide by themselves or manually entered integers 
>> (such as (18,000 / sum(type_counter)). I'm not sure why the division here 
>> is not working, any ideas?
>>
>
>
> -- 
> Julius Volz
> PromLabs - promlabs.com
>



[prometheus-users] Re: empty external labels are not allowed for Thanos block

2020-06-30 Thread Thomas Will


[image: Screenshot 1942-04-09 at 8.00.08 PM.png]
I can see external_labels on prometheus UI.

On Tuesday, June 30, 2020 at 8:09:06 AM UTC+5:30, Thomas Will wrote:
>
>
>
> "ulid": "01EC1E4WSJVFQWCFH0J0CKRQTQ",
> "minTime": 1593480811780,
> "maxTime": 159348090,
> "stats": {
> "numSamples": 25450,
> "numSeries": 5046,
> "numChunks": 5046
> },
> "compaction": {
> "level": 1,
> "sources": [
> "01EC1E4WSJVFQWCFH0J0CKRQTQ"
> ]
> },
> "version": 1
> This is the Prometheus meta.json file. I think external_labels should be 
> present in this?
>
> On Tuesday, June 30, 2020 at 6:39:00 AM UTC+5:30, Thomas Will wrote:
>>
>> Hello guys, while setting up sidecar, I am getting this error.
>>
>> level=warn ts=2020-06-30T00:59:28.739042462Z caller=sidecar.go:131 
>> msg="failed 
>> to fetch initial external labels. Is Prometheus running? Retrying" 
>> err="request 
>> config against http://ec2-54-166-121-208.compute-
>> level=info ts=2020-06-30T00:59:28.786279597Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC194NMFNVXJV3P3ZJV711FX
>> level=error ts=2020-06-30T00:59:28.788272247Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC194NMFNVXJV3P3ZJV711FX err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.797210843Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC19C00DSNBYDF0QY6SJ0RCJ
>> level=error ts=2020-06-30T00:59:28.798609749Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC19C00DSNBYDF0QY6SJ0RCJ err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.80781908Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC19N4ZDHV3A9ZAV4CW0W50Z
>> level=error ts=2020-06-30T00:59:28.809984074Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC19N4ZDHV3A9ZAV4CW0W50Z err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.871277989Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC19Y9XM4EVA6C5XQB6QHA1H
>> level=error ts=2020-06-30T00:59:28.873172562Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC19Y9XM4EVA6C5XQB6QHA1H err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.941585234Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1A7EXED6X31J8AVMME60N2
>> level=error ts=2020-06-30T00:59:28.943638781Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1A7EXED6X31J8AVMME60N2 err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.959223519Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1AGKWE20FEFYPG2EVQZ973
>> level=error ts=2020-06-30T00:59:28.960929593Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1AGKWE20FEFYPG2EVQZ973 err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.984357407Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1ASRVEPMF6YDVHWCZ3F9HG
>> level=error ts=2020-06-30T00:59:28.986786419Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1ASRVEPMF6YDVHWCZ3F9HG err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:28.18447Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1B2XTFF24K5QZZCW4FECRW
>> level=error ts=2020-06-30T00:59:29.00133826Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1B2XTFF24K5QZZCW4FECRW err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:29.020650171Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1BC2SHK8Z9HX3DNAAAPCVM
>> level=error ts=2020-06-30T00:59:29.024475442Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1BC2SHK8Z9HX3DNAAAPCVM err="empty external labels are 
>> not allowed for Thanos block."
>> level=info ts=2020-06-30T00:59:29.032014206Z caller=shipper.go:200 
>> msg="upload 
>> new block" id=01EC1BN7RFWV6C93ZNG9BAYZ5M
>> level=error ts=2020-06-30T00:59:29.033591482Z caller=shipper.go:165 
>> msg="shipping 
>> failed" block=01EC1BN7RFWV6C93ZNG9BAYZ5M err="empty external labels are 
>> not allowed for Thanos block."
>>
>> sidecar config :-
>>
>> # thanos sidecar
>> ./thanos sidecar \
>> --prometheus.url "http://ec2-42-685-242-86.compute-1.amazonaws.com:9090" 
>> \
>> --tsdb.path "data" \
>> --cluster.address  "localhost:20056" \
>> --http-address "localhost:20057" \
>> --grpc-address "localhost:20058" \
>> --log.level=debug \
>> >> log/sidecar.log 2>&1 &
>>
>>
>>
>> prometheus config file :-
>>
>> global:
>>   scrape_interval: 15s
>>   evaluation_interval: 1m
>>   external_labels:
>> monitor: prometheus_monitor
>> cluster: 102323
>>
>>
>>
>> Can 
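
A note on the error itself: as far as I know, the Thanos sidecar rejects TSDB blocks that were written while Prometheus had no external_labels, so blocks created before the labels above were added will likely keep failing even after a restart; only newly cut blocks pick up the labels. The warning about failing to fetch the initial external labels also suggests the sidecar could not reach Prometheus at the configured --prometheus.url, which is worth checking first. A minimal external_labels section (a sketch; label names and values are only examples) looks like:

global:
  external_labels:
    monitor: prometheus_monitor
    cluster: '102323'
    replica: prom-a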

Re: [prometheus-users] prometheus cannot collect metrics correctly after binary update

2020-06-30 Thread Yashar Nesabian
Hi Julien, I double-checked the envs and we don't have such an environment variable.

On Tuesday, June 30, 2020 at 10:04:19 PM UTC+4:30, Julien Pivotto wrote:
>
> Can you check if you have leftover JAEGER environment variables by any 
> chance? 
>
> On 30 Jun 09:57, nesa...@gmail.com wrote: 
> > Hi, 
> > We have a Prometheus server which is our primary Prometheus server. we 
> had 
> > a Prometheus federation which used to gather all the metrics from the 
> main 
> > Prometheus every 90 seconds. we now decided to stop using federation as 
> the 
> > backup solution and move the Prometheus configuration to the ansible so 
> > both servers can have the same configuration at the same time, the main 
> > Prometheus and the federation's version was 2.15.2. after changing the 
> > federation server's configuration and putting the same configuration as 
> the 
> > primary Prometheus on it, I decided to update the binary file to 2.19.2. 
> > At first, I got service unavailable alert for about 10 minutes, after an 
> > hour, we start to get lots of alerts claiming exporter has no data. 
> > The problem got fixed after 30 minutes  but again we start to get lots 
> of 
> > alerts about it after few hours. 
> > when I checked the logs, I saw all jobs have "contecxt deadline 
> exceeded" 
> > alert (during the problem I couldn't connect to the web interface as 
> well) 
> > whereas I don't get any alert from the primary Prometheus and everything 
> > works fine there. 
> > here is my systemd configuration for the secondary Prometheus: 
> > 
> > [Unit]
> > Description=Prometheus
> > After=network-online.target
> > 
> > [Service]
> > Type=simple
> > Environment="GOMAXPROCS=8"
> > User=prometheus
> > Group=prometheus
> > ExecReload=/bin/kill -HUP $MAINPID
> > ExecStart=/usr/local/sbin/prometheus \
> >   --config.file=/etc/prometheus/prometheus.yml \
> >   --storage.tsdb.path=/var/lib/prometheus \
> >   --storage.tsdb.retention.time=30d \
> >   --storage.tsdb.retention.size=275GB \
> >   --web.console.libraries=/etc/prometheus/console_libraries \
> >   --web.console.templates=/etc/prometheus/consoles \
> >   --web.enable-admin-api \
> >   --web.listen-address=0.0.0.0:9090
> > 
> > CapabilityBoundingSet=CAP_SET_UID
> > LimitNOFILE=65000
> > LockPersonality=true
> > NoNewPrivileges=true
> > MemoryDenyWriteExecute=true
> > PrivateDevices=true
> > PrivateTmp=true
> > ProtectHome=true
> > RemoveIPC=true
> > RestrictSUIDSGID=true
> > CPUAccounting=yes
> > MemoryAccounting=yes
> > #SystemCallFilter=@signal @timer
> > 
> > ReadWritePaths=/var/lib/prometheus
> > 
> > PrivateUsers=true
> > ProtectControlGroups=true
> > ProtectKernelModules=true
> > ProtectKernelTunables=true
> > ProtectSystem=strict
> > 
> > SyslogIdentifier=prometheus
> > Restart=always
> > 
> > [Install]
> > WantedBy=multi-user.target
> >  
> >  
> > 
>
>
>
> -- 
> Julien Pivotto 
> @roidelapluie 
>



Re: [prometheus-users] prometheus cannot collect metrics correctly after binary update

2020-06-30 Thread Julien Pivotto
Can you check if you have leftover JAEGER environment variables by any
chance?
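
For example (a sketch; adjust the unit and binary names to your setup), the environment of the systemd unit and of the running process can be inspected with:

systemctl show prometheus -p Environment
sudo tr '\0' '\n' < /proc/$(pgrep -of /usr/local/sbin/prometheus)/environ | grep -i jaeger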

On 30 Jun 09:57, nesa...@gmail.com wrote:
> Hi,
> We have a Prometheus server which is our primary Prometheus server. we had 
> a Prometheus federation which used to gather all the metrics from the main 
> Prometheus every 90 seconds. we now decided to stop using federation as the 
> backup solution and move the Prometheus configuration to the ansible so 
> both servers can have the same configuration at the same time, the main 
> Prometheus and the federation's version was 2.15.2. after changing the 
> federation server's configuration and putting the same configuration as the 
> primary Prometheus on it, I decided to update the binary file to 2.19.2. 
> At first, I got service unavailable alert for about 10 minutes, after an 
> hour, we start to get lots of alerts claiming exporter has no data. 
> The problem got fixed after 30 minutes  but again we start to get lots of 
> alerts about it after few hours.
> when I checked the logs, I saw all jobs have "contecxt deadline exceeded" 
> alert (during the problem I couldn't connect to the web interface as well) 
> whereas I don't get any alert from the primary Prometheus and everything 
> works fine there.
> here is my systemd configuration for the secondary Prometheus:
> 
> [Unit]
> Description=Prometheus
> After=network-online.target
> 
> [Service]
> Type=simple
> Environment="GOMAXPROCS=8"
> User=prometheus
> Group=prometheus
> ExecReload=/bin/kill -HUP $MAINPID
> ExecStart=/usr/local/sbin/prometheus \
>   --config.file=/etc/prometheus/prometheus.yml \
>   --storage.tsdb.path=/var/lib/prometheus \
>   --storage.tsdb.retention.time=30d \
>   --storage.tsdb.retention.size=275GB \
>   --web.console.libraries=/etc/prometheus/console_libraries \
>   --web.console.templates=/etc/prometheus/consoles \
>   --web.enable-admin-api \
>   --web.listen-address=0.0.0.0:9090
> 
> CapabilityBoundingSet=CAP_SET_UID
> LimitNOFILE=65000
> LockPersonality=true
> NoNewPrivileges=true
> MemoryDenyWriteExecute=true
> PrivateDevices=true
> PrivateTmp=true
> ProtectHome=true
> RemoveIPC=true
> RestrictSUIDSGID=true
> CPUAccounting=yes
> MemoryAccounting=yes
> #SystemCallFilter=@signal @timer
> 
> ReadWritePaths=/var/lib/prometheus
> 
> PrivateUsers=true
> ProtectControlGroups=true
> ProtectKernelModules=true
> ProtectKernelTunables=true
> ProtectSystem=strict
> 
> SyslogIdentifier=prometheus
> Restart=always
> 
> [Install]
> WantedBy=multi-user.target
> 
> 
> 


-- 
Julien Pivotto
@roidelapluie



Re: [prometheus-users] Division between two queries not working?

2020-06-30 Thread Julius Volz
By default, binary operators match series from the left and right side on
exactly identical label sets. But you keep the log_type label on the left,
while aggregating it away on the right. You can either aggregate it away on
both sides:

  sum(type_counter{log_type="ERROR"}) / sum(type_counter)

...or ignore that label for matching:

  type_counter{log_type="ERROR"} / ignoring(log_type) sum(type_counter)

Btw., I assume type_counter is a counter metric. You should typically never
care about the absolute value of a counter metric and instead always
rate(...) it (or increase(), etc.) before using.
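
Applied here, a rate-based version of the ratio (just a sketch; the 5m window is arbitrary) would be:

  sum(rate(type_counter{log_type="ERROR"}[5m])) / sum(rate(type_counter[5m]))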

On Tue, Jun 30, 2020 at 6:11 PM Cole Beasley 
wrote:

> Hi there,
>
> This might be a very basic question but I'm trying to get the ration of
> error log messages / total error messages. Using grok_exporter I have the
> metric type_counter which returns
> ElementValue
> type_counter{instance="localhost:9144",job="grok",log_type="AUTH"} 45
> type_counter{instance="localhost:9144",job="grok",log_type="ERROR"} 55
> type_counter{instance="localhost:9144",job="grok",log_type="IN"} 17603
> type_counter{instance="localhost:9144",job="grok",log_type="NESTED"} 971
> I simply then want (type_counter{log_type="ERROR"}) / (sum(type_counter)),
> but this does not return a value, this is odd to me as both
> type_counter{log_type="ERROR"} and sum(type_counter) return their expected
> values and will divide by themselves or manually entered integers (such as
> (18,000 / sum(type_counter)). I'm not sure why the division here is not
> working, any ideas?
>


-- 
Julius Volz
PromLabs - promlabs.com



[prometheus-users] Re: prometheus cannot collect metrics correctly after binary update

2020-06-30 Thread Yashar Nesabian
I'm sorry for the formatting problem with the systemd configuration. Here is the 
correct configuration:
[Unit]
Description=Prometheus
After=network-online.target

[Service]
Type=simple
Environment="GOMAXPROCS=8"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/sbin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=275GB \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.console.templates=/etc/prometheus/consoles \
  --web.enable-admin-api \
  --web.listen-address=0.0.0.0:9090 

CapabilityBoundingSet=CAP_SET_UID
LimitNOFILE=65000
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
CPUAccounting=yes
MemoryAccounting=yes
#SystemCallFilter=@signal @timer

ReadWritePaths=/var/lib/prometheus

PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict


SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target



On Tuesday, June 30, 2020 at 9:27:10 PM UTC+4:30, Yashar Nesabian wrote:
>
> Hi,
> We have a Prometheus server which is our primary Prometheus server. we had 
> a Prometheus federation which used to gather all the metrics from the main 
> Prometheus every 90 seconds. we now decided to stop using federation as the 
> backup solution and move the Prometheus configuration to the ansible so 
> both servers can have the same configuration at the same time, the main 
> Prometheus and the federation's version was 2.15.2. after changing the 
> federation server's configuration and putting the same configuration as the 
> primary Prometheus on it, I decided to update the binary file to 2.19.2. 
> At first, I got service unavailable alert for about 10 minutes, after an 
> hour, we start to get lots of alerts claiming exporter has no data. 
> The problem got fixed after 30 minutes  but again we start to get lots of 
> alerts about it after few hours.
> when I checked the logs, I saw all jobs have "contecxt deadline exceeded" 
> alert (during the problem I couldn't connect to the web interface as well) 
> whereas I don't get any alert from the primary Prometheus and everything 
> works fine there.
> here is my systemd configuration for the secondary Prometheus:
>
> [Unit]
> Description=Prometheus
> After=network-online.target
> 
> [Service]
> Type=simple
> Environment="GOMAXPROCS=8"
> User=prometheus
> Group=prometheus
> ExecReload=/bin/kill -HUP $MAINPID
> ExecStart=/usr/local/sbin/prometheus \
>   --config.file=/etc/prometheus/prometheus.yml \
>   --storage.tsdb.path=/var/lib/prometheus \
>   --storage.tsdb.retention.time=30d \
>   --storage.tsdb.retention.size=275GB \
>   --web.console.libraries=/etc/prometheus/console_libraries \
>   --web.console.templates=/etc/prometheus/consoles \
>   --web.enable-admin-api \
>   --web.listen-address=0.0.0.0:9090
> 
> CapabilityBoundingSet=CAP_SET_UID
> LimitNOFILE=65000
> LockPersonality=true
> NoNewPrivileges=true
> MemoryDenyWriteExecute=true
> PrivateDevices=true
> PrivateTmp=true
> ProtectHome=true
> RemoveIPC=true
> RestrictSUIDSGID=true
> CPUAccounting=yes
> MemoryAccounting=yes
> #SystemCallFilter=@signal @timer
> 
> ReadWritePaths=/var/lib/prometheus
> 
> PrivateUsers=true
> ProtectControlGroups=true
> ProtectKernelModules=true
> ProtectKernelTunables=true
> ProtectSystem=strict
> 
> SyslogIdentifier=prometheus
> Restart=always
> 
> [Install]
> WantedBy=multi-user.target
> 
> 
>



[prometheus-users] prometheus cannot collect metrics correctly after binary update

2020-06-30 Thread nesa...@gmail.com
Hi,
We have a Prometheus server which is our primary Prometheus server. We had a 
Prometheus federation setup which used to gather all the metrics from the main 
Prometheus every 90 seconds. We have now decided to stop using federation as 
the backup solution and to move the Prometheus configuration into Ansible so 
both servers can have the same configuration at the same time. The main 
Prometheus and the federation server were both on version 2.15.2. After 
changing the federation server's configuration and putting the same 
configuration as the primary Prometheus on it, I decided to update the binary 
to 2.19.2.
At first, I got a "service unavailable" alert for about 10 minutes; after an 
hour, we started to get lots of alerts claiming the exporters had no data.
The problem got fixed after 30 minutes, but we started to get lots of these 
alerts again after a few hours.
When I checked the logs, I saw all jobs failing with "context deadline 
exceeded" (during the problem I couldn't connect to the web interface either), 
whereas I don't get any alerts from the primary Prometheus and everything 
works fine there.
Here is my systemd configuration for the secondary Prometheus:

[Unit]
Description=Prometheus
After=network-online.target

[Service]
Type=simple
Environment="GOMAXPROCS=8"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/sbin/prometheus 
\
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=275GB \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.console.templates=/etc/prometheus/consoles \
  --web.enable-admin-api \
  --web.listen-address=0.0.0.0:9090 

CapabilityBoundingSet=CAP_SET_UID
LimitNOFILE=65000
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
CPUAccounting=yes
MemoryAccounting=yes
#SystemCallFilter=@signal @timer

ReadWritePaths=/var/lib/prometheus

PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict


SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target





[prometheus-users] Division between two queries not working?

2020-06-30 Thread Cole Beasley
Hi there,

This might be a very basic question, but I'm trying to get the ratio of 
error log messages to total log messages. Using grok_exporter I have the 
metric type_counter, which returns:
Element  Value
type_counter{instance="localhost:9144",job="grok",log_type="AUTH"} 45
type_counter{instance="localhost:9144",job="grok",log_type="ERROR"} 55
type_counter{instance="localhost:9144",job="grok",log_type="IN"} 17603
type_counter{instance="localhost:9144",job="grok",log_type="NESTED"} 971
I simply want (type_counter{log_type="ERROR"}) / (sum(type_counter)), 
but this does not return a value. This is odd to me, as both 
type_counter{log_type="ERROR"} and sum(type_counter) return their expected 
values and will divide by themselves or by manually entered integers (such as 
18000 / sum(type_counter)). I'm not sure why the division is not working, 
any ideas?



[prometheus-users] promql

2020-06-30 Thread raghu padaki
I need help writing a PromQL query to populate metrics.
I am using Prometheus to integrate with MongoDB.



Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Sébastien Dionne
YES.. when I have labels on my pods, I receive them. Good. I think I'll be 
able to work with the AlertManager webhook.


Prometheus auto-discovers my pods because they are annotated with
  prometheus.io/path: /metrics
  prometheus.io/port: 8080
  prometheus.io/scrape: true


But is there a way to configure the scrape interval with an annotation too?

I could have applications that we want to monitor every 15 seconds and others 
at a 45-second interval or more.
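
As far as I know, the scrape interval cannot be set per target through an annotation alone; the usual workaround is one scrape job per interval, with a keep relabel rule on an extra annotation. A rough sketch, assuming a hypothetical prometheus.io/interval annotation and pod-based service discovery:

- job_name: 'kubernetes-pods-15s'
  scrape_interval: 15s
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_interval]
      action: keep
      regex: "15s"

plus a second job with scrape_interval: 45s and regex: "45s" for the slower pods.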



thanks 





On Tuesday, June 30, 2020 at 7:34:00 AM UTC-4, Sébastien Dionne wrote:
>
> that is the config that I have so far
>
>
> serverFiles:
>   alerts:
> groups:
>   - name: Instances
> rules:
>   - alert: InstanceDown
> expr: up == 0
> for: 10s
> labels:
>   severity: page
> annotations:
>   description: '{{ $labels.instance }} of job {{ $labels.job 
> }} has been down for more than 1 minute.'
>   summary: 'Instance {{ $labels.instance }} down'
>   
> alertmanagerFiles:
>   alertmanager.yml:
> route:
>   receiver: default-receiver
>   group_wait: 5s
>   group_interval: 10s
>
> receivers:
>   - name: default-receiver
> webhook_configs:
>   - url: "
> https://webhook.site/815a0b0b-f40c-4fc2-984d-e29cb9606840"
>   
>
>
> here a exemple of one of my pods
>
> Labels:   app.kubernetes.io/instance=optimizer-6e0f0a089c70
>   app.kubernetes.io/name=optimizer-interface
>   
>   pod-template-hash=784669954d
>   releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70
>   service.ip=10.1.7.200
>
> Annotations:  cni.projectcalico.org/podIP: 192.168.218.99/32
>   cni.projectcalico.org/podIPs: 192.168.218.99/32
>   prometheus.io/path: /metrics
>   prometheus.io/port: 8080
>   prometheus.io/scrape: true
>
> I have to get Prometheus to scan for pod "health" each 10-15 seconds and 
> send a alert for the pods that are up->down  and down -> up
>
>
> on the side, I added a Gauge that return the timestamp in my application 
> and I pool Prometheus each 15 seconds to get the last timestamp of all 
> application and if the NOW - timestamp> 15, that means that Prometheus 
> wasn't able to call the pod in the last 15 seconds.. so I consider that pod 
> down.  With a query like that
>
>
> http://localhost:9090/api/v1/query?query={__name__=~".*lastTimestampScrapped"}
>
> but if I could do the same directly with Prometheus+alertManager, I 
> wouldn't have to query manually Prometheus myself.
>
>
>
>
>
>
>
>
> On Tuesday, June 30, 2020 at 4:15:58 AM UTC-4, Christian Hoffmann wrote:
>>
>> Hi, 
>>
>> On 6/25/20 8:55 PM, Sébastien Dionne wrote: 
>> > I have few java applications that I'll deploy in my cluster.  I need to 
>> > know how can I detect if a instance is up or down with Prometheus.  
>> > 
>> > *Alerting with AlertManager* 
>> > * 
>> > * 
>> > I have a alert that check for "instanceDown" and send a alert to 
>> > AlertManager-webhook. So when one instance is down, i'm receiving 
>> alerts 
>> > in my application.   
>> > 
>> > But how can I extract the labels that are in that instance ? 
>> What do you mean by "in that instance"? 
>>
>> If the label is part of your service discovery, then it should be 
>> attached to all series from that target. This would also imply that it 
>> would be part of any alert by default unless you aggregate it away (e.g. 
>> by using sum, avg or something). 
>>
>> If the label is only part of some info-style metric, you will have to 
>> mix this metric into your alert. 
>>
>> Can you share one of the relevant alert rules if you need more specific 
>> guidance? 
>>
>> Note: I don't know how many releaseUUIDGroups you have, but having UUIDs 
>> as label values might ring some alarm bells due to the potential for 
>> high cardinality issues. :) 
>>
>> Kind regards, 
>> Christian 
>>
>>
>> > 
>> > ex : I have a special labels in all my application that link the pod to 
>> > the information that I have in the database 
>> > 
>> > releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70 
>> > 
>> > 
>> > there is a way to add that information in the message that AlertManager 
>> > send ? 
>> > 
>> > right now I configure AlertManager to send the alert to 
>> > : 
>> https://webhook.site/#!/815a0b0b-f40c-4fc2-984d-e29cb9606840/b0dd701d-e972-48d4-9083-385e6a788d55/1
>>  
>> > 
>> > for an example, I kill the pod : prometheus-pushgateway 
>> > 
>> > and I received this message :  
>> > 
>> > { 
>> >   "receiver": "default-receiver", 
>> >   "status": "resolved", 
>> >   "alerts": [ 
>> > { 
>> >   "status": "resolved", 
>> >   "labels": { 
>> > "alertname": "InstanceDown", 
>> > "instance": "prometheus-pushgateway.default.svc:9091", 
>> > "job": "prometheus-pushgateway", 
>> > "severity": "page" 
>> >   }, 
>> >  

[prometheus-users] Invalid path to scrape the metrics

2020-06-30 Thread Sadhana Kumari
Hi Team,

We have configured a few nodes and URLs, but the metrics are not getting 
scraped.

There is a minion between the customer server and the node exporter server.

Below is the configuration we have done.

On the Prometheus server, for the Apigee (customer) URL:

- job_name: 'apigeeurl'
metrics_path: /probe
scheme: http
params:
module: [http_2xx]
file_sd_configs:
- files:
  - /etc/prometheus/targets.yml
relabel_configs:
- source_labels: [__address__]
  target_label: instance
- source_labels: [__address__]
  target_label: __metrics_path__
- target_label: __address__
  replacement: 10.1.3.17:8080

For the Apigee nodes:

- job_name: 'apigeenode'
static_configs:
  - targets:
- 10.2.0.85
- 10.2.0.86
- 10.2.0.87
- 10.2.0.101
- 10.2.0.102
- 10.2.0.103
- 10.2.0.117
- 10.2.0.118
- 10.2.0.119
- 10.2.16.85
- 10.2.16.86
- 10.2.16.87
- 10.2.16.101
- 10.2.16.102
- 10.2.16.103
- 10.2.16.117
- 10.2.16.118
- 10.2.16.119

relabel_configs:
  - source_labels: [__address__]
target_label: instance
  - source_labels: [__address__]
target_label: __metrics_path__
  - target_label: __address__
replacement: 10.1.3.17:8080


On the Prometheus exporter server:

root@prodexporter01:/opt/promitor_azure_exporter/prodmon# docker ps -a
CONTAINER IDIMAGE  COMMAND  
CREATED STATUS  PORTS  

   NAMES
18a29bfa41b1pambrose/prometheus-proxy:1.6.4"java 
-server -XX:+U…"   14 hours agoUp 14 hours 0.0.0.0
:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8092->8092/tcp, 
0.0.0.0:50051->50051/tcp, 50440/tcp   prometheus-proxy


Logs for prometheus proxy 

12:28:26.433 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.16.119 [DefaultDispatcher-worker-6]
12:28:26.470 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.0.119ps9 [DefaultDispatcher-worker-4]
12:28:27.719 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.0.102 [DefaultDispatcher-worker-6]
12:28:28.496 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.0.117 [DefaultDispatcher-worker-4]
12:28:28.639 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.16.103qs6 [DefaultDispatcher-worker-5]
12:28:28.792 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.16.87 [DefaultDispatcher-worker-4]
12:28:29.914 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.0.86rtr2 [DefaultDispatcher-worker-5]
12:28:30.034 INFO  [ProxyHttpConfig.kt:116] - Invalid path request 
/10.2.0.85ms1 [DefaultDispatcher-worker-4]
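
If I read the relabel rules above correctly, the target address is copied into __metrics_path__, so Prometheus ends up requesting URLs of roughly this shape from the proxy (a hypothetical example for the first target):

http://10.1.3.17:8080/10.2.0.85

and, as far as I understand prometheus-proxy, it only serves paths that one of its agents has registered, which would explain the "Invalid path request" log lines; the entries in targets.yml and the static targets need to match the path names configured on the agent side.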



Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Sébastien Dionne
That is the config that I have so far:


serverFiles:
  alerts:
groups:
  - name: Instances
rules:
  - alert: InstanceDown
expr: up == 0
for: 10s
labels:
  severity: page
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} 
has been down for more than 1 minute.'
  summary: 'Instance {{ $labels.instance }} down'
  
alertmanagerFiles:
  alertmanager.yml:
route:
  receiver: default-receiver
  group_wait: 5s
  group_interval: 10s

receivers:
  - name: default-receiver
webhook_configs:
  - url: "https://webhook.site/815a0b0b-f40c-4fc2-984d-e29cb9606840"
  


Here is an example of one of my pods:

Labels:   app.kubernetes.io/instance=optimizer-6e0f0a089c70
  app.kubernetes.io/name=optimizer-interface
  
  pod-template-hash=784669954d
  releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70
  service.ip=10.1.7.200

Annotations:  cni.projectcalico.org/podIP: 192.168.218.99/32
  cni.projectcalico.org/podIPs: 192.168.218.99/32
  prometheus.io/path: /metrics
  prometheus.io/port: 8080
  prometheus.io/scrape: true

I have to get Prometheus to check pod "health" every 10-15 seconds and 
send an alert for the pods that go up -> down and down -> up.


On the side, I added a Gauge that returns the timestamp in my application, 
and I poll Prometheus every 15 seconds to get the last timestamp of all 
applications; if NOW - timestamp > 15, that means Prometheus wasn't able to 
scrape the pod in the last 15 seconds, so I consider that pod down. I do this 
with a query like:

http://localhost:9090/api/v1/query?query={__name__=~".*lastTimestampScrapped"}

But if I could do the same directly with Prometheus + Alertmanager, I 
wouldn't have to query Prometheus manually myself.
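
A sketch of how that freshness check could be expressed as an alerting rule instead (assuming the gauge holds a Unix timestamp in seconds and keeping the metric name pattern from the query above):

  - alert: PodScrapeStale
    expr: time() - {__name__=~".*lastTimestampScrapped"} > 15
    labels:
      severity: page
    annotations:
      summary: 'No fresh value from {{ $labels.instance }} in the last 15 seconds'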








On Tuesday, June 30, 2020 at 4:15:58 AM UTC-4, Christian Hoffmann wrote:
>
> Hi, 
>
> On 6/25/20 8:55 PM, Sébastien Dionne wrote: 
> > I have few java applications that I'll deploy in my cluster.  I need to 
> > know how can I detect if a instance is up or down with Prometheus.  
> > 
> > *Alerting with AlertManager* 
> > * 
> > * 
> > I have a alert that check for "instanceDown" and send a alert to 
> > AlertManager-webhook. So when one instance is down, i'm receiving alerts 
> > in my application.   
> > 
> > But how can I extract the labels that are in that instance ? 
> What do you mean by "in that instance"? 
>
> If the label is part of your service discovery, then it should be 
> attached to all series from that target. This would also imply that it 
> would be part of any alert by default unless you aggregate it away (e.g. 
> by using sum, avg or something). 
>
> If the label is only part of some info-style metric, you will have to 
> mix this metric into your alert. 
>
> Can you share one of the relevant alert rules if you need more specific 
> guidance? 
>
> Note: I don't know how many releaseUUIDGroups you have, but having UUIDs 
> as label values might ring some alarm bells due to the potential for 
> high cardinality issues. :) 
>
> Kind regards, 
> Christian 
>
>
> > 
> > ex : I have a special labels in all my application that link the pod to 
> > the information that I have in the database 
> > 
> > releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70 
> > 
> > 
> > there is a way to add that information in the message that AlertManager 
> > send ? 
> > 
> > right now I configure AlertManager to send the alert to 
> > : 
> https://webhook.site/#!/815a0b0b-f40c-4fc2-984d-e29cb9606840/b0dd701d-e972-48d4-9083-385e6a788d55/1
>  
> > 
> > for an example, I kill the pod : prometheus-pushgateway 
> > 
> > and I received this message :  
> > 
> > { 
> >   "receiver": "default-receiver", 
> >   "status": "resolved", 
> >   "alerts": [ 
> > { 
> >   "status": "resolved", 
> >   "labels": { 
> > "alertname": "InstanceDown", 
> > "instance": "prometheus-pushgateway.default.svc:9091", 
> > "job": "prometheus-pushgateway", 
> > "severity": "page" 
> >   }, 
> >   "annotations": { 
> > "description": "prometheus-pushgateway.default.svc:9091 of job 
> prometheus-pushgateway has been down for more than 1 minute.", 
> > "summary": "Instance prometheus-pushgateway.default.svc:9091 
> down" 
> >   }, 
> >   "startsAt": "2020-06-19T17:09:53.862877577Z", 
> >   "endsAt": "2020-06-22T11:23:53.862877577Z", 
> >   "generatorURL": "
> http://prometheus-server-57d8dcc67f-qnmkj:9090/graph?g0.expr=up+%3D%3D+0=1",
>  
>
> >   "fingerprint": "1ed4a1dca68d64fb" 
> > } 
> >   ], 
> >   "groupLabels": {}, 
> >   "commonLabels": { 
> > "alertname": "InstanceDown", 
> > "instance": "prometheus-pushgateway.default.svc:9091", 
> > "job": "prometheus-pushgateway", 
> > 

Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/25/20 8:55 PM, Sébastien Dionne wrote:
> I have few java applications that I'll deploy in my cluster.  I need to
> know how can I detect if a instance is up or down with Prometheus. 
> 
> *Alerting with AlertManager*
> *
> *
> I have a alert that check for "instanceDown" and send a alert to
> AlertManager-webhook. So when one instance is down, i'm receiving alerts
> in my application.  
> 
> But how can I extract the labels that are in that instance ?
What do you mean by "in that instance"?

If the label is part of your service discovery, then it should be
attached to all series from that target. This would also imply that it
would be part of any alert by default unless you aggregate it away (e.g.
by using sum, avg or something).

If the label is only part of some info-style metric, you will have to
mix this metric into your alert.
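
For example, mixing an info-style metric into the alert expression usually
looks something like this (metric and label names here are placeholders):

  (up == 0) * on(instance) group_left(releaseUUIDGroup) app_info

which copies releaseUUIDGroup onto the firing series, so it shows up as a
label in the Alertmanager payload.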

Can you share one of the relevant alert rules if you need more specific
guidance?

Note: I don't know how many releaseUUIDGroups you have, but having UUIDs
as label values might ring some alarm bells due to the potential for
high cardinality issues. :)

Kind regards,
Christian


> 
> ex : I have a special labels in all my application that link the pod to
> the information that I have in the database
> 
> releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70
> 
> 
> there is a way to add that information in the message that AlertManager
> send ?
> 
> right now I configure AlertManager to send the alert to
> : 
> https://webhook.site/#!/815a0b0b-f40c-4fc2-984d-e29cb9606840/b0dd701d-e972-48d4-9083-385e6a788d55/1
> 
> for an example, I kill the pod : prometheus-pushgateway
> 
> and I received this message : 
> 
> {
>   "receiver": "default-receiver",
>   "status": "resolved",
>   "alerts": [
> {
>   "status": "resolved",
>   "labels": {
> "alertname": "InstanceDown",
> "instance": "prometheus-pushgateway.default.svc:9091",
> "job": "prometheus-pushgateway",
> "severity": "page"
>   },
>   "annotations": {
> "description": "prometheus-pushgateway.default.svc:9091 of job 
> prometheus-pushgateway has been down for more than 1 minute.",
> "summary": "Instance prometheus-pushgateway.default.svc:9091 down"
>   },
>   "startsAt": "2020-06-19T17:09:53.862877577Z",
>   "endsAt": "2020-06-22T11:23:53.862877577Z",
>   "generatorURL": 
> "http://prometheus-server-57d8dcc67f-qnmkj:9090/graph?g0.expr=up+%3D%3D+0=1",
>   "fingerprint": "1ed4a1dca68d64fb"
> }
>   ],
>   "groupLabels": {},
>   "commonLabels": {
> "alertname": "InstanceDown",
> "instance": "prometheus-pushgateway.default.svc:9091",
> "job": "prometheus-pushgateway",
> "severity": "page"
>   },
>   "commonAnnotations": {
> "description": "prometheus-pushgateway.default.svc:9091 of job 
> prometheus-pushgateway has been down for more than 1 minute.",
> "summary": "Instance prometheus-pushgateway.default.svc:9091 down"
>   },
>   "externalURL": "http://localhost:9093",
>   "version": "4",
>   "groupKey": "{}:{}"
> }
> 



Re: [prometheus-users] prometheus is scraping metrics from an instance which has no exporter running on

2020-06-30 Thread Christian Hoffmann
Hi,


On 6/23/20 8:32 AM, Yashar Nesabian wrote:
> Hi,
> A few days ago I realized IPMI exporter is not running on one of our
> bare metals but we didn't get any alert from our Prometheus. Although I
> cannot get the metrics via curl on the Prometheus server, our Prometheus
> is scraping metrics successfully from this server!
> here is the Prometheus page indicating the Prometheus can scrap metrics
> successfully :
> 
> prom.png
> 
> 
> But when I SSH to the server, no one is listening on port 9290:
> 
> 
> 
> And I've checked the DNS records, they are correct (when I ping the
> address, it returns the correct address. Here is the curl result from
> the Prometheus server for one-08:
> |
> curl http://one-08.compute.x.y.z:9290
> curl: (7) Failed to connect to one-08.compute.x.y.z port 9290:
> Connection refused
> |
> 
> The weird thing is I can see one-08 metrics on the Prometheus server
> (for the moment):
> 
> prom1.png
> 
> 
> 
> I tried to put this job on another Prometheus server but I get an error
> on the second one claiming context deadline exceeded which is correct.

Could a DNS cache be involved?

Try comparing
getent hosts one-08
vs.
dig one-08
on the Prometheus machine.

You can also try tcpdump to analyze where Prometheus is actually
connecting.

Kind regards,
Christian



Re: [prometheus-users] Is there a good grok user group? I need a pattern!

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/24/20 11:14 AM, Danny de Waard wrote:
> Prometheus users,
> 
> Does anyone know a good grok site/group/knowledge base where I can
> figure out my pattern?
> I cannot figure out how to get a good grok pattern for my SSL log.

Looks like this is used in Logstash, maybe you can ask there?

https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

Kind regards,
Christian



Re: [prometheus-users] Issues with group_left to exclude specific label value

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/25/20 4:05 PM, Al wrote:
> All hosts from which I collect node_exporter metrics each have an
> additional node_role metric (added via textfile collector) which
> identifies all the Chef roles a given host has.  As an example, say we
> have 3 hosts with the following textfile collector metrics:
> 
> *server1:*
> |
> node_role{role="web_server_app1"}
> |
> 
> *server2:*
> |
> node_role{role="redis_server"}
> |
> 
> *server3:*
> |
> node_role{role="db_server"}
> |
> 
> *server4:*
> |
> node_role{role="web_server_app2"}
> |
> 
> 
> 
> I'm attempting to write a PromQL query which will return the current
> disk usage % for hosts that do not have a specific role "web_server"
> assigned to them.  I've attempted the following PromQL query although
> it's invalid as we end up with many results on the right hand side,
> which doesn't match the many-to-one nature of the group left:
> 
> |
> 
> 100-(
> 
>    (node_filesystem_free_bytes{mountpoint=“”/data}*on
> (hostname)group_left(role)node_role{role!~“web_server.*”})
> 
>    /
> 
>    (node_filesystem_size_bytes{mountpoint=“”/data}*on
> (hostname)group_left(role)node_role{role!~“web_server”})
> 
>    *
> 
>    100
> 
> )
> 
> |
> 
> How could I modify this query so that it correctly returns the disk usage
> percentage of server2 and server3?

The pattern basically looks fine.
Some remarks:

- Your quotation marks do not look like ASCII quotes -- I guess this
comes from pasting?
- The quotation marks around mountpoint= seem off (one should come after
/data).
- Your role regexps are not identical. The one in the second part lacks
the .* and will therefore match all your example servers.
- The role "join" can probably be omitted from the second part when using
on(instance, mountpoint) (or on(hostname, mountpoint), whichever labels
your series actually share) -- see the sketch below.
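
An untested sketch with those fixes applied, keeping your original
structure (this assumes hostname is the label shared by the filesystem
metrics and node_role, and that every host exposes exactly one node_role
series):

100 - (
    (
      node_filesystem_free_bytes{mountpoint="/data"}
        * on(hostname) group_left(role)
      node_role{role!~"web_server.*"}
    )
    /
    (
      node_filesystem_size_bytes{mountpoint="/data"}
        * on(hostname) group_left(role)
      node_role{role!~"web_server.*"}
    )
    * 100
)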

In general I suggest trying the parts of your query individually and
only putting them into the larger query once both parts return what you
need.

Kind regards,
Christian



Re: [prometheus-users] Merging two prometheus datasources on the same grafana dashboard

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/29/20 11:43 AM, Daly Graty wrote:
> I got two Grafana servers: the first one is monitoring Kubernetes and is
> installed on the master, the second is on a separate VM. Both are pinging!
> I need to merge both of them in order to access them with the same URL.
> I tried to add the Kubernetes Prometheus (my first Grafana server) as a
> data source on the second one but I got a ''gateway error''!
> Some help please!

I suggest looking at the Grafana logs. Also try accessing the data source
URL of the problematic Prometheus instance from your Grafana server via
curl. Maybe there is some firewall restriction in place (ping is not
sufficient, you will need TCP access on the relevant port [9090, by
default]).
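
For example (hypothetical hostname, use your actual data source URL):

curl -v http://your-prometheus-host:9090/-/healthy

If that fails or returns a gateway error from a proxy in between, the
problem is on the network side rather than in Grafana itself.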

Kind regards,
Christian



Re: [prometheus-users] Custom Threshold for a particular instance.

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/24/20 8:09 PM, yagyans...@gmail.com wrote:
> Hi. Currently I am using a custom threshold in case of my Memory alerts.
> I have 2 main labels for my every node exporter target - cluster and
> component.
> My custom threshold till now has been based on the component as I had to
> define that particular custom threshold for all the servers of the
> component. But now, I have 5 instances, all from different components
> and I have to set the threshold as 97. How do I approach this?
> 
> My typical node exporter job.
>   - job_name: 'node_exporter_JOB-A'
>     static_configs:
>       - targets: ['x.x.x.x:9100', 'x.x.x.x:9100']
>         labels:
>           cluster: 'Cluster-A'
>           env: 'PROD'
>           component: 'Comp-A'
>     scrape_interval: 10s
> 
> Recording rule for custom thresholds.
>   - record: abcd_critical
>     expr: 99.9
>     labels:
>       component: 'Comp-A'
> 
>   - record: xyz_critical
>     expr: 95
>     labels:
>       node: 'Comp-B'
> 
> The expression for Memory Alert.
> ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
> on(instance) group_left(nodename) node_uname_info > on(component)
> group_left() (abcd_critical or xyz_critical or on(node) count by
> (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
> 
> Now, I have 5 servers with different components. How do I include those in
> the most optimized manner?

This looks almost like the pattern described here:
https://www.robustperception.io/using-time-series-as-alert-thresholds

It looks like you already tried to integrate the two different ways to
specify thresholds, right? Is there any specific problem with it?

Sadly, this pattern quickly becomes complex, especially when nested (as
you would need to do) and when combined with an already long query (as in
your case).

I can only suggest trying to move some of the complexity out of the
query (e.g. by moving the memory calculation into a recording rule instead).
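
A rough, untested sketch (the group and record names are just examples; the
expression is your existing memory calculation):

groups:
  - name: memory
    rules:
      - record: instance:memory_usage:percent
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemFree_bytes
            - node_memory_Cached_bytes)
          / node_memory_MemTotal_bytes * 100

The alert rules then only need to compare instance:memory_usage:percent
against the thresholds, which keeps them much shorter.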

You can also split the rule into multiple rules (with the same name).
You will just have to ensure that they only ever fire for a subset of
your instances (e.g. the first variant would only fire for
component-based thresholds, the second only for instance-based
thresholds).

Hope this helps.

Kind regards,
Christian



Re: [prometheus-users] Prometheus timeseries and table panel in grafana

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/23/20 5:43 PM, neel patel wrote:
> I am using prometheus and grafana combo to monitor PostgreSQL database.
> 
> Now prometheus stores the timeseries as below.
> 
> disk_free_space{file_system="/dev/sda1",file_system_type="xfs",mount_point="/boot",server="127.0.0.1:5432"}
> 9.5023104e+07
> disk_free_space{file_system="/dev/sda1",file_system_type="xfs",mount_point="/boot",server="127.0.0.1:5433"}
> 9.5023104e+07
> disk_free_space{file_system="/dev/sda3",file_system_type="xfs",mount_point="/",server="127.0.0.1:5432"}
> 2.8713885696e+10
> disk_free_space{file_system="/dev/sda3",file_system_type="xfs",mount_point="/",server="127.0.0.1:5433"}
> 2.8714070016e+10
> disk_free_space{file_system="rootfs",file_system_type="rootfs",mount_point="/",server="127.0.0.1:5432"}
> 2.8713885696e+10
> disk_free_space{file_system="rootfs",file_system_type="rootfs",mount_point="/",server="127.0.0.1:5433"}
> 2.8714070016e+10
> 
> How can I plot a table panel in Grafana using the above metrics? I can plot
> the time series using a line chart, but how do I represent the above data
> in a table in Grafana, given that it is time series data? Any pointers will
> be helpful.

Are you looking for the table panel?

https://grafana.com/docs/grafana/latest/panels/visualizations/table-panel/

Note: This mailing list is primarily targeted at Prometheus. Although
many Prometheus users are also Grafana users, there may be more
Grafana-focused support facilities somewhere else (I don't know). :)

Kind regards,
Christian



Re: [prometheus-users] disk speed

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/23/20 4:45 PM, 'Metrics Searcher' via Prometheus Users wrote:
> Does anyone know how to collect the disk speed, like I can do it via
> hdparm or dd?

I don't know of a standard solution for this. Also, your examples are
benchmark-style measurements which cannot be collected passively and
continuously like other system metrics (e.g. disk throughput based on
normal usage).

You can still set up a small cronjob to run such benchmarks and place
the results in a .prom file for the node_exporter's textfile collector.

For background/examples, see this blog post and the three linked articles:
https://www.robustperception.io/atomic-writes-and-the-textfile-collector
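
A very rough, untested sketch of such a cronjob (device, directory and
metric name are only examples; the rename at the end is the write-to-temp
approach from the linked article, so node_exporter never reads a
half-written file):

#!/bin/sh
# Crude sequential read benchmark, exposed via the textfile collector.
DIR=/var/lib/node_exporter/textfile_collector   # your --collector.textfile.directory
DEV=/dev/sda

START=$(date +%s.%N)
dd if="$DEV" of=/dev/null bs=1M count=1024 iflag=direct 2>/dev/null
END=$(date +%s.%N)
BPS=$(echo "1024*1024*1024 / ($END - $START)" | bc -l)

TMP="$DIR/disk_benchmark.prom.$$"
{
  echo "# HELP disk_benchmark_read_bytes_per_second Crude dd read benchmark"
  echo "# TYPE disk_benchmark_read_bytes_per_second gauge"
  echo "disk_benchmark_read_bytes_per_second{device=\"$DEV\"} $BPS"
} > "$TMP"
mv "$TMP" "$DIR/disk_benchmark.prom"   # rename, atomic on the same filesystem

It needs root for the raw device read and assumes GNU date and bc are
available on the host.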

Kind regards,
Christian
