[prometheus-users] Re: What could be causing the alert manager to fail sending emails?

2023-04-30 Thread Brian Candler
The error "dial tcp 74.125.206.108:587: i/o timeout" suggests that your 
container can't even make an outbound network connection. Try logging into 
the container and running "telnet smtp.gmail.com 587" or "nc 
smtp.gmail.com:587".  You should get a banner back like this:

220 smtp.gmail.com ESMTP t26-2002.1 - gsmtp

If you don't, then you have a networking/firewalling issue to investigate 
on your k8s cluster - it's nothing to do with alertmanager.
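
(On Kubernetes you can usually get a shell with something like "kubectl exec 
-n monitoring -it <alertmanager pod> -- sh" - the pod name there is a 
placeholder - although the official alertmanager image is fairly minimal, so 
telnet/nc may not be available and you may need to run the test from another 
pod on the same cluster network instead.)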

As for your config, I observe that your 'receivers' section doesn't look 
like what Alertmanager requires:
https://prometheus.io/docs/alerting/latest/configuration/#email_config
For example, you have "emailConfigs" whereas alertmanager requires 
"email_configs".

Presumably, therefore, you are using some middleware which is munging the 
config before giving it to alertmanager (I guess this is something provided 
by "monitoring.coreos.com"?)

If you're able to find the alertmanager config *as deployed* by this 
middleware, e.g. by entering the running container and printing it, then 
you can post that here - or it may be obvious what the problem is. Note 
that auth_password is *not* base64 encoded in the alertmanager config file 
- I don't know if your middleware base64-decodes it as part of deployment.
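
For comparison, a hand-written alertmanager.yml receiver for Gmail would look 
roughly like this (values are placeholders, and this is only a sketch of what 
the middleware should end up rendering - not something to paste into your 
CRD):

receivers:
  - name: 'email'
    email_configs:
      - to: 'you@example.com'
        from: 'you@example.com'
        smarthost: 'smtp.gmail.com:587'
        auth_identity: 'you@example.com'
        auth_username: 'you@example.com'
        auth_password: 'your-gmail-app-password'   # plain text, not base64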

On Sunday, 30 April 2023 at 09:12:20 UTC+1 Zikou wrote:

> I'm testing triggering Alertmanager to send an email, but it doesn't work. 
> Here are the logs I got from the Alertmanager pod:
>
> ts=2023-04-18T17:35:15.453Z caller=dispatch.go:352 level=error 
> component=dispatcher msg="Notify for alerts failed" num_alerts=1 
> err="monitoring/main-rules-alert-config/email/email[0]: notify retry 
> canceled after 4 attempts: establish connection to server: dial tcp 
> 74.125.206.108:587: i/o timeout"
> and here is the content of my alertmanagerconfig
>
> apiVersion: monitoring.coreos.com/v1alpha1
> kind: AlertmanagerConfig
> metadata:
>   name: main-rules-alert-config
>   namespace: monitoring
> spec:
>   route:
>     receiver: 'email'
>     repeatInterval: 30m
>     routes:
>     - matchers:
>       - name: alertname
>         value: HostHighCPULoad
>     - matchers:
>       - name: alertname
>         value: KuberenetesPodCrashLooping
>       repeatInterval: 30m
>   receivers:
>   - name: 'email'
>     emailConfigs:
>     - to: "zikou...@gmail.com"
>       from: "zikou...@gmail.com"
>       smarthost: 'smtp.gmail.com:587'
>       authIdentity: "zikou...@gmail.com"
>       authUsername: "zikou...@gmail.com"
>       authPassword:
>         name: gmail-auth
>         key: password
> I created a secret named gmail-auth (in the same namespace as the alert 
> config) before applying the alert config. The secret contains my Gmail 
> password - but since 2FA is enabled I actually put an app password in the 
> secret (base64-encoded, of course). I don't know where the issue is or why 
> I don't receive any mail.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/89428057-1017-49df-9d77-7c84bcd966aan%40googlegroups.com.


[prometheus-users] Re: Do we need 2 config.yml for fetching metrics from 2 separate regions ?

2023-04-30 Thread Arunprasadh PM
Hi,

We are facing the same issue - is there a solution for managing multiple 
regions?
One is global/us-east-1 for CloudFront, and the other region is for services 
like RDS.

We are struggling to configure both in a single config file.

On Sunday, 23 October 2022 at 14:30:12 UTC+5:30 Søren Valentin Silkjær 
wrote:

> Hi. I have the same question, did you find an answer?
>
> On Wednesday, November 11, 2020 at 8:09:31 AM UTC+1 m-mukesh wrote:
>
>> Hello all. I have a scenario where I need to fetch ALB metrics from the 
>> us-west-2 region, and also CloudFront metrics. As per the documentation, 
>> we need to specify region=us-east-1 for CloudFront metrics. But, as per 
>> the cloudwatch exporter docs, it looks like we can specify only one 
>> region name in a single config file. Please let me know whether we need 
>> 2 separate config files, and also whether we need to create 2 
>> cloudwatch-exporter services with separate ports like 9106, 9107?
>> Please guide me on this.
>>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/c07cc1b2-b104-49ba-8c58-434aeb62a404n%40googlegroups.com.


[prometheus-users] Cloudwatch Exporter configuration for existing Grafana Dashboards

2023-04-30 Thread Uğurcan Çaykara
Hello everyone,
I currently have an EKS cluster running on AWS. I installed a Helm chart 
to set up Prometheus and Grafana. All the metrics I need for EKS 
(deployments, services, pods etc.) are totally fine. So I wanted to use that 
Grafana centrally for the other AWS services I use, which is why I configured 
config.yml for Lambda metrics as given in the repository and deployed the 
cloudwatch exporter (https://github.com/prometheus/cloudwatch_exporter/). I 
can see the related metrics in Grafana. When I hit the Explore tab in the 
left menu of the Grafana UI and enter "lambda", all the metrics given in the 
config.yml for Lambda are totally fine; I can query them. Now I want to use 
a dashboard for Lambda:
https://grafana.com/grafana/dashboards/593-aws-lambda/
However, it uses Cloudwatch as its datasource, not Prometheus, which is why I 
see no data for that specific dashboard. What's the best way to overcome 
this? Is it something quickly editable and fixable, or is it better to start 
creating a dashboard from scratch? If someone can help me here, I would 
appreciate it.

Thanks

Best regards

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b132b832-d68f-4028-a330-5ac64f5b717cn%40googlegroups.com.


[prometheus-users] NLOG target?

2023-04-30 Thread Thomas Coakley
We are considering converting from Application Insights to Prometheus. We 
have used NLog to send our trace data.

Does anyone know if it is possible to target Prometheus from NLog?

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/c72cf988-26d9-4acd-afe7-129045e15f65n%40googlegroups.com.


Re: [prometheus-users] Cloudwatch Exporter configuration for existing Grafana Dashboards

2023-04-30 Thread Stuart Clark

On 19/04/2023 14:52, Uğurcan Çaykara wrote:

Hello everyone,
I currently have an EKS cluster running on AWS. I installed a Helm 
chart to set up Prometheus and Grafana. All the metrics I need for 
EKS (deployments, services, pods etc.) are totally fine. So I wanted to 
use that Grafana centrally for the other AWS services I use, and 
that's why I configured config.yml for Lambda metrics as given in the 
repository and deployed the cloudwatch exporter 
(https://github.com/prometheus/cloudwatch_exporter/). I can see the 
related metrics in Grafana. When I hit the Explore tab in the left 
menu of the Grafana UI and enter "lambda", all the metrics given in 
the config.yml for Lambda are totally fine; I can query them. 
Now I want to use a dashboard for Lambda:
https://grafana.com/grafana/dashboards/593-aws-lambda/
However, it uses Cloudwatch as its datasource, not Prometheus, which is 
why I see no data for that specific dashboard. What's the best way to 
overcome this? Is it something quickly editable and fixable, or is it 
better to start creating a dashboard from scratch? If someone can 
help me here, I would appreciate it.


If you use dashboards from the Grafana site that aren't designed for 
Prometheus, their usefulness is limited. Other datasources (such as 
Cloudwatch or Datadog) use totally different query languages, so what 
you are really gaining is an outline of a design rather than anything 
you can directly use. You would need to rewrite all the queries to 
operate in a similar manner using PromQL.


Whenever I'm having a look for Grafana dashboards I use the datasource 
filter such that only those designed specifically for Prometheus are 
listed - at least then they mostly work out of the box (although often 
still need slight tweaks due to different job names or use of labels).
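
For example, the Lambda dashboard's Cloudwatch queries would have to become 
PromQL queries against whatever the cloudwatch exporter produces - with the 
exporter's default naming that's usually something like 
sum by (function_name) (aws_lambda_invocations_sum), although the exact 
metric and label names depend on your exporter config, so check the Explore 
tab (or the exporter's /metrics output) to see what you actually have.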


--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ac78d732-a7de-b7ec-d3cc-82800fd839c6%40Jahingo.com.


[prometheus-users] Simple Prometheus + Grafana setup not showing metrics

2023-04-30 Thread membrano phone
I have Grafana and Prometheus running on localhost and am trying to run a 
sample app to get some graphs within Grafana. This is my first dip into 
Prometheus.

I am running a simple FastAPI server:
from fastapi import FastAPI
from prometheus_client import make_asgi_app, Gauge

# Create app
app = FastAPI(debug=False)

metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

g = Gauge('my_counter', 'Description of gauge')


@app.get('/inc')
def process_request():
   g.inc()
   return {}

I am running this with
pipenv run uvicorn fastapi_example:app --port 9090

Navigating to http://localhost:9090/metrics/ shows my_counter 0.0, and if I 
navigate to /inc I can observe the number increasing.

For the setup within Grafana I only add a Prometheus data source and set 
the URL to http://127.0.0.1:9090/metrics.

However, when I try to add a panel that exposes the counter my_counter, 
it just says 'No metrics found'. I can however see that the FastAPI server 
is receiving requests and responding with 200 OK.

I also have the first example for start_http_server running but that is 
also not giving me any metrics from within Grafana.

It feels like I am missing something glaringly obvious. Can anyone see 
anything wrong with my test setup?

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a1f106d4-2595-4b2e-b8ec-1f75e32a497dn%40googlegroups.com.


Re: [prometheus-users] NLOG target?

2023-04-30 Thread Stuart Clark

On 19/04/2023 15:19, Thomas Coakley wrote:
We are considering converting from Application Insights to Prometheus. 
We have used NLog to send our trace data.


Does anyone know if it is possible to target Prometheus from NLog?


Assuming that is traces (as in spans containing call durations & 
details, etc.) then no, as Prometheus is a metric system. Tempo is the 
tracing system from Grafana that can work alongside Prometheus (and 
there are other open source and commercial offerings too).


--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0f66413d-f4dd-76ab-2759-396fabb8908d%40Jahingo.com.


[prometheus-users] Aggregating histograms with different bucket boundaries

2023-04-30 Thread Adam Sydnor
Hey all,

I was wondering if it is a bad idea to aggregate histograms that have 
different bucket boundaries. If so, why?

Or can Prometheus handle this and produce reliable results like quantiles?

I want to dynamically resize the bucket boundaries in a histogram based on 
observed data but still be able to aggregate it with other histograms that 
have done the same.

Thanks!
Adam

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/7989f505-e7d7-43d9-bc7d-b26a0fad44e0n%40googlegroups.com.


[prometheus-users] What could be causing the alert manager to fail sending emails?

2023-04-30 Thread Zikou
I'm testing triggering Alertmanager to send an email, but it doesn't work. 
Here are the logs I got from the Alertmanager pod:

ts=2023-04-18T17:35:15.453Z caller=dispatch.go:352 level=error 
component=dispatcher msg="Notify for alerts failed" num_alerts=1 
err="monitoring/main-rules-alert-config/email/email[0]: notify retry 
canceled after 4 attempts: establish connection to server: dial tcp 
74.125.206.108:587: i/o timeout"
and here is the content of my alertmanagerconfig

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: main-rules-alert-config
  namespace: monitoring
spec:
  route:
    receiver: 'email'
    repeatInterval: 30m
    routes:
    - matchers:
      - name: alertname
        value: HostHighCPULoad
    - matchers:
      - name: alertname
        value: KuberenetesPodCrashLooping
      repeatInterval: 30m
  receivers:
  - name: 'email'
    emailConfigs:
    - to: "zikou.e...@gmail.com"
      from: "zikou.e...@gmail.com"
      smarthost: 'smtp.gmail.com:587'
      authIdentity: "zikou.e...@gmail.com"
      authUsername: "zikou.e...@gmail.com"
      authPassword:
        name: gmail-auth
        key: password
I created a secret named gmail-auth (in the same namespace as the alert 
config) before applying the alert config. The secret contains my Gmail 
password - but since 2FA is enabled I actually put an app password in the 
secret (base64-encoded, of course). I don't know where the issue is or why I 
don't receive any mail.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0976da34-b492-45fb-bfb7-0b6d5ed353abn%40googlegroups.com.


[prometheus-users] What could be causing the alert manager to fail sending emails?

2023-04-30 Thread Zikou


I'm testing triggering Alertmanager to send an email, but it doesn't work. 
Here are the logs I got from the Alertmanager pod:

ts=2023-04-18T17:35:15.453Z caller=dispatch.go:352 level=error 
component=dispatcher msg="Notify for alerts failed" num_alerts=1 
err="monitoring/main-rules-alert-config/email/email[0]: notify retry 
canceled after 4 attempts: establish connection to server: dial tcp 
74.125.206.108:587: i/o timeout"

and here is the content of my alertmanagerconfig

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: main-rules-alert-config
  namespace: monitoring
spec:
  route:
    receiver: 'email'
    repeatInterval: 30m
    routes:
    - matchers:
      - name: alertname
        value: HostHighCPULoad
    - matchers:
      - name: alertname
        value: KuberenetesPodCrashLooping
      repeatInterval: 30m
  receivers:
  - name: 'email'
    emailConfigs:
    - to: "zikou.e...@gmail.com"
      from: "zikou.e...@gmail.com"
      smarthost: 'smtp.gmail.com:587'
      authIdentity: "zikou.e...@gmail.com"
      authUsername: "zikou.e...@gmail.com"
      authPassword:
        name: gmail-auth
        key: password

I created a secret named gmail-auth (in the same namespace as the alert 
config) before applying the alert config. The secret contains my Gmail 
password - but since 2FA is enabled I put an app password in the secret 
(base64-encoded, of course). I don't know where the issue is or why I don't 
receive any mail.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d1728e51-35fa-4dbf-b7c7-887fffca3bd4n%40googlegroups.com.


[prometheus-users] Synthetic monitoring using prometheus

2023-04-30 Thread Anton Naumovski
Hi Everyone

We are considering consolidating our monitoring under Prometheus. In that 
regard, what is the best solution/exporter/combination of exporters that can 
be used for synthetic monitoring purposes, like getting the response time of 
a user journey in an application, per action, for example:

1. Log in Form 
2. Submit User Credential 
3. Submit Refresh Token 
4. Download
5. Upload Standard 
6. Delete Standard Uploaded File
7. Upload Partial
8. Delete Uploaded File

Thanks in advance!

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/5e8a1599-c497-4dd8-bae3-f301804543c2n%40googlegroups.com.


[prometheus-users] How to debug prometheus When we lose data?

2023-04-30 Thread lx
Hi all:
   I'm new to Prometheus. I send data via the Pushgateway and
display it in Grafana.
Over the last two days of looking at Grafana I have found a 3-minute gap in
the data. I believe the data was sent and the network is OK.
How can I debug this case?


these versions:
#
prometheus-2.42.0-rc.0.linux-amd64
pushgateway-1.5.1.linux-amd64
grafana-9.3.6
#

Prometheus config:
#
  - job_name: "pushgatewayoverflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds.
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9091"]

  - job_name: "pushgatewayportflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds.
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9092"]

  - job_name: "pushgatewaybuflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds.
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9093"]
#

If I use only one Pushgateway, data loss becomes more frequent. Why is this?

Thank you.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CA%2B5jrf%3DbPaW7EpRSyAmbnQ5HkXcutZ_8baAvq8-rr1_qq2MQsA%40mail.gmail.com.


[prometheus-users] Prometheus graph shows decreasing value for an non-decreasing gauge

2023-04-30 Thread Sunil Anand
Hello All,
I am seeing a weird behavior on the graph being plotted by prometheus.
We are using a gauge which is always a non-decreasing value.
When we query the gauge from the DB, it is showing an always increasing 
value across time intervals. 

16800906600 vsdptas1pcar-tas01.tc.corp see-1 cap_tcapProvider1 
NEXUS_CAP_SERVICE_VOICE cap_tcapProvider1 *18646863*   <-- human time 
13:51:00

16800906700 vsdptas1pcar-tas01.tc.corp see-1 cap_tcapProvider1 
NEXUS_CAP_SERVICE_VOICE cap_tcapProvider1 *18647216*   <-- human time 
13:51:10

But there are decreasing segments in the graph plotted in the UI. Could this 
be a bug in Prometheus, or are we doing something wrong? I have attached a 
screenshot of the graph.


Regards,

Sunil


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/9cbbd86d-4e40-4fc6-8248-29d73cfac734n%40googlegroups.com.


[prometheus-users] Prometheus showing decreasing graph for a non-decreasing gauge

2023-04-30 Thread Sunil Anand
We are using Prometheus to display several gauge-type metrics. This 
specific gauge should always be increasing, as it represents a total value.

From the query we are able to retrieve values that are always increasing 
across several 10s intervals.

16800906600 vsdptas1pcar-tas01.tc.corp see-1 cap_tcapProvider1 
NEXUS_CAP_SERVICE_VOICE cap_tcapProvider1 *18646863*   <-- human time 
13:51:00

16800906700 vsdptas1pcar-tas01.tc.corp see-1 cap_tcapProvider1 
NEXUS_CAP_SERVICE_VOICE cap_tcapProvider1 *18647216*   <-- human time 
13:51:10

But when we plot the graph, it sometimes shows a dip in the line, which is 
abnormal for a non-decreasing metric.

I have attached the graphs for reference.

Does the UI have some bug, or are we doing something wrong here? [image: 
Prometheus1.jpg][image: Prometheus2.jpg]

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/44b28310-5b57-4d3b-966c-beeab4062467n%40googlegroups.com.


[prometheus-users] How to debug the Prometheus?

2023-04-30 Thread 李欣
Hi all:
    I'm new to Prometheus. I send data to Prometheus via the Pushgateway, 
and I display the data in Grafana. I find that I lose data for 3 minutes. 
How do I debug this? I think the client which sends the data is OK (I 
checked its log), and the network between the client and the Pushgateway is 
always OK.

I use these versions:
#
prometheus-2.42.0-rc.0.linux-amd64
pushgateway-1.5.1.linux-amd64
grafana-9.3.6
#

The Prometheus config:
##
  - job_name: "pushgatewayoverflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds. 
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9091"]

  - job_name: "pushgatewayportflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds. 
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9092"]

  - job_name: "pushgatewaybuflow"
#honor_labels: true
scrape_interval: 30s # Set the scrape interval to every 15 seconds. 
Default is every 1 minute.
static_configs:
  - targets: ["127.0.0.1:9093"]
##

If I use only one Pushgateway, the frequency of data loss increases.

Thank you.


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/dd196e6e-0f56-41af-bcd2-f9c1d965bb25n%40googlegroups.com.


[prometheus-users] servers returning Hex-STRING instead of DisplayString = trailing NULL

2023-04-30 Thread Jonathan Tougas
I'm looking for a way to deal with a situation where we end up with null 
characters trailing some label values: `count({ifDescr=~".*\x00"}) != 0`.

The source of the problem seems to be with `ifDescr` returned as a 
`Hex-String` instead of what the MIB says should be a `DisplayString`... 
for __some__ servers.

# Good,  99% of servers:
$ snmpget -v 2c -c $creds 172.21.34.10 1.3.6.1.2.1.2.2.1.2.1
iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0"

# Bad, Cisco CVP tsk tsk tsk...
$ snmpget -v 2c -c $creds 172.20.220.88 1.3.6.1.2.1.2.2.1.2.1
iso.3.6.1.2.1.2.2.1.2.1 = Hex-STRING: 53 6F 66 74 77 61 72 65 20 4C 6F 6F 
70 62 61 63
6B 20 49 6E 74 65 72 66 61 63 65 20 31 00

I'm currently planning on using `metric_relabel_configs` to clean up the 
trailing nulls in these and other similar situations I uncovered, as 
sketched below.
Is there a better way than mopping up like that? Perhaps the snmp_exporter 
can deal with these and convert them somehow? I'm not familiar enough with 
it to figure out whether it can or not.
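
Here's roughly the rule I have in mind, inside the relevant scrape job 
(untested sketch - the regex is just my guess at matching one or more 
trailing nulls; labels without a trailing null should be left untouched):

    metric_relabel_configs:
      - source_labels: [ifDescr]
        regex: '(.*?)\x00+'
        action: replace
        target_label: ifDescr
        replacement: '$1'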

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/e8a318f5-7365-4982-9882-8cbefe179f50n%40googlegroups.com.


[prometheus-users] Sum up request duration correctly

2023-04-30 Thread ofir y
I have a Prometheus server which scrapes data from my API's metrics endpoint, 
which is populated using the Prometheus.net library. The scrape interval is 
set to 15 seconds. I'm publishing a request duration summary metric to it. 
This metric is published at random times to the endpoint, but the scrape 
interval makes Prometheus think there is a new value every 15 seconds, even 
if no new data was published. This causes the _count & _sum values of the 
metric to be wrong, as they consider every 15 seconds to be a new point.

My goal is to be able to count and sum up all request actions. So if I 
had 3 requests over a period of 2 minutes like so:
00:00 request 1: duration 1 sec
00:30 request 2: duration 1 sec
01:55 request 3: duration 2 sec

the _count will be 3, and the _sum will be 4 seconds. Can I achieve this 
somehow by using labels or something else?

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d44a4f1f-a8b7-4814-8c2e-667171e4372fn%40googlegroups.com.


[prometheus-users] can dnf -install prometheus

2023-04-30 Thread Kabamaru
Hello everyone

New to Prometheus and I'm just trying to install it on Rocky 9. Following an 
online guide, I installed epel-release and created a repo file, adding the 
lines below to it.
 

curl -s 
https://packagecloud.io/install/repositories/prometheus-rpm/release/script.rpm.sh
 | sudo bash

 

but I get the following error when I try to install it:

- sudo dnf install prometheus -y

- Warning: failed loading '/etc/yum.repos.d/prometheus.repo', skipping.


Does anyone know why?


Thank you

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/9a4e9fe7-b9d9-4a93-a7bb-2cf0b264f419n%40googlegroups.com.


[prometheus-users] Hex-STRING instead of DisplayString = trailing NULL

2023-04-30 Thread Jonathan Tougas
I'm looking for a way to deal with a situation where we end up with null 
characters trailing some label values: `count({ifDescr=~".*\x00"}) != 0`.

The source of the problem seems to be with `ifDescr` returned as a 
`Hex-String` instead of what the MIB says should be a `DisplayString`... 
for __some__ servers.

# Good,  99% of servers:
$ snmpget -v 2c -c $creds 172.21.34.10 1.3.6.1.2.1.2.2.1.2.1
iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0"

# Bad, Cisco CVP tsk tsk tsk...
$ snmpget -v 2c -c $creds 172.20.220.88 1.3.6.1.2.1.2.2.1.2.1
iso.3.6.1.2.1.2.2.1.2.1 = Hex-STRING: 53 6F 66 74 77 61 72 65 20 4C 6F 6F 
70 62 61 63
6B 20 49 6E 74 65 72 66 61 63 65 20 31 00

I'm currently planning on using `metric_relabel_configs` to clean up the 
trailing nulls in these and other similar situations I uncovered.
Is there a better way than mopping up like that? Perhaps the snmp_exporter 
can deal with these and convert them somehow? I'm not familiar enough with 
it to figure out whether it can or not.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d6833ef8-bbee-4f5c-a964-ffb2cf471309n%40googlegroups.com.


[prometheus-users] Avoiding duplicate points when scraping

2023-04-30 Thread ofir y
I have a Prometheus server which scrapes data from my API's metrics endpoint, 
which is populated using the Prometheus.net library. The scrape interval is 
set to 15 seconds. I'm publishing a request duration summary metric to it. 
This metric is published at random times to the endpoint, but the scrape 
interval makes Prometheus think there is a new value every 15 seconds, even 
if no new data was published. This causes the _count & _sum values of the 
metric to be wrong, as they consider every 15 seconds to be a new point.

My goal is to be able to count and sum up all request actions. So if I 
had 3 requests over a period of 2 minutes like so:
00:00 request 1: duration 1 sec
00:30 request 2: duration 1 sec
01:55 request 3: duration 2 sec

the _count will be 3, and the _sum will be 4 seconds. Can I achieve this 
somehow by using labels or something else?

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d7950816-21f8-486a-bd1a-7852d1112f87n%40googlegroups.com.


Re: [prometheus-users] Sum up request duration correctly

2023-04-30 Thread Stuart Clark

On 25/04/2023 10:50, ofir y wrote:
I have a Prometheus server which scrapes data from my API's metrics 
endpoint, which is populated using the Prometheus.net library. The 
scrape interval is set to 15 seconds. I'm publishing a request duration 
summary metric to it. This metric is published at random times to the 
endpoint, but the scrape interval makes Prometheus think there is a new 
value every 15 seconds, even if no new data was published. This causes 
the _count & _sum values of the metric to be wrong, as they consider 
every 15 seconds to be a new point.


My goal is to be able to count and sum up all request actions. So if 
I had 3 requests over a period of 2 minutes like so:

00:00 request 1: duration 1 sec
00:30 request 2: duration 1 sec
01:55 request 3: duration 2 sec

the _count will be 3, and the _sum will be 4 seconds. Can I achieve 
this somehow by using labels or something else?


It sounds like you are trying to use Prometheus to store events, which 
won't work as Prometheus is a metric system.


Normally what you would expose from your application are counters giving 
the total number of occurrences of the event being monitored, as well as 
the total duration of all of those occurrences.


Once scraped you can then show things like the number of events over a 
given period of time, as well as the average durations of those events 
over that period. What you cannot do with a metric system is know 
anything specific about an individual event. To do that you need an 
event system, such as Loki, Elasticsearch or a SQL database.
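
For example, if the Prometheus.net summary is exposed as 
myapi_request_duration_seconds_count and myapi_request_duration_seconds_sum 
(names here are purely illustrative), then 
increase(myapi_request_duration_seconds_count[2m]) gives the number of 
requests in the last two minutes, and 
rate(myapi_request_duration_seconds_sum[2m]) / 
rate(myapi_request_duration_seconds_count[2m]) gives their average duration 
over that window - for your example that would be 3 requests with an average 
of 4/3 seconds, rather than the individual per-request durations.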


--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/09688fe8-de60-88d8-3c51-ef8dbb5d87b5%40Jahingo.com.


Re: [prometheus-users] Re: Do we need 2 config.yml for fetching metrics from 2 separate regions ?

2023-04-30 Thread Stuart Clark

On 21/04/2023 13:53, Arunprasadh PM wrote:

Hi,

We are facing the same issue - is there a solution for managing multiple 
regions?
One is global/us-east-1 for CloudFront, and the other region is for 
services like RDS.


We are struggling to configure both in a single config file.


You would need to be running multiple instances of the exporter.
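
As a rough sketch (ports, hostnames and the metric selection here are 
illustrative): one exporter instance with a config along the lines of

region: us-east-1
metrics:
  - aws_namespace: AWS/CloudFront
    aws_metric_name: Requests
    aws_dimensions: [DistributionId, Region]
    aws_statistics: [Sum]

and a second instance, listening on a different port, whose config sets 
region to wherever your RDS instances live and lists the AWS/RDS metrics 
you want. Prometheus then gets one scrape job per exporter, e.g.

  - job_name: 'cloudwatch-cloudfront'
    static_configs:
      - targets: ['cloudwatch-exporter-cloudfront:9106']

  - job_name: 'cloudwatch-rds'
    static_configs:
      - targets: ['cloudwatch-exporter-rds:9107']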

--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/afe00535-e1c0-8580-c359-52cdab1a7b22%40Jahingo.com.


[prometheus-users] Re: Prometheus showing decreasing graph for a non-decreasing gauge

2023-04-30 Thread Brian Candler
> 16800906600 vsdptas1pcar-tas01.tc.corp see-1 cap_tcapProvider1 
NEXUS_CAP_SERVICE_VOICE cap_tcapProvider1 *18646863*

That's not a prometheus-format metric. Where are you seeing that?

And those graphs aren't from prometheus either: it would show the metric 
name as
tas_see_TacpProvider_totalBeginSent{host="vsdptas1pcar-tas01.tc.corp",...}
not
{__name__="tas_see_TacpProvider_totalBeginSent",host="vsdptas1pcar-tas01.tc.corp",...}

(although I don't know what *version* of prometheus you're running, and if 
it's something ancient I suppose it could behave differently)

In any case: I suggest that you go to the Prometheus web interface, go to 
the PromQL expression editor, and enter

tas_see_TacpProvider_totalBeginSent{host="vsdptas1pcar-tas01.tc.corp"}[2m]

to see 2 minutes worth of raw metrics from the prometheus database, with 
timestamps.

Then change the expression to

tas_see_TacpProvider_totalBeginSent{host="vsdptas1pcar-tas01.tc.corp"}
and switch to the Graph tab.

I trust that what that query says is in the database is real. Therefore, if 
the graph shows values going down, and a database query also shows values 
going down, then those decreasing values are coming from your scraping 
process somehow.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/bb2cdf20-53a0-452f-b7c1-333173d2074cn%40googlegroups.com.