[prometheus-users] Re: User level disk usage monitoring and notification - with prometheus and alertmanager

'Brian Candler' via Prometheus Users Thu, 29 Feb 2024 01:23:06 -0800

> I don't think *condition1* and *condition2* will work as labels and 
> label values returned by condition1 and condition2 are different.


condition1 and on (instance,mountpoint) condition2

This assumes that both expressions have "instance" and "mountpoint" 
labels; these are the only ones considered when matching. It also assumes 
there is a many-to-1 relationship from the left-hand side (users) to the 
right-hand side (filesystem), and that there is a label "username" that you 
would like carried forward from the LHS into the result.
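
As a rough sketch of the matching described above, the fragment below keeps 
per-user samples only where the filesystem on the same (instance, mountpoint) 
has less than 10% space available. It assumes a custom exporter publishes 
user_disk_usage_bytes with "instance", "mountpoint" and "username" labels 
(the "mountpoint" label is an assumption; the original post only shows 
"username"). The group and record names are hypothetical. In PromQL the 
filtering set operator is spelled "and"; set operators keep the labels of 
the left-hand side and do not accept group_left, so "username" survives 
without it.

```yaml
groups:
  - name: user-disk-usage-join   # hypothetical group name
    rules:
      # Hypothetical recording rule: per-user usage, kept only where the
      # filesystem with the same (instance, mountpoint) is below 10% available.
      - record: instance_mountpoint_username:user_disk_usage_bytes:low_space
        expr: |
          user_disk_usage_bytes
          and on (instance, mountpoint)
          ((node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1)
```

The result carries the values and labels of user_disk_usage_bytes; the 
right-hand side acts purely as a filter.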

> So i need 3 rules  - 1 each for server1,server2 and server3

I don't think so. The vector of results can include values for each 
(user,filesystem,instance) on the LHS, and each (filesystem,instance) on 
the RHS, and alert separately for every filesystem that reaches 90%.
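
A single rule covering all servers might look like the sketch below. It is 
not a tested configuration: it assumes the same hypothetical 
user_disk_usage_bytes metric with "instance", "mountpoint" and "username" 
labels, and the alert name, severity label and annotation wording are 
illustrative. Because no instance="serverN:9100" matcher is used, the 
expression is evaluated for every instance at once, and 
"topk by (instance, mountpoint)" picks the 10 largest users per filesystem 
rather than 10 overall.

```yaml
groups:
  - name: user-disk-usage   # hypothetical group name
    rules:
      - alert: UserDiskUsageHigh   # hypothetical alert name
        # One alert instance per (instance, mountpoint, username) that is in
        # the top 10 on a filesystem with less than 10% space available.
        expr: |
          topk by (instance, mountpoint) (10, user_disk_usage_bytes)
          and on (instance, mountpoint)
          ((node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1)
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.username }} is using {{ $value | humanize1024 }}B on {{ $labels.mountpoint }} of {{ $labels.instance }}; please clean up."
```

Here $value is the left-hand side's value (bytes used by the user), and the 
server name comes from the "instance" label instead of being hard-coded in 
three separate rules.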

On Wednesday 28 February 2024 at 22:55:11 UTC+7 Puneet Singh wrote:

> Hi All, 
> I have a monitoring requirement related to the user level disk usage and 
> alerting. And i am wondering if prometheus is the correct tool to handle 
> this requirement or,
>   a custom python script (which uses os, subprocess, smtp module)  to 
> handle monitoring and alerting will be the optimal solution in this context?
>
>
> Here is the problem description - 
> In our setup we have 3 servers, each with a single mount point "/", and each 
> user's directory, such as "/home/user1", "/home/user2", and so forth, 
> resides within this mount point.
> [image: Untitled11.png]
>   We enforce disk quotas for individual users, and our goal is to monitor 
> each user's disk usage and trigger alerts to the top 10 users when overall 
> quota exceeds 90%.
>
>
> Challenges:
> 1. Afaik, prometheus monitors the overall storage status and the 
> mountpoint information, so individual user's disk consumption is not being  
> tracked by Prometheus. Example - 
> [image: Untitled12.png]
>
> a) Do i need to write custom exporter here which uses du -sh to figure out 
> the disk usage  ? where 
> user_disk_usage_bytes{username="ravi"} 390000
>
> b) or node exporter can do this?
>
>
>
>
> after data collection, i need to deal with alerting rule 
> 2. Here is the alert condition on the custom exporter-
>
> *condition1:* can help determine the users who have high usage
> topk( user_disk_usage_bytes / scalar(
> node_filesystem_size_bytes{instance="server1:9100",mountpoint='/'}) )
>
> *condition2:*  this can help determine if the usage has reached 90% 
> (available space less than 10%)
>  (    node_filesystem_avail_bytes{instance="server1:9100",mountpoint='/'}  
> /   node_filesystem_size_bytes{ instance="server1:9100",mountpoint='/'  }  
>   ) < 0.1
>
> I don't think  *condition1* and *condition2* will work as labels and 
> label values returned by condition1 and condition2 are different.
>
> Is there a way to achieve this with PromQL ?
>
> Now, assuming that i am able to get a list of users if system utilization 
> is 90% as - 
> {username="ravi"}  80
> {username="user1"}  90
> {username="user2"}  70
> {username="user3"}  80
> {username="user4"}  90
>
> the alerting rule will be 
> groups:
> - name: example
>   rules:
>   - alert: Storage space is low on server1
>     expr: *condition1* and *condition2*
>     for: 10m
>     labels:
>       alertname: "Server1's Storage space is running low, Please cleanup the disk space - {{ $labels.username }}"
>     annotations:
>       summary: "you are using {{ $value }}% space on the / space.please cleanup."
> So i need 3 rules  - 1 each for server1,server2 and server3
>
> 3.  Now alert manager is responsible for sending out the alerts. 
> To send the alerts, i think this should be the configuration in the 
> current context - 
> [image: Untitled14.png]
> as i have already included username in the alert name , and by default 
> grouping of alert happens by alertname so i think with this setting 1:1 
> email should be sent to each user.
>
>
>
> Apologies for the lengthy post , but I have tried expressing the flow to 
> solve this problem based on my understanding of Prometheus so far.
>
> I would greatly appreciate any insights, recommendations, or best 
> practices you can offer in achieving dynamic user disk usage 
> monitoring with Prometheus and Alert Manager.
>
> Thank you in advance .
>
> Best regards,
> Puneet
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/049b709b-8a09-4a49-9a71-f29a24314f30n%40googlegroups.com.