On 29 Feb 18:13, Nur Kholis Majid wrote:
> Hi,
>
> On Sunday, March 1, 2020 at 7:55:30 AM UTC+7, Julien Pivotto wrote:
> >
> > On 29 Feb 16:40, Nur Kholis Majid wrote:
> > > Hi Julien,
> > >
> > > On Sunday, March 1, 2020 at 6:44:34 AM UTC+7, Julien Pivotto wrote:
> > > >
> > > > On 29 Feb 15:38, Nur Kholis Majid wrote:
> > > > > Hi,
> > > > >
> > > > > I've tested Prometheus monitoring node_exporter on 400 instances. With
> > > > > the default configuration, in just two months the TSDB size reached
> > > > > about 450GB and memory usage about 135GB. Queries have become slow
> > > > > and unusable.
> > > > >
> > > > > [image: photo_2020-03-01_06-33-51.jpg]
> > > > >
> > > > > [image: photo_2020-03-01_06-34-00.jpg]
> >
> > Hi,
> >
> > Can you tell us what is in your data directory? Are compactions
> > happening, etc.?
> >
> > e.g. the command
> > tree data
> >
> > or ls -Rl data
>
> Too long to copy here. Please see https://paste.ee/p/ayBlq
>
> Thanks

You have a lot of failed compactions in the past, and a lot of .tmp
directories. What is strange is that compaction does happen at the end.

I have the following questions to help you:

- What is your Prometheus version?
- Can you share the Prometheus logs?
- Are you using the node_exporter textfile collector?
- Do you have metric_relabel_configs?

We have a few known bugs, but none of them explains why the WAL is
compacted correctly at the end.

> > Thanks
> >
> > > >
> > > > Can we know what you mean by default configuration? Is it the default
> > > > or the documented one? What are your startup parameters?
> > >
> > > I mean I just added a minimal configuration in prometheus.yml:
> > > $ cat prometheus.yml
> > > # my global config
> > > global:
> > >   scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
> > >   evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
> > >   # scrape_timeout is set to the global default (10s).
> > >
> > > # Alertmanager configuration
> > > alerting:
> > >   alertmanagers:
> > >   - static_configs:
> > >     - targets:
> > >       # - alertmanager:9093
> > >
> > > # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
> > > rule_files:
> > >   # - "first_rules.yml"
> > >   # - "second_rules.yml"
> > >
> > > # A scrape configuration containing exactly one endpoint to scrape:
> > > # Here it's Prometheus itself.
> > > scrape_configs:
> > >   # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
> > >   - job_name: 'prometheus'
> > >
> > >     # metrics_path defaults to '/metrics'
> > >     # scheme defaults to 'http'.
> > >
> > >     static_configs:
> > >     - targets: ['localhost:9090']
> > >
> > >   - job_name: 'node'
> > >     static_configs:
> > >     - targets: ['10.10.10.1:9100', '10.10.10.2:9100', etc. until 400 nodes]
> > >
> > > On the node_exporter side, no additional config was made.
> > >
> > > >
> > > > How many series do you have?
> > > > max_over_time(prometheus_tsdb_head_series[1d])
> > >
> > > 771651
> > >
> > > >
> > > > Do you have lots of different disks/devices per machine? Lots of
> > > > network interfaces?
> > >
> > > Yes. Each node has 2 NICs in bonding mode and 12 disks.
> > >
> > > >
> > > > I recommend you read
> > > >
> > > > https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
> > > >
> > > > to better understand this.
> > > >
> > > > > Question:
> > > > > 1. What is the maximum number of node_exporter instances Prometheus
> > > > > can handle with acceptable query duration?
> > > > > 2. Is there any special Prometheus configuration for a huge number of
> > > > > instances?
> > > > >
> > > > > Thank you
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> > > > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > > > > To view this discussion on the web visit
> > > > > https://groups.google.com/d/msgid/prometheus-users/7da6b213-02d0-4beb-83fb-e943701b2422%40googlegroups.com.
> > > >
> > > > --
> > > > (o- Julien Pivotto
> > > > //\ Open-Source Consultant
> > > > V_/_ Inuits - https://www.inuits.eu

--
(o-   Julien Pivotto
//\   Open-Source Consultant
V_/_  Inuits - https://www.inuits.eu

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/20200301090157.GA14672%40oxygen.
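
For anyone following along: one quick way to check for the leftover .tmp directories mentioned at the top of this reply is a small script like the one below. It is only a sketch under a few assumptions: that leftovers from interrupted compactions are directories whose names end in ".tmp" (as described for the listing in this thread), and that the data directory path is passed in by the caller. The names find_tmp_dirs and dir_size are made up for illustration and are not part of Prometheus.

```python
import os


def dir_size(path):
    """Sum the sizes (in bytes) of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def find_tmp_dirs(data_dir):
    """Return a sorted list of (path, size_in_bytes) for every directory
    under data_dir whose name ends in '.tmp' (assumed naming of leftovers)."""
    found = []
    for root, dirs, _files in os.walk(data_dir):
        for name in list(dirs):
            if name.endswith(".tmp"):
                path = os.path.join(root, name)
                found.append((path, dir_size(path)))
                # Already measured; no need to walk into it again.
                dirs.remove(name)
    return sorted(found)
```

Calling find_tmp_dirs("data") and printing each pair shows how much disk space those directories occupy. The script only reads the directory tree and deletes nothing; whether the directories are safe to remove depends on the Prometheus version and logs, which is what the questions above are trying to establish.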

