Re: [prometheus-users] Preventing data loss from poor network communication

Mathieu Tétreault Mon, 15 Jun 2020 11:41:39 -0700

Alright, thank you for your time.

On Mon, Jun 15, 2020 at 2:19 PM Stuart Clark <[email protected]>
wrote:


> The Push Gateway isn't a caching system. If the Prometheus server can't
> connect to fetch a scrape due to network issues you will miss data. The
> server needs to have reliable connectivity to the systems it is scraping.
>
> On 15 June 2020 13:29:20 BST, "Mathieu Tétreault" <
> [email protected]> wrote:
>>
>> Alright, I'll look into it.
>>
>> Just in case we don't have the resources required to run prometheus and
>> the thanos sidecar on the metrics server, would there be any issues using
>> the pushgateway to cache the metrics while the network is down? I
>> understand that it would be more complicated to implement, but other than
>> that? I'll do some testing this week, but I was wondering if there was
>> anything that I was missing.
>>
>> Thanks for your help, it is really appreciated.
>>
>> Cheers,
>>
>> Mathieu
>>
>> On Sun, Jun 14, 2020 at 7:32 AM Stuart Clark <[email protected]>
>> wrote:
>>
>>> What you'd generally do is look at using federation or one of the global
>>> storage systems like VictoriaMetrics, Thanos or Cortex.
>>>
>>> You'd have a Prometheus server in each location, and then central
>>> systems for global views and alerts.
>>>
>>> On 14 June 2020 12:19:43 BST, "Mathieu Tétreault" <
>>> [email protected]> wrote:
>>>>
>>>> I will have to double check. At first glance, the metrics servers
>>>> didn't have enough resources available to run prometheus alongside their
>>>> application.
>>>> That's the main reason why I started to investigate setting up a
>>>> watchdog setup and the pushgateway.
>>>>
>>>> My understanding is that it will also prevent grafana from displaying
>>>> the data properly from time to time, since sometimes it won't be able to
>>>> query the metrics server. That issue would be less visible if we have a
>>>> global prometheus instance that stores all the data.
>>>>
>>>> Cheers,
>>>>
>>>> Mathieu
>>>>
>>>> On Sat, Jun 13, 2020 at 8:25 AM Stuart Clark <[email protected]>
>>>> wrote:
>>>>
>>>>> On 12/06/2020 19:45, Mathieu Tétreault wrote:
>>>>> > We plan on using prometheus to fetch data from multiple servers, and
>>>>> > the link between the metrics server and the prometheus servers is
>>>>> > known for not being that reliable. The instability can last a couple
>>>>> > of minutes and there is nothing we can do about it.
>>>>> >
>>>>> > Most of the time prometheus will be able to fetch the metrics.
>>>>> > However, when prometheus is unable to pull the data, the metrics
>>>>> > server will need to be able to cache them until the connection is back.
>>>>> >
>>>>> > Since most of the time the connection will be up, I was thinking about
>>>>> > setting up a watchdog refreshed by the metric pull. When the watchdog
>>>>> > triggers, then cache the data until the pushgateway is pulled.
>>>>> >
>>>>> > If anyone has any advice on that, it'd be appreciated.
>>>>> >
>>>>>
>>>>> Is it possible to run the Prometheus server on the other end of the
>>>>> link?
>>>>>
>>>>> In general it is advised to run Prometheus servers as close as possible
>>>>> to the things being monitored. For example, a server per datacenter
>>>>> instead of a single global server, etc.
>>>>>
>>>>>
>
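
Stuart's point above, that the Push Gateway isn't a caching system, can be illustrated with a minimal sketch. This is a toy model and not the real Pushgateway API: `LastValueStore`, `simulate`, the metric name and the sample values are all hypothetical. It assumes only that a pushed value replaces the previous one for the same key, and that a scrape sees whatever value is current at that moment.

```python
class LastValueStore:
    """Toy stand-in for a push-style store: keeps only the latest value per key."""

    def __init__(self):
        self._latest = {}

    def push(self, key, value):
        # A new push overwrites the previous value; no history is kept.
        self._latest[key] = value

    def scrape(self):
        # A scrape returns a snapshot of the current values only.
        return dict(self._latest)


def simulate(readings, link_up):
    """Push one reading per tick; scrape only on ticks where the link is up."""
    store = LastValueStore()
    collected = []
    for tick, (value, up) in enumerate(zip(readings, link_up)):
        store.push("temperature", value)
        if up:
            collected.append((tick, store.scrape()["temperature"]))
    return collected


# Hypothetical data: the link is down for ticks 2, 3 and 4.
readings = [10, 11, 12, 13, 14, 15]
link_up = [True, True, False, False, False, True]

collected = simulate(readings, link_up)
print(collected)
```

Under these assumptions the script prints `[(0, 10), (1, 11), (5, 15)]`: the readings pushed during ticks 2 to 4 were overwritten before any scrape succeeded, so they never reach the server.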

