On Mon, Nov 30, 2020 at 1:08 PM Aliaksandr Valialkin <[email protected]> wrote:
> On Sun, Nov 29, 2020 at 3:10 PM Ben Kochie <[email protected]> wrote:
>
>> On Sun, Nov 29, 2020 at 11:51 AM Aliaksandr Valialkin <[email protected]> wrote:
>>
>>> On Fri, Nov 27, 2020 at 11:11 AM Ben Kochie <[email protected]> wrote:
>>>
>>>>>> Or else are there any other ways by which we can solve this issue?
>>>>>
>>>>> Use something other than federation. remote_write is able to buffer up data locally if the endpoint is down.
>>>>>
>>>>> Prometheus itself can't accept remote_write requests, so you'd have to write to some other system <https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage> which can. I suggest VictoriaMetrics, as it's simple to run and has a very Prometheus-like API, which can be queried as if it were a Prometheus instance.
>>>>
>>>> I recommend Thanos, as it scales better and with less effort than VictoriaMetrics. It also uses the PromQL code directly, so you will get the same results as Prometheus, not an emulation of PromQL.
>>>
>>> Could you share more details on why you think that VictoriaMetrics has scalability issues and is harder to set up and operate than Thanos? VictoriaMetrics users have quite the opposite opinion. See https://victoriametrics.github.io/CaseStudies.html and https://medium.com/faun/comparing-thanos-to-victoriametrics-cluster-b193bea1683.
>>
>> Thanos uses object storage, which avoids the need for manual sharding of TSDB storage. Today I have 100TiB of data stored in object storage buckets. I make no changes to scale these buckets up or down.
>
> VictoriaMetrics stores data on persistent disks. Every replicated durable persistent disk in GCP <https://cloud.google.com/persistent-disk> can scale up to 64TB <https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd> without the need to stop VictoriaMetrics, i.e. without downtime.
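[Editor's note: to make the remote_write suggestion earlier in the thread concrete, here is a minimal sketch of a Prometheus configuration fragment that ships samples to a remote endpoint such as a VictoriaMetrics instance. The endpoint URL and the queue tuning values are illustrative assumptions, not values taken from this thread.]

```yaml
# prometheus.yml (fragment) -- hypothetical endpoint address
remote_write:
  - url: http://victoriametrics.example.com:8428/api/v1/write
    queue_config:
      # Samples are read back from the local WAL, so a temporarily
      # unreachable endpoint does not lose data -- it is retried.
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 1000
```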
> Given that VictoriaMetrics compresses real-world data much better than Prometheus <https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f>, a single-node VictoriaMetrics can substitute for the whole Thanos cluster for your workload (in theory, of course - just give it a try in order to verify this statement :) ). The cluster version of VictoriaMetrics <https://victoriametrics.github.io/Cluster-VictoriaMetrics.html> can scale to petabytes. For example, a cluster with one petabyte of capacity can be built from 16 vmstorage nodes with a 64TB persistent disk per node. That's why VictoriaMetrics in production usually has lower infrastructure costs than Thanos.

* GCP persistent disk costs double that of object storage, and is zone-local only.
* Cost is four times as much if you want regional replication.
* GCP persistent disks don't have multi-regional replication (GCS does by default).
* Object storage versioning makes for easy lifecycle management for disaster recovery.
* Plus you have to maintain some percentage of unused filesystem space to avoid running out of room.
* You can't shrink persistent disks.
* And we're back to manual labor being required to scale.

Storing on persistent disks is a major reason why we don't just use Prometheus for TSDB storage: the instance-level SPoF, the cost of persistent disks compared to object storage, and the toil involved. No thanks, we're moving away from old-school architectures.

> --
> Best Regards,
>
> Aliaksandr Valialkin, CTO VictoriaMetrics
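[Editor's note: as a sanity check on the quoted capacity example, the arithmetic for 16 vmstorage nodes with a 64TB disk each works out to roughly one petabyte. A quick illustrative calculation, not a sizing recommendation:]

```python
# Capacity math for the quoted example: 16 vmstorage nodes,
# each with a 64 TB persistent disk (decimal units, as GCP advertises).
nodes = 16
disk_tb = 64

total_tb = nodes * disk_tb   # 1024 TB of raw disk across the cluster
total_pb = total_tb / 1000   # ~1.02 PB in decimal units

print(total_tb)  # 1024
print(total_pb)  # 1.024
```

Note this is raw disk capacity; replication and the free-space headroom mentioned above reduce the usable amount.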

