Re: Extstore revival after crash

2023-04-24 Thread dormando
Hey,

> Aside:
> I'm actually busy trying to parse the datafile with a small Go program to
> try and replay all the data. Solving this warming will give us a lot of
> confidence to roll this out in a big way across our infra.
> What're your thoughts on this and the above?

It would be really bad for both of us if you created a mission-critical
backup solution based on an undocumented, unsupported data format which
potentially changes with version updates. I think you may have also
misunderstood me; the data is actually partially in RAM.

Is there any chance I could get you into the MC discord to chat a bit
further about your use case? (linked from https://memcached.org/) - easier
to play 20 questions there. If that's not possible I'll list a bunch of
questions in the mailing list here instead :)

> @Javier, thanks for your thoughts here too. Replication is not an option
> for us at this scale; that said, your solution is pretty cool!

One of many questions: is this due to cost (ie; you don't want to double
the cache storage), or some other reason?

On Monday, April 24, 2023 at 1:05:23 PM UTC+2 Javier Arias Losada wrote:

Hi there,

one thing we've done to mitigate this kind of risk is having two copies of
every shard in different availability zones in our cloud provider. Also, we
run in Kubernetes, so for us nodes leaving the cluster is a relatively
frequent issue... we are playing with a small process that warms up new
nodes more quickly.

Since we have more than one copy of the data, we do a warmup process. Our
cache nodes are MUCH MUCH smaller... so this approach might not be
reasonable for your use-case.

This is how our process works. When a new node is restarted, or in any
other situation where an empty memcached process starts, our warmup
process:
1. locates the warmer node for the shard;
2. gets all the keys and TTLs from the warmer node: lru_crawler metadump all;
3. traverses the list of keys in reverse (lru_crawler dumps from the least
recently used, so for this it's better to go from most recent);
4. for each key: gets the value from the warmer node and adds (not sets) it
to the cold node, including TTL.

This process might lead to some small data inconsistencies; how important
that is will depend on your use case.

Since our access patterns are very skewed (a small % of keys gets the
bigger % of traffic, at least for some time), going in reverse through the
LRU dump makes the warmup much more effective.

Best
Javier Arias

On Sunday, April 23, 2023 at 7:24:28 PM UTC+2 dormando wrote:

Hey,

Thanks for reaching out!

There is no crash safety in memcached or extstore; it does look like the
data is on disk but it is actually spread across memory and disk, with
recent or heavily accessed data staying in RAM. Best case you only recover
your cold data. Further, keys can appear multiple times in the extstore
datafile and we rely on the RAM index to know which one is current.

I've never heard of anyone losing an entire cluster, but people do try to
mitigate this by replicating cache across availability zones/regions.
This can be done with a few methods, like our new proxy code. I'd be happy
to go over a few scenarios if you'd like.
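
As a taste of one method (a client-side sketch only, not the proxy; the
gomemcache client and the addresses here are just for illustration):

package main

import "github.com/bradfitz/gomemcache/memcache"

// zoned mirrors every write into a second availability zone and
// falls back to that zone on a local miss.
type zoned struct {
	local, remote *memcache.Client
}

func (z *zoned) Set(item *memcache.Item) error {
	err := z.local.Set(item)
	z.remote.Set(item) // best-effort copy to the other AZ
	return err
}

func (z *zoned) Get(key string) (*memcache.Item, error) {
	if it, err := z.local.Get(key); err == nil {
		return it, nil
	}
	return z.remote.Get(key) // local miss: try the other zone
}

func main() {
	z := &zoned{
		local:  memcache.New("cache-az1:11211"), // placeholder addresses
		remote: memcache.New("cache-az2:11211"),
	}
	z.Set(&memcache.Item{Key: "k", Value: []byte("v")})
	z.Get("k")
}

The proxy moves this logic out of the clients, but the shape of the
scenarios is the same.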

-Dormando

On Sun, 23 Apr 2023, 'Danny Kopping' via memcached wrote:

> First off, thanks for the amazing work @dormando & others!
> Context:
> I work at Grafana Labs, and we are very interested in trying out extstore for some very large (>50TB) caches. We plan to split this 50TB cache into about
> 35 different nodes, each with 1.5TB of NVMe & a small memcached instance. Losing any given node will result in losing ~3% of the overall cache which is
> acceptable, however if we lose all nodes at once somehow, losing all of our cache will be pretty bad and will put severe pressure on our backend.
>
> Ask:
> Having looked at the file that extstore writes on disk, it looks like it has both keys & values contained in it. Would it be possible to "re-warm" the
> cache on startup by scanning this data and resubmitting it to itself? We could then add some condition to our readiness check in k8s to wait until
> the data is all re-warmed and then allow traffic to flow to those instances. Is this feature planned for anytime soon?
>
> Thanks!
>





Re: Extstore revival after crash

2023-04-24 Thread 'Danny Kopping' via memcached
Thanks for the reply @dormando!

> Best case you only recover your cold data. Further, keys can appear
> multiple times in the extstore datafile and we rely on the RAM index to
> know which one is current.

This is actually perfect for our use-case. We just need a big ol' cache of 
cold data, and we never overwrite keys; they're immutable in our system.
The volume of data we're dealing with is so big that there will be very 
little hotspotting on any particular keys, so I'm intending to force most 
of the data into cold storage.

The cache will be used as a read-through, to protect the upstream service 
(object storage) from which we're loading many millions of files - 
sometimes at up to several hundred thousand RPS.
It's true that it's unlikely that we'll lose everything all at once, and we 
will design for frequent failure, but as ever "hope is not a strategy" 
(although it springs eternal... :))
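
Concretely, the read path will be roughly like this (a sketch only; the
helper and key shape are made up for illustration, using the gomemcache
client):

package main

import "github.com/bradfitz/gomemcache/memcache"

var mc = memcache.New("cache-node:11211") // placeholder address

// fetchFromObjectStorage stands in for the real upstream GET.
func fetchFromObjectStorage(key string) ([]byte, error) {
	return []byte("file contents"), nil
}

// read serves from cache when it can, so the upstream only sees misses.
func read(key string) ([]byte, error) {
	if it, err := mc.Get(key); err == nil {
		return it.Value, nil
	}
	val, err := fetchFromObjectStorage(key)
	if err != nil {
		return nil, err
	}
	// Values are immutable here, so add (vs set) is safe even if two
	// fillers race on the same key.
	mc.Add(&memcache.Item{Key: key, Value: val})
	return val, nil
}

func main() {
	read("tenant/block/0001") // hypothetical key shape
}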

Aside:
I'm actually busy trying to parse the datafile with a small Go program to 
try and replay all the data. Solving this warming will give us a lot of 
confidence to roll this out in a big way across our infra.
What're your thoughts on this and the above?

@Javier, thanks for your thoughts here too. Replication is not an option 
for us at this scale; that said, your solution is pretty cool!

On Monday, April 24, 2023 at 1:05:23 PM UTC+2 Javier Arias Losada wrote:

> Hi there,
>
> one thing we've done to mitigate this kind of risk is having two copies of 
> every shard in different availability zones in our cloud provider. Also, we 
> run in Kubernetes, so for us nodes leaving the cluster is a relatively 
> frequent issue... we are playing with a small process that warms up new 
> nodes more quickly.
>
> Since we have more than one copy of the data, we do a warmup process. Our 
> cache nodes are MUCH MUCH smaller... so this approach might not be 
> reasonable for your use-case.
>
> This is how our process works. When a new node is restarted, or in any 
> other situation where an empty memcached process starts, our warmup 
> process: 
> 1. locates the warmer node for the shard; 
> 2. gets all the keys and TTLs from the warmer node: lru_crawler 
> metadump all; 
> 3. traverses the list of keys in reverse (lru_crawler dumps from the 
> least recently used, so for this it's better to go from most recent); 
> 4. for each key: gets the value from the warmer node and adds (not sets) 
> it to the cold node, including TTL.
>
> This process might lead to some small data inconsistencies; how important 
> that is will depend on your use case.
>
> Since our access patterns are very skewed (a small % of keys gets the 
> bigger % of traffic, at least for some time), going in reverse through the 
> LRU dump makes the warmup much more effective.
>
> Best
> Javier Arias
> On Sunday, April 23, 2023 at 7:24:28 PM UTC+2 dormando wrote:
>
>> Hey, 
>>
>> Thanks for reaching out! 
>>
>> There is no crash safety in memcached or extstore; it does look like the 
>> data is on disk but it is actually spread across memory and disk, with 
>> recent or heavily accessed data staying in RAM. Best case you only 
>> recover your cold data. Further, keys can appear multiple times in the 
>> extstore datafile and we rely on the RAM index to know which one is 
>> current. 
>>
>> I've never heard of anyone losing an entire cluster, but people do try to 
>> mitigate this by replicating cache across availability zones/regions. 
>> This can be done with a few methods, like our new proxy code. I'd be 
>> happy to go over a few scenarios if you'd like. 
>>
>> -Dormando 
>>
>> On Sun, 23 Apr 2023, 'Danny Kopping' via memcached wrote: 
>>
>> > First off, thanks for the amazing work @dormando & others! 
>> > Context: 
>> > I work at Grafana Labs, and we are very interested in trying out 
>> extstore for some very large (>50TB) caches. We plan to split this 50TB 
>> cache into about 
>> > 35 different nodes, each with 1.5TB of NVMe & a small memcached 
>> instance. Losing any given node will result in losing ~3% of the overall 
>> cache which is 
>> > acceptable, however if we lose all nodes at once somehow, losing all of 
>> our cache will be pretty bad and will put severe pressure on our backend. 
>> > 
>> > Ask: 
>> > Having looked at the file that extstore writes on disk, it looks like 
>> it has both keys & values contained in it. Would it be possible to 
>> "re-warm" the 
>> > cache on startup by scanning this data and resubmitting it to itself? 
>> We could then add some condition to our readiness check in k8s to wait 
>> until 
>> > the data is all re-warmed and then allow traffic to flow to those 
>> instances. Is this feature planned for anytime soon? 
>> > 
>> > Thanks! 
>> > 

Re: Extstore revival after crash

2023-04-24 Thread Javier Arias Losada
Hi there,

one thing we've done to mitigate this kind of risk is having two copies of 
every shard in different availability zones in our cloud provider. Also, we 
run in Kubernetes, so for us nodes leaving the cluster is a relatively 
frequent issue... we are playing with a small process that warms up new 
nodes more quickly.

Since we have more than one copy of the data, we do a warmup process. Our 
cache nodes are MUCH MUCH smaller... so this approach might not be 
reasonable for your use-case.

This is how our process works. When a new node is restarted, or in any 
other situation where an empty memcached process starts, our warmup 
process: 
1. locates the warmer node for the shard; 
2. gets all the keys and TTLs from the warmer node: lru_crawler metadump all; 
3. traverses the list of keys in reverse (lru_crawler dumps from the least 
recently used, so for this it's better to go from most recent); 
4. for each key: gets the value from the warmer node and adds (not sets) it 
to the cold node, including TTL.
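
In Go, the loop might look roughly like this (untested sketch; the node
addresses are placeholders, and it assumes the gomemcache client for the
copy while speaking the text protocol directly for the metadump):

package main

import (
	"bufio"
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
	"time"

	"github.com/bradfitz/gomemcache/memcache"
)

type entry struct {
	key string
	ttl int32 // seconds left; 0 = never expires, -1 = already expired
}

// metadump asks the warm node for every key and its expiry via the
// text protocol: "lru_crawler metadump all".
func metadump(addr string) ([]entry, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	fmt.Fprintf(conn, "lru_crawler metadump all\r\n")
	now := time.Now().Unix()
	var out []entry
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "END" {
			break
		}
		// each line looks like: key=foo exp=1682345678 la=... fetched=...
		var e entry
		for _, f := range strings.Fields(line) {
			k, v, _ := strings.Cut(f, "=")
			switch k {
			case "key":
				e.key, _ = url.QueryUnescape(v) // keys are URL-encoded
			case "exp":
				if exp, _ := strconv.ParseInt(v, 10, 64); exp > 0 {
					e.ttl = int32(exp - now)
					if e.ttl <= 0 {
						e.ttl = -1 // expired since the dump started
					}
				} // exp=-1 means no expiry; leave ttl at 0
			}
		}
		out = append(out, e)
	}
	return out, sc.Err()
}

func main() {
	warm := memcache.New("warm-node:11211") // placeholder addresses
	cold := memcache.New("cold-node:11211")
	entries, err := metadump("warm-node:11211")
	if err != nil {
		panic(err)
	}
	// The dump is least-recently-used first, so walk it backwards to
	// copy the hottest keys first.
	for i := len(entries) - 1; i >= 0; i-- {
		e := entries[i]
		if e.ttl < 0 {
			continue
		}
		it, err := warm.Get(e.key)
		if err != nil {
			continue // evicted or expired meanwhile; skip it
		}
		it.Expiration = e.ttl
		cold.Add(it) // add, not set: never clobber a fresher write
	}
}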

This process might lead to some small data inconsistencies; how important 
that is will depend on your use case.

Since our access patterns are very skewed (a small % of keys gets the 
bigger % of traffic, at least for some time), going in reverse through the 
LRU dump makes the warmup much more effective.

Best
Javier Arias
On Sunday, April 23, 2023 at 7:24:28 PM UTC+2 dormando wrote:

> Hey,
>
> Thanks for reaching out!
>
> There is no crash safety in memcached or extstore; it does look like the
> data is on disk but it is actually spread across memory and disk, with
> recent or heavily accessed data staying in RAM. Best case you only recover
> your cold data. Further, keys can appear multiple times in the extstore
> datafile and we rely on the RAM index to know which one is current.
>
> I've never heard of anyone losing an entire cluster, but people do try to
> mitigate this by replicating cache across availability zones/regions.
> This can be done with a few methods, like our new proxy code. I'd be happy
> to go over a few scenarios if you'd like.
>
> -Dormando
>
> On Sun, 23 Apr 2023, 'Danny Kopping' via memcached wrote:
>
> > First off, thanks for the amazing work @dormando & others!
> > Context:
> > I work at Grafana Labs, and we are very interested in trying out 
> extstore for some very large (>50TB) caches. We plan to split this 50TB 
> cache into about
> > 35 different nodes, each with 1.5TB of NVMe & a small memcached 
> instance. Losing any given node will result in losing ~3% of the overall 
> cache which is
> > acceptable, however if we lose all nodes at once somehow, losing all of 
> our cache will be pretty bad and will put severe pressure on our backend.
> >
> > Ask:
> > Having looked at the file that extstore writes on disk, it looks like it 
> has both keys & values contained in it. Would it be possible to "re-warm" 
> the
> > cache on startup by scanning this data and resubmitting it to itself? We 
> could then add some condition to our readiness check in k8s to wait 
> until
> > the data is all re-warmed and then allow traffic to flow to those 
> instances. Is this feature planned for anytime soon?
> >
> > Thanks!
> >
