Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
Thanks for taking the time to evaluate! It helps my confidence level with
the fix.

You caught me at a good time :) Been really behind with fixes for quite a
while and only catching up this week. I've looked at this a few times and
didn't see the easy fix before...

I think earlier versions of the item chunking code were more fragile, and I
didn't revisit it after the cleanup work. In this case each chunk remembers
its original slab class, so having the final chunk come from an unintended
class doesn't break anything. If chunks didn't carry that, freeing them
would be impossible, since I'd have to recalculate their original slab
class from the chunk size.
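
To sketch the idea (a toy model in Python, not the real C structures; the
names are made up for illustration):

class Chunk:
    # Each chunk remembers which slab class it was allocated from.
    def __init__(self, slab_class_id, size):
        self.slab_class_id = slab_class_id
        self.size = size

class SlabClass:
    def __init__(self, class_id, chunk_size):
        self.class_id = class_id
        self.chunk_size = chunk_size
        self.free_chunks = []

    def alloc(self):
        # Reuse a free chunk if one exists, otherwise carve a new one.
        if self.free_chunks:
            return self.free_chunks.pop()
        return Chunk(self.class_id, self.chunk_size)

def free_item(chunks, classes_by_id):
    # Freeing never has to guess a class from a chunk's size: even a final
    # chunk borrowed from an unintended (larger) class goes back to the
    # class recorded on the chunk itself.
    for chunk in chunks:
        classes_by_id[chunk.slab_class_id].free_chunks.append(chunk)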

So now it'll use too much memory in some cases, and lowering slab_chunk_max
would ease that a bit... so maybe soon it'll finally be a good time to
lower the default chunk max a little, to at least 128k or 256k.
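
If anyone wants to play with that before the default changes, here's a
rough sketch of spinning up a throwaway instance with a smaller chunk max
(the slab_chunk_max value just mirrors the "-o slab_chunk_max=32768"
experiment quoted below; the -m/-I values are assumptions matching the
~5GB limit and 1-2MB items in this thread):

import subprocess

# Throwaway test instance: ~5GB cache, items up to 2MB, smaller chunk max.
memcached = subprocess.Popen([
    "memcached",
    "-m", "5120",                  # memory limit, in megabytes
    "-I", "2m",                    # raise the max item size for 1-2MB values
    "-o", "slab_chunk_max=32768",  # same value suggested for experiments below
])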

-Dormando

On Fri, 26 Aug 2022, Hayden wrote:

> I didn't see the docker files in the repo that could build the docker image, 
> and when I tried cloning the git repo and doing a docker build I encountered
> errors that I think were related to the web proxy on my work network. I was 
> able to grab the release tarball and the bitnami docker file, do a little
> surgery to work around my proxy issue, and build a 1.6.17 docker image though.
> I ran my application against the new version and it ran for ~2hr without any 
> errors (it previously wouldn't run more than 30s or so before encountering
> blocks of the OOM during read errors). I also made a little test loop that 
> just hammered the instance with similar sized writes (1-2MB) as fast as it
> could and let it run a few hours, and it didn't have a single blip. That 
> encompassed a couple million evictions. I'm pretty comfortable saying the 
> issue
> is fixed, at least for the kind of use I had in mind.
>
> I added a comment to the issue on GitHub to the same effect.
>
> I'm impressed by the quick turnaround, BTW. ;-)
>
> H
>
> On Friday, August 26, 2022 at 5:54:26 PM UTC-7 Dormando wrote:
>   So I tested this a bit more and released it in 1.6.17; I think bitnami
>   should pick it up soonish. if not I'll try to figure out docker this
>   weekend if you still need it.
>
>   I'm not 100% sure it'll fix your use case but it does fix some things I
>   can test and it didn't seem like a regression. would be nice to validate
>   still.
>
>   On Fri, 26 Aug 2022, dormando wrote:
>
>   > You can't build docker images or compile binaries? there's a
>   > docker-compose.yml in the repo already if that helps.
>   >
>   > If not I can try but I don't spend a lot of time with docker directly.
>   >
>   > On Fri, 26 Aug 2022, Hayden wrote:
>   >
>   > > I'd be happy to help validate the fix, but I can't do it until the
>   > > weekend, and I don't have a ready way to build an updated image. Any
>   > > chance you could create a docker image with the fix that I could grab
>   > > from somewhere?
>   > >
>   > > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
>   > > I have an opportunity to put this fix into a release today if anyone
>   > > wants to help validate :)
>   > >
>   > > On Thu, 25 Aug 2022, dormando wrote:
>   > >
>   > > > Took another quick look...
>   > > >
>   > > > Think there's an easy patch that might work:
>   > > > https://github.com/memcached/memcached/pull/924
>   > > >
>   > > > If you wouldn't mind helping validate? An external validator would
>   > > > help me get it in time for the next release :)
>   > > >
>   > > > Thanks,
>   > > > -Dormando
>   > > >
>   > > > On Wed, 24 Aug 2022, dormando wrote:
>   > > >
>   > > > > Hey,
>   > > > >
>   > > > > Thanks for the info. Yes; this generally confirms the issue. I see
>   > > > > some of your higher slab classes with "free_chunks 0", so if you're
>   > > > > setting data that requires these chunks it could error out. The
>   > > > > "stats items" confirms this since there are no actual items in
>   > > > > those lower slab classes.
>   > > > >
>   > > > > You're certainly right a workaround of making your items < 512k
>   > > > > would also work; but in general if I have features it'd be nice if
>   > > > > they worked well :) Please open an issue so we can improve things!
>   > > > >
>   > > > > I intended to lower the slab_chunk_max default from 512k to much
>   > > > > lower, as that actually raises the memory efficiency by a bit (less
>   > > > > gap at the higher classes). That may help here. The system should
>   > > > > also try ejecting items from the highest LRU... I need to double
>   > > > > check that it wasn't already intending to do that and failing.
>   > > > >
>   > > > > Might also be able to adjust the page mover but not sure. The page
>   > > > > mover can probably be adjusted to atte

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread Hayden
I didn't see the docker files in the repo that could build the docker 
image, and when I tried cloning the git repo and doing a docker build I 
encountered errors that I think were related to the web proxy on my work 
network. I was able to grab the release tarball and the bitnami docker 
file, do a little surgery to work around my proxy issue, and build a 1.6.17 
docker image though.

I ran my application against the new version and it ran for ~2hr without 
any errors (it previously wouldn't run more than 30s or so before 
encountering bursts of the "out of memory during read" errors). I also made 
a little test loop that just hammered the instance with similarly sized 
writes (1-2MB) as fast as it could and let it run a few hours, and it 
didn't have a single blip. That encompassed a couple million evictions. I'm 
pretty comfortable saying the issue is fixed, at least for the kind of use 
I had in mind.
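
A minimal sketch of that kind of soak loop, for reference (this assumes
pymemcache as the client, a server started with a max item size above 2MB,
and made-up key names; it's an illustration, not the exact script I ran):

import os
import random
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

# Keep writing 1-2MB values as fast as possible; cycling through a bounded
# key space keeps the cache full so evictions happen continuously.
i = 0
while True:
    value = os.urandom(random.randint(1_000_000, 2_000_000))
    client.set("large-item-%d" % (i % 10000), value, expire=0)
    i += 1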

I added a comment to the issue on GitHub to the same effect.

I'm impressed by the quick turnaround, BTW. ;-)

H

On Friday, August 26, 2022 at 5:54:26 PM UTC-7 Dormando wrote:

> So I tested this a bit more and released it in 1.6.17; I think bitnami
> should pick it up soonish. if not I'll try to figure out docker this
> weekend if you still need it.
>
> I'm not 100% sure it'll fix your use case but it does fix some things I
> can test and it didn't seem like a regression. would be nice to validate
> still.
>
> On Fri, 26 Aug 2022, dormando wrote:
>
> > You can't build docker images or compile binaries? there's a
> > docker-compose.yml in the repo already if that helps.
> >
> > If not I can try but I don't spend a lot of time with docker directly.
> >
> > On Fri, 26 Aug 2022, Hayden wrote:
> >
> > > I'd be happy to help validate the fix, but I can't do it until the 
> weekend, and I don't have a ready way to build an updated image. Any chance 
> you could
> > > create a docker image with the fix that I could grab from somewhere?
> > >
> > > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
> > > I have an opportunity to put this fix into a release today if anyone 
> wants
> > > to help validate :)
> > >
> > > On Thu, 25 Aug 2022, dormando wrote:
> > >
> > > > Took another quick look...
> > > >
> > > > Think there's an easy patch that might work:
> > > > https://github.com/memcached/memcached/pull/924
> > > >
> > > > If you wouldn't mind helping validate? An external validator would 
> help me
> > > > get it in time for the next release :)
> > > >
> > > > Thanks,
> > > > -Dormando
> > > >
> > > > On Wed, 24 Aug 2022, dormando wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > Thanks for the info. Yes; this generally confirms the issue. I see 
> some of
> > > > > your higher slab classes with "free_chunks 0", so if you're 
> setting data
> > > > > that requires these chunks it could error out. The "stats items" 
> confirms
> > > > > this since there are no actual items in those lower slab classes.
> > > > >
> > > > > You're certainly right a workaround of making your items < 512k 
> would also
> > > > > work; but in general if I have features it'd be nice if they 
> worked well
> > > > > :) Please open an issue so we can improve things!
> > > > >
> > > > > I intended to lower the slab_chunk_max default from 512k to much 
> lower, as
> > > > > that actually raises the memory efficiency by a bit (less gap at 
> the
> > > > > higher classes). That may help here. The system should also try 
> ejecting
> > > > > items from the highest LRU... I need to double check that it wasn't
> > > > > already intending to do that and failing.
> > > > >
> > > > > Might also be able to adjust the page mover but not sure. The page 
> mover
> > > > > can probably be adjusted to attempt to keep one page in reserve, 
> but I
> > > > > think the algorithm isn't expecting slabs with no items in it so 
> I'd have
> > > > > to audit that too.
> > > > >
> > > > > If you're up for experiments it'd be interesting to know if setting
> > > > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes 
> things
> > > > > better or worse.
> > > > >
> > > > > Also, crud.. it's documented as kilobytes but that's not working 
> somehow?
> > > > > aaahahah. I guess the big EXPERIMENTAL tag scared people off since 
> that
> > > > > never got reported.
> > > > >
> > > > > I'm guessing most people have a mix of small to large items, but 
> you only
> > > > > have large items and a relatively low memory limit, so this is why 
> you're
> > > > > seeing it so easily. I think most people setting large items have 
> like
> > > > > 30G+ of memory so you end up with more spread around.
> > > > >
> > > > > Thanks,
> > > > > -Dormando
> > > > >
> > > > > On Wed, 24 Aug 2022, Hayden wrote:
> > > > >
> > > > > > What you're saying makes sense, and I'm pretty sure it won't be 
> too hard to add some functionality to my writing code to break my large
> > > items up into
> > > > > > smaller parts that can each fit into a single chunk. That has 
> the ad

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
So I tested this a bit more and released it in 1.6.17; I think bitnami
should pick it up soonish. If not, I'll try to figure out docker this
weekend if you still need it.

I'm not 100% sure it'll fix your use case, but it does fix some things I
can test and it didn't seem like a regression. It would still be nice to
validate.

On Fri, 26 Aug 2022, dormando wrote:

> You can't build docker images or compile binaries? there's a
> docker-compose.yml in the repo already if that helps.
>
> If not I can try but I don't spend a lot of time with docker directly.
>
> On Fri, 26 Aug 2022, Hayden wrote:
>
> > I'd be happy to help validate the fix, but I can't do it until the weekend, 
> > and I don't have a ready way to build an updated image. Any chance you could
> > create a docker image with the fix that I could grab from somewhere?
> >
> > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
> >   I have an opportunity to put this fix into a release today if anyone 
> > wants
> >   to help validate :)
> >
> >   On Thu, 25 Aug 2022, dormando wrote:
> >
> >   > Took another quick look...
> >   >
> >   > Think there's an easy patch that might work:
> >   > https://github.com/memcached/memcached/pull/924
> >   >
> >   > If you wouldn't mind helping validate? An external validator would 
> > help me
> >   > get it in time for the next release :)
> >   >
> >   > Thanks,
> >   > -Dormando
> >   >
> >   > On Wed, 24 Aug 2022, dormando wrote:
> >   >
> >   > > Hey,
> >   > >
> >   > > Thanks for the info. Yes; this generally confirms the issue. I 
> > see some of
> >   > > your higher slab classes with "free_chunks 0", so if you're 
> > setting data
> >   > > that requires these chunks it could error out. The "stats items" 
> > confirms
> >   > > this since there are no actual items in those lower slab classes.
> >   > >
> >   > > You're certainly right a workaround of making your items < 512k 
> > would also
> >   > > work; but in general if I have features it'd be nice if they 
> > worked well
> >   > > :) Please open an issue so we can improve things!
> >   > >
> >   > > I intended to lower the slab_chunk_max default from 512k to much 
> > lower, as
> >   > > that actually raises the memory efficiency by a bit (less gap at 
> > the
> >   > > higher classes). That may help here. The system should also try 
> > ejecting
> >   > > items from the highest LRU... I need to double check that it 
> > wasn't
> >   > > already intending to do that and failing.
> >   > >
> >   > > Might also be able to adjust the page mover but not sure. The 
> > page mover
> >   > > can probably be adjusted to attempt to keep one page in reserve, 
> > but I
> >   > > think the algorithm isn't expecting slabs with no items in it so 
> > I'd have
> >   > > to audit that too.
> >   > >
> >   > > If you're up for experiments it'd be interesting to know if 
> > setting
> >   > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) 
> > makes things
> >   > > better or worse.
> >   > >
> >   > > Also, crud.. it's documented as kilobytes but that's not working 
> > somehow?
> >   > > aaahahah. I guess the big EXPERIMENTAL tag scared people off 
> > since that
> >   > > never got reported.
> >   > >
> >   > > I'm guessing most people have a mix of small to large items, but 
> > you only
> >   > > have large items and a relatively low memory limit, so this is 
> > why you're
> >   > > seeing it so easily. I think most people setting large items have 
> > like
> >   > > 30G+ of memory so you end up with more spread around.
> >   > >
> >   > > Thanks,
> >   > > -Dormando
> >   > >
> >   > > On Wed, 24 Aug 2022, Hayden wrote:
> >   > >
> >   > > > What you're saying makes sense, and I'm pretty sure it won't be 
> > too hard to add some functionality to my writing code to break my large
> >   items up into
> >   > > > smaller parts that can each fit into a single chunk. That has 
> > the added benefit that I won't have to bother increasing the max item
> >   size.
> >   > > > In the meantime, though, I reran my pipeline and captured the 
> > output of stats, stats slabs, and stats items both when evicting normally
> >   and when getting
> >   > > > spammed with the error.
> >   > > >
> >   > > > First, the output when I'm in the error state:
> >   > > >  Output of stats
> >   > > > STAT pid 1
> >   > > > STAT uptime 11727
> >   > > > STAT time 1661406229
> >   > > > STAT version b'1.6.14'
> >   > > > STAT libevent b'2.1.8-stable'
> >   > > > STAT pointer_size 64
> >   > > > STAT rusage_user 2.93837
> >   > > > STAT rusage_system 6.339015
> >   > > > STAT max_connections 1024
> >   > > > STAT curr_connections 2
> >   > > > STAT total_conne

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
You can't build docker images or compile binaries? There's a
docker-compose.yml in the repo already, if that helps.

If not, I can try, but I don't spend a lot of time with docker directly.

On Fri, 26 Aug 2022, Hayden wrote:

> I'd be happy to help validate the fix, but I can't do it until the weekend, 
> and I don't have a ready way to build an updated image. Any chance you could
> create a docker image with the fix that I could grab from somewhere?
>
> On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
>   I have an opportunity to put this fix into a release today if anyone 
> wants
>   to help validate :)
>
>   On Thu, 25 Aug 2022, dormando wrote:
>
>   > Took another quick look...
>   >
>   > Think there's an easy patch that might work:
>   > https://github.com/memcached/memcached/pull/924
>   >
>   > If you wouldn't mind helping validate? An external validator would 
> help me
>   > get it in time for the next release :)
>   >
>   > Thanks,
>   > -Dormando
>   >
>   > On Wed, 24 Aug 2022, dormando wrote:
>   >
>   > > Hey,
>   > >
>   > > Thanks for the info. Yes; this generally confirms the issue. I see 
> some of
>   > > your higher slab classes with "free_chunks 0", so if you're setting 
> data
>   > > that requires these chunks it could error out. The "stats items" 
> confirms
>   > > this since there are no actual items in those lower slab classes.
>   > >
>   > > You're certainly right a workaround of making your items < 512k 
> would also
>   > > work; but in general if I have features it'd be nice if they worked 
> well
>   > > :) Please open an issue so we can improve things!
>   > >
>   > > I intended to lower the slab_chunk_max default from 512k to much 
> lower, as
>   > > that actually raises the memory efficiency by a bit (less gap at the
>   > > higher classes). That may help here. The system should also try 
> ejecting
>   > > items from the highest LRU... I need to double check that it wasn't
>   > > already intending to do that and failing.
>   > >
>   > > Might also be able to adjust the page mover but not sure. The page 
> mover
>   > > can probably be adjusted to attempt to keep one page in reserve, 
> but I
>   > > think the algorithm isn't expecting slabs with no items in it so 
> I'd have
>   > > to audit that too.
>   > >
>   > > If you're up for experiments it'd be interesting to know if setting
>   > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes 
> things
>   > > better or worse.
>   > >
>   > > Also, crud.. it's documented as kilobytes but that's not working 
> somehow?
>   > > aaahahah. I guess the big EXPERIMENTAL tag scared people off since 
> that
>   > > never got reported.
>   > >
>   > > I'm guessing most people have a mix of small to large items, but 
> you only
>   > > have large items and a relatively low memory limit, so this is why 
> you're
>   > > seeing it so easily. I think most people setting large items have 
> like
>   > > 30G+ of memory so you end up with more spread around.
>   > >
>   > > Thanks,
>   > > -Dormando
>   > >
>   > > On Wed, 24 Aug 2022, Hayden wrote:
>   > >
>   > > > What you're saying makes sense, and I'm pretty sure it won't be 
> too hard to add some functionality to my writing code to break my large
>   items up into
>   > > > smaller parts that can each fit into a single chunk. That has the 
> added benefit that I won't have to bother increasing the max item
>   size.
>   > > > In the meantime, though, I reran my pipeline and captured the 
> output of stats, stats slabs, and stats items both when evicting normally
>   and when getting
>   > > > spammed with the error.
>   > > >
>   > > > First, the output when I'm in the error state:
>   > > >  Output of stats
>   > > > STAT pid 1
>   > > > STAT uptime 11727
>   > > > STAT time 1661406229
>   > > > STAT version b'1.6.14'
>   > > > STAT libevent b'2.1.8-stable'
>   > > > STAT pointer_size 64
>   > > > STAT rusage_user 2.93837
>   > > > STAT rusage_system 6.339015
>   > > > STAT max_connections 1024
>   > > > STAT curr_connections 2
>   > > > STAT total_connections 8230
>   > > > STAT rejected_connections 0
>   > > > STAT connection_structures 6
>   > > > STAT response_obj_oom 0
>   > > > STAT response_obj_count 1
>   > > > STAT response_obj_bytes 65536
>   > > > STAT read_buf_count 8
>   > > > STAT read_buf_bytes 131072
>   > > > STAT read_buf_bytes_free 49152
>   > > > STAT read_buf_oom 0
>   > > > STAT reserved_fds 20
>   > > > STAT cmd_get 0
>   > > > STAT cmd_set 12640
>   > > > STAT cmd_flush 0
>   > > > STAT cmd_touch 0
>   > > > STAT cmd_meta 0
>   > > > STAT get_hits 0
>   

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread Hayden
I'd be happy to help validate the fix, but I can't do it until the weekend, 
and I don't have a ready way to build an updated image. Any chance you 
could create a docker image with the fix that I could grab from somewhere?

On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:

> I have an opportunity to put this fix into a release today if anyone wants
> to help validate :)
>
> On Thu, 25 Aug 2022, dormando wrote:
>
> > Took another quick look...
> >
> > Think there's an easy patch that might work:
> > https://github.com/memcached/memcached/pull/924
> >
> > If you wouldn't mind helping validate? An external validator would help 
> me
> > get it in time for the next release :)
> >
> > Thanks,
> > -Dormando
> >
> > On Wed, 24 Aug 2022, dormando wrote:
> >
> > > Hey,
> > >
> > > Thanks for the info. Yes; this generally confirms the issue. I see 
> some of
> > > your higher slab classes with "free_chunks 0", so if you're setting 
> data
> > > that requires these chunks it could error out. The "stats items" 
> confirms
> > > this since there are no actual items in those lower slab classes.
> > >
> > > You're certainly right a workaround of making your items < 512k would 
> also
> > > work; but in general if I have features it'd be nice if they worked 
> well
> > > :) Please open an issue so we can improve things!
> > >
> > > I intended to lower the slab_chunk_max default from 512k to much 
> lower, as
> > > that actually raises the memory efficiency by a bit (less gap at the
> > > higher classes). That may help here. The system should also try 
> ejecting
> > > items from the highest LRU... I need to double check that it wasn't
> > > already intending to do that and failing.
> > >
> > > Might also be able to adjust the page mover but not sure. The page 
> mover
> > > can probably be adjusted to attempt to keep one page in reserve, but I
> > > think the algorithm isn't expecting slabs with no items in it so I'd 
> have
> > > to audit that too.
> > >
> > > If you're up for experiments it'd be interesting to know if setting
> > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes 
> things
> > > better or worse.
> > >
> > > Also, crud.. it's documented as kilobytes but that's not working 
> somehow?
> > > aaahahah. I guess the big EXPERIMENTAL tag scared people off since that
> > > never got reported.
> > >
> > > I'm guessing most people have a mix of small to large items, but you 
> only
> > > have large items and a relatively low memory limit, so this is why 
> you're
> > > seeing it so easily. I think most people setting large items have like
> > > 30G+ of memory so you end up with more spread around.
> > >
> > > Thanks,
> > > -Dormando
> > >
> > > On Wed, 24 Aug 2022, Hayden wrote:
> > >
> > > > What you're saying makes sense, and I'm pretty sure it won't be too 
> hard to add some functionality to my writing code to break my large items 
> up into
> > > > smaller parts that can each fit into a single chunk. That has the 
> added benefit that I won't have to bother increasing the max item size.
> > > > In the meantime, though, I reran my pipeline and captured the output 
> of stats, stats slabs, and stats items both when evicting normally and when 
> getting
> > > > spammed with the error.
> > > >
> > > > First, the output when I'm in the error state:
> > > >  Output of stats
> > > > STAT pid 1
> > > > STAT uptime 11727
> > > > STAT time 1661406229
> > > > STAT version b'1.6.14'
> > > > STAT libevent b'2.1.8-stable'
> > > > STAT pointer_size 64
> > > > STAT rusage_user 2.93837
> > > > STAT rusage_system 6.339015
> > > > STAT max_connections 1024
> > > > STAT curr_connections 2
> > > > STAT total_connections 8230
> > > > STAT rejected_connections 0
> > > > STAT connection_structures 6
> > > > STAT response_obj_oom 0
> > > > STAT response_obj_count 1
> > > > STAT response_obj_bytes 65536
> > > > STAT read_buf_count 8
> > > > STAT read_buf_bytes 131072
> > > > STAT read_buf_bytes_free 49152
> > > > STAT read_buf_oom 0
> > > > STAT reserved_fds 20
> > > > STAT cmd_get 0
> > > > STAT cmd_set 12640
> > > > STAT cmd_flush 0
> > > > STAT cmd_touch 0
> > > > STAT cmd_meta 0
> > > > STAT get_hits 0
> > > > STAT get_misses 0
> > > > STAT get_expired 0
> > > > STAT get_flushed 0
> > > > STAT delete_misses 0
> > > > STAT delete_hits 0
> > > > STAT incr_misses 0
> > > > STAT incr_hits 0
> > > > STAT decr_misses 0
> > > > STAT decr_hits 0
> > > > STAT cas_misses 0
> > > > STAT cas_hits 0
> > > > STAT cas_badval 0
> > > > STAT touch_hits 0
> > > > STAT touch_misses 0
> > > > STAT store_too_large 0
> > > > STAT store_no_memory 0
> > > > STAT auth_cmds 0
> > > > STAT auth_errors 0
> > > > STAT bytes_read 21755739959
> > > > STAT bytes_written 330909
> > > > STAT limit_maxbytes 5368709120
> > > > STAT accepting_conns 1
> > > > STAT listen_disabled_num 0
> > > > STAT time_in_listen_disabled_us 0
> > > > STAT threads 4
> > > > STAT conn_yields 0
> > > > STAT hash_power_level 16
> > > > STAT hash_byt

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
I have an opportunity to put this fix into a release today if anyone wants
to help validate :)

On Thu, 25 Aug 2022, dormando wrote:

> Took another quick look...
>
> Think there's an easy patch that might work:
> https://github.com/memcached/memcached/pull/924
>
> If you wouldn't mind helping validate? An external validator would help me
> get it in time for the next release :)
>
> Thanks,
> -Dormando
>
> On Wed, 24 Aug 2022, dormando wrote:
>
> > Hey,
> >
> > Thanks for the info. Yes; this generally confirms the issue. I see some of
> > your higher slab classes with "free_chunks 0", so if you're setting data
> > that requires these chunks it could error out. The "stats items" confirms
> > this since there are no actual items in those lower slab classes.
> >
> > You're certainly right a workaround of making your items < 512k would also
> > work; but in general if I have features it'd be nice if they worked well
> > :) Please open an issue so we can improve things!
> >
> > I intended to lower the slab_chunk_max default from 512k to much lower, as
> > that actually raises the memory efficiency by a bit (less gap at the
> > higher classes). That may help here. The system should also try ejecting
> > items from the highest LRU... I need to double check that it wasn't
> > already intending to do that and failing.
> >
> > Might also be able to adjust the page mover but not sure. The page mover
> > can probably be adjusted to attempt to keep one page in reserve, but I
> > think the algorithm isn't expecting slabs with no items in it so I'd have
> > to audit that too.
> >
> > If you're up for experiments it'd be interesting to know if setting
> > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes things
> > better or worse.
> >
> > Also, crud.. it's documented as kilobytes but that's not working somehow?
> > aaahahah. I guess the big EXPERIMENTAL tag scared people off since that
> > never got reported.
> >
> > I'm guessing most people have a mix of small to large items, but you only
> > have large items and a relatively low memory limit, so this is why you're
> > seeing it so easily. I think most people setting large items have like
> > 30G+ of memory so you end up with more spread around.
> >
> > Thanks,
> > -Dormando
> >
> > On Wed, 24 Aug 2022, Hayden wrote:
> >
> > > What you're saying makes sense, and I'm pretty sure it won't be too hard 
> > > to add some functionality to my writing code to break my large items up 
> > > into
> > > smaller parts that can each fit into a single chunk. That has the added 
> > > benefit that I won't have to bother increasing the max item size.
> > > In the meantime, though, I reran my pipeline and captured the output of 
> > > stats, stats slabs, and stats items both when evicting normally and when 
> > > getting
> > > spammed with the error.
> > >
> > > First, the output when I'm in the error state:
> > >  Output of stats
> > > STAT pid 1
> > > STAT uptime 11727
> > > STAT time 1661406229
> > > STAT version b'1.6.14'
> > > STAT libevent b'2.1.8-stable'
> > > STAT pointer_size 64
> > > STAT rusage_user 2.93837
> > > STAT rusage_system 6.339015
> > > STAT max_connections 1024
> > > STAT curr_connections 2
> > > STAT total_connections 8230
> > > STAT rejected_connections 0
> > > STAT connection_structures 6
> > > STAT response_obj_oom 0
> > > STAT response_obj_count 1
> > > STAT response_obj_bytes 65536
> > > STAT read_buf_count 8
> > > STAT read_buf_bytes 131072
> > > STAT read_buf_bytes_free 49152
> > > STAT read_buf_oom 0
> > > STAT reserved_fds 20
> > > STAT cmd_get 0
> > > STAT cmd_set 12640
> > > STAT cmd_flush 0
> > > STAT cmd_touch 0
> > > STAT cmd_meta 0
> > > STAT get_hits 0
> > > STAT get_misses 0
> > > STAT get_expired 0
> > > STAT get_flushed 0
> > > STAT delete_misses 0
> > > STAT delete_hits 0
> > > STAT incr_misses 0
> > > STAT incr_hits 0
> > > STAT decr_misses 0
> > > STAT decr_hits 0
> > > STAT cas_misses 0
> > > STAT cas_hits 0
> > > STAT cas_badval 0
> > > STAT touch_hits 0
> > > STAT touch_misses 0
> > > STAT store_too_large 0
> > > STAT store_no_memory 0
> > > STAT auth_cmds 0
> > > STAT auth_errors 0
> > > STAT bytes_read 21755739959
> > > STAT bytes_written 330909
> > > STAT limit_maxbytes 5368709120
> > > STAT accepting_conns 1
> > > STAT listen_disabled_num 0
> > > STAT time_in_listen_disabled_us 0
> > > STAT threads 4
> > > STAT conn_yields 0
> > > STAT hash_power_level 16
> > > STAT hash_bytes 524288
> > > STAT hash_is_expanding False
> > > STAT slab_reassign_rescues 0
> > > STAT slab_reassign_chunk_rescues 0
> > > STAT slab_reassign_evictions_nomem 0
> > > STAT slab_reassign_inline_reclaim 0
> > > STAT slab_reassign_busy_items 0
> > > STAT slab_reassign_busy_deletes 0
> > > STAT slab_reassign_running False
> > > STAT slabs_moved 0
> > > STAT lru_crawler_running 0
> > > STAT lru_crawler_starts 20
> > > STAT lru_maintainer_juggles 71777
> > > STAT malloc_fails 0
> > > STAT log_worker_dropped 0
> > > STAT log_worker_written 0
> > >