Re: Crashes/hung tasks with z3pool under memory pressure
Hi Vitaly,

On Thu, Apr 19, 2018 at 08:18:51AM +0000, Vitaly Wool wrote:
> Hi Guenter,
>
> Den ons 18 apr. 2018 kl 18:07 skrev Guenter Roeck:
>
> > On Wed, Apr 18, 2018 at 10:13:17AM +0200, Vitaly Wool wrote:
> > > Den tis 17 apr. 2018 kl 18:35 skrev Guenter Roeck:
> > > >
> > > > Getting better; the log is much less noisy. Unfortunately, there are still
> > > > locking problems, resulting in a hung task. I copied the log message to [1].
> > > > This is with [2] applied on top of v4.17-rc1.
> > >
> > > Now this version (this is a full patch to be applied instead of the
> > > previous one) should have the above problem resolved too:
> > >
> >
> > Excellent - I can not reproduce the problem with this patch applied.
> >
> thank you very much for testing and prompt feedback. I'll come up shortly
> with the final patch.
>
Any updates?

Thanks,
Guenter
Re: Crashes/hung tasks with z3pool under memory pressure
On Wed, Apr 18, 2018 at 10:13:17AM +0200, Vitaly Wool wrote:
> Den tis 17 apr. 2018 kl 18:35 skrev Guenter Roeck:
>
> > Getting better; the log is much less noisy. Unfortunately, there are still
> > locking problems, resulting in a hung task. I copied the log message to [1].
> > This is with [2] applied on top of v4.17-rc1.
>
> Now this version (this is a full patch to be applied instead of the previous
> one) should have the above problem resolved too:
>

Excellent - I can not reproduce the problem with this patch applied.

Guenter

> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..901c0b07cbda 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -144,7 +144,8 @@ enum z3fold_page_flags {
>      PAGE_HEADLESS = 0,
>      MIDDLE_CHUNK_MAPPED,
>      NEEDS_COMPACTING,
> -    PAGE_STALE
> +    PAGE_STALE,
> +    UNDER_RECLAIM
>  };
>
>  /*
> @@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
>      clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
>      clear_bit(NEEDS_COMPACTING, &page->private);
>      clear_bit(PAGE_STALE, &page->private);
> +    clear_bit(UNDER_RECLAIM, &page->private);
>
>      spin_lock_init(&zhdr->page_lock);
>      kref_init(&zhdr->refcount);
> @@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
>          atomic64_dec(&pool->pages_nr);
>          return;
>      }
> +    if (test_bit(UNDER_RECLAIM, &page->private)) {
> +        z3fold_page_unlock(zhdr);
> +        return;
> +    }
>      if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
>          z3fold_page_unlock(zhdr);
>          return;
> @@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>              kref_get(&zhdr->refcount);
>              list_del_init(&zhdr->buddy);
>              zhdr->cpu = -1;
> +            set_bit(UNDER_RECLAIM, &page->private);
> +            break;
>          }
>
>          list_del_init(&page->lru);
> @@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>              goto next;
>          }
>  next:
> -        spin_lock(&pool->lock);
>          if (test_bit(PAGE_HEADLESS, &page->private)) {
>              if (ret == 0) {
> -                spin_unlock(&pool->lock);
>                  free_z3fold_page(page);
>                  return 0;
>              }
> -        } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> -            atomic64_dec(&pool->pages_nr);
> +            spin_lock(&pool->lock);
> +            list_add(&page->lru, &pool->lru);
> +            spin_unlock(&pool->lock);
> +        } else {
> +            z3fold_page_lock(zhdr);
> +            clear_bit(UNDER_RECLAIM, &page->private);
> +            if (kref_put(&zhdr->refcount,
> +                    release_z3fold_page_locked)) {
> +                atomic64_dec(&pool->pages_nr);
> +                return 0;
> +            }
> +            /*
> +             * if we are here, the page is still not completely
> +             * free. Take the global pool lock then to be able

extra then ?

> +             * to add it back to the lru list
> +             */
> +            spin_lock(&pool->lock);
> +            list_add(&page->lru, &pool->lru);
>              spin_unlock(&pool->lock);
> -            return 0;
> +            z3fold_page_unlock(zhdr);
>          }
>
> -        /*
> -         * Add to the beginning of LRU.
> -         * Pool lock has to be kept here to ensure the page has
> -         * not already been released
> -         */
> -        list_add(&page->lru, &pool->lru);
> +        /* We started off locked to we need to lock the pool back */
> +        spin_lock(&pool->lock);
>      }
>      spin_unlock(&pool->lock);
>      return -EAGAIN;
Re: Crashes/hung tasks with z3pool under memory pressure
Den tis 17 apr. 2018 kl 18:35 skrev Guenter Roeck:

> Getting better; the log is much less noisy. Unfortunately, there are still
> locking problems, resulting in a hung task. I copied the log message to [1].
> This is with [2] applied on top of v4.17-rc1.

Now this version (this is a full patch to be applied instead of the previous
one) should have the above problem resolved too:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..901c0b07cbda 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
     PAGE_HEADLESS = 0,
     MIDDLE_CHUNK_MAPPED,
     NEEDS_COMPACTING,
-    PAGE_STALE
+    PAGE_STALE,
+    UNDER_RECLAIM
 };

 /*
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
     clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
     clear_bit(NEEDS_COMPACTING, &page->private);
     clear_bit(PAGE_STALE, &page->private);
+    clear_bit(UNDER_RECLAIM, &page->private);

     spin_lock_init(&zhdr->page_lock);
     kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
         atomic64_dec(&pool->pages_nr);
         return;
     }
+    if (test_bit(UNDER_RECLAIM, &page->private)) {
+        z3fold_page_unlock(zhdr);
+        return;
+    }
     if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
         z3fold_page_unlock(zhdr);
         return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             kref_get(&zhdr->refcount);
             list_del_init(&zhdr->buddy);
             zhdr->cpu = -1;
+            set_bit(UNDER_RECLAIM, &page->private);
+            break;
         }

         list_del_init(&page->lru);
@@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             goto next;
         }
 next:
-        spin_lock(&pool->lock);
         if (test_bit(PAGE_HEADLESS, &page->private)) {
             if (ret == 0) {
-                spin_unlock(&pool->lock);
                 free_z3fold_page(page);
                 return 0;
             }
-        } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-            atomic64_dec(&pool->pages_nr);
+            spin_lock(&pool->lock);
+            list_add(&page->lru, &pool->lru);
+            spin_unlock(&pool->lock);
+        } else {
+            z3fold_page_lock(zhdr);
+            clear_bit(UNDER_RECLAIM, &page->private);
+            if (kref_put(&zhdr->refcount,
+                    release_z3fold_page_locked)) {
+                atomic64_dec(&pool->pages_nr);
+                return 0;
+            }
+            /*
+             * if we are here, the page is still not completely
+             * free. Take the global pool lock then to be able
+             * to add it back to the lru list
+             */
+            spin_lock(&pool->lock);
+            list_add(&page->lru, &pool->lru);
             spin_unlock(&pool->lock);
-            return 0;
+            z3fold_page_unlock(zhdr);
         }

-        /*
-         * Add to the beginning of LRU.
-         * Pool lock has to be kept here to ensure the page has
-         * not already been released
-         */
-        list_add(&page->lru, &pool->lru);
+        /* We started off locked to we need to lock the pool back */
+        spin_lock(&pool->lock);
     }
     spin_unlock(&pool->lock);
     return -EAGAIN;
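For readers who do not live in mm/z3fold.c: as I read it, the net effect of the rework above is a fairly standard discipline of holding the pool (LRU) lock only around list manipulation, doing the per-page work under that page's own lock, and re-taking the pool lock before the next loop iteration. Below is a rough userspace sketch of that shape only; the names (pool_lock, struct item, reclaim_one) are invented for illustration and are not the kernel API.

/* Hedged sketch: "pick under the list lock, work under the item lock,
 * re-take the list lock" as in the reworked next: path. Not kernel code. */
#include <pthread.h>
#include <stdio.h>

struct item {
    pthread_mutex_t lock;   /* plays the role of zhdr->page_lock */
    struct item *next;
    int busy;               /* plays the role of UNDER_RECLAIM */
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER; /* pool->lock */
static struct item *lru;    /* head of a toy LRU list (empty in this demo) */

static int reclaim_one(void)
{
    struct item *victim = NULL;

    pthread_mutex_lock(&pool_lock);
    for (struct item *it = lru; it; it = it->next) {
        if (pthread_mutex_trylock(&it->lock) == 0) {
            victim = it;
            victim->busy = 1;
            break;                  /* stop after the first locked victim */
        }
    }
    /* drop the list lock before doing any real work on the victim */
    pthread_mutex_unlock(&pool_lock);

    if (!victim)
        return -1;

    /* ... evict/write back the victim here, list lock NOT held ... */

    victim->busy = 0;
    pthread_mutex_unlock(&victim->lock);

    /* re-take the list lock only for list manipulation */
    pthread_mutex_lock(&pool_lock);
    /* ... put the item back on the LRU if it is still in use ... */
    pthread_mutex_unlock(&pool_lock);
    return 0;
}

int main(void)
{
    printf("reclaim_one: %d\n", reclaim_one());
    return 0;
}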
Re: Crashes/hung tasks with z3pool under memory pressure
On Tue, Apr 17, 2018 at 04:00:32PM +0200, Vitaly Wool wrote:
> Hi Guenter,
>
> > [ ... ]
> > > Ugh. Could you please keep that patch and apply this on top:
> > >
> > > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > > index c0bca6153b95..e8a80d044d9e 100644
> > > --- a/mm/z3fold.c
> > > +++ b/mm/z3fold.c
> > > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> > >              kref_get(&zhdr->refcount);
> > >              list_del_init(&zhdr->buddy);
> > >              zhdr->cpu = -1;
> > > +            break;
> > >          }
> > >          list_del_init(&page->lru);
> > >
> > Much better, in a way. The system now takes much longer to crash,
> > and the crash reason is a bit different. The log is too long to attach,
> > so I copied it to [1].
> >
> > crashdump.0002                  Latest log
> > 000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.
>
> thanks for the update. Let's start from a clean sheet. I believe this patch
> has to be applied anyway so could you please check if it solves the problem.
>
Getting better; the log is much less noisy. Unfortunately, there are still
locking problems, resulting in a hung task. I copied the log message to [1].
This is with [2] applied on top of v4.17-rc1.

Guenter

---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump.0003
[2] http://server.roeck-us.net/qemu/z3pool/0001-Fix-attempt-3.patch
Re: Crashes/hung tasks with z3pool under memory pressure
Hi Guenter,

> [ ... ]
> > Ugh. Could you please keep that patch and apply this on top:
> >
> > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > index c0bca6153b95..e8a80d044d9e 100644
> > --- a/mm/z3fold.c
> > +++ b/mm/z3fold.c
> > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> >              kref_get(&zhdr->refcount);
> >              list_del_init(&zhdr->buddy);
> >              zhdr->cpu = -1;
> > +            break;
> >          }
> >          list_del_init(&page->lru);
> >
> Much better, in a way. The system now takes much longer to crash,
> and the crash reason is a bit different. The log is too long to attach,
> so I copied it to [1].
>
> crashdump.0002                  Latest log
> 000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.

thanks for the update. Let's start from a clean sheet. I believe this patch
has to be applied anyway so could you please check if it solves the problem.

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..059fb3d5ca86 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
     PAGE_HEADLESS = 0,
     MIDDLE_CHUNK_MAPPED,
     NEEDS_COMPACTING,
-    PAGE_STALE
+    PAGE_STALE,
+    UNDER_RECLAIM
 };

 /*
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
     clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
     clear_bit(NEEDS_COMPACTING, &page->private);
     clear_bit(PAGE_STALE, &page->private);
+    clear_bit(UNDER_RECLAIM, &page->private);

     spin_lock_init(&zhdr->page_lock);
     kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
         atomic64_dec(&pool->pages_nr);
         return;
     }
+    if (test_bit(UNDER_RECLAIM, &page->private)) {
+        z3fold_page_unlock(zhdr);
+        return;
+    }
     if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
         z3fold_page_unlock(zhdr);
         return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             kref_get(&zhdr->refcount);
             list_del_init(&zhdr->buddy);
             zhdr->cpu = -1;
+            set_bit(UNDER_RECLAIM, &page->private);
+            break;
         }

         list_del_init(&page->lru);
@@ -888,6 +896,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
         }
 next:
         spin_lock(&pool->lock);
+        clear_bit(UNDER_RECLAIM, &page->private);
         if (test_bit(PAGE_HEADLESS, &page->private)) {
             if (ret == 0) {
                 spin_unlock(&pool->lock);
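As I read this first version, the new UNDER_RECLAIM bit is there so that a concurrent z3fold_free() neither frees the page out from under the reclaimer (reclaim takes its own kref before evicting) nor starts further work such as compaction on a page that reclaim has already pulled off the lists; the last reference is then dropped on the reclaim side. The following is a minimal userspace model of that scheme only, with invented names (struct obj, obj_free, obj_reclaim) and a plain mutex plus integer standing in for the kernel's page lock and kref; it is not the kernel logic itself.

/* Hedged model: a "free" that backs off while reclaim owns the object,
 * leaving the final release to the reclaimer. Invented names throughout. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
    pthread_mutex_t lock;   /* stands in for zhdr->page_lock */
    int refcount;           /* stands in for zhdr->refcount */
    bool under_reclaim;     /* stands in for the UNDER_RECLAIM page flag */
};

static void release(struct obj *o)
{
    /* called with o->lock held; in the real code this frees the page,
     * so there is nothing left to unlock afterwards */
    (void)o;
    printf("final release\n");
}

static void obj_free(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    if (--o->refcount == 0) {      /* cannot happen while reclaim holds a ref */
        release(o);
        return;
    }
    if (o->under_reclaim) {
        /* reclaim owns the object: do not start any extra work on it */
        pthread_mutex_unlock(&o->lock);
        return;
    }
    /* ... normal path would schedule compaction here ... */
    pthread_mutex_unlock(&o->lock);
}

static void obj_reclaim(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    o->refcount++;                 /* like kref_get() before eviction */
    o->under_reclaim = true;
    pthread_mutex_unlock(&o->lock);

    /* ... evict the contents without holding the lock ... */

    pthread_mutex_lock(&o->lock);
    o->under_reclaim = false;
    if (--o->refcount == 0) {      /* like kref_put(..., release_..._locked) */
        release(o);
        return;
    }
    pthread_mutex_unlock(&o->lock);
}

int main(void)
{
    struct obj o = { PTHREAD_MUTEX_INITIALIZER, 1, false };

    obj_reclaim(&o);   /* reclaim takes and drops its own reference */
    obj_free(&o);      /* the owner's free now drops the last one */
    return 0;
}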
Re: Crashes/hung tasks with z3pool under memory pressure
On Tue, Apr 17, 2018 at 12:14:37AM +0200, Vitaly Wool wrote:

[ ... ]

> Ugh. Could you please keep that patch and apply this on top:
>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..e8a80d044d9e 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>              kref_get(&zhdr->refcount);
>              list_del_init(&zhdr->buddy);
>              zhdr->cpu = -1;
> +            break;
>          }
>          list_del_init(&page->lru);
>

Much better, in a way. The system now takes much longer to crash,
and the crash reason is a bit different. The log is too long to attach,
so I copied it to [1].

crashdump.0002                  Latest log
000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.

Hope it helps,
Guenter

[1] http://server.roeck-us.net/qemu/z3pool/
Re: Crashes/hung tasks with z3pool under memory pressure
On 4/16/18 5:58 PM, Guenter Roeck wrote:

On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:

Hey Guenter,

On 04/13/2018 07:56 PM, Guenter Roeck wrote:

On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:

On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck wrote:

On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:

Hi Guenter,

Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:

Hi all,
we are observing crashes with z3pool under memory pressure. The kernel version
used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
also seen with v4.14 based kernels.

just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?

As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.

Bah. Sorry. Expect an update after the weekend.

NP; easy to miss. Thanks a lot for looking into it.

I wonder if the following patch would make a difference:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..5e547c2d5832 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             goto next;
         }
 next:
-        spin_lock(&pool->lock);
         if (test_bit(PAGE_HEADLESS, &page->private)) {
             if (ret == 0) {
-                spin_unlock(&pool->lock);
                 free_z3fold_page(page);
                 return 0;
             }
-        } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-            atomic64_dec(&pool->pages_nr);
-            spin_unlock(&pool->lock);
-            return 0;
+        } else {
+            spin_lock(&zhdr->page_lock);
+            if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
+                atomic64_dec(&pool->pages_nr);
+                return 0;
+            }
+            spin_unlock(&zhdr->page_lock);
         }
+        spin_lock(&pool->lock);
         /*
          * Add to the beginning of LRU.
          * Pool lock has to be kept here to ensure the page has

No, it doesn't. Same crash.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by kswapd0/51:
 #0: 4d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #1: 7739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #2: ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 4cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 __lock_acquire+0x429/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? lock_acquire+0x93/0x230
 lock_acquire+0x93/0x230
 ? z3fold_zpool_shrink+0xb7/0x3e0
 _raw_spin_trylock+0x65/0x80
 ? z3fold_zpool_shrink+0xb7/0x3e0
 ? z3fold_zpool_shrink+0x47/0x3e0
 z3fold_zpool_shrink+0xb7/0x3e0
 zswap_frontswap_store+0x180/0x7c0
...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
INFO: lockdep is turned off.
Preemption disabled at:
[<>] (null)
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 ___might_sleep+0x16c/0x250
 __alloc_pages_nodemask+0x1e7/0x1490
 ? lock_acquire+0x93/0x230
 ? lock_acquire+0x93/0x230
 __read_swap_cache_async+0x14d/0x260
 zswap_writeback_entry+0xdb/0x340
 z3fold_zpool_shrink+0x2b1/0x3e0
 zswap_frontswap_store+0x180/0x7c0
 ? page_vma_mapped_walk+0x22/0x230
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x49/0x70
...

This is with your patch applied on top of v4.17-rc1.

Guenter

Ugh. Could you please keep that patch and apply this on top:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..e8a80d044d9e 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             kref_get(&zhdr->refcount);
             list_del_init(&zhdr->buddy);
             zhdr->cpu = -1;
+            break;
         }
         list_del_init(&page->lru);

Thanks,
Vitaly
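The MAX_LOCK_DEPTH splat quoted above fits this one-liner: without the break, the LRU scan keeps trylocking page after page while still holding every page lock it already took, so one pass can sit on dozens of locks at once. The following is a small userspace model of that difference only; the names are invented and pthread mutexes merely stand in for the page locks.

/* Hedged model of the lock pile-up fixed by the one-line "break". */
#include <pthread.h>
#include <stdio.h>

#define NPAGES 48

static pthread_mutex_t page_lock[NPAGES];

static int scan_lru(int stop_at_first)
{
    int held = 0;

    for (int i = 0; i < NPAGES; i++) {
        if (pthread_mutex_trylock(&page_lock[i]) != 0)
            continue;              /* someone else holds it, skip */
        held++;
        if (stop_at_first)
            break;                 /* the fix: one victim per pass */
        /* bug: loop keeps going with page_lock[i] still held */
    }
    return held;
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++)
        pthread_mutex_init(&page_lock[i], NULL);

    printf("locks held without break: %d\n", scan_lru(0));   /* 48 */

    for (int i = 0; i < NPAGES; i++)
        pthread_mutex_unlock(&page_lock[i]);

    printf("locks held with break:    %d\n", scan_lru(1));   /* 1 */
    return 0;
}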
Re: Crashes/hung tasks with z3pool under memory pressure
On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:
> Hey Guenter,
>
> On 04/13/2018 07:56 PM, Guenter Roeck wrote:
>
> > On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:
> > > On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck wrote:
> > >
> > > > On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
> > > > > Hi Guenter,
> > > > >
> > > > > Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:
> > > > >
> > > > > > Hi all,
> > > > > > we are observing crashes with z3pool under memory pressure. The kernel version
> > > > > > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> > > > > > also seen with v4.14 based kernels.
> > > > >
> > > > > just before I dig into this, could you please try reproducing the errors
> > > > > you see with https://patchwork.kernel.org/patch/10210459/ applied?
> > > > >
> > > > As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
> > > > includes this patch.
> > > >
> > > Bah. Sorry. Expect an update after the weekend.
> > >
> > NP; easy to miss. Thanks a lot for looking into it.
>
> I wonder if the following patch would make a difference:
>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..5e547c2d5832 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>              goto next;
>          }
>  next:
> -        spin_lock(&pool->lock);
>          if (test_bit(PAGE_HEADLESS, &page->private)) {
>              if (ret == 0) {
> -                spin_unlock(&pool->lock);
>                  free_z3fold_page(page);
>                  return 0;
>              }
> -        } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> -            atomic64_dec(&pool->pages_nr);
> -            spin_unlock(&pool->lock);
> -            return 0;
> +        } else {
> +            spin_lock(&zhdr->page_lock);
> +            if (kref_put(&zhdr->refcount,
> +                    release_z3fold_page_locked)) {
> +                atomic64_dec(&pool->pages_nr);
> +                return 0;
> +            }
> +            spin_unlock(&zhdr->page_lock);
>          }
> +        spin_lock(&pool->lock);
>          /*
>           * Add to the beginning of LRU.
>           * Pool lock has to be kept here to ensure the page has
>

No, it doesn't. Same crash.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by kswapd0/51:
 #0: 4d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #1: 7739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #2: ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 4cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 __lock_acquire+0x429/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? lock_acquire+0x93/0x230
 lock_acquire+0x93/0x230
 ? z3fold_zpool_shrink+0xb7/0x3e0
 _raw_spin_trylock+0x65/0x80
 ? z3fold_zpool_shrink+0xb7/0x3e0
 ? z3fold_zpool_shrink+0x47/0x3e0
 z3fold_zpool_shrink+0xb7/0x3e0
 zswap_frontswap_store+0x180/0x7c0
...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
INFO: lockdep is turned off.
Preemption disabled at:
[<>] (null)
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 ___might_sleep+0x16c/0x250
 __alloc_pages_nodemask+0x1e7/0x1490
 ? lock_acquire+0x93/0x230
 ? lock_acquire+0x93/0x230
 __read_swap_cache_async+0x14d/0x260
 zswap_writeback_entry+0xdb/0x340
 z3fold_zpool_shrink+0x2b1/0x3e0
 zswap_frontswap_store+0x180/0x7c0
 ? page_vma_mapped_walk+0x22/0x230
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x49/0x70
...

This is with your patch applied on top of v4.17-rc1.

Guenter
Re: Crashes/hung tasks with z3pool under memory pressure
Hey Guenter,

On 04/13/2018 07:56 PM, Guenter Roeck wrote:

On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:

On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck wrote:

On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:

Hi Guenter,

Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:

Hi all,
we are observing crashes with z3pool under memory pressure. The kernel version
used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
also seen with v4.14 based kernels.

just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?

As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.

Bah. Sorry. Expect an update after the weekend.

NP; easy to miss. Thanks a lot for looking into it.

I wonder if the following patch would make a difference:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..5e547c2d5832 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
             goto next;
         }
 next:
-        spin_lock(&pool->lock);
         if (test_bit(PAGE_HEADLESS, &page->private)) {
             if (ret == 0) {
-                spin_unlock(&pool->lock);
                 free_z3fold_page(page);
                 return 0;
             }
-        } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-            atomic64_dec(&pool->pages_nr);
-            spin_unlock(&pool->lock);
-            return 0;
+        } else {
+            spin_lock(&zhdr->page_lock);
+            if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
+                atomic64_dec(&pool->pages_nr);
+                return 0;
+            }
+            spin_unlock(&zhdr->page_lock);
         }
+        spin_lock(&pool->lock);
         /*
          * Add to the beginning of LRU.
          * Pool lock has to be kept here to ensure the page has

Thanks,
Vitaly
Re: Crashes/hung tasks with z3pool under memory pressure
On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:
> On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck wrote:
>
> > On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
> > > Hi Guenter,
> > >
> > > Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:
> > >
> > > > Hi all,
> > > > we are observing crashes with z3pool under memory pressure. The kernel version
> > > > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> > > > also seen with v4.14 based kernels.
> > >
> > > just before I dig into this, could you please try reproducing the errors
> > > you see with https://patchwork.kernel.org/patch/10210459/ applied?
> > >
> > As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
> > includes this patch.
> >
> Bah. Sorry. Expect an update after the weekend.
>
NP; easy to miss. Thanks a lot for looking into it.

Guenter
Re: Crashes/hung tasks with z3pool under memory pressure
On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
> Hi Guenter,
>
> Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:
>
> > Hi all,
> > we are observing crashes with z3pool under memory pressure. The kernel version
> > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> > also seen with v4.14 based kernels.
>
> just before I dig into this, could you please try reproducing the errors
> you see with https://patchwork.kernel.org/patch/10210459/ applied?
>
As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.

$ git log --oneline v4.14..5d1365940a68 mm/z3fold.c
8a97ea546bb6 mm/z3fold.c: use gfpflags_allow_blocking
1ec6995d1290 z3fold: fix memory leak
5c9bab592f53 z3fold: limit use of stale list for allocation
f144c390f905 mm: docs: fix parameter names mismatch
5d03a6613957 mm/z3fold.c: use kref to prevent page free/compact race

Guenter
Re: Crashes/hung tasks with z3pool under memory pressure
Hi Guenter,

Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck:

> Hi all,
> we are observing crashes with z3pool under memory pressure. The kernel version
> used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> also seen with v4.14 based kernels.

just before I dig into this, could you please try reproducing the errors you
see with https://patchwork.kernel.org/patch/10210459/ applied?

Thanks,
Vitaly

> For simplicity, here is a set of shortened logs. A more complete log is
> available at [1].
>
> [ cut here ]
> DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
> WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212 preempt_count_add+0x90/0xa0
> Modules linked in:
> CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:preempt_count_add+0x90/0xa0
> RSP: :b12740db7750 EFLAGS: 00010286
> RAX: RBX: 0001 RCX: 00f6
> RDX: 00f6 RSI: 0082 RDI:
> RBP: f00480f357a0 R08: 004a R09: 01ad
> R10: b12740db77e0 R11: R12: 9cbc7e265d10
> R13: 9cbc7cd5e000 R14: 9cbc7a7000d8 R15: f00480f35780
> FS: 7f5140791700() GS:9cbc7fd0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 7f513260f000 CR3: 32086000 CR4: 06e0
> Call Trace:
>  _raw_spin_trylock+0x13/0x30
>  z3fold_zpool_shrink+0xab/0x3a0
>  zswap_frontswap_store+0x10b/0x610
> ...
> WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278 release_z3fold_page_locked+0x25/0x40
> Modules linked in:
> ...
> INFO: rcu_preempt self-detected stall on CPU
> 2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906 softirq=4104/4113 fqs=11
> ...
> RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
> RSP: :b12740db7750 EFLAGS: 0202 ORIG_RAX: ff13
> RAX: 00100101 RBX: 9cbc7a7000c8 RCX: 0001
> RDX: 0101 RSI: 0001 RDI: 0101
> RBP: R08: 9cbc7fc21240 R09: af19c900
> R10: b12740db75a0 R11: 0010 R12: f00480522d20
> R13: 9cbc548b4000 R14: 9cbc7a7000d8 R15: f00480522d00
>  ? __zswap_pool_current+0x80/0x90
>  z3fold_zpool_shrink+0x1d3/0x3a0
>  zswap_frontswap_store+0x10b/0x610
> ...
> With lock debugging enabled, the log is a bit different, but similar.
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48 max: 48!
> 48 locks held by memory-eater/619:
>  #0: 2da807ce (&mm->mmap_sem){}, at: __do_page_fault+0x122/0x5a0
>  #1: 12fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
>  #2: c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
>  #3: 876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
> ...
> watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
> Modules linked in:
> irq event stamp: 1435394
> hardirqs last enabled at (1435393): [] _raw_spin_unlock_irqrestore+0x51/0x60
> hardirqs last disabled at (1435394): [] __schedule+0xba/0xbb0
> softirqs last enabled at (1434508): [] __do_softirq+0x27c/0x516
> softirqs last disabled at (1434323): [] irq_exit+0xa9/0xc0
> CPU: 0 PID: 613 Comm: memory-eater Tainted: G W 4.16.0-yocto-standard+ #9
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
> RSP: :a61f80e074e0 EFLAGS: 0246 ORIG_RAX: ff13
> RAX: RBX: 9704379cba08 RCX: 97043fc22080
> RDX: 0001 RSI: 9c05f6a0 RDI: 0004
> RBP: R08: 9b1f2215 R09:
> R10: a61f80e07490 R11: 9704379cba20 R12: 97043fc22080
> R13: de77404e0920 R14: 970413824000 R15: 9704379cba00
> FS: 7f8c6317f700() GS:97043fc0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 7f8c43f3d010 CR3: 3aba CR4: 06f0
> Call Trace:
>  do_raw_spin_lock+0xad/0xb0
>  z3fold_zpool_malloc+0x595/0x790
> ...
> The problem is easy to reproduce. Please see [2] and the other files
> at [3] for details. Various additional crash logs, observed with
> chromeos-4.14, are available at [4].
> Please let me know if there is anything else I can do to help solving
> or debugging the problem. I had a look into the code, but I must admit
> that its locking is a mystery to me.
> Thanks,
> Guenter
> ---
> [1] http://server.roeck-us.net/qemu/z3pool/crashdump
> [2] http://server.roeck-us.net/qemu/z3pool/README
> [3] http://server.roeck-us.net/qemu/z3pool/
> [4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360
Crashes/hung tasks with z3pool under memory pressure
Hi all,

we are observing crashes with z3pool under memory pressure. The kernel version
used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
also seen with v4.14 based kernels.

For simplicity, here is a set of shortened logs. A more complete log is
available at [1].

[ cut here ]
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212 preempt_count_add+0x90/0xa0
Modules linked in:
CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:preempt_count_add+0x90/0xa0
RSP: :b12740db7750 EFLAGS: 00010286
RAX: RBX: 0001 RCX: 00f6
RDX: 00f6 RSI: 0082 RDI:
RBP: f00480f357a0 R08: 004a R09: 01ad
R10: b12740db77e0 R11: R12: 9cbc7e265d10
R13: 9cbc7cd5e000 R14: 9cbc7a7000d8 R15: f00480f35780
FS: 7f5140791700() GS:9cbc7fd0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f513260f000 CR3: 32086000 CR4: 06e0
Call Trace:
 _raw_spin_trylock+0x13/0x30
 z3fold_zpool_shrink+0xab/0x3a0
 zswap_frontswap_store+0x10b/0x610
...
WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278 release_z3fold_page_locked+0x25/0x40
Modules linked in:
...
INFO: rcu_preempt self-detected stall on CPU
2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906 softirq=4104/4113 fqs=11
...
RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
RSP: :b12740db7750 EFLAGS: 0202 ORIG_RAX: ff13
RAX: 00100101 RBX: 9cbc7a7000c8 RCX: 0001
RDX: 0101 RSI: 0001 RDI: 0101
RBP: R08: 9cbc7fc21240 R09: af19c900
R10: b12740db75a0 R11: 0010 R12: f00480522d20
R13: 9cbc548b4000 R14: 9cbc7a7000d8 R15: f00480522d00
 ? __zswap_pool_current+0x80/0x90
 z3fold_zpool_shrink+0x1d3/0x3a0
 zswap_frontswap_store+0x10b/0x610
...

With lock debugging enabled, the log is a bit different, but similar.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by memory-eater/619:
 #0: 2da807ce (&mm->mmap_sem){}, at: __do_page_fault+0x122/0x5a0
 #1: 12fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #2: c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
Modules linked in:
irq event stamp: 1435394
hardirqs last enabled at (1435393): [] _raw_spin_unlock_irqrestore+0x51/0x60
hardirqs last disabled at (1435394): [] __schedule+0xba/0xbb0
softirqs last enabled at (1434508): [] __do_softirq+0x27c/0x516
softirqs last disabled at (1434323): [] irq_exit+0xa9/0xc0
CPU: 0 PID: 613 Comm: memory-eater Tainted: G W 4.16.0-yocto-standard+ #9
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
RSP: :a61f80e074e0 EFLAGS: 0246 ORIG_RAX: ff13
RAX: RBX: 9704379cba08 RCX: 97043fc22080
RDX: 0001 RSI: 9c05f6a0 RDI: 0004
RBP: R08: 9b1f2215 R09:
R10: a61f80e07490 R11: 9704379cba20 R12: 97043fc22080
R13: de77404e0920 R14: 970413824000 R15: 9704379cba00
FS: 7f8c6317f700() GS:97043fc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f8c43f3d010 CR3: 3aba CR4: 06f0
Call Trace:
 do_raw_spin_lock+0xad/0xb0
 z3fold_zpool_malloc+0x595/0x790
...

The problem is easy to reproduce. Please see [2] and the other files
at [3] for details. Various additional crash logs, observed with
chromeos-4.14, are available at [4].

Please let me know if there is anything else I can do to help solving
or debugging the problem. I had a look into the code, but I must admit
that its locking is a mystery to me.

Thanks,
Guenter

---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump
[2] http://server.roeck-us.net/qemu/z3pool/README
[3] http://server.roeck-us.net/qemu/z3pool/
[4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360
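For anyone who wants to retry this without digging through [2] first: the load generator that shows up as "memory-eater" in the logs is presumably something along the lines of the sketch below, i.e. allocate more anonymous memory than the (zswap plus z3fold configured) VM has and keep dirtying it so the kernel is forced to push pages out through zswap and z3fold reclaim. This is only a hedged illustration; the actual reproducer, kernel config, and VM setup are the ones documented in [2] and [3] and may differ.

/* Hedged sketch of a "memory-eater" style load. It takes an optional
 * argument: how many MiB to allocate (default 1024), and then loops
 * forever re-dirtying one byte per page so the pages keep cycling
 * through swap under memory pressure. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    size_t mb = argc > 1 ? strtoul(argv[1], NULL, 0) : 1024;
    size_t size = mb << 20;
    char *mem = malloc(size);

    if (!mem) {
        perror("malloc");
        return 1;
    }

    /* keep rewriting the buffer so pages stay dirty */
    for (;;)
        for (size_t off = 0; off < size; off += 4096)
            mem[off] = (char)off;

    return 0;
}

Run several instances in parallel (sized so their total exceeds the VM's RAM) to keep the zswap writeback path busy.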