Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-27 Thread Guenter Roeck
Hi Vitaly,

On Thu, Apr 19, 2018 at 08:18:51AM +, Vitaly Wool wrote:
> Hi Guenter,
> 
> On Wed, 18 Apr 2018 at 18:07, Guenter Roeck wrote:
> 
> > On Wed, Apr 18, 2018 at 10:13:17AM +0200, Vitaly Wool wrote:
> > > On Tue, 17 Apr 2018 at 18:35, Guenter Roeck wrote:
> > >
> > > 
> > >
> > > > Getting better; the log is much less noisy. Unfortunately, there are
> > > > still locking problems, resulting in a hung task. I copied the log
> > > > message to [1]. This is with [2] applied on top of v4.17-rc1.
> > >
> > > Now this version (this is a full patch to be applied instead of the
> > > previous one) should have the above problem resolved too:
> > >
> >
> > Excellent - I can not reproduce the problem with this patch
> > applied.
> >
> thank you very much for testing and prompt feedback. I'll come up shortly
> with the final patch.
> 

Any updates ?

Thanks,
Guenter


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-18 Thread Guenter Roeck
On Wed, Apr 18, 2018 at 10:13:17AM +0200, Vitaly Wool wrote:
> On Tue, 17 Apr 2018 at 18:35, Guenter Roeck wrote:
> 
> 
> 
> > Getting better; the log is much less noisy. Unfortunately, there are still
> > locking problems, resulting in a hung task. I copied the log message to [1].
> > This is with [2] applied on top of v4.17-rc1.
> 
> Now this version (this is a full patch to be applied instead of the previous 
> one) should have the above problem resolved too:
> 

Excellent - I can not reproduce the problem with this patch
applied.

Guenter

> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..901c0b07cbda 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -144,7 +144,8 @@ enum z3fold_page_flags {
>  	PAGE_HEADLESS = 0,
>  	MIDDLE_CHUNK_MAPPED,
>  	NEEDS_COMPACTING,
> -	PAGE_STALE
> +	PAGE_STALE,
> +	UNDER_RECLAIM
>  };
>  
>  /*
> @@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
>  	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
>  	clear_bit(NEEDS_COMPACTING, &page->private);
>  	clear_bit(PAGE_STALE, &page->private);
> +	clear_bit(UNDER_RECLAIM, &page->private);
>  
>  	spin_lock_init(&zhdr->page_lock);
>  	kref_init(&zhdr->refcount);
> @@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
>  		atomic64_dec(&pool->pages_nr);
>  		return;
>  	}
> +	if (test_bit(UNDER_RECLAIM, &page->private)) {
> +		z3fold_page_unlock(zhdr);
> +		return;
> +	}
>  	if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
>  		z3fold_page_unlock(zhdr);
>  		return;
> @@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  			kref_get(&zhdr->refcount);
>  			list_del_init(&zhdr->buddy);
>  			zhdr->cpu = -1;
> +			set_bit(UNDER_RECLAIM, &page->private);
> +			break;
>  		}
>  
>  		list_del_init(&page->lru);
> @@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  			goto next;
>  		}
>  next:
> -		spin_lock(&pool->lock);
>  		if (test_bit(PAGE_HEADLESS, &page->private)) {
>  			if (ret == 0) {
> -				spin_unlock(&pool->lock);
>  				free_z3fold_page(page);
>  				return 0;
>  			}
> -		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> -			atomic64_dec(&pool->pages_nr);
> +			spin_lock(&pool->lock);
> +			list_add(&page->lru, &pool->lru);
> +			spin_unlock(&pool->lock);
> +		} else {
> +			z3fold_page_lock(zhdr);
> +			clear_bit(UNDER_RECLAIM, &page->private);
> +			if (kref_put(&zhdr->refcount,
> +					release_z3fold_page_locked)) {
> +				atomic64_dec(&pool->pages_nr);
> +				return 0;
> +			}
> +			/*
> +			 * if we are here, the page is still not completely
> +			 * free. Take the global pool lock then to be able

extra then ?

> +			 * to add it back to the lru list
> +			 */
> +			spin_lock(&pool->lock);
> +			list_add(&page->lru, &pool->lru);
>  			spin_unlock(&pool->lock);
> -			return 0;
> +			z3fold_page_unlock(zhdr);
>  		}
>  
> -		/*
> -		 * Add to the beginning of LRU.
> -		 * Pool lock has to be kept here to ensure the page has
> -		 * not already been released
> -		 */
> -		list_add(&page->lru, &pool->lru);
> +		/* We started off locked to we need to lock the pool back */
> +		spin_lock(&pool->lock);
>  	}
>  	spin_unlock(&pool->lock);
>  	return -EAGAIN;


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-18 Thread Vitaly Wool
On Tue, 17 Apr 2018 at 18:35, Guenter Roeck wrote:



> Getting better; the log is much less noisy. Unfortunately, there are still
> locking problems, resulting in a hung task. I copied the log message to [1].
> This is with [2] applied on top of v4.17-rc1.

Now this version (this is a full patch to be applied instead of the previous 
one) should have the above problem resolved too:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..901c0b07cbda 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
 	PAGE_HEADLESS = 0,
 	MIDDLE_CHUNK_MAPPED,
 	NEEDS_COMPACTING,
-	PAGE_STALE
+	PAGE_STALE,
+	UNDER_RECLAIM
 };
 
 /*
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
 	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 	clear_bit(NEEDS_COMPACTING, &page->private);
 	clear_bit(PAGE_STALE, &page->private);
+	clear_bit(UNDER_RECLAIM, &page->private);
 
 	spin_lock_init(&zhdr->page_lock);
 	kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 		atomic64_dec(&pool->pages_nr);
 		return;
 	}
+	if (test_bit(UNDER_RECLAIM, &page->private)) {
+		z3fold_page_unlock(zhdr);
+		return;
+	}
 	if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
 		z3fold_page_unlock(zhdr);
 		return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			kref_get(&zhdr->refcount);
 			list_del_init(&zhdr->buddy);
 			zhdr->cpu = -1;
+			set_bit(UNDER_RECLAIM, &page->private);
+			break;
 		}
 
 		list_del_init(&page->lru);
@@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			goto next;
 		}
 next:
-		spin_lock(&pool->lock);
 		if (test_bit(PAGE_HEADLESS, &page->private)) {
 			if (ret == 0) {
-				spin_unlock(&pool->lock);
 				free_z3fold_page(page);
 				return 0;
 			}
-		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-			atomic64_dec(&pool->pages_nr);
+			spin_lock(&pool->lock);
+			list_add(&page->lru, &pool->lru);
+			spin_unlock(&pool->lock);
+		} else {
+			z3fold_page_lock(zhdr);
+			clear_bit(UNDER_RECLAIM, &page->private);
+			if (kref_put(&zhdr->refcount,
+					release_z3fold_page_locked)) {
+				atomic64_dec(&pool->pages_nr);
+				return 0;
+			}
+			/*
+			 * if we are here, the page is still not completely
+			 * free. Take the global pool lock then to be able
+			 * to add it back to the lru list
+			 */
+			spin_lock(&pool->lock);
+			list_add(&page->lru, &pool->lru);
 			spin_unlock(&pool->lock);
-			return 0;
+			z3fold_page_unlock(zhdr);
 		}
 
-		/*
-		 * Add to the beginning of LRU.
-		 * Pool lock has to be kept here to ensure the page has
-		 * not already been released
-		 */
-		list_add(&page->lru, &pool->lru);
+		/* We started off locked to we need to lock the pool back */
+		spin_lock(&pool->lock);
 	}
 	spin_unlock(&pool->lock);
 	return -EAGAIN;

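For intuition, here is a minimal userspace model of the handoff the patch above introduces. It is an illustrative sketch only, not kernel code: the fake_* names, the plain integer refcount, and the pthread mutex standing in for zhdr->page_lock are all made up for the example. It shows why z3fold_free() can simply back off when UNDER_RECLAIM is set: reclaim holds its own reference and, after clearing the flag, performs the final put itself.

/* Build with: cc -pthread model.c ; prints "released by reclaim: yes" */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_page {
	pthread_mutex_t lock;	/* stands in for zhdr->page_lock */
	int refs;		/* stands in for zhdr->refcount */
	bool under_reclaim;	/* stands in for the UNDER_RECLAIM page flag */
	bool released;
};

static void fake_release(struct fake_page *p)
{
	p->released = true;	/* stands in for release_z3fold_page_locked() */
}

/* models z3fold_free(): drop our reference, but defer to reclaim if it owns the page */
static void fake_free(struct fake_page *p)
{
	pthread_mutex_lock(&p->lock);
	if (--p->refs == 0) {
		fake_release(p);
		pthread_mutex_unlock(&p->lock);
		return;
	}
	if (p->under_reclaim) {
		/* reclaim still holds a reference; let it finish the job */
		pthread_mutex_unlock(&p->lock);
		return;
	}
	/* (the real code would schedule compaction work here) */
	pthread_mutex_unlock(&p->lock);
}

int main(void)
{
	struct fake_page p = { PTHREAD_MUTEX_INITIALIZER, 1, false, false };

	/* reclaim takes ownership: extra reference plus UNDER_RECLAIM */
	pthread_mutex_lock(&p.lock);
	p.refs++;
	p.under_reclaim = true;
	pthread_mutex_unlock(&p.lock);

	fake_free(&p);		/* a racing free backs off instead of releasing */

	/* reclaim finishes: clear the flag and do the final put */
	pthread_mutex_lock(&p.lock);
	p.under_reclaim = false;
	if (--p.refs == 0)
		fake_release(&p);
	pthread_mutex_unlock(&p.lock);

	printf("released by reclaim: %s\n", p.released ? "yes" : "no");
	return 0;
}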

Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-17 Thread Guenter Roeck
On Tue, Apr 17, 2018 at 04:00:32PM +0200, Vitaly Wool wrote:
> Hi Guenter,
> 
> > [ ... ]
> > > Ugh. Could you please keep that patch and apply this on top:
> > > 
> > > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > > index c0bca6153b95..e8a80d044d9e 100644
> > > --- a/mm/z3fold.c
> > > +++ b/mm/z3fold.c
> > > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> > >  			kref_get(&zhdr->refcount);
> > >  			list_del_init(&zhdr->buddy);
> > >  			zhdr->cpu = -1;
> > > +			break;
> > >  		}
> > >  		list_del_init(&page->lru);
> > > 
> > Much better, in a way. The system now takes much longer to crash,
> > and the crash reason is a bit different. The log is too long to attach,
> > so I copied it to [1].
> > 
> > crashdump.0002  Latest log
> > 000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.
> 
> Thanks for the update. Let's start from a clean sheet. I believe this patch
> has to be applied anyway, so could you please check if it solves the problem?
> 

Getting better; the log is much less noisy. Unfortunately, there are still
locking problems, resulting in a hung task. I copied the log message to [1].
This is with [2] applied on top of v4.17-rc1.

Guenter

---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump.0003
[2] http://server.roeck-us.net/qemu/z3pool/0001-Fix-attempt-3.patch


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-17 Thread Vitaly Wool
Hi Guenter,

> [ ... ]
> > Ugh. Could you please keep that patch and apply this on top:
> > 
> > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > index c0bca6153b95..e8a80d044d9e 100644
> > --- a/mm/z3fold.c
> > +++ b/mm/z3fold.c
> > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> >  			kref_get(&zhdr->refcount);
> >  			list_del_init(&zhdr->buddy);
> >  			zhdr->cpu = -1;
> > +			break;
> >  		}
> >  		list_del_init(&page->lru);
> > 
> Much better, in a way. The system now takes much longer to crash,
> and the crash reason is a bit different. The log is too long to attach,
> so I copied it to [1].
> 
> crashdump.0002  Latest log
> 000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.

Thanks for the update. Let's start from a clean sheet. I believe this patch has
to be applied anyway, so could you please check if it solves the problem?

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..059fb3d5ca86 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
 	PAGE_HEADLESS = 0,
 	MIDDLE_CHUNK_MAPPED,
 	NEEDS_COMPACTING,
-	PAGE_STALE
+	PAGE_STALE,
+	UNDER_RECLAIM
 };
 
 /*
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
 	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 	clear_bit(NEEDS_COMPACTING, &page->private);
 	clear_bit(PAGE_STALE, &page->private);
+	clear_bit(UNDER_RECLAIM, &page->private);
 
 	spin_lock_init(&zhdr->page_lock);
 	kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 		atomic64_dec(&pool->pages_nr);
 		return;
 	}
+	if (test_bit(UNDER_RECLAIM, &page->private)) {
+		z3fold_page_unlock(zhdr);
+		return;
+	}
 	if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
 		z3fold_page_unlock(zhdr);
 		return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			kref_get(&zhdr->refcount);
 			list_del_init(&zhdr->buddy);
 			zhdr->cpu = -1;
+			set_bit(UNDER_RECLAIM, &page->private);
+			break;
 		}
 
 		list_del_init(&page->lru);
@@ -888,6 +896,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		}
 next:
 		spin_lock(&pool->lock);
+		clear_bit(UNDER_RECLAIM, &page->private);
 		if (test_bit(PAGE_HEADLESS, &page->private)) {
 			if (ret == 0) {
 				spin_unlock(&pool->lock);


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-16 Thread Guenter Roeck
On Tue, Apr 17, 2018 at 12:14:37AM +0200, Vitaly Wool wrote:
[ ... ]
> Ugh. Could you please keep that patch and apply this on top:
> 
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..e8a80d044d9e 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  			kref_get(&zhdr->refcount);
>  			list_del_init(&zhdr->buddy);
>  			zhdr->cpu = -1;
> +			break;
>  		}
>  		list_del_init(&page->lru);
> 
Much better, in a way. The system now takes much longer to crash,
and the crash reason is a bit different. The log is too long to attach,
so I copied it to [1].

crashdump.0002  Latest log
000[12]-Fix-attempt-[12].patch  Patches applied on top of v4.17.0-rc1.

Hope it helps,
Guenter


[1] http://server.roeck-us.net/qemu/z3pool/


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-16 Thread Vitaly Wool

On 4/16/18 5:58 PM, Guenter Roeck wrote:


On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:

Hey Guenter,

On 04/13/2018 07:56 PM, Guenter Roeck wrote:


On Fri, Apr 13, 2018 at 05:40:18PM +, Vitaly Wool wrote:

On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck  wrote:


On Fri, Apr 13, 2018 at 05:21:02AM +, Vitaly Wool wrote:

Hi Guenter,


On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:


Hi all,
we are observing crashes with z3pool under memory pressure. The kernel

version

used to reproduce the problem is v4.16-11827-g5d1365940a68, but the

problem was

also seen with v4.14 based kernels.

just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?


As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.


Bah. Sorry. Expect an update after the weekend.


NP; easy to miss. Thanks a lot for looking into it.


I wonder if the following patch would make a difference:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..5e547c2d5832 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			goto next;
 		}
 next:
-		spin_lock(&pool->lock);
 		if (test_bit(PAGE_HEADLESS, &page->private)) {
 			if (ret == 0) {
-				spin_unlock(&pool->lock);
 				free_z3fold_page(page);
 				return 0;
 			}
-		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-			atomic64_dec(&pool->pages_nr);
-			spin_unlock(&pool->lock);
-			return 0;
+		} else {
+			spin_lock(&zhdr->page_lock);
+			if (kref_put(&zhdr->refcount,
+					release_z3fold_page_locked)) {
+				atomic64_dec(&pool->pages_nr);
+				return 0;
+			}
+			spin_unlock(&zhdr->page_lock);
 		}
+		spin_lock(&pool->lock);
 		/*
 		 * Add to the beginning of LRU.
 		 * Pool lock has to be kept here to ensure the page has


No, it doesn't. Same crash.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by kswapd0/51:
  #0: 4d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
  #1: 7739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
  #2: ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
  #3: 4cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
  dump_stack+0x67/0x9b
  __lock_acquire+0x429/0x18f0
  ? __lock_acquire+0x2af/0x18f0
  ? __lock_acquire+0x2af/0x18f0
  ? lock_acquire+0x93/0x230
  lock_acquire+0x93/0x230
  ? z3fold_zpool_shrink+0xb7/0x3e0
  _raw_spin_trylock+0x65/0x80
  ? z3fold_zpool_shrink+0xb7/0x3e0
  ? z3fold_zpool_shrink+0x47/0x3e0
  z3fold_zpool_shrink+0xb7/0x3e0
  zswap_frontswap_store+0x180/0x7c0
...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
INFO: lockdep is turned off.
Preemption disabled at:
[<>]   (null)
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
  dump_stack+0x67/0x9b
  ___might_sleep+0x16c/0x250
  __alloc_pages_nodemask+0x1e7/0x1490
  ? lock_acquire+0x93/0x230
  ? lock_acquire+0x93/0x230
  __read_swap_cache_async+0x14d/0x260
  zswap_writeback_entry+0xdb/0x340
  z3fold_zpool_shrink+0x2b1/0x3e0
  zswap_frontswap_store+0x180/0x7c0
  ? page_vma_mapped_walk+0x22/0x230
  __frontswap_store+0x6e/0xf0
  swap_writepage+0x49/0x70
...

This is with your patch applied on top of v4.17-rc1.

Guenter


Ugh. Could you please keep that patch and apply this on top:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..e8a80d044d9e 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			kref_get(&zhdr->refcount);
 			list_del_init(&zhdr->buddy);
 			zhdr->cpu = -1;
+			break;
 		}
 
 		list_del_init(&page->lru);


Thanks,
   Vitaly



Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-16 Thread Guenter Roeck
On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:
> Hey Guenter,
> 
> On 04/13/2018 07:56 PM, Guenter Roeck wrote:
> 
> >On Fri, Apr 13, 2018 at 05:40:18PM +, Vitaly Wool wrote:
> >>On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck  wrote:
> >>
> >>>On Fri, Apr 13, 2018 at 05:21:02AM +, Vitaly Wool wrote:
> Hi Guenter,
> 
> 
> On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:
> 
> >Hi all,
> >we are observing crashes with z3pool under memory pressure. The kernel
> version
> >used to reproduce the problem is v4.16-11827-g5d1365940a68, but the
> problem was
> >also seen with v4.14 based kernels.
> 
> just before I dig into this, could you please try reproducing the errors
> you see with https://patchwork.kernel.org/patch/10210459/ applied?
> 
> >>>As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
> >>>includes this patch.
> >>>
> >>Bah. Sorry. Expect an update after the weekend.
> >>
> >NP; easy to miss. Thanks a lot for looking into it.
> >
> I wonder if the following patch would make a difference:
> 
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..5e547c2d5832 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  			goto next;
>  		}
>  next:
> -		spin_lock(&pool->lock);
>  		if (test_bit(PAGE_HEADLESS, &page->private)) {
>  			if (ret == 0) {
> -				spin_unlock(&pool->lock);
>  				free_z3fold_page(page);
>  				return 0;
>  			}
> -		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> -			atomic64_dec(&pool->pages_nr);
> -			spin_unlock(&pool->lock);
> -			return 0;
> +		} else {
> +			spin_lock(&zhdr->page_lock);
> +			if (kref_put(&zhdr->refcount,
> +					release_z3fold_page_locked)) {
> +				atomic64_dec(&pool->pages_nr);
> +				return 0;
> +			}
> +			spin_unlock(&zhdr->page_lock);
>  		}
> +		spin_lock(&pool->lock);
>  		/*
>  		 * Add to the beginning of LRU.
>  		 * Pool lock has to be kept here to ensure the page has
> 
No, it doesn't. Same crash.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by kswapd0/51:
 #0: 4d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #1: 7739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #2: ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 4cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 __lock_acquire+0x429/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? lock_acquire+0x93/0x230
 lock_acquire+0x93/0x230
 ? z3fold_zpool_shrink+0xb7/0x3e0
 _raw_spin_trylock+0x65/0x80
 ? z3fold_zpool_shrink+0xb7/0x3e0
 ? z3fold_zpool_shrink+0x47/0x3e0
 z3fold_zpool_shrink+0xb7/0x3e0
 zswap_frontswap_store+0x180/0x7c0
...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
INFO: lockdep is turned off.
Preemption disabled at:
[<>]   (null)
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 ___might_sleep+0x16c/0x250
 __alloc_pages_nodemask+0x1e7/0x1490
 ? lock_acquire+0x93/0x230
 ? lock_acquire+0x93/0x230
 __read_swap_cache_async+0x14d/0x260
 zswap_writeback_entry+0xdb/0x340
 z3fold_zpool_shrink+0x2b1/0x3e0
 zswap_frontswap_store+0x180/0x7c0
 ? page_vma_mapped_walk+0x22/0x230
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x49/0x70
...

This is with your patch applied on top of v4.17-rc1.

Guenter


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-16 Thread Vitaly Wool

Hey Guenter,

On 04/13/2018 07:56 PM, Guenter Roeck wrote:


On Fri, Apr 13, 2018 at 05:40:18PM +, Vitaly Wool wrote:

On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck  wrote:


On Fri, Apr 13, 2018 at 05:21:02AM +, Vitaly Wool wrote:

Hi Guenter,


On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:


Hi all,
we are observing crashes with z3pool under memory pressure. The kernel

version

used to reproduce the problem is v4.16-11827-g5d1365940a68, but the

problem was

also seen with v4.14 based kernels.


just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?


As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.


Bah. Sorry. Expect an update after the weekend.


NP; easy to miss. Thanks a lot for looking into it.


I wonder if the following patch would make a difference:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..5e547c2d5832 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			goto next;
 		}
 next:
-		spin_lock(&pool->lock);
 		if (test_bit(PAGE_HEADLESS, &page->private)) {
 			if (ret == 0) {
-				spin_unlock(&pool->lock);
 				free_z3fold_page(page);
 				return 0;
 			}
-		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
-			atomic64_dec(&pool->pages_nr);
-			spin_unlock(&pool->lock);
-			return 0;
+		} else {
+			spin_lock(&zhdr->page_lock);
+			if (kref_put(&zhdr->refcount,
+					release_z3fold_page_locked)) {
+				atomic64_dec(&pool->pages_nr);
+				return 0;
+			}
+			spin_unlock(&zhdr->page_lock);
 		}
 
+		spin_lock(&pool->lock);
 		/*
 		 * Add to the beginning of LRU.
 		 * Pool lock has to be kept here to ensure the page has


Thanks,
   Vitaly



Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-13 Thread Guenter Roeck
On Fri, Apr 13, 2018 at 05:40:18PM +, Vitaly Wool wrote:
> On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck  wrote:
> 
> > On Fri, Apr 13, 2018 at 05:21:02AM +, Vitaly Wool wrote:
> > > Hi Guenter,
> > >
> > >
> > > On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:
> > >
> > > > Hi all,
> > >
> > > > we are observing crashes with z3pool under memory pressure. The kernel
> > > version
> > > > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the
> > > problem was
> > > > also seen with v4.14 based kernels.
> > >
> > >
> > > just before I dig into this, could you please try reproducing the errors
> > > you see with https://patchwork.kernel.org/patch/10210459/ applied?
> > >
> >
> > As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
> > includes this patch.
> >
> 
> Bah. Sorry. Expect an update after the weekend.
> 
NP; easy to miss. Thanks a lot for looking into it.

Guenter


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-13 Thread Guenter Roeck
On Fri, Apr 13, 2018 at 05:21:02AM +, Vitaly Wool wrote:
> Hi Guenter,
> 
> 
> On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:
> 
> > Hi all,
> 
> > we are observing crashes with z3pool under memory pressure. The kernel
> version
> > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the
> problem was
> > also seen with v4.14 based kernels.
> 
> 
> just before I dig into this, could you please try reproducing the errors
> you see with https://patchwork.kernel.org/patch/10210459/ applied?
> 

As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.

$ git log --oneline v4.14..5d1365940a68 mm/z3fold.c
8a97ea546bb6 mm/z3fold.c: use gfpflags_allow_blocking
1ec6995d1290 z3fold: fix memory leak
5c9bab592f53 z3fold: limit use of stale list for allocation
f144c390f905 mm: docs: fix parameter names mismatch
5d03a6613957 mm/z3fold.c: use kref to prevent page free/compact race

Guenter


Re: Crashes/hung tasks with z3pool under memory pressure

2018-04-12 Thread Vitaly Wool
Hi Guenter,


On Fri, 13 Apr 2018 at 00:01, Guenter Roeck wrote:

> Hi all,

> we are observing crashes with z3pool under memory pressure. The kernel
version
> used to reproduce the problem is v4.16-11827-g5d1365940a68, but the
problem was
> also seen with v4.14 based kernels.


just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?

Thanks,
Vitaly

> For simplicity, here is a set of shortened logs. A more complete log is
> available at [1].

> [ cut here ]
> DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
> WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212
preempt_count_add+0x90/0xa0
> Modules linked in:
> CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1
04/01/2014
> RIP: 0010:preempt_count_add+0x90/0xa0
> RSP: :b12740db7750 EFLAGS: 00010286
> RAX:  RBX: 0001 RCX: 00f6
> RDX: 00f6 RSI: 0082 RDI: 
> RBP: f00480f357a0 R08: 004a R09: 01ad
> R10: b12740db77e0 R11:  R12: 9cbc7e265d10
> R13: 9cbc7cd5e000 R14: 9cbc7a7000d8 R15: f00480f35780
> FS:  7f5140791700() GS:9cbc7fd0()
knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f513260f000 CR3: 32086000 CR4: 06e0
> Call Trace:
>   _raw_spin_trylock+0x13/0x30
>   z3fold_zpool_shrink+0xab/0x3a0
>   zswap_frontswap_store+0x10b/0x610
> ...
> WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278
release_z3fold_page_locked+0x25/0x40
> Modules linked in:
> ...
> INFO: rcu_preempt self-detected stall on CPU
>  2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906
>  softirq=4104/4113 fqs=11
> ...
> RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
> RSP: :b12740db7750 EFLAGS: 0202 ORIG_RAX: ff13
> RAX: 00100101 RBX: 9cbc7a7000c8 RCX: 0001
> RDX: 0101 RSI: 0001 RDI: 0101
> RBP:  R08: 9cbc7fc21240 R09: af19c900
> R10: b12740db75a0 R11: 0010 R12: f00480522d20
> R13: 9cbc548b4000 R14: 9cbc7a7000d8 R15: f00480522d00
>   ? __zswap_pool_current+0x80/0x90
>   z3fold_zpool_shrink+0x1d3/0x3a0
>   zswap_frontswap_store+0x10b/0x610
> ...

> With lock debugging enabled, the log is a bit different, but similar.

> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48  max: 48!
> 48 locks held by memory-eater/619:
>   #0: 2da807ce (&mm->mmap_sem){}, at: __do_page_fault+0x122/0x5a0
>   #1: 12fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
>   #2: c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
>   #3: 876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
> ...
> watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
> Modules linked in:
> irq event stamp: 1435394
> hardirqs last  enabled at (1435393): []
> _raw_spin_unlock_irqrestore+0x51/0x60
> hardirqs last disabled at (1435394): []
__schedule+0xba/0xbb0
> softirqs last  enabled at (1434508): []
__do_softirq+0x27c/0x516
> softirqs last disabled at (1434323): []
irq_exit+0xa9/0xc0
> CPU: 0 PID: 613 Comm: memory-eater Tainted: GW
> 4.16.0-yocto-standard+ #9
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1
04/01/2014
> RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
> RSP: :a61f80e074e0 EFLAGS: 0246 ORIG_RAX: ff13
> RAX:  RBX: 9704379cba08 RCX: 97043fc22080
> RDX: 0001 RSI: 9c05f6a0 RDI: 0004
> RBP:  R08: 9b1f2215 R09: 
> R10: a61f80e07490 R11: 9704379cba20 R12: 97043fc22080
> R13: de77404e0920 R14: 970413824000 R15: 9704379cba00
> FS:  7f8c6317f700() GS:97043fc0()
knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f8c43f3d010 CR3: 3aba CR4: 06f0
> Call Trace:
>   do_raw_spin_lock+0xad/0xb0
>   z3fold_zpool_malloc+0x595/0x790
> ...

> The problem is easy to reproduce. Please see [2] and the other files
> at [3] for details. Various additional crash logs, observed with
> chromeos-4.14, are available at [4].

> Please let me know if there is anything else I can do to help solving
> or debugging the problem. I had a look into the code, but I must admit
> that its locking is a mystery to me.

> Thanks,
> Guenter

> ---
> [1] http://server.roeck-us.net/qemu/z3pool/crashdump
> [2] http://server.roeck-us.net/qemu/z3pool/README
> [3] http://server.roeck-us.net/qemu/z3pool/
> [4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360


Crashes/hung tasks with z3pool under memory pressure

2018-04-12 Thread Guenter Roeck
Hi all,

we are observing crashes with z3pool under memory pressure. The kernel version
used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
also seen with v4.14 based kernels.

For simplicity, here is a set of shortened logs. A more complete log is
available at [1].

[ cut here ]
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212 preempt_count_add+0x90/0xa0
Modules linked in:
CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:preempt_count_add+0x90/0xa0
RSP: :b12740db7750 EFLAGS: 00010286
RAX:  RBX: 0001 RCX: 00f6
RDX: 00f6 RSI: 0082 RDI: 
RBP: f00480f357a0 R08: 004a R09: 01ad
R10: b12740db77e0 R11:  R12: 9cbc7e265d10
R13: 9cbc7cd5e000 R14: 9cbc7a7000d8 R15: f00480f35780
FS:  7f5140791700() GS:9cbc7fd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f513260f000 CR3: 32086000 CR4: 06e0
Call Trace:
 _raw_spin_trylock+0x13/0x30
 z3fold_zpool_shrink+0xab/0x3a0
 zswap_frontswap_store+0x10b/0x610
...
WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278 release_z3fold_page_locked+0x25/0x40
Modules linked in:
...
INFO: rcu_preempt self-detected stall on CPU
2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906
softirq=4104/4113 fqs=11 
...
RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
RSP: :b12740db7750 EFLAGS: 0202 ORIG_RAX: ff13
RAX: 00100101 RBX: 9cbc7a7000c8 RCX: 0001
RDX: 0101 RSI: 0001 RDI: 0101
RBP:  R08: 9cbc7fc21240 R09: af19c900
R10: b12740db75a0 R11: 0010 R12: f00480522d20
R13: 9cbc548b4000 R14: 9cbc7a7000d8 R15: f00480522d00
 ? __zswap_pool_current+0x80/0x90
 z3fold_zpool_shrink+0x1d3/0x3a0
 zswap_frontswap_store+0x10b/0x610
...

With lock debugging enabled, the log is a bit different, but similar.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by memory-eater/619:
 #0: 2da807ce (&mm->mmap_sem){}, at: __do_page_fault+0x122/0x5a0
 #1: 12fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #2: c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
Modules linked in:
irq event stamp: 1435394
hardirqs last  enabled at (1435393): []
_raw_spin_unlock_irqrestore+0x51/0x60
hardirqs last disabled at (1435394): [] __schedule+0xba/0xbb0
softirqs last  enabled at (1434508): [] 
__do_softirq+0x27c/0x516
softirqs last disabled at (1434323): [] irq_exit+0xa9/0xc0
CPU: 0 PID: 613 Comm: memory-eater Tainted: GW
4.16.0-yocto-standard+ #9
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
RSP: :a61f80e074e0 EFLAGS: 0246 ORIG_RAX: ff13
RAX:  RBX: 9704379cba08 RCX: 97043fc22080
RDX: 0001 RSI: 9c05f6a0 RDI: 0004
RBP:  R08: 9b1f2215 R09: 
R10: a61f80e07490 R11: 9704379cba20 R12: 97043fc22080
R13: de77404e0920 R14: 970413824000 R15: 9704379cba00
FS:  7f8c6317f700() GS:97043fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8c43f3d010 CR3: 3aba CR4: 06f0
Call Trace:
 do_raw_spin_lock+0xad/0xb0
 z3fold_zpool_malloc+0x595/0x790
...

The problem is easy to reproduce. Please see [2] and the other files
at [3] for details. Various additional crash logs, observed with
chromeos-4.14, are available at [4].
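
As a rough illustration of the kind of load involved (the actual reproducer is in the README at [2]; this sketch and its 64 MiB chunk size are assumptions, not the script used here), something along these lines keeps dirtying anonymous memory until zswap backed by z3fold is forced into its shrink/reclaim paths:

/* memory-eater sketch: allocate and dirty anonymous memory forever so that,
 * with zswap configured to use z3fold, sustained pressure drives pages through
 * zswap_frontswap_store() -> z3fold_zpool_shrink(). */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (64UL * 1024 * 1024)	/* 64 MiB per allocation (arbitrary) */

int main(void)
{
	for (;;) {
		char *p = malloc(CHUNK);

		if (!p) {
			sleep(1);	/* allocation failed; retry once reclaim catches up */
			continue;
		}
		memset(p, 0x5a, CHUNK);	/* dirty the pages so they must be swapped out */
		/* intentionally never freed: keep growing to sustain pressure */
	}
	return 0;
}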

Please let me know if there is anything else I can do to help solving
or debugging the problem. I had a look into the code, but I must admit
that its locking is a mystery to me.

Thanks,
Guenter

---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump
[2] http://server.roeck-us.net/qemu/z3pool/README
[3] http://server.roeck-us.net/qemu/z3pool/
[4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360

