Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-12 Thread Shawn Lin


On 2017/9/12 17:42, Linus Walleij wrote:

On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin  wrote:

On 2017/9/8 4:02, Linus Walleij wrote:

On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson 
wrote:


Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.



I think this is the real solution to the issue.


Another unrelated issue with mmc_init_request() is that
mmc_exit_request()
is not called if mmc_init_request() fails, which means
mmc_init_request()
must free anything it allocates when it fails.



Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.



Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?


Are you in the same page with what Adrian pointed to?

IIUC, the issue is:
init_rq_fn will be called each time when trying to create new reuqest
from the pre-allocated request_list memeory pool, and exit_rq_fn will is
in the corresponding routine to free request from request_list
(blk_free_request) when finished. But if alloc_request_size fails, it
won't call exit_rq_fn, so you need to prevent memory leak on your own
error path of init_rq_fn.


Yes and that is what the current patch fixes, is it not?


Nope. It fixed the *out-of-bound* memory usage as request queue will
hold 4 requests by default and only use these four requests when meeting
memory pressure. But the code didn't place the correct bouncesz so that
the 4 pre-allocated requests didn't have valid memory page when
allocated.  But in runtime, normally the request queue asks for new
request from request list by dynamically allocating it. So your
init_rq_fn will be called each time. However the mmc_init_request
didn't handle the error path properly。


I simply add SDHCI_QUIRK_BROKEN_ADMA for my SDHCI to force it use
SDMA so that the host->max_segs should be 1. Then add some a hack to
simulate some random failure for mmc_alloc_sg and then after some iozone
stress test, the memory is exhausted finally.

+static u64 skip = 0;
 static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
 {
struct scatterlist *sg;
+   u32 random = 0;
+
+   /* Simply skip the bootup stage */
+   if (skip++ >= 0x800)
+   random = get_random_int();
+
+   if (unlikely((random & 0xf) > 0xd)) {
+   printk("mmc_alloc_sg: make fake failure, random =
0x%x\n", random);
+   return NULL;
+   }


[  540.507195] iozone invoked oom-killer: 
gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), 
nodemask=(null),  order=0, oom_score_adj=0

[  540.508302] iozone cpuset=/ mems_allowed=0
[  540.508727] CPU: 2 PID: 3039 Comm: iozone Not tainted 
4.13.0-next-20170912-3-g01cc0dd5-dirty #110

[  540.509537] Hardware name: Firefly-RK3399 Board (DT)
[  540.509977] Call trace:
[  540.510221] [] dump_backtrace+0x0/0x480
[  540.510707] [] show_stack+0x14/0x20
[  540.511164] [] dump_stack+0xa4/0xc8
[  540.511621] [] dump_header+0xc4/0x2e8
[  540.512091] [] oom_kill_process+0x3a8/0x728
[  540.512605] [] out_of_memory+0x1a8/0x6d8
[  540.513099] [] __alloc_pages_nodemask+0xef8/0xf80
[  540.513660] [] alloc_pages_current+0x9c/0x128
[  540.514191] [] new_slab+0x488/0x5d8
[  540.514645] [] ___slab_alloc+0x378/0x5d0
[  540.515138] [] __slab_alloc.isra.23+0x24/0x38
[  540.515668] [] kmem_cache_alloc+0x1ec/0x218
[  540.516183] [] __blockdev_direct_IO+0x220/0x4a00




But you seem to talk about removing the bounce buffer and so finally
get rid of registering init_rq_fn/exit_rq_fn?


That is in direct response to Ulf's statement
"the situations it's just too fragile" and what can be done about
that is not make the code even more complex but make it
simpler and easier to follow.

One way to achieve that in the long term, is to delete bounce
buffer handling since it adds overhead and complexity.


That is another thing,
and what we right now need to do is to fix the pontential memory leak.
It's quite a simple action, right? :)


It is another thing.

This patch fixes a memory leak, but Ulf is asking how we may
avoid fragile behaviour.


I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.


You had removed packed cmd support to simplify the code, so I think
this is another trade-off need to ask: What you want? and keep
consistent with the direction you insisted on.


Packed command could be removed because it was not used by
any in-tree code. See
commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019



Ok, I was surprised that no any in-tree host enabled it, but if some

Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-12 Thread Shawn Lin


On 2017/9/12 17:42, Linus Walleij wrote:

On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin  wrote:

On 2017/9/8 4:02, Linus Walleij wrote:

On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson 
wrote:


Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.



I think this is the real solution to the issue.


Another unrelated issue with mmc_init_request() is that
mmc_exit_request()
is not called if mmc_init_request() fails, which means
mmc_init_request()
must free anything it allocates when it fails.



Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.



Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?


Are you in the same page with what Adrian pointed to?

IIUC, the issue is:
init_rq_fn will be called each time when trying to create new reuqest
from the pre-allocated request_list memeory pool, and exit_rq_fn will is
in the corresponding routine to free request from request_list
(blk_free_request) when finished. But if alloc_request_size fails, it
won't call exit_rq_fn, so you need to prevent memory leak on your own
error path of init_rq_fn.


Yes and that is what the current patch fixes, is it not?


Nope. It fixed the *out-of-bound* memory usage as request queue will
hold 4 requests by default and only use these four requests when meeting
memory pressure. But the code didn't place the correct bouncesz so that
the 4 pre-allocated requests didn't have valid memory page when
allocated.  But in runtime, normally the request queue asks for new
request from request list by dynamically allocating it. So your
init_rq_fn will be called each time. However the mmc_init_request
didn't handle the error path properly。


I simply add SDHCI_QUIRK_BROKEN_ADMA for my SDHCI to force it use
SDMA so that the host->max_segs should be 1. Then add some a hack to
simulate some random failure for mmc_alloc_sg and then after some iozone
stress test, the memory is exhausted finally.

+static u64 skip = 0;
 static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
 {
struct scatterlist *sg;
+   u32 random = 0;
+
+   /* Simply skip the bootup stage */
+   if (skip++ >= 0x800)
+   random = get_random_int();
+
+   if (unlikely((random & 0xf) > 0xd)) {
+   printk("mmc_alloc_sg: make fake failure, random =
0x%x\n", random);
+   return NULL;
+   }


[  540.507195] iozone invoked oom-killer: 
gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), 
nodemask=(null),  order=0, oom_score_adj=0

[  540.508302] iozone cpuset=/ mems_allowed=0
[  540.508727] CPU: 2 PID: 3039 Comm: iozone Not tainted 
4.13.0-next-20170912-3-g01cc0dd5-dirty #110

[  540.509537] Hardware name: Firefly-RK3399 Board (DT)
[  540.509977] Call trace:
[  540.510221] [] dump_backtrace+0x0/0x480
[  540.510707] [] show_stack+0x14/0x20
[  540.511164] [] dump_stack+0xa4/0xc8
[  540.511621] [] dump_header+0xc4/0x2e8
[  540.512091] [] oom_kill_process+0x3a8/0x728
[  540.512605] [] out_of_memory+0x1a8/0x6d8
[  540.513099] [] __alloc_pages_nodemask+0xef8/0xf80
[  540.513660] [] alloc_pages_current+0x9c/0x128
[  540.514191] [] new_slab+0x488/0x5d8
[  540.514645] [] ___slab_alloc+0x378/0x5d0
[  540.515138] [] __slab_alloc.isra.23+0x24/0x38
[  540.515668] [] kmem_cache_alloc+0x1ec/0x218
[  540.516183] [] __blockdev_direct_IO+0x220/0x4a00




But you seem to talk about removing the bounce buffer and so finally
get rid of registering init_rq_fn/exit_rq_fn?


That is in direct response to Ulf's statement
"the situations it's just too fragile" and what can be done about
that is not make the code even more complex but make it
simpler and easier to follow.

One way to achieve that in the long term, is to delete bounce
buffer handling since it adds overhead and complexity.


That is another thing,
and what we right now need to do is to fix the pontential memory leak.
It's quite a simple action, right? :)


It is another thing.

This patch fixes a memory leak, but Ulf is asking how we may
avoid fragile behaviour.


I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.


You had removed packed cmd support to simplify the code, so I think
this is another trade-off need to ask: What you want? and keep
consistent with the direction you insisted on.


Packed command could be removed because it was not used by
any in-tree code. See
commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019



Ok, I was surprised that no any in-tree host enabled it, but if some
host did, the situation was similar with this one.

Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-12 Thread Linus Walleij
On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin  wrote:
> On 2017/9/8 4:02, Linus Walleij wrote:
>> On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson 
>> wrote:
>>
>>> Even if this fixes the problem it seems like we are papering over the
>>> real issue, which earlier fixes also did during the release cycle for
>>> v4.13.
>>
>>
>> I think this is the real solution to the issue.
>>
 Another unrelated issue with mmc_init_request() is that
 mmc_exit_request()
 is not called if mmc_init_request() fails, which means
 mmc_init_request()
 must free anything it allocates when it fails.
>>>
>>>
>>> Yes, the situations it's just too fragile. We need to fix the behavior
>>> properly, although I haven't myself been able to investigate exactly
>>> how yet.
>>>
>>> Adding, Linus, perhaps he has some ideas.
>>
>>
>> Maybe we should simply bite the bullet and do what was suggested
>> by another contributor when I refactored the bounce buffer handling:
>> simply delete the bounce buffer code and let any remaining (few?)
>> legacy devices suffer a bit (performancewise) at the gain of way
>> simpler code?
>
> Are you in the same page with what Adrian pointed to?
>
> IIUC, the issue is:
> init_rq_fn will be called each time when trying to create new reuqest
> from the pre-allocated request_list memeory pool, and exit_rq_fn will is
> in the corresponding routine to free request from request_list
> (blk_free_request) when finished. But if alloc_request_size fails, it
> won't call exit_rq_fn, so you need to prevent memory leak on your own
> error path of init_rq_fn.

Yes and that is what the current patch fixes, is it not?

> But you seem to talk about removing the bounce buffer and so finally
> get rid of registering init_rq_fn/exit_rq_fn?

That is in direct response to Ulf's statement
"the situations it's just too fragile" and what can be done about
that is not make the code even more complex but make it
simpler and easier to follow.

One way to achieve that in the long term, is to delete bounce
buffer handling since it adds overhead and complexity.

> That is another thing,
> and what we right now need to do is to fix the pontential memory leak.
> It's quite a simple action, right? :)

It is another thing.

This patch fixes a memory leak, but Ulf is asking how we may
avoid fragile behaviour.

>> I am a bit hesitant about that because Pierre Ossman said it was
>> actually a big win on the SDHC hosts that made use of it at one
>> point.
>
> You had removed packed cmd support to simplify the code, so I think
> this is another trade-off need to ask: What you want? and keep
> consistent with the direction you insisted on.

Packed command could be removed because it was not used by
any in-tree code. See
commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019

Bounce buffers on the other hand, as this patch shows, is very much
in use. So if they e.g. improve throughput on these systems
(mainly laptops I think?) it should definately be kept around.

It might be a good idea to make a patch to remove the bounce
buffers and ask people to do before/after throughput tests,
because I do not have this hardware myself. If it doesn't provide
any throughput improvements to any system in use, it should
be deleted. But I don't know that yet.

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-12 Thread Linus Walleij
On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin  wrote:
> On 2017/9/8 4:02, Linus Walleij wrote:
>> On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson 
>> wrote:
>>
>>> Even if this fixes the problem it seems like we are papering over the
>>> real issue, which earlier fixes also did during the release cycle for
>>> v4.13.
>>
>>
>> I think this is the real solution to the issue.
>>
 Another unrelated issue with mmc_init_request() is that
 mmc_exit_request()
 is not called if mmc_init_request() fails, which means
 mmc_init_request()
 must free anything it allocates when it fails.
>>>
>>>
>>> Yes, the situations it's just too fragile. We need to fix the behavior
>>> properly, although I haven't myself been able to investigate exactly
>>> how yet.
>>>
>>> Adding, Linus, perhaps he has some ideas.
>>
>>
>> Maybe we should simply bite the bullet and do what was suggested
>> by another contributor when I refactored the bounce buffer handling:
>> simply delete the bounce buffer code and let any remaining (few?)
>> legacy devices suffer a bit (performancewise) at the gain of way
>> simpler code?
>
> Are you in the same page with what Adrian pointed to?
>
> IIUC, the issue is:
> init_rq_fn will be called each time when trying to create new reuqest
> from the pre-allocated request_list memeory pool, and exit_rq_fn will is
> in the corresponding routine to free request from request_list
> (blk_free_request) when finished. But if alloc_request_size fails, it
> won't call exit_rq_fn, so you need to prevent memory leak on your own
> error path of init_rq_fn.

Yes and that is what the current patch fixes, is it not?

> But you seem to talk about removing the bounce buffer and so finally
> get rid of registering init_rq_fn/exit_rq_fn?

That is in direct response to Ulf's statement
"the situations it's just too fragile" and what can be done about
that is not make the code even more complex but make it
simpler and easier to follow.

One way to achieve that in the long term, is to delete bounce
buffer handling since it adds overhead and complexity.

> That is another thing,
> and what we right now need to do is to fix the pontential memory leak.
> It's quite a simple action, right? :)

It is another thing.

This patch fixes a memory leak, but Ulf is asking how we may
avoid fragile behaviour.

>> I am a bit hesitant about that because Pierre Ossman said it was
>> actually a big win on the SDHC hosts that made use of it at one
>> point.
>
> You had removed packed cmd support to simplify the code, so I think
> this is another trade-off need to ask: What you want? and keep
> consistent with the direction you insisted on.

Packed command could be removed because it was not used by
any in-tree code. See
commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019

Bounce buffers on the other hand, as this patch shows, is very much
in use. So if they e.g. improve throughput on these systems
(mainly laptops I think?) it should definately be kept around.

It might be a good idea to make a patch to remove the bounce
buffers and ask people to do before/after throughput tests,
because I do not have this hardware myself. If it doesn't provide
any throughput improvements to any system in use, it should
be deleted. But I don't know that yet.

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-08 Thread Pavel Machek
On Wed 2017-09-06 10:20:35, Seraphime Kirkovski wrote:
> Hi,
> 
> > > Seems 4.13-rc4 was already broken for that but unfortuantely
> > > I didn't
> > > reproduce that. So maybe Seraphime can do git-bisect as he said "I 
> > > get
> > > it everytime" for which I assume it could be easy for him to find 
> > > out
> > > the problematic commit?
> 
> I can reliably reproduce it, although sometimes it needs some more work.
> For example, I couldn't trigger it while writing less than 1 gigabyte
> and sometimes I have to do it more than once. It helps if the machine is
> doing something else in meantime, I do kernel builds.
> 
> > Another unrelated issue with mmc_init_request() is that
> > mmc_exit_request()
> > is not called if mmc_init_request() fails, which means 
> > mmc_init_request()
> > must free anything it allocates when it fails.
> 
> I'm running your patch for 45 minutes now, it looks like it's fixing the 
> issue on 4.13 81a84ad3cb5711cec79.
> 
> P.S. Sorry about the formatting, have to fix my editor

Thanks for quick testing :-). And your formatting is still better than
some...


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-08 Thread Pavel Machek
On Wed 2017-09-06 10:20:35, Seraphime Kirkovski wrote:
> Hi,
> 
> > > Seems 4.13-rc4 was already broken for that but unfortuantely
> > > I didn't
> > > reproduce that. So maybe Seraphime can do git-bisect as he said "I 
> > > get
> > > it everytime" for which I assume it could be easy for him to find 
> > > out
> > > the problematic commit?
> 
> I can reliably reproduce it, although sometimes it needs some more work.
> For example, I couldn't trigger it while writing less than 1 gigabyte
> and sometimes I have to do it more than once. It helps if the machine is
> doing something else in meantime, I do kernel builds.
> 
> > Another unrelated issue with mmc_init_request() is that
> > mmc_exit_request()
> > is not called if mmc_init_request() fails, which means 
> > mmc_init_request()
> > must free anything it allocates when it fails.
> 
> I'm running your patch for 45 minutes now, it looks like it's fixing the 
> issue on 4.13 81a84ad3cb5711cec79.
> 
> P.S. Sorry about the formatting, have to fix my editor

Thanks for quick testing :-). And your formatting is still better than
some...


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-08 Thread Pavel Machek
On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
> [...]
> 
> >
> > From: Adrian Hunter 
> > Date: Thu, 7 Sep 2017 10:40:35 +0300
> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
> >
> > mmc_init_request() depends on card->bouncesz so it must be calculated
> > before blk_init_allocated_queue() starts allocating requests.
> >
> > Reported-by: Seraphime Kirkovski 
> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> > layer core")
> > Signed-off-by: Adrian Hunter 
> 
> Thanks, applied for fixes!

Tested-by: Pavel Machek 

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-08 Thread Pavel Machek
On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
> [...]
> 
> >
> > From: Adrian Hunter 
> > Date: Thu, 7 Sep 2017 10:40:35 +0300
> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
> >
> > mmc_init_request() depends on card->bouncesz so it must be calculated
> > before blk_init_allocated_queue() starts allocating requests.
> >
> > Reported-by: Seraphime Kirkovski 
> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> > layer core")
> > Signed-off-by: Adrian Hunter 
> 
> Thanks, applied for fixes!

Tested-by: Pavel Machek 

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Shawn Lin

On 2017/9/8 4:02, Linus Walleij wrote:

On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson  wrote:


Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.


I think this is the real solution to the issue.


Another unrelated issue with mmc_init_request() is that mmc_exit_request()
is not called if mmc_init_request() fails, which means mmc_init_request()
must free anything it allocates when it fails.


Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.


Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?


Are you in the same page with what Adrian pointed to?

IIUC, the issue is:
init_rq_fn will be called each time when trying to create new reuqest
from the pre-allocated request_list memeory pool, and exit_rq_fn will is
in the corresponding routine to free request from request_list
(blk_free_request) when finished. But if alloc_request_size fails, it
won't call exit_rq_fn, so you need to prevent memory leak on your own
error path of init_rq_fn.

But you seem to talk about removing the bounce buffer and so finally
get rid of registering init_rq_fn/exit_rq_fn? That is another thing,
and what we right now need to do is to fix the pontential memory leak.
It's quite a simple action, right? :)




I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.


You had removed packed cmd support to simplify the code, so I think
this is another trade-off need to ask: What you want? and keep
consistent with the direction you insisted on.





Yours,
Linus Walleij







Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Shawn Lin

On 2017/9/8 4:02, Linus Walleij wrote:

On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson  wrote:


Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.


I think this is the real solution to the issue.


Another unrelated issue with mmc_init_request() is that mmc_exit_request()
is not called if mmc_init_request() fails, which means mmc_init_request()
must free anything it allocates when it fails.


Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.


Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?


Are you in the same page with what Adrian pointed to?

IIUC, the issue is:
init_rq_fn will be called each time when trying to create new reuqest
from the pre-allocated request_list memeory pool, and exit_rq_fn will is
in the corresponding routine to free request from request_list
(blk_free_request) when finished. But if alloc_request_size fails, it
won't call exit_rq_fn, so you need to prevent memory leak on your own
error path of init_rq_fn.

But you seem to talk about removing the bounce buffer and so finally
get rid of registering init_rq_fn/exit_rq_fn? That is another thing,
and what we right now need to do is to fix the pontential memory leak.
It's quite a simple action, right? :)




I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.


You had removed packed cmd support to simplify the code, so I think
this is another trade-off need to ask: What you want? and keep
consistent with the direction you insisted on.





Yours,
Linus Walleij







Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Linus Walleij
On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson  wrote:

> Even if this fixes the problem it seems like we are papering over the
> real issue, which earlier fixes also did during the release cycle for
> v4.13.

I think this is the real solution to the issue.

>> Another unrelated issue with mmc_init_request() is that mmc_exit_request()
>> is not called if mmc_init_request() fails, which means mmc_init_request()
>> must free anything it allocates when it fails.
>
> Yes, the situations it's just too fragile. We need to fix the behavior
> properly, although I haven't myself been able to investigate exactly
> how yet.
>
> Adding, Linus, perhaps he has some ideas.

Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?

I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Linus Walleij
On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson  wrote:

> Even if this fixes the problem it seems like we are papering over the
> real issue, which earlier fixes also did during the release cycle for
> v4.13.

I think this is the real solution to the issue.

>> Another unrelated issue with mmc_init_request() is that mmc_exit_request()
>> is not called if mmc_init_request() fails, which means mmc_init_request()
>> must free anything it allocates when it fails.
>
> Yes, the situations it's just too fragile. We need to fix the behavior
> properly, although I haven't myself been able to investigate exactly
> how yet.
>
> Adding, Linus, perhaps he has some ideas.

Maybe we should simply bite the bullet and do what was suggested
by another contributor when I refactored the bounce buffer handling:
simply delete the bounce buffer code and let any remaining (few?)
legacy devices suffer a bit (performancewise) at the gain of way
simpler code?

I am a bit hesitant about that because Pierre Ossman said it was
actually a big win on the SDHC hosts that made use of it at one
point.

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Linus Walleij
On Thu, Sep 7, 2017 at 9:53 AM, Adrian Hunter  wrote:

> From: Adrian Hunter 
> Date: Thu, 7 Sep 2017 10:40:35 +0300
> Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>
> mmc_init_request() depends on card->bouncesz so it must be calculated
> before blk_init_allocated_queue() starts allocating requests.
>
> Reported-by: Seraphime Kirkovski 
> Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> layer core")
> Signed-off-by: Adrian Hunter 

Reviewed-by: Linus Walleij 

Really neat and quick fix, thanks a lot Adrian.

My fault for not finding more systems actually *using* these
bounce buffers. :( :(

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Linus Walleij
On Thu, Sep 7, 2017 at 9:53 AM, Adrian Hunter  wrote:

> From: Adrian Hunter 
> Date: Thu, 7 Sep 2017 10:40:35 +0300
> Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>
> mmc_init_request() depends on card->bouncesz so it must be calculated
> before blk_init_allocated_queue() starts allocating requests.
>
> Reported-by: Seraphime Kirkovski 
> Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> layer core")
> Signed-off-by: Adrian Hunter 

Reviewed-by: Linus Walleij 

Really neat and quick fix, thanks a lot Adrian.

My fault for not finding more systems actually *using* these
bounce buffers. :( :(

Yours,
Linus Walleij


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
On 7 September 2017 at 14:55, Pavel Machek  wrote:
> On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
>> [...]
>>
>> >
>> > From: Adrian Hunter 
>> > Date: Thu, 7 Sep 2017 10:40:35 +0300
>> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>> >
>> > mmc_init_request() depends on card->bouncesz so it must be calculated
>> > before blk_init_allocated_queue() starts allocating requests.
>> >
>> > Reported-by: Seraphime Kirkovski 
>> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the 
>> > block layer core")
>> > Signed-off-by: Adrian Hunter 
>>
>> Thanks, applied for fixes!
>
> Thanks. I believe this one should get cc: stable markups, so
> eventually 4.13 does get fixed, too
> Pavel

Yeah, correct and added!

Kind regards
Uffe


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
On 7 September 2017 at 14:55, Pavel Machek  wrote:
> On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
>> [...]
>>
>> >
>> > From: Adrian Hunter 
>> > Date: Thu, 7 Sep 2017 10:40:35 +0300
>> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>> >
>> > mmc_init_request() depends on card->bouncesz so it must be calculated
>> > before blk_init_allocated_queue() starts allocating requests.
>> >
>> > Reported-by: Seraphime Kirkovski 
>> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the 
>> > block layer core")
>> > Signed-off-by: Adrian Hunter 
>>
>> Thanks, applied for fixes!
>
> Thanks. I believe this one should get cc: stable markups, so
> eventually 4.13 does get fixed, too
> Pavel

Yeah, correct and added!

Kind regards
Uffe


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Pavel Machek
On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
> [...]
> 
> >
> > From: Adrian Hunter 
> > Date: Thu, 7 Sep 2017 10:40:35 +0300
> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
> >
> > mmc_init_request() depends on card->bouncesz so it must be calculated
> > before blk_init_allocated_queue() starts allocating requests.
> >
> > Reported-by: Seraphime Kirkovski 
> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> > layer core")
> > Signed-off-by: Adrian Hunter 
> 
> Thanks, applied for fixes!

Thanks. I believe this one should get cc: stable markups, so
eventually 4.13 does get fixed, too
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Pavel Machek
On Thu 2017-09-07 14:06:52, Ulf Hansson wrote:
> [...]
> 
> >
> > From: Adrian Hunter 
> > Date: Thu, 7 Sep 2017 10:40:35 +0300
> > Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
> >
> > mmc_init_request() depends on card->bouncesz so it must be calculated
> > before blk_init_allocated_queue() starts allocating requests.
> >
> > Reported-by: Seraphime Kirkovski 
> > Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> > layer core")
> > Signed-off-by: Adrian Hunter 
> 
> Thanks, applied for fixes!

Thanks. I believe this one should get cc: stable markups, so
eventually 4.13 does get fixed, too
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
[...]

>
> From: Adrian Hunter 
> Date: Thu, 7 Sep 2017 10:40:35 +0300
> Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>
> mmc_init_request() depends on card->bouncesz so it must be calculated
> before blk_init_allocated_queue() starts allocating requests.
>
> Reported-by: Seraphime Kirkovski 
> Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> layer core")
> Signed-off-by: Adrian Hunter 

Thanks, applied for fixes!

Kind regards
Uffe

> ---
>  drivers/mmc/core/queue.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index affa7370ba82..74c663b1c0a7 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -242,6 +242,12 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>
> +   /*
> +* mmc_init_request() depends on card->bouncesz so it must be 
> calculated
> +* before blk_init_allocated_queue() starts allocating requests.
> +*/
> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
> +
> mq->card = card;
> mq->queue = blk_alloc_queue(GFP_KERNEL);
> if (!mq->queue)
> @@ -265,7 +271,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_can_erase(card))
> mmc_queue_setup_discard(mq->queue, card);
>
> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
> if (card->bouncesz) {
> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
> --
> 1.9.1
>


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
[...]

>
> From: Adrian Hunter 
> Date: Thu, 7 Sep 2017 10:40:35 +0300
> Subject: [PATCH] mmc: block: Fix incorrectly initialized requests
>
> mmc_init_request() depends on card->bouncesz so it must be calculated
> before blk_init_allocated_queue() starts allocating requests.
>
> Reported-by: Seraphime Kirkovski 
> Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
> layer core")
> Signed-off-by: Adrian Hunter 

Thanks, applied for fixes!

Kind regards
Uffe

> ---
>  drivers/mmc/core/queue.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index affa7370ba82..74c663b1c0a7 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -242,6 +242,12 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>
> +   /*
> +* mmc_init_request() depends on card->bouncesz so it must be 
> calculated
> +* before blk_init_allocated_queue() starts allocating requests.
> +*/
> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
> +
> mq->card = card;
> mq->queue = blk_alloc_queue(GFP_KERNEL);
> if (!mq->queue)
> @@ -265,7 +271,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_can_erase(card))
> mmc_queue_setup_discard(mq->queue, card);
>
> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
> if (card->bouncesz) {
> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
> --
> 1.9.1
>


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Adrian Hunter
On 07/09/17 10:18, Ulf Hansson wrote:
> + Linus
> 
> On 6 September 2017 at 08:03, Adrian Hunter  wrote:
>> On 06/09/17 05:44, Shawn Lin wrote:
>>> + Seraphime
>>>
>>> On 2017/9/6 3:47, Pavel Machek wrote:
 Hi!

 I tried to write to the MMC card; process hung and I got this in the
 dmesg.
>>>
>>>
>>> A similar report for 4.13 cycle was here:
>>>
>>> https://lkml.org/lkml/2017/8/10/824
>>>
>>> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
>>> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
>>> it everytime" for which I assume it could be easy for him to find out
>>> the problematic commit?
>>>
>>
>> One obvious weakness in the new mmc_init_request() is the possibility
>> that it might be called before card->bouncesz is set up.  That could
>> result in bouncing being done but mq_rq->bounce_sg is null.
>> This might help:
>>
>>
>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>> index affa7370ba82..ad3e53e63abb 100644
>> --- a/drivers/mmc/core/queue.c
>> +++ b/drivers/mmc/core/queue.c
>> @@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
>> *card,
>> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
>> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>>
>> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
>> +
>> mq->card = card;
>> mq->queue = blk_alloc_queue(GFP_KERNEL);
>> if (!mq->queue)
>> @@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
>> *card,
>> if (mmc_can_erase(card))
>> mmc_queue_setup_discard(mq->queue, card);
>>
>> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
>> if (card->bouncesz) {
>> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
>> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
>>
> 
> Even if this fixes the problem it seems like we are papering over the
> real issue, which earlier fixes also did during the release cycle for
> v4.13.

blk_init_allocated_queue() allocates 1 request for flush and 4 requests
for a memory pool.  The memory pool requests only get used under memory
pressure.  That is why the error doesn't come up straight away.

> 
> Anyway I am happy to apply this as fix for 4.14, if Seraphime/Pavel
> can report it solved the problem. Could you send a proper patch with
> some changlog please?
> 
> I would also appreciate if can add you a small comment in the code,
> why moving this line is needed.

From: Adrian Hunter 
Date: Thu, 7 Sep 2017 10:40:35 +0300
Subject: [PATCH] mmc: block: Fix incorrectly initialized requests

mmc_init_request() depends on card->bouncesz so it must be calculated
before blk_init_allocated_queue() starts allocating requests.

Reported-by: Seraphime Kirkovski 
Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
layer core")
Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/queue.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index affa7370ba82..74c663b1c0a7 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -242,6 +242,12 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
+   /*
+* mmc_init_request() depends on card->bouncesz so it must be calculated
+* before blk_init_allocated_queue() starts allocating requests.
+*/
+   card->bouncesz = mmc_queue_calc_bouncesz(host);
+
mq->card = card;
mq->queue = blk_alloc_queue(GFP_KERNEL);
if (!mq->queue)
@@ -265,7 +271,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   card->bouncesz = mmc_queue_calc_bouncesz(host);
if (card->bouncesz) {
blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
blk_queue_max_segments(mq->queue, card->bouncesz / 512);
-- 
1.9.1



Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Adrian Hunter
On 07/09/17 10:18, Ulf Hansson wrote:
> + Linus
> 
> On 6 September 2017 at 08:03, Adrian Hunter  wrote:
>> On 06/09/17 05:44, Shawn Lin wrote:
>>> + Seraphime
>>>
>>> On 2017/9/6 3:47, Pavel Machek wrote:
 Hi!

 I tried to write to the MMC card; process hung and I got this in the
 dmesg.
>>>
>>>
>>> A similar report for 4.13 cycle was here:
>>>
>>> https://lkml.org/lkml/2017/8/10/824
>>>
>>> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
>>> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
>>> it everytime" for which I assume it could be easy for him to find out
>>> the problematic commit?
>>>
>>
>> One obvious weakness in the new mmc_init_request() is the possibility
>> that it might be called before card->bouncesz is set up.  That could
>> result in bouncing being done but mq_rq->bounce_sg is null.
>> This might help:
>>
>>
>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>> index affa7370ba82..ad3e53e63abb 100644
>> --- a/drivers/mmc/core/queue.c
>> +++ b/drivers/mmc/core/queue.c
>> @@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
>> *card,
>> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
>> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>>
>> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
>> +
>> mq->card = card;
>> mq->queue = blk_alloc_queue(GFP_KERNEL);
>> if (!mq->queue)
>> @@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
>> *card,
>> if (mmc_can_erase(card))
>> mmc_queue_setup_discard(mq->queue, card);
>>
>> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
>> if (card->bouncesz) {
>> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
>> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
>>
> 
> Even if this fixes the problem it seems like we are papering over the
> real issue, which earlier fixes also did during the release cycle for
> v4.13.

blk_init_allocated_queue() allocates 1 request for flush and 4 requests
for a memory pool.  The memory pool requests only get used under memory
pressure.  That is why the error doesn't come up straight away.

> 
> Anyway I am happy to apply this as fix for 4.14, if Seraphime/Pavel
> can report it solved the problem. Could you send a proper patch with
> some changlog please?
> 
> I would also appreciate if can add you a small comment in the code,
> why moving this line is needed.

From: Adrian Hunter 
Date: Thu, 7 Sep 2017 10:40:35 +0300
Subject: [PATCH] mmc: block: Fix incorrectly initialized requests

mmc_init_request() depends on card->bouncesz so it must be calculated
before blk_init_allocated_queue() starts allocating requests.

Reported-by: Seraphime Kirkovski 
Fixes: 304419d8a7e92 ("mmc: core: Allocate per-request data using the block 
layer core")
Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/queue.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index affa7370ba82..74c663b1c0a7 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -242,6 +242,12 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
+   /*
+* mmc_init_request() depends on card->bouncesz so it must be calculated
+* before blk_init_allocated_queue() starts allocating requests.
+*/
+   card->bouncesz = mmc_queue_calc_bouncesz(host);
+
mq->card = card;
mq->queue = blk_alloc_queue(GFP_KERNEL);
if (!mq->queue)
@@ -265,7 +271,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   card->bouncesz = mmc_queue_calc_bouncesz(host);
if (card->bouncesz) {
blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
blk_queue_max_segments(mq->queue, card->bouncesz / 512);
-- 
1.9.1



Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
+ Linus

On 6 September 2017 at 08:03, Adrian Hunter  wrote:
> On 06/09/17 05:44, Shawn Lin wrote:
>> + Seraphime
>>
>> On 2017/9/6 3:47, Pavel Machek wrote:
>>> Hi!
>>>
>>> I tried to write to the MMC card; process hung and I got this in the
>>> dmesg.
>>
>>
>> A similar report for 4.13 cycle was here:
>>
>> https://lkml.org/lkml/2017/8/10/824
>>
>> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
>> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
>> it everytime" for which I assume it could be easy for him to find out
>> the problematic commit?
>>
>
> One obvious weakness in the new mmc_init_request() is the possibility
> that it might be called before card->bouncesz is set up.  That could
> result in bouncing being done but mq_rq->bounce_sg is null.
> This might help:
>
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index affa7370ba82..ad3e53e63abb 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>
> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
> +
> mq->card = card;
> mq->queue = blk_alloc_queue(GFP_KERNEL);
> if (!mq->queue)
> @@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_can_erase(card))
> mmc_queue_setup_discard(mq->queue, card);
>
> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
> if (card->bouncesz) {
> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
>

Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.

Anyway I am happy to apply this as fix for 4.14, if Seraphime/Pavel
can report it solved the problem. Could you send a proper patch with
some changlog please?

I would also appreciate if can add you a small comment in the code,
why moving this line is needed.

>
> Another unrelated issue with mmc_init_request() is that mmc_exit_request()
> is not called if mmc_init_request() fails, which means mmc_init_request()
> must free anything it allocates when it fails.

Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.

Kind regards
Uffe


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-07 Thread Ulf Hansson
+ Linus

On 6 September 2017 at 08:03, Adrian Hunter  wrote:
> On 06/09/17 05:44, Shawn Lin wrote:
>> + Seraphime
>>
>> On 2017/9/6 3:47, Pavel Machek wrote:
>>> Hi!
>>>
>>> I tried to write to the MMC card; process hung and I got this in the
>>> dmesg.
>>
>>
>> A similar report for 4.13 cycle was here:
>>
>> https://lkml.org/lkml/2017/8/10/824
>>
>> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
>> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
>> it everytime" for which I assume it could be easy for him to find out
>> the problematic commit?
>>
>
> One obvious weakness in the new mmc_init_request() is the possibility
> that it might be called before card->bouncesz is set up.  That could
> result in bouncing being done but mq_rq->bounce_sg is null.
> This might help:
>
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index affa7370ba82..ad3e53e63abb 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
> limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>
> +   card->bouncesz = mmc_queue_calc_bouncesz(host);
> +
> mq->card = card;
> mq->queue = blk_alloc_queue(GFP_KERNEL);
> if (!mq->queue)
> @@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
> *card,
> if (mmc_can_erase(card))
> mmc_queue_setup_discard(mq->queue, card);
>
> -   card->bouncesz = mmc_queue_calc_bouncesz(host);
> if (card->bouncesz) {
> blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
> blk_queue_max_segments(mq->queue, card->bouncesz / 512);
>

Even if this fixes the problem it seems like we are papering over the
real issue, which earlier fixes also did during the release cycle for
v4.13.

Anyway I am happy to apply this as fix for 4.14, if Seraphime/Pavel
can report it solved the problem. Could you send a proper patch with
some changlog please?

I would also appreciate if can add you a small comment in the code,
why moving this line is needed.

>
> Another unrelated issue with mmc_init_request() is that mmc_exit_request()
> is not called if mmc_init_request() fails, which means mmc_init_request()
> must free anything it allocates when it fails.

Yes, the situations it's just too fragile. We need to fix the behavior
properly, although I haven't myself been able to investigate exactly
how yet.

Adding, Linus, perhaps he has some ideas.

Kind regards
Uffe


Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-06 Thread Adrian Hunter
On 06/09/17 05:44, Shawn Lin wrote:
> + Seraphime
> 
> On 2017/9/6 3:47, Pavel Machek wrote:
>> Hi!
>>
>> I tried to write to the MMC card; process hung and I got this in the
>> dmesg.
> 
> 
> A similar report for 4.13 cycle was here:
> 
> https://lkml.org/lkml/2017/8/10/824
> 
> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
> it everytime" for which I assume it could be easy for him to find out
> the problematic commit?
> 

One obvious weakness in the new mmc_init_request() is the possibility
that it might be called before card->bouncesz is set up.  That could
result in bouncing being done but mq_rq->bounce_sg is null.
This might help:


diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index affa7370ba82..ad3e53e63abb 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
+   card->bouncesz = mmc_queue_calc_bouncesz(host);
+
mq->card = card;
mq->queue = blk_alloc_queue(GFP_KERNEL);
if (!mq->queue)
@@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   card->bouncesz = mmc_queue_calc_bouncesz(host);
if (card->bouncesz) {
blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
blk_queue_max_segments(mq->queue, card->bouncesz / 512);


Another unrelated issue with mmc_init_request() is that mmc_exit_request()
is not called if mmc_init_request() fails, which means mmc_init_request()
must free anything it allocates when it fails.



Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-06 Thread Adrian Hunter
On 06/09/17 05:44, Shawn Lin wrote:
> + Seraphime
> 
> On 2017/9/6 3:47, Pavel Machek wrote:
>> Hi!
>>
>> I tried to write to the MMC card; process hung and I got this in the
>> dmesg.
> 
> 
> A similar report for 4.13 cycle was here:
> 
> https://lkml.org/lkml/2017/8/10/824
> 
> Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
> reproduce that. So maybe Seraphime can do git-bisect as he said "I get
> it everytime" for which I assume it could be easy for him to find out
> the problematic commit?
> 

One obvious weakness in the new mmc_init_request() is the possibility
that it might be called before card->bouncesz is set up.  That could
result in bouncing being done but mq_rq->bounce_sg is null.
This might help:


diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index affa7370ba82..ad3e53e63abb 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -242,6 +242,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
+   card->bouncesz = mmc_queue_calc_bouncesz(host);
+
mq->card = card;
mq->queue = blk_alloc_queue(GFP_KERNEL);
if (!mq->queue)
@@ -265,7 +267,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card 
*card,
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   card->bouncesz = mmc_queue_calc_bouncesz(host);
if (card->bouncesz) {
blk_queue_max_hw_sectors(mq->queue, card->bouncesz / 512);
blk_queue_max_segments(mq->queue, card->bouncesz / 512);


Another unrelated issue with mmc_init_request() is that mmc_exit_request()
is not called if mmc_init_request() fails, which means mmc_init_request()
must free anything it allocates when it fails.



Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-05 Thread Shawn Lin

+ Seraphime

On 2017/9/6 3:47, Pavel Machek wrote:

Hi!

I tried to write to the MMC card; process hung and I got this in the
dmesg.



A similar report for 4.13 cycle was here:

https://lkml.org/lkml/2017/8/10/824

Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
reproduce that. So maybe Seraphime can do git-bisect as he said "I get
it everytime" for which I assume it could be easy for him to find out
the problematic commit?



[15909.353822] usb 2-1.2: Product: Ethernet Gadget
[15909.353826] usb 2-1.2: Manufacturer: Linux
4.12.0-03002-gec979a4-dirty with musb-hdrc
[15909.362182] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
usb-:00:1d.0-1.2, CDC Ethernet Device, e2:f7:c0:44:db:77
[16109.316302] perf: interrupt took too long (2544 > 2500), lowering
kernel.perf_event_max_sample_rate to 78500
[18273.845406] mmc0: error -123 whilst initialising SD card
[18275.327982] mmc0: new high speed SDHC card at address 0003
[18275.328962] mmcblk0: mmc0:0003 SB08G 7.21 GiB
[18275.333698]  mmcblk0: p1 p2 p3
[18275.955293] EXT4-fs (mmcblk0p3): mounting ext3 file system using
the ext4 subsystem
[18276.013021] EXT4-fs (mmcblk0p3): recovery complete
[18276.016185] EXT4-fs (mmcblk0p3): mounted filesystem with ordered
data mode. Opts: (null)
[18306.524676]  mmcblk0: p1 p2 p3
[18439.780298] perf: interrupt took too long (3258 > 3180), lowering
kernel.perf_event_max_sample_rate to 61250
[19137.696497] perf: interrupt took too long (4082 > 4072), lowering
kernel.perf_event_max_sample_rate to 49000
[19925.945831] general protection fault:  [#1] SMP
[19925.945909] Modules linked in:
[19925.945940] CPU: 2 PID: 30540 Comm: mmcqd/0 Not tainted 4.13.0 #123
[19925.946010] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET73WW
(1.43 ) 10/12/2016
[19925.946168] task: 88001de9e800 task.stack: c900087d4000
[19925.946293] RIP: 0010:blk_rq_map_sg+0x21a/0x4a3
[19925.946376] RSP: :c900087d7d10 EFLAGS: 00010202
[19925.946479] RAX: 6b6b6b6b6b6b6b68 RBX: 2000 RCX:
2000
[19925.946622] RDX: 6b6b6b6b6b6b6b6b RSI: 6eed2000 RDI:
88009e32b9e0
[19925.946767] RBP: c900087d7d68 R08:  R09:
63188000
[19925.946919] R10: 1600 R11: 6db6db6db6db6db7 R12:

[19925.947059] R13: 1000 R14: 0002 R15:
ea00015ad5c0
[19925.947197] FS:  () GS:88019e28()
knlGS:
[19925.947347] CS:  0010 DS:  ES:  CR0: 80050033
[19925.947454] CR2: 4482a000 CR3: 0521d000 CR4:
000406a0
[19925.947586] Call Trace:
[19925.947634]  mmc_queue_map_sg+0x2f/0xa3
[19925.947703]  mmc_blk_rw_rq_prep+0x1fa/0x3d8
[19925.94]  mmc_blk_issue_rw_rq+0xe8/0x419
[19925.947852]  mmc_blk_issue_rq+0x589/0x6e7
[19925.947924]  mmc_queue_thread+0xe1/0x175
[19925.947996]  kthread+0x13d/0x145
[19925.948050]  ? mmc_blk_issue_rq+0x6e7/0x6e7
[19925.948125]  ? kthread_bind+0x2c/0x2c
[19925.948191]  ? do_int80_syscall_32+0x5c/0xb4
[19925.948269]  ? SyS_exit_group+0xf/0xf
[19925.948337]  ret_from_fork+0x22/0x30
[19925.948394] Code: 89 e8 4a 8d 74 06 ff 48 09 f7 48 39 fa 75 05 89
48 0c eb 39 48 85 c0 74 0e 48 83 20 fd 48 89 c7 e8 83 02 02 00 eb 04
48 8b 45 b0 <48> 8b 10 83 e2 03 41 f6 c7 03 74 02 0f 0b 4c 09 fa 48 89
10 8b
[19925.948980] RIP: blk_rq_map_sg+0x21a/0x4a3 RSP: c900087d7d10
[19925.958464] ---[ end trace 3ddfa379837fe00b ]---
pavel@duo:~$ uname -a
Linux duo 4.13.0 #123 SMP Mon Sep 4 10:42:23 CEST 2017 x86_64
GNU/Linux
pavel@duo:~$

Similar crash happened yesterday, but that time I got panic (blinking
capslock) and stripes on screen. I did not use MMC before.

Any ideas?

Thanks,
Pavel





Re: 4.13 on thinkpad x220: oops when writing to SD card

2017-09-05 Thread Shawn Lin

+ Seraphime

On 2017/9/6 3:47, Pavel Machek wrote:

Hi!

I tried to write to the MMC card; process hung and I got this in the
dmesg.



A similar report for 4.13 cycle was here:

https://lkml.org/lkml/2017/8/10/824

Seems 4.13-rc4 was already broken for that but unfortuantely I didn't
reproduce that. So maybe Seraphime can do git-bisect as he said "I get
it everytime" for which I assume it could be easy for him to find out
the problematic commit?



[15909.353822] usb 2-1.2: Product: Ethernet Gadget
[15909.353826] usb 2-1.2: Manufacturer: Linux
4.12.0-03002-gec979a4-dirty with musb-hdrc
[15909.362182] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
usb-:00:1d.0-1.2, CDC Ethernet Device, e2:f7:c0:44:db:77
[16109.316302] perf: interrupt took too long (2544 > 2500), lowering
kernel.perf_event_max_sample_rate to 78500
[18273.845406] mmc0: error -123 whilst initialising SD card
[18275.327982] mmc0: new high speed SDHC card at address 0003
[18275.328962] mmcblk0: mmc0:0003 SB08G 7.21 GiB
[18275.333698]  mmcblk0: p1 p2 p3
[18275.955293] EXT4-fs (mmcblk0p3): mounting ext3 file system using
the ext4 subsystem
[18276.013021] EXT4-fs (mmcblk0p3): recovery complete
[18276.016185] EXT4-fs (mmcblk0p3): mounted filesystem with ordered
data mode. Opts: (null)
[18306.524676]  mmcblk0: p1 p2 p3
[18439.780298] perf: interrupt took too long (3258 > 3180), lowering
kernel.perf_event_max_sample_rate to 61250
[19137.696497] perf: interrupt took too long (4082 > 4072), lowering
kernel.perf_event_max_sample_rate to 49000
[19925.945831] general protection fault:  [#1] SMP
[19925.945909] Modules linked in:
[19925.945940] CPU: 2 PID: 30540 Comm: mmcqd/0 Not tainted 4.13.0 #123
[19925.946010] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET73WW
(1.43 ) 10/12/2016
[19925.946168] task: 88001de9e800 task.stack: c900087d4000
[19925.946293] RIP: 0010:blk_rq_map_sg+0x21a/0x4a3
[19925.946376] RSP: :c900087d7d10 EFLAGS: 00010202
[19925.946479] RAX: 6b6b6b6b6b6b6b68 RBX: 2000 RCX:
2000
[19925.946622] RDX: 6b6b6b6b6b6b6b6b RSI: 6eed2000 RDI:
88009e32b9e0
[19925.946767] RBP: c900087d7d68 R08:  R09:
63188000
[19925.946919] R10: 1600 R11: 6db6db6db6db6db7 R12:

[19925.947059] R13: 1000 R14: 0002 R15:
ea00015ad5c0
[19925.947197] FS:  () GS:88019e28()
knlGS:
[19925.947347] CS:  0010 DS:  ES:  CR0: 80050033
[19925.947454] CR2: 4482a000 CR3: 0521d000 CR4:
000406a0
[19925.947586] Call Trace:
[19925.947634]  mmc_queue_map_sg+0x2f/0xa3
[19925.947703]  mmc_blk_rw_rq_prep+0x1fa/0x3d8
[19925.94]  mmc_blk_issue_rw_rq+0xe8/0x419
[19925.947852]  mmc_blk_issue_rq+0x589/0x6e7
[19925.947924]  mmc_queue_thread+0xe1/0x175
[19925.947996]  kthread+0x13d/0x145
[19925.948050]  ? mmc_blk_issue_rq+0x6e7/0x6e7
[19925.948125]  ? kthread_bind+0x2c/0x2c
[19925.948191]  ? do_int80_syscall_32+0x5c/0xb4
[19925.948269]  ? SyS_exit_group+0xf/0xf
[19925.948337]  ret_from_fork+0x22/0x30
[19925.948394] Code: 89 e8 4a 8d 74 06 ff 48 09 f7 48 39 fa 75 05 89
48 0c eb 39 48 85 c0 74 0e 48 83 20 fd 48 89 c7 e8 83 02 02 00 eb 04
48 8b 45 b0 <48> 8b 10 83 e2 03 41 f6 c7 03 74 02 0f 0b 4c 09 fa 48 89
10 8b
[19925.948980] RIP: blk_rq_map_sg+0x21a/0x4a3 RSP: c900087d7d10
[19925.958464] ---[ end trace 3ddfa379837fe00b ]---
pavel@duo:~$ uname -a
Linux duo 4.13.0 #123 SMP Mon Sep 4 10:42:23 CEST 2017 x86_64
GNU/Linux
pavel@duo:~$

Similar crash happened yesterday, but that time I got panic (blinking
capslock) and stripes on screen. I did not use MMC before.

Any ideas?

Thanks,
Pavel