Re: System hung in I/O when booting with sd card

2018-04-12 Thread Shawn Lin

Hi Bart,

On 2018/4/12 17:05, Shawn Lin wrote:

Hi Bart,

On 2018/4/12 9:54, Bart Van Assche wrote:

On Thu, 2018-04-12 at 09:48 +0800, Shawn Lin wrote:

I ran into 2 times that my system hung here when booting with a ext4 sd
card. No sure how to reproduce it but it seems doesn't matter with the
ext4 as I see it again with a vfat sd card, this morning with Linus'
master branch. Is it a known issue or any idea how to debug this?


Please verify whether the following patch resolves this:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus=37f9579f4c31a6d698dbf3016d7bf132f9288d30 



Thanks for your patch! I set up some devices to test reboot this morning
against Linus' master branch. Two devices out of 5, were already hang
for that without your patch, but the other 5 are still work with your
patch.

Will update the result tomorrow morning.


I think your patch solve this. Thanks.

Tested-by: Shawn Lin <shawn@rock-chips.com>





Thanks,

Bart.












Re: System hung in I/O when booting with sd card

2018-04-12 Thread Shawn Lin

Hi Bart,

On 2018/4/12 9:54, Bart Van Assche wrote:

On Thu, 2018-04-12 at 09:48 +0800, Shawn Lin wrote:

I ran into 2 times that my system hung here when booting with a ext4 sd
card. No sure how to reproduce it but it seems doesn't matter with the
ext4 as I see it again with a vfat sd card, this morning with Linus'
master branch. Is it a known issue or any idea how to debug this?


Please verify whether the following patch resolves this:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus=37f9579f4c31a6d698dbf3016d7bf132f9288d30


Thanks for your patch! I set up some devices to test reboot this morning
against Linus' master branch. Two devices out of 5, were already hang
for that without your patch, but the other 5 are still work with your
patch.

Will update the result tomorrow morning.



Thanks,

Bart.








System hung in I/O when booting with sd card

2018-04-11 Thread Shawn Lin

Hi,

I ran into 2 times that my system hung here when booting with a ext4 sd
card. No sure how to reproduce it but it seems doesn't matter with the
ext4 as I see it again with a vfat sd card, this morning with Linus'
master branch. Is it a known issue or any idea how to debug this?


[  110.877652] Unable to handle kernel NULL pointer dereference at 
virtual address 00c8

[  110.878360] Mem abort info:
[  110.878607]   ESR = 0x9606
[  110.878876]   Exception class = DABT (current EL), IL = 32 bits
[  110.879391]   SET = 0, FnV = 0
[  110.879659]   EA = 0, S1PTW = 0
[  110.879957] Data abort info:
[  110.880210]   ISV = 0, ISS = 0x0006
[  110.880545]   CM = 0, WnR = 0
[  110.880807] user pgtable: 4k pages, 48-bit VAs, pgdp = ed0ed92e
[  110.881382] [00c8] pgd=f015e003, 
pud=f0173003, pmd=

[  110.882143] Internal error: Oops: 9606 [#1] PREEMPT SMP
[  110.882629] Modules linked in:
[  110.882902] CPU: 4 PID: 1777 Comm: ls Not tainted 
4.16.0-next-20180410-00013-geabffa6-dirty #315

[  110.883665] Hardware name: Excavator-RK3399 Board (DT)
[  110.884114] pstate: 4005 (nZcv daif -PAN -UAO)
[  110.884542] pc : percpu_counter_add_batch+0x2c/0x100
[  110.884977] lr : generic_make_request_checks+0x214/0x478
[  110.885440] sp : 0c403960
[  110.885730] x29: 0c403960 x28: 8000efcba6e0
[  110.886196] x27: 08a81000 x26: 082643f8
[  110.886662] x25: 08263828 x24: 3000
[  110.887127] x23: 1000 x22: 
[  110.887591] x21: 1000 x20: 00a8
[  110.888056] x19: 8000f0d7dc00 x18: 0001
[  110.888521] x17: 004ab8a8 x16: 08241628
[  110.888986] x15: ff80 x14: 7e0003bbaf80
[  110.889451] x13: 00c5 x12: 09069b88
[  110.889916] x11: 0040 x10: 8000efc57dc0
[  110.890380] x9 : 8000efc57e90 x8 : 8000f0d3b138
[  110.890844] x7 :  x6 : 8000f0d7dc88
[  110.891309] x5 : 0e56 x4 : 0001
[  110.891774] x3 : 8000f0cab800 x2 : 3fff
[  110.892239] x1 : 8000edf3 x0 : 00a8
[  110.892705] Process ls (pid: 1777, stack limit = 0xb3fb6733)
[  110.893258] Call trace:
[  110.893475]  percpu_counter_add_batch+0x2c/0x100
[  110.893879]  generic_make_request_checks+0x214/0x478
[  110.894313]  generic_make_request+0x34/0x260
[  110.894686]  submit_bio+0xcc/0x1b0
[  110.894988]  ll_rw_block+0xc0/0x100
[  110.895295]  ext4_bread+0x74/0xc0
[  110.895587]  __ext4_read_dirblock+0x3c/0x2d8
[  110.895960]  htree_dirblock_to_tree+0x70/0x1d8
[  110.896349]  ext4_htree_fill_tree+0xa4/0x2c8
[  110.896723]  ext4_readdir+0x5f4/0x7f0
[  110.897046]  iterate_dir+0x9c/0x1a8
[  110.897351]  ksys_getdents64+0x8c/0x168
[  110.897687]  sys_getdents64+0xc/0x18
[  110.898001]  el0_svc_naked+0x30/0x34
[  110.898315] Code: b9401064 11000484 b9001064 d538d081 (f9401000)
[  110.898848] ---[ end trace 3b0d37baa3bb2fb3 ]---
[  110.899260] note: ls[1777] exited with preempt_count 1
[  110.899852] WARNING: CPU: 4 PID: 1777 at kernel/rcu/tree_plugin.h:330 
rcu_note_context_switch+0x30/0x3b8

[  110.900676] Modules linked in:
Se[gmentation fault  110.900947
[root@rockchip:/]# ] CPU: 4 PID: 1777 Comm: ls Tainted: G  D 
  4.16.0-next-20180410-00013-geabffa6-dirty #315

[  110.902121] Hardware name: Excavator-RK3399 Board (DT)
[  110.902570] pstate: 2085 (nzCv daIf -PAN -UAO)
[  110.902990] pc : rcu_note_context_switch+0x30/0x3b8
[  110.903417] lr : rcu_note_context_switch+0x1c/0x3b8
[  110.903841] sp : 0c403480
[  110.904132] x29: 0c403480 x28: 0c403820
[  110.904597] x27:  x26: 8000f0cab800
[  110.905062] x25: 080fdaa8 x24: 09069000
[  110.905527] x23: 09042000 x22: 8000f0cab800
[  110.905992] x21: 8000f6f7ed00 x20: 
[  110.906456] x19: 8000f0cab800 x18: 
[  110.906921] x17: 004ab8a8 x16: 08241628
[  110.907386] x15:  x14: 0400
[  110.907851] x13: 091b6200 x12: 0004
[  110.908316] x11: 0019c85d4000 x10: 0400
[  110.908781] x9 : 0004 x8 : 8000f02e6a00
[  110.909246] x7 : 8000f6f7f760 x6 : 090ba86f
[  110.909711] x5 :  x4 : 
[  110.910176] x3 : 8000edf3 x2 : 0004
[  110.910641] x1 : 0904fa98 x0 : 0001
[  110.911106] Call trace:
[  110.911322]  rcu_note_context_switch+0x30/0x3b8
[  110.911722]  __schedule+0x90/0x600
[  110.912022]  do_task_dead+0x40/0x48
[  110.912328]  do_exit+0x6c8/0x9b8
[  110.912613]  die+0x1cc/0x1f8
[  110.912868]  __do_kernel_fault+0xa4/0xf8
[  110.913212]  do_page_fault+0x1f0/0x428
[  110.913541]  do_translation_fault+0x5c/0x68
[  110.913908]  do_mem_abort+0x54/0xd8
[  110.914215]  el1_da+0x20/0x80
[  110.914476]  

[PATCH] mmc: block: Delete gendisk before cleaning up the request queue

2018-03-22 Thread Shawn Lin
dd if=/dev/urandom of=/dev/mmcblk1 bs=4k count=1
with a SD card hotplug during transfer reports a warning below
introduced by commit a063057d7c73 ("block: Fix a race between
request queue removal and the block cgroup controller"). So we
should now remove the disk, partition and bdi sysfs attributes
before cleaning up the request queue associated with the disk.

[  410.331226] mmc1: card 59b4 removed
[  410.348583] WARNING: CPU: 0 PID: 5 at block/blk-core.c:785
blk_cleanup_queue+0x138/0x140
[  410.349294] Modules linked in:
[  410.349570] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted
4.16.0-rc6-next-20180321-4-gc2ad6a7 #263
[  410.350363] Hardware name: Excavator-RK3399 Board (DT)
[  410.350819] Workqueue: events_freezable mmc_rescan
[  410.351242] pstate: 6005 (nZCv daif -PAN -UAO)
[  410.351663] pc : blk_cleanup_queue+0x138/0x140
[  410.352054] lr : blk_cleanup_queue+0xac/0x140
[  410.352436] sp : 092cbb90
[  410.352727] x29: 092cbb90 x28: 
[  410.353195] x27: 8000f6f23030 x26: 0904e610
[  410.353662] x25: 8000f17cc808 x24: 8000f1038200
[  410.354128] x23: 0060 x22: 
[  410.354595] x21: 8000f11748d8 x20: 8000f1038200
[  410.355061] x19: 8000f1174200 x18: 936347d8
[  410.355528] x17: 935b93c0 x16: 081263f8
[  410.355994] x15:  x14: 0400
[  410.356461] x13: 0001 x12: 0001
[  410.356927] x11: 0040 x10: 8000f2400028
[  410.357393] x9 : 8000f2400040 x8 : 
[  410.357860] x7 : 8000f6f3a340 x6 : 8000f6f3a340
[  410.358326] x5 : 8000f240 x4 : 8000f6f3a340
[  410.358792] x3 :  x2 : 39c1333e45670800
[  410.359259] x1 :  x0 : 0003
[  410.359726] Call trace:
[  410.359943]  blk_cleanup_queue+0x138/0x140
[  410.360305]  mmc_cleanup_queue+0x2c/0x48
[  410.360652]  mmc_blk_remove_req+0x1c/0x98
[  410.361005]  mmc_blk_remove+0x180/0x1c0
[  410.361343]  mmc_bus_remove+0x1c/0x28
[  410.361670]  device_release_driver_internal+0x154/0x1f0
[  410.362128]  device_release_driver+0x14/0x20
[  410.362504]  bus_remove_device+0xc8/0x108
[  410.362858]  device_del+0x120/0x350
[  410.363167]  mmc_remove_card+0x5c/0xb8
[  410.363498]  mmc_sd_detect+0x40/0x78
[  410.363813]  mmc_rescan+0x19c/0x368
[  410.364123]  process_one_work+0x1ac/0x318
[  410.364477]  worker_thread+0x50/0x450
[  410.364801]  kthread+0xf8/0x128
[  410.365081]  ret_from_fork+0x10/0x18
[  410.365395] ---[ end trace 268e87a46c28968c ]---

Signed-off-by: Shawn Lin <shawn@rock-chips.com>
---

 drivers/mmc/core/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index a2b9c25..02485e3 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2659,7 +2659,6 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md)
 * from being accepted.
 */
card = md->queue.card;
-   mmc_cleanup_queue(>queue);
if (md->disk->flags & GENHD_FL_UP) {
device_remove_file(disk_to_dev(md->disk), 
>force_ro);
if ((md->area_type & MMC_BLK_DATA_AREA_BOOT) &&
@@ -2669,6 +2668,7 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md)
 
del_gendisk(md->disk);
}
+   mmc_cleanup_queue(>queue);
mmc_blk_put(md);
}
 }
-- 
1.9.1




Re: [PATCH 1/2 v6] mmc: block: Convert RPMB to a character device

2017-09-27 Thread Shawn Lin

On 2017/9/28 3:45, Linus Walleij wrote:

On Wed, Sep 27, 2017 at 1:35 PM, Adrian Hunter  wrote:


BUG when removing, fixed by reverting this patch.

[  346.548512] mmc1: card 0001 removed
[  346.552782] BUG: unable to handle kernel NULL pointer dereference at 
0070


How did you achieve this? I need to reproduce it.

RPMB is only available on eMMC cards and I don't have a removable eMMC
card since by definition they are soldered on.

Does it happen on an ordinary MMC card without RPMB?

I'm sorry if I'm ignorant of something very obvious here...

I will read the code and see if I'm missing something obvious.


Haven't closely look at this patch, but btw, don't you need
put_device(>dev) for the error path of  mmc_blk_alloc_rpmb_part
and  also do that for mmc_blk_remove_rpmb_part?

And don't you need to check rpmb->md->usage when doing
mmc_blk_remove_rpmb_part ?



Yours,
Linus Walleij







Re: [PATCH 1/5] mmc: block: Move duplicate check

2017-06-15 Thread Shawn Lin

Hi,

On 2017/6/15 20:12, Linus Walleij wrote:

mmc_blk_ioctl() calls either mmc_blk_ioctl_cmd() or
mmc_blk_ioctl_multi_cmd() and each of these make the same
check. Factor it into a new helper function, call it on
both branches of the switch() statement and save a chunk
of duplicate code.

Cc: Shawn Lin <shawn@rock-chips.com>
Signed-off-by: Linus Walleij <linus.wall...@linaro.org>
---
ChangeLog v1->v2:
- We need to check the block device only if an actual
   well-known ioctl() is coming in, on the path of the
   switch() statments, only those branches that handle
   actual ioctl()s. Create a new helper function to check
   the block device and call that.


Reviewed-by: Shawn Lin <shawn@rock-chips.com>


---
  drivers/mmc/core/block.c | 36 
  1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 1ce6012ce3c1..d1b824e65590 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -566,14 +566,6 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
int err = 0, ioc_err = 0;
struct request *req;
  
-	/*

-* The caller must have CAP_SYS_RAWIO, and must be calling this on the
-* whole block device, not on a partition.  This prevents overspray
-* between sibling partitions.
-*/
-   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
-   return -EPERM;
-
idata = mmc_blk_ioctl_copy_from_user(ic_ptr);
if (IS_ERR(idata))
return PTR_ERR(idata);
@@ -626,14 +618,6 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
__u64 num_of_cmds;
struct request *req;
  
-	/*

-* The caller must have CAP_SYS_RAWIO, and must be calling this on the
-* whole block device, not on a partition.  This prevents overspray
-* between sibling partitions.
-*/
-   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
-   return -EPERM;
-
if (copy_from_user(_of_cmds, >num_of_cmds,
   sizeof(num_of_cmds)))
return -EFAULT;
@@ -697,14 +681,34 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
return ioc_err ? ioc_err : err;
  }
  
+static int mmc_blk_check_blkdev(struct block_device *bdev)

+{
+   /*
+* The caller must have CAP_SYS_RAWIO, and must be calling this on the
+* whole block device, not on a partition.  This prevents overspray
+* between sibling partitions.
+*/
+   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
+   return -EPERM;
+   return 0;
+}
+
  static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
  {
+   int ret;
+
switch (cmd) {
case MMC_IOC_CMD:
+   ret = mmc_blk_check_blkdev(bdev);
+   if (ret)
+   return ret;
return mmc_blk_ioctl_cmd(bdev,
(struct mmc_ioc_cmd __user *)arg);
case MMC_IOC_MULTI_CMD:
+   ret = mmc_blk_check_blkdev(bdev);
+   if (ret)
+   return ret;
return mmc_blk_ioctl_multi_cmd(bdev,
(struct mmc_ioc_multi_cmd __user *)arg);
default:





Re: [PATCH v4] mmc: sdio: check the buffer address for sdio API

2017-02-14 Thread Shawn Lin

Hi Russell,

On 2017/2/15 3:34, Russell King - ARM Linux wrote:

On Tue, Feb 14, 2017 at 09:18:43AM -0700, Jens Axboe wrote:

The current situation seems like a bit of a mess. Why don't you have two
entry points, one for DMA and one for PIO. If the caller doesn't know if
he can use DMA, he'd better call the PIO variant. Either that, or audit
all callers and ensure they do the right thing wrt having a dma capable
buffer.


It really shouldn't matter.  MMC interfaces are just like USB - you
have a host controller, which interfaces what is a multi-lane serial
bus to the system.  The SDIO card shouldn't care one bit whether
the host controller is using DMA or PIO.

However, I think that the DMA vs PIO thing is actually misleading here,
that's really not the issue at all.

Looking at the oops, I see it uses sdio_memcpy_toio().  Tracing that
code leads me to here:

for_each_sg(data.sg, sg_ptr, data.sg_len, i) {
sg_set_page(sg_ptr, virt_to_page(buf + (i * seg_size)),
min(seg_size, left_size),
offset_in_page(buf + (i * seg_size)));

so the buffer that is passed into sdio_memcpy_toio() gets passed into
virt_to_page().

Firstly, the fact that it's passed to virt_to_page() means that "buf"
must _only_ _ever_ be a lowmem address.  It can't ever be a vmalloc
address (virt_to_page() is invalid on anything but lowmem.)  Just like
certain kernel interfaces, passing pointers to memory of different types
from the one intended by the interface produces invalid results, and
that seems to be what's happening here.

Secondly, it's a scatterlist, and scatterlists can be passed to DMA
mapping operations, which also implies that _if_ a host driver decides
to use DMA on it, the buffer better be DMA-able.

Thirdly, while PIO may work (or even appear to work) because _maybe_
converting a vmalloc address to a ficticious struct page + offset, and
then converting that back again _might_ result in hitting the correct
memory, but it's not guaranteed to.



[1]:
If no DMA involved, the host drivers usually use memcpy or readl/writel
to transfer the data between MMIO address and buffer coming from the
caller. So, is it also not guaranteed when using memcpy or readl to
transfer data between MMIO address and vmalloc/heap buffer?


I suspect that virt_to_page() + kmap_atomic() is likely to try to
dereference a struct page pointer that does not point at a legal entry
in the memmap arrays, and result in scribbling over some random part
of kernel memory.


If that is the fact, so what I am concerned mostly is that by
seraching the APIs, sdio_writeb and sdio_readb, under the drivers/net
/wireless/, I could see almost all sdio based WLAN drivers passed in
a vmalloc area(actually when built as moudle, it should be located in
MODULE range which also be included as vmalloc area, no?) or heap
buffer.

I assume my question[1] above is fine, then thanks to none of the mmc
host drivers use DMA for sdio_writeb and sdio_readb since it only
request one byte which didn't be fetched from host FIFO and the host
controller HW didn't support this kind of request to use DMA(but may be
not in the future). Otherwise, it may result in scribbling over some
random part of kernel memory.

Actually we didn't see that issues before, so I think the actual
question should be if the buffer from heap or vmalloc will be used with 
DMA involved?





So every way I look at this, the binary driver that Shawn downloaded
is buggy, whether the host controller uses PIO or DMA.


So it's really more dangerous that we were/are taking risking of
scribbling over some memory belonging to other code even if using
PIO if it also not guaranteed to use heap/vmalloc buffer..



I bet if Shawn tries running it against a modern kernel with
CONFIG_DEBUG_VIRTUAL enabled, Shawn will get complaints backing up
my claim.