Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-28 Thread Jon Hunter


On 28/02/2019 07:51, Marek Szyprowski wrote:
> Hi Ming,
> 
> On 2019-02-28 00:29, Ming Lei wrote:
>> On Wed, Feb 27, 2019 at 08:47:09PM +, Jon Hunter wrote:
>>> On 21/02/2019 08:42, Marek Szyprowski wrote:
>>>> On 2019-02-15 12:13, Ming Lei wrote:
>>>>> This patch pulls the trigger for multi-page bvecs.
>>>>>
>>>>> Reviewed-by: Omar Sandoval
>>>>> Signed-off-by: Ming Lei
>>>> Since Linux next-20190218 I've observed problems with block layer on one
>>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>>>> this issue led me to this change. This is also the first linux-next
>>>> release with this change merged. The issue is fully reproducible and can
>>>> be observed in the following kernel log:
>>>>
>>>> sdhci: Secure Digital Host Controller Interface driver
>>>> sdhci: Copyright(c) Pierre Ossman
>>>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
>>>> s3c-sdhci 1253.sdhci: Got CD GPIO
>>>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
>>>> mmc0: new high speed SDHC card at address 
>>>> mmcblk0: mmc0: SL16G 14.8 GiB
>>> I have also noticed some failures when writing to an eMMC device on one
>>> of our Tegra boards. We have a simple eMMC write/read test and it is
>>> currently failing because the data written does not match the source.
>>>
>>> I did not see the same crash as reported here; however, in our case the
>>> rootfs is NFS mounted and so probably would not. However, the bisect
>>> points to this commit and reverting on top of -next fixes the issues.
>> It is sdhci, probably related to the max segment size. Could you test the
>> following patch:
>>
>> https://marc.info/?l=linux-mmc&m=155128334122951&w=2
> 
> This seems to be fixing my issue too! Thanks!

Thanks, I can confirm this fixes the issue for Tegra. So feel free to
add my ...

Tested-by: Jon Hunter 

Cheers!
Jon

-- 
nvpublic



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-28 Thread Jon Hunter


On 21/02/2019 08:42, Marek Szyprowski wrote:
> Dear All,
> 
> On 2019-02-15 12:13, Ming Lei wrote:
>> This patch pulls the trigger for multi-page bvecs.
>>
>> Reviewed-by: Omar Sandoval 
>> Signed-off-by: Ming Lei 
> 
> Since Linux next-20190218 I've observed problems with block layer on one
> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
> this issue led me to this change. This is also the first linux-next
> release with this change merged. The issue is fully reproducible and can
> be observed in the following kernel log:
> 
> sdhci: Secure Digital Host Controller Interface driver
> sdhci: Copyright(c) Pierre Ossman
> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
> s3c-sdhci 1253.sdhci: Got CD GPIO
> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
> mmc0: new high speed SDHC card at address 
> mmcblk0: mmc0: SL16G 14.8 GiB
I have also noticed some failures when writing to an eMMC device on one
of our Tegra boards. We have a simple eMMC write/read test and it is
currently failing because the data written does not match the source.

I did not see the same crash as reported here; however, in our case the
rootfs is NFS mounted and so probably would not. However, the bisect
points to this commit and reverting on top of -next fixes the issues.
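A write/read-back check of the kind described above can be sketched as follows. This is a hypothetical user-space illustration, not the actual Tegra test: it writes a generated pattern at offset 0 of an already-open descriptor with pwrite(), flushes with fsync(), reads it back with pread() and compares with memcmp(). The function name write_readback_ok() and the byte pattern are assumptions made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical write/read-back check: write a known pattern at offset 0
 * of an open descriptor, flush it, read it back and compare.
 * Returns true only if every byte read back matches what was written.
 * Short reads/writes are treated as failures to keep the sketch simple.
 */
static bool write_readback_ok(int fd, size_t len)
{
	unsigned char *src, *dst;
	bool ok = false;

	assert(fd >= 0); /* caller must pass an open read/write descriptor */

	src = malloc(len);
	dst = malloc(len);
	if (!src || !dst)
		goto out;

	/* Simple repeating byte pattern (period 256) rather than all zeros. */
	for (size_t i = 0; i < len; i++)
		src[i] = (unsigned char)(i * 31 + 7);

	if (pwrite(fd, src, len, 0) != (ssize_t)len)
		goto out;
	if (fsync(fd) != 0)
		goto out;

	memset(dst, 0, len);
	if (pread(fd, dst, len, 0) != (ssize_t)len)
		goto out;

	ok = memcmp(src, dst, len) == 0;
out:
	free(src);
	free(dst);
	return ok;
}
```

One caveat: against a block device (or a file on it) the read may be served from the page cache rather than the device, so a real test would typically use O_DIRECT with suitably aligned buffers, or drop caches between the write and read phases.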

Cheers
Jon

-- 
nvpublic



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-27 Thread Marek Szyprowski
Hi Ming,

On 2019-02-28 00:29, Ming Lei wrote:
> On Wed, Feb 27, 2019 at 08:47:09PM +, Jon Hunter wrote:
>> On 21/02/2019 08:42, Marek Szyprowski wrote:
>>> On 2019-02-15 12:13, Ming Lei wrote:
>>>> This patch pulls the trigger for multi-page bvecs.
>>>>
>>>> Reviewed-by: Omar Sandoval
>>>> Signed-off-by: Ming Lei
>>> Since Linux next-20190218 I've observed problems with block layer on one
>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>>> this issue led me to this change. This is also the first linux-next
>>> release with this change merged. The issue is fully reproducible and can
>>> be observed in the following kernel log:
>>>
>>> sdhci: Secure Digital Host Controller Interface driver
>>> sdhci: Copyright(c) Pierre Ossman
>>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
>>> s3c-sdhci 1253.sdhci: Got CD GPIO
>>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
>>> mmc0: new high speed SDHC card at address 
>>> mmcblk0: mmc0: SL16G 14.8 GiB
>> I have also noticed some failures when writing to an eMMC device on one
>> of our Tegra boards. We have a simple eMMC write/read test and it is
>> currently failing because the data written does not match the source.
>>
>> I did not see the same crash as reported here; however, in our case the
>> rootfs is NFS mounted and so probably would not. However, the bisect
>> points to this commit and reverting on top of -next fixes the issues.
> It is sdhci, probably related to the max segment size. Could you test the
> following patch:
>
> https://marc.info/?l=linux-mmc&m=155128334122951&w=2

This seems to be fixing my issue too! Thanks!

It also fixed the boot issue from a USB stick (Exynos EHCI / Mass
Storage), but I suspect that reading the partition table from the SD
card (which holds the bootloader and thus must be present to boot the
device) was enough to trash memory/page cache and break the boot process.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-27 Thread Ming Lei
On Wed, Feb 27, 2019 at 08:47:09PM +, Jon Hunter wrote:
> 
> On 21/02/2019 08:42, Marek Szyprowski wrote:
> > Dear All,
> > 
> > On 2019-02-15 12:13, Ming Lei wrote:
> >> This patch pulls the trigger for multi-page bvecs.
> >>
> >> Reviewed-by: Omar Sandoval 
> >> Signed-off-by: Ming Lei 
> > 
> > Since Linux next-20190218 I've observed problems with block layer on one
> > of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
> > this issue led me to this change. This is also the first linux-next
> > release with this change merged. The issue is fully reproducible and can
> > be observed in the following kernel log:
> > 
> > sdhci: Secure Digital Host Controller Interface driver
> > sdhci: Copyright(c) Pierre Ossman
> > s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
> > s3c-sdhci 1253.sdhci: Got CD GPIO
> > mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
> > mmc0: new high speed SDHC card at address 
> > mmcblk0: mmc0: SL16G 14.8 GiB
> I have also noticed some failures when writing to an eMMC device on one
> of our Tegra boards. We have a simple eMMC write/read test and it is
> currently failing because the data written does not match the source.
> 
> I did not see the same crash as reported here; however, in our case the
> rootfs is NFS mounted and so probably would not. However, the bisect
> points to this commit and reverting on top of -next fixes the issues.

It is sdhci, probably related to the max segment size. Could you test the
following patch:

https://marc.info/?l=linux-mmc&m=155128334122951&w=2
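Background on the max-segment-size hypothesis, as a reading of it rather than a confirmed diagnosis: before this series each bvec covered at most one page, while with multi-page bvecs a single bvec can span several physically contiguous pages. If something along the I/O path does not respect the controller's real maximum segment size (drivers advertise a limit with blk_queue_max_segment_size()), a transfer could be set up incorrectly. The sketch below is a hypothetical user-space illustration of honouring such a limit by splitting one segment into chunks; split_segment() and struct seg_sketch are illustrative names, and this is not the content of the linked patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA segment descriptor: start address and length. */
struct seg_sketch {
	uint64_t addr;
	unsigned int len;
};

/*
 * Split one (possibly multi-page) segment into pieces no longer than
 * max_seg bytes, as something along the I/O path has to do for a
 * controller with that limit. Returns the number of pieces written to
 * out, or 0 if out_cap is too small to hold them all.
 */
static size_t split_segment(uint64_t addr, unsigned int len,
			    unsigned int max_seg, struct seg_sketch *out,
			    size_t out_cap)
{
	size_t n = 0;

	assert(max_seg > 0); /* a zero limit would never make progress */

	while (len > 0) {
		unsigned int chunk = len < max_seg ? len : max_seg;

		if (n == out_cap)
			return 0;
		out[n].addr = addr;
		out[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, a 12 KiB segment with a 4 KiB limit becomes three 4 KiB pieces at consecutive addresses, while a 512-byte segment passes through unchanged.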

Thanks,
Ming



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-22 Thread Ming Lei
On Thu, Feb 21, 2019 at 11:22:39AM +0100, Marek Szyprowski wrote:
> Hi Ming,
> 
> On 2019-02-21 11:16, Ming Lei wrote:
> > On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
> >> On 2019-02-21 10:57, Ming Lei wrote:
> >>> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
> >>>> On 2019-02-15 12:13, Ming Lei wrote:
> >>>>> This patch pulls the trigger for multi-page bvecs.
> >>>>>
> >>>>> Reviewed-by: Omar Sandoval
> >>>>> Signed-off-by: Ming Lei
> >>>> Since Linux next-20190218 I've observed problems with block layer on one
> >>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
> >>>> this issue led me to this change. This is also the first linux-next
> >>>> release with this change merged. The issue is fully reproducible and can
> >>>> be observed in the following kernel log:
> >>>>
> >>>> sdhci: Secure Digital Host Controller Interface driver
> >>>> sdhci: Copyright(c) Pierre Ossman
> >>>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
> >>>> s3c-sdhci 1253.sdhci: Got CD GPIO
> >>>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
> >>>> mmc0: new high speed SDHC card at address 
> >>>> mmcblk0: mmc0: SL16G 14.8 GiB
> >>>>
> >>>> ...
> >>>>
> >>>> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
> >>>> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
> >>>> EXT4-fs (mmcblk0p2): recovery complete
> >>>> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
> >>>> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
> >>>> devtmpfs: mounted
> >>>> Freeing unused kernel memory: 1024K
> >>>> hub 1-3:1.0: USB hub found
> >>>> Run /sbin/init as init process
> >>>> hub 1-3:1.0: 3 ports detected
> >>>> *** stack smashing detected ***:  terminated
> >>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
> >>>> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
> >>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >>>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >>>> [] (show_stack) from [] (dump_stack+0x90/0xc8)
> >>>> [] (dump_stack) from [] (panic+0xfc/0x304)
> >>>> [] (panic) from [] (do_exit+0xabc/0xc6c)
> >>>> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
> >>>> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
> >>>> [] (get_signal) from [] (do_work_pending+0x130/0x618)
> >>>> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
> >>>> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
> >>>> 3fa0:  bea7787c 0005 b6e8d0b8
> >>>> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
> >>>> 3fe0: 0020 bea77998  b6d52368 6050
> >>>> CPU3: stopping
> >>>>
> >>>> I would like to help debug and fix this issue, but I don't really
> >>>> have any idea where to start. Here is some more detailed information
> >>>> about my test system:
> >>>>
> >>>> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
> >>>> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
> >>>>
> >>>> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
> >>>> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
> >>>> tree)
> >>>>
> >>>> 3. Rootfs: Ext4
> >>>>
> >>>> 4. Kernel config: arch/arm/configs/exynos_defconfig
> >>>>
> >>>> I can gather more logs if needed; just let me know which kernel options
> >>>> to enable. Reverting this commit on top of next-20190218 as well as current
> >>>> linux-next (tested with next-20190221) fixes this issue and makes the
> >>>> system bootable again.
> >>> Could you test the patch in the following link and see if it can make a 
> >>> difference?
> >>>
> >>> https://marc.info/?l=linux-aio&m=155070355614541&w=2
> >> I've tested that patch, but it doesn't make any difference on the test
> >> system. In the log I see no warning added by it.
> > I guess it might be related to memory corruption; could you enable the
> > following debug options and post the dmesg log?
> >
> > CONFIG_DEBUG_STACKOVERFLOW=y
> > CONFIG_KASAN=y
> 
> It won't be that easy, as neither of the above options is available on
> 32-bit ARM. I will try to apply some of the ARM KASAN patches floating
> around on the net and let you know the result.

Hi Marek,

Could you test the following patch?

diff --git a/block/bounce.c b/block/bounce.c
index add085e28b1d..0c618c0b3cf8 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -295,7 +295,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
bool bounce = false;
int sectors = 0;
bool passthrough = bio_is_passthrough(*bio_orig);
-   struct bvec_iter_all iter_all;
 
bio_for_each_segment(from, *bio_orig, iter) {
if (i++ < BIO_MAX_PAGES)
@@ -315,7 +314,8 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
bio = bounce_clo

Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-22 Thread Marek Szyprowski
Hi Ming,

On 2019-02-21 10:57, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
>> On 2019-02-15 12:13, Ming Lei wrote:
>>> This patch pulls the trigger for multi-page bvecs.
>>>
>>> Reviewed-by: Omar Sandoval 
>>> Signed-off-by: Ming Lei 
>> Since Linux next-20190218 I've observed problems with block layer on one
>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>> this issue led me to this change. This is also the first linux-next
>> release with this change merged. The issue is fully reproducible and can
>> be observed in the following kernel log:
>>
>> sdhci: Secure Digital Host Controller Interface driver
>> sdhci: Copyright(c) Pierre Ossman
>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
>> s3c-sdhci 1253.sdhci: Got CD GPIO
>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
>> mmc0: new high speed SDHC card at address 
>> mmcblk0: mmc0: SL16G 14.8 GiB
>>
>> ...
>>
>> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
>> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
>> EXT4-fs (mmcblk0p2): recovery complete
>> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
>> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
>> devtmpfs: mounted
>> Freeing unused kernel memory: 1024K
>> hub 1-3:1.0: USB hub found
>> Run /sbin/init as init process
>> hub 1-3:1.0: 3 ports detected
>> *** stack smashing detected ***:  terminated
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
>> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> [] (show_stack) from [] (dump_stack+0x90/0xc8)
>> [] (dump_stack) from [] (panic+0xfc/0x304)
>> [] (panic) from [] (do_exit+0xabc/0xc6c)
>> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
>> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
>> [] (get_signal) from [] (do_work_pending+0x130/0x618)
>> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
>> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
>> 3fa0:  bea7787c 0005 b6e8d0b8
>> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
>> 3fe0: 0020 bea77998  b6d52368 6050
>> CPU3: stopping
>>
>> I would like to help debug and fix this issue, but I don't really
>> have any idea where to start. Here is some more detailed information about
>> my test system:
>>
>> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
>> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
>>
>> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
>> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
>> tree)
>>
>> 3. Rootfs: Ext4
>>
>> 4. Kernel config: arch/arm/configs/exynos_defconfig
>>
>> I can gather more logs if needed; just let me know which kernel options
>> to enable. Reverting this commit on top of next-20190218 as well as current
>> linux-next (tested with next-20190221) fixes this issue and makes the
>> system bootable again.
> Could you test the patch in the following link and see if it can make a 
> difference?
>
> https://marc.info/?l=linux-aio&m=155070355614541&w=2

I've tested that patch, but it doesn't make any difference on the test
system. In the log I see no warning added by it.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-22 Thread Marek Szyprowski
Dear All,

On 2019-02-15 12:13, Ming Lei wrote:
> This patch pulls the trigger for multi-page bvecs.
>
> Reviewed-by: Omar Sandoval 
> Signed-off-by: Ming Lei 

Since Linux next-20190218 I've observed problems with block layer on one
of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
this issue led me to this change. This is also the first linux-next
release with this change merged. The issue is fully reproducible and can
be observed in the following kernel log:

sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
s3c-sdhci 1253.sdhci: Got CD GPIO
mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
mmc0: new high speed SDHC card at address 
mmcblk0: mmc0: SL16G 14.8 GiB

...

EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
EXT4-fs (mmcblk0p2): write access will be enabled during recovery
EXT4-fs (mmcblk0p2): recovery complete
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
devtmpfs: mounted
Freeing unused kernel memory: 1024K
hub 1-3:1.0: USB hub found
Run /sbin/init as init process
hub 1-3:1.0: 3 ports detected
*** stack smashing detected ***:  terminated
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x90/0xc8)
[] (dump_stack) from [] (panic+0xfc/0x304)
[] (panic) from [] (do_exit+0xabc/0xc6c)
[] (do_exit) from [] (do_group_exit+0x3c/0xbc)
[] (do_group_exit) from [] (get_signal+0x130/0xbf4)
[] (get_signal) from [] (do_work_pending+0x130/0x618)
[] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
Exception stack(0xe88c3fb0 to 0xe88c3ff8)
3fa0:  bea7787c 0005 b6e8d0b8
3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
3fe0: 0020 bea77998  b6d52368 6050
CPU3: stopping

I would like to help debug and fix this issue, but I don't really
have any idea where to start. Here is some more detailed information about
my test system:

1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
source: arch/arm/boot/dts/exynos4412-odroidu3.dts)

2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
(drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
tree)

3. Rootfs: Ext4

4. Kernel config: arch/arm/configs/exynos_defconfig

I can gather more logs if needed; just let me know which kernel options
to enable. Reverting this commit on top of next-20190218 as well as current
linux-next (tested with next-20190221) fixes this issue and makes the
system bootable again.

> ---
>  block/bio.c | 22 +++---
>  fs/iomap.c  |  4 ++--
>  fs/xfs/xfs_aops.c   |  4 ++--
>  include/linux/bio.h |  2 +-
>  4 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 968b12fea564..83a2dfa417ca 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -753,6 +753,8 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * @page: page to add
>   * @len: length of the data to add
>   * @off: offset of the data in @page
> + * @same_page: if %true only merge if the new data is in the same physical
> + *   page as the last segment of the bio.
>   *
>   * Try to add the data at @page + @off to the last bvec of @bio.  This is a
>   * a useful optimisation for file systems with a block size smaller than the
> @@ -761,19 +763,25 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * Return %true on success or %false on failure.
>   */
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> - unsigned int len, unsigned int off)
> + unsigned int len, unsigned int off, bool same_page)
>  {
>   if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>   return false;
>  
>   if (bio->bi_vcnt > 0) {
>   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
> + bv->bv_offset + bv->bv_len - 1;
> + phys_addr_t page_addr = page_to_phys(page);
>  
> - if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> - bv->bv_len += len;
> - bio->bi_iter.bi_size += len;
> - return true;
> - }
> + if (vec_end_addr + 1 != page_addr + off)
> + return false;
> + if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
> + return false;
> +
> + bv->bv_len += len;
> + bio->bi_iter.bi_size += len;
> + return true;
>   }
>   return false;
>  }
> @@ -819,7 +827,7
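For readers following the merge logic in the bio.c hunk above: with this patch, __bio_try_merge_page() no longer requires the new data to sit in the same struct page as the last bvec; it requires physical contiguity, plus the same-page restriction when @same_page is true. The following is a minimal user-space model of just those two checks, not kernel code. It assumes 4 KiB pages and replaces struct bio_vec, page_to_phys() and PAGE_MASK with hypothetical stand-ins (struct bvec_sketch, a precomputed physical address, SKETCH_PAGE_MASK).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions, assuming 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Simplified stand-in for struct bio_vec: bv_page is replaced by the
 * physical address page_to_phys() would return for it. */
struct bvec_sketch {
	uint64_t page_phys;  /* page_to_phys(bv->bv_page) */
	unsigned int offset; /* bv->bv_offset */
	unsigned int len;    /* bv->bv_len */
};

/*
 * Models the two checks the patch adds to __bio_try_merge_page():
 *  1. the new data must start at the byte right after the last bvec ends
 *     (physical contiguity, no longer "same struct page");
 *  2. if same_page is set, the last byte of the bvec must lie in the
 *     page that the new data lives in.
 * On success the bvec and the bio size (bi_size) grow by len.
 */
static bool try_merge_sketch(struct bvec_sketch *bv, uint64_t *bi_size,
			     uint64_t page_phys, unsigned int off,
			     unsigned int len, bool same_page)
{
	uint64_t vec_end_addr;

	assert(bv->len > 0); /* sketch-only: the last bvec is never empty */
	vec_end_addr = bv->page_phys + bv->offset + bv->len - 1;

	if (vec_end_addr + 1 != page_phys + off)
		return false;
	if (same_page && (vec_end_addr & SKETCH_PAGE_MASK) != page_phys)
		return false;

	bv->len += len;
	*bi_size += len;
	return true;
}
```

The design point this models: when same_page is false, a full page at 0x1000 and the next physical page at 0x2000 merge into one 8 KiB bvec, which is the kind of larger-than-a-page segment that Ming's max-segment-size suspicion in this thread is about.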

Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-22 Thread Marek Szyprowski
Hi Ming,

On 2019-02-21 11:16, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
>> On 2019-02-21 10:57, Ming Lei wrote:
>>> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
>>>> On 2019-02-15 12:13, Ming Lei wrote:
>>>>> This patch pulls the trigger for multi-page bvecs.
>>>>>
>>>>> Reviewed-by: Omar Sandoval
>>>>> Signed-off-by: Ming Lei
>>>> Since Linux next-20190218 I've observed problems with block layer on one
>>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>>>> this issue led me to this change. This is also the first linux-next
>>>> release with this change merged. The issue is fully reproducible and can
>>>> be observed in the following kernel log:
>>>>
>>>> sdhci: Secure Digital Host Controller Interface driver
>>>> sdhci: Copyright(c) Pierre Ossman
>>>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
>>>> s3c-sdhci 1253.sdhci: Got CD GPIO
>>>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
>>>> mmc0: new high speed SDHC card at address 
>>>> mmcblk0: mmc0: SL16G 14.8 GiB
>>>>
>>>> ...
>>>>
>>>> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
>>>> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
>>>> EXT4-fs (mmcblk0p2): recovery complete
>>>> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
>>>> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
>>>> devtmpfs: mounted
>>>> Freeing unused kernel memory: 1024K
>>>> hub 1-3:1.0: USB hub found
>>>> Run /sbin/init as init process
>>>> hub 1-3:1.0: 3 ports detected
>>>> *** stack smashing detected ***:  terminated
>>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
>>>> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
>>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>>>> [] (show_stack) from [] (dump_stack+0x90/0xc8)
>>>> [] (dump_stack) from [] (panic+0xfc/0x304)
>>>> [] (panic) from [] (do_exit+0xabc/0xc6c)
>>>> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
>>>> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
>>>> [] (get_signal) from [] (do_work_pending+0x130/0x618)
>>>> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
>>>> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
>>>> 3fa0:  bea7787c 0005 b6e8d0b8
>>>> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
>>>> 3fe0: 0020 bea77998  b6d52368 6050
>>>> CPU3: stopping
>>>>
>>>> I would like to help debug and fix this issue, but I don't really
>>>> have any idea where to start. Here is some more detailed information
>>>> about my test system:
>>>>
>>>> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
>>>> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
>>>>
>>>> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
>>>> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
>>>> tree)
>>>>
>>>> 3. Rootfs: Ext4
>>>>
>>>> 4. Kernel config: arch/arm/configs/exynos_defconfig
>>>>
>>>> I can gather more logs if needed; just let me know which kernel options
>>>> to enable. Reverting this commit on top of next-20190218 as well as current
>>>> linux-next (tested with next-20190221) fixes this issue and makes the
>>>> system bootable again.
>>> Could you test the patch in the following link and see if it can make a 
>>> difference?
>>>
>>> https://marc.info/?l=linux-aio&m=155070355614541&w=2
>> I've tested that patch, but it doesn't make any difference on the test
>> system. In the log I see no warning added by it.
> I guess it might be related to memory corruption; could you enable the
> following debug options and post the dmesg log?
>
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_KASAN=y

It won't be that easy, as neither of the above options is available on
32-bit ARM. I will try to apply some of the ARM KASAN patches floating
around on the net and let you know the result.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-22 Thread Marek Szyprowski
Hi Ming,

On 2019-02-21 11:38, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 11:22:39AM +0100, Marek Szyprowski wrote:
>> On 2019-02-21 11:16, Ming Lei wrote:
>>> On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
>>>> On 2019-02-21 10:57, Ming Lei wrote:
>>>>> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
>>>>>> On 2019-02-15 12:13, Ming Lei wrote:
>>>>>>> This patch pulls the trigger for multi-page bvecs.
>>>>>>>
>>>>>>> Reviewed-by: Omar Sandoval
>>>>>>> Signed-off-by: Ming Lei
>>>>>> Since Linux next-20190218 I've observed problems with block layer on one
>>>>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>>>>>> this issue led me to this change. This is also the first linux-next
>>>>>> release with this change merged. The issue is fully reproducible and can
>>>>>> be observed in the following kernel log:
>>>>>>
>>>>>> sdhci: Secure Digital Host Controller Interface driver
>>>>>> sdhci: Copyright(c) Pierre Ossman
>>>>>> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
>>>>>> s3c-sdhci 1253.sdhci: Got CD GPIO
>>>>>> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
>>>>>> mmc0: new high speed SDHC card at address 
>>>>>> mmcblk0: mmc0: SL16G 14.8 GiB
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
>>>>>> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
>>>>>> EXT4-fs (mmcblk0p2): recovery complete
>>>>>> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
>>>>>> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
>>>>>> devtmpfs: mounted
>>>>>> Freeing unused kernel memory: 1024K
>>>>>> hub 1-3:1.0: USB hub found
>>>>>> Run /sbin/init as init process
>>>>>> hub 1-3:1.0: 3 ports detected
>>>>>> *** stack smashing detected ***:  terminated
>>>>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
>>>>>> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
>>>>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>>>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>>>>>> [] (show_stack) from [] (dump_stack+0x90/0xc8)
>>>>>> [] (dump_stack) from [] (panic+0xfc/0x304)
>>>>>> [] (panic) from [] (do_exit+0xabc/0xc6c)
>>>>>> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
>>>>>> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
>>>>>> [] (get_signal) from [] (do_work_pending+0x130/0x618)
>>>>>> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
>>>>>> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
>>>>>> 3fa0:  bea7787c 0005 b6e8d0b8
>>>>>> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
>>>>>> 3fe0: 0020 bea77998  b6d52368 6050
>>>>>> CPU3: stopping
>>>>>>
>>>>>> I would like to help debug and fix this issue, but I don't really
>>>>>> have any idea where to start. Here is some more detailed information
>>>>>> about my test system:
>>>>>>
>>>>>> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
>>>>>> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
>>>>>>
>>>>>> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
>>>>>> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
>>>>>> tree)
>>>>>>
>>>>>> 3. Rootfs: Ext4
>>>>>>
>>>>>> 4. Kernel config: arch/arm/configs/exynos_defconfig
>>>>>>
>>>>>> I can gather more logs if needed; just let me know which kernel options
>>>>>> to enable. Reverting this commit on top of next-20190218 as well as current
>>>>>> linux-next (tested with next-20190221) fixes this issue and makes the
>>>>>> system bootable again.
>>>>> Could you test the patch in the following link and see if it can make a
>>>>> difference?
>>>>>
>>>>> https://marc.info/?l=linux-aio&m=155070355614541&w=2
>>>> I've tested that patch, but it doesn't make any difference on the test
>>>> system. In the log I see no warning added by it.
>>> I guess it might be related to memory corruption; could you enable the
>>> following debug options and post the dmesg log?
>>>
>>> CONFIG_DEBUG_STACKOVERFLOW=y
>>> CONFIG_KASAN=y
>> It won't be that easy, as neither of the above options is available on
>> 32-bit ARM. I will try to apply some of the ARM KASAN patches floating
>> around on the net and let you know the result.
> Hi Marek,
>
> Could you test the following patch?

Yes. Sadly, no change observed.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-21 Thread Ming Lei
On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
> Hi Ming,
> 
> On 2019-02-21 10:57, Ming Lei wrote:
> > On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
> >> On 2019-02-15 12:13, Ming Lei wrote:
> >>> This patch pulls the trigger for multi-page bvecs.
> >>>
> >>> Reviewed-by: Omar Sandoval 
> >>> Signed-off-by: Ming Lei 
> >> Since Linux next-20190218 I've observed problems with block layer on one
> >> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
> >> this issue led me to this change. This is also the first linux-next
> >> release with this change merged. The issue is fully reproducible and can
> >> be observed in the following kernel log:
> >>
> >> sdhci: Secure Digital Host Controller Interface driver
> >> sdhci: Copyright(c) Pierre Ossman
> >> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
> >> s3c-sdhci 1253.sdhci: Got CD GPIO
> >> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
> >> mmc0: new high speed SDHC card at address 
> >> mmcblk0: mmc0: SL16G 14.8 GiB
> >>
> >> ...
> >>
> >> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
> >> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
> >> EXT4-fs (mmcblk0p2): recovery complete
> >> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
> >> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
> >> devtmpfs: mounted
> >> Freeing unused kernel memory: 1024K
> >> hub 1-3:1.0: USB hub found
> >> Run /sbin/init as init process
> >> hub 1-3:1.0: 3 ports detected
> >> *** stack smashing detected ***:  terminated
> >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
> >> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
> >> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >> [] (show_stack) from [] (dump_stack+0x90/0xc8)
> >> [] (dump_stack) from [] (panic+0xfc/0x304)
> >> [] (panic) from [] (do_exit+0xabc/0xc6c)
> >> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
> >> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
> >> [] (get_signal) from [] (do_work_pending+0x130/0x618)
> >> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
> >> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
> >> 3fa0:  bea7787c 0005 b6e8d0b8
> >> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
> >> 3fe0: 0020 bea77998  b6d52368 6050
> >> CPU3: stopping
> >>
> >> I would like to help debug and fix this issue, but I don't really
> >> have any idea where to start. Here is some more detailed information about
> >> my test system:
> >>
> >> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
> >> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
> >>
> >> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
> >> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
> >> tree)
> >>
> >> 3. Rootfs: Ext4
> >>
> >> 4. Kernel config: arch/arm/configs/exynos_defconfig
> >>
> >> I can gather more logs if needed; just let me know which kernel options
> >> to enable. Reverting this commit on top of next-20190218 as well as current
> >> linux-next (tested with next-20190221) fixes this issue and makes the
> >> system bootable again.
> > Could you test the patch in the following link and see if it can make a 
> > difference?
> >
> > https://marc.info/?l=linux-aio&m=155070355614541&w=2
> 
> I've tested that patch, but it doesn't make any difference on the test
> system. In the log I see no warning added by it.

I guess it might be related to memory corruption; could you enable the
following debug options and post the dmesg log?

CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_KASAN=y

Thanks,
Ming



Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-21 Thread Ming Lei
On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
> Dear All,
> 
> On 2019-02-15 12:13, Ming Lei wrote:
> > This patch pulls the trigger for multi-page bvecs.
> >
> > Reviewed-by: Omar Sandoval 
> > Signed-off-by: Ming Lei 
> 
> Since Linux next-20190218 I've observed problems with block layer on one
> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
> this issue led me to this change. This is also the first linux-next
> release with this change merged. The issue is fully reproducible and can
> be observed in the following kernel log:
> 
> sdhci: Secure Digital Host Controller Interface driver
> sdhci: Copyright(c) Pierre Ossman
> s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
> s3c-sdhci 1253.sdhci: Got CD GPIO
> mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
> mmc0: new high speed SDHC card at address 
> mmcblk0: mmc0: SL16G 14.8 GiB
> 
> ...
> 
> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
> EXT4-fs (mmcblk0p2): recovery complete
> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
> devtmpfs: mounted
> Freeing unused kernel memory: 1024K
> hub 1-3:1.0: USB hub found
> Run /sbin/init as init process
> hub 1-3:1.0: 3 ports detected
> *** stack smashing detected ***:  terminated
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x90/0xc8)
> [] (dump_stack) from [] (panic+0xfc/0x304)
> [] (panic) from [] (do_exit+0xabc/0xc6c)
> [] (do_exit) from [] (do_group_exit+0x3c/0xbc)
> [] (do_group_exit) from [] (get_signal+0x130/0xbf4)
> [] (get_signal) from [] (do_work_pending+0x130/0x618)
> [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
> 3fa0:  bea7787c 0005 b6e8d0b8
> 3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
> 3fe0: 0020 bea77998  b6d52368 6050 
> CPU3: stopping
> 
> I would like to help with debugging and fixing this issue, but I don't
> really have an idea where to start. Here is some more detailed
> information about my test system:
> 
> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
> 
> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
> tree)
> 
> 3. Rootfs: Ext4
> 
> 4. Kernel config: arch/arm/configs/exynos_defconfig
> 
> I can gather more logs if needed; just let me know which kernel option
> to enable. Reverting this commit on top of next-20190218 as well as current
> linux-next (tested with next-20190221) fixes this issue and makes the
> system bootable again.

Could you test the patch at the following link and see if it makes a difference?

https://marc.info/?l=linux-aio&m=155070355614541&w=2

Thanks,
Ming



[Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs

2019-02-15 Thread Ming Lei
This patch pulls the trigger for multi-page bvecs.

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 block/bio.c | 22 +++---
 fs/iomap.c  |  4 ++--
 fs/xfs/xfs_aops.c   |  4 ++--
 include/linux/bio.h |  2 +-
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 968b12fea564..83a2dfa417ca 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -753,6 +753,8 @@ EXPORT_SYMBOL(bio_add_pc_page);
  * @page: page to add
  * @len: length of the data to add
  * @off: offset of the data in @page
+ * @same_page: if %true only merge if the new data is in the same physical
+ * page as the last segment of the bio.
  *
  * Try to add the data at @page + @off to the last bvec of @bio.  This is a
  * a useful optimisation for file systems with a block size smaller than the
@@ -761,19 +763,25 @@ EXPORT_SYMBOL(bio_add_pc_page);
  * Return %true on success or %false on failure.
  */
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
-   unsigned int len, unsigned int off)
+   unsigned int len, unsigned int off, bool same_page)
 {
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return false;
 
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+   phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
+   bv->bv_offset + bv->bv_len - 1;
+   phys_addr_t page_addr = page_to_phys(page);
 
-   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
-   bv->bv_len += len;
-   bio->bi_iter.bi_size += len;
-   return true;
-   }
+   if (vec_end_addr + 1 != page_addr + off)
+   return false;
+   if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
+   return false;
+
+   bv->bv_len += len;
+   bio->bi_iter.bi_size += len;
+   return true;
}
return false;
 }
@@ -819,7 +827,7 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
 int bio_add_page(struct bio *bio, struct page *page,
 unsigned int len, unsigned int offset)
 {
-   if (!__bio_try_merge_page(bio, page, len, offset)) {
+   if (!__bio_try_merge_page(bio, page, len, offset, false)) {
if (bio_full(bio))
return 0;
__bio_add_page(bio, page, len, offset);
diff --git a/fs/iomap.c b/fs/iomap.c
index af736acd9006..0c350e658b7f 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -318,7 +318,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 */
sector = iomap_sector(iomap, pos);
if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
-   if (__bio_try_merge_page(ctx->bio, page, plen, poff))
+   if (__bio_try_merge_page(ctx->bio, page, plen, poff, true))
goto done;
is_contig = true;
}
@@ -349,7 +349,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
ctx->bio->bi_end_io = iomap_read_end_io;
}
 
-   __bio_add_page(ctx->bio, page, plen, poff);
+   bio_add_page(ctx->bio, page, plen, poff);
 done:
/*
 * Move the caller beyond our range so that it keeps making progress.
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1f1829e506e8..b9fd44168f61 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -616,12 +616,12 @@ xfs_add_to_ioend(
bdev, sector);
}
 
-   if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
+   if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff, true)) {
if (iop)
atomic_inc(&iop->write_count);
if (bio_full(wpc->ioend->io_bio))
xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
-   __bio_add_page(wpc->ioend->io_bio, page, len, poff);
+   bio_add_page(wpc->ioend->io_bio, page, len, poff);
}
 
wpc->ioend->io_size += len;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 089370eb84d9..9f77adcfde82 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -441,7 +441,7 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
   unsigned int, unsigned int);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
-   unsigned int len, unsigned int off);
+   unsigned int len, unsigned int off, bool same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
-- 
2.9.5
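The merge rule that the patch adds to __bio_try_merge_page() can be modelled outside the kernel. What follows is a minimal userspace sketch, not kernel code. struct toy_bio, struct toy_bvec, toy_try_merge(), toy_add_page(), toy_self_check(), the 4-entry vector limit and the 4 KiB page size are all hypothetical stand-ins, and a page is represented only by its physical address. The sketch applies the same two checks as the patched function: physical contiguity with the end of the last bvec, and, when same_page is set, that the last bvec ends inside the page being added. It shows how bio_add_page(), which passes same_page=false, lets two physically adjacent pages collapse into a single multi-page bvec. The iomap and xfs callers pass true, presumably because they treat a successful merge as "no new page" and skip per-page accounting (the atomic_inc(&iop->write_count) in xfs_add_to_ioend()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for struct bio / struct bio_vec.
 * A page is identified by its physical address; 4 KiB pages assumed. */
#define TOY_PAGE_SIZE 4096ULL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))
#define TOY_MAX_VECS 4

struct toy_bvec {
	uint64_t page_phys;	/* physical address of the bvec's first page */
	unsigned int offset;	/* byte offset into that page */
	unsigned int len;	/* may exceed one page after a merge */
};

struct toy_bio {
	struct toy_bvec vecs[TOY_MAX_VECS];
	unsigned int vcnt;
	unsigned int size;
};

/* Same two checks as the patched __bio_try_merge_page(). */
static bool toy_try_merge(struct toy_bio *bio, uint64_t page_phys,
			  unsigned int len, unsigned int off, bool same_page)
{
	struct toy_bvec *bv;
	uint64_t vec_end_addr;

	if (bio->vcnt == 0)
		return false;

	bv = &bio->vecs[bio->vcnt - 1];
	vec_end_addr = bv->page_phys + bv->offset + bv->len - 1;

	/* New data must start at the byte right after the last bvec. */
	if (vec_end_addr + 1 != page_phys + off)
		return false;
	/* same_page: the last bvec must end inside the page being added. */
	if (same_page && (vec_end_addr & TOY_PAGE_MASK) != page_phys)
		return false;

	bv->len += len;
	bio->size += len;
	return true;
}

/* Same shape as bio_add_page(): try a cross-page merge, else append. */
static unsigned int toy_add_page(struct toy_bio *bio, uint64_t page_phys,
				 unsigned int len, unsigned int off)
{
	if (!toy_try_merge(bio, page_phys, len, off, false)) {
		if (bio->vcnt == TOY_MAX_VECS)	/* stand-in for bio_full() */
			return 0;
		bio->vecs[bio->vcnt].page_phys = page_phys;
		bio->vecs[bio->vcnt].offset = off;
		bio->vecs[bio->vcnt].len = len;
		bio->vcnt++;
		bio->size += len;
	}
	return len;
}

/* Usage example: call from a main() of your own. */
void toy_self_check(void)
{
	struct toy_bio bio = { .vcnt = 0 };
	bool merged;

	/* Two physically adjacent pages collapse into one 8 KiB bvec. */
	toy_add_page(&bio, 0x10000, 4096, 0);
	toy_add_page(&bio, 0x11000, 4096, 0);
	assert(bio.vcnt == 1 && bio.vecs[0].len == 8192);

	/* With same_page set, the same kind of cross-page merge is refused. */
	merged = toy_try_merge(&bio, 0x12000, 512, 0, true);
	assert(!merged);
	(void)merged;

	/* A non-contiguous page starts a new bvec. */
	toy_add_page(&bio, 0x20000, 512, 0);
	assert(bio.vcnt == 2 && bio.size == 8704);
}
```

Before this patch, the old check (page == bv->bv_page) could only grow a bvec within one page. A driver that assumes each bvec fits in a single page, or stays under some segment-size limit, only sees larger bvecs once cross-page merges like the first one above become possible.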