Re: [REGRESSION] mm: process_vm_readv testcase no longer works after compat_prcoess_vm_readv removed

2020-10-26 Thread Jens Axboe
On 10/26/20 6:05 PM, Al Viro wrote:
> On Mon, Oct 26, 2020 at 05:56:11PM -0600, Jens Axboe wrote:
>> On 10/26/20 4:55 PM, Kyle Huey wrote:
>>> A test program from the rr[0] test suite, vm_readv_writev[1], no
>>> longer works on 5.10-rc1 when compiled as a 32 bit binary and executed
>>> on a 64 bit kernel. The first process_vm_readv call (on line 35) now
>>> fails with EFAULT. I have bisected this to
>>> c3973b401ef2b0b8005f8074a10e96e3ea093823.
>>>
>>> It should be fairly straightforward to extract the test case from our
>>> repository into a standalone program.
>>
>> Can you check with this applied?
>>
>> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
>> index fd12da80b6f2..05676722d9cd 100644
>> --- a/mm/process_vm_access.c
>> +++ b/mm/process_vm_access.c
>> @@ -273,7 +273,8 @@ static ssize_t process_vm_rw(pid_t pid,
>>  return rc;
>>  if (!iov_iter_count())
>>  goto free_iov_l;
>> -iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r, false);
>> +iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r,
>> +in_compat_syscall());
> 
> _ouch_
> 
> There's a bug, all right, but I'm not sure that this is all there is
> to it. For now it's probably the right fix, but...  Consider the fun
> trying to use that from 32bit process to access the memory of 64bit
> one.  IOW, we might want to add an explicit flag for "force 64bit
> addresses/sizes in rvec".

Ouch yes good point, nice catch.

-- 
Jens Axboe



Re: [REGRESSION] mm: process_vm_readv testcase no longer works after compat_prcoess_vm_readv removed

2020-10-26 Thread Jens Axboe
On 10/26/20 4:55 PM, Kyle Huey wrote:
> A test program from the rr[0] test suite, vm_readv_writev[1], no
> longer works on 5.10-rc1 when compiled as a 32 bit binary and executed
> on a 64 bit kernel. The first process_vm_readv call (on line 35) now
> fails with EFAULT. I have bisected this to
> c3973b401ef2b0b8005f8074a10e96e3ea093823.
> 
> It should be fairly straightforward to extract the test case from our
> repository into a standalone program.

Can you check with this applied?

diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index fd12da80b6f2..05676722d9cd 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -273,7 +273,8 @@ static ssize_t process_vm_rw(pid_t pid,
return rc;
if (!iov_iter_count())
goto free_iov_l;
-   iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r, false);
+   iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r,
+   in_compat_syscall());
if (IS_ERR(iov_r)) {
rc = PTR_ERR(iov_r);
    goto free_iov_l;

-- 
Jens Axboe



Re: rename ->make_request_fn and move it to the block_device_operations v2

2020-07-01 Thread Jens Axboe
On 7/1/20 2:59 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

Thanks, I'll try this again.

-- 
Jens Axboe



Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:21 PM, Jens Axboe wrote:
> On 6/30/20 12:19 PM, Christoph Hellwig wrote:
>> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>>> On 6/30/20 7:57 AM, Jens Axboe wrote:
>>>> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
>>>>> Hi Jens,
>>>>>
>>>>> this series moves the make_request_fn method into block_device_operations
>>>>> with the much more descriptive ->submit_bio name.  It then also gives
>>>>> generic_make_request a more descriptive name, and further optimize the
>>>>> path to issue to blk-mq, removing the need for the direct_make_request
>>>>> bypass.
>>>>
>>>> Looks good to me, and it's a nice cleanup as well. Applied.
>>>
>>> Dropped, insta-crashes with dm:
>>
>> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
>> Or your .config?
> 
> I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
> it's ->submit_bio == NULL. Let me know if you really need it, and I can
> re-generate the OOPS and have the vmlinux too.

Here's the .config

-- 
Jens Axboe

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.8.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100100
CONFIG_LD_VERSION=23000
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_SCHED_THERMAL_PRESSURE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BA

Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:19 PM, Christoph Hellwig wrote:
> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>> On 6/30/20 7:57 AM, Jens Axboe wrote:
>>> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
>>>> Hi Jens,
>>>>
>>>> this series moves the make_request_fn method into block_device_operations
>>>> with the much more descriptive ->submit_bio name.  It then also gives
>>>> generic_make_request a more descriptive name, and further optimize the
>>>> path to issue to blk-mq, removing the need for the direct_make_request
>>>> bypass.
>>>
>>> Looks good to me, and it's a nice cleanup as well. Applied.
>>
>> Dropped, insta-crashes with dm:
> 
> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
> Or your .config?

I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
it's ->submit_bio == NULL. Let me know if you really need it, and I can
re-generate the OOPS and have the vmlinux too.

-- 
Jens Axboe



Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 7:57 AM, Jens Axboe wrote:
> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> this series moves the make_request_fn method into block_device_operations
>> with the much more descriptive ->submit_bio name.  It then also gives
>> generic_make_request a more descriptive name, and further optimize the
>> path to issue to blk-mq, removing the need for the direct_make_request
>> bypass.
> 
> Looks good to me, and it's a nice cleanup as well. Applied.

Dropped, insta-crashes with dm:

[   10.240134] BUG: kernel NULL pointer dereference, address: 
[   10.241000] #PF: supervisor instruction fetch in kernel mode
[   10.241666] #PF: error_code(0x0010) - not-present page
[   10.242280] PGD 0 P4D 0 
[   10.242600] Oops: 0010 [#1] PREEMPT SMP
[   10.243073] CPU: 1 PID: 2110 Comm: systemd-udevd Not tainted 5.8.0-rc3+ #6655
[   10.243939] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-1ubuntu1 04/01/2014
[   10.245012] RIP: 0010:0x0
[   10.245322] Code: Bad RIP value.
[   10.245695] RSP: 0018:c92f7af8 EFLAGS: 00010246
[   10.246333] RAX: 81c83520 RBX: 8881b805dea8 RCX: 88819e844070
[   10.247227] RDX:  RSI:  RDI: 88819e844070
[   10.248112] RBP: c92f7b48 R08: 8881b6f38800 R09: 88818ff0ea58
[   10.248994] R10:  R11: 88818ff0ea58 R12: 88819e844070
[   10.250077] R13:  R14:  R15: 888107812948
[   10.251168] FS:  7f5c3ed66a80() GS:8881b9c8() 
knlGS:
[   10.252161] CS:  0010 DS:  ES:  CR0: 80050033
[   10.253189] CR2: ffd6 CR3: 0001b2953003 CR4: 001606e0
[   10.254157] DR0:  DR1:  DR2: 
[   10.255279] DR3:  DR6: fffe0ff0 DR7: 0400
[   10.256365] Call Trace:
[   10.256781]  submit_bio_noacct+0x1f6/0x3d0
[   10.257297]  submit_bio+0x37/0x130
[   10.257780]  ? guard_bio_eod+0x2e/0x70
[   10.258418]  mpage_readahead+0x13c/0x180
[   10.259096]  ? blkdev_direct_IO+0x490/0x490
[   10.259654]  read_pages+0x68/0x2d0
[   10.260051]  page_cache_readahead_unbounded+0x1b7/0x220
[   10.260818]  generic_file_buffered_read+0x865/0xc80
[   10.261587]  ? _copy_to_user+0x6d/0x80
[   10.262171]  ? cp_new_stat+0x119/0x130
[   10.262680]  new_sync_read+0xfe/0x170
[   10.263155]  vfs_read+0xc8/0x180
[   10.263647]  ksys_read+0x53/0xc0
[   10.264209]  do_syscall_64+0x3c/0x70
[   10.264759]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.265200] RIP: 0033:0x7f5c3fcc9ab2
[   10.265510] Code: Bad RIP value.
[   10.265775] RSP: 002b:7ffc8e0cf9c8 EFLAGS: 0246 ORIG_RAX: 

[   10.266426] RAX: ffda RBX: 55d5eca76c68 RCX: 7f5c3fcc9ab2
[   10.267012] RDX: 0040 RSI: 55d5eca76c78 RDI: 0006
[   10.267591] RBP: 55d5eca44890 R08: 55d5eca76c50 R09: 7f5c3fd99a40
[   10.268168] R10: 0008 R11: 0246 R12: 3bd9
[   10.268744] R13: 0040 R14: 55d5eca76c50 R15: 55d5eca448e0
[   10.269319] Modules linked in:
[   10.269562] CR2: 
[   10.269845] ---[ end trace f09b8963e5a3593b ]---

-- 
Jens Axboe



Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

Looks good to me, and it's a nice cleanup as well. Applied.

-- 
Jens Axboe



Re: [PATCH 11/20] fs: remove a weird comment in submit_bh_wbc

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> All bios can get remapped if submitted to partitions.  No need to
> comment on that.

I'm pretty sure that comment is from me, dating back to when the bio
code was introduced in 2001. The point wasn't the remapping, just
that from here on down the IO was purely bio based, not buffer_heads.
Anyway, totally agree that it should just die, it's not that
interesting or useful anymore.

-- 
Jens Axboe



Re: Build failure on latest powerpc/merge (311ae9e159d8 io_uring: fix dead-hung for non-iter fixed rw)

2019-11-29 Thread Jens Axboe

On 11/29/19 10:07 AM, Pavel Begunkov wrote:

On 29/11/2019 20:16, Jens Axboe wrote:

On 11/29/19 8:14 AM, Christophe Leroy wrote:


Reverting commit 311ae9e159d8 ("io_uring: fix dead-hung for non-iter
fixed rw") clears the failure.

Most likely an #include is missing.


Huh weird how the build bots didn't catch that. Does the below work?


Yes it works, thanks.


Thanks for reporting and testing, I've queued it up with your reported
and tested-by.


My bad, thanks for the report and fixing.


No worries, usually the build bots are great at finding these before
patches go upstream. They have been unreliable lately, unfortunately.

--
Jens Axboe



Re: Build failure on latest powerpc/merge (311ae9e159d8 io_uring: fix dead-hung for non-iter fixed rw)

2019-11-29 Thread Jens Axboe

On 11/29/19 8:14 AM, Christophe Leroy wrote:



Le 29/11/2019 à 17:04, Jens Axboe a écrit :

On 11/29/19 6:53 AM, Christophe Leroy wrote:

 CC  fs/io_uring.o
fs/io_uring.c: In function ‘loop_rw_iter’:
fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’
[-Werror=implicit-function-declaration]
   iovec.iov_base = kmap(iter->bvec->bv_page)
^
fs/io_uring.c:1628:19: warning: assignment makes pointer from integer
without a cast [-Wint-conversion]
   iovec.iov_base = kmap(iter->bvec->bv_page)
  ^
fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’
[-Werror=implicit-function-declaration]
   kunmap(iter->bvec->bv_page);
   ^


Reverting commit 311ae9e159d8 ("io_uring: fix dead-hung for non-iter
fixed rw") clears the failure.

Most likely an #include is missing.


Huh weird how the build bots didn't catch that. Does the below work?


Yes it works, thanks.


Thanks for reporting and testing, I've queued it up with your reported
and tested-by.

--
Jens Axboe



Re: Build failure on latest powerpc/merge (311ae9e159d8 io_uring: fix dead-hung for non-iter fixed rw)

2019-11-29 Thread Jens Axboe

On 11/29/19 6:53 AM, Christophe Leroy wrote:

CC  fs/io_uring.o
fs/io_uring.c: In function ‘loop_rw_iter’:
fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’
[-Werror=implicit-function-declaration]
  iovec.iov_base = kmap(iter->bvec->bv_page)
   ^
fs/io_uring.c:1628:19: warning: assignment makes pointer from integer
without a cast [-Wint-conversion]
  iovec.iov_base = kmap(iter->bvec->bv_page)
 ^
fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’
[-Werror=implicit-function-declaration]
  kunmap(iter->bvec->bv_page);
  ^


Reverting commit 311ae9e159d8 ("io_uring: fix dead-hung for non-iter
fixed rw") clears the failure.

Most likely an #include is missing.


Huh weird how the build bots didn't catch that. Does the below work?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2c2e8c25da01..745eb005fefe 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS

 #include 

--
Jens Axboe



Re: [PATCH v6 15/24] fs/io_uring: set FOLL_PIN via pin_user_pages()

2019-11-19 Thread Jens Axboe
On 11/19/19 1:16 AM, John Hubbard wrote:
> Convert fs/io_uring to use the new pin_user_pages() call, which sets
> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> tracking of pinned pages, and therefore for any code that calls
> put_user_page().
> 
> In partial anticipation of this work, the io_uring code was already
> calling put_user_page() instead of put_page(). Therefore, in order to
> convert from the get_user_pages()/put_page() model, to the
> pin_user_pages()/put_user_page() model, the only change required
> here is to change get_user_pages() to pin_user_pages().
> 
> Reviewed-by: Jan Kara 
> Signed-off-by: John Hubbard 

You dropped my reviewed-by now... Given the file, you'd probably want
to keep that.

-- 
Jens Axboe



Re: [PATCH 10/19] fs/io_uring: set FOLL_PIN via pin_user_pages()

2019-11-01 Thread Jens Axboe
On 10/30/19 4:49 PM, John Hubbard wrote:
> Convert fs/io_uring to use the new pin_user_pages() call, which sets
> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> tracking of pinned pages, and therefore for any code that calls
> put_user_page().

Reviewed-by: Jens Axboe 

-- 
Jens Axboe



Re: [PATCH] block/ps3vram: Use %llu to format sector_t after LBDAF removal

2019-06-13 Thread Jens Axboe
On 6/13/19 1:30 AM, Geert Uytterhoeven wrote:
> The removal of CONFIG_LBDAF changed the type of sector_t from "unsigned
> long" to "u64" aka "unsigned long long" on 64-bit platforms, leading to
> a compiler warning regression:
> 
>  drivers/block/ps3vram.c: In function ‘ps3vram_probe’:
>  drivers/block/ps3vram.c:770:23: warning: format ‘%lu’ expects argument 
> of type ‘long unsigned int’, but argument 4 has type ‘sector_t {aka long long 
> unsigned int}’ [-Wformat=]
> 
> Fix this by using "%llu" instead.

Applied, thanks.

-- 
Jens Axboe



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 9:49 AM, Will Deacon wrote:
> On Wed, Apr 03, 2019 at 09:39:52AM -0600, Jens Axboe wrote:
>> On 4/3/19 9:19 AM, Will Deacon wrote:
>>> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
>>>> On 4/3/19 5:11 AM, Will Deacon wrote:
>>>>> will@autoplooker:~/liburing/test$ ./io_uring_register 
>>>>> RELIMIT_MEMLOCK: 67108864 (67108864)
>>>>> [   35.477875] Unable to handle kernel NULL pointer dereference at 
>>>>> virtual address 0070
>>>>> [   35.478969] Mem abort info:
>>>>> [   35.479296]   ESR = 0x9604
>>>>> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
>>>>> [   35.480528]   SET = 0, FnV = 0
>>>>> [   35.480980]   EA = 0, S1PTW = 0
>>>>> [   35.481345] Data abort info:
>>>>> [   35.481680]   ISV = 0, ISS = 0x0004
>>>>> [   35.482267]   CM = 0, WnR = 0
>>>>> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>>>>> [   35.483486] [0070] pgd=
>>>>> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
>>>>> [   35.484788] Modules linked in:
>>>>> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
>>>>> 5.1.0-rc3-00012-g40b114779944 #1
>>>>> [   35.486712] Hardware name: linux,dummy-virt (DT)
>>>>> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
>>>>> [   35.488228] pc : link_pwq+0x10/0x60
>>>>> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>>>>> [   35.489550] sp : ffff000017e2bbc0
>>>>
>>>> Huh, this looks odd, it's crashing inside the wq setup.
>>>
>>> Enabling KASAN seems to indicate a double-free, which may well be related.
>>
>> Does this help?
> 
> Yes, thanks for the quick patch. Feel free to add:
> 
> Reported-by: Will Deacon 
> Tested-by: Will Deacon 
> 
> if you spin a proper patch.

Great, thanks for reporting/testing.

-- 
Jens Axboe



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 9:19 AM, Will Deacon wrote:
> Hi Jens,
> 
> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
>> On 4/3/19 5:11 AM, Will Deacon wrote:
>>> will@autoplooker:~/liburing/test$ ./io_uring_register 
>>> RELIMIT_MEMLOCK: 67108864 (67108864)
>>> [   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
>>> address 0070
>>> [   35.478969] Mem abort info:
>>> [   35.479296]   ESR = 0x9604
>>> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
>>> [   35.480528]   SET = 0, FnV = 0
>>> [   35.480980]   EA = 0, S1PTW = 0
>>> [   35.481345] Data abort info:
>>> [   35.481680]   ISV = 0, ISS = 0x0004
>>> [   35.482267]   CM = 0, WnR = 0
>>> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>>> [   35.483486] [0070] pgd=
>>> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
>>> [   35.484788] Modules linked in:
>>> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
>>> 5.1.0-rc3-00012-g40b114779944 #1
>>> [   35.486712] Hardware name: linux,dummy-virt (DT)
>>> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
>>> [   35.488228] pc : link_pwq+0x10/0x60
>>> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>>> [   35.489550] sp : 17e2bbc0
>>
>> Huh, this looks odd, it's crashing inside the wq setup.
> 
> Enabling KASAN seems to indicate a double-free, which may well be related.

Does this help?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index bbdbd56cf2ac..07d6ef195d05 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2215,6 +2215,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, 
void __user *arg,
fput(ctx->user_files[i]);
 
kfree(ctx->user_files);
+   ctx->user_files = NULL;
ctx->nr_user_files = 0;
return ret;
}

-- 
Jens Axboe



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 5:11 AM, Will Deacon wrote:
> Hi Michael,
> 
> On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
>> Arnd Bergmann  writes:
>>> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
>>> b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> index b18abb0c3dae..00f5a63c8d9a 100644
>>> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>>> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> @@ -505,3 +505,7 @@
>>>  42132  rt_sigtimedwait_time64  sys_rt_sigtimedwait 
>>> compat_sys_rt_sigtimedwait_time64
>>>  42232  futex_time64sys_futex   
>>> sys_futex
>>>  42332  sched_rr_get_interval_time64
>>> sys_sched_rr_get_interval   sys_sched_rr_get_interval
>>> +424common  pidfd_send_signal   sys_pidfd_send_signal
>>> +425common  io_uring_setup  sys_io_uring_setup
>>> +426common  io_uring_enter  sys_io_uring_enter
>>> +427common  io_uring_register   sys_io_uring_register
>>
>> Acked-by: Michael Ellerman  (powerpc)
>>
>> Lightly tested.
>>
>> The pidfd_test selftest passes.
> 
> That reports pass for me too, although it fails to unshare the pid ns, which I
> assume is benign.
> 
>> Ran the io_uring example from fio, which prints lots of:
> 
> How did you invoke that? I had a play with the tests in:

It's t/io_uring from the fio repo:

git://git.kernel.dk/fio

and you just run it ala:

# make t/io_uring
# t/io_uring /dev/some_device

>   git://git.kernel.dk/liburing
> 
> but I quickly ran into the kernel oops below.
> 
> Will
> 
> --->8
> 
> will@autoplooker:~/liburing/test$ ./io_uring_register 
> RELIMIT_MEMLOCK: 67108864 (67108864)
> [   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
> address 0070
> [   35.478969] Mem abort info:
> [   35.479296]   ESR = 0x9604
> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
> [   35.480528]   SET = 0, FnV = 0
> [   35.480980]   EA = 0, S1PTW = 0
> [   35.481345] Data abort info:
> [   35.481680]   ISV = 0, ISS = 0x0004
> [   35.482267]   CM = 0, WnR = 0
> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> [   35.483486] [0070] pgd=
> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   35.484788] Modules linked in:
> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
> 5.1.0-rc3-00012-g40b114779944 #1
> [   35.486712] Hardware name: linux,dummy-virt (DT)
> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
> [   35.488228] pc : link_pwq+0x10/0x60
> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> [   35.489550] sp : 17e2bbc0

Huh, this looks odd, it's crashing inside the wq setup.


-- 
Jens Axboe



Re: [PATCH] block/swim3: Fix regression on PowerBook G3

2018-12-31 Thread Jens Axboe
On 12/30/18 10:44 PM, Finn Thain wrote:
> As of v4.20, the swim3 driver crashes when loaded on a PowerBook G3
> (Wallstreet).
> 
> MacIO PCI driver attached to Gatwick chipset
> MacIO PCI driver attached to Heathrow chipset
> swim3 0.00015000:floppy: [fd0] SWIM3 floppy controller in media bay
> 0.00013020:ch-a: ttyS0 at MMIO 0xf3013020 (irq = 16, base_baud = 230400) is a 
> Z85c30 ESCC - Serial port
> 0.00013000:ch-b: ttyS1 at MMIO 0xf3013000 (irq = 17, base_baud = 230400) is a 
> Z85c30 ESCC - Infrared port
> macio: fixed media-bay irq on gatwick
> macio: fixed left floppy irqs
> swim3 1.00015000:floppy: [fd1] Couldn't request interrupt
> Unable to handle kernel paging request for data at address 0x0024
> Faulting instruction address: 0xc02652f8
> Oops: Kernel access of bad area, sig: 11 [#1]
> BE SMP NR_CPUS=2 PowerMac
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0 #2
> NIP:  c02652f8 LR: c026915c CTR: c0276d1c
> REGS: df43ba10 TRAP: 0300   Not tainted  (4.20.0)
> MSR:  9032   CR: 28228288  XER: 0100
> DAR: 0024 DSISR: 4000
> GPR00: c026915c df43bac0 df439060 c0731524 df494700  c06e1c08 0001
> GPR08: 0001  df5ff220 1032 28228282  c0004ca4 
> GPR16:    c073144c dfffe064 c0731524 0120 c0586108
> GPR24: c073132c c073143c c073143c  c0731524 df67cd70 df494700 0001
> NIP [c02652f8] blk_mq_free_rqs+0x28/0xf8
> LR [c026915c] blk_mq_sched_tags_teardown+0x58/0x84
> Call Trace:
> [df43bac0] [c0045f50] flush_workqueue_prep_pwqs+0x178/0x1c4 (unreliable)
> [df43bae0] [c026915c] blk_mq_sched_tags_teardown+0x58/0x84
> [df43bb00] [c02697f0] blk_mq_exit_sched+0x9c/0xb8
> [df43bb20] [c0252794] elevator_exit+0x84/0xa4
> [df43bb40] [c0256538] blk_exit_queue+0x30/0x50
> [df43bb50] [c0256640] blk_cleanup_queue+0xe8/0x184
> [df43bb70] [c034732c] swim3_attach+0x330/0x5f0
> [df43bbb0] [c034fb24] macio_device_probe+0x58/0xec
> [df43bbd0] [c032ba88] really_probe+0x1e4/0x2f4
> [df43bc00] [c032bd28] driver_probe_device+0x64/0x204
> [df43bc20] [c0329ac4] bus_for_each_drv+0x60/0xac
> [df43bc50] [c032b824] __device_attach+0xe8/0x160
> [df43bc80] [c032ab38] bus_probe_device+0xa0/0xbc
> [df43bca0] [c0327338] device_add+0x3d8/0x630
> [df43bcf0] [c0350848] macio_add_one_device+0x444/0x48c
> [df43bd50] [c03509f8] macio_pci_add_devices+0x168/0x1bc
> [df43bd90] [c03500ec] macio_pci_probe+0xc0/0x10c
> [df43bda0] [c02ad884] pci_device_probe+0xd4/0x184
> [df43bdd0] [c032ba88] really_probe+0x1e4/0x2f4
> [df43be00] [c032bd28] driver_probe_device+0x64/0x204
> [df43be20] [c032bfcc] __driver_attach+0x104/0x108
> [df43be40] [c0329a00] bus_for_each_dev+0x64/0xb4
> [df43be70] [c032add8] bus_add_driver+0x154/0x238
> [df43be90] [c032ca24] driver_register+0x84/0x148
> [df43bea0] [c0004aa0] do_one_initcall+0x40/0x188
> [df43bf00] [c0690100] kernel_init_freeable+0x138/0x1d4
> [df43bf30] [c0004cbc] kernel_init+0x18/0x10c
> [df43bf40] [c00121e4] ret_from_kernel_thread+0x14/0x1c
> Instruction dump:
> 5484d97e 4bfff4f4 9421ffe0 7c0802a6 bf410008 7c9e2378 90010024 8124005c
> 2f89 419e0078 81230004 7c7c1b78 <81290024> 2f89 419e0064 8144
> ---[ end trace 12025ab921a9784c ]---
> 
> Reverting commit 8ccb8cb1892b ("swim3: convert to blk-mq") resolves the
> problem.
> 
> That commit added a struct blk_mq_tag_set to struct floppy_state and
> initialized it with a blk_mq_init_sq_queue() call. Unfortunately, there
> is a memset() in swim3_add_device() that subsequently clears the
> floppy_state struct. That means fs->tag_set->ops is a NULL pointer, and
> it gets dereferenced by blk_mq_free_rqs() which gets called in the
> request_irq() error path. Move the memset() to fix this bug.
> 
> BTW, the request_irq() failure for the left mediabay floppy (fd1) is not
> a regression. I don't know why it happens. The right media bay floppy
> (fd0) works fine however.

Applied, thanks.

-- 
Jens Axboe



Re: [PATCH] block/swim3: Fix -EBUSY error when re-opening device after unmount

2018-12-31 Thread Jens Axboe
On 12/30/18 10:44 PM, Finn Thain wrote:
> When the block device is opened with FMODE_EXCL, ref_count is set to -1.
> This value doesn't get reset when the device is closed which means the
> device cannot be opened again. Fix this by checking for refcount <= 0
> in the release method.

Applied, thanks.


-- 
Jens Axboe



Re: [PATCH] block/swim3: Remove dead return statement

2018-12-31 Thread Jens Axboe
On 12/30/18 10:44 PM, Finn Thain wrote:
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Finn Thain 

Applied, thanks.

-- 
Jens Axboe



Re: [Intel-wired-lan] [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Jens Axboe
On 11/26/18 10:56 AM, Jeff Kirsher wrote:
> On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is
>> encoded
>> as -1. Even though implicitly understood it is always better to have
>> macros
>> in there. Replace these open encodings for an invalid node number
>> with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions
>> like
>> 'invalid node' from various places redirecting them to a common
>> definition.
>>
>> Signed-off-by: Anshuman Khandual 
> 
> For the 'ixgbe' driver changes.
> 
> Acked-by: Jeff Kirsher 

Lost the original, but for mtip32xx:

Acked-by: Jens Axboe 

-- 
Jens Axboe



Re: System not booting since dm changes? (was Linux 4.20-rc1)

2018-11-05 Thread Jens Axboe
On 11/5/18 7:35 AM, Satheesh Rajendran wrote:
> On Mon, Nov 05, 2018 at 08:51:57AM -0500, Mike Snitzer wrote:
>> On Mon, Nov 05 2018 at  5:25am -0500,
>> Michael Ellerman  wrote:
>>
>>> Linus Torvalds  writes:
>>> ...
>>>> Mike Snitzer (1):
>>>> device mapper updates
>>>
>>> Hi Mike,
>>>
>>> Replying here because I can't find the device-mapper pull or the patch
>>> in question on LKML. I guess I should be subscribed to dm-devel.
>>>
>>> We have a box that doesn't boot any more, bisect points at one of:
>>>
>>>   cef6f55a9fb4 Mike Snitzer   dm table: require that request-based DM 
>>> be layered on blk-mq devices 
>>>   953923c09fe8 Mike Snitzer   dm: rename DM_TYPE_MQ_REQUEST_BASED to 
>>> DM_TYPE_REQUEST_BASED 
>>>   6a23e05c2fe3 Jens Axboe dm: remove legacy request-based IO path 
>>>
>>>
>>> It's a Power8 system running Rawhide, it does have multipath, but I'm
>>> told it was setup by the Fedora installer, ie. nothing fancy.
>>>
>>> The symptom is the system can't find its root filesystem and drops into
>>> the initramfs shell. The dmesg includes a bunch of errors like below:
>>>
>>>   [   43.263460] localhost multipathd[1344]: sdb: fail to get serial
>>>   [   43.268762] localhost multipathd[1344]: mpatha: failed in domap for 
>>> addition of new path sdb
>>>   [   43.268762] localhost multipathd[1344]: uevent trigger error
>>>   [   43.282065] localhost kernel: device-mapper: table: table load 
>>> rejected: not all devices are blk-mq request-stackable
>> ...
>>>
>>> Any ideas what's going wrong here?
>>
>> "table load rejected: not all devices are blk-mq request-stackable"
>> speaks to the fact that you aren't using blk-mq for scsi (aka scsi-mq).
>>
>> You need to use scsi_mod.use_blk_mq=Y on the kernel commandline (or set
>> CONFIG_SCSI_MQ_DEFAULT in your kernel config)
> 
> Thanks Mike!, above solution worked and the system booted fine now:-)

This quirk will go away for the next kernel, fwiw, since the non-mq
path for SCSI will be dropped as well.

-- 
Jens Axboe



Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/

2018-09-04 Thread Jens Axboe
On 9/3/18 4:15 PM, Henrik Austad wrote:
> This is a respin with a wider audience (all that get_maintainer returned)
> and I know this spams a *lot* of people. Not sure what would be the correct
> way, so my apologies for ruining your inbox.
> 
> The 00-INDEX files are supposed to give a summary of all files present
> in a directory, but these files are horribly out of date and their
> usefulness is brought into question. Often a simple "ls" would reveal
> the same information as the filenames are generally quite descriptive as
> a short introduction to what the file covers (it should not surprise
> anyone what Documentation/sched/sched-design-CFS.txt covers)
> 
> A few years back it was mentioned that these files were no longer really
> needed, and they have since then grown further out of date, so perhaps
> it is time to just throw them out.
> 
> A short status yields the following _outdated_ 00-INDEX files, first
> counter is files listed in 00-INDEX but missing in the directory, last
> is files present but not listed in 00-INDEX.

For the block related bits:

Reviewed-by: Jens Axboe 

-- 
Jens Axboe



Re: Oops in kmem_cache_free() via bioset_exit() (was Re: [next-20180601][nvme][ppc] Kernel Oops is triggered when creating lvm snapshots on nvme disks)

2018-06-28 Thread Jens Axboe
On 6/28/18 8:42 AM, Michael Ellerman wrote:
> Kent, Jens,
> 
> This looks like it might be related to the recent bioset changes?
> 
> cheers
> 
> Abdul Haleem  writes:
>> On Tue, 2018-06-26 at 23:36 +1000, Michael Ellerman wrote:
>>> Abdul Haleem  writes:
> ...
>> I was able to reproduce again with slub_debug=FZP and DEBUG_INFO enabled
>> on 4.17.0-rc7-next-20180601, but not much traces other than the Oops stack 
>> trace
> 
> Are you still testing on that revision? It's nearly a month old.
> 
> Please try to reproduce on mainline or today's linux-next.
> 
> 
>> the faulty instruction points to below code path :
>>
>> gdb -batch vmlinux -ex 'list *(0xc0304fe0)'
>> 0xc0304fe0 is in kmem_cache_free (mm/slab.h:231).
>> 226  }
>> 227  
>> 228  static inline bool slab_equal_or_root(struct kmem_cache *s,
>> 229struct kmem_cache *p)
>> 230  {
>> 231  return p == s || p == s->memcg_params.root_cache;
>> 232  }
> 
> And s is NULL.
> 
> Called via:
>   kmem_cache_free+0x210/0x2a0
>   mempool_free_slab+0x24/0x40
>   mempool_exit+0x50/0x90
>   bioset_exit+0x40/0x1d0
>   dm_io_client_destroy+0x2c/0x50
>   dm_bufio_client_destroy+0x1fc/0x2d0 [dm_bufio]
>   persistent_read_metadata+0x430/0x660 [dm_snapshot]
>   snapshot_ctr+0x5c8/0x7a0 [dm_snapshot]
>   dm_table_add_target+0x19c/0x3c0
>   table_load+0x104/0x450
>   ctl_ioctl+0x1f8/0x570
>   dm_ctl_ioctl+0x18/0x30
>   do_vfs_ioctl+0xcc/0x9e0
>   ksys_ioctl+0x5c/0xe0
>   sys_ioctl+0x20/0x80
>   system_call+0x58/0x6c
> 
> So looks like we did:
> 
>   kmem_cache_free(NULL
> 
> Probably a bad error path that frees before the cache has been allocated.
> 
> mempool_init_node() calls mempool_exit() on a partially initialised
> mempool, which looks fishy, though you're not hitting that patch AFAICS.

The slab cache is setup elsewhere, it's pending_cache. So if pending_cache
is NULL, then yeah and exit there will barf. I'd try something like the
below, but from the trace, we already basically see the path.


diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 0c964ac107c2..ebfa2f89ffdd 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -59,6 +59,7 @@ void mempool_free_slab(void *element, void *pool_data);
 static inline int
 mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
 {
+   BUG_ON(!kc);
return mempool_init(pool, min_nr, mempool_alloc_slab,
mempool_free_slab, (void *) kc);
 }
diff --git a/mm/mempool.c b/mm/mempool.c
index b54f2c20e5e0..060f44acd0df 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -508,7 +508,9 @@ EXPORT_SYMBOL(mempool_alloc_slab);
 void mempool_free_slab(void *element, void *pool_data)
 {
struct kmem_cache *mem = pool_data;
-   kmem_cache_free(mem, element);
+
+   if (!WARN_ON(!mem))
+   kmem_cache_free(mem, element);
 }
 EXPORT_SYMBOL(mempool_free_slab);
 

-- 
Jens Axboe



Re: Constant ata messages on console with commit 28361c403683 ("libata: add extra internal command") (was Re: [GIT PULL 2/2] libata changes for v4.18-rc1)

2018-06-19 Thread Jens Axboe
On 6/19/18 5:35 PM, Michael Ellerman wrote:
> Jens Axboe  writes:
>> On 6/19/18 1:29 AM, Michael Ellerman wrote:
>>> Jens Axboe  writes:
>>>> On 6/18/18 1:33 AM, Michael Ellerman wrote:
>>>>> Tejun Heo  writes:
>>>>> ...
>>>>>> Jens Axboe (10):
>>>>>>   libata: introduce notion of separate hardware tags
>>>>>>   libata: convert core and drivers to ->hw_tag usage
>>>>>>   libata: bump ->qc_active to a 64-bit type
>>>>>>   libata: use ata_tag_internal() consistently
>>>>>>   libata: remove assumption that ATA_MAX_QUEUE - 1 is the max
>>>>>>   sata_nv: set host can_queue count appropriately
>>>>>>   libata: add extra internal command
>>>>>
>>>>> Replying here because I can't find the original mail.
>>>>>
>>>>> The above commit is causing one of my machines to constantly spew ata
>>>>> messages on the console, according to bisect:
>>>>>
>>>>> # first bad commit: [28361c403683c2b00d4f5e76045f3ccd299bf99d] libata: 
>>>>> add extra internal command
>>>>>
>>>>> To get it to boot I have to also apply:
>>>>>
>>>>>   88e10092f6a6 ("sata_fsl: use the right type for tag bitshift")
>>>>>
>>>>>
>>>>> The system boots OK and seems fine, except that it's just printing
>>>>> multiple of these per second:
>>>>>
>>>>>   ata2: Signature Update detected @ 0 msecs
>>>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>   ata2.00: configured for UDMA/100
>>>>>   ata2: Signature Update detected @ 0 msecs
>>>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>   ata2.00: configured for UDMA/100
>>>>>   ata2: Signature Update detected @ 0 msecs
>>>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>   ata2.00: configured for UDMA/100
>>>>>   ata2: Signature Update detected @ 0 msecs
>>>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>   ata2.00: configured for UDMA/100
>>>>>   ata2: Signature Update detected @ 0 msecs
> ...
>>
>> Actually, just try this one on top of current -git.
> 
> Yep that fixes it.
> 
> No more message spam, and when I try to mount sr0 it says "no medium".
> 
> I'll have to go into the office to actually put a disc in the drive
> to check it's really working, but it seems likely.
> 
> Thanks for debugging it, here's a tested-by if you like:
> 
> Tested-by: Michael Ellerman 

Thanks, but I packaged it a little differently, see the other
series I CC'ed you on. It'll work the same, though. It's in Tejun's
tree now, so should end up with Linus some time this week I think.

Thanks for reporting and testing the fix!

-- 
Jens Axboe



Re: Constant ata messages on console with commit 28361c403683 ("libata: add extra internal command") (was Re: [GIT PULL 2/2] libata changes for v4.18-rc1)

2018-06-19 Thread Jens Axboe
On 6/19/18 1:29 AM, Michael Ellerman wrote:
> Jens Axboe  writes:
>> On 6/18/18 1:33 AM, Michael Ellerman wrote:
>>> Tejun Heo  writes:
>>> ...
>>>> Jens Axboe (10):
>>>>   libata: introduce notion of separate hardware tags
>>>>   libata: convert core and drivers to ->hw_tag usage
>>>>   libata: bump ->qc_active to a 64-bit type
>>>>   libata: use ata_tag_internal() consistently
>>>>   libata: remove assumption that ATA_MAX_QUEUE - 1 is the max
>>>>   sata_nv: set host can_queue count appropriately
>>>>   libata: add extra internal command
>>>
>>> Replying here because I can't find the original mail.
>>>
>>> The above commit is causing one of my machines to constantly spew ata
>>> messages on the console, according to bisect:
>>>
>>> # first bad commit: [28361c403683c2b00d4f5e76045f3ccd299bf99d] libata: add 
>>> extra internal command
>>>
>>> To get it to boot I have to also apply:
>>>
>>>   88e10092f6a6 ("sata_fsl: use the right type for tag bitshift")
>>>
>>>
>>> The system boots OK and seems fine, except that it's just printing
>>> multiple of these per second:
>>>
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>
>>> And it never seems to stop.
>>>
>>> The machine is a Freescale/NXP P5020ds, using the sata_fsl driver
>>> presumably. Any ideas?
>>
>> Hmm that's odd. Can you include the boot log from a working boot as
>> well? Would be nice to see what devices are on the sata adapter.
>> The above just looks like a hardreset loop.
> 
> Ah yep. I stupidly assumed it was working, because the machine booted,
> but that's because the root disk is on ata1.
> 
> Booting the good kernel:
> 
>   ba80c3a572f4 ("sata_nv: set host can_queue count appropriately")
> 
> I see:
> 
>   root@p5020ds:~# ls -l /sys/class/ata_port/
>   total 0
>   lrwxrwxrwx 1 root root 0 Jun 19 06:49 ata1 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/ata_port/ata1
>   lrwxrwxrwx 1 root root 0 Jun 19 17:06 ata2 -> 
> ../../devices/platform/ffe00.soc/ffe221000.sata/ata2/ata_port/ata2
> 
>   root@p5020ds:~# ls -l /sys/class/block/ | grep ata
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda1 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda2 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda5 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda5
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sr0 -> 
> ../../devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/block/sr0
> 
> So it's the DVD drive.
> 
>   
> root@p5020ds:/sys/devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/scsi_device/1:0:0:0/device#
>  cat vendor 
>   Optiarc 
>   
> root@p5020ds:/sys/devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/scsi_device/1:0:0:0/device#
>  cat model
>   DVD RW AD-7260S 
> 
> 
> Full boot log from a good boot attached if that's helpful.
> 
> All of the above looks the same when I boot with the broken setup, it
> just spams dmesg constantly.
> 
> One thing that is different, on the good kernel I see:
> 
>   root@p5020ds:~# mount /dev/sr0 /mnt
>   mount: no medium found on /dev/sr0
> 
> vs bad (88e10092f6a6):
> 
>   root@p5020ds:~# mount /dev/sr0 /mnt
>   mount: /dev/sr0 is already mounted or /mnt busy
> 
> cheers

Actually, just try this one on top of current -git.

diff --git a/drivers/ata/sat

Re: Constant ata messages on console with commit 28361c403683 ("libata: add extra internal command") (was Re: [GIT PULL 2/2] libata changes for v4.18-rc1)

2018-06-19 Thread Jens Axboe
On 6/19/18 1:29 AM, Michael Ellerman wrote:
> Jens Axboe  writes:
>> On 6/18/18 1:33 AM, Michael Ellerman wrote:
>>> Tejun Heo  writes:
>>> ...
>>>> Jens Axboe (10):
>>>>   libata: introduce notion of separate hardware tags
>>>>   libata: convert core and drivers to ->hw_tag usage
>>>>   libata: bump ->qc_active to a 64-bit type
>>>>   libata: use ata_tag_internal() consistently
>>>>   libata: remove assumption that ATA_MAX_QUEUE - 1 is the max
>>>>   sata_nv: set host can_queue count appropriately
>>>>   libata: add extra internal command
>>>
>>> Replying here because I can't find the original mail.
>>>
>>> The above commit is causing one of my machines to constantly spew ata
>>> messages on the console, according to bisect:
>>>
>>> # first bad commit: [28361c403683c2b00d4f5e76045f3ccd299bf99d] libata: add 
>>> extra internal command
>>>
>>> To get it to boot I have to also apply:
>>>
>>>   88e10092f6a6 ("sata_fsl: use the right type for tag bitshift")
>>>
>>>
>>> The system boots OK and seems fine, except that it's just printing
>>> multiple of these per second:
>>>
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>   ata2.00: configured for UDMA/100
>>>   ata2: Signature Update detected @ 0 msecs
>>>
>>> And it never seems to stop.
>>>
>>> The machine is a Freescale/NXP P5020ds, using the sata_fsl driver
>>> presumably. Any ideas?
>>
>> Hmm that's odd. Can you include the boot log from a working boot as
>> well? Would be nice to see what devices are on the sata adapter.
>> The above just looks like a hardreset loop.
> 
> Ah yep. I stupidly assumed it was working, because the machine booted,
> but that's because the root disk is on ata1.
> 
> Booting the good kernel:
> 
>   ba80c3a572f4 ("sata_nv: set host can_queue count appropriately")
> 
> I see:
> 
>   root@p5020ds:~# ls -l /sys/class/ata_port/
>   total 0
>   lrwxrwxrwx 1 root root 0 Jun 19 06:49 ata1 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/ata_port/ata1
>   lrwxrwxrwx 1 root root 0 Jun 19 17:06 ata2 -> 
> ../../devices/platform/ffe00.soc/ffe221000.sata/ata2/ata_port/ata2
> 
>   root@p5020ds:~# ls -l /sys/class/block/ | grep ata
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda1 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda2 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sda5 -> 
> ../../devices/platform/ffe00.soc/ffe22.sata/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda5
>   lrwxrwxrwx 1 root root 0 Jun 19 17:11 sr0 -> 
> ../../devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/block/sr0
> 
> So it's the DVD drive.
> 
>   
> root@p5020ds:/sys/devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/scsi_device/1:0:0:0/device#
>  cat vendor 
>   Optiarc 
>   
> root@p5020ds:/sys/devices/platform/ffe00.soc/ffe221000.sata/ata2/host1/target1:0:0/1:0:0:0/scsi_device/1:0:0:0/device#
>  cat model
>   DVD RW AD-7260S 
> 
> 
> Full boot log from a good boot attached if that's helpful.
> 
> All of the above looks the same when I boot with the broken setup, it
> just spams dmesg constantly.
> 
> One thing that is different, on the good kernel I see:
> 
>   root@p5020ds:~# mount /dev/sr0 /mnt
>   mount: no medium found on /dev/sr0
> 
> vs bad (88e10092f6a6):
> 
>   root@p5020ds:~# mount /dev/sr0 /mnt
>   mount: /dev/sr0 is already mounted or /mnt busy

Can you try the below patch, on both 4.17 and on current -git? Might
help shed some light 

Re: Constant ata messages on console with commit 28361c403683 ("libata: add extra internal command") (was Re: [GIT PULL 2/2] libata changes for v4.18-rc1)

2018-06-18 Thread Jens Axboe
On 6/18/18 1:33 AM, Michael Ellerman wrote:
> Tejun Heo  writes:
> ...
>> Jens Axboe (10):
>>   libata: introduce notion of separate hardware tags
>>   libata: convert core and drivers to ->hw_tag usage
>>   libata: bump ->qc_active to a 64-bit type
>>   libata: use ata_tag_internal() consistently
>>   libata: remove assumption that ATA_MAX_QUEUE - 1 is the max
>>   sata_nv: set host can_queue count appropriately
>>   libata: add extra internal command
> 
> Replying here because I can't find the original mail.
> 
> The above commit is causing one of my machines to constantly spew ata
> messages on the console, according to bisect:
> 
> # first bad commit: [28361c403683c2b00d4f5e76045f3ccd299bf99d] libata: add 
> extra internal command
> 
> To get it to boot I have to also apply:
> 
>   88e10092f6a6 ("sata_fsl: use the right type for tag bitshift")
> 
> 
> The system boots OK and seems fine, except that it's just printing
> multiple of these per second:
> 
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>   ata2.00: configured for UDMA/100
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>   ata2.00: configured for UDMA/100
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>   ata2.00: configured for UDMA/100
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>   ata2.00: configured for UDMA/100
>   ata2: Signature Update detected @ 0 msecs
> 
> And it never seems to stop.
> 
> The machine is a Freescale/NXP P5020ds, using the sata_fsl driver
> presumably. Any ideas?

Hmm that's odd. Can you include the boot log from a working boot as
well? Would be nice to see what devices are on the sata adapter.
The above just looks like a hardreset loop.

-- 
Jens Axboe



Re: Fwd: [powerpc/Baremetal]Kernel OOPS while executing memory hotplug on Power8 baremetal

2018-06-07 Thread Jens Axboe
On 6/7/18 8:45 AM, Jens Axboe wrote:
> On 6/7/18 4:37 AM, vrbagal1 wrote:
>> On 2018-06-07 13:12, Bart Van Assche wrote:
>>> On Thu, 2018-06-07 at 12:56 +0530, Venkat Rao B wrote:
>>>> On Thursday 07 June 2018 12:46 PM, Bart Van Assche wrote:
>>>>> On Thu, 2018-06-07 at 12:38 +0530, vrbagal1 wrote:
>>>>>> Observing Kernel oops and machine reboots while executing memory hotplug
>>>>>> test case, on Power8 Baremetal machine.
>>>>>>
>>>>>> I see this is introduced some where between rc6 and 4.17.
>>>>>
>>>>> Please provide the exact versions (git commit IDs) of the kernel versions
>>>>> you have tested.
>>>>
>>>> Commit Id ---> 5037be168f
>>>
>>> The reason I was asking for the commit ID is because I saw that 
>>> clone_endio()
>>> occurs in the oops which means that the dm driver is involved. An 
>>> important fix
>>> for the dm driver went upstream recently, namely d37753540568 ("dm: Use 
>>> kzalloc
>>> for all structs with embedded biosets/mempools"). Can you double check 
>>> whether
>>> that commit it present in your tree? If it is not present, please 
>>> update to the
>>> latest master and retest. If it is present, please report how to 
>>> reproduce
>>> this oops to Kent Overstreet, Jens Axboe, linux-block and Mike Snitzer.
>>>
>>> Thanks,
>>>
>>> Bart.
>>
>>
>> Yes, the fix is present in the tree, which I have tested.
>>
>> Steps to reproduce:
>>
>> Step1: Clone and Install avocado git clone 
>> https://github.com/avocado-framework/avocado.git
>> Step2: Clone 
>> https://github.com/avocado-framework-tests/avocado-misc-tests.git
>> Test case is 
>> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py
>> Step3: Command to run the test is avocado run 
>> avocado-misc-tests/memory/memhotplug.py
> 
> Can you try with the below? Not a fully formed fix since I'd prefer
> if the dm bioset copy stuff was changed instead, but worth a shot.

This is closer to an actual fix, please try that instead.


diff --git a/block/bio.c b/block/bio.c
index 595663e0281a..0616d86b15c6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1967,6 +1967,21 @@ int bioset_init(struct bio_set *bs,
 }
 EXPORT_SYMBOL(bioset_init);
 
+int bioset_init_from_src(struct bio_set *new, struct bio_set *src)
+{
+   unsigned int pool_size = src->bio_pool.min_nr;
+   int flags;
+
+   flags = 0;
+   if (src->bvec_pool.min_nr)
+   flags |= BIOSET_NEED_BVECS;
+   if (src->rescue_workqueue)
+   flags |= BIOSET_NEED_RESCUER;
+
+   return bioset_init(new, pool_size, src->front_pad, flags);
+}
+EXPORT_SYMBOL(bioset_init_from_src);
+
 #ifdef CONFIG_BLK_CGROUP
 
 /**
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 98dff36b89a3..20a8d63754bf 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1953,9 +1953,10 @@ static void free_dev(struct mapped_device *md)
kvfree(md);
 }
 
-static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
+static int __bind_mempools(struct mapped_device *md, struct dm_table *t)
 {
struct dm_md_mempools *p = dm_table_get_md_mempools(t);
+   int ret = 0;
 
if (dm_table_bio_based(t)) {
/*
@@ -1982,13 +1983,16 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
   bioset_initialized(>bs) ||
   bioset_initialized(>io_bs));
 
-   md->bs = p->bs;
-   memset(>bs, 0, sizeof(p->bs));
-   md->io_bs = p->io_bs;
-   memset(>io_bs, 0, sizeof(p->io_bs));
+   ret = bioset_init_from_src(>bs, >bs);
+   if (ret)
+   goto out;
+   ret = bioset_init_from_src(>io_bs, >io_bs);
+   if (ret)
+   bioset_exit(>bs);
 out:
/* mempool bind completed, no longer need any mempools in the table */
dm_table_free_md_mempools(t);
+   return ret;
 }
 
 /*
@@ -2033,6 +2037,7 @@ static struct dm_table *__bind(struct mapped_device *md, 
struct dm_table *t,
struct request_queue *q = md->queue;
bool request_based = dm_table_request_based(t);
sector_t size;
+   int ret;
 
lockdep_assert_held(>suspend_lock);
 
@@ -2068,7 +2073,11 @@ static struct dm_table *__bind(struct mapped_device *md, 
struct dm_table *t,
md->immutable_target = dm_table_get_immutable_target(t);
}
 
-   __bind_mempools(md, t);
+   ret = __bind_mempools(md, t);
+   if (ret) {
+   old_map = ERR_PTR(ret);
+   g

Re: Fwd: [powerpc/Baremetal]Kernel OOPS while executing memory hotplug on Power8 baremetal

2018-06-07 Thread Jens Axboe
On 6/7/18 4:37 AM, vrbagal1 wrote:
> On 2018-06-07 13:12, Bart Van Assche wrote:
>> On Thu, 2018-06-07 at 12:56 +0530, Venkat Rao B wrote:
>>> On Thursday 07 June 2018 12:46 PM, Bart Van Assche wrote:
>>>> On Thu, 2018-06-07 at 12:38 +0530, vrbagal1 wrote:
>>>>> Observing Kernel oops and machine reboots while executing memory hotplug
>>>>> test case, on Power8 Baremetal machine.
>>>>>
>>>>> I see this is introduced some where between rc6 and 4.17.
>>>>
>>>> Please provide the exact versions (git commit IDs) of the kernel versions
>>>> you have tested.
>>>
>>> Commit Id ---> 5037be168f
>>
>> The reason I was asking for the commit ID is because I saw that 
>> clone_endio()
>> occurs in the oops which means that the dm driver is involved. An 
>> important fix
>> for the dm driver went upstream recently, namely d37753540568 ("dm: Use 
>> kzalloc
>> for all structs with embedded biosets/mempools"). Can you double check 
>> whether
>> that commit it present in your tree? If it is not present, please 
>> update to the
>> latest master and retest. If it is present, please report how to 
>> reproduce
>> this oops to Kent Overstreet, Jens Axboe, linux-block and Mike Snitzer.
>>
>> Thanks,
>>
>> Bart.
> 
> 
> Yes, the fix is present in the tree, which I have tested.
> 
> Steps to reproduce:
> 
> Step1: Clone and Install avocado git clone 
> https://github.com/avocado-framework/avocado.git
> Step2: Clone 
> https://github.com/avocado-framework-tests/avocado-misc-tests.git
> Test case is 
> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py
> Step3: Command to run the test is avocado run 
> avocado-misc-tests/memory/memhotplug.py

Can you try with the below? Not a fully formed fix since I'd prefer
if the dm bioset copy stuff was changed instead, but worth a shot.


diff --git a/block/bio.c b/block/bio.c
index 595663e0281a..45bdee67d28b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1967,6 +1967,27 @@ int bioset_init(struct bio_set *bs,
 }
 EXPORT_SYMBOL(bioset_init);
 
+void bioset_move(struct bio_set *dst, struct bio_set *src)
+{
+   dst->bio_slab = src->bio_slab;
+   dst->front_pad = src->front_pad;
+   mempool_move(>bio_pool, >bio_pool);
+   mempool_move(>bvec_pool, >bvec_pool);
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+   mempool_move(>bio_integrity_pool, >bio_integrity_pool);
+   mempool_move(>bvec_integrity_pool, >bvec_integrity_pool);
+#endif
+   BUG_ON(!bio_list_empty(>rescue_list));
+   BUG_ON(work_pending(>rescue_work));
+   spin_lock_init(>rescue_lock);
+   bio_list_init(>rescue_list);
+   INIT_WORK(>rescue_work, bio_alloc_rescue);
+   dst->rescue_workqueue = src->rescue_workqueue;
+
+   memset(src, 0, sizeof(*src));
+}
+EXPORT_SYMBOL(bioset_move);
+
 #ifdef CONFIG_BLK_CGROUP
 
 /**
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 98dff36b89a3..87f636815baf 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1982,10 +1982,8 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
   bioset_initialized(>bs) ||
   bioset_initialized(>io_bs));
 
-   md->bs = p->bs;
-   memset(>bs, 0, sizeof(p->bs));
-   md->io_bs = p->io_bs;
-   memset(>io_bs, 0, sizeof(p->io_bs));
+   bioset_move(>bs, >bs);
+   bioset_move(>io_bs, >io_bs);
 out:
/* mempool bind completed, no longer need any mempools in the table */
dm_table_free_md_mempools(t);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 810a8bee8f85..7581231dd0a3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -417,6 +417,7 @@ enum {
 extern int bioset_init(struct bio_set *, unsigned int, unsigned int, int 
flags);
 extern void bioset_exit(struct bio_set *);
 extern int biovec_init_pool(mempool_t *pool, int pool_entries);
+extern void bioset_move(struct bio_set *dst, struct bio_set *src);
 
 extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *);
 extern void bio_put(struct bio *);
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 0c964ac107c2..20818919180c 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -47,6 +47,7 @@ extern int mempool_resize(mempool_t *pool, int new_min_nr);
 extern void mempool_destroy(mempool_t *pool);
 extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
 extern void mempool_free(void *element, mempool_t *pool);
+extern void mempool_move(mempool_t *dst, mempool_t *src);
 
 /*
  * A mempool_alloc_t and mempool_free_t that get the memory from
diff --git a/mm

Re: make a few block drivers highmem safe

2018-05-11 Thread Jens Axboe
On 5/9/18 7:59 AM, Christoph Hellwig wrote:
> Hi all,
> 
> this series converts a few random block drivers to be highmem safe,
> in preparation of eventually getting rid of the block layer bounce
> buffering support.

Applied, thanks.

-- 
Jens Axboe



Re: make a few block drivers highmem safe

2018-05-09 Thread Jens Axboe
On 5/9/18 7:59 AM, Christoph Hellwig wrote:
> Hi all,
> 
> this series converts a few random block drivers to be highmem safe,
> in preparation of eventually getting rid of the block layer bounce
> buffering support.

Series looks good to me.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:43 AM, Al Viro wrote:
> On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
>> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
>>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
>>> the time running xfstests as well.
>>
>> Dave insisted on it to decourage users/applications from mixing
>> mmap and direct I/O.
>>
>> In many ways a tracepoint might be the better way to diagnose these.
> 
> sysctl suppressing those two, perhaps?

I'd rather just make it a trace point, but don't care too much.

The code doesn't even have a comment as to why that WARN_ON() is
there or expected. Seems pretty sloppy to me, not a great way
to "discourage" users to mix mmap/dio.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>> Hi,
>>
>> A warning is triggered from:
>>
>> file fs/iomap.c in function iomap_dio_rw
>>
>> if (ret)
>> goto out_free_dio;
>>
>> ret = invalidate_inode_pages2_range(mapping,
>> start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>>>  WARN_ON_ONCE(ret);
>> ret = 0;
>>
>> inode_dio_begin(inode);
> 
> This is expected and an indication of a problematic workload - which
> may be triggered by a fuzzer.

If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
the time running xfstests as well.

-- 
Jens Axboe



Re: [PATCH 0/3] Minor updates for PS3

2017-08-08 Thread Jens Axboe
On 08/08/2017 04:16 AM, Michael Ellerman wrote:
> Geoff Levand <ge...@infradead.org> writes:
> 
>> Hi Michael,
>>
>> A few very minor updates for PS3.  Please apply.
> 
> Jens do you want to take the block ones, or should I just take the lot?

Up to you, I'm fine either way.

-- 
Jens Axboe



Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-08-01 Thread Jens Axboe
On 08/01/2017 12:55 AM, Michael Ellerman wrote:
> Jens Axboe <ax...@kernel.dk> writes:
> ...
>>
>> Can you try the below fix? Should be more palatable than the previous
>> one. Brian, maybe you can take a look at the IRQ issue mentioned above?
> 
> Given the patch from Brian fixed the lockdep warning, do you still want
> me to try and test this one?

Nope, we don't have to do that. I'd much rather just add a WARN_ON()
or similar to make sure we catch buggy users earlier. scsi_run_queue()
needs a

WARN_ON(in_interrupt());

but it might be better to put that in __blk_mq_run_hw_queue().

-- 
Jens Axboe



Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-28 Thread Jens Axboe
On 07/28/2017 09:13 AM, Bart Van Assche wrote:
> On Fri, 2017-07-28 at 08:25 -0600, Jens Axboe wrote:
>> On 07/28/2017 12:19 AM, Michael Ellerman wrote:
>>> OK, so the resolution is "fix it in IPR" ?
>>
>> I'll leave that to the SCSI crew. But at least one bug is in IPR, if you
>> look at the call trace:
>>
>> - timer function triggers, runs ipr_reset_timer_done(), which grabs the
>>   host lock AND disables interrupts.
>> - further down in the call path, ipr_ioa_bringdown_done() uncondtionally
>>   enables interrupts:
>>
>> spin_unlock_irq(ioa_cfg->host->host_lock);
>> scsi_unblock_requests(ioa_cfg->host);
>> spin_lock_irq(ioa_cfg->host->host_lock); 
>>
>> And the call to scsi_unblock_requests() is the one that ultimately runs
>> the queue. The IRQ issue aside here, scsi_unblock_requests() could run
>> the queue async, and we could retain the normal sync run otherwise.
>>
>> Can you try the below fix? Should be more palatable than the previous
>> one. Brian, maybe you can take a look at the IRQ issue mentioned above?
>>
>> [ ... ]
> 
> Hello Jens,
> 
> Are there other block drivers that can call blk_mq_start_hw_queues()
> from interrupt context? I'm currently working on converting the skd
> driver (drivers/block/skd_main.c) from a single queue block driver
> into a scsi-mq driver. The skd driver calls blk_start_queue() from
> interrupt context. As we know it is not safe to call
> blk_mq_start_hw_queues() from interrupt context.  Can you recommend me
> how I should proceed: should I implement a solution in the skd driver
> or should perhaps the blk-mq core be modified?

Great that you a converting that driver! If there's a need for it, we
could always expose the sync/async need in blk_mq_start_hw_queues().
>From a quick look at the driver, it's using start queue very liberally.
Would probably make sense to see which ones of those are actually
needed. For resource management, we've got better interfaces on the
blk-mq side, for instance.

Since this is a conversion, might make sense to not modify
blk_mq_start_hw_queues() and simply provide an alternative
blk_mq_start_hw_queues_async(). That will keep the conversion straight
forward. Then the next step could be to fixup skd, and then we could
drop the _async() variant again, hopefully.

-- 
Jens Axboe



Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-28 Thread Jens Axboe
On 07/28/2017 12:19 AM, Michael Ellerman wrote:
> Jens Axboe <ax...@kernel.dk> writes:
>> On 07/27/2017 08:47 AM, Bart Van Assche wrote:
>>> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
>>>> The bug looks like SCSI running the queue inline from IRQ
>>>> context, that's not a good idea.
> ...
>>>
>>> scsi_run_queue() works fine if no scheduler is configured. Additionally, 
>>> that
>>> code predates the introduction of blk-mq I/O schedulers. I think it is
>>> nontrivial for block driver authors to figure out that a queue has to be run
>>> from process context if a scheduler has been configured that does not 
>>> support
>>> to be run from interrupt context.
>>
>> No it doesn't, you could never run the queue from interrupt context with
>> async == false. So I don't think that's confusing at all, you should
>> always be aware of the context.
>>
>>> How about adding WARN_ON_ONCE(in_interrupt()) to
>>> blk_mq_start_hw_queue() or replacing the above patch by the following:
>>
>> No, I hate having dependencies like that, because they always just catch
>> one of them. Looks like the IPR path that hits this should just offload
>> to a workqueue or similar, you don't have to make any scsi_run_queue()
>> async.
> 
> OK, so the resolution is "fix it in IPR" ?

I'll leave that to the SCSI crew. But at least one bug is in IPR, if you
look at the call trace:

- timer function triggers, runs ipr_reset_timer_done(), which grabs the
  host lock AND disables interrupts.
- further down in the call path, ipr_ioa_bringdown_done() uncondtionally
  enables interrupts:

spin_unlock_irq(ioa_cfg->host->host_lock);
scsi_unblock_requests(ioa_cfg->host);
spin_lock_irq(ioa_cfg->host->host_lock); 

And the call to scsi_unblock_requests() is the one that ultimately runs
the queue. The IRQ issue aside here, scsi_unblock_requests() could run
the queue async, and we could retain the normal sync run otherwise.

Can you try the below fix? Should be more palatable than the previous
one. Brian, maybe you can take a look at the IRQ issue mentioned above?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..dfb89596af81 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -481,13 +481,14 @@ static void scsi_starved_list_run(struct Scsi_Host *shost)
  * Purpose:Select a proper request queue to serve next
  *
  * Arguments:  q   - last request's queue
+ * async   - run queues async, if we need to
  *
  * Returns: Nothing
  *
  * Notes:  The previous command was completely finished, start
  * a new one if possible.
  */
-static void scsi_run_queue(struct request_queue *q)
+static void scsi_run_queue(struct request_queue *q, bool async)
 {
struct scsi_device *sdev = q->queuedata;
 
@@ -497,7 +498,7 @@ static void scsi_run_queue(struct request_queue *q)
scsi_starved_list_run(sdev->host);
 
if (q->mq_ops)
-   blk_mq_run_hw_queues(q, false);
+   blk_mq_run_hw_queues(q, async);
else
blk_run_queue(q);
 }
@@ -509,7 +510,7 @@ void scsi_requeue_run_queue(struct work_struct *work)
 
sdev = container_of(work, struct scsi_device, requeue_work);
q = sdev->request_queue;
-   scsi_run_queue(q);
+   scsi_run_queue(q, false);
 }
 
 /*
@@ -543,17 +544,22 @@ static void scsi_requeue_command(struct request_queue *q, 
struct scsi_cmnd *cmd)
blk_requeue_request(q, req);
spin_unlock_irqrestore(q->queue_lock, flags);
 
-   scsi_run_queue(q);
+   scsi_run_queue(q, true);
 
put_device(>sdev_gendev);
 }
 
-void scsi_run_host_queues(struct Scsi_Host *shost)
+static void __scsi_run_host_queues(struct Scsi_Host *shost, bool async)
 {
struct scsi_device *sdev;
 
shost_for_each_device(sdev, shost)
-   scsi_run_queue(sdev->request_queue);
+   scsi_run_queue(sdev->request_queue, async);
+}
+
+void scsi_run_host_queues(struct Scsi_Host *shost)
+{
+   __scsi_run_host_queues(shost, false);
 }
 
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
@@ -671,7 +677,7 @@ static bool scsi_end_request(struct request *req, 
blk_status_t error,
blk_finish_request(req, error);
spin_unlock_irqrestore(q->queue_lock, flags);
 
-   scsi_run_queue(q);
+   scsi_run_queue(q, false);
}
 
put_device(>sdev_gendev);
@@ -2293,7 +2299,7 @@ EXPORT_SYMBOL(scsi_block_requests);
 void scsi_unblock_requests(struct Scsi_Host *shost)
 {
shost->host_self_blocked = 0;
-   scsi_run_host_queues(shost);
+   __scsi_run_host_queues(shost, true);
 }
 EX

Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-27 Thread Jens Axboe
On 07/27/2017 08:47 AM, Bart Van Assche wrote:
> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
>> The bug looks like SCSI running the queue inline from IRQ
>> context, that's not a good idea. Can you confirm the below works for
>> you?
>>
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index f6097b89d5d3..78740ebf966c 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -497,7 +497,7 @@ static void scsi_run_queue(struct request_queue *q)
>>  scsi_starved_list_run(sdev->host);
>>  
>>  if (q->mq_ops)
>> -blk_mq_run_hw_queues(q, false);
>> +blk_mq_run_hw_queues(q, true);
>>  else
>>  blk_run_queue(q);
>>  }
> 
> Hello Jens,
> 
> scsi_run_queue() works fine if no scheduler is configured. Additionally, that
> code predates the introduction of blk-mq I/O schedulers. I think it is
> nontrivial for block driver authors to figure out that a queue has to be run
> from process context if a scheduler has been configured that does not support
> to be run from interrupt context.

No it doesn't, you could never run the queue from interrupt context with
async == false. So I don't think that's confusing at all, you should
always be aware of the context.

> How about adding WARN_ON_ONCE(in_interrupt()) to
> blk_mq_start_hw_queue() or replacing the above patch by the following:

No, I hate having dependencies like that, because they always just catch
one of them. Looks like the IPR path that hits this should just offload
to a workqueue or similar, you don't have to make any scsi_run_queue()
async.

-- 
Jens Axboe



Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-27 Thread Jens Axboe
ivers/scsi/scsi_lib.c
index f6097b89d5d3..78740ebf966c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -497,7 +497,7 @@ static void scsi_run_queue(struct request_queue *q)
scsi_starved_list_run(sdev->host);
 
if (q->mq_ops)
-   blk_mq_run_hw_queues(q, false);
+   blk_mq_run_hw_queues(q, true);
else
blk_run_queue(q);
 }

-- 
Jens Axboe


-- 
Jens Axboe



Re: linux-next: build failure after merge of the block tree

2017-06-28 Thread Jens Axboe
On 06/28/2017 08:01 AM, Jens Axboe wrote:
> On 06/28/2017 06:43 AM, Jens Axboe wrote:
>> On 06/28/2017 02:04 AM, Stephen Rothwell wrote:
>>> Hi Jens,
>>>
>>> After merging the block tree, today's linux-next build (powerpc
>>> allnoconfig) failed like this:
>>>
>>> fs/fcntl.o: In function `do_fcntl': 
>>> fcntl.c:(.text+0x6d4): undefined reference to `__get_user_bad'
>>> fcntl.c:(.text+0x730): undefined reference to `__get_user_bad'
>>>
>>> Probably caused by commit
>>>
>>>   c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life 
>>> time hints")
>>>
>>> On powerpc (at least) you cannot use get_user() to fetch anything larger
>>> than "unsigned long" i.e. 32 bits on 32 bit powerpc.
>>>
>>> This has been discussed before (and, I think, a fix attempted).
>>
>> Gah, thanks for letting me know. I'll test your patch and queue it
>> up to fix this issue.
> 
> But put_user() is fine? Just checking here, since the change adds
> both a u64 put and get user.

I just changed all 4, at least that provides some symmetry in how
we copy things in and out for that set of fcntls.

-- 
Jens Axboe



Re: linux-next: build failure after merge of the block tree

2017-06-28 Thread Jens Axboe
On 06/28/2017 06:43 AM, Jens Axboe wrote:
> On 06/28/2017 02:04 AM, Stephen Rothwell wrote:
>> Hi Jens,
>>
>> After merging the block tree, today's linux-next build (powerpc
>> allnoconfig) failed like this:
>>
>> fs/fcntl.o: In function `do_fcntl': 
>> fcntl.c:(.text+0x6d4): undefined reference to `__get_user_bad'
>> fcntl.c:(.text+0x730): undefined reference to `__get_user_bad'
>>
>> Probably caused by commit
>>
>>   c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life 
>> time hints")
>>
>> On powerpc (at least) you cannot use get_user() to fetch anything larger
>> than "unsigned long" i.e. 32 bits on 32 bit powerpc.
>>
>> This has been discussed before (and, I think, a fix attempted).
> 
> Gah, thanks for letting me know. I'll test your patch and queue it
> up to fix this issue.

But put_user() is fine? Just checking here, since the change adds
both a u64 put and get user.

-- 
Jens Axboe



Re: linux-next: build failure after merge of the block tree

2017-06-28 Thread Jens Axboe
On 06/28/2017 02:04 AM, Stephen Rothwell wrote:
> Hi Jens,
> 
> After merging the block tree, today's linux-next build (powerpc
> allnoconfig) failed like this:
> 
> fs/fcntl.o: In function `do_fcntl': 
> fcntl.c:(.text+0x6d4): undefined reference to `__get_user_bad'
> fcntl.c:(.text+0x730): undefined reference to `__get_user_bad'
> 
> Probably caused by commit
> 
>   c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life 
> time hints")
> 
> On powerpc (at least) you cannot use get_user() to fetch anything larger
> than "unsigned long" i.e. 32 bits on 32 bit powerpc.
> 
> This has been discussed before (and, I think, a fix attempted).

Gah, thanks for letting me know. I'll test your patch and queue it
up to fix this issue.

-- 
Jens Axboe



Re: [linux-next][bock] WARNING: CPU: 22 PID: 0 at block/blk-core.c:2655 .blk_update_request+0x4f8/0x500

2017-05-08 Thread Jens Axboe
On 05/08/2017 01:13 AM, Abdul Haleem wrote:
> On Fri, 2017-05-05 at 08:02 -0600, Jens Axboe wrote:
>> On 05/05/2017 12:25 AM, Abdul Haleem wrote:
>>> Hi,
>>>
>>> 4.11.0 Linus mainline booted with Warnings on PowerPC.
>>>
>>> We did not see this on next-20170407 but on next-20170410 and later.
>>
>> Have you tried current Linus -git? Both of the -next versions you list
>> are rather old.
>>
> 
> Hi Jens, 
> 
> Warning is still seen with next-20170505 and also with today's mainline.
> 
> It was first seen on next-20170410, so the last good was next-20170407.

The log between the known good and first bad version, condensed a bit for
primary suspects, is below.

Christoph Hellwig (4):
  sd: split sd_setup_discard_cmnd
  sd: implement REQ_OP_WRITE_ZEROES
  sd: implement unmapping Write Zeroes
  block: remove the discard_zeroes_data flag

Martin K. Petersen (2):
  scsi: sd: Separate zeroout and discard command choices
  scsi: sd: Remove LBPRZ dependency for discards

Christoph Hellwig (7):
  block: implement splitting of REQ_OP_WRITE_ZEROES bios
  block: stop using blkdev_issue_write_same for zeroing
  block: add a flags argument to (__)blkdev_issue_zeroout
  block: add a REQ_NOUNMAP flag for REQ_OP_WRITE_ZEROES
  block: add a new BLKDEV_ZERO_NOFALLBACK flag
  block: stop using discards for zeroing
  block: remove the discard_zeroes_data flag

Christoph, Martin - any ideas? Trace from Abdul below.

WARNING: CPU: 12 PID: 0 at block/blk-core.c:2651 .blk_update_request+0x4cc/0x4e0
Modules linked in: sg(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) 
sunrpc(E) binfmt_misc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) 
ibmvscsi(E) scsi_transport_srp(E) ibmveth(E)
CPU: 12 PID: 0 Comm: swapper/12 Tainted: GE   4.11.0-autotest #1
task: c009f455ee80 task.stack: c009fb2e8000
NIP: c050bd1c LR: c050b8ec CTR: c05114b0
REGS: c013fff73740 TRAP: 0700   Tainted: GE(4.11.0-autotest)
MSR: 80029032 <SF,EE,ME,IR,DR,RI>
  CR: 48042048  XER: 0001
CFAR: c050bb34 SOFTE: 1 
GPR00: c050b8ec c013fff739c0 c1389c00 c009eca9c800
GPR04:   0001 0060 
GPR08: 00067887  c009eca9c800 de5f7e30 
GPR12: 88044044 ce9f6c00 c009fb2ebf90 00200042 
GPR16: 9367 c013fff7  c0df4100 
GPR20: c13c3b00 c0df4100  0005 
GPR24: 2ee0 c17789f8   
GPR28:  c38ba400  c009eca9c800 
NIP [c050bd1c] .blk_update_request+0x4cc/0x4e0
LR [c050b8ec] .blk_update_request+0x9c/0x4e0
Call Trace:
[c013fff739c0] [c050b8ec] .blk_update_request+0x9c/0x4e0 
(unreliable)
[c013fff73a60] [c06b06fc] .scsi_end_request+0x4c/0x240
[c013fff73b10] [c06b4564] .scsi_io_completion+0x1d4/0x6c0
[c013fff73be0] [c06a8cd0] .scsi_finish_command+0x100/0x1b0
[c013fff73c70] [c06b3978] .scsi_softirq_done+0x188/0x1e0
[c013fff73d00] [c0516b44] .blk_done_softirq+0xc4/0xf0
[c013fff73d90] [c00daef8] .__do_softirq+0x158/0x3b0
[c013fff73e90] [c00db5b8] .irq_exit+0x1a8/0x1c0
[c013fff73f10] [c0014f84] .__do_irq+0x94/0x1f0
[c013fff73f90] [c0026cbc] .call_do_irq+0x14/0x24
[c009fb2eb7f0] [c001516c] .do_IRQ+0x8c/0x100
[c009fb2eb890] [c0008bf4] hardware_interrupt_common+0x114/0x120
--- interrupt: 501 at .plpar_hcall_norets+0x14/0x20
LR = .check_and_cede_processor+0x24/0x40
[c009fb2ebb80] [0002] 0x2 (unreliable)
[c009fb2ebbf0] [c07c360c] .dedicated_cede_loop+0x4c/0x150
[c009fb2ebc70] [c07c1040] .cpuidle_enter_state+0xb0/0x3b0
[c009fb2ebd20] [c012d1bc] .call_cpuidle+0x3c/0x70
[c009fb2ebd90] [c012d550] .do_idle+0x280/0x2e0
[c009fb2ebe50] [c012d768] .cpu_startup_entry+0x28/0x40
[c009fb2ebed0] [c00428a4] .start_secondary+0x304/0x350
[c009fb2ebf90] [c000aa6c] start_secondary_prolog+0x10/0x14
Instruction dump:
3f82ff90 3b9cc190 4bfffd8c 3f82ff90 3b9cc1a8 4bfffd80 61290040 b13f0018
4bfffbd4 3cc2ff8b 38c63160 4bfffd9c <0fe0> 4bfffe18 6000 6000 
---[ end trace 0f80359f8fb9c5f4 ]---
EXT4-fs (sda3): Delayed block allocation failed for inode 11011467 at logical 
offset 0 with max blocks 7 with error 121
EXT4-fs (sda3): This should not happen!! Data will be lost

-- 
Jens Axboe



Re: [linux-next][bock] WARNING: CPU: 22 PID: 0 at block/blk-core.c:2655 .blk_update_request+0x4f8/0x500

2017-05-05 Thread Jens Axboe
On 05/05/2017 12:25 AM, Abdul Haleem wrote:
> Hi,
> 
> 4.11.0 Linus mainline booted with Warnings on PowerPC.
> 
> We did not see this on next-20170407 but on next-20170410 and later.

Have you tried current Linus -git? Both of the -next versions you list
are rather old.

-- 
Jens Axboe



Re: 4.11.0-rc1 boot resulted in WARNING: CPU: 14 PID: 1722 at fs/sysfs/dir.c:31 .sysfs_warn_dup+0x78/0xb0

2017-03-11 Thread Jens Axboe
On 03/09/2017 05:59 AM, Brian Foster wrote:
> cc linux-block
> 
> On Thu, Mar 09, 2017 at 04:20:06PM +0530, Abdul Haleem wrote:
>> On Wed, 2017-03-08 at 08:17 -0500, Brian Foster wrote:
>>> On Tue, Mar 07, 2017 at 10:01:04PM +0530, Abdul Haleem wrote:
>>>>
>>>> Hi,
>>>>
>>>> Today's mainline (4.11.0-rc1) booted with warnings on Power7 LPAR.
>>>>
>>>> Issue is not reproducible all the time.

Is that still the case with -git as of yesterday? Check that you
have this merge:

34bbce9e344b47e8871273409632f525973afad4

in your tree.

-- 
Jens Axboe



Re: [PATCH] axonram: Fix gendisk handling

2017-03-08 Thread Jens Axboe
On 03/08/2017 06:56 AM, Jan Kara wrote:
> It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
> handling in axon_ram_probe() to avoid doing that.
> 
> Also del_gendisk() does not drop a reference to gendisk allocated by
> alloc_disk(). That has to be done by put_disk(). Add that call where
> needed.

Applied, thanks.

-- 
Jens Axboe



Re: [PATCH] direct-io: don't introduce another read of inode->i_blkbits

2017-01-10 Thread Jens Axboe
On 01/09/2017 10:27 PM, Chandan Rajendra wrote:
> On Monday, January 09, 2017 04:42:58 PM Jeff Moyer wrote:
>> Commit 20ce44d545844 ("do_direct_IO: Use inode->i_blkbits to compute
>> block count to be cleaned") introduced a regression: if the block size
>> of the block device is changed while a direct I/O request is being
>> setup, it can result in a panic.  See commit ab73857e354ab ("direct-io:
>> don't read inode->i_blkbits multiple times") for the reasoning, and
>> commit b87570f5d3496 ("Fix a crash when block device is read and block
>> size is changed at the same time") for a more detailed problem
>> description and reproducer.
>>
>> Fixes: 20ce44d545844
>> Signed-off-by: Jeff Moyer <jmo...@redhat.com>
>>
>> ---
>> Chandan, can you please test this to ensure this still fixes your problem?
> 
> This patch fixes the failure,
> 
> Tested-by: Chandan Rajendra <chan...@linux.vnet.ibm.com>
> 

I've updated the patch, thanks guys.

-- 
Jens Axboe



Re: [PATCH] direct-io: don't introduce another read of inode->i_blkbits

2017-01-09 Thread Jens Axboe
On 01/09/2017 02:42 PM, Jeff Moyer wrote:
> Commit 20ce44d545844 ("do_direct_IO: Use inode->i_blkbits to compute
> block count to be cleaned") introduced a regression: if the block size
> of the block device is changed while a direct I/O request is being
> setup, it can result in a panic.  See commit ab73857e354ab ("direct-io:
> don't read inode->i_blkbits multiple times") for the reasoning, and
> commit b87570f5d3496 ("Fix a crash when block device is read and block
> size is changed at the same time") for a more detailed problem
> description and reproducer.
> 
> Fixes: 20ce44d545844
> Signed-off-by: Jeff Moyer <jmo...@redhat.com>
> 
> ---
> Chandan, can you please test this to ensure this still fixes your problem?

Please, that would be great. And if so, then let's fold the two.

-- 
Jens Axboe



Re: [PATCH] do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned

2017-01-09 Thread Jens Axboe
On 01/08/2017 07:47 AM, Chandan Rajendra wrote:
> The code currently uses sdio->blkbits to compute the number of blocks to
> be cleaned. However sdio->blkbits is derived from the logical block size
> of the underlying block device (Refer to the definition of
> do_blockdev_direct_IO()). Due to this, generic/299 test would rarely
> fail when executed on an ext4 filesystem with 64k as the block size and
> when using a virtio based disk (having 512 byte as the logical block
> size) inside a kvm guest.
> 
> This commit fixes the bug by using inode->i_blkbits to compute the
> number of blocks to be cleaned.

Thanks, added for 4.10.

-- 
Jens Axboe



Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

2017-01-04 Thread Jens Axboe
On 01/04/2017 08:28 AM, Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
>> On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
>>> I'm consistently seeing ext4 filesystem corruption using a mainline
>>> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
>>> cloud image, boot it in KVM and run:
>>>
>>> sudo apt-get update
>>> sudo apt-get dist-upgrade
>>> sudo reboot
>>>
>>> And it never makes it back up, dying with rather severe filesystem
>>> corruption.
>>
>> The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
>> bug.
> 
> It looks like this patch is already queued up on the "for-linus"
> branch on the linux-block.git tree.
> 
> Chandra, thanks for pointing this out!  I had missed your e-mail from
> Christmas day, and it was on my todo list to figure out why I was
> seeing lots of 1k block regressions on gce-xfstests post-merge window
> that wasn't showing up on the ext4.git tree before I sent my pull
> request to Linus.
> 
> Jens, could you expedite a pull request to Linus?  This is affecting
> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
> regression.  

Yes, it'll go out this morning.

-- 
Jens Axboe



Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

2017-01-04 Thread Jens Axboe
On 01/03/2017 10:18 PM, Anton Blanchard wrote:
> Hi,
> 
> I'm consistently seeing ext4 filesystem corruption using a mainline
> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> cloud image, boot it in KVM and run:
> 
> sudo apt-get update
> sudo apt-get dist-upgrade
> sudo reboot
> 
> And it never makes it back up, dying with rather severe filesystem
> corruption.
> 
> I've narrowed it down to:
> 
> 64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
> e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
> ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")
> 
> Backing these patches out fixes the issue.

Fix is going out today, I see Chandan already pointed you at it. For the
other reporter, it's not an LE vs BE thing, it's a fs blocksize < page
size problem.

-- 
Jens Axboe



Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-12-04 Thread Jens Axboe
 that should fix it. You need commit a88d32af18b8 or newer.


--
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Jens Axboe

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above 
making it any different? In general, NOMERGE being set or not should not 
make a difference. It's only a hint that we need not check further if we 
should be merging on this request, since we already tried it once, found 
we'd exceed various limits, then set NOMERGE to reflect that.



--
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Jens Axboe

On 11/25/2015 12:10 PM, Hannes Reinecke wrote:

On 11/25/2015 06:56 PM, Jens Axboe wrote:

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above
making it any different? In general, NOMERGE being set or not should not
make a difference. It's only a hint that we need not check further if we
should be merging on this request, since we already tried it once, found
we'd exceed various limits, then set NOMERGE to reflect that.


The problem is that NOMERGE does too much, as it inhibits _any_ merging.


Right, that is the point of the flag from the block layer view, where it 
was originally added for the case mentioned.



Unfortunately, the req->nr_phys_segments value is evaluated in the final
_driver_ context _after_ the merging happend; cf
scsi_lib.c:scsi_init_sgtable().
As nr_phys_segments is inherited from the original request (and never
recalculated with the new request queue limits) the following
blk_rq_map_sg() call might end up at a different calculation, especially
after retrying a request on another path.


That all sounds pretty horrible. Why is blk_rq_check_limits() checking 
for mergeable at all? If merging is disabled on the request, I'm 
assuming that's an attempt at an optimization since we know it won't 
change. But that should be tracked separately, like how it's done on the 
bio.



--
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/3] block/ps3vram: Remove obsolete reference to MTD

2015-06-11 Thread Jens Axboe

On 06/11/2015 07:04 AM, Geert Uytterhoeven wrote:

On Wed, Jun 10, 2015 at 8:00 PM, Geert Uytterhoeven
ge...@linux-m68k.org did not write:

The ps3vram driver is a plain block device driver since commit
f507cd22035fdadd5dbb476dd05e9e7ee21c3b84 (ps3/block: Replace mtd/ps3vram
by block/ps3vram).

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
Signed-off-by: Geoff Levand ge...@infradead.org


For the record: while I did send out this patch back in 2013, I didn't resend it
yesterday.

Geoff:
   1. Please fix your scripts (start using git-send-email?), to avoid sending
  fake email. Thanks!


Indeed, 2/3 of these patches ended up in my spam.


   2. I am grateful you didn't forget about this patch, and that it got applied.


I'll add:

3. Your git base should be the base of where you want this to be pulled 
in, or prior. Yours was 4.1-rc7, and I'm not pulling in tons of 
unrelated changes for 3 patches. If you are targeting the next release 
and don't depend on existing block changes, then your base should be 
4.0. If you are dependent on changes in the block tree, the block tree 
should be your base.


--
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/3] block/ps3vram: Minor updates and fixes

2015-06-10 Thread Jens Axboe

On 06/10/2015 12:00 PM, Geoff Levand wrote:

Hi Jens,

Here are a few minor updates for the ps3vram driver.  The third patch adds me
as co-maintainer of the driver, which I think is fitting as I have been
maintaining it for the last few years and expect I would be involved in any
future inquiries regarding it.

Please apply, thanks.


Applied to for-4.2/drivers, thanks.

--
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] block/ps3: Remove obsolete reference to MTD

2014-01-02 Thread Jens Axboe
On 01/02/2014 11:46 AM, Geoff Levand wrote:
 On Mon, 2013-12-30 at 10:07 +0100, Geert Uytterhoeven wrote:
 The ps3vram driver is a plain block device driver since commit
 f507cd22035fdadd5dbb476dd05e9e7ee21c3b84 (ps3/block: Replace mtd/ps3vram
 by block/ps3vram).

 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
 ---
  drivers/block/ps3vram.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
 index 06a2e53e5f37..313ee641ea10 100644
 --- a/drivers/block/ps3vram.c
 +++ b/drivers/block/ps3vram.c
 @@ -1,5 +1,5 @@
  /*
 - * ps3vram - Use extra PS3 video ram as MTD block device.
 + * ps3vram - Use extra PS3 video ram as block device.
   *
   * Copyright 2009 Sony Corporation
   *
 
 Looks good.
 
 Jens, could you please apply.
 
 Acked-by: Geoff Levand ge...@infradead.org

Yup will do, thanks.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/7] powerpc: remove the legacy iSeries platform specific drivers

2012-03-08 Thread Jens Axboe
On 03/08/2012 05:28 AM, Stephen Rothwell wrote:
 It would probably be easiest if all these patches were merged via the
 powerpc tree to eliminate any dependencies between them.  Their impact on
 generic code is very small.

Go ahead, no point in doing otherwise.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] block/swim3: Locking fixes

2011-12-12 Thread Jens Axboe
On 2011-12-12 05:57, Benjamin Herrenschmidt wrote:
 The old PowerMac swim3 driver has some interesting locking issues,
 using a private lock and failing to lock the queue before completing
 requests, which triggered WARN_ONs among others.
 
 This rips out the private lock, makes everything operate under the
 block queue lock, and generally makes things simpler.
 
 We used to also share a queue between the two possible instances which
 was problematic since we might pick the wrong controller in some cases,
 so make the queue and the current request per-instance and use
 queuedata to point to our private data which is a lot cleaner.
 
 We still share the queue lock but then, it's nearly impossible to actually
 use 2 swim3's simultaneously: one would need to have a Wallstreet
 PowerBook, the only machine afaik with two of these on the motherboard,
 and populate both hotswap bays with a floppy drive (the machine ships
 only with one), so nobody cares...
 
 While at it, add a little fix to clear up stale interrupts when loading
 the driver or plugging a floppy drive in a bay.

Applied for current for-linus branch.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [regression] 2.6.39-rc[1-3] fail to boot on G5 PowerMac

2011-04-17 Thread Jens Axboe
On 2011-04-17 05:16, Hugh Dickins wrote:
 On Thu, Apr 14, 2011 at 2:54 PM, Benjamin Herrenschmidt
 b...@kernel.crashing.org wrote:
 On Thu, 2011-04-14 at 14:25 -0700, Hugh Dickins wrote:

 Something worth trying: turn off CONFIG_IDE.  That's what I need to
 boot 2.6.39-rc[1-3] on PowerPC G5.

 I know Jens has been fixing problems with IDE versus his plug/unplug
 changes, but it's still not fixed for me in rc3.

 In other mail http://lkml.org/lkml/2011/4/14/614
 I see Linus recommending his post-rc3 commit 6631e635c65d
 but I've not tried that myself yet.

 Well, the disk is SATA so it's CONFIG_ATA/libata, which works fine here,
 unless you somewhat replaced your CD-ROM with a legacy IDE disk :-) Or
 maybe the problem is related to the CD-ROM drive. There's a libata
 driver for it nowadays, so you can use PATA_MACIO instead of IDE_PMAC
 
 Thanks for that, Ben: I remember you were brewing up such a driver,
 but I missed when it actually went in.  I can confirm that switching
 off CONFIG_IDE and switching on CONFIG_PATA_MACIO and
 CONFIG_BLK_DEV_SR now gives me a booting system with a working CD-ROM
 (though I've not yet tried burning).
 
 Whereas CONFIG_IDE=y with current git still does not boot: hangs for a
 minute or three around the windfarm announcements, then an endless
 splurge of hda error messages - sorry, I'm not being helpful, other
 worries...

It's the media event notification that goes crazy. Try and comment out
this line:

drive-dev_flags |= IDE_DFLAG_MEDIA_CHANGED;

in drivers/ide/ide-cd.c:cdrom_saw_media_change() and see if it boots.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [patch] ps3disk: passing wrong variable to bvec_kunmap_irq()

2010-10-17 Thread Jens Axboe
On 2010-10-16 21:39, Geert Uytterhoeven wrote:
 On Mon, Oct 11, 2010 at 21:42, Jens Axboe jax...@fusionio.com wrote:
 On 2010-10-11 21:13, Dan Carpenter wrote:
 This should pass buf to bvec_kunmap_irq() instead of bv.  The api is
 like kmap_atomic() instead of kmap().

 Signed-off-by: Dan Carpenter erro...@gmail.com

 diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
 index e9da874..03688c2 100644
 --- a/drivers/block/ps3disk.c
 +++ b/drivers/block/ps3disk.c
 @@ -113,7 +113,7 @@ static void ps3disk_scatter_gather(struct 
 ps3_storage_device *dev,
   memcpy(buf, dev-bounce_buf+offset, size);
   offset += size;
   flush_kernel_dcache_page(bvec-bv_page);
 - bvec_kunmap_irq(bvec, flags);
 + bvec_kunmap_irq(buf, flags);
   i++;
   }
  }

 Thanks applied, that bug is all too common.
 
 Because  bvec_kunmap_irq() is a macro if !CONFIG_HIGHMEM, and thus there's no
 argument type checking on e.g. pp64, which doesn't support HIGHMEM?

It's a generic problem, not isolated to this case. The problem is that
the API isn't symmetric, and the unmap parts take a void * pointer.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [patch] ps3disk: passing wrong variable to bvec_kunmap_irq()

2010-10-11 Thread Jens Axboe
On 2010-10-11 21:13, Dan Carpenter wrote:
 This should pass buf to bvec_kunmap_irq() instead of bv.  The api is
 like kmap_atomic() instead of kmap().
 
 Signed-off-by: Dan Carpenter erro...@gmail.com
 
 diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
 index e9da874..03688c2 100644
 --- a/drivers/block/ps3disk.c
 +++ b/drivers/block/ps3disk.c
 @@ -113,7 +113,7 @@ static void ps3disk_scatter_gather(struct 
 ps3_storage_device *dev,
   memcpy(buf, dev-bounce_buf+offset, size);
   offset += size;
   flush_kernel_dcache_page(bvec-bv_page);
 - bvec_kunmap_irq(bvec, flags);
 + bvec_kunmap_irq(buf, flags);
   i++;
   }
  }

Thanks applied, that bug is all too common.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Oops while running fs_racer test on a POWER6 box against latest git

2010-07-09 Thread Jens Axboe
On 2010-07-09 08:57, divya wrote:
 On Friday 02 July 2010 12:16 PM, divya wrote:
 On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
 On środa, 30 czerwca 2010 o 13:22:27 divya wrote:
 While running fs_racer test from LTP on a POWER6 box against latest
 git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the 
 following
 warning followed by multiple oops.

 I created a Bugzilla entry at
 https://bugzilla.kernel.org/show_bug.cgi?id=16324
 for your bug report, please add your address to the CC list in there, 
 thanks!


 Here I find a cleaner back trace while running fs_racer test from LTP 
 on a POWER6
 box against the latest git(2.6.35-rc3-git5 - commitid 980019d74e4b242)

 Badness at kernel/mutex-debug.c:64
 BUG: key (null) not in .data!
 NIP: c00be9e8 LR: c00be9cc CTR: 
 REGS: c0010bb176f0 TRAP: 0700   Not tainted  
 (2.6.35-rc3-git5-autotest)
 BUG: key 01d8 not in .data!
 BUG: key 01e0 not in .data!
 BUG: key 01e8 not in .data!
 MSR: 80029032
 Unable to handle kernel paging request for data at address 0x0028
 Faulting instruction address: 0xc03ad0ec
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=1024 NUMA pSeries
 last sysfs file: 
 /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
 Page fault in user mode with in_atomic() = 1 mm = c0010943e600
 Modules linked in:
 NIP = fff9e98fc40  MSR = 80004001d032
  ipv6 fuse loop
 Unable to handle kernel paging request for unknown fault
  dm_mod
 Faulting instruction address: 0xc008d0f4
  sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic 
 scsi_transport_srp scsi_tgt scsi_mod
 NIP: c03ad0ec LR: c064c3b0 CTR: c03a6eb0
 REGS: c00109b4f610 TRAP: 0300   Not tainted  
 (2.6.35-rc3-git5-autotest)
 MSR: 80009032EE,ME,IR,DR   CR: 88004484  XER: 0001
 DAR: 0028, DSISR: 4001
 TASK = c00109a98600[7403] 'mkdir' THREAD: c00109b4c000 CPU: 19
 GPR00: 8013 c00109b4f890 c0d3d798 
 0028
 GPR04:    
 0001
 GPR08:  0028 c0189f2c 
 c00109a98600
 GPR12: 24004424 cf602f80 41ff 
 0001
 GPR16: 0002 c0010d8304c0 c00109b4fb44 
 
 GPR20: c0010df77908 f000 0001 
 41ff
 GPR24: c0010df77758 c00109fa1800 c0010df77908 
 c000ff236600
 GPR28: 0028 0040 c0ca7b38 
 c0189f2c
 NIP [c03ad0ec] .do_raw_spin_trylock+0x10/0x48
 LR [c064c3b0] ._raw_spin_lock+0x50/0xa4
 Call Trace:
 [c00109b4f890] [c064c3a4] ._raw_spin_lock+0x44/0xa4 
 (unreliable)
 [c00109b4f920] [c0189f2c] .new_inode+0x4c/0xe4
 [c00109b4f9b0] [c02257fc] .ext3_new_inode+0x84/0xb70
 [c00109b4fad0] [c022f1ec] .ext3_mkdir+0x130/0x438
 [c00109b4fbe0] [c017adb4] .vfs_mkdir+0xb8/0x160
 [c00109b4fc80] [c017e52c] .SyS_mkdirat+0xb0/0x114
 [c00109b4fdc0] [c017a730] .SyS_mkdir+0x1c/0x30
 [c00109b4fe30] [c00085b4] syscall_exit+0x0/0x40
 Instruction dump:
 eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
 3800 7c691b78 980d0214 800d00087d601829  2c0b 40c20010 7c00192d
 Oops: Weird page fault, sig: 11 [#2]

 Pls let me know if this back trace would help in analyzing further.
 Meanwhile I shall do a git bisect and send the inputs.

 Thanks
 Divya



 Hi All,
 
  From the git bisect,seems like the commit
  57439f878afafefad8836ebf5c49da2a0a746105 is the corrupt for the above
  issue.

CC'ing Nick and Al.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Warning in block code

2009-06-16 Thread Jens Axboe
On Wed, Jun 17 2009, Benjamin Herrenschmidt wrote:
 Hoy !
 
 I see that:
 
 block/blk-settings.c: In function ???blk_set_default_limits???:
 block/blk-settings.c:115: warning: large integer implicitly truncated to 
 unsigned type
 
 Comes from
 
   lim-bounce_pfn = BLK_BOUNCE_ANY;
 
 With BLK_BOUNCE_ANY being a
 
 include/linux/blkdev.h:#define BLK_BOUNCE_ANY (-1ULL)
 
 And struct queue_limit.bounce_pfn is:
 
   unsigned long   bounce_pfn;
 
 (The warning is fishy as both quantities are unsigned, but there -is-
 truncation happening here).

Should be safe to just make it -1UL now.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 05/33] block: Add bio_list_peek()

2009-06-14 Thread Jens Axboe
On Mon, Jun 15 2009, Benjamin Herrenschmidt wrote:
 On Wed, 2009-06-10 at 16:38 +0200, Geert Uytterhoeven wrote:
  Introduce bio_list_peek(), to obtain a pointer to the first bio on the 
  bio_list
  without actually removing it from the list. This is needed when you want to
  serialize based on the list being empty or not.
 
 Leaving that one (and the next one) out for now until Jens Ack them.

You can add my acked-by, this one is trivial and a good addition.

 
 Cheers,
 Ben.
 
  Signed-off-by: Geert Uytterhoeven geert.uytterhoe...@sonycom.com
  Cc: Jens Axboe ax...@kernel.dk
  ---
   include/linux/bio.h |5 +
   1 files changed, 5 insertions(+), 0 deletions(-)
  
  diff --git a/include/linux/bio.h b/include/linux/bio.h
  index 7b214fd..618bb7d 100644
  --- a/include/linux/bio.h
  +++ b/include/linux/bio.h
  @@ -590,6 +590,11 @@ static inline void bio_list_merge_head(struct bio_list 
  *bl,
  bl-head = bl2-head;
   }
   
  +static inline struct bio *bio_list_peek(struct bio_list *bl)
  +{
  +   return bl-head;
  +}
  +
   static inline struct bio *bio_list_pop(struct bio_list *bl)
   {
  struct bio *bio = bl-head;
 

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [CFQ/OOPS] rb_erase with April 9 next tree

2009-04-09 Thread Jens Axboe
On Thu, Apr 09 2009, Sachin Sant wrote:
 I had Next 09 booted on a powerpc box and was compiling a kernel.
 That's when i ran into this oops.

 Unable to handle kernel paging request for data at address 0x0010.
 Faulting instruction address: 0xc02ee1b0...
 0:mon e
 cpu 0x0: Vector: 300 (Data Access) at [c000d6cf63c0]
pc: c02ee1b0: .rb_erase+0x16c/0x3b4
lr: c02e14d0: .cfq_prio_tree_add+0x58/0x120
sp: c000d6cf6640
   msr: 80009032
   dar: 10
 dsisr: 4000
  current = 0xc000fbdf5880
  paca= 0xc0a92300
   pid   = 1867, comm = ld
 0:mon t
 [c000d6cf66d0] c02e14d0 .cfq_prio_tree_add+0x58/0x120
 [c000d6cf6770] c02e16c8 .__cfq_slice_expired+0xc8/0x11c
 [c000d6cf6800] c02e3920 .cfq_insert_request+0x374/0x3f4
 [c000d6cf68a0] c02cf448 .elv_insert+0x234/0x348
 [c000d6cf6940] c02d3348 .__make_request+0x514/0x5b0
 [c000d6cf6a00] c02d1348 .generic_make_request+0x430/0x4c8
 [c000d6cf6b30] c02d14dc .submit_bio+0xfc/0x124
 [c000d6cf6bf0] c0156998 .submit_bh+0x14c/0x198
 [c000d6cf6c80] c015ba78 .block_read_full_page+0x394/0x40c
 [c000d6cf7180] c0163080 .do_mpage_readpage+0x680/0x688
 [c000d6cf7690] c0163200 .mpage_readpages+0x104/0x190
 [c000d6cf77f0] c01e2aac .ext3_readpages+0x28/0x40
 [c000d6cf7870] c00ebd20 .__do_page_cache_readahead+0x180/0x278
 [c000d6cf7960] c00ec16c .ondemand_readahead+0x1ac/0x1d8
 [c000d6cf7a00] c00e1f28 .generic_file_aio_read+0x260/0x6b0
 [c000d6cf7b40] c0129f74 .do_sync_read+0xcc/0x130
 [c000d6cf7ce0] c012af44 .vfs_read+0xd0/0x1bc
 [c000d6cf7d80] c012b138 .SyS_read+0x58/0xa0
 [c000d6cf7e30] c00084ac syscall_exit+0x0/0x40

Just ran into this myself, too. I'll pull that bad patch from -next
asap. I wont be able to fix this before next week.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [CFQ/OOPS] rb_erase with April 9 next tree

2009-04-09 Thread Jens Axboe
On Thu, Apr 09 2009, Jens Axboe wrote:
 On Thu, Apr 09 2009, Sachin Sant wrote:
  I had Next 09 booted on a powerpc box and was compiling a kernel.
  That's when i ran into this oops.
 
  Unable to handle kernel paging request for data at address 0x0010.
  Faulting instruction address: 0xc02ee1b0...
  0:mon e
  cpu 0x0: Vector: 300 (Data Access) at [c000d6cf63c0]
 pc: c02ee1b0: .rb_erase+0x16c/0x3b4
 lr: c02e14d0: .cfq_prio_tree_add+0x58/0x120
 sp: c000d6cf6640
msr: 80009032
dar: 10
  dsisr: 4000
   current = 0xc000fbdf5880
   paca= 0xc0a92300
pid   = 1867, comm = ld
  0:mon t
  [c000d6cf66d0] c02e14d0 .cfq_prio_tree_add+0x58/0x120
  [c000d6cf6770] c02e16c8 .__cfq_slice_expired+0xc8/0x11c
  [c000d6cf6800] c02e3920 .cfq_insert_request+0x374/0x3f4
  [c000d6cf68a0] c02cf448 .elv_insert+0x234/0x348
  [c000d6cf6940] c02d3348 .__make_request+0x514/0x5b0
  [c000d6cf6a00] c02d1348 .generic_make_request+0x430/0x4c8
  [c000d6cf6b30] c02d14dc .submit_bio+0xfc/0x124
  [c000d6cf6bf0] c0156998 .submit_bh+0x14c/0x198
  [c000d6cf6c80] c015ba78 .block_read_full_page+0x394/0x40c
  [c000d6cf7180] c0163080 .do_mpage_readpage+0x680/0x688
  [c000d6cf7690] c0163200 .mpage_readpages+0x104/0x190
  [c000d6cf77f0] c01e2aac .ext3_readpages+0x28/0x40
  [c000d6cf7870] c00ebd20 .__do_page_cache_readahead+0x180/0x278
  [c000d6cf7960] c00ec16c .ondemand_readahead+0x1ac/0x1d8
  [c000d6cf7a00] c00e1f28 .generic_file_aio_read+0x260/0x6b0
  [c000d6cf7b40] c0129f74 .do_sync_read+0xcc/0x130
  [c000d6cf7ce0] c012af44 .vfs_read+0xd0/0x1bc
  [c000d6cf7d80] c012b138 .SyS_read+0x58/0xa0
  [c000d6cf7e30] c00084ac syscall_exit+0x0/0x40
 
 Just ran into this myself, too. I'll pull that bad patch from -next
 asap. I wont be able to fix this before next week.

Can you see if this fixes it for you?

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e01b103..64de5c0 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1654,6 +1654,7 @@ retry:
}
 
RB_CLEAR_NODE(cfqq-rb_node);
+   RB_CLEAR_NODE(cfqq-p_node);
INIT_LIST_HEAD(cfqq-fifo);
 
atomic_set(cfqq-ref, 0);

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] ps3vram: Replace mutex by spinlock + list

2009-04-03 Thread Jens Axboe
On Fri, Apr 03 2009, Geert Uytterhoeven wrote:
 Remove the mutex serializing access to the cache.
 Instead, queue up new requests on a list if the driver is busy.
 
 This improves sequential write performance by ca. 2%.

Great, precisely the method I wanted to see there! I've added your three
patches, thanks Geert.


-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Next March 25: Boot failure on powerpc [recursive locking detected]

2009-03-27 Thread Jens Axboe
On Fri, Mar 27 2009, Stephen Rothwell wrote:
 Hi all,
 
 On Thu, 26 Mar 2009 08:50:03 -0500 James Bottomley 
 james.bottom...@hansenpartnership.com wrote:
 
  On Thu, 2009-03-26 at 12:04 +0530, Sachin Sant wrote:
   Sachin Sant wrote:
Today's next failed to boot on a powerpc box
(Power6 blade IBM,7998-61X) with following recursive locking message.
   
=
[ INFO: possible recursive locking detected ]
2.6.29-next-20090325 #1
   After bisecting the failure seems to be because of the following
   patch from James ( block: move SCSI timeout check into block )
   
   http://patchwork.kernel.org/patch/8017/
   
   If i back out the above mentioned patch, the machine boots fine
   without any problems.
  
  Yes, that patch already got dropped for other reasons:
  
  http://marc.info/?t=12374077372
  
  I'm going to see if I can redo it in a better way, since moving this
  type of timeout checking from scsi to block is a useful generalisation.
 
 I will revert it from next-20090327 as well as it is still in the
 for-next branch of the block tree.

I'll update for-next, sorry about that. I had dropped it from
for-2.6.30, but forgot to update akpm/next branches.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-09 Thread Jens Axboe
On Mon, Mar 09 2009, Geert Uytterhoeven wrote:
 On Fri, 6 Mar 2009, Jens Axboe wrote:
  On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
   On Fri, 6 Mar 2009, Jens Axboe wrote:
On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
 On Fri, 6 Mar 2009, Jens Axboe wrote:
  On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
   But then I noticed ps3vram_make_request() may be called 
   concurrently,
   so I had to add a mutex to avoid data corruption. This slows the
   driver down, and in the end, the version with a thread turns out 
   to be
   ca. 1% faster. The version without a thread is about 50 lines less
   code, though.
 
  That is correct, -make_request_fn may get reentered. I'm not 
  surprised
  that performance dropped if you just shoved everything under a 
  mutex.
  You could be a little more smart and queue concurrent bio's for
  processing when the current one is complete though, there are 
  several
  approaches there that be a lot faster than going all the way 
  through the
  IO stack and scheduler just to avoid concurrency.

 Yes, using a spinlock and queueing requests on a list if the driver is
 busy can be done after 2.6.29...
   
Certainly. Even just replacing your current mutex with a spinlock during
the memcpy() would surely be a lot faster. Or even just grabbing the
mutex before calling into the write for the duration of the bio. The way
you do it is certain context switch death :-)
  
   It's not just the memcpy(). ps3vram_{up,down}load() call msleep(), so
   I cannot use a spinlock.
 
  Ah right, I hadn't looked close enough. But putting the mutex_lock()
  outside of the bio_for_each_segment() is going to be much faster than
  getting/releasing it for each segment.
 
 It doesn't seem to make any measurable difference, so I'm gonna leave it for
 now.

It will depend on where the bio's are coming from. If they are all
single segment, then there will be no difference. If they contain
multiple segments, you reduce the lock/release by that amount.

But yeah, just leave it as-is for now. You can send a final patch for
inclusion though. Unless I'm mistaken, I only saw the original and then
an incremental patch for changing it to -make_request_fn?

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-09 Thread Jens Axboe
On Mon, Mar 09 2009, Jens Axboe wrote:
 On Mon, Mar 09 2009, Geert Uytterhoeven wrote:
  On Fri, 6 Mar 2009, Jens Axboe wrote:
   On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
On Fri, 6 Mar 2009, Jens Axboe wrote:
 On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
  On Fri, 6 Mar 2009, Jens Axboe wrote:
   On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
But then I noticed ps3vram_make_request() may be called 
concurrently,
so I had to add a mutex to avoid data corruption. This slows the
driver down, and in the end, the version with a thread turns 
out to be
ca. 1% faster. The version without a thread is about 50 lines 
less
code, though.
  
   That is correct, -make_request_fn may get reentered. I'm not 
   surprised
   that performance dropped if you just shoved everything under a 
   mutex.
   You could be a little more smart and queue concurrent bio's for
   processing when the current one is complete though, there are 
   several
   approaches there that be a lot faster than going all the way 
   through the
   IO stack and scheduler just to avoid concurrency.
 
  Yes, using a spinlock and queueing requests on a list if the driver 
  is
  busy can be done after 2.6.29...

 Certainly. Even just replacing your current mutex with a spinlock 
 during
 the memcpy() would surely be a lot faster. Or even just grabbing the
 mutex before calling into the write for the duration of the bio. The 
 way
 you do it is certain context switch death :-)
   
It's not just the memcpy(). ps3vram_{up,down}load() call msleep(), so
I cannot use a spinlock.
  
   Ah right, I hadn't looked close enough. But putting the mutex_lock()
   outside of the bio_for_each_segment() is going to be much faster than
   getting/releasing it for each segment.
  
  It doesn't seem to make any measurable difference, so I'm gonna leave it for
  now.
 
 It will depend on where the bio's are coming from. If they are all
 single segment, then there will be no difference. If they contain
 multiple segments, you reduce the lock/release by that amount.
 
 But yeah, just leave it as-is for now. You can send a final patch for
 inclusion though. Unless I'm mistaken, I only saw the original and then
 an incremental patch for changing it to -make_request_fn?

There was a full version, my mistake. I got confused by the removal of
the old driver in another directory :-)

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-09 Thread Jens Axboe
On Mon, Mar 09 2009, Geert Uytterhoeven wrote:
 On Mon, 9 Mar 2009, Jens Axboe wrote:
  On Mon, Mar 09 2009, Jens Axboe wrote:
   On Mon, Mar 09 2009, Geert Uytterhoeven wrote:
On Fri, 6 Mar 2009, Jens Axboe wrote:
 On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
  On Fri, 6 Mar 2009, Jens Axboe wrote:
   On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
On Fri, 6 Mar 2009, Jens Axboe wrote:
 On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
  But then I noticed ps3vram_make_request() may be called 
  concurrently,
  so I had to add a mutex to avoid data corruption. This 
  slows the
  driver down, and in the end, the version with a thread 
  turns out to be
  ca. 1% faster. The version without a thread is about 50 
  lines less
  code, though.

 That is correct, -make_request_fn may get reentered. I'm not 
 surprised
 that performance dropped if you just shoved everything under 
 a mutex.
 You could be a little more smart and queue concurrent bio's 
 for
 processing when the current one is complete though, there are 
 several
 approaches there that be a lot faster than going all the way 
 through the
 IO stack and scheduler just to avoid concurrency.
   
Yes, using a spinlock and queueing requests on a list if the 
driver is
busy can be done after 2.6.29...
  
   Certainly. Even just replacing your current mutex with a spinlock 
   during
   the memcpy() would surely be a lot faster. Or even just grabbing 
   the
   mutex before calling into the write for the duration of the bio. 
   The way
   you do it is certain context switch death :-)
 
  It's not just the memcpy(). ps3vram_{up,down}load() call msleep(), 
  so
  I cannot use a spinlock.

 Ah right, I hadn't looked close enough. But putting the mutex_lock()
 outside of the bio_for_each_segment() is going to be much faster than
 getting/releasing it for each segment.
   
It doesn't seem to make any measurable difference, so I'm gonna leave 
it for
now.
  
   It will depend on where the bio's are coming from. If they are all
   single segment, then there will be no difference. If they contain
   multiple segments, you reduce the lock/release by that amount.
  
   But yeah, just leave it as-is for now. You can send a final patch for
   inclusion though. Unless I'm mistaken, I only saw the original and then
   an incremental patch for changing it to -make_request_fn?
 
  There was a full version, my mistake. I got confused by the removal of
 
 Indeed.
 
  the old driver in another directory :-)
 
 Can you please ack it? Thx!

Sure, I thought we had agreed to queue it up for 2.6.29?

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-06 Thread Jens Axboe
On Fri, Mar 06 2009, Geert Uytterhoeven wrote:
 On Fri, 6 Mar 2009, Jens Axboe wrote:
  On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
   But then I noticed ps3vram_make_request() may be called concurrently,
   so I had to add a mutex to avoid data corruption. This slows the
   driver down, and in the end, the version with a thread turns out to be
   ca. 1% faster. The version without a thread is about 50 lines less
   code, though.
 
  That is correct, -make_request_fn may get reentered. I'm not surprised
  that performance dropped if you just shoved everything under a mutex.
  You could be a little more smart and queue concurrent bio's for
  processing when the current one is complete though, there are several
  approaches there that be a lot faster than going all the way through the
  IO stack and scheduler just to avoid concurrency.
 
 Yes, using a spinlock and queueing requests on a list if the driver is
 busy can be done after 2.6.29...

Certainly. Even just replacing your current mutex with a spinlock during
the memcpy() would surely be a lot faster. Or even just grabbing the
mutex before calling into the write for the duration of the bio. The way
you do it is certain context switch death :-)


-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Fix Xilinx SystemACE driver to handle empty CF slot

2009-03-06 Thread Jens Axboe
On Fri, Mar 06 2009, Grant Likely wrote:
 Oops, sorry Jens.  I forgot to CC: you on this patch.
 
 g.
 
 On Sat, Feb 28, 2009 at 1:46 PM, Grant Likely grant.lik...@secretlab.ca 
 wrote:
  From: Grant Likely grant.lik...@secretlab.ca
 
  The SystemACE driver does not handle an empty CF slot gracefully.  An
  empty CF slot ends up hanging the system.  This patch adds a check for
  the CF state and stops trying to process requests if the slot is empty.

So with patches like this, it's always nice to know what your target is.
Do you want this in .29, or just queued up for .30? It's not always easy
to judge the urgency of such patches :-)

Also, I note that you are using end_request() throughout the driver. We
really want to get away from that, you should be using blk_end_request()
as that will handle full requests and not just segment-by-segment. No
worries for this patch, but you may want to consider that for a future
patch.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-05 Thread Jens Axboe
On Wed, Mar 04 2009, Geert Uytterhoeven wrote:
   Hi,
 
 Below is the rewrite of the PS3 Video RAM Storage Driver as a plain block
 device, as requested by Arnd Bergmann.
 
 The MTD-based PS3 Video RAM Storage Driver was integrated into the mainline
 kernel in 2.6.29-rc1.
 
 Ideally, we think it would be best if the existing MTD-based ps3vram driver
 would be replaced by the new block-based ps3vram driver before 2.6.29 is
 released. This would relieve the burden of supporting two different swap space
 schemes on PS3 (swap on /dev/mtdblock0 vs. /dev/ps3vram) from the distro
 maintainer's shoulders, as in that case there would never have been a stable
 kernel version containing the MTD-based ps3vram driver.
 
 What do you think? If this is accepted, I'll submit a patch to remove the MTD
 ps3vram and add the new driver as ps3vram (instead of ps3vram-ng).
 
 Thanks for your (review and other) comments!

I'd rewrite this as a -make_request_fn handler instead. Then you can
get rid of the kernel thread. IOW, change

queue = blk_init_queue(ps3vram_request, priv-lock);

to

queue = blk_alloc_queue(GFP_KERNEL);
blk_queue_make_request(queue, ps3vram_make_request);

Add error handling of course, and call blk_queue_max_*() to set your
limits for this device. Then add a ps3vram_make_request() ala:

static void ps3vram_do_request(struct request_queue *q, struct bio *bio)
{
struct ps3_system_bus_device *dev = q-queuedata;
struct ps3vram_priv *priv = dev-core.driver_data;
int write, res, err = -EIO;
struct bio_vec *bv;
sector_t sector;
loff_t offset;
size_t len, retlen;
int i;

write = bio_data_dir(bio) == WRITE;

sector = bio-bi_sector;
bio_for_each_segment(bv, bio, i) {
char *ptr = page_address(bv-bv_page) + bv-bv_offset;

len = bv-bv_len;
offset = sector  9;

if (write)
res = ps3vram_write(dev, offset, len, retlen, ptr);
else
res = ps3vram_read(dev, offset, len, retlen, ptr);

if (res) {
dev_err(dev-core, %s failed\n, op);
goto out;
}

if (retlen != len) {
dev_err(dev-core, Short %s\n, op);
goto out;
}
sector += (len  9);
}

dev_dbg(dev-core, %s completed\n, op);
err = 0;
out:
bio_endio(bio, err);
}

I just typed it here, so if it doesn't compile you get to keep the
pieces :-)

Since ps3 is very RAM limited, I didn't bother with any highmem mapping
for the bio, since I gather that isn't an issue on that platform. You
may want to detail that in a comment above the page_addres() thing at
the top of the loop, though.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-05 Thread Jens Axboe
On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
   Hi Jens,
 
 On Thu, 5 Mar 2009, Jens Axboe wrote:
  On Wed, Mar 04 2009, Geert Uytterhoeven wrote:
   Below is the rewrite of the PS3 Video RAM Storage Driver as a plain block
   device, as requested by Arnd Bergmann.
  
   The MTD-based PS3 Video RAM Storage Driver was integrated into the 
   mainline
   kernel in 2.6.29-rc1.
  
   Ideally, we think it would be best if the existing MTD-based ps3vram 
   driver
   would be replaced by the new block-based ps3vram driver before 2.6.29 is
   released. This would relieve the burden of supporting two different swap 
   space
   schemes on PS3 (swap on /dev/mtdblock0 vs. /dev/ps3vram) from the distro
   maintainer's shoulders, as in that case there would never have been a 
   stable
   kernel version containing the MTD-based ps3vram driver.
  
   What do you think? If this is accepted, I'll submit a patch to remove the 
   MTD
   ps3vram and add the new driver as ps3vram (instead of ps3vram-ng).
  
   Thanks for your (review and other) comments!
 
  I'd rewrite this as a -make_request_fn handler instead. Then you can
  get rid of the kernel thread. IOW, change
 
  queue = blk_init_queue(ps3vram_request, priv-lock);
 
  to
 
  queue = blk_alloc_queue(GFP_KERNEL);
  blk_queue_make_request(queue, ps3vram_make_request);
 
 Thanks, I didn't know that part...
 
  Add error handling of course, and call blk_queue_max_*() to set your
  limits for this device.
 
 I took out the blk_queue_max_*() calls (compared to ps3disk.c), as
 none of the limits apply, and the defaults are fine.
 
 Is that OK, or is it better to make it explicit?

I think it's always good to make it explicit. Plus for this case you
definitely need it, as blk_init_queue() wont do it for you anymore.

  Then add a ps3vram_make_request() ala:
 
  static void ps3vram_do_request(struct request_queue *q, struct bio *bio)
  {
  struct ps3_system_bus_device *dev = q-queuedata;
  struct ps3vram_priv *priv = dev-core.driver_data;
  int write, res, err = -EIO;
  struct bio_vec *bv;
  sector_t sector;
  loff_t offset;
  size_t len, retlen;
  int i;
 
  write = bio_data_dir(bio) == WRITE;
 
  sector = bio-bi_sector;
  bio_for_each_segment(bv, bio, i) {
  char *ptr = page_address(bv-bv_page) + bv-bv_offset;
 
  len = bv-bv_len;
  offset = sector  9;
 
  if (write)
  res = ps3vram_write(dev, offset, len, retlen, ptr);
  else
  res = ps3vram_read(dev, offset, len, retlen, ptr);
 
  if (res) {
  dev_err(dev-core, %s failed\n, op);
  goto out;
  }
 
  if (retlen != len) {
  dev_err(dev-core, Short %s\n, op);
  goto out;
  }
  sector += (len  9);
  }
 
  dev_dbg(dev-core, %s completed\n, op);
  err = 0;
  out:
  bio_endio(bio, err);
  }
 
  I just typed it here, so if it doesn't compile you get to keep the
  pieces :-)
 
 OK, I'll give it a try...
 
 BTW, does this mean the `simple' way, which I used based on LDD3, is
 deprecated?

Depends.. It's obviously not a very effective approach, since you punt
to a thread for each request. But if you need the IO scheduler helping
you with merging and sorting (for a rotational device), it still has
some merit. For this particular case, the -make_request_fn approach is
much better.

  Since ps3 is very RAM limited, I didn't bother with any highmem mapping
  for the bio, since I gather that isn't an issue on that platform. You
  may want to detail that in a comment above the page_addres() thing at
  the top of the loop, though.
 
 PS3 is ppc64, so no highmem. But I guess I best add the code for that anyway,
 in case people will copy the driver in the future (I remember receiving a
 similar comment for ps3disk).

I'd prefer just the comment, since you have to be able to handle
disabled interrupts on return from bvec_kmap_irq() if you use that. And
that would just complicate your driver, for something that you don't
need. So I'd put the comment in there stating why it's OK to just use
page_address() instead. If people copy-paste the code to some other
driver, then they will get the comment as well.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-05 Thread Jens Axboe
On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
 On Thu, 5 Mar 2009, Jens Axboe wrote:
  On Thu, Mar 05 2009, Geert Uytterhoeven wrote:
   On Thu, 5 Mar 2009, Jens Axboe wrote:
On Wed, Mar 04 2009, Geert Uytterhoeven wrote:
 Below is the rewrite of the PS3 Video RAM Storage Driver as a plain 
 block
 device, as requested by Arnd Bergmann.
 
I'd rewrite this as a -make_request_fn handler instead. Then you can
get rid of the kernel thread. IOW, change
   
queue = blk_init_queue(ps3vram_request, priv-lock);
   
to
   
queue = blk_alloc_queue(GFP_KERNEL);
blk_queue_make_request(queue, ps3vram_make_request);
  
   Thanks, I didn't know that part...
  
Add error handling of course, and call blk_queue_max_*() to set your
limits for this device.
  
   I took out the blk_queue_max_*() calls (compared to ps3disk.c), as
   none of the limits apply, and the defaults are fine.
  
   Is that OK, or is it better to make it explicit?
 
  I think it's always good to make it explicit. Plus for this case you
  definitely need it, as blk_init_queue() wont do it for you anymore.
 
 blk_queue_make_request() does it for me, too:
 
 void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
 {
   ...
   blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
   blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
   ...
   blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
   ...
   blk_queue_max_sectors(q, SAFE_MAX_SECTORS);
   ...
 }
 
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
   ...
   blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
 
   blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
   blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
   ...
 }

Indeed, there's some duplicated code in blk_init_queue_node(), I'll make
sure to get rid of that!

Then add a ps3vram_make_request() ala:
  
static void ps3vram_do_request(struct request_queue *q, struct bio *bio)
{
 
}
   
I just typed it here, so if it doesn't compile you get to keep the
pieces :-)
  
   OK, I'll give it a try...
  
   BTW, does this mean the `simple' way, which I used based on LDD3, is
   deprecated?
 
  Depends.. It's obviously not a very effective approach, since you punt
  to a thread for each request. But if you need the IO scheduler helping
  you with merging and sorting (for a rotational device), it still has
  some merit. For this particular case, the -make_request_fn approach is
  much better.
 
 Without the thread, performance indeed increased.
 
 But then I noticed ps3vram_make_request() may be called concurrently,
 so I had to add a mutex to avoid data corruption. This slows the
 driver down, and in the end, the version with a thread turns out to be
 ca. 1% faster. The version without a thread is about 50 lines less
 code, though.

That is correct, -make_request_fn may get reentered. I'm not surprised
that performance dropped if you just shoved everything under a mutex.
You could be a little more smart and queue concurrent bio's for
processing when the current one is complete though, there are several
approaches there that be a lot faster than going all the way through the
IO stack and scheduler just to avoid concurrency.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH/RFC] ps3/block: Add ps3vram-ng driver for accessing video RAM as block device

2009-03-04 Thread Jens Axboe
On Thu, Mar 05 2009, Benjamin Herrenschmidt wrote:
 On Wed, 2009-03-04 at 14:57 +0100, Geert Uytterhoeven wrote:
  Hi,
  
  Below is the rewrite of the PS3 Video RAM Storage Driver as a plain block
  device, as requested by Arnd Bergmann.
  
  The MTD-based PS3 Video RAM Storage Driver was integrated into the mainline
  kernel in 2.6.29-rc1.
  
  Ideally, we think it would be best if the existing MTD-based ps3vram driver
  would be replaced by the new block-based ps3vram driver before 2.6.29 is
  released. This would relieve the burden of supporting two different swap 
  space
  schemes on PS3 (swap on /dev/mtdblock0 vs. /dev/ps3vram) from the distro
  maintainer's shoulders, as in that case there would never have been a stable
  kernel version containing the MTD-based ps3vram driver.
 
 This is very very very late ... we are at rc7, probably one rc before
 final... as much as I like integrating drivers later, I'll ask Linus
 opinion on this one.
 
 Linus ? What do you reckon ? Maybe a better option is just to remove
 ps3nvram from .29 and merge the new one in .30 ?

It's an isolated driver for a special purpose platform, I would have
zero problems merging it :-)

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 07/13] powerpc/ps3: printing fixups for l64 to ll64 conversion: drivers/block

2009-01-13 Thread Jens Axboe
On Wed, Jan 14 2009, Stephen Rothwell wrote:
 
 Signed-off-by: Stephen Rothwell s...@canb.auug.org.au
 ---
  drivers/block/ps3disk.c |   18 +-
  1 files changed, 9 insertions(+), 9 deletions(-)
 
 Jens, if its OK by you, this could go in via the PowerPC tree.

Of course.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] powerpc/iseries: viodasd needs to depend on CONFIG_BLOCK

2008-12-17 Thread Jens Axboe
On Wed, Dec 17 2008, Stephen Rothwell wrote:
 Otherwise you get lot of errors like these:
 
 drivers/block/viodasd.c:72: error: dereferencing pointer to incomplete type
 drivers/block/viodasd.c: In function 'viodasd_open':
 drivers/block/viodasd.c:135: error: dereferencing pointer to incomplete type
 drivers/block/viodasd.c: In function 'viodasd_release':
 drivers/block/viodasd.c:184: error: dereferencing pointer to incomplete type
 drivers/block/viodasd.c: In function 'viodasd_getgeo':
 drivers/block/viodasd.c:209: error: dereferencing pointer to incomplete type
 drivers/block/viodasd.c:214: error: implicit declaration of function 
 'get_capacity'
 drivers/block/viodasd.c: At top level:
 drivers/block/viodasd.c:222: error: variable 'viodasd_fops' has initializer 
 but incomplete type
 drivers/block/viodasd.c:223: error: unknown field 'owner' specified in 
 initializer
 
 Discovered by a randconfig build.
 
 Signed-off-by: Stephen Rothwell s...@canb.auug.org.au
 ---
  arch/powerpc/platforms/iseries/Kconfig |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/platforms/iseries/Kconfig 
 b/arch/powerpc/platforms/iseries/Kconfig
 index 45ffd8e..ed3753d 100644
 --- a/arch/powerpc/platforms/iseries/Kconfig
 +++ b/arch/powerpc/platforms/iseries/Kconfig
 @@ -9,6 +9,7 @@ menu iSeries device drivers
  
  config VIODASD
   tristate iSeries Virtual I/O disk support
 + depends on BLOCK
   help
 If you are running on an iSeries system and you want to use
 virtual disks created and managed by OS/400, say Y.
 -- 
 1.6.0.5

Indeed, looks good. I trust the ppc folks will carry this one? If so,
feel free to add my acked-by to this.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [RESEND] [PATCH] powerpc: remove dead BIO_VMERGE_BOUNDARY definition

2008-12-14 Thread Jens Axboe
On Sun, Dec 14 2008, FUJITA Tomonori wrote:
 This is a resend of:
 
 http://marc.info/?l=linux-kernelm=122482703616607w=2
 
 =
 From: FUJITA Tomonori fujita.tomon...@lab.ntt.co.jp
 Subject: [PATCH] powerpc: remove dead BIO_VMERGE_BOUNDARY definition
 
 The block layer dropped the virtual merge feature
 (b8b3e16cfe6435d961f6aaebcfd52a1ff2a988c5). BIO_VMERGE_BOUNDARY
 definition is meaningless now (For POWER, BIO_VMERGE_BOUNDARY has been
 meaningless for a long time since POWER disables the virtual merge
 feature).
 
 Signed-off-by: FUJITA Tomonori fujita.tomon...@lab.ntt.co.jp

Acked-by: Jens Axboe jens.ax...@oracle.com

 ---
  arch/powerpc/include/asm/io.h |7 ---
  1 files changed, 0 insertions(+), 7 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
 index 08266d2..494cd8b 100644
 --- a/arch/powerpc/include/asm/io.h
 +++ b/arch/powerpc/include/asm/io.h
 @@ -713,13 +713,6 @@ static inline void * phys_to_virt(unsigned long address)
   */
  #define page_to_phys(page)   ((phys_addr_t)page_to_pfn(page)  PAGE_SHIFT)
  
 -/* We do NOT want virtual merging, it would put too much pressure on
 - * our iommu allocator. Instead, we want drivers to be smart enough
 - * to coalesce sglists that happen to have been mapped in a contiguous
 - * way by the iommu
 - */
 -#define BIO_VMERGE_BOUNDARY  0
 -
  /*
   * 32 bits still uses virt_to_bus() for it's implementation of DMA
   * mappings se we have to keep it defined here. We also have some old
 -- 
 1.5.5.GIT
 

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Wed, Nov 19 2008, Stephen Rothwell wrote:
 Hi Jens,
 
 On Wed, 19 Nov 2008 10:16:28 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
 
  Strange, so it gets stuck on the timer lock, very weird. You don't
  happen to have output showing that the other CPU is up to at that point?
 
 Unfortunately, no, but I will see what I can find tomorrow.
 
 Today's linux-next still has a problem, but it is slightly different:
 
 Unable to handle kernel paging request for data at address 0x
 Faulting instruction address: 0xc0503030
 cpu 0x0: Vector: 300 (Data Access) at [ca40]
 pc: c0503030: ._spin_lock_irqsave+0x40/0x110
 lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
 sp: ccc0
msr: 80009032
dar: 0
  dsisr: 4000
   current = 0xc00022d31040
   paca= 0xc0897300
 pid   = 3399, comm = ckbcomp
 enter ? for help
 [cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
 [ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
 [ced0] c0065298 .__do_softirq+0xe8/0x1f0
 [cf90] c0029224 .call_do_softirq+0x14/0x24
 [c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
 [c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
 [c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
 [c00022ad3e30] c0003700 decrementer_common+0x100/0x180
 --- Exception: 901 (Decrementer) at 0ff52440

That's even more weird, how could 'data' passed in to the timer ever be
0? It's setup like this:

setup_timer(q-timeout, blk_rq_timed_out_timer, (unsigned long) q);

when we allocate the queue. How did this trigger?

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Wed, Nov 19 2008, Stephen Rothwell wrote:
 Hi all,
 
 I got this in my boot test last night:
 
 Begin: Waiting for root file system... ...
 BUG: spinlock lockup on CPU#1, vol_id/3246, c0b09700
 Call Trace:
 [c00040ef7080] [c000fb58] .show_stack+0x70/0x184 (unreliable)
 [c00040ef7130] [c027adac] ._raw_spin_lock+0x140/0x17c
 [c00040ef71d0] [c04ec648] ._spin_lock_irqsave+0x8c/0xc4
 [c00040ef7270] [c00659dc] .lock_timer_base+0x38/0x90
 [c00040ef7310] [c0065b50] .__mod_timer+0x4c/0x11c
 [c00040ef73c0] [c025ae9c] .blk_plug_device+0xc0/0xd8
 [c00040ef7440] [c025bb90] .__make_request+0x498/0x518
 [c00040ef74f0] [c0259dc8] .generic_make_request+0x24c/0x2a4
 [c00040ef75b0] [c025b6d0] .submit_bio+0x108/0x130
 [c00040ef7670] [c01210e4] .submit_bh+0x174/0x1c0
 [c00040ef7700] [c01259a8] .block_read_full_page+0x34c/0x3b4
 [c00040ef7820] [c0129a60] .blkdev_readpage+0x20/0x38
 [c00040ef78a0] [c00c111c] .__do_page_cache_readahead+0x23c/0x2b8
 [c00040ef7980] [c00c1370] .ondemand_readahead+0x1d8/0x210
 [c00040ef7a30] [c00b7f20] .generic_file_aio_read+0x224/0x620
 [c00040ef7b60] [c00f9020] .do_sync_read+0xc4/0x124
 [c00040ef7cf0] [c00f98e0] .vfs_read+0xd8/0x1bc
 [c00040ef7d90] [c00f9f0c] .sys_read+0x4c/0x8c
 [c00040ef7e30] [c00084d4] syscall_exit+0x0/0x40
 
 This was on a Power5 partition.  I am attempting to reproduce the problem.
 
 Any clues?

Strange, so it gets stuck on the timer lock, very weird. You don't
happen to have output showing that the other CPU is up to at that point?

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Thu, Nov 20 2008, Stephen Rothwell wrote:
 Hi Jens,
 
 On Wed, 19 Nov 2008 14:34:09 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
 
  Are you removing devices or modules? We have a bug there it seems, does
  this help?
 
 This is early in boot (we are waiting for the root device while running
 on the initramfs) so there could well be modules being unloaded.
 
 That patch makes the problem go away.

Excellent, since it was an apparent but, I already updated the original
patch with this hunk.

Thanks a lot for your bisection work, Stephen!

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: generic-ipi tree build failure

2008-07-03 Thread Jens Axboe
On Thu, Jul 03 2008, Stephen Rothwell wrote:
 Hi Ingo, Jens,
 
 Today's linux-next build (powerpc ppc64_defconfig) failed like this:
 
 arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
 arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 
 'smp_call_function'
 arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area':
 arch/powerpc/mm/slice.c:559: error: too many arguments to function 
 'on_each_cpu'
 arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus':
 arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to 
 function 'smp_call_function'
 
 I applied the patch below.

Thanks Stephen, I've verified that the patch is correct!

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] [POWERPC] iSeries: remove unused mail address

2008-05-23 Thread Jens Axboe
On Fri, May 23 2008, Stephen Rothwell wrote:
 I don't use my IBM email address normally and people can find me in
 CREDITS.
 
 Signed-off-by: Stephen Rothwell [EMAIL PROTECTED]
 ---
  drivers/block/viodasd.c |2 +-
  drivers/cdrom/viocd.c   |2 +-
  drivers/char/viocons.c  |2 +-
  drivers/char/viotape.c  |2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)
 
 Jens, I assume that this is trivial enough that it can be sent via the
 powerpc tree.

Of course :-)

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-21 Thread Jens Axboe
On Thu, Feb 21 2008, Andrew Morton wrote:
 On Tue, 19 Feb 2008 09:36:34 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
 
  But I think the radix 'scan over entire tree' is a bit fragile.
 
 eek, it had better not be.  Was this an error in the caller?  Hope so.

The cfq use of it, not the radix tree code! It juggled the keys and
wants to make sure that we see all users, modulo raced added ones (ok if
we see them, doesn't matter if we don't).

  This
  patch adds a parallel hlist for ease of properly browsing the members,
 
 Even though io_contexts are fairly uncommon, adding more stuff to a data
 structure was a pretty sad alternative to fixing a bug in
 radix_tree_gang_lookup(), or to fixing a bug in a caller of it.
 
 IOW: what exactly went wrong here??

I could not convince myself that the current code would always do the
right thing. We should not have been seeing -key == NULL entries in
there, it implied a double exit of that process. So I decided to fix it
by making the code a lot more readable (the patch in question deleted a
lot more than it added), at the cost of that hlist head + node.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread Jens Axboe
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
 On Sun, 17 Feb 2008 20:29:13 +0100
 Jens Axboe [EMAIL PROTECTED] wrote:
 
  It's odd stuff. Could you perhaps try and add some printks to
  block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
  from radix_tree_gang_lookup() and the pointer value of cics[i] in the
  for() loop after the lookup?
  
 I met the same issue on ia64/NUMA box.
 seems cisc[]-key is NULL and index for radix_tree_gang_lookup() was
 always '1'.

Why does it keep repeating then? If -key is NULL, the next lookup index
should be 1UL.

But I think the radix 'scan over entire tree' is a bit fragile. This
patch adds a parallel hlist for ease of properly browsing the members,
does that work for you? It compiles, but I haven't booted it here yet...

 Attached patch works well for me, but I don't know much about cfq.
 please confirm. 

It doesn't make a lot of sense, I'm afraid.

 block/blk-ioc.c   |   35 +++
 block/cfq-iosched.c   |   37 +++--
 include/linux/iocontext.h |2 ++
 3 files changed, 28 insertions(+), 46 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 80245dc..73c7002 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -17,17 +17,13 @@ static struct kmem_cache *iocontext_cachep;
 
 static void cfq_dtor(struct io_context *ioc)
 {
-   struct cfq_io_context *cic[1];
-   int r;
+   if (!hlist_empty(ioc-cic_list)) {
+   struct cfq_io_context *cic;
 
-   /*
-* We don't have a specific key to lookup with, so use the gang
-* lookup to just retrieve the first item stored. The cfq exit
-* function will iterate the full tree, so any member will do.
-*/
-   r = radix_tree_gang_lookup(ioc-radix_root, (void **) cic, 0, 1);
-   if (r  0)
-   cic[0]-dtor(ioc);
+   cic = list_entry(ioc-cic_list.first, struct cfq_io_context,
+   cic_list);
+   cic-dtor(ioc);
+   }
 }
 
 /*
@@ -57,18 +53,16 @@ EXPORT_SYMBOL(put_io_context);
 
 static void cfq_exit(struct io_context *ioc)
 {
-   struct cfq_io_context *cic[1];
-   int r;
-
rcu_read_lock();
-   /*
-* See comment for cfq_dtor()
-*/
-   r = radix_tree_gang_lookup(ioc-radix_root, (void **) cic, 0, 1);
-   rcu_read_unlock();
 
-   if (r  0)
-   cic[0]-exit(ioc);
+   if (!hlist_empty(ioc-cic_list)) {
+   struct cfq_io_context *cic;
+
+   cic = list_entry(ioc-cic_list.first, struct cfq_io_context,
+   cic_list);
+   cic-exit(ioc);
+   }
+   rcu_read_unlock();
 }
 
 /* Called by the exitting task */
@@ -105,6 +99,7 @@ struct io_context *alloc_io_context(gfp_t gfp_flags, int 
node)
ret-nr_batch_requests = 0; /* because this is 0 */
ret-aic = NULL;
INIT_RADIX_TREE(ret-radix_root, GFP_ATOMIC | __GFP_HIGH);
+   INIT_HLIST_HEAD(ret-cic_list);
ret-ioc_data = NULL;
}
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ca198e6..62eda3f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1145,38 +1145,19 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
 /*
  * Call func for each cic attached to this ioc. Returns number of cic's seen.
  */
-#define CIC_GANG_NR16
 static unsigned int
 call_for_each_cic(struct io_context *ioc,
  void (*func)(struct io_context *, struct cfq_io_context *))
 {
-   struct cfq_io_context *cics[CIC_GANG_NR];
-   unsigned long index = 0;
-   unsigned int called = 0;
-   int nr;
+   struct cfq_io_context *cic;
+   struct hlist_node *n;
+   int called = 0;
 
rcu_read_lock();
-
-   do {
-   int i;
-
-   /*
-* Perhaps there's a better way - this just gang lookups from
-* 0 to the end, restarting after each CIC_GANG_NR from the
-* last key + 1.
-*/
-   nr = radix_tree_gang_lookup(ioc-radix_root, (void **) cics,
-   index, CIC_GANG_NR);
-   if (!nr)
-   break;
-
-   called += nr;
-   index = 1 + (unsigned long) cics[nr - 1]-key;
-
-   for (i = 0; i  nr; i++)
-   func(ioc, cics[i]);
-   } while (nr == CIC_GANG_NR);
-
+   hlist_for_each_entry_rcu(cic, n, ioc-cic_list, cic_list) {
+   func(ioc, cic);
+   called++;
+   }
rcu_read_unlock();
 
return called;
@@ -1190,6 +1171,7 @@ static void cic_free_func(struct io_context *ioc, struct 
cfq_io_context *cic)
 
spin_lock_irqsave(ioc-lock, flags);
radix_tree_delete(ioc-radix_root, cic-dead_key

Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread Jens Axboe
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
 On Tue, 19 Feb 2008 09:58:38 +0100
 Jens Axboe [EMAIL PROTECTED] wrote:
   when I inserted printk here
   ==
 for (i = 0; i  nr; i++)
 func(ioc, cics[i]);
 printk(%d %lx\n, nr, index);
   ==
   index was always 1 and  nr was always 32.
   
   So, cics[31]-key was always NULL when index=1 is passed to
   radix_tree_gang_lookup().
  
  Hang on, it returned 32? It should not return more than 16, since that
  is what we have room for and asked for. 
 sorry. Of course, it was 16 ;(

I expected so, otherwise we would have had far more serious problems :-)

 your patch works well. thank you.

It's committed now and posted in the relevant bugzilla as well (#9948).

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread Jens Axboe
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
 On Tue, 19 Feb 2008 09:36:34 +0100
 Jens Axboe [EMAIL PROTECTED] wrote:
 
  On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
   On Sun, 17 Feb 2008 20:29:13 +0100
   Jens Axboe [EMAIL PROTECTED] wrote:
   
It's odd stuff. Could you perhaps try and add some printks to
block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
from radix_tree_gang_lookup() and the pointer value of cics[i] in the
for() loop after the lookup?

   I met the same issue on ia64/NUMA box.
   seems cisc[]-key is NULL and index for radix_tree_gang_lookup() was
   always '1'.
  
  Why does it keep repeating then? If -key is NULL, the next lookup index
  should be 1UL.
  
 when I inserted printk here
 ==
   for (i = 0; i  nr; i++)
   func(ioc, cics[i]);
   printk(%d %lx\n, nr, index);
 ==
 index was always 1 and  nr was always 32.
 
 So, cics[31]-key was always NULL when index=1 is passed to
 radix_tree_gang_lookup().

Hang on, it returned 32? It should not return more than 16, since that
is what we have room for and asked for. Using -dead_key when -key is
NULL is correct btw, since that is the correct location in the tree once
the process has exited. But that should not happen until AFTER the
func() call, so I still think the list patch is safer.

  But I think the radix 'scan over entire tree' is a bit fragile. This
  patch adds a parallel hlist for ease of properly browsing the
  members, does that work for you? It compiles, but I haven't booted
  it here yet...
  
 will try. please wait a bit.

It boots here, so at least it passes normal sanity tests. It should
solve your problem as well, hopefully.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread Jens Axboe
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
 On Tue, 19 Feb 2008 09:36:34 +0100
 Jens Axboe [EMAIL PROTECTED] wrote:
 
  On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
   On Sun, 17 Feb 2008 20:29:13 +0100
   Jens Axboe [EMAIL PROTECTED] wrote:
   
It's odd stuff. Could you perhaps try and add some printks to
block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
from radix_tree_gang_lookup() and the pointer value of cics[i] in the
for() loop after the lookup?

   I met the same issue on ia64/NUMA box.
   seems cisc[]-key is NULL and index for radix_tree_gang_lookup() was
   always '1'.
  
  Why does it keep repeating then? If -key is NULL, the next lookup index
  should be 1UL.
  
  But I think the radix 'scan over entire tree' is a bit fragile. This
  patch adds a parallel hlist for ease of properly browsing the members,
  does that work for you? It compiles, but I haven't booted it here yet...
  
 Works well for me and my box booted !

Super, I'll get it upstream. Thanks for testing and debugging!

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] [POWERPC] iSeries: fix section mismatch in viocd

2008-02-05 Thread Jens Axboe
On Tue, Feb 05 2008, Stephen Rothwell wrote:
 WARNING: drivers/cdrom/viocd.o(.text+0x504): Section mismatch in reference 
 from the function .viocd_probe() to the function .init.text:.find_capability()
 
 Signed-off-by: Stephen Rothwell [EMAIL PROTECTED]
 ---
  drivers/cdrom/viocd.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 Jens, can this go in through the powerpc tree?

Certainly

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] compat_ioctl: fix block device compat ioctl regression

2007-10-27 Thread Jens Axboe
On Sat, Oct 27 2007, Arnd Bergmann wrote:
 From: Philip Langdale [EMAIL PROTECTED]
 The conversion of handlers to compat_blkdev_ioctl accidentally
 disabled handling of most ioctl numbers on block devices because
 of a typo. Fix the one line to enable it all again.
 
 Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
 ---
 
 Mea Culpa. This should have been found by my testing, as it's clear
 that most of my big patch never worked at all. Sorry for causing
 problems for everyone involved here.
 
 I'm attributing the patch to Philip, as he's the one who pointed
 out to me what the fix is.

Oops, added for swift inclusion.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] fix powerpc breakage in sg chaining code (again)

2007-10-24 Thread Jens Axboe
On Tue, Oct 23 2007, Johannes Berg wrote:
 On Tue, 2007-10-23 at 23:00 +0200, Johannes Berg wrote:
  Commit 33ff910f0f466184ffc3514628f18403dcd86761 fixed commit
  78bdc3106a877cfa50439fa66b52acbc4e7868df but the code that got put in
  uses struct scatterlist-page which no longer exists since commit
  18dabf473e15850c0dbc8ff13ac1e2806d542c15 which requires using
  sg_page(sg) instead of sg-page.
 
 This doesn't help though, when I boot I get a NULL-pointer dereference
 that looks approximately like this:
 
 NIP blk_rq_map_sg + 0x64/0x190
 LR ide_map_sg + 0x38/?
 Call trace:
 cfq_remove_request + 0x168 (unreliable)
 cfq_dispatch_request + 0x44
 ide_build_sglist + 0x38?
 pmac_ide_dma_setup + 0xa0?
 ide_do_rw_disk? + ?
 [...]
 
 I can upload the picture I took but it's hardly readable (taken with my
 crappy cell phone)

I lost track - which kernel are you booting? This looks like something
that should be fixed, did you try 2.6.24-rc1?

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] fix powerpc breakage in sg chaining code (again)

2007-10-24 Thread Jens Axboe
On Wed, Oct 24 2007, Johannes Berg wrote:
 On Wed, 2007-10-24 at 11:14 +0200, Jens Axboe wrote:
 
  I lost track - which kernel are you booting? This looks like something
  that should be fixed, did you try 2.6.24-rc1?
 
 No, it came out after I pulled, I was on v2.6.23-6815-g0895e91. I'll
 give it a try, but I don't think I can confirm it works before tomorrow.
 I see the build failure got fixed with commit
 5edadbd0ae35d2daabaf6b44f2c58d67d4021ed2 too.

0895e91d60ef9bdef426d1ce14bb94bd5875870d is definitely too old, so it
will break on IDE. I'm confident that a newer kernel will solve this
issue.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: IB/ehca: Fix sg_page() fallout

2007-10-23 Thread Jens Axboe
On Tue, Oct 23 2007, Olof Johansson wrote:
 
 More fallout from sg_page changes:
 
 drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user1':
 drivers/infiniband/hw/ehca/ehca_mrmw.c:1779: error: 'struct scatterlist' has 
 no member named 'page'
 drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 
 'ehca_check_kpages_per_ate':
 drivers/infiniband/hw/ehca/ehca_mrmw.c:1835: error: 'struct scatterlist' has 
 no member named 'page'
 drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user2':
 drivers/infiniband/hw/ehca/ehca_mrmw.c:1870: error: 'struct scatterlist' has 
 no member named 'page'
 
 
 Signed-off-by: Olof Johansson [EMAIL PROTECTED]
 
 
 ---
 
 On Tue, Oct 23, 2007 at 07:05:12AM +0200, Jens Axboe wrote:
  On Mon, Oct 22 2007, Olof Johansson wrote:
   More fallout from sg_page changes, found with powerpc allyesconfig:
   
   drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 
   'ehca_set_pagebuf_user1':
   drivers/infiniband/hw/ehca/ehca_mrmw.c:1779: error: 'struct scatterlist' 
   has no member named 'page'
   drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 
   'ehca_check_kpages_per_ate':
   drivers/infiniband/hw/ehca/ehca_mrmw.c:1835: error: 'struct scatterlist' 
   has no member named 'page'
   drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 
   'ehca_set_pagebuf_user2':
   drivers/infiniband/hw/ehca/ehca_mrmw.c:1870: error: 'struct scatterlist' 
   has no member named 'page'
  
  Thanks a lot Olof, applied both fixups!
 
 I messed up the second fix. :( please replace with this.

No problem, applied.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: powerpc: Fix fallout from sg_page() changes

2007-10-23 Thread Jens Axboe
On Mon, Oct 22 2007, Olof Johansson wrote:
 Fix fallout from 18dabf473e15850c0dbc8ff13ac1e2806d542c15:
 
 In file included from include/linux/dma-mapping.h:52,
  from drivers/base/dma-mapping.c:10:
 include/asm/dma-mapping.h: In function 'dma_map_sg':
 include/asm/dma-mapping.h:288: error: 'struct scatterlist' has no member 
 named 'page'
 include/asm/dma-mapping.h:288: error: 'struct scatterlist' has no member 
 named 'page'
 include/asm/dma-mapping.h:288: error: 'struct scatterlist' has no member 
 named 'page'
 include/asm/dma-mapping.h:289: error: 'struct scatterlist' has no member 
 named 'page'
 include/asm/dma-mapping.h:290: error: 'struct scatterlist' has no member 
 named 'page'
 include/asm/dma-mapping.h: In function 'dma_sync_sg_for_cpu':
 include/asm/dma-mapping.h:331: error: 'struct scatterlist' has no member 
 named 'page'
 
 drivers/scsi/ps3rom.c: In function 'fetch_to_dev_buffer':
 drivers/scsi/ps3rom.c:150: error: 'struct scatterlist' has no member named 
 'page'

Applied.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] [POWERPC] Fix fallout from sg_page changes

2007-10-23 Thread Jens Axboe
On Tue, Oct 23 2007, Dale Farnsworth wrote:
 
 Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
 ---
  include/asm-powerpc/dma-mapping.h |   10 +-
  1 files changed, 5 insertions(+), 5 deletions(-)

Should already be merged in Linus' current tree.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Use the new sg_page() helper

2007-10-23 Thread Jens Axboe
On Tue, Oct 23 2007, Linus Torvalds wrote:
 
 
 On Tue, 23 Oct 2007, Emil Medve wrote:
 
  Fix build error messages such as this:
 
 Some - but apparently not all - of these are already fixed in my tree, 
 through pulls from Jens. I just pushed out the result, can you resend the 
 parts that didn't already get fixed?
 
 As is, the patch won't apply for me, since it touches many different parts 
 and some of them are fixed.

Looks like just the mmc and xtensa bits, I've merged those up.

-- 
Jens Axboe

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


  1   2   >