from:"Jens Axboe"

Re: KASAN: null-ptr-deref Write in tctx_task_work_run

2024-03-17 Thread Jens Axboe

On 3/17/24 6:59 PM, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered an issue in Linux kernel 
> 6.8.0-ge5e038b7ae9d. Attached to the email was a POC file of the issue.
> 
> Stack dump:
> 
> ==
> BUG: KASAN: null-ptr-deref in instrument_atomic_read_write 
> include/linux/instrumented.h:96 [inline]
> BUG: KASAN: null-ptr-deref in llist_del_all include/linux/llist.h:266 [inline]
> BUG: KASAN: null-ptr-deref in tctx_task_work_run+0x7d/0x330 
> io_uring/io_uring.c:1267
> Write of size 8 at addr 01c0 by task iou-sqp-215603/215604
> 
> CPU: 0 PID: 215604 Comm: iou-sqp-215603 Not tainted 6.8.0-ge5e038b7ae9d #40
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:114
>  kasan_report+0xbd/0xf0 mm/kasan/report.c:601
>  check_region_inline mm/kasan/generic.c:183 [inline]
>  kasan_check_range+0xf4/0x1a0 mm/kasan/generic.c:189
>  instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
>  llist_del_all include/linux/llist.h:266 [inline]
>  tctx_task_work_run+0x7d/0x330 io_uring/io_uring.c:1267
>  io_sq_tw+0x12a/0x1d0 io_uring/sqpoll.c:245
>  io_sq_thread+0x8d7/0x18a0 io_uring/sqpoll.c:308
>  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
>  
> ==
> Kernel panic - not syncing: KASAN: panic_on_warn set ...
> CPU: 0 PID: 215604 Comm: iou-sqp-215603 Not tainted 6.8.0-ge5e038b7ae9d #40
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014

I think you snipped the fault injection that came before this. It looks
like an allocation failure, so we don't get tsk->io_uring set up for the
SQPOLL thread. Not a great way to handle this, but can you try the
below? Would be nicer if we could just prune the task rather than wake
it and have it error.

diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 363052b4ea76..db7b0fdfe1cb 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -274,6 +274,10 @@ static int io_sq_thread(void *data)
 	char buf[TASK_COMM_LEN];
 	DEFINE_WAIT(wait);
 
+	/* offload context creation failed, just exit */
+	if (!current->io_uring)
+		goto err_out;
+
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
 
@@ -371,7 +375,7 @@ static int io_sq_thread(void *data)
 		atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
 	io_run_task_work();
 	mutex_unlock(&sqd->lock);
-
+err_out:
 	complete(&sqd->exited);
 	do_exit(0);
 }

-- 
Jens Axboe

Re: [PATCH RERESEND 00/11] splice(file<>pipe) I/O on file as-if O_NONBLOCK

2023-12-14 Thread Jens Axboe

On 12/14/23 11:44 AM, Ahelenia Ziemiańska wrote:
> First:  
> https://lore.kernel.org/lkml/cover.1697486714.git.nabijaczlew...@nabijaczleweli.xyz/t/#u
> Resend: 
> https://lore.kernel.org/lkml/1cover.1697486714.git.nabijaczlew...@nabijaczleweli.xyz/t/#u
> Resending again per 
> https://lore.kernel.org/lkml/20231214093859.01f6e...@kernel.org/t/#u
> 
> Hi!
> 
> As it stands, splice(file -> pipe):
> 1. locks the pipe,
> 2. does a read from the file,
> 3. unlocks the pipe.
> 
> For reading from regular files and blockdevs this makes no difference.
> But if the file is a tty or a socket, for example, this means that until
> data appears, which it may never do, every process trying to read from
> or open the pipe enters an uninterruptible sleep,
> and will only exit it if the splicing process is killed.
> 
> This trivially denies service to:
> * any hypothetical pipe-based log collexion system
> * all nullmailer installations
> * me, personally, when I'm pasting stuff into qemu -serial chardev:pipe
> 
> This follows:
> 1. 
> https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u
> 2. a security@ thread rooted in
>
> 3. https://nabijaczleweli.xyz/content/blogn_t/011-linux-splice-exclusion.html
> 
> Patches were posted and then discarded on principle or funxionality,
> all in all terminating in Linus posting
>> But it is possible that we need to just bite the bullet and say
>> "copy_splice_read() needs to use a non-blocking kiocb for the IO".
> 
> This does that, effectively making splice(file -> pipe)
> request (and require) O_NONBLOCK on reads from the file:
> this doesn't affect splicing from regular files and blockdevs,
> since they're always non-blocking
> (and requesting the stronger "no kernel sleep" IOCB_NOWAIT is non-sensical),

Not sure how you got the idea that regular files or block devices are
always non-blocking; this is certainly not true without IOCB_NOWAIT.
Without IOCB_NOWAIT, you can certainly be waiting for previous IO to
complete.

> but always returns -EINVAL for ttys.
> Sockets behave as expected from O_NONBLOCK reads:
> splice if there's data available else -EAGAIN.
> 
> This should all pretty much behave as-expected.

Should it? Seems like there's a very high risk of breaking existing use
cases here.

Have you at all looked into the approach of enabling splice to/from
_without_ holding the pipe lock? That, to me, would seem like a much
saner approach, with the caveat that I have not looked into that at all
so there may indeed be reasons why this is not feasible.

-- 
Jens Axboe

Re: [PATCH 027/141] drbd: Fix fall-through warnings for Clang

2021-04-20 Thread Jens Axboe

On 4/20/21 2:25 PM, Gustavo A. R. Silva wrote:
> Hi all,
> 
> Friendly ping: who can take this, please?

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH 032/141] floppy: Fix fall-through warnings for Clang

2021-04-20 Thread Jens Axboe

On 4/20/21 2:25 PM, Gustavo A. R. Silva wrote:
> Hi all,
> 
> Friendly ping: who can take this, please?
> 
> Thanks
> --
> Gustavo
> 
> On 11/20/20 12:28, Gustavo A. R. Silva wrote:
>> In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
>> by explicitly adding a fallthrough pseudo-keyword in places where the
>> code is intended to fall through to the next case.
>>
>> Link: https://github.com/KSPP/linux/issues/115
>> Signed-off-by: Gustavo A. R. Silva 
>> ---
>>  drivers/block/floppy.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
>> index 7df79ae6b0a1..21a2a7becba0 100644
>> --- a/drivers/block/floppy.c
>> +++ b/drivers/block/floppy.c
>> @@ -2124,6 +2124,7 @@ static void format_interrupt(void)
>>  	switch (interpret_errors()) {
>>  	case 1:
>>  		cont->error();
>> +		fallthrough;
>>  	case 2:
>>  		break;
>>  	case 0:

I wonder about the consistency of the patches. The one I just applied
for libata adds a break, this one annotates fallthrough. But the cases
are really 100% the same. Why aren't the changes consistent? Both are
obviously fine, but for identical cases it seems odd that they differ.

IMHO, adding a break makes more sense. Annotate the fallthrough if the
two cases share work that needs to be done, as then that solution makes
sense.

-- 
Jens Axboe
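
The break-versus-fallthrough point in the message above can be checked with
a small userspace sketch. The fallthrough macro here is only a stand-in for
the kernel's pseudo-keyword, and report_error(), handle_annotated() and
handle_with_break() are hypothetical names, not code from floppy.c or
libata. Assuming GCC or Clang, the two variants behave the same whenever the
next case does nothing but break:

```c
#include <assert.h>

/*
 * Userspace stand-in for the kernel's fallthrough pseudo-keyword
 * (assumption: GCC >= 7 or Clang; otherwise it degrades to a no-op).
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

int errors_reported;	/* hypothetical side effect of the error handler */

static void report_error(void)
{
	errors_reported++;
}

/* Variant 1: annotate the fall through into a case that only breaks. */
int handle_annotated(int status)
{
	switch (status) {
	case 1:
		report_error();
		fallthrough;
	case 2:
		break;
	case 0:
		return 1;
	}
	return 0;
}

/* Variant 2: end the case with its own break. */
int handle_with_break(int status)
{
	switch (status) {
	case 1:
		report_error();
		break;
	case 2:
		break;
	case 0:
		return 1;
	}
	return 0;
}

/* Both variants return the same value for every status tried here. */
void check_variants_agree(void)
{
	int status;

	for (status = 0; status <= 3; status++) {
		int annotated = handle_annotated(status);
		int with_break = handle_with_break(status);

		assert(annotated == with_break);
	}
}
```

The annotation only says something the break cannot when case 2 does real
work that case 1 is meant to share.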

Re: [PATCH 092/141] libata: Fix fall-through warnings for Clang

2021-04-20 Thread Jens Axboe

On 4/20/21 2:11 PM, Gustavo A. R. Silva wrote:
> Hi all,
> 
> Friendly ping: who can take this, please?

Applied for 5.13.

-- 
Jens Axboe

Re: [PATCH] io_uring: check ctx->sq_data before io_sq_offload_start

2021-04-19 Thread Jens Axboe

On 4/19/21 6:36 AM, Palash Oswal wrote:
> syzkaller identified KASAN: null-ptr-deref Read in io_uring_create
> bug on the stable 5.11-y tree.
> 
> BUG: KASAN: null-ptr-deref in io_sq_offload_start fs/io_uring.c:8254 [inline]
> BUG: KASAN: null-ptr-deref in io_disable_sqo_submit fs/io_uring.c:8999 
> [inline]
> BUG: KASAN: null-ptr-deref in io_uring_create+0x1275/0x22f0 fs/io_uring.c:9824
> Read of size 8 at addr 0068 by task syz-executor.0/4350
> 
> A simple reproducer for this bug is:
> 
> int main(void)
> {
>   syscall(__NR_mmap, 0x2000ul, 0x100ul, 7ul, 0x32ul, -1, 0ul);
>   intptr_t res = 0;
>   pid_t parent = getpid();
>   *(uint32_t*)0x2084 = 0;
>   *(uint32_t*)0x2088 = 0x42;
>   *(uint32_t*)0x208c = 0;
>   *(uint32_t*)0x2090 = 0;
>   *(uint32_t*)0x2098 = -1;
>   *(uint32_t*)0x209c = 0;
>   *(uint32_t*)0x20a0 = 0;
>   *(uint32_t*)0x20a4 = 0;
>   if (fork() == 0) {
>     kill(parent, SIGKILL);
>     exit(0);
>   }
>   res = syscall(__NR_io_uring_setup, 0x7994, 0x2080ul);
>   return 0;
> }
> 
> Due to the SIGKILL sent to the process before io_uring_setup
> completes, ctx->sq_data is NULL. Therefore, io_sq_offload_start
> does a null pointer dereferenced read. More details on this bug
> are in [1]. Discussion for this patch happened in [2].
> 
> [1] https://oswalpalash.com/exploring-null-ptr-deref-io-uring-submit
> [2] https://lore.kernel.org/io-uring/a08121be-f481-e9f8-b28d-3eb5d4f
> a5...@gmail.com/

This should be a backport of the 5.12 fix, not a separate patch.

-- 
Jens Axboe

Re: [syzbot] KASAN: use-after-free Read in __cpuhp_state_remove_instance

2021-04-19 Thread Jens Axboe

On 4/19/21 8:41 AM, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
> 
> commit 470ec4ed8c91b4db398ad607c700e9ce88365202
> Author: Jens Axboe 
> Date:   Fri Feb 26 17:20:34 2021 +
> 
> io-wq: fix double put of 'wq' in error path
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11e89cc5d0
> start commit:   cee407c5 Merge tag 'for-linus' of git://git.kernel.org/pub..
> git tree:   upstream
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8f67201de02a572b
> dashboard link: https://syzkaller.appspot.com/bug?extid=38769495e847cea2dcca
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=154e360ad0
> 
> If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: io-wq: fix double put of 'wq' in error path


-- 
Jens Axboe

Re: linux-next: Tree for Apr 19 (bcache)

2021-04-19 Thread Jens Axboe

On 4/19/21 10:26 AM, Coly Li wrote:
> On 4/19/21 11:40 PM, Randy Dunlap wrote:
>> On 4/19/21 3:23 AM, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Changes since 20210416:
>>>
>>
>> on x86_64:
>>
>> when
>> # CONFIG_BLK_DEV is not set
>>
>>
>> WARNING: unmet direct dependencies detected for LIBNVDIMM
>>   Depends on [n]: PHYS_ADDR_T_64BIT [=y] && HAS_IOMEM [=y] && BLK_DEV [=n]
>>   Selected by [y]:
>>   - BCACHE_NVM_PAGES [=y] && MD [=y] && BCACHE [=y] && PHYS_ADDR_T_64BIT [=y]
>>
>>
>> Full randconfig file is attached.
>>
> 
> I need hint from kbuild expert.
> 
> My original idea to use "select LIBNVDIMM" is to avoid the
> BCACHE_NVM_PAGES option to disappear if LIBNVDIMM is not enabled.
> Otherwise if nvdimm driver is not configure, users won't know there is a
> BCACHE_NVM_PAGES option unless they read bcache Kconfig file.

But why? That's exactly how it should work. Just use depends to set the
dependency.

> But I see nvdimm's Kconfig, it uses "depends on BLK_DEV", I understand
> it is acceptable that LIBNVDIMM option to disappear from "make
> menuconfig" if BLK_DEV is not enabled.
> 
> For such condition, which one is the proper way to set the dependence ?
> - Change "select LIBNVDIMM" and "select DAX" to "depends on LIBNVDIMM"
> and "depends on DAX" in bcache Kconfig
> - Or change "depends on BLK_DEV" to "select BLK_DEV" in nvdimm Kconfig.

The former.

-- 
Jens Axboe

Re: [PATCH] blk-mq: Fix spurious debugfs directory creation during initialization

2021-04-16 Thread Jens Axboe

On 4/7/21 11:59 AM, Saravanan D wrote:
> blk_mq_debugfs_register_sched_hctx() called from
> device_add_disk()->elevator_init_mq()->blk_mq_init_sched()
> initialization sequence does not have relevant parent directory
> setup and thus spuriously attempts "sched" directory creation
> from root mount of debugfs for every hw queue detected on the
> block device
> 
> dmesg
> ...
> debugfs: Directory 'sched' with parent '/' already present!
> debugfs: Directory 'sched' with parent '/' already present!
> .
> .
> debugfs: Directory 'sched' with parent '/' already present!
> ...
> 
> The parent debugfs directory for hw queues gets properly set up in
> device_add_disk()->blk_register_queue()->blk_mq_debugfs_register()
> ->blk_mq_debugfs_register_hctx() later in the block device
> initialization sequence.
> 
> A simple check for debugfs_dir has been added to thwart premature
> debugfs directory/file creation attempts.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH] floppy: remove redundant assignment to variable st

2021-04-16 Thread Jens Axboe

On 4/16/21 6:29 AM, Denis Efremov wrote:
> Jens, could you please take this one? I thought to send it to you with other
> cleanup patches in a merge request, but you already applied rest of the
> patches. If you prefer to take it as merge request, it's ok I'll send it
> based on your branch for-5.13/drivers.

I have applied it, thanks.

-- 
Jens Axboe

Re: [RFC PATCH 2/2] bfq/mq-deadline: remove redundant check for passthrough request

2021-04-16 Thread Jens Axboe

On 4/14/21 9:43 PM, Lin Feng wrote:
> Since commit 01e99aeca39796003 'blk-mq: insert passthrough request into
> hctx->dispatch directly', passthrough request should not appear in
> IO-scheduler any more, so blk_rq_is_passthrough checking in addon IO
> schedulers is redundant.
> 
> (Notes: this patch passes generic IO load test with hdds under SAS
> controller and hdds under AHCI controller but obviously not covers all.
> Not sure if passthrough request can still escape into IO scheduler from
> blk_mq_sched_insert_requests, which is used by blk_mq_flush_plug_list and
> has lots of indirect callers.)

Applied, with the bfq bits hand edited to apply for 5.13.

-- 
Jens Axboe

Re: [PATCH 1/2] blk-mq: bypass IO scheduler's limit_depth for passthrough request

2021-04-16 Thread Jens Axboe

On 4/14/21 9:39 PM, Lin Feng wrote:
> Commit 01e99aeca39796003 ("blk-mq: insert passthrough request into
> hctx->dispatch directly") gives high priority to passthrough requests and
> bypass underlying IO scheduler. But as we allocate tag for such request it
> still runs io-scheduler's callback limit_depth, while what we really want is
> to give full sbitmap-depth capability to such request for acquiring available
> tag.
> blktrace shows PC requests(dmraid -s -c -i) hit bfq's limit_depth:
>   8,020 0.0 39952 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 0 depth 8
>   8,021 0.08134 39952  D   R 4 [dmraid]
>   8,022 0.2153824  C   R [0]
>   8,020 0.35442 39952 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 0 depth 8
>   8,023 0.38813 39952  D   R 24 [dmraid]
>   8,024 0.4435624  C   R [0]
> 
> This patch introduces a new wrapper to make code not that ugly.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH 0/5] Another small set of cleanups for floppy driver

2021-04-16 Thread Jens Axboe

On 4/16/21 2:34 AM, Denis Efremov wrote:
> Just a couple of patches to make checkpatch.pl a bit more happy.
> All these patches preserve original semantics of the code and only
> memset(), memcpy() patches change binary code.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-14 Thread Jens Axboe

On 4/13/21 5:14 PM, Dave Chinner wrote:
> On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote:
>> On 4/13/21 1:51 AM, SeongJae Park wrote:
>>> From: SeongJae Park 
>>>
>>> Hello,
>>>
>>>
>>> Very interesting work, thank you for sharing this :)
>>>
>>> On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao  wrote:
>>>
>>>> What's new in v2
>>>> 
>>>> Special thanks to Jens Axboe for reporting a regression in buffered
>>>> I/O and helping test the fix.
>>>
>>> Is the discussion open?  If so, could you please give me a link?
>>
>> I wasn't on the initial post (or any of the lists it was posted to), but
>> it's on the google page reclaim list. Not sure if that is public or not.
>>
>> tldr is that I was pretty excited about this work, as buffered IO tends
>> to suck (a lot) for high throughput applications. My test case was
>> pretty simple:
>>
>> Randomly read a fast device, using 4k buffered IO, and watch what
>> happens when the page cache gets filled up. For this particular test,
>> we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec
>> with kswapd using a lot of CPU trying to keep up. That's mainline
>> behavior.
> 
> I see this exact same behaviour here, too, but I RCA'd it to
> contention between the inode and memory reclaim for the mapping
> structure that indexes the page cache. Basically the mapping tree
> lock is the contention point here - you can either be adding pages
> to the mapping during IO, or memory reclaim can be removing pages
> from the mapping, but we can't do both at once.
> 
> So we end up with kswapd spinning on the mapping tree lock like so
> when doing 1.6GB/s in 4kB buffered IO:
> 
> -   20.06%     0.00%  [kernel]  [k] kswapd
>    - 20.06% kswapd
>       - 20.05% balance_pgdat
>          - 20.03% shrink_node
>             - 19.92% shrink_lruvec
>                - 19.91% shrink_inactive_list
>                   - 19.22% shrink_page_list
>                      - 17.51% __remove_mapping
>                         - 14.16% _raw_spin_lock_irqsave
>                            - 14.14% do_raw_spin_lock
>                                 __pv_queued_spin_lock_slowpath
>                         - 1.56% __delete_from_page_cache
>                              0.63% xas_store
>                         - 0.78% _raw_spin_unlock_irqrestore
>                            - 0.69% do_raw_spin_unlock
>                                 __raw_callee_save___pv_queued_spin_unlock
>                      - 0.82% free_unref_page_list
>                         - 0.72

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-13 Thread Jens Axboe

On 4/13/21 1:51 AM, SeongJae Park wrote:
> From: SeongJae Park 
> 
> Hello,
> 
> 
> Very interesting work, thank you for sharing this :)
> 
> On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao  wrote:
> 
>> What's new in v2
>> ========
>> Special thanks to Jens Axboe for reporting a regression in buffered
>> I/O and helping test the fix.
> 
> Is the discussion open?  If so, could you please give me a link?

I wasn't on the initial post (or any of the lists it was posted to), but
it's on the google page reclaim list. Not sure if that is public or not.

tldr is that I was pretty excited about this work, as buffered IO tends
to suck (a lot) for high throughput applications. My test case was
pretty simple:

Randomly read a fast device, using 4k buffered IO, and watch what
happens when the page cache gets filled up. For this particular test,
we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec
with kswapd using a lot of CPU trying to keep up. That's mainline
behavior.

The initial posting of this patchset did no better, in fact it did a bit
worse. Performance dropped to the same levels and kswapd was using as
much CPU as before, but on top of that we also got excessive swapping.
Not at a high rate, but 5-10MB/sec continually.

I had some back and forths with Yu Zhao and tested a few new revisions,
and the current series does much better in this regard. Performance
still dips a bit when page cache fills, but not nearly as much, and
kswapd is using less CPU than before.

Hope that helps,
-- 
Jens Axboe

Re: [PATCH V12 0/3] Charge loop device i/o to issuing cgroup

2021-04-12 Thread Jens Axboe

On 4/12/21 9:45 AM, Johannes Weiner wrote:
> It looks like all feedback has been addressed and there hasn't been
> any new activity on it in a while.
> 
> As per the suggestion last time [1], Andrew, Jens, could this go
> through the -mm tree to deal with the memcg conflicts?

Yep, I think that would make it the most painless for everyone.

Dan/Andrew, you can add my Acked-by to the series.

-- 
Jens Axboe

Re: mmotm 2021-04-11-20-47 uploaded (fs/io_uring.c)

2021-04-12 Thread Jens Axboe

On 4/12/21 1:21 AM, Randy Dunlap wrote:
> On 4/11/21 8:48 PM, a...@linux-foundation.org wrote:
>> The mm-of-the-moment snapshot 2021-04-11-20-47 has been uploaded to
>>
>>https://www.ozlabs.org/~akpm/mmotm/
>>
>> mmotm-readme.txt says
>>
>> README for mm-of-the-moment:
>>
>> https://www.ozlabs.org/~akpm/mmotm/
>>
>> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
>> more than once a week.
>>
>> You will need quilt to apply these patches to the latest Linus release (5.x
>> or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
>> https://ozlabs.org/~akpm/mmotm/series
>>
>> The file broken-out.tar.gz contains two datestamp files: .DATE and
>> .DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
>> followed by the base kernel version against which this patch series is to
>> be applied.
>>
>> This tree is partially included in linux-next.  To see which patches are
>> included in linux-next, consult the `series' file.  Only the patches
>> within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
>> linux-next.
> 
> on i386:
> # CONFIG_BLOCK is not set
> 
> ../fs/io_uring.c: In function ‘kiocb_done’:
> ../fs/io_uring.c:2766:7: error: implicit declaration of function 
> ‘io_resubmit_prep’; did you mean ‘io_put_req’? 
> [-Werror=implicit-function-declaration]
>if (io_resubmit_prep(req)) {

I'll apply the below to take care of that.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3a837d2b8331..aa29918944f6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2464,6 +2464,10 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 	return true;
 }
 #else
+static bool io_resubmit_prep(struct io_kiocb *req)
+{
+	return false;
+}
 static bool io_rw_should_reissue(struct io_kiocb *req)
 {
 	return false;
@@ -2504,14 +2508,8 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
 	if (kiocb->ki_flags & IOCB_WRITE)
 		kiocb_end_write(req);
 	if (unlikely(res != req->result)) {
-		bool fail = true;
-
-#ifdef CONFIG_BLOCK
-		if (res == -EAGAIN && io_rw_should_reissue(req) &&
-		    io_resubmit_prep(req))
-			fail = false;
-#endif
-		if (fail) {
+		if (!(res == -EAGAIN && io_rw_should_reissue(req) &&
+		    io_resubmit_prep(req))) {
 			req_set_fail_links(req);
 			req->flags |= REQ_F_DONT_REISSUE;
 		}

-- 
Jens Axboe

Re: linux-next: Signed-off-by missing for commit in the block tree

2021-04-11 Thread Jens Axboe

On 4/11/21 8:26 PM, Sowjanya Komatineni wrote:
> 
> On 4/11/21 7:14 PM, Jens Axboe wrote:
>> On 4/11/21 4:34 PM, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Commit
>>>
>>>6fa6517fe62e ("ata: ahci_tegra: call tegra_powergate_power_off only when 
>>> PM domain is not present")
>>>
>>> is missing a Signed-off-by from its author.
>> Sowjanya, please reply that you're OK with me adding your Signed-off-by to
>> that patch.
>>
> Sorry I should have checked that. Thanks Jens. Sure I am OK with it.

Added, thanks.

-- 
Jens Axboe

Re: linux-next: Signed-off-by missing for commit in the block tree

2021-04-11 Thread Jens Axboe

On 4/11/21 4:34 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   6fa6517fe62e ("ata: ahci_tegra: call tegra_powergate_power_off only when PM 
> domain is not present")
> 
> is missing a Signed-off-by from its author.

Sowjanya, please reply that you're OK with me adding your Signed-off-by to that
patch.

-- 
Jens Axboe

Re: [PATCH] gdrom: fix compilation error

2021-04-11 Thread Jens Axboe

On 4/11/21 7:12 PM, Bart Van Assche wrote:
> On 4/11/21 3:43 PM, Chaitanya Kulkarni wrote:
>> Use the right name for the struct request variable that removes the
>> following compilation error :-
>>
>> make --silent --keep-going --jobs=8
>> O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh
>> CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc'
>> 'HOSTCC=sccache gcc'
>>
>> In file included from /builds/linux/include/linux/scatterlist.h:9,
>>  from /builds/linux/include/linux/dma-mapping.h:10,
>>  from /builds/linux/drivers/cdrom/gdrom.c:16:
>> /builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma':
>> /builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared
>> (first use in this function)
>>   586 |  __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio),
>>   | ^~
> 
> How about adding a Fixes: tag?

Indeed, that's definitely missing. I've added it and applied it.

-- 
Jens Axboe

Re: [PATCH] pata_ipx4xx_cf: Fix unsigned comparison with less than zero

2021-04-09 Thread Jens Axboe

On 4/9/21 7:54 AM, angkery wrote:
> From: Junlin Yang 
> 
> The return from the call to platform_get_irq() is int, it can be
> a negative error code, however this is being assigned to an unsigned
> int variable 'irq', so making 'irq' an int, and change the position to
> keep the code format.
> 
> ./drivers/ata/pata_ixp4xx_cf.c:168:5-8:
> WARNING: Unsigned expression compared with zero: irq > 0

Applied, thanks.

-- 
Jens Axboe
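
The signedness problem described in the patch above can be reproduced in a
few lines of userspace C. This is only a sketch: fake_get_irq() is a
hypothetical stand-in for platform_get_irq(), and the driver's actual check
is not shown in the thread beyond the "irq > 0" quoted in the warning.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for platform_get_irq(): IRQ number, or negative errno. */
static int fake_get_irq(int have_irq)
{
	return have_irq ? 42 : -ENXIO;
}

/* Buggy pattern: the negative errno wraps to a huge unsigned value. */
int irq_ok_unsigned(int have_irq)
{
	unsigned int irq = fake_get_irq(have_irq);

	return irq > 0;		/* also true for the error case */
}

/* Fixed pattern: keep the value signed so the error stays negative. */
int irq_ok_signed(int have_irq)
{
	int irq = fake_get_irq(have_irq);

	return irq > 0;
}

void check_irq_tests(void)
{
	assert(irq_ok_unsigned(1) && irq_ok_signed(1));	/* real IRQ: both agree */
	assert(irq_ok_unsigned(0));	/* error slips through as "valid" */
	assert(!irq_ok_signed(0));	/* error is caught */
}
```

With the unsigned variable the error return is indistinguishable from a
valid IRQ number, which is why making 'irq' an int is the fix.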

Re: [PATCH v2] ata: ahci_tegra: call tegra_powergate_power_off only when PM domain is not present

2021-04-09 Thread Jens Axboe

On 4/8/21 2:55 PM, Sowjanya Komatineni wrote:
> This patch adds check to call legacy power domain API
> tegra_powergate_power_off() only when PM domain is not present.

Applied, and added a Fixes line.

-- 
Jens Axboe

Re: [PATCH] block: Fix sys_ioprio_set(.which=IOPRIO_WHO_PGRP) task iteration

2021-04-08 Thread Jens Axboe

On 4/8/21 3:46 AM, Peter Zijlstra wrote:
> 
> do_each_pid_thread() { } while_each_pid_thread() is a double loop and
> thus break doesn't work as expected. Also, it should be used under
> tasklist_lock because otherwise we can race against change_pid() for
> PGID/SID.

Applied, thanks.

-- 
Jens Axboe
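
Why break misbehaves inside a macro pair that hides two nested loops can be
shown with a userspace sketch. The do_each_group_thread() and
while_each_group_thread() macros below are hypothetical stand-ins with the
same shape as do_each_pid_thread()/while_each_pid_thread(), iterating over
plain integers instead of tasks. The goto variant is one way to leave both
loops; it is not claimed to be what the applied patch does.

```c
#include <assert.h>

#define NGROUPS		3
#define NTHREADS	4

/* Hypothetical macro pair: expands to an outer and an inner for loop. */
#define do_each_group_thread(g, t) \
	for ((g) = 0; (g) < NGROUPS; (g)++) { \
		for ((t) = 0; (t) < NTHREADS; (t)++) {

#define while_each_group_thread(g, t) \
		} \
	}

/* break leaves only the hidden inner loop; the outer loop carries on. */
int visits_with_break(void)
{
	int g, t, visits = 0;

	do_each_group_thread(g, t) {
		visits++;
		if (t == 1)
			break;
	} while_each_group_thread(g, t);

	return visits;
}

/* goto leaves both loops at once. */
int visits_with_goto(void)
{
	int g, t, visits = 0;

	do_each_group_thread(g, t) {
		visits++;
		if (t == 1)
			goto out;
	} while_each_group_thread(g, t);
out:
	return visits;
}

void check_double_loop(void)
{
	assert(visits_with_break() == 2 * NGROUPS);	/* two visits per group */
	assert(visits_with_goto() == 2);		/* stops at the first hit */
}
```

The code reads like a single loop at the call site, so a break that looks
like "stop iterating" quietly keeps going with the next group.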

Re: [PATCH] io-wq: Fix io_wq_worker_affinity()

2021-04-08 Thread Jens Axboe

On 4/8/21 3:44 AM, Peter Zijlstra wrote:
> 
> Do not include private headers and do not frob in internals.
> 
> On top of that, while the previous code restores the affinity, it
> doesn't ensure the task actually moves there if it was running,
> leading to the fun situation that it can be observed running outside
> of its allowed mask for potentially significant time.
> 
> Use the proper API instead.

Applied, thanks Peter.

-- 
Jens Axboe

Re: [PATCH v1 1/1] ata: Drop unneeded inclusion of kernel.h in the header

2021-04-07 Thread Jens Axboe

On 4/7/21 10:27 AM, Andy Shevchenko wrote:
> On Wed, Apr 07, 2021 at 10:04:49AM -0600, Jens Axboe wrote:
>> On 4/7/21 10:03 AM, Andy Shevchenko wrote:
>>> On Wed, Apr 07, 2021 at 11:51:31PM +0800, kernel test robot wrote:
> 
> ...
> 
>>>> All errors (new ones prefixed by >>):
>>>
>>> Thanks, we need to include bits.h.
>>> (It passed my simple build, but appears I have no such driver included)
>>>
>>> Jens, I saw your message, should I send a follow up fix, or a v2?
>>
>> Let's just drop it, not worth it for the risk imho.
> 
> Does it mean I may try again in next cycle?
> 
> Because kernel.h inclusion seems to me too wrong there.

I don't mind taking it, but not on a hunch. If you send something
that has been thought about and went through full compilation, then
you can resend it.

-- 
Jens Axboe

Re: [PATCH v1 1/1] ata: Drop unneeded inclusion of kernel.h in the header

2021-04-07 Thread Jens Axboe

On 4/7/21 10:03 AM, Andy Shevchenko wrote:
> On Wed, Apr 07, 2021 at 11:51:31PM +0800, kernel test robot wrote:
>> Hi Andy,
>>
>> I love your patch! Yet something to improve:
>>
>> [auto build test ERROR on block/for-next]
>> [also build test ERROR on v5.12-rc6 next-20210407]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>>
>> url:
>> https://github.com/0day-ci/linux/commits/Andy-Shevchenko/ata-Drop-unneeded-inclusion-of-kernel-h-in-the-header/20210407-214746
>> base:   
>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git 
>> for-next
>> config: x86_64-randconfig-s021-20210407 (attached as .config)
>> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
>> reproduce:
>> # apt-get install sparse
>> # sparse version: v0.6.3-279-g6d5d9b42-dirty
>> # 
>> https://github.com/0day-ci/linux/commit/d2574103b692b4fc73f1ed36ee9e4d3324902fd9
>> git remote add linux-review https://github.com/0day-ci/linux
>> git fetch --no-tags linux-review 
>> Andy-Shevchenko/ata-Drop-unneeded-inclusion-of-kernel-h-in-the-header/20210407-214746
>> git checkout d2574103b692b4fc73f1ed36ee9e4d3324902fd9
>> # save the attached .config to linux build tree
>> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot 
>>
>> All errors (new ones prefixed by >>):
> 
> Thanks, we need to include bits.h.
> (It passed my simple build, but appears I have no such driver included)
> 
> Jens, I saw your message, should I send a follow up fix, or a v2?

Let's just drop it, not worth it for the risk imho.

-- 
Jens Axboe

Re: [PATCH v4 0/3] Add AHCI support for Tegra186

2021-04-07 Thread Jens Axboe

On 4/6/21 7:25 PM, Sowjanya Komatineni wrote:
> Re-sending dt-binding and ahci_tegra driver patches as v4 as device
> tree patches from v3 are merged but not the AHCI Tegra driver.
> 
> Missed to add Jens Axboe to mailing list in v3. Adding for v4.
> 
> This series adds support for AHCI-compliant SATA to Tegra186 SoC.
> 
> This series includes patches for
> - Converting text based dt-binding document to YAML.
> - Adding dt-bindings for Tegra186.
> - Adding Tegra186 support to Tegra AHCI driver.
> 
> Delta between patch versions:
> [v4]: Same as v3 except removed device tree patches as they are
>   merged.
> [v3]: fixed yaml example to pass dt_binding_check
> [v2]: v1 feedback related to yaml dt-binding.
>   Removed conditional reset order in yaml and updated dts files
>   to maintain same order for commonly available resets across
>   Tegra124 thru Tegra186.

Assuming the libata tree is the best way for this to go in, so I applied
it for 5.13.

-- 
Jens Axboe

Re: [PATCH v1 1/1] ata: Drop unneeded inclusion of kernel.h in the header

2021-04-07 Thread Jens Axboe

On 4/7/21 7:47 AM, Andy Shevchenko wrote:
> There is no need to have kernel.h included, I do not see any
> direct users of it in ata.h. Drop unneeded inclusion of kernel.h.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH] task_work: add helper for more targeted task_work canceling

2021-04-07 Thread Jens Axboe

On 4/6/21 11:47 AM, Oleg Nesterov wrote:
> On 04/04, Jens Axboe wrote:
>>
>> +struct callback_head *task_work_cancel_match(struct task_struct *task,
>> +bool (*match)(struct callback_head *, void *data), void *data);
> ^
> 
> Feel free to ignore, but how about "typedef task_work_match_t" ?
> 
> Either way,
> 
> Reviewed-by: Oleg Nesterov 

Thanks! I deliberately didn't add that, as I think typedefs just tend to
hide the signature. If I see the above, I know what the callback func
looks like without having to find the definition of task_work_match_t.

-- 
Jens Axboe
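
The typedef trade-off discussed above, side by side, in a userspace sketch.
Everything here is hypothetical: struct callback_head is cut down to two
fields, cancel_match() works on a plain singly linked list rather than a
task's work list, and work_match_t plays the role of the suggested
task_work_match_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	int id;
};

/* Explicit form: the callback's signature is visible in the prototype. */
struct callback_head *cancel_match(struct callback_head **list,
	bool (*match)(struct callback_head *, void *data), void *data)
{
	struct callback_head **pprev = list;
	struct callback_head *work;

	while ((work = *pprev) != NULL) {
		if (match(work, data)) {
			*pprev = work->next;	/* unlink the first match */
			work->next = NULL;
			return work;
		}
		pprev = &work->next;
	}
	return NULL;
}

/* typedef form: shorter prototype, but the signature lives elsewhere. */
typedef bool (*work_match_t)(struct callback_head *, void *data);

struct callback_head *cancel_match_typedef(struct callback_head **list,
	work_match_t match, void *data)
{
	return cancel_match(list, match, data);
}

static bool match_id(struct callback_head *cb, void *data)
{
	return cb->id == *(int *)data;
}

void check_cancel_match(void)
{
	struct callback_head c = { NULL, 3 };
	struct callback_head b = { &c, 2 };
	struct callback_head a = { &b, 1 };
	struct callback_head *list = &a;
	struct callback_head *hit;
	int wanted = 2;

	hit = cancel_match(&list, match_id, &wanted);
	assert(hit == &b);
	assert(a.next == &c);	/* b was unlinked */

	hit = cancel_match_typedef(&list, match_id, &wanted);
	assert(hit == NULL);	/* nothing left with that id */
}
```

Both prototypes accept the same match_id(); the difference is only whether
a reader sees the callback's arguments at the declaration or has to look up
the typedef.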

Re: [syzbot] WARNING in mntput_no_expire (2)

2021-04-06 Thread Jens Axboe

On 4/6/21 8:23 AM, Al Viro wrote:
> On Tue, Apr 06, 2021 at 02:15:01PM +, Al Viro wrote:
> 
>> I'm referring to the fact that your diff is with an already modified 
>> path_lookupat()
>> _and_ those modifications have managed to introduce a bug your patch reverts.
>> No terminate_walk() paired with that path_init() failure, i.e. path_init() is
>> responsible for cleanups on its (many) failure exits...
> 
> I can't tell without seeing the variant your diff is against, but at a guess
> it had a non-trivial amount of trouble with missed rcu_read_unlock() in
> cases when path_init() fails after having done rcu_read_lock().  For trivial
> testcase, consider passing -1 for dfd, so that it would fail with -EBADF.
> Or passing 0 for dfd and "blah" for name (assuming your stdin is not a 
> directory).
> Sure, you could handle those in path_init() (or delay grabbing rcu_read_lock()
> in there, spreading it in a bunch of branches), but duplicated cleanup logics
> for a bunch of failure exits is asking for trouble.

Thanks for taking care of this Al, fwiw I'm (mostly) out on vacation.

-- 
Jens Axboe

Re: [PATCH -next] drbd: use DEFINE_SPINLOCK() for spinlock

2021-04-06 Thread Jens Axboe

On 4/6/21 6:09 AM, Huang Guobin wrote:
> From: Guobin Huang 
> 
> spinlock can be initialized automatically with DEFINE_SPINLOCK()
> rather than explicitly calling spin_lock_init().

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH 0/3] ata: Module parameter clean-ups for pata_legacy and pata_platform

2021-04-06 Thread Jens Axboe

On 3/21/21 1:55 PM, Maciej W. Rozycki wrote:
> Hi,
> 
>  In the course of looking into Christoph's recent proposal to drop legacy 
> IDE drivers I have come across a number of issues with module parameters 
> of the pata_legacy and pata_platform drivers: errors in documentation 
> present in the comment form, missing user-visible documentation, and 
> unconditional poking at ISA I/O ports in pata_legacy that isn't there with 
> the old ide-generic driver (the lack of `probe_mask' parameter).
> 
>  Here's a small patch series that addresses these issues.  Overall I 
> find the design of the pata_legacy driver's options a bit messy, e.g. the 
> `all' vs the `probe_all' parameter, and the interpretation of masks where 
> bits correspond to probed PATA locations in a particular system (rather 
> than either all known or all existing), but it's been there long enough I 
> think we have to keep it, so I merely tried to describe the current 
> semantics.  See the individual change descriptions for details.
> 
>  The changes have been run-time verified with an EISA system and a single 
> ISA PATA adapter at the usual primary I/O location.  They have also been 
> verified (mainly for the correctness of MODULE_PARM_DESC use) with an 
> x86/PC build (for pata_legacy) and a MIPS/SWARM build (for pata_platform).

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH v5 1/2] ata: ahci_brcm: Fix use of BCM7216 reset controller

2021-04-06 Thread Jens Axboe

On 4/6/21 9:16 AM, Lorenzo Pieralisi wrote:
> On Fri, Mar 12, 2021 at 03:45:54PM -0500, Jim Quinlan wrote:
>> This driver may use one of two reset controllers.  Keep them in separate
>> variables to keep things simple.  The reset controller "rescal" is shared
>> between the AHCI driver and the PCIe driver for the BrcmSTB 7216 chip.  Use
>> devm_reset_control_get_optional_shared() to handle this sharing.
>>
>> Fixes: 272ecd60a636 ("ata: ahci_brcm: BCM7216 reset is self de-asserting")
>> Fixes: c345ec6a50e9 ("ata: ahci_brcm: Support BCM7216 reset controller name")
>> Signed-off-by: Jim Quinlan 
>> Acked-by: Florian Fainelli 
>> ---
>>  drivers/ata/ahci_brcm.c | 46 -
>>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> Hi Jens,
> 
> I am happy to take this series via the PCI tree but I'd need your
> ACK on this patch, please let me know if you are OK with it.

That'd be fine:

Acked-by: Jens Axboe 

-- 
Jens Axboe

Re: [PATCH V2] ata: ahci: ceva: Updated code by using dev_err_probe()

2021-04-06 Thread Jens Axboe

On 3/5/21 2:10 AM, Piyush Mehta wrote:
> Updated code with already prepared dev_err_probe(). It reduces code size
> and simplifies EPROBE_DEFER handling.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH 00/11] Rid W=1 warnings from Block

2021-04-06 Thread Jens Axboe

On 3/12/21 3:55 AM, Lee Jones wrote:
> This set is part of a larger effort attempting to clean-up W=1
> kernel builds, which are currently overwhelmingly riddled with
> niggly little warnings.

Applied 2-11, 1 is already in my tree.

-- 
Jens Axboe

[PATCH] task_work: add helper for more targeted task_work canceling

2021-04-04 Thread Jens Axboe

The only exported helper we have right now is task_work_cancel(), which
cancels any task_work from a given task where func matches the queued
work item. This is a bit too coarse for some use cases. Add a
task_work_cancel_match() that allows to more specifically target
individual work items outside of purely the callback function used.

task_work_cancel() can be trivially implemented on top of that, hence do
so.

Signed-off-by: Jens Axboe 

---

I've got a patch on top of this that uses task_work_cancel_match(), but
sending this one out separately. There should be no functional changes
in this patch, it just allows someone to build func == func && data ==
data matches on top.
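
As a rough idea of what such a func + data match could look like, here is
a single-threaded userspace model. It is only a sketch: there is no
pi_lock or cmpxchg, struct callback_head is a simplified stand-in, and
struct my_work, my_work_fn(), my_work_match() and cancel_match() are
hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct callback_head; no locking or cmpxchg here. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

/* Hypothetical work item: the callback head plus a per-item context cookie. */
struct my_work {
	struct callback_head cb;
	void *ctx;
};

/* Single-threaded model of the list walk done by task_work_cancel_match(). */
struct callback_head *
cancel_match(struct callback_head **head,
	     bool (*match)(struct callback_head *, void *data), void *data)
{
	struct callback_head **pprev = head;
	struct callback_head *work;

	assert(match != NULL);
	while ((work = *pprev) != NULL) {
		if (match(work, data)) {
			*pprev = work->next;	/* unlink the first match found */
			break;
		}
		pprev = &work->next;
	}
	return work;	/* NULL when nothing matched */
}

void my_work_fn(struct callback_head *cb) { (void)cb; }

/* Match on both the callback function and the per-item context. */
bool my_work_match(struct callback_head *cb, void *data)
{
	struct my_work *w;

	if (cb->func != my_work_fn)
		return false;
	w = (struct my_work *)cb;	/* valid because cb is the first member */
	return w->ctx == data;
}
```

With two items queued that share my_work_fn but carry different ctx
values, cancel_match(&head, my_work_match, ctx) removes only the item
whose ctx matches, which a pure func comparison cannot express.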

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0d848a1e9e62..5b8a93f288bb 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -22,6 +22,8 @@ enum task_work_notify_mode {
 int task_work_add(struct task_struct *task, struct callback_head *twork,
enum task_work_notify_mode mode);
 
+struct callback_head *task_work_cancel_match(struct task_struct *task,
+   bool (*match)(struct callback_head *, void *data), void *data);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
 
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 9cde961875c0..e9316198c64b 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -59,18 +59,17 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 }
 
 /**
- * task_work_cancel - cancel a pending work added by task_work_add()
+ * task_work_cancel_match - cancel a pending work added by task_work_add()
  * @task: the task which should execute the work
- * @func: identifies the work to remove
- *
- * Find the last queued pending work with ->func == @func and remove
- * it from queue.
+ * @match: match function to call
  *
  * RETURNS:
  * The found work or NULL if not found.
  */
 struct callback_head *
-task_work_cancel(struct task_struct *task, task_work_func_t func)
+task_work_cancel_match(struct task_struct *task,
+  bool (*match)(struct callback_head *, void *data),
+  void *data)
 {
 struct callback_head **pprev = &task->task_works;
struct callback_head *work;
@@ -86,7 +85,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
 */
 raw_spin_lock_irqsave(&task->pi_lock, flags);
while ((work = READ_ONCE(*pprev))) {
-   if (work->func != func)
+   if (!match(work, data))
 pprev = &work->next;
else if (cmpxchg(pprev, work, work->next) == work)
break;
@@ -96,6 +95,28 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
return work;
 }
 
+static bool task_work_func_match(struct callback_head *cb, void *data)
+{
+   return cb->func == data;
+}
+
+/**
+ * task_work_cancel - cancel a pending work added by task_work_add()
+ * @task: the task which should execute the work
+ * @func: identifies the work to remove
+ *
+ * Find the last queued pending work with ->func == @func and remove
+ * it from queue.
+ *
+ * RETURNS:
+ * The found work or NULL if not found.
+ */
+struct callback_head *
+task_work_cancel(struct task_struct *task, task_work_func_t func)
+{
+   return task_work_cancel_match(task, task_work_func_match, func);
+}
+
 /**
  * task_work_run - execute the works added by task_work_add()
  *

-- 
Jens Axboe

Re: [PATCH] block: don't ignore REQ_NOWAIT for direct IO

2021-04-02 Thread Jens Axboe

On 11/20/20 10:10 AM, Pavel Begunkov wrote:
> io_uring's direct nowait requests end up waiting on io_schedule() in
> sbitmap, which seems to be because blkdev_direct_IO() fails to
> propagate IOCB_NOWAIT to a bio and hence to blk-mq.

Thanks, applied. This slipped through the cracks, and I didn't notice
until I went and directly tested some of this...

iomap suffers from the same issue, fwiw.

-- 
Jens Axboe

Re: [syzbot] WARNING in mntput_no_expire (2)

2021-04-01 Thread Jens Axboe

On 4/1/21 9:45 AM, Christian Brauner wrote:
> On Thu, Apr 01, 2021 at 02:09:20AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:d19cc4bf Merge tag 'trace-v5.12-rc5' of git://git.kernel.o..
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1018f281d0
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=d1a3d65a48dbd1bc
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c88a7030da47945a3cc3
>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12f50d11d0
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=137694a1d0
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+c88a7030da47945a3...@syzkaller.appspotmail.com
>>
>> [ cut here ]
>> WARNING: CPU: 1 PID: 8409 at fs/namespace.c:1186 
>> mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186
>> Modules linked in:
>> CPU: 1 PID: 8409 Comm: syz-executor035 Not tainted 5.12.0-rc5-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> RIP: 0010:mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186
>> Code: ff 48 c7 c2 e0 cb 78 89 be c2 02 00 00 48 c7 c7 a0 cb 78 89 c6 05 e5 
>> 6d e5 0b 01 e8 ff e1 f6 06 e9 3f fd ff ff e8 c6 a5 a8 ff <0f> 0b e9 fc fc ff 
>> ff e8 ba a5 a8 ff e8 55 dc 94 ff 31 ff 89 c5 89
>> RSP: 0018:c9000165fc78 EFLAGS: 00010293
>> RAX:  RBX: 1920002cbf95 RCX: 
>> RDX: 88802072d4c0 RSI: 81cb4b8a RDI: 0003
>> RBP: 888011656900 R08:  R09: 8fa978af
>> R10: 81cb4884 R11:  R12: 0008
>> R13: c9000165fcc8 R14: dc00 R15: 
>> FS:  () GS:8880b9d0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 55a722053160 CR3: 0bc8e000 CR4: 001506e0
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: fffe0ff0 DR7: 0400
>> Call Trace:
>>  mntput fs/namespace.c:1232 [inline]
>>  cleanup_mnt+0x523/0x530 fs/namespace.c:1132
>>  task_work_run+0xdd/0x1a0 kernel/task_work.c:140
>>  exit_task_work include/linux/task_work.h:30 [inline]
>>  do_exit+0xbfc/0x2a60 kernel/exit.c:825
>>  do_group_exit+0x125/0x310 kernel/exit.c:922
>>  __do_sys_exit_group kernel/exit.c:933 [inline]
>>  __se_sys_exit_group kernel/exit.c:931 [inline]
>>  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
>>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x446af9
>> Code: Unable to access opcode bytes at RIP 0x446acf.
>> RSP: 002b:005dfe48 EFLAGS: 0246 ORIG_RAX: 00e7
>> RAX: ffda RBX: 004ce450 RCX: 00446af9
>> RDX: 003c RSI: 00e7 RDI: 0001
>> RBP: 0001 R08: ffbc R09: 
>> R10:  R11: 0246 R12: 004ce450
>> R13: 0001 R14:  R15: 0001
> 
> [+Cc Jens + io_uring]
> 
> Hm, this reproducer uses io_uring and it's the io_uring_enter() that
> triggers this reliably. With this reproducer I've managed to reproduce
> the issue on v5.12-rc4, and v5.12-rc3, v5.12-rc2 and v5.12-rc1.
> It's not reproducible at
> 9820b4dca0f9c6b7ab8b4307286cdace171b724d
> which is the commit immediately before the first v5.12 io_uring merge.
> It's first reproducible with the first io_uring merge for v5.12, i.e.
> 5bbb336ba75d95611a7b9456355b48705016bdb1

Thanks, that's good info. I'll take a look at it and see if I can
reproduce.

-- 
Jens Axboe

Re: [block] 4c95131e3c: kernel_BUG_at_block/bio.c

2021-03-31 Thread Jens Axboe

On 3/31/21 2:13 AM, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 4c95131e3cb8bbb08c3b370b4ae3611874b48be0 ("block: enable use of bio 
> allocation cache")
> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git 
> io_uring-bio-cache
> 
> 
> in testcase: ocfs2test
> version: ocfs2test-x86_64-d802bf7-1_20210329
> with following parameters:
> 
>   disk: 1HDD
>   test: test-reserve_space
>   ucode: 0x21
> 
> 
> 
> on test machine: 8 threads Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G 
> memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 
> [   47.859997] kernel BUG at block/bio.c:628!
> [   47.860086] invalid opcode:  [#1] SMP PTI
> [   47.860167] CPU: 5 PID: 5042 Comm: mkfs.ocfs2 Not tainted 
> 5.12.0-rc5-00060-g4c95131e3cb8 #1
> [   47.860317] Hardware name:  /DZ77BH-55K, BIOS 
> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> [   47.860445] RIP: 0010:__bio_put (kbuild/src/consumer/block/bio.c:628 
> kbuild/src/consumer/block/bio.c:623) 
> [ 47.860514] Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f6 47 14 40 
> 74 0f 8b 47 64 85 c0 74 0e f0 ff 4f 64 0f 94 c0 c3 b8 01 00 00 00 c3 <0f> 0b 
> 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00
> All code
> 
>0: 66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>7: 00 00 00 
>a: 0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
>f: f6 47 14 40 testb  $0x40,0x14(%rdi)
>   13: 74 0f   je 0x24
>   15: 8b 47 64mov0x64(%rdi),%eax
>   18: 85 c0   test   %eax,%eax
>   1a: 74 0e   je 0x2a
>   1c: f0 ff 4f 64 lock decl 0x64(%rdi)
>   20: 0f 94 c0sete   %al
>   23: c3  retq   
>   24: b8 01 00 00 00  mov$0x1,%eax
>   29: c3  retq   
>   2a:*0f 0b   ud2 <-- trapping instruction
>   2c: 66 66 2e 0f 1f 84 00data16 nopw %cs:0x0(%rax,%rax,1)
>   33: 00 00 00 00 
>   37: 66  data16
>   38: 66  data16
>   39: 2e  cs
>   3a: 0f  .byte 0xf
>   3b: 1f  (bad)  
>   3c: 84 00   test   %al,(%rax)
>   ...
> 
> Code starting with the faulting instruction
> ===
>0: 0f 0b   ud2
>2: 66 66 2e 0f 1f 84 00data16 nopw %cs:0x0(%rax,%rax,1)
>9: 00 00 00 00 
>d: 66  data16
>e: 66  data16
>f: 2e  cs
>   10: 0f  .byte 0xf
>   11: 1f  (bad)  
>   12: 84 00   test   %al,(%rax)
>   ...
> [   47.860801] RSP: 0018:c9000d42fc88 EFLAGS: 00010246
> [   47.860885] RAX:  RBX: c9000d42fe80 RCX: 
> 0002
> [   47.860998] RDX: 88841a9a4f00 RSI: 88841a9a4f00 RDI: 
> 88841b721318
> [   47.861109] RBP: 88841b721318 R08: 88841b721318 R09: 
> 0002ab30
> [   47.861221] R10: 02c8 R11: 88841f36a144 R12: 
> 88841b721318
> [   47.861352] R13: 81395875 R14:  R15: 
> 88841a9a4f00
> [   47.861464] FS:  7f5c66517280() GS:88841f34() 
> knlGS:
> [   47.861590] CS:  0010 DS:  ES:  CR0: 80050033
> [   47.861681] CR2: 55d13280c000 CR3: 000414e82006 CR4: 
> 001706e0
> [   47.861793] Call Trace:
> [   47.861836] bio_put (kbuild/src/consumer/block/bio.c:646) 
> [   47.861888] __blkdev_direct_IO (kbuild/src/consumer/fs/block_dev.c:508) 
> [   47.861959] generic_file_direct_write 
> (kbuild/src/consumer/mm/filemap.c:3473) 
> [   47.862037] __generic_file_write_iter 
> (kbuild/src/consumer/mm/filemap.c:3661) 
> [   47.862113] blkdev_write_iter (kbuild/src/consumer/fs/block_dev.c:1719 
> kbuild/src/consumer/fs/block_dev.c:1693) 
> [   47.862181] new_sync_write (kbuild/src/consumer/fs/read_write.c:519 
> (discriminator 1)) 
> [   47.862265] vfs_write (kbuild/src/consumer/fs/read_write.c:605) 
> [   47.862322] ksys_pwrite64 (kbuild/src/consumer/include/linux/file.h:45 
> kbuild/src/consumer/fs/read_write.c:713) 
> [   47.862381] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46) 
> [   47.862443] entry_SYSCALL_64_after_hwframe 
> (kbuild/src/consumer/arch/x86/entry/entry_64.S:11

Re: [PATCH v1] ata: ahci: Disable SXS for Hisilicon Kunpeng920

2021-03-31 Thread Jens Axboe

On 3/15/21 5:29 AM, luojiaxing wrote:
> 
> On 2021/3/12 22:27, Jens Axboe wrote:
>> Is this controller arm exclusive?
> 
> 
> Yes, our SoC is based on ARM64 only.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH] sata_mv: add IRQ checks

2021-03-30 Thread Jens Axboe

On 3/27/21 3:13 PM, Sergey Shtylyov wrote:
> The function mv_platform_probe() neglects to check the results of the
> calls to platform_get_irq() and irq_of_parse_and_map() and blithely
> passes them to ata_host_activate() -- while the latter only checks
> for IRQ0 (treating it as a polling mode indication) and passes the
> negative values to devm_request_irq() causing it to fail as it takes
> unsigned values for the IRQ #...
> 
> Add to mv_platform_probe() the proper IRQ checks to pass the positive IRQ
> #s to ata_host_activate(), propagate upstream the negative error codes,
> and override the IRQ0 with -EINVAL (as we don't want the polling mode).

Applied, thanks.
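
For reference, the check described in the changelog boils down to
something like the sketch below. check_probe_irq() is a hypothetical
helper written for this note, not a function from sata_mv or libata. It
only models the three cases: negative value, IRQ0, and a valid positive
IRQ number.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper modelling the check described above: returns 0 when
 * irq is usable, or a negative errno that the probe function should return.
 */
int check_probe_irq(int irq)
{
	if (irq < 0)
		return irq;		/* propagate the lookup error as-is */
	if (irq == 0)
		return -EINVAL;		/* IRQ0 would select polling mode */
	assert(irq > 0);
	return 0;
}
```

A probe function would call this right after obtaining the IRQ and bail
out with the returned error before ever reaching ata_host_activate().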

-- 
Jens Axboe

Re: [PATCH 00/15] [Set 2] Rid W=1 warnings from ATA

2021-03-30 Thread Jens Axboe

On 3/18/21 2:51 AM, Lee Jones wrote:
> This set is part of a larger effort attempting to clean-up W=1
> kernel builds, which are currently overwhelmingly riddled with
> niggly little warnings.
> 
> This is set 2 out of 2 sets required.

Applied, thanks.

-- 
Jens Axboe

Re: [syzbot] KASAN: use-after-free Read in create_worker_cb

2021-03-29 Thread Jens Axboe

:22 [inline]
>  do_exit+0x299/0x2a60 kernel/exit.c:780
>  do_group_exit+0x125/0x310 kernel/exit.c:922
>  __do_sys_exit_group kernel/exit.c:933 [inline]
>  __se_sys_exit_group kernel/exit.c:931 [inline]
>  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> The buggy address belongs to the object at 88801bf15000
>  which belongs to the cache kmalloc-1k of size 1024
> The buggy address is located 232 bytes inside of
>  1024-byte region [88801bf15000, 88801bf15400)
> The buggy address belongs to the page:
> page:ea6fc400 refcount:1 mapcount:0 mapping: 
> index:0x88801bf15800 pfn:0x1bf10
> head:ea6fc400 order:3 compound_mapcount:0 compound_pincount:0
> flags: 0xfff0010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff0010200 ea4cd808 888010840888 888010841dc0
> raw: 88801bf15800 001d 0001 
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  88801bf14f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88801bf15000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> 88801bf15080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ^
>  88801bf15100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  88801bf15180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==

#syz test: git://git.kernel.dk/linux-block for-5.13/io_uring

-- 
Jens Axboe

Re: [PATCH -next 1/2] mtip32xx: use DEFINE_SPINLOCK() for spinlock

2021-03-29 Thread Jens Axboe

On 3/29/21 3:53 AM, Shixin Liu wrote:
> spinlock can be initialized automatically with DEFINE_SPINLOCK()
> rather than explicitly calling spin_lock_init().

Applied both, thanks.

-- 
Jens Axboe

Re: [syzbot] WARNING: still has locks held in io_sq_thread

2021-03-29 Thread Jens Axboe

On 3/29/21 7:29 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering 
> an issue:
> WARNING in kvm_wait
> 
> [ cut here ]
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 1 PID: 5134 at kernel/locking/irqflag-debug.c:10 
> warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> Modules linked in:
> CPU: 1 PID: 5134 Comm: syz-executor.2 Not tainted 5.12.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> Code: bf ff cc cc cc cc cc cc cc cc cc cc cc 80 3d 65 c2 0f 04 00 74 01 c3 48 
> c7 c7 a0 7b 6b 89 c6 05 54 c2 0f 04 01 e8 65 19 bf ff <0f> 0b c3 48 39 77 10 
> 0f 84 97 00 00 00 66 f7 47 22 f0 ff 74 4b 48
> RSP: 0018:c90002f5f9c0 EFLAGS: 00010286
> RAX:  RBX: 888023a7d040 RCX: 
> RDX: 88801bbcc2c0 RSI: 815b7375 RDI: f520005ebf2a
> RBP: 0200 R08:  R09: 
> R10: 815b00de R11:  R12: 0003
> R13: ed100474fa08 R14: 0001 R15: 8880b9f36000
> FS:  0293e400() GS:8880b9f0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7ffd20e04f88 CR3: 116b8000 CR4: 001506e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  kvm_wait arch/x86/kernel/kvm.c:860 [inline]
>  kvm_wait+0xc9/0xe0 arch/x86/kernel/kvm.c:837
>  pv_wait arch/x86/include/asm/paravirt.h:564 [inline]
>  pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
>  __pv_queued_spin_lock_slowpath+0x8b8/0xb40 kernel/locking/qspinlock.c:508
>  pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:554 [inline]
>  queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
>  queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
>  do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:113
>  spin_lock include/linux/spinlock.h:354 [inline]
>  ext4_lock_group fs/ext4/ext4.h:3383 [inline]
>  __ext4_new_inode+0x384f/0x5570 fs/ext4/ialloc.c:1188
>  ext4_symlink+0x489/0xd50 fs/ext4/namei.c:3347
>  vfs_symlink fs/namei.c:4176 [inline]
>  vfs_symlink+0x10f/0x270 fs/namei.c:4161
>  do_symlinkat+0x27a/0x300 fs/namei.c:4206
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xae

Same one that keeps happening, it's not related.

-- 
Jens Axboe

Re: [syzbot] WARNING: still has locks held in io_sq_thread

2021-03-29 Thread Jens Axboe

On 3/29/21 1:34 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:81b1d39f Merge tag '5.12-rc4-smb3' of git://git.samba.org/..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10fcce62d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d4e9addca54f3b44
> dashboard link: https://syzkaller.appspot.com/bug?extid=796d767eb376810256f5
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d06ddcd0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=150764bed0

#syz test: git://git.kernel.dk/linux-block io_uring-5.12

-- 
Jens Axboe

Re: [PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-27 Thread Jens Axboe

On 3/27/21 11:40 AM, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> On 3/26/21 4:38 PM, Jens Axboe wrote:
>>> OK good point, and follows the same logic even if it won't make a
>>> difference in my case. I'll make the change.
>>
>> Made the suggested edits and ran the quick tests and the KILL/STOP
>> testing, and no ill effects observed. Kicked off the longer runs now.
>>
>> Not a huge amount of changes from the posted series, but please peruse
>> here if you want to double check:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-5.12
>>
>> And diff against v2 posted is below. Thanks!
> 
> That looks good.  Thanks.
> 
> Acked-by: "Eric W. Biederman" 

Thanks Eric, amended to add that.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-27 Thread Jens Axboe

On 3/26/21 7:46 PM, Stefan Metzmacher wrote:
> 
> Hi Jens,
> 
>> root@ub1704-166:~# LANG=C gdb --pid 1320
>> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.
>> Type "show copying" and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> Type "show configuration" for configuration details.
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>.
>> Find the GDB manual and other documentation resources online at:
>> <http://www.gnu.org/software/gdb/documentation/>.
>>
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word".
>> Attaching to process 1320
>> [New LWP 1321]
>> [New LWP 1322]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported 
>> target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>> 38  ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or 
>> directory.
>> (gdb)
> 
> Ok, the following makes gdb happy again:
> 
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long 
> sp, unsigned long arg,
> /* Kernel thread ? */
> if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> memset(childregs, 0, sizeof(struct pt_regs));
> +   if (p->flags & PF_IO_WORKER)
> +   childregs->cs = current_pt_regs()->cs;
> kthread_frame_init(frame, sp, arg);
> return 0;
> }

Confirmed, it stops complaining about the arch at that point.

> I'm wondering if we should decouple the PF_KTHREAD and PF_IO_WORKER
> cases even more and keep as much of a userspace-like copy_thread as
> possible.

Probably makes sense, the only thing they really share is the func+arg
setup. Hence PF_IO_WORKER threads likely just use the rest of the init,
where it doesn't conflict with the frame setup.

-- 
Jens Axboe

Re: [PATCH] livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure

2021-03-26 Thread Jens Axboe

On 3/26/21 8:30 AM, Miroslav Benes wrote:
> Livepatch sends a fake signal to all remaining blocking tasks of a
> running transition after a set period of time. It uses TIF_SIGPENDING
> flag for the purpose. Commit 12db8b690010 ("entry: Add support for
> TIF_NOTIFY_SIGNAL") added a generic infrastructure to achieve the same.
> Replace our bespoke solution with the generic one.
> 
> Signed-off-by: Miroslav Benes 

Looks good to me.

Reviewed-by: Jens Axboe 

-- 
Jens Axboe

Re: [PATCH] io_uring: remove unsued assignment to pointer io

2021-03-26 Thread Jens Axboe

On 3/26/21 1:52 PM, Colin King wrote:
> From: Colin Ian King 
> 
> There is an assignment to io that is never read after the assignment,
> the assignment is redundant and can be removed.

Thanks, applied.

-- 
Jens Axboe

Re: [PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-26 Thread Jens Axboe

On 3/26/21 4:38 PM, Jens Axboe wrote:
> On 3/26/21 4:35 PM, Eric W. Biederman wrote:
>> Jens Axboe  writes:
>>
>>> On 3/26/21 4:23 PM, Eric W. Biederman wrote:
>>>> Jens Axboe  writes:
>>>>
>>>>> On 3/26/21 2:29 PM, Eric W. Biederman wrote:
>>>>>> Jens Axboe  writes:
>>>>>>
>>>>>>> We go through various hoops to disallow signals for the IO threads, but
>>>>>>> there's really no reason why we cannot just allow them. The IO threads
>>>>>>> never return to userspace like a normal thread, and hence don't go 
>>>>>>> through
>>>>>>> normal signal processing. Instead, just check for a pending signal as 
>>>>>>> part
>>>>>>> of the work loop, and call get_signal() to handle it for us if anything
>>>>>>> is pending.
>>>>>>>
>>>>>>> With that, we can support receiving signals, including special ones like
>>>>>>> SIGSTOP.
>>>>>>>
>>>>>>> Signed-off-by: Jens Axboe 
>>>>>>> ---
>>>>>>>  fs/io-wq.c| 24 +---
>>>>>>>  fs/io_uring.c | 12 
>>>>>>>  2 files changed, 25 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>>>>>> index b7c1fa932cb3..3e2f059a1737 100644
>>>>>>> --- a/fs/io-wq.c
>>>>>>> +++ b/fs/io-wq.c
>>>>>>> @@ -16,7 +16,6 @@
>>>>>>>  #include 
>>>>>>>  #include 
>>>>>>>  #include 
>>>>>>> -#include 
>>>>>>>  
>>>>>>>  #include "../kernel/sched/sched.h"
>>>>>>>  #include "io-wq.h"
>>>>>>> @@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
>>>>>>> if (io_flush_signals())
>>>>>>> continue;
>>>>>>> ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
>>>>>>> -   if (try_to_freeze() || ret)
>>>>>>> +   if (signal_pending(current)) {
>>>>>>> +   struct ksignal ksig;
>>>>>>> +
>>>>>>> +   if (fatal_signal_pending(current))
>>>>>>> +   break;
>>>>>>> +   if (get_signal(&ksig))
>>>>>>> +   continue;
>>>>>> ^^
>>>>>>
>>>>>> That is wrong.  You are promising to deliver a signal to signal
>>>>>> handler and then simply discarding it.  Perhaps:
>>>>>>
>>>>>>  if (!get_signal(&ksig))
>>>>>>  continue;
>>>>>>  WARN_ON(!sig_kernel_stop(ksig->sig));
>>>>>> break;
>>>>>
>>>>> Thanks, updated.
>>>>
>>>> Gah.  Kill the WARN_ON.
>>>>
>>>> I was thinking "WARN_ON(!sig_kernel_fatal(ksig->sig));"
>>>> The function sig_kernel_fatal does not exist.
>>>>
>>>> Fatal is the state that is left when a signal is neither
>>>> ignored nor a stop signal, and does not have a handler.
>>>>
>>>> The rest of the logic still works.
>>>
>>> I've just come to the same conclusion myself after testing it.
>>> Of the 3 cases, most of them can do the continue, but doesn't
>>> really matter with the way the loop is structured. Anyway, looks
>>> like this now:
>>
>> This idiom in the code:
>>> +   if (signal_pending(current)) {
>>> +   struct ksignal ksig;
>>> +
>>> +   if (fatal_signal_pending(current))
>>> +   break;
>>> +   if (!get_signal(&ksig))
>>> +   continue;
>>>  }
>>
>> Needs to be:
>>> +   if (signal_pending(current)) {
>>> +   struct ksignal ksig;
>>> +
>>> +   if (!get_signal(&ksig))
>>> +   continue;
>>> +

Re: [PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-26 Thread Jens Axboe

On 3/26/21 4:35 PM, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> On 3/26/21 4:23 PM, Eric W. Biederman wrote:
>>> Jens Axboe  writes:
>>>
>>>> On 3/26/21 2:29 PM, Eric W. Biederman wrote:
>>>>> Jens Axboe  writes:
>>>>>
>>>>>> We go through various hoops to disallow signals for the IO threads, but
>>>>>> there's really no reason why we cannot just allow them. The IO threads
>>>>>> never return to userspace like a normal thread, and hence don't go 
>>>>>> through
>>>>>> normal signal processing. Instead, just check for a pending signal as 
>>>>>> part
>>>>>> of the work loop, and call get_signal() to handle it for us if anything
>>>>>> is pending.
>>>>>>
>>>>>> With that, we can support receiving signals, including special ones like
>>>>>> SIGSTOP.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe 
>>>>>> ---
>>>>>>  fs/io-wq.c| 24 +---
>>>>>>  fs/io_uring.c | 12 
>>>>>>  2 files changed, 25 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>>>>> index b7c1fa932cb3..3e2f059a1737 100644
>>>>>> --- a/fs/io-wq.c
>>>>>> +++ b/fs/io-wq.c
>>>>>> @@ -16,7 +16,6 @@
>>>>>>  #include 
>>>>>>  #include 
>>>>>>  #include 
>>>>>> -#include 
>>>>>>  
>>>>>>  #include "../kernel/sched/sched.h"
>>>>>>  #include "io-wq.h"
>>>>>> @@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
>>>>>>  if (io_flush_signals())
>>>>>>  continue;
>>>>>>  ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
>>>>>> -if (try_to_freeze() || ret)
>>>>>> +if (signal_pending(current)) {
>>>>>> +struct ksignal ksig;
>>>>>> +
>>>>>> +if (fatal_signal_pending(current))
>>>>>> +break;
>>>>>> +if (get_signal(&ksig))
>>>>>> +continue;
>>>>> ^^
>>>>>
>>>>> That is wrong.  You are promising to deliver a signal to signal
>>>>> handler and then simply discarding it.  Perhaps:
>>>>>
>>>>>   if (!get_signal(&ksig))
>>>>>   continue;
>>>>>   WARN_ON(!sig_kernel_stop(ksig->sig));
>>>>> break;
>>>>
>>>> Thanks, updated.
>>>
>>> Gah.  Kill the WARN_ON.
>>>
>>> I was thinking "WARN_ON(!sig_kernel_fatal(ksig->sig));"
>>> The function sig_kernel_fatal does not exist.
>>>
>>> Fatal is the state that is left when a signal is neither
>>> ignored nor a stop signal, and does not have a handler.
>>>
>>> The rest of the logic still works.
>>
>> I've just come to the same conclusion myself after testing it.
>> Of the 3 cases, most of them can do the continue, but doesn't
>> really matter with the way the loop is structured. Anyway, looks
>> like this now:
> 
> This idiom in the code:
>> +if (signal_pending(current)) {
>> +struct ksignal ksig;
>> +
>> +if (fatal_signal_pending(current))
>> +break;
>> +if (!get_signal(&ksig))
>> +continue;
>>  }
> 
> Needs to be:
>> +if (signal_pending(current)) {
>> +struct ksignal ksig;
>> +
>> +if (!get_signal(&ksig))
>> +continue;
>> +break;
>>  }
> 
> Because any signal returned from get_signal is fatal in this case.
> It might make sense to "WARN_ON(ksig->ka.sa.sa_handler != SIG_DFL)".
> As the io workers don't handle that case.
> 
> It won't happen because you have everything blocked.
>
> The extra fatal_signal_pending(current) logic is just confusing in this
> case.

OK good point, and follows the same logic even if it won't make a
difference in my case. I'll make the change.
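
To make the control flow concrete, here is a userspace model of the loop
shape agreed on above. It is a sketch under stated assumptions, not the
io-wq code: model_signal_pending() and model_get_signal() are stand-ins
for signal_pending(current) and get_signal(&ksig), and the scripted
event array, struct model and run_worker() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scripted events standing in for what a worker sees on each wakeup. */
enum event { EV_NONE, EV_HANDLED, EV_FATAL };

struct model {
	const enum event *events;
	size_t nr, pos;
};

/* Stand-in for signal_pending(current). */
static bool model_signal_pending(const struct model *m)
{
	return m->events[m->pos] != EV_NONE;
}

/*
 * Stand-in for get_signal(): returns true only when the signal was not
 * dealt with internally, which for these workers means it is fatal.
 */
static bool model_get_signal(const struct model *m)
{
	return m->events[m->pos] == EV_FATAL;
}

/* Returns the number of wakeups processed before the worker loop exits. */
size_t run_worker(struct model *m)
{
	size_t wakeups = 0;

	assert(m != NULL && m->events != NULL);
	for (; m->pos < m->nr; m->pos++) {
		wakeups++;
		if (model_signal_pending(m)) {
			if (!model_get_signal(m))
				continue;	/* stop or ignored signal: keep going */
			break;			/* anything returned is fatal */
		}
		/* ... normal work would happen here ... */
	}
	return wakeups;
}
```

Note there is no separate fatal_signal_pending() test in the model: the
return value of the get_signal() stand-in already decides between
continue and break.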

-- 
Jens Axboe

Re: [PATCH 1/1] scripts/spelling.txt: add entries for recent discoveries

2021-03-26 Thread Jens Axboe

On 3/26/21 1:22 PM, Tom Saeger wrote:
> @@ -1153,6 +1170,7 @@ quering||querying
>  queus||queues
>  randomally||randomly
>  raoming||roaming
> +readded||read
>  reasearcher||researcher
>  reasearchers||researchers
>  reasearch||research

davej brought up a good point that this one was actually re-added, which
does make sense. So don't think that one should be added to the list.

-- 
Jens Axboe

Re: [PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-26 Thread Jens Axboe

On 3/26/21 4:23 PM, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> On 3/26/21 2:29 PM, Eric W. Biederman wrote:
>>> Jens Axboe  writes:
>>>
>>>> We go through various hoops to disallow signals for the IO threads, but
>>>> there's really no reason why we cannot just allow them. The IO threads
>>>> never return to userspace like a normal thread, and hence don't go through
>>>> normal signal processing. Instead, just check for a pending signal as part
>>>> of the work loop, and call get_signal() to handle it for us if anything
>>>> is pending.
>>>>
>>>> With that, we can support receiving signals, including special ones like
>>>> SIGSTOP.
>>>>
>>>> Signed-off-by: Jens Axboe 
>>>> ---
>>>>  fs/io-wq.c| 24 +---
>>>>  fs/io_uring.c | 12 
>>>>  2 files changed, 25 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>>> index b7c1fa932cb3..3e2f059a1737 100644
>>>> --- a/fs/io-wq.c
>>>> +++ b/fs/io-wq.c
>>>> @@ -16,7 +16,6 @@
>>>>  #include 
>>>>  #include 
>>>>  #include 
>>>> -#include 
>>>>  
>>>>  #include "../kernel/sched/sched.h"
>>>>  #include "io-wq.h"
>>>> @@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
>>>>if (io_flush_signals())
>>>>continue;
>>>>ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
>>>> -  if (try_to_freeze() || ret)
>>>> +  if (signal_pending(current)) {
>>>> +  struct ksignal ksig;
>>>> +
>>>> +  if (fatal_signal_pending(current))
>>>> +  break;
>>>> +  if (get_signal(&ksig))
>>>> +  continue;
>>> ^^
>>>
>>> That is wrong.  You are promising to deliver a signal to signal
>>> handler and then simply discarding it.  Perhaps:
>>>
>>> if (!get_signal(&ksig))
>>> continue;
>>> WARN_ON(!sig_kernel_stop(ksig->sig));
>>>     break;
>>
>> Thanks, updated.
> 
> Gah.  Kill the WARN_ON.
> 
> I was thinking "WARN_ON(!sig_kernel_fatal(ksig->sig));"
> The function sig_kernel_fatal does not exist.
> 
> Fatal is the state that is left when a signal is neither
> ignored nor a stop signal, and does not have a handler.
> 
> The rest of the logic still works.

I've just come to the same conclusion myself after testing it.
Of the 3 cases, most of them can do the continue, but doesn't
really matter with the way the loop is structured. Anyway, looks
like this now:


commit 769186e30cd437f5e1a000e7cf00286948779da4
Author: Jens Axboe 
Date:   Thu Mar 25 18:16:06 2021 -0600

io_uring: handle signals for IO threads like a normal thread

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.

Signed-off-by: Jens Axboe 

diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..7e5970c8b0be 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "../kernel/sched/sched.h"
 #include "io-wq.h"
@@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
if (io_flush_signals())
continue;
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
-   if (try_to_freeze() || ret)
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   break;
+   if (!get_signal(&ksig))
+   continue;
+   }
+   if (ret)
continue;
-   if (fatal_signal_pending(current))
-   break;
/* timed out, exit unless we're the fixed worker */
if

Re: [PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-26 Thread Jens Axboe

On 3/26/21 2:29 PM, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> We go through various hoops to disallow signals for the IO threads, but
>> there's really no reason why we cannot just allow them. The IO threads
>> never return to userspace like a normal thread, and hence don't go through
>> normal signal processing. Instead, just check for a pending signal as part
>> of the work loop, and call get_signal() to handle it for us if anything
>> is pending.
>>
>> With that, we can support receiving signals, including special ones like
>> SIGSTOP.
>>
>> Signed-off-by: Jens Axboe 
>> ---
>>  fs/io-wq.c| 24 +---
>>  fs/io_uring.c | 12 
>>  2 files changed, 25 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index b7c1fa932cb3..3e2f059a1737 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -16,7 +16,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  
>>  #include "../kernel/sched/sched.h"
>>  #include "io-wq.h"
>> @@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
>>  if (io_flush_signals())
>>  continue;
>>  ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
>> -if (try_to_freeze() || ret)
>> +if (signal_pending(current)) {
>> +struct ksignal ksig;
>> +
>> +if (fatal_signal_pending(current))
>> +break;
>> +if (get_signal(&ksig))
>> +continue;
> ^^
> 
> That is wrong.  You are promising to deliver a signal to signal
> handler and then simply discarding it.  Perhaps:
> 
>   if (!get_signal(&ksig))
>   continue;
>   WARN_ON(!sig_kernel_stop(ksig->sig));
> break;

Thanks, updated.

-- 
Jens Axboe

Re: [PATCH 1/7] kernel: don't call do_exit() for PF_IO_WORKER threads

2021-03-26 Thread Jens Axboe

On 3/26/21 2:43 PM, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> Right now we're never calling get_signal() from PF_IO_WORKER threads, but
>> in preparation for doing so, don't handle a fatal signal for them. The
>> workers have state they need to cleanup when exiting, and they don't do
>> coredumps, so just return instead of performing either a dump or calling
>> do_exit() on their behalf. The threads themselves will detect a fatal
>> signal and do proper shutdown.
>>
>> Signed-off-by: Jens Axboe 
>> ---
>>  kernel/signal.c | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index f2a1b898da29..e3e1b8fbfe8a 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2756,6 +2756,15 @@ bool get_signal(struct ksignal *ksig)
>>   */
>>  current->flags |= PF_SIGNALED;
>>  
>> +/*
>> + * PF_IO_WORKER threads will catch and exit on fatal signals
>> + * themselves. They have cleanup that must be performed, so
>> + * we cannot call do_exit() on their behalf. coredumps also
>> + * do not apply to them.
>> + */
>> +if (current->flags & PF_IO_WORKER)
>> +return false;
>> +
> 
> Returning false when get_signal needs the caller to handle a signal
> adds a very weird and awkward special case to how get_signal returns
> arguments.
> 
> Instead you should simply break and let get_signal return SIGKILL like
> any other signal that has a handler that the caller of get_signal needs
> to handle.
> 
> Something like:
>> +/*
>> + * PF_IO_WORKER have cleanup that must be performed,
>> + * before calling do_exit().
>> + */
>> +if (current->flags & PF_IO_WORKER)
>> +break;
> 
> 
> As do_coredump does not call do_exit there is no reason to skip calling into
> the coredump handling either.   And allowing it will remove yet another
> special case from the io worker code.

Thanks, I'll turn it into a break, that does seem like a better idea in
general. Actually it wants to be a goto or similar, as a break will
assume that we have the sighand lock held. With the coredump being
irrelevant, I'll just put it before the do_exit() call.

-- 
Jens Axboe

Re: [PATCH 2/8] kernel: unmask SIGSTOP for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 12:01 PM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:29 schrieb Jens Axboe:
>> On 3/26/21 9:23 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 16:01 schrieb Jens Axboe:
>>>> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>>>>> Jens, sorry, I got lost :/
>>>>
>>>> Let's bring you back in :-)
>>>>
>>>>> On 03/25, Jens Axboe wrote:
>>>>>>
>>>>>> With IO threads accepting signals, including SIGSTOP,
>>>>>
>>>>> where can I find this change? Looks like I wasn't cc'ed...
>>>>
>>>> It's this very series.
>>>>
>>>>>> unmask the
>>>>>> SIGSTOP signal from the default blocked mask.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe 
>>>>>> ---
>>>>>>  kernel/fork.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>> index d3171e8e88e5..d5a40552910f 100644
>>>>>> --- a/kernel/fork.c
>>>>>> +++ b/kernel/fork.c
>>>>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int 
>>>>>> (*fn)(void *), void *arg, int node)
>>>>>>  tsk = copy_process(NULL, 0, node, &args);
>>>>>>  if (!IS_ERR(tsk)) {
>>>>>>  sigfillset(&tsk->blocked);
>>>>>> -sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>>>>> +sigdelsetmask(&tsk->blocked, 
>>>>>> sigmask(SIGKILL)|sigmask(SIGSTOP));
>>>>>
>>>>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is 
>>>>> minor.
>>>>
>>>> Ah thanks.
>>>>
>>>>> To remind, either way this is racy and can't really help.
>>>>>
>>>>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>>>>> I must have missed something.
>>>>
>>>> I do think the above is a no-op at this point, and we can probably just
>>>> kill it. Let me double check, hopefully we can just remove this blocked
>>>> part.
>>>
>>> Is this really correct to drop in your "kernel: stop masking signals in 
>>> create_io_thread()"
>>> commit?
>>>
>>> I don't assume signals wanted by userspace should potentially be handled in an 
>>> io_thread...
>>> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?
>>
>> I guess we do actually need it, if we're not fiddling with
>> wants_signal() for them. To quell Oleg's concerns, we can just move it
>> to post dup_task_struct(), that should eliminate any race concerns
>> there.
> 
> If that one is racy, don't we better also want this one?
> https://lore.kernel.org/io-uring/438b738c1e4827a7fdfe43087da88bbe17eedc72.1616197787.git.me...@samba.org/T/#u
> 
> And clear tsk->pf_io_worker ?

Definitely prudent. I'll get round 2 queued up shortly.

-- 
Jens Axboe

Re: [PATCH] tomoyo: don't special case PF_IO_WORKER for PF_KTHREAD

2021-03-26 Thread Jens Axboe

On 3/26/21 10:03 AM, Casey Schaufler wrote:
> On 3/25/2021 5:44 PM, Jens Axboe wrote:
>> The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no
>> need to special case them for credential checks.
> 
> Could you cite the commit where that change was made?

See previous reply, same one:

commit 3bfe6106693b6b4ba175ad1f929c4660b8f59ca8
Author: Jens Axboe 
Date:   Tue Feb 16 14:15:30 2021 -0700

io-wq: fork worker threads from original task

-- 
Jens Axboe

Re: [PATCH] Revert "Smack: Handle io_uring kernel thread privileges"

2021-03-26 Thread Jens Axboe

On 3/26/21 10:00 AM, Casey Schaufler wrote:
> On 3/25/2021 5:42 PM, Jens Axboe wrote:
>> This reverts commit 942cb357ae7d9249088e3687ee6a00ed2745a0c7.
>>
>> The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no
>> need to special case them for credential checks.
> 
> Could you cite the commit making that change?
> I wouldn't want to see this change back-ported to a kernel
> that doesn't have that change as well.

This is strictly 5.12+. The change came about from:

commit 3bfe6106693b6b4ba175ad1f929c4660b8f59ca8
Author: Jens Axboe 
Date:   Tue Feb 16 14:15:30 2021 -0700

io-wq: fork worker threads from original task

So don't backport it.

-- 
Jens Axboe

Re: [PATCHSET v2 0/7] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 9:51 AM, Jens Axboe wrote:
> Hi,
> 
> For the v1 posting, see here:

Sigh, just ignore the last 4 patches (07...10/10) in this series,
they are sitting on top of this series and I messed up the git send-email.
This patch series ends in the 4 reverts.

-- 
Jens Axboe

[PATCH 08/10] io_uring: do post-completion chore on t-out cancel

2021-03-26 Thread Jens Axboe

From: Pavel Begunkov 

Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.

Signed-off-by: Pavel Begunkov 
Link: 
https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.sile...@gmail.com
Signed-off-by: Jens Axboe 
---
 fs/io_uring.c | 42 ++
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4d0cb2548a67..69896ae204d6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1262,26 +1262,6 @@ static void io_kill_timeout(struct io_kiocb *req, int 
status)
}
 }
 
-/*
- * Returns true if we found and killed one or more timeouts
- */
-static bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
-struct files_struct *files)
-{
-   struct io_kiocb *req, *tmp;
-   int canceled = 0;
-
-   spin_lock_irq(&ctx->completion_lock);
-   list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
-   if (io_match_task(req, tsk, files)) {
-   io_kill_timeout(req, -ECANCELED);
-   canceled++;
-   }
-   }
-   spin_unlock_irq(&ctx->completion_lock);
-   return canceled != 0;
-}
-
 static void __io_queue_deferred(struct io_ring_ctx *ctx)
 {
do {
@@ -8612,6 +8592,28 @@ static void io_ring_exit_work(struct work_struct *work)
io_ring_ctx_free(ctx);
 }
 
+/* Returns true if we found and killed one or more timeouts */
+static bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
+struct files_struct *files)
+{
+   struct io_kiocb *req, *tmp;
+   int canceled = 0;
+
+   spin_lock_irq(&ctx->completion_lock);
+   list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
+   if (io_match_task(req, tsk, files)) {
+   io_kill_timeout(req, -ECANCELED);
+   canceled++;
+   }
+   }
+   io_commit_cqring(ctx);
+   spin_unlock_irq(&ctx->completion_lock);
+
+   if (canceled != 0)
+   io_cqring_ev_posted(ctx);
+   return canceled != 0;
+}
+
 static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 {
unsigned long index;
-- 
2.31.0

[PATCH 10/10] io_uring: don't cancel extra on files match

2021-03-26 Thread Jens Axboe

From: Pavel Begunkov 

As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.

Signed-off-by: Pavel Begunkov 
Link: 
https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.sile...@gmail.com
Signed-off-by: Jens Axboe 
---
 fs/io_uring.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4189e1b684e1..66ae46874d85 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1094,8 +1094,6 @@ static bool io_match_task(struct io_kiocb *head,
io_for_each_link(req, head) {
if (req->flags & REQ_F_INFLIGHT)
return true;
-   if (req->task->files == files)
-   return true;
}
return false;
 }
-- 
2.31.0

[PATCH 09/10] io_uring: don't cancel-track common timeouts

2021-03-26 Thread Jens Axboe

From: Pavel Begunkov 

Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d595cc1 ("io_uring: reliably cancel linked
timeouts").

Signed-off-by: Pavel Begunkov 
Link: 
https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.sile...@gmail.com
Signed-off-by: Jens Axboe 
---
 fs/io_uring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 69896ae204d6..4189e1b684e1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5563,7 +5563,8 @@ static int io_timeout_prep(struct io_kiocb *req, const 
struct io_uring_sqe *sqe,
 
data->mode = io_translate_timeout_mode(flags);
hrtimer_init(&data->timer, CLOCK_MONOTONIC, data->mode);
-   io_req_track_inflight(req);
+   if (is_timeout_link)
+   io_req_track_inflight(req);
return 0;
 }
 
-- 
2.31.0

[PATCH 06/10] Revert "signal: don't allow STOP on PF_IO_WORKER threads"

2021-03-26 Thread Jens Axboe

This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597.

The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 76d85830d4fa..5b75fbe3d2d6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, 
unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
 
-   if (unlikely(fatal_signal_pending(task) ||
-(task->flags & (PF_EXITING | PF_IO_WORKER))))
+   if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
return false;
 
if (mask & JOBCTL_STOP_SIGMASK)
-- 
2.31.0

[PATCH 4/7] Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"

2021-03-26 Thread Jens Axboe

This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.

IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index e3e1b8fbfe8a..af890479921a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -834,9 +834,6 @@ static int check_kill_permission(int sig, struct 
kernel_siginfo *info,
 
if (!valid_signal(sig))
return -EINVAL;
-   /* PF_IO_WORKER threads don't take any signals */
-   if (t->flags & PF_IO_WORKER)
-   return -ESRCH;
 
if (!si_fromuser(info))
return 0;
-- 
2.31.0

[PATCH 07/10] io_uring: fix timeout cancel return code

2021-03-26 Thread Jens Axboe

From: Pavel Begunkov 

When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.

Signed-off-by: Pavel Begunkov 
Link: 
https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.sile...@gmail.com
Signed-off-by: Jens Axboe 
---
 fs/io_uring.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 350418a88db3..4d0cb2548a67 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1247,7 +1247,7 @@ static void io_queue_async_work(struct io_kiocb *req)
io_queue_linked_timeout(link);
 }
 
-static void io_kill_timeout(struct io_kiocb *req)
+static void io_kill_timeout(struct io_kiocb *req, int status)
 {
struct io_timeout_data *io = req->async_data;
int ret;
@@ -1257,7 +1257,7 @@ static void io_kill_timeout(struct io_kiocb *req)
atomic_set(&req->ctx->cq_timeouts,
atomic_read(&req->ctx->cq_timeouts) + 1);
list_del_init(&req->timeout.list);
-   io_cqring_fill_event(req, 0);
+   io_cqring_fill_event(req, status);
io_put_req_deferred(req, 1);
}
 }
@@ -1274,7 +1274,7 @@ static bool io_kill_timeouts(struct io_ring_ctx *ctx, 
struct task_struct *tsk,
spin_lock_irq(&ctx->completion_lock);
list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
if (io_match_task(req, tsk, files)) {
-   io_kill_timeout(req);
+   io_kill_timeout(req, -ECANCELED);
canceled++;
}
}
@@ -1326,7 +1326,7 @@ static void io_flush_timeouts(struct io_ring_ctx *ctx)
break;
 
list_del_init(&req->timeout.list);
-   io_kill_timeout(req);
+   io_kill_timeout(req, 0);
} while (!list_empty(&ctx->timeout_list));
 
ctx->cq_last_tm_flush = seq;
-- 
2.31.0
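
The effect of the patch above can be illustrated with a small userspace
model. This is a simplified sketch, not the io_uring code: struct
fake_timeout and the fake_* helpers are hypothetical names, and the real
io_kill_timeout() also has to cancel an hrtimer and post a completion
event. The only point shown is that the cancel path and the flush path
now report different status codes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending timeout request. */
struct fake_timeout {
    int pending;   /* 1 while the timeout is still armed */
    int result;    /* completion code reported to the user */
};

/* Complete one armed timeout with the caller-supplied status. */
static void fake_kill_timeout(struct fake_timeout *t, int status)
{
    if (!t->pending)
        return;
    t->pending = 0;
    t->result = status;
}

/* Cancel path: every armed timeout completes with -ECANCELED. */
static size_t fake_cancel_timeouts(struct fake_timeout *ts, size_t n)
{
    size_t canceled = 0;

    for (size_t i = 0; i < n; i++) {
        if (ts[i].pending) {
            fake_kill_timeout(&ts[i], -ECANCELED);
            canceled++;
        }
    }
    return canceled;
}

/* Flush path: the timeout completes normally, so the status stays 0. */
static void fake_flush_timeout(struct fake_timeout *t)
{
    fake_kill_timeout(t, 0);
}
```

Passing the status as a parameter keeps a single completion helper while
letting each caller state why the timeout ended.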

[PATCH 5/7] Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"

2021-03-26 Thread Jens Axboe

This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.

The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.

Signed-off-by: Jens Axboe 
---
 kernel/ptrace.c | 2 +-
 kernel/signal.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long 
request,
audit_ptrace(task);
 
retval = -EPERM;
-   if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (unlikely(task->flags & PF_KTHREAD))
goto out;
if (same_thread_group(task, current))
goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index af890479921a..76d85830d4fa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, 
bool force)
return true;
 
/* Only allow kernel generated signals to this kthread */
-   if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+   if (unlikely((t->flags & PF_KTHREAD) &&
 (handler == SIG_KTHREAD_KERNEL) && !force))
return true;
 
@@ -1097,7 +1097,7 @@ static int __send_signal(int sig, struct kernel_siginfo 
*info, struct task_struc
/*
 * Skip useless siginfo allocation for SIGKILL and kernel threads.
 */
-   if ((sig == SIGKILL) || (t->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
goto out_set;
 
/*
-- 
2.31.0

[PATCH 05/10] Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"

2021-03-26 Thread Jens Axboe

This reverts commit 15b2219facadec583c24523eed40fa45865f859f.

Before IO threads accepted signals, the freezer using fake signals to wake
up an IO thread would cause them to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.

Signed-off-by: Jens Axboe 
---
 kernel/freezer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 1a2d57d1327c..dc520f01f99d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
return false;
}
 
-   if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (!(p->flags & PF_KTHREAD))
fake_signal_wake_up(p);
else
wake_up_state(p, TASK_INTERRUPTIBLE);
-- 
2.31.0

[PATCH 6/7] Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"

2021-03-26 Thread Jens Axboe

This reverts commit 15b2219facadec583c24523eed40fa45865f859f.

Before IO threads accepted signals, the freezer using fake signals to wake
up an IO thread would cause them to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.

Signed-off-by: Jens Axboe 
---
 kernel/freezer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 1a2d57d1327c..dc520f01f99d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
return false;
}
 
-   if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (!(p->flags & PF_KTHREAD))
fake_signal_wake_up(p);
else
wake_up_state(p, TASK_INTERRUPTIBLE);
-- 
2.31.0

[PATCH 7/7] Revert "signal: don't allow STOP on PF_IO_WORKER threads"

2021-03-26 Thread Jens Axboe

This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597.

The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 76d85830d4fa..5b75fbe3d2d6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, 
unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
 
-   if (unlikely(fatal_signal_pending(task) ||
-(task->flags & (PF_EXITING | PF_IO_WORKER))))
+   if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
return false;
 
if (mask & JOBCTL_STOP_SIGMASK)
-- 
2.31.0

[PATCH 04/10] Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"

2021-03-26 Thread Jens Axboe

This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.

The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.

Signed-off-by: Jens Axboe 
---
 kernel/ptrace.c | 2 +-
 kernel/signal.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long 
request,
audit_ptrace(task);
 
retval = -EPERM;
-   if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (unlikely(task->flags & PF_KTHREAD))
goto out;
if (same_thread_group(task, current))
goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index af890479921a..76d85830d4fa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, 
bool force)
return true;
 
/* Only allow kernel generated signals to this kthread */
-   if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+   if (unlikely((t->flags & PF_KTHREAD) &&
 (handler == SIG_KTHREAD_KERNEL) && !force))
return true;
 
@@ -1097,7 +1097,7 @@ static int __send_signal(int sig, struct kernel_siginfo 
*info, struct task_struc
/*
 * Skip useless siginfo allocation for SIGKILL and kernel threads.
 */
-   if ((sig == SIGKILL) || (t->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
goto out_set;
 
/*
-- 
2.31.0

[PATCH 2/7] io_uring: handle signals for IO threads like a normal thread

2021-03-26 Thread Jens Axboe

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.

Signed-off-by: Jens Axboe 
---
 fs/io-wq.c| 24 +---
 fs/io_uring.c | 12 
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..3e2f059a1737 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "../kernel/sched/sched.h"
 #include "io-wq.h"
@@ -503,10 +502,16 @@ static int io_wqe_worker(void *data)
if (io_flush_signals())
continue;
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
-   if (try_to_freeze() || ret)
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   break;
+   if (get_signal(&ksig))
+   continue;
+   }
+   if (ret)
continue;
-   if (fatal_signal_pending(current))
-   break;
/* timed out, exit unless we're the fixed worker */
if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
!(worker->flags & IO_WORKER_F_FIXED))
@@ -714,9 +719,14 @@ static int io_wq_manager(void *data)
set_current_state(TASK_INTERRUPTIBLE);
io_wq_check_workers(wq);
schedule_timeout(HZ);
-   try_to_freeze();
-   if (fatal_signal_pending(current))
-   set_bit(IO_WQ_BIT_EXIT, &wq->state);
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   set_bit(IO_WQ_BIT_EXIT, &wq->state);
+   else if (get_signal(&ksig))
+   continue;
+   }
} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));
 
io_wq_check_workers(wq);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54ea561db4a5..350418a88db3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -78,7 +78,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -6765,8 +6764,14 @@ static int io_sq_thread(void *data)
timeout = jiffies + sqd->sq_thread_idle;
continue;
}
-   if (fatal_signal_pending(current))
-   break;
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   break;
+   if (get_signal(&ksig))
+   continue;
+   }
sqt_spin = false;
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
@@ -6809,7 +6814,6 @@ static int io_sq_thread(void *data)
 
mutex_unlock(&sqd->lock);
schedule();
-   try_to_freeze();
mutex_lock(&sqd->lock);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
io_ring_clear_wakeup_flag(ctx);
-- 
2.31.0

[PATCH 3/7] kernel: stop masking signals in create_io_thread()

2021-03-26 Thread Jens Axboe

This is racy - move the blocking into when the task is created and
we're marking it as PF_IO_WORKER anyway. The IO threads are now
prepared to handle signals like SIGSTOP as well, so clear that from
the mask to allow proper stopping of IO threads.

Reported-by: Oleg Nesterov 
Signed-off-by: Jens Axboe 
---
 kernel/fork.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d3171e8e88e5..ddaa15227071 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1940,8 +1940,14 @@ static __latent_entropy struct task_struct *copy_process(
p = dup_task_struct(current, node);
if (!p)
goto fork_out;
-   if (args->io_thread)
+   if (args->io_thread) {
+   /*
+* Mark us an IO worker, and block any signal that isn't
+* fatal or STOP
+*/
p->flags |= PF_IO_WORKER;
+   siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
+   }
 
/*
 * This _must_ happen before we call free_task(), i.e. before we jump
@@ -2430,14 +2436,8 @@ struct task_struct *create_io_thread(int (*fn)(void *), 
void *arg, int node)
.stack_size = (unsigned long)arg,
.io_thread  = 1,
};
-   struct task_struct *tsk;
 
-   tsk = copy_process(NULL, 0, node, &args);
-   if (!IS_ERR(tsk)) {
-   sigfillset(&tsk->blocked);
-   sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
-   }
-   return tsk;
+   return copy_process(NULL, 0, node, &args);
 }
 
 /*
-- 
2.31.0

[PATCH 03/10] Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"

2021-03-26 Thread Jens Axboe

This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.

IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index e3e1b8fbfe8a..af890479921a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -834,9 +834,6 @@ static int check_kill_permission(int sig, struct 
kernel_siginfo *info,
 
if (!valid_signal(sig))
return -EINVAL;
-   /* PF_IO_WORKER threads don't take any signals */
-   if (t->flags & PF_IO_WORKER)
-   return -ESRCH;
 
if (!si_fromuser(info))
return 0;
-- 
2.31.0

[PATCH 1/7] kernel: don't call do_exit() for PF_IO_WORKER threads

2021-03-26 Thread Jens Axboe

Right now we're never calling get_signal() from PF_IO_WORKER threads, but
in preparation for doing so, don't handle a fatal signal for them. The
workers have state they need to cleanup when exiting, and they don't do
coredumps, so just return instead of performing either a dump or calling
do_exit() on their behalf. The threads themselves will detect a fatal
signal and do proper shutdown.

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index f2a1b898da29..e3e1b8fbfe8a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2756,6 +2756,15 @@ bool get_signal(struct ksignal *ksig)
 */
current->flags |= PF_SIGNALED;
 
+   /*
+* PF_IO_WORKER threads will catch and exit on fatal signals
+* themselves. They have cleanup that must be performed, so
+* we cannot call do_exit() on their behalf. coredumps also
+* do not apply to them.
+*/
+   if (current->flags & PF_IO_WORKER)
+   return false;
+
if (sig_kernel_coredump(signr)) {
if (print_fatal_signals)
print_fatal_signal(ksig->info.si_signo);
-- 
2.31.0

[PATCHSET v2 0/7] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

Hi,

For the v1 posting, see here:

https://lore.kernel.org/io-uring/20210326003928.978750-1-ax...@kernel.dk/

I've run this through the usual testing, and it's running long term right
now. I've tested the cases that Stefan reported, and we seem fine now.

Changes since v1:

- Catch fatal signals in get_signal() for PF_IO_WORKER. This is only a
  problem for nested signals, like SIGSTOP followed by SIGKILL. We
  can't have get_signal() calling do_exit() on behalf of the IO threads,
  they have cleanups to do. Thanks Stefan.

- Move signal masking to when the PF_IO_WORKER thread is created, and since
  we now handle SIGSTOP, unmask that as well. Thanks Oleg.

- Remove try_to_freeze() parts in IO threads, we don't need those anymore
  with the calling of get_signal().

- Minor cleanups.

 fs/io-wq.c   | 24 +---
 fs/io_uring.c| 12 
 kernel/fork.c| 16 
 kernel/freezer.c |  2 +-
 kernel/ptrace.c  |  2 +-
 kernel/signal.c  | 19 ---
 6 files changed, 47 insertions(+), 28 deletions(-)

-- 
Jens Axboe

Re: [PATCH 1/1] block: fix trivial typos in comments

2021-03-26 Thread Jens Axboe

On 3/26/21 9:45 AM, Tom Saeger wrote:
> On Fri, Mar 26, 2021 at 09:41:49AM -0600, Jens Axboe wrote:
>> On 3/25/21 9:04 PM, Tom Saeger wrote:
>>>
>>> s/Additonal/Additional/
>>> s/assocaited/associated/
>>> s/assocaited/associated/
>>> s/assocating/associating/
>>> s/becasue/because/
>>> s/configred/configured/
>>> s/deactive/deactivate/
>>> s/followings/following/
>>> s/funtion/function/
>>> s/heirarchy/hierarchy/
>>> s/intiailized/initialized/
>>> s/prefered/preferred/
>>> s/readded/read/
>>> s/Secion/Section/
>>> s/soley/solely/
>>
>> While I'm generally happy to accept any patch that makes sense, the
>> recent influx of speling fixes have me less than excited. They just
>> add complications to backports and stable patches, for example, and
>> I'd prefer not to take them for that reason alone.
> 
> Nod.
> 
> In that case - perhaps adding these entries to scripts/spelling.txt
> would at least catch some going forward?
> 
> I can send that.

That seems like a good idea.

-- 
Jens Axboe

Re: [PATCH 1/1] block: fix trivial typos in comments

2021-03-26 Thread Jens Axboe

On 3/25/21 9:04 PM, Tom Saeger wrote:
> 
> s/Additonal/Additional/
> s/assocaited/associated/
> s/assocaited/associated/
> s/assocating/associating/
> s/becasue/because/
> s/configred/configured/
> s/deactive/deactivate/
> s/followings/following/
> s/funtion/function/
> s/heirarchy/hierarchy/
> s/intiailized/initialized/
> s/prefered/preferred/
> s/readded/read/
> s/Secion/Section/
> s/soley/solely/

While I'm generally happy to accept any patch that makes sense, the
recent influx of speling fixes have me less than excited. They just
add complications to backports and stable patches, for example, and
I'd prefer not to take them for that reason alone.

-- 
Jens Axboe

Re: [PATCH 2/8] kernel: unmask SIGSTOP for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 9:23 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:01 schrieb Jens Axboe:
>> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>>> Jens, sorry, I got lost :/
>>
>> Let's bring you back in :-)
>>
>>> On 03/25, Jens Axboe wrote:
>>>>
>>>> With IO threads accepting signals, including SIGSTOP,
>>>
>>> where can I find this change? Looks like I wasn't cc'ed...
>>
>> It's this very series.
>>
>>>> unmask the
>>>> SIGSTOP signal from the default blocked mask.
>>>>
>>>> Signed-off-by: Jens Axboe 
>>>> ---
>>>>  kernel/fork.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>> index d3171e8e88e5..d5a40552910f 100644
>>>> --- a/kernel/fork.c
>>>> +++ b/kernel/fork.c
>>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void 
>>>> *), void *arg, int node)
>>>>tsk = copy_process(NULL, 0, node, &args);
>>>>if (!IS_ERR(tsk)) {
>>>>sigfillset(&tsk->blocked);
>>>> -  sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>>> +  sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>>>
>>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
>>
>> Ah thanks.
>>
>>> To remind, either way this is racy and can't really help.
>>>
>>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>>> I must have missed something.
>>
>> I do think the above is a no-op at this point, and we can probably just
>> kill it. Let me double check, hopefully we can just remove this blocked
>> part.
> 
> Is this really correct to drop in your "kernel: stop masking signals in 
> create_io_thread()"
> commit?
> 
> I don't assume signals wanted by userspace should potentially handled in an 
> io_thread...
> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?

I guess we do actually need it, if we're not fiddling with
wants_signal() for them. To quell Oleg's concerns, we can just move it
to post dup_task_struct(), that should eliminate any race concerns
there.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 9:11 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:10 schrieb Jens Axboe:
>> On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>>>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>>>
>>>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>>>
>>>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>>>
>>>>>>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>>>>>>> tested both of them separately and didn't observe anything. I also 
>>>>>>>>>> ran
>>>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>>>> pushed), fwiw.
>>>>>>>>>
>>>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>>>
>>>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for 
>>>>>>>> us
>>>>>>>> and never return. So we might need a special case in there to deal with
>>>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>>>> correctly for IO threads.
>>>>>>>
>>>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() 
>>>>>>> from being called?
>>>>>>
>>>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>>>
>>>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>>>> check to something ala:
>>>>>
>>>>> relock:
>>>>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>>>> + return false;
>>>>>
>>>>> to catch it upfront and from the relock case, or add:
>>>>>
>>>>>   fatal:
>>>>> + if (current->flags & PF_IO_WORKER)
>>>>> + return false;
>>>>>
>>>>> to catch it in the fatal section.
>>>>
>>>> Can you try this? Not crazy about adding a special case, but I don't
>>>> think there's any way around this one. And should be pretty cheap, as
>>>> we're already pulling in ->flags right above anyway.
>>>>
>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>> index 5ad8566534e7..5b75fbe3d2d6 100644
>>>> --- a/kernel/signal.c
>>>> +++ b/kernel/signal.c
>>>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>>>> */
>>>>current->flags |= PF_SIGNALED;
>>>>  
>>>> +  /*
>>>> +   * PF_IO_WORKER threads will catch and exit on fatal signals
>>>> +   * themselves. They have cleanup that must be performed, so
>>>> +   * we cannot call do_exit() on their behalf. coredumps also
>>>> +   * do not apply to them.
>>>> +   */
>>>> +  if (current->flags & PF_IO_WORKER)
>>>> +  return false;
>>>> +
>>>>if (sig_kernel_coredump(signr)) {
>>>>if (print_fatal_signals)
>>>>print_fatal_signal(ksig->info.si_signo);
>>>>
>>>
>>> I guess not before next week, but if it resolves the problem for you,
>>> I guess it would be good to get this into rc5.
>>
>> It does, I pushed out a new branch. I'll send out a v2 series in a bit.
> 
> Great, thanks!
> 
> Any chance to get the "cmdline" hiding included?

I'll take a look at your response there, haven't yet. Wanted to get this
one sorted first.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>
>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>
>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>
>>>>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>> pushed), fwiw.
>>>>>>>
>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>
>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>>> and never return. So we might need a special case in there to deal with
>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>> correctly for IO threads.
>>>>>
>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from 
>>>>> being called?
>>>>
>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>
>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>> check to something ala:
>>>
>>> relock:
>>> +   if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>> +   return false;
>>>
>>> to catch it upfront and from the relock case, or add:
>>>
>>> fatal:
>>> +   if (current->flags & PF_IO_WORKER)
>>> +   return false;
>>>
>>> to catch it in the fatal section.
>>
>> Can you try this? Not crazy about adding a special case, but I don't
>> think there's any way around this one. And should be pretty cheap, as
>> we're already pulling in ->flags right above anyway.
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 5ad8566534e7..5b75fbe3d2d6 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>>   */
>>  current->flags |= PF_SIGNALED;
>>  
>> +/*
>> + * PF_IO_WORKER threads will catch and exit on fatal signals
>> + * themselves. They have cleanup that must be performed, so
>> + * we cannot call do_exit() on their behalf. coredumps also
>> + * do not apply to them.
>> + */
>> +if (current->flags & PF_IO_WORKER)
>> +return false;
>> +
>>  if (sig_kernel_coredump(signr)) {
>>  if (print_fatal_signals)
>>  print_fatal_signal(ksig->info.si_signo);
>>
> 
> I guess not before next week, but if it resolves the problem for you,
> I guess it would be good to get this into rc5.

It does, I pushed out a new branch. I'll send out a v2 series in a bit.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 9:04 AM, Stefan Metzmacher wrote:
> 
> Am 26.03.21 um 15:53 schrieb Jens Axboe:
>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>
>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>
>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>
>>>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>> pushed), fwiw.
>>>>>>
>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>
>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>> and never return. So we might need a special case in there to deal with
>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>> correctly for IO threads.
>>>>
>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from 
>>>> being called?
>>>
>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>
>> Yes exactly, we're waiting in there being stopped. So we either need to
>> check to something ala:
>>
>> relock:
>> +if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>> +return false;
>>
>> to catch it upfront and from the relock case, or add:
>>
>>  fatal:
>> +if (current->flags & PF_IO_WORKER)
>> +return false;
>>
>> to catch it in the fatal section.
>>
> 
> Or something like io_uring_files_cancel()
> 
> Maybe change current->pf_io_worker with a generic current->io_thread
> structure which, has exit hooks, as well as
> io_wq_worker_sleeping() and io_wq_worker_running().
> 
> Maybe create_io_thread would take such an structure
> as argument instead of a single function pointer.
> 
> struct io_thread_description {
>   const char *name;
>   int (*thread_fn)(struct io_thread_description *);
>   void (*sleeping_fn)(struct io_thread_description *);
>   void (*running_fn)(struct io_thread_description *);
>   void (*exit_fn)(struct io_thread_description *);
> };
> 
> And then
> struct io_wq_manager {
>   struct io_thread_description description;
>   ... manager specific stuff...
> };

I did consider something like that, but seems a bit over-engineered
just for catching this case. And any kind of logic for PF_EXITING
ends up being a bit tricky for cancelations.

We can look into doing that for 5.13 potentially.

-- 
Jens Axboe
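
For reference, the descriptor idea from the quoted mail can be sketched in plain C. This is a hypothetical userspace illustration of the proposal only; nothing like it was merged in this series. The sleeping and running hooks are left out for brevity, and to_manager(), manager_fn(), manager_exit() and run_io_thread() are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor, following the shape proposed in the mail. */
struct io_thread_description {
	const char *name;
	int (*thread_fn)(struct io_thread_description *);
	void (*exit_fn)(struct io_thread_description *);
};

/* A thread type embeds the descriptor and adds its own state. */
struct io_wq_manager {
	struct io_thread_description description;
	int work_done;
	int cleaned_up;
};

/* Recover the containing manager from its embedded descriptor. */
static struct io_wq_manager *to_manager(struct io_thread_description *d)
{
	return (struct io_wq_manager *)((char *)d -
			offsetof(struct io_wq_manager, description));
}

static int manager_fn(struct io_thread_description *d)
{
	to_manager(d)->work_done = 1;
	return 0;
}

static void manager_exit(struct io_thread_description *d)
{
	to_manager(d)->cleaned_up = 1;
}

/* Run the thread body, then always run the exit hook if one is set. */
static int run_io_thread(struct io_thread_description *d)
{
	int ret;

	assert(d && d->thread_fn);
	ret = d->thread_fn(d);
	if (d->exit_fn)
		d->exit_fn(d);
	return ret;
}
```

The point of the exit hook is that generic code could call it on behalf of the thread, which is the part that gets tricky once PF_EXITING and cancelations are involved.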

Re: [PATCH 2/8] kernel: unmask SIGSTOP for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 7:48 AM, Oleg Nesterov wrote:
> Jens, sorry, I got lost :/

Let's bring you back in :-)

> On 03/25, Jens Axboe wrote:
>>
>> With IO threads accepting signals, including SIGSTOP,
> 
> where can I find this change? Looks like I wasn't cc'ed...

It's this very series.

>> unmask the
>> SIGSTOP signal from the default blocked mask.
>>
>> Signed-off-by: Jens Axboe 
>> ---
>>  kernel/fork.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index d3171e8e88e5..d5a40552910f 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void 
>> *), void *arg, int node)
>>  tsk = copy_process(NULL, 0, node, &args);
>>  if (!IS_ERR(tsk)) {
>>  sigfillset(&tsk->blocked);
>> -sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>> +sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
> 
> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.

Ah thanks.

> To remind, either way this is racy and can't really help.
> 
> And if "IO threads accepting signals" then I don't understand why. Sorry,
> I must have missed something.

I do think the above is a no-op at this point, and we can probably just
kill it. Let me double check, hopefully we can just remove this blocked
part.

-- 
Jens Axboe
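
The blocked mask being discussed has a close userspace analogue. A minimal sketch, assuming the POSIX sigset_t helpers rather than the kernel's sigdelsetmask() or siginitsetinv(); build_io_thread_mask() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/*
 * Userspace analogue of the mask in the patch: block every signal,
 * then take SIGKILL and SIGSTOP back out of the blocked set.
 */
static void build_io_thread_mask(sigset_t *blocked)
{
	int ret;

	ret = sigfillset(blocked);		/* start with everything blocked */
	assert(ret == 0);
	ret = sigdelset(blocked, SIGKILL);	/* keep SIGKILL deliverable */
	assert(ret == 0);
	ret = sigdelset(blocked, SIGSTOP);	/* and SIGSTOP, as in the patch */
	assert(ret == 0);
	(void)ret;				/* quiet unused warning under NDEBUG */
}
```

sigismember() on the result reports 0 for SIGKILL and SIGSTOP and 1 for signals such as SIGTERM. The kernel's siginitsetinv() builds the same kind of set in one step from the inverse of a mask.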

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 8:53 AM, Jens Axboe wrote:
> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>
>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>> incremental you sent, which wasn't complete.
>>>>>>
>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>
>>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>> pushed), fwiw.
>>>>>
>>>>> I can reproduce this one! I'll take a closer look.
>>>>
>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>> and never return. So we might need a special case in there to deal with
>>>> that, or some other way of ensuring that fatal signal gets processed
>>>> correctly for IO threads.
>>>
>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from 
>>> being called?
>>
>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
> 
> Yes exactly, we're waiting in there being stopped. So we either need to
> check to something ala:
> 
> relock:
> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
> + return false;
> 
> to catch it upfront and from the relock case, or add:
> 
>   fatal:
> + if (current->flags & PF_IO_WORKER)
> + return false;
> 
> to catch it in the fatal section.

Can you try this? Not crazy about adding a special case, but I don't
think there's any way around this one. And should be pretty cheap, as
we're already pulling in ->flags right above anyway.

diff --git a/kernel/signal.c b/kernel/signal.c
index 5ad8566534e7..5b75fbe3d2d6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
 */
current->flags |= PF_SIGNALED;
 
+   /*
+* PF_IO_WORKER threads will catch and exit on fatal signals
+* themselves. They have cleanup that must be performed, so
+* we cannot call do_exit() on their behalf. coredumps also
+* do not apply to them.
+*/
+   if (current->flags & PF_IO_WORKER)
+   return false;
+
if (sig_kernel_coredump(signr)) {
if (print_fatal_signals)
print_fatal_signal(ksig->info.si_signo);

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>> The KILL after STOP deadlock still exists.
>>>>>
>>>>> In which tree? Sounds like you're still on the old one with that
>>>>> incremental you sent, which wasn't complete.
>>>>>
>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>
>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>> pushed), fwiw.
>>>>
>>>> I can reproduce this one! I'll take a closer look.
>>>
>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>> and never return. So we might need a special case in there to deal with
>>> that, or some other way of ensuring that fatal signal gets processed
>>> correctly for IO threads.
>>
>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from 
>> being called?
> 
> Ah, we're still in the first get_signal() from SIGSTOP, correct?

Yes exactly, we're waiting in there being stopped. So we either need to
check for something ala:

relock:
+   if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
+   return false;

to catch it upfront and from the relock case, or add:

fatal:
+   if (current->flags & PF_IO_WORKER)
+   return false;

to catch it in the fatal section.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 8:43 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>> The KILL after STOP deadlock still exists.
>>>>
>>>> In which tree? Sounds like you're still on the old one with that
>>>> incremental you sent, which wasn't complete.
>>>>
>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>
>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>> tested both of them separately and didn't observe anything. I also ran
>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>> pushed), fwiw.
>>>
>>> I can reproduce this one! I'll take a closer look.
>>
>> OK, that one is actually pretty straight forward - we rely on cleaning
>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>> and never return. So we might need a special case in there to deal with
>> that, or some other way of ensuring that fatal signal gets processed
>> correctly for IO threads.
> 
> And if (fatal_signal_pending(current)) doesn't prevent get_signal()
> from being called?

Usually yes, but this case is first doing SIGSTOP, so we're waiting in
get_signal() -> do_signal_stop() when the SIGKILL arrives. Hence there's
no way to catch it in the workers themselves.

-- 
Jens Axboe
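
The STOP followed by KILL ordering can be reproduced with an ordinary process. The sketch below is a userspace analogy only, assuming a POSIX system; it does not touch io_uring, and kill_stopped_child() and expect_sigkill() are hypothetical names. It shows that a task stopped by SIGSTOP is taken down by SIGKILL without ever getting back to its own code, which is why the IO threads cannot catch this case themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Stop a child with SIGSTOP, then send SIGKILL while it is stopped.
 * Returns the signal that terminated the child, or -1 on error.
 */
static int kill_stopped_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: sleep until signalled, never exits on its own. */
		for (;;)
			pause();
	}

	if (kill(pid, SIGSTOP) < 0)
		goto fail;
	/* Wait until the child is reported as stopped. */
	if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
		goto fail;
	/* The child cannot run any of its own code at this point. */
	if (kill(pid, SIGKILL) < 0)
		goto fail;
	if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status))
		return -1;
	return WTERMSIG(status);

fail:
	/* Best effort: do not leave the child behind. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}

/* Example check: the stopped child must be reported as killed. */
static void expect_sigkill(void)
{
	assert(kill_stopped_child() == SIGKILL);
}
```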

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 7:59 AM, Jens Axboe wrote:
> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>> The KILL after STOP deadlock still exists.
>>
>> In which tree? Sounds like you're still on the old one with that
>> incremental you sent, which wasn't complete.
>>
>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>
>> No, it should clean up in all cases. I'll try your stop + kill, I just
>> tested both of them separately and didn't observe anything. I also ran
>> your io_uring-cp example (and found a bug in the example, fixed and
>> pushed), fwiw.
> 
> I can reproduce this one! I'll take a closer look.

OK, that one is actually pretty straightforward - we rely on cleaning
up on exit, but for fatal cases, get_signal() will call do_exit() for us
and never return. So we might need a special case in there to deal with
that, or some other way of ensuring that fatal signal gets processed
correctly for IO threads.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 7:54 AM, Jens Axboe wrote:
>> The KILL after STOP deadlock still exists.
> 
> In which tree? Sounds like you're still on the old one with that
> incremental you sent, which wasn't complete.
> 
>> Does io_wq_manager() exits without cleaning up on SIGKILL?
> 
> No, it should clean up in all cases. I'll try your stop + kill, I just
> tested both of them separately and didn't observe anything. I also ran
> your io_uring-cp example (and found a bug in the example, fixed and
> pushed), fwiw.

I can reproduce this one! I'll take a closer look.

-- 
Jens Axboe

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 7:31 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 13:56 schrieb Jens Axboe:
>> On 3/26/21 5:48 AM, Stefan Metzmacher wrote:
>>>
>>> Am 26.03.21 um 01:39 schrieb Jens Axboe:
>>>> Hi,
>>>>
>>>> As discussed in a previous thread today, the seemingly much saner approach
>>>> is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO
>>>> threads. If we just have the threads call get_signal() for
>>>> signal_pending(), then everything just falls out naturally with how
>>>> we receive and handle signals.
>>>>
>>>> Patch 1 adds support for checking and calling get_signal() from the
>>>> regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks
>>>> SIGSTOP from the default IO thread blocked mask, and the rest just revert
>>>> special cases that were put in place for PF_IO_WORKER threads.
>>>>
>>>> With this done, only two special cases remain for PF_IO_WORKER, and they
>>>> aren't related to signals so not part of this patchset. But both of them
>>>> can go away as well now that we have "real" threads as IO workers, and
>>>> then we'll have zero special cases for PF_IO_WORKER.
>>>>
>>>> This passes the usual regression testing, my other usual 24h run has been
>>>> kicked off. But I wanted to send this out early.
>>>>
>>>> Thanks to Linus for the suggestion. As with most other good ideas, it's
>>>> obvious once you hear it. The fact that we end up with _zero_ special
>>>> cases with this is a clear sign that this is the right way to do it
>>>> indeed. The fact that this series is 2/3rds revert further drives that
>>>> point home. Also thanks to Eric for diligent review on the signal side
>>>> of things for the past changes (and hopefully ditto on this series :-))
>>>
>>> Ok, I'm testing a8ff6a3b20bd16d071ef66824ae4428529d114f9 from
>>> your io_uring-5.12 branch.
>>>
>>> And using this patch:
>>> diff --git a/examples/io_uring-cp.c b/examples/io_uring-cp.c
>>> index cc7a227a5ec7..6e26a4214015 100644
>>> --- a/examples/io_uring-cp.c
>>> +++ b/examples/io_uring-cp.c
>>> @@ -116,13 +116,16 @@ static void queue_write(struct io_uring *ring, struct 
>>> io_data *data)
>>> io_uring_submit(ring);
>>>  }
>>>
>>> -static int copy_file(struct io_uring *ring, off_t insize)
>>> +static int copy_file(struct io_uring *ring, off_t _insize)
>>>  {
>>> +   off_t insize = _insize;
>>> unsigned long reads, writes;
>>> struct io_uring_cqe *cqe;
>>> off_t write_left, offset;
>>> int ret;
>>>
>>> +again:
>>> +   insize = _insize;
>>> write_left = insize;
>>> writes = reads = offset = 0;
>>>
>>> @@ -221,6 +224,12 @@ static int copy_file(struct io_uring *ring, off_t 
>>> insize)
>>> }
>>> }
>>>
>>> +   {
>>> +   struct timespec ts = { .tv_nsec = 99, };
>>> +   nanosleep(&ts, NULL);
>>> +   goto again;
>>> +   }
>>> +
>>> return 0;
>>>  }
>>>
>>> Running ./io_uring-cp ~/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb 
>>> file
>>> What I see is this:
>>>
>>> kill -SIGSTOP to any thread I used a worker with pid 2061 here, results in
>>>
>>> root@ub1704-166:~# head /proc/2061/status
>>> Name:   iou-wrk-2041
>>> Umask:  0022
>>> State:  R (running)
>>> Tgid:   2041
>>> Ngid:   0
>>> Pid:2061
>>> PPid:   1857
>>> TracerPid:  0
>>> Uid:0   0   0   0
>>> Gid:0   0   0   0
>>> root@ub1704-166:~# head /proc/2041/status
>>> Name:   io_uring-cp
>>> Umask:  0022
>>> State:  T (stopped)
>>> Tgid:   2041
>>> Ngid:   0
>>> Pid:2041
>>> PPid:   1857
>>> TracerPid:  0
>>> Uid:0   0   0   0
>>> Gid:0   0   0   0
>>> root@ub1704-166:~# head /proc/2042/status
>>> Name:   iou-mgr-2041
>>> Umask:  0022
>>> State:  T (stopped)
>>> Tgid:   2041
>>> Ngid:   0
>>> Pid:2042
>>> PPid:   1857
>>> TracerPid:  0
>>> Uid:0   0

Re: [PATCH 0/6] Allow signals for IO threads

2021-03-26 Thread Jens Axboe

On 3/26/21 5:48 AM, Stefan Metzmacher wrote:
> 
> Am 26.03.21 um 01:39 schrieb Jens Axboe:
>> Hi,
>>
>> As discussed in a previous thread today, the seemingly much saner approach
>> is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO
>> threads. If we just have the threads call get_signal() for
>> signal_pending(), then everything just falls out naturally with how
>> we receive and handle signals.
>>
>> Patch 1 adds support for checking and calling get_signal() from the
>> regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks
>> SIGSTOP from the default IO thread blocked mask, and the rest just revert
>> special cases that were put in place for PF_IO_WORKER threads.
>>
>> With this done, only two special cases remain for PF_IO_WORKER, and they
>> aren't related to signals so not part of this patchset. But both of them
>> can go away as well now that we have "real" threads as IO workers, and
>> then we'll have zero special cases for PF_IO_WORKER.
>>
>> This passes the usual regression testing, my other usual 24h run has been
>> kicked off. But I wanted to send this out early.
>>
>> Thanks to Linus for the suggestion. As with most other good ideas, it's
>> obvious once you hear it. The fact that we end up with _zero_ special
>> cases with this is a clear sign that this is the right way to do it
>> indeed. The fact that this series is 2/3rds revert further drives that
>> point home. Also thanks to Eric for diligent review on the signal side
>> of things for the past changes (and hopefully ditto on this series :-))
> 
> Ok, I'm testing a8ff6a3b20bd16d071ef66824ae4428529d114f9 from
> your io_uring-5.12 branch.
> 
> And using this patch:
> diff --git a/examples/io_uring-cp.c b/examples/io_uring-cp.c
> index cc7a227a5ec7..6e26a4214015 100644
> --- a/examples/io_uring-cp.c
> +++ b/examples/io_uring-cp.c
> @@ -116,13 +116,16 @@ static void queue_write(struct io_uring *ring, struct 
> io_data *data)
> io_uring_submit(ring);
>  }
> 
> -static int copy_file(struct io_uring *ring, off_t insize)
> +static int copy_file(struct io_uring *ring, off_t _insize)
>  {
> +   off_t insize = _insize;
> unsigned long reads, writes;
> struct io_uring_cqe *cqe;
> off_t write_left, offset;
> int ret;
> 
> +again:
> +   insize = _insize;
> write_left = insize;
> writes = reads = offset = 0;
> 
> @@ -221,6 +224,12 @@ static int copy_file(struct io_uring *ring, off_t insize)
> }
> }
> 
> +   {
> +   struct timespec ts = { .tv_nsec = 99, };
> +   nanosleep(&ts, NULL);
> +   goto again;
> +   }
> +
> return 0;
>  }
> 
> Running ./io_uring-cp ~/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb 
> file
> What I see is this:
> 
> kill -SIGSTOP to any thread I used a worker with pid 2061 here, results in
> 
> root@ub1704-166:~# head /proc/2061/status
> Name:   iou-wrk-2041
> Umask:  0022
> State:  R (running)
> Tgid:   2041
> Ngid:   0
> Pid:2061
> PPid:   1857
> TracerPid:  0
> Uid:0   0   0   0
> Gid:0   0   0   0
> root@ub1704-166:~# head /proc/2041/status
> Name:   io_uring-cp
> Umask:  0022
> State:  T (stopped)
> Tgid:   2041
> Ngid:   0
> Pid:2041
> PPid:   1857
> TracerPid:  0
> Uid:0   0   0   0
> Gid:0   0   0   0
> root@ub1704-166:~# head /proc/2042/status
> Name:   iou-mgr-2041
> Umask:  0022
> State:  T (stopped)
> Tgid:   2041
> Ngid:   0
> Pid:2042
> PPid:   1857
> TracerPid:  0
> Uid:0   0   0   0
> Gid:0   0   0   0
> 
> So userspace and iou-mgr-2041 stop, but the workers don't.
> 49 workers burn cpu as much as possible.
> 
> kill -KILL 2061
> results in this:
> - all workers are gone
> - iou-mgr-2041 is gone
> - io_uring-cp waits in status D forever
> 
> root@ub1704-166:~# head /proc/2041/status
> Name:   io_uring-cp
> Umask:  0022
> State:  D (disk sleep)
> Tgid:   2041
> Ngid:   0
> Pid:2041
> PPid:   1857
> TracerPid:  0
> Uid:0   0   0   0
> Gid:0   0   0   0
> root@ub1704-166:~# cat /proc/2041/stack
> [<0>] io_wq_destroy_manager+0x36/0xa0
> [<0>] io_wq_put_and_exit+0x2b/0x40
> [<0>] io_uring_clean_tctx+0xc5/0x110
> [<0>] __io_uring_files_cancel+0x336/0x4e0
> [<0>] do_exit+0x16b/0x13b0
> [<0>] do_group_exit+0x8b/0x140
> [<0>] get_signal+0x219/0xc90

[PATCH] tomoyo: don't special case PF_IO_WORKER for PF_KTHREAD

2021-03-25 Thread Jens Axboe

The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no
need to special case them for credential checks.

Cc: Tetsuo Handa 
Signed-off-by: Jens Axboe 
---
 security/tomoyo/network.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/tomoyo/network.c b/security/tomoyo/network.c
index 478f757ff843..8dc61335f65e 100644
--- a/security/tomoyo/network.c
+++ b/security/tomoyo/network.c
@@ -613,7 +613,7 @@ static int tomoyo_check_unix_address(struct sockaddr *addr,
 static bool tomoyo_kernel_service(void)
 {
/* Nothing to do if I am a kernel service. */
-   return (current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD;
+   return current->flags & PF_KTHREAD;
 }
 
 /**
-- 
2.31.0

-- 
Jens Axboe
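
For readers following along, here is a minimal userspace sketch of why the
old and new checks differ only for a task carrying both flags. The flag
values are copied from include/linux/sched.h purely for illustration, and the
helper names (kernel_service_old, kernel_service_new, checks_agree) are
hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from include/linux/sched.h, for illustration only. */
#define PF_IO_WORKER 0x00000010u
#define PF_KTHREAD   0x00200000u

/* Old check: a kernel thread that is not also an io_uring worker. */
static bool kernel_service_old(unsigned int flags)
{
	return (flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD;
}

/* New check: any task with PF_KTHREAD set. */
static bool kernel_service_new(unsigned int flags)
{
	return (flags & PF_KTHREAD) != 0;
}

/* Hypothetical helper: do both checks give the same answer? */
static bool checks_agree(unsigned int flags)
{
	/* The comparison only makes sense if the flags are distinct bits. */
	assert((PF_KTHREAD & PF_IO_WORKER) == 0);
	return kernel_service_old(flags) == kernel_service_new(flags);
}
```

checks_agree() returns false only for PF_KTHREAD | PF_IO_WORKER, which is the
combination the commit message says no longer occurs.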

[PATCH] Revert "Smack: Handle io_uring kernel thread privileges"

2021-03-25 Thread Jens Axboe

This reverts commit 942cb357ae7d9249088e3687ee6a00ed2745a0c7.

The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no
need to special case them for credential checks.

Cc: Casey Schaufler 
Signed-off-by: Jens Axboe 
---
 security/smack/smack_access.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index 7eabb448acab..efe2406a3960 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -688,10 +688,9 @@ bool smack_privileged_cred(int cap, const struct cred *cred)
 bool smack_privileged(int cap)
 {
/*
-* Kernel threads may not have credentials we can use.
-* The io_uring kernel threads do have reliable credentials.
+* All kernel tasks are privileged
 */
-   if ((current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD)
+   if (unlikely(current->flags & PF_KTHREAD))
return true;
 
return smack_privileged_cred(cap, current_cred());
-- 
2.31.0

-- 
Jens Axboe

[PATCH 6/8] Revert "signal: don't allow STOP on PF_IO_WORKER threads"

2021-03-25 Thread Jens Axboe

This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597.

The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8ce96078cb76..5ad8566534e7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
 
-   if (unlikely(fatal_signal_pending(task) ||
-(task->flags & (PF_EXITING | PF_IO_WORKER))))
+   if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
return false;
 
if (mask & JOBCTL_STOP_SIGMASK)
-- 
2.31.0

[PATCH 4/8] Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"

2021-03-25 Thread Jens Axboe

This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.

The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.

Signed-off-by: Jens Axboe 
---
 kernel/ptrace.c | 2 +-
 kernel/signal.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long request,
audit_ptrace(task);
 
retval = -EPERM;
-   if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (unlikely(task->flags & PF_KTHREAD))
goto out;
if (same_thread_group(task, current))
goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index cb9acdfb32fa..8ce96078cb76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, bool force)
return true;
 
/* Only allow kernel generated signals to this kthread */
-   if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+   if (unlikely((t->flags & PF_KTHREAD) &&
 (handler == SIG_KTHREAD_KERNEL) && !force))
return true;
 
@@ -1097,7 +1097,7 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
/*
 * Skip useless siginfo allocation for SIGKILL and kernel threads.
 */
-   if ((sig == SIGKILL) || (t->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
goto out_set;
 
/*
-- 
2.31.0

[PATCH 3/8] Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"

2021-03-25 Thread Jens Axboe

This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.

IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.

Signed-off-by: Jens Axboe 
---
 kernel/signal.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index f2a1b898da29..cb9acdfb32fa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -834,9 +834,6 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 
if (!valid_signal(sig))
return -EINVAL;
-   /* PF_IO_WORKER threads don't take any signals */
-   if (t->flags & PF_IO_WORKER)
-   return -ESRCH;
 
if (!si_fromuser(info))
return 0;
-- 
2.31.0

[PATCH 1/8] io_uring: handle signals for IO threads like a normal thread

2021-03-25 Thread Jens Axboe

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.

Signed-off-by: Jens Axboe 
---
 fs/io-wq.c| 21 +
 fs/io_uring.c | 10 --
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..2dbdc552f3ba 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -505,8 +505,14 @@ static int io_wqe_worker(void *data)
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
if (try_to_freeze() || ret)
continue;
-   if (fatal_signal_pending(current))
-   break;
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   break;
+   get_signal(&ksig);
+   continue;
+   }
/* timed out, exit unless we're the fixed worker */
if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
!(worker->flags & IO_WORKER_F_FIXED))
@@ -715,8 +721,15 @@ static int io_wq_manager(void *data)
io_wq_check_workers(wq);
schedule_timeout(HZ);
try_to_freeze();
-   if (fatal_signal_pending(current))
-   set_bit(IO_WQ_BIT_EXIT, &wq->state);
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   set_bit(IO_WQ_BIT_EXIT, &wq->state);
+   else
+   get_signal(&ksig);
+   continue;
+   }
} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));
 
io_wq_check_workers(wq);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54ea561db4a5..3a9d021db328 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6765,8 +6765,14 @@ static int io_sq_thread(void *data)
timeout = jiffies + sqd->sq_thread_idle;
continue;
}
-   if (fatal_signal_pending(current))
-   break;
+   if (signal_pending(current)) {
+   struct ksignal ksig;
+
+   if (fatal_signal_pending(current))
+   break;
+   get_signal(&ksig);
+   continue;
+   }
sqt_spin = false;
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
-- 
2.31.0
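
As a rough userspace analogy of the loop change above (not kernel code), the
sketch below keeps a signal blocked so it stays pending, polls for it once per
loop iteration, and consumes it before going around again. Here sigpending()
stands in for signal_pending() and sigwait() stands in for get_signal(); the
function names drain_pending_signal and worker_loop are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace analogy only: sigpending() plays the role of signal_pending(),
 * and sigwait() plays the role of get_signal(), which consumes the signal.
 */
static bool drain_pending_signal(int signo)
{
	sigset_t pending, want;
	int got = 0;

	assert(signo > 0);
	if (sigpending(&pending) != 0 || sigismember(&pending, signo) != 1)
		return false;	/* nothing queued, keep working */

	sigemptyset(&want);
	sigaddset(&want, signo);
	/* The signal is already pending, so this returns without blocking. */
	if (sigwait(&want, &got) != 0)
		return false;
	return got == signo;
}

/* Run a bounded work loop; returns how many signals it consumed. */
static int worker_loop(int iterations, int signo)
{
	sigset_t block;
	int handled = 0;

	sigemptyset(&block);
	sigaddset(&block, signo);
	/* Block the signal so it stays pending until the loop looks at it. */
	sigprocmask(SIG_BLOCK, &block, NULL);

	for (int i = 0; i < iterations; i++) {
		if (i == 1)
			raise(signo);	/* simulate a signal arriving mid-loop */
		if (drain_pending_signal(signo)) {
			handled++;
			continue;	/* handle it, then go around again */
		}
		/* ... real work would go here ... */
	}
	return handled;
}
```

The point of the analogy is only the shape of the loop: the thread never
takes the signal asynchronously, it notices it at a known point and deals
with it there.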

[PATCH 5/8] Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"

2021-03-25 Thread Jens Axboe

This reverts commit 15b2219facadec583c24523eed40fa45865f859f.

Before IO threads accepted signals, the freezer using fake signals to wake
up an IO thread would cause them to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.

Signed-off-by: Jens Axboe 
---
 kernel/freezer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 1a2d57d1327c..dc520f01f99d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
return false;
}
 
-   if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (!(p->flags & PF_KTHREAD))
fake_signal_wake_up(p);
else
wake_up_state(p, TASK_INTERRUPTIBLE);
-- 
2.31.0

[PATCH 0/6] Allow signals for IO threads

2021-03-25 Thread Jens Axboe

Hi,

As discussed in a previous thread today, the seemingly much saner approach
is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO
threads. If we just have the threads call get_signal() for
signal_pending(), then everything just falls out naturally with how
we receive and handle signals.

Patch 1 adds support for checking and calling get_signal() from the
regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks
SIGSTOP from the default IO thread blocked mask, and the rest just revert
special cases that were put in place for PF_IO_WORKER threads.

With this done, only two special cases remain for PF_IO_WORKER, and they
aren't related to signals so not part of this patchset. But both of them
can go away as well now that we have "real" threads as IO workers, and
then we'll have zero special cases for PF_IO_WORKER.

This passes the usual regression testing, my other usual 24h run has been
kicked off. But I wanted to send this out early.

Thanks to Linus for the suggestion. As with most other good ideas, it's
obvious once you hear it. The fact that we end up with _zero_ special
cases with this is a clear sign that this is the right way to do it
indeed. The fact that this series is 2/3rds revert further drives that
point home. Also thanks to Eric for diligent review on the signal side
of things for the past changes (and hopefully ditto on this series :-))

-- 
Jens Axboe

[PATCH 2/8] kernel: unmask SIGSTOP for IO threads

2021-03-25 Thread Jens Axboe

With IO threads accepting signals, including SIGSTOP, unmask the
SIGSTOP signal from the default blocked mask.

Signed-off-by: Jens Axboe 
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d3171e8e88e5..d5a40552910f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
tsk = copy_process(NULL, 0, node, &args);
if (!IS_ERR(tsk)) {
sigfillset(&tsk->blocked);
-   sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
+   sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
}
return tsk;
 }
-- 
2.31.0

Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc//task/

2021-03-25 Thread Jens Axboe

On 3/25/21 3:57 PM, Stefan Metzmacher wrote:
> 
> Am 25.03.21 um 22:44 schrieb Jens Axboe:
>> On 3/25/21 2:40 PM, Jens Axboe wrote:
>>> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>>>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>>>>  wrote:
>>>>>
>>>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>>>>  wrote:
>>>>>>
>>>>>> I don't know what the gdb logic is, but maybe there's some other
>>>>>> option that makes gdb not react to them?
>>>>>
>>>>> .. maybe we could have a different name for them under the task/
>>>>> subdirectory, for example (not  just the pid)? Although that probably
>>>>> messes up 'ps' too..
>>>>
>>>> Actually, maybe the right model is to simply make all the io threads
>>>> take signals, and get rid of all the special cases.
>>>>
>>>> Sure, the signals will never be delivered to user space, but if we
>>>>
>>>>  - just made the thread loop do "get_signal()" when there are pending 
>>>> signals
>>>>
>>>>  - allowed ptrace_attach on them
>>>>
>>>> they'd look pretty much like regular threads that just never do the
>>>> user-space part of signal handling.
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Just to wrap up the previous one, yes it broke all sorts of things to
>>> make the 'tid' directory different. They just end up being hidden anyway
>>> through that, for both ps and top.
>>>
>>> Yes, I do think that maybe it's better to just embrace the signals,
>>> and have everything just work by default. It's
>>> better than continually trying to make the threads special. I'll see
>>> if there are some demons lurking down that path.
>>
>> In the spirit of "let's just try it", I ran with the below patch. With
>> that, I can gdb attach just fine to a test case that creates an io_uring
>> and a regular thread with pthread_create(). The regular thread uses
>> the ring, so you end up with two iou-mgr threads. Attach:
>>
>> [root@archlinux ~]# gdb -p 360
>> [snip gdb noise]
>> Attaching to process 360
>> [New LWP 361]
>> [New LWP 362]
>> [New LWP 363]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported 
>> target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> Error while reading shared library symbols for /usr/lib/libpthread.so.0:
>> Cannot find user-level thread for LWP 363: generic error
>> 0x7f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) info threads
>>   Id   Target Id Frame 
>> * 1LWP 360 "io_uring"0x7f7aa526e125 in 
>> clock_nanosleep@GLIBC_2.2.5 ()
>>from /usr/lib/libc.so.6
>>   2LWP 361 "iou-mgr-360" 0x in ?? ()
>>   3LWP 362 "io_uring"0x7f7aa52a0a9d in syscall () from 
>> /usr/lib/libc.so.6
>>   4LWP 363 "iou-mgr-362" 0x in ?? ()
>> (gdb) thread 2
>> [Switching to thread 2 (LWP 361)]
>> #0  0x in ?? ()
>> (gdb) bt
>> #0  0x in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x0
>> (gdb) cont
>> Continuing.
>> ^C
>> Thread 1 "io_uring" received signal SIGINT, Interrupt.
>> [Switching to LWP 360]
>> 0x7f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) q
>> A debugging session is active.
>>
>>  Inferior 1 [process 360] will be detached.
>>
>> Quit anyway? (y or n) y
>> Detaching from program: /root/git/fio/t/io_uring, process 360
>> [Inferior 1 (process 360) detached]
>>
>> The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
>> real info out of them. But it works... Regular test cases work fine too,
>> just a sanity check. Didn't expect them not to.
> 
> I guess that's basically what I tried to describe when I said they
> should look like a userspace process that is blocked in a syscall
> forever.

Right, that's almost what they look like, in practice that is what they
look like.

>> Only thing that I dislike a bit, but I guess that's just a Linuxism, is
>> that
