Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: +void queue_spin_lock_slowpath(struct qspinlock *lock, int qsval) +{ + unsigned int cpu_nr, qn_idx; + struct qnode *node, *next; + u32 prev_qcode, my_qcode; + + /* + * Get the queue node + */ + cpu_nr = smp_processor_id(); +
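A minimal userspace model of the MCS-style queueing idea behind this slowpath: each contender takes its own node and spins on a flag in that node instead of hammering the shared lock word. Names (qnode, mcs_lock) and the C11-atomics form are simplifications for illustration, not the v5 patch code itself; the real patch additionally packs a per-CPU node index (the qcode handling visible in the excerpt), which is omitted here.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct qnode {
        _Atomic(struct qnode *) next;
        atomic_bool locked;              /* true while this waiter must spin */
    };

    struct mcs_lock {
        _Atomic(struct qnode *) tail;
    };

    static void mcs_lock_acquire(struct mcs_lock *lock, struct qnode *node)
    {
        struct qnode *prev;

        atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&node->locked, true, memory_order_relaxed);

        /* Publish ourselves as the new tail of the wait queue. */
        prev = atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
        if (prev) {
            /* There is a predecessor: link in, then spin on our own flag. */
            atomic_store_explicit(&prev->next, node, memory_order_release);
            while (atomic_load_explicit(&node->locked, memory_order_acquire))
                ;   /* cpu_relax() in the kernel */
        }
    }

    static void mcs_lock_release(struct mcs_lock *lock, struct qnode *node)
    {
        struct qnode *next = atomic_load_explicit(&node->next, memory_order_acquire);

        if (!next) {
            /* No visible successor: try to reset the tail to "empty". */
            struct qnode *expected = node;
            if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
                    NULL, memory_order_acq_rel, memory_order_acquire))
                return;
            /* A successor is mid-enqueue; wait until it sets ->next. */
            while (!(next = atomic_load_explicit(&node->next, memory_order_acquire)))
                ;
        }
        atomic_store_explicit(&next->locked, false, memory_order_release);
    }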

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock) int qlcode = atomic_read(&lock->qlcode); if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode, - qlcode, qlcode|_QSPINLOCK_LOCKED) ==
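Stripped of the diff markers, the quoted check is a plain "read, test, cmpxchg" attempt to set the lock bit. A compilable userspace restatement; the constant and the bare 32-bit lock word are assumptions taken from the excerpt, not the complete patch:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define _QSPINLOCK_LOCKED 1U

    /* Try to take the lock: succeed only if the lock bit was clear and our
     * cmpxchg is the one that sets it. */
    static bool queue_spin_setlock(_Atomic uint32_t *qlcode)
    {
        uint32_t old = atomic_load_explicit(qlcode, memory_order_relaxed);

        if (!(old & _QSPINLOCK_LOCKED) &&
            atomic_compare_exchange_strong_explicit(qlcode, &old,
                                                    old | _QSPINLOCK_LOCKED,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return true;

        return false;
    }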

Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-03-02 Thread Oleg Nesterov
Forgot to ask... On 02/26, Waiman Long wrote: +notify_next: + /* + * Wait, if needed, until the next one in queue set up the next field + */ + while (!(next = ACCESS_ONCE(node->next))) + arch_mutex_cpu_relax(); + /* + * The next one in queue is now

Re: [PATCH v5 2/8] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: +#define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS + +/* + * x86-64 specific queue spinlock union structure + */ +union arch_qspinlock { + struct qspinlock slock; + u8 lock; /* Lock bit */ +}; And this enables the optimized version of
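The union is what lets the fast path touch only the low byte: because x86-64 is little-endian and supports atomic 8/16-bit accesses (the _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS define above), unlock can become a single byte store instead of an atomic read-modify-write on the whole word. A rough sketch of that idea, assuming the bit layout implied by the excerpt:

    #include <stdatomic.h>
    #include <stdint.h>

    struct qspinlock {
        _Atomic uint32_t qlcode;        /* lock bit + queue code */
    };

    union arch_qspinlock {
        struct qspinlock slock;
        _Atomic uint8_t lock;           /* lock bit lives in the low byte (LE) */
    };

    static void queue_spin_unlock(union arch_qspinlock *aq)
    {
        /* Releasing the lock only needs to clear the lock byte; the queue
         * code in the upper bytes is left untouched. */
        atomic_store_explicit(&aq->lock, 0, memory_order_release);
    }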

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Oleg Nesterov
On 02/06, Sasha Levin wrote: Can we modify it slightly to avoid potentially accessing invalid memory: diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 5315887..cd22d73 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/10, Jeremy Fitzhardinge wrote: On 02/10/2015 05:26 AM, Oleg Nesterov wrote: On 02/10, Raghavendra K T wrote: Unfortunately xadd could result in head overflow as tail is high. The other option was repeated cmpxchg which is bad I believe. Any suggestions? Stupid question... what

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/11, Raghavendra K T wrote: On 02/10/2015 06:56 PM, Oleg Nesterov wrote: In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg the whole .head_tail. Plus obviously more boring changes. This needs a separate patch even _if_ this can work. Correct, but apart from

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Oleg Nesterov
On 02/10, Denys Vlasenko wrote: # define HEAD_MASK (TICKET_SLOWPATH_FLAG-1) ... unlock_again: val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC); if (unlikely(!(val & HEAD_MASK))) { /* overflow. we inadvertently incremented the tail word. * tail's lsb is TICKET_SLOWPATH_FLAG.
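The overflow Denys is handling can be shown with a few lines of arithmetic: when head and tail share one word and head is at its maximum, the xadd's carry spills into the tail half and flips its low bit, i.e. TICKET_SLOWPATH_FLAG. Widths and constants below are illustrative, not the real x86 layout:

    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_LOCK_INC      1u
    #define TICKET_SLOWPATH_FLAG 1u

    int main(void)
    {
        /* 8-bit head in the low byte, 8-bit tail in the high byte. */
        uint16_t head_tail = (0x02u << 8) | 0xFFu;    /* tail = 0x02, head = 0xff */

        head_tail += TICKET_LOCK_INC;                 /* what the xadd does */

        uint8_t head = head_tail & 0xFFu;
        uint8_t tail = head_tail >> 8;

        printf("head=0x%02x tail=0x%02x slowpath bit=%u\n",
               head, tail, tail & TICKET_SLOWPATH_FLAG);
        /* head wrapped to 0x00 and tail went 0x02 -> 0x03: the carry set
         * TICKET_SLOWPATH_FLAG by accident, which is what the quoted fixup
         * has to undo. */
        return 0;
    }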

Re: [PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Oleg Nesterov
On 02/09, Raghavendra K T wrote: +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ + arch_spinlock_t old, new; + __ticket_t diff; + + old.tickets = READ_ONCE(lock->tickets); + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
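The quoted helper is cut off mid-function; the idea in the hunk that is shown is: take one snapshot of the whole lock word, and if the slowpath flag is set while only the holder's own ticket is outstanding, clear the flag with a single cmpxchg on head_tail. A hedged userspace model of that shape, with assumed field widths and constants:

    #include <stdatomic.h>
    #include <stdint.h>

    #define TICKET_LOCK_INC      2u
    #define TICKET_SLOWPATH_FLAG 1u

    struct raw_tickets { uint16_t head, tail; };     /* little-endian layout */

    typedef union {
        _Atomic uint32_t head_tail;
        struct raw_tickets tickets;
    } spinlock_model_t;

    static void check_and_clear_slowpath(spinlock_model_t *lock)
    {
        union { uint32_t head_tail; struct raw_tickets tickets; } old, new;
        uint16_t diff;

        old.head_tail = atomic_load_explicit(&lock->head_tail, memory_order_relaxed);
        diff = (uint16_t)((old.tickets.tail & ~TICKET_SLOWPATH_FLAG) -
                          old.tickets.head);

        /* Exactly one ticket outstanding (the holder's own): nobody is queued,
         * so the slowpath flag can be dropped in one shot. */
        if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) && diff == TICKET_LOCK_INC) {
            new = old;
            new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
            /* A racing locker simply makes the cmpxchg fail, which is fine. */
            atomic_compare_exchange_strong_explicit(&lock->head_tail,
                                                    &old.head_tail, new.head_tail,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed);
        }
    }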

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Oleg Nesterov
On 02/10, Raghavendra K T wrote: On 02/10/2015 06:23 AM, Linus Torvalds wrote: add_smp(&lock->tickets.head, TICKET_LOCK_INC); if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. into something like val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC
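To make the suggestion above concrete: fold the head increment and the flag check into one xadd and decide from the value it returns, never touching the lock again afterwards. A simplified userspace model of that direction, not the final upstream fix:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TICKET_LOCK_INC      2u
    #define TICKET_SLOWPATH_FLAG 1u

    typedef struct {
        _Atomic uint32_t head_tail;     /* head in low 16 bits, tail in high 16 */
    } spinlock_word_model_t;

    /* Returns true when the unlocker must go kick a slowpath waiter. */
    static bool ticket_unlock(spinlock_word_model_t *lock)
    {
        /* One atomic op: bump head and get the old word back. */
        uint32_t old = atomic_fetch_add_explicit(&lock->head_tail,
                                                 TICKET_LOCK_INC,
                                                 memory_order_release);

        /* Decide from the returned snapshot only; *lock is never read again,
         * so this is safe even if the lock's memory is freed right after the
         * xadd.  (The head-into-tail carry discussed in the neighbouring
         * mails is ignored in this sketch.) */
        return (old >> 16) & TICKET_SLOWPATH_FLAG;
    }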

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-13 Thread Oleg Nesterov
On 02/13, Raghavendra K T wrote: @@ -164,7 +161,7 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock) { struct __raw_tickets tmp = READ_ONCE(lock->tickets); - return tmp.tail != tmp.head; + return tmp.tail != (tmp.head & ~TICKET_SLOWPATH_FLAG); } Well, this can

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-13 Thread Oleg Nesterov
On 02/13, Oleg Nesterov wrote: On 02/13, Raghavendra K T wrote: @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) * check again make sure it didn't become free while * we weren't looking. */ - if (ACCESS_ONCE(lock

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
Damn, sorry for noise, forgot to mention... On 02/12, Raghavendra K T wrote: +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) +{ + if (head & TICKET_SLOWPATH_FLAG) { +

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote: @@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) * We need to check unlocked in a loop, tmp.head == head * can be false positive because of overflow. */ - if

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote: @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) * check again make sure it didn't become free while * we weren't looking. */ - if (ACCESS_ONCE(lock->tickets.head) == want) { +
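The added line is truncated in the excerpt; the point of the hunk is that once the slowpath flag lives in .head, this re-check has to mask the flag out before comparing against the waiter's ticket. A standalone restatement with assumed types:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TICKET_SLOWPATH_FLAG ((uint16_t)1)

    /* "Did the lock become free while we were setting up to sleep?"  With the
     * slowpath flag stored in .head, the flag must be masked before the
     * ticket comparison, which is what the (truncated) '+' line does. */
    static bool lock_became_free(_Atomic uint16_t *head, uint16_t want)
    {
        uint16_t h = atomic_load_explicit(head, memory_order_acquire);

        return (uint16_t)(h & ~TICKET_SLOWPATH_FLAG) == want;
    }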

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/11, Jeremy Fitzhardinge wrote: On 02/11/2015 09:24 AM, Oleg Nesterov wrote: I agree, and I have to admit I am not sure I fully understand why unlock uses the locked add. Except we need a barrier to avoid the race with the enter_slowpath() users, of course. Perhaps this is the only

Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
Well, I regret I mentioned the lack of barrier after enter_slowpath ;) On 02/15, Raghavendra K T wrote: @@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile
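The worry behind "the lack of barrier after enter_slowpath" is ordering: the waiter must make its flag store globally visible before it re-reads the lock to see whether it became free, or it can miss the unlock and sleep forever. In the kernel that is set_bit() followed by smp_mb__after_atomic(); a userspace model of the same shape, with the field layout and constants as assumptions:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TICKET_SLOWPATH_FLAG 1u

    struct ticket_head_model {
        _Atomic uint16_t head;      /* holder's ticket, low bit = slowpath flag */
    };

    static bool enter_slowpath_and_recheck(struct ticket_head_model *lock,
                                           uint16_t want)
    {
        /* Set the slowpath flag.  seq_cst here stands in for the kernel's
         * set_bit() + smp_mb__after_atomic(): the flag store must not be
         * reordered after the re-read below. */
        atomic_fetch_or_explicit(&lock->head, TICKET_SLOWPATH_FLAG,
                                 memory_order_seq_cst);

        /* Check again: did the lock become free while we set the flag? */
        return (atomic_load_explicit(&lock->head, memory_order_seq_cst)
                & ~TICKET_SLOWPATH_FLAG) == want;
    }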

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
On 02/15, Raghavendra K T wrote: On 02/13/2015 09:02 PM, Oleg Nesterov wrote: @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) * check again make sure it didn't become free while * we weren't looking. */ - if (ACCESS_ONCE

Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
On 02/15, Raghavendra K T wrote: * Raghavendra K T raghavendra...@linux.vnet.ibm.com [2015-02-15 11:25:44]: Resending the V5 with smp_mb__after_atomic() change without bumping up revision Reviewed-by: Oleg Nesterov o...@redhat.com Of course, this needs the acks from maintainers. And I

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-14 Thread Oleg Nesterov
On 06/11, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Could you spell to explain why this can't work (again, in this simple case) > > ? > > > > My current (and I know, very poor) understanding is that .release() should > > roughly

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
Mike, sorry, I don't understand your email. Just in case, let me remind I know nothing about drivers/vhost/ On 05/29, michael.chris...@oracle.com wrote: > > On 5/29/23 6:19 AM, Oleg Nesterov wrote: > > On 05/27, Eric W. Biederman wrote: > >> > >> Looking forward I d

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
On 05/29, Oleg Nesterov wrote: > > Mike, sorry, I don't understand your email. > > Just in case, let me remind I know nothing about drivers/vhost/ And... this is slightly off-topic but you didn't answer my previous question and I am just curious. Do you agree that, even if we

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
On 05/27, Eric W. Biederman wrote: > > Looking forward I don't see not asking the worker threads to stop > for the coredump right now causing any problems in the future. > So I think we can use this to resolve the coredump issue I spotted. But we have almost the same problem with exec. Execing

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-31 Thread Oleg Nesterov
On 05/31, Jason Wang wrote: > > On Wed, May 31, 2023 at 3:25 PM Oleg Nesterov wrote: > > > > On 05/31, Jason Wang wrote: > > > > > > On 2023/5/23 20:15, Oleg Nesterov wrote: > > > > > > > > /* make sur

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-31 Thread Oleg Nesterov
On 05/31, Jason Wang wrote: > > On 2023/5/23 20:15, Oleg Nesterov wrote: > > > > /* make sure flag is seen after deletion */ > > smp_wmb(); > > llist_for_each_entry_safe(work, work_next, node, node) { > >
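For readers without drivers/vhost/vhost.c open, the pattern being debated is roughly the lock-free handoff below: the queueing side marks a work item pending and pushes it onto a lock-less list, and the worker detaches the whole list and clears the pending flag before running each item. This is a simplified model, not the vhost source, and where exactly the barrier has to sit is precisely what Oleg and Jason are arguing about.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct work_model {
        _Atomic(struct work_model *) next;   /* llist linkage */
        atomic_bool queued;                  /* VHOST_WORK_QUEUED analogue */
        void (*fn)(struct work_model *);
    };

    struct worker_model {
        _Atomic(struct work_model *) head;   /* llist_add() pushes here */
    };

    /* Queueing side: mark the work pending, then push it onto the list.
     * If it was already pending, whoever queued it first owns the push. */
    static void queue_work(struct worker_model *w, struct work_model *work)
    {
        if (atomic_exchange_explicit(&work->queued, true, memory_order_acq_rel))
            return;

        struct work_model *old = atomic_load_explicit(&w->head, memory_order_relaxed);
        do {
            atomic_store_explicit(&work->next, old, memory_order_relaxed);
        } while (!atomic_compare_exchange_weak_explicit(&w->head, &old, work,
                                                        memory_order_release,
                                                        memory_order_relaxed));
    }

    /* Worker side: detach everything at once, then clear "queued" before
     * running each item, so that once a flusher observes queued == false the
     * work is either running/done or will be re-queued from scratch.
     * (The real worker also reverses the list to keep submission order.) */
    static void run_work(struct worker_model *w)
    {
        struct work_model *node =
            atomic_exchange_explicit(&w->head, NULL, memory_order_acquire);

        while (node) {
            struct work_model *next =
                atomic_load_explicit(&node->next, memory_order_relaxed);

            atomic_store_explicit(&node->queued, false, memory_order_release);
            node->fn(node);
            node = next;
        }
    }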

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
Hi Mike, sorry, but somehow I can't understand this patch... I'll try to read it with a fresh head on Weekend, but for example, On 06/01, Mike Christie wrote: > > static int vhost_task_fn(void *data) > { > struct vhost_task *vtsk = data; > - int ret; > + bool dead = false; > + >

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
On 06/02, Jason Wang wrote: > > On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov wrote: > > > > and the final rewrite: > > > > if (work->node) { > > work_next = work->node->next; > > if

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-22 Thread Oleg Nesterov
On 05/22, Mike Christie wrote: > > On 5/22/23 7:30 AM, Oleg Nesterov wrote: > >> + /* > >> + * When we get a SIGKILL our release function will > >> + * be called. That will stop new IOs from being queued >

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-24 Thread Oleg Nesterov
On 05/23, Eric W. Biederman wrote: > > I want to point out that we need to consider not just SIGKILL, but > SIGABRT that causes a coredump, as well as the process peforming > an ordinary exit(2). All of which will cause get_signal to return > SIGKILL in this context. Yes, but probably

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-25 Thread Oleg Nesterov
On 05/24, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt > > vhost_worker(). > > Actually I think it reveals that exiting with SIGABRT will cause > a deadlock. > > coredump_w

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Linus Torvalds wrote: > > On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov wrote: > > > > As I said from the very beginning, this code is fine on x86 because > > atomic ops are fully serialised on x86. > > Yes. Other architectures require __smp_mb__{befo

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Eric W. Biederman wrote: > > static int vhost_task_fn(void *data) > { > struct vhost_task *vtsk = data; > - int ret; > + bool dead = false; > + > + for (;;) { > + bool did_work; > + > + if (!dead && signal_pending(current)) { > +
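The hunk is cut off above; the loop shape under discussion is: take the fatal signal at most once, remember that the task is "dead", and keep draining queued work until the owner explicitly asks the worker to stop, so in-flight I/O can finish. A compilable restatement with local stubs standing in for get_signal(), signal_pending() and the vhost callbacks; a sketch of the idea, not kernel/vhost_task.c itself:

    #include <stdbool.h>

    struct vhost_task_model {
        bool stop_requested;              /* VHOST_TASK_FLAGS_STOP analogue */
        bool (*fn)(void *data);           /* worker body, true if it did work */
        void *data;
    };

    /* Stubs standing in for the kernel's signal machinery. */
    static bool stub_signal_pending(void) { return false; }
    static bool stub_get_signal(void)     { return true;  } /* fatal signal */

    static int vhost_task_fn_model(struct vhost_task_model *vtsk)
    {
        bool dead = false;

        for (;;) {
            /* Consume a fatal signal at most once; afterwards ignore signals
             * and keep servicing work instead of exiting immediately. */
            if (!dead && stub_signal_pending())
                dead = stub_get_signal();

            if (vtsk->stop_requested)
                break;              /* owner is done; now the task may exit */

            if (!vtsk->fn(vtsk->data)) {
                /* no work: the real code sets TASK_INTERRUPTIBLE and
                 * calls schedule() here */
            }
        }
        return 0;
    }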

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/01, Mike Christie wrote: > > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1368,7 +1368,9 @@ int zap_other_threads(struct task_struct *p) > > while_each_thread(p, t) { > task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); > - count++; > + /*

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/03, michael.chris...@oracle.com wrote: > > On 6/2/23 11:15 PM, Eric W. Biederman wrote: > The problem is that as part of the flush the drivers/vhost/scsi.c code > will wait for outstanding commands, because we can't free the device and > it's resources before the commands complete or we will

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Hi Mike, > > > > sorry, but somehow I can't understand this patch... > > > > I'll try to read it with a fresh head on Weekend, but for example, > > > > On 06/01, Mike Christie wr

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-06 Thread Oleg Nesterov
On 06/06, Mike Christie wrote: > > On 6/6/23 7:16 AM, Oleg Nesterov wrote: > > On 06/05, Mike Christie wrote: > > > >> So it works like if we were using a kthread still: > >> > >> 1. Userapce thread0 opens /dev/vhost-$something. > >

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-23 Thread Oleg Nesterov
On 05/22, Oleg Nesterov wrote: > > Right now I think that "int dead" should die, No, probably we shouldn't call get_signal() if we have already dequeued SIGKILL. > but let me think tomorrow. May be something like this... I don't like it but I can't suggest anythin

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-22 Thread Oleg Nesterov
Confused, please help... On 05/21, Mike Christie wrote: > > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -338,6 +338,7 @@ static int vhost_worker(void *data) > struct vhost_worker *worker = data; > struct vhost_work *work, *work_next; > struct llist_node *node;

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-30 Thread Oleg Nesterov
On 05/30, Christian Brauner wrote: > > On Mon, May 29, 2023 at 01:19:39PM +0200, Oleg Nesterov wrote: > > > > If we want CLONE_THREAD, I think vhost_worker() should exit after > > get_signal() > > returns SIGKILL. Perhaps it should "disable" vhost_work_queu

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-01 Thread Oleg Nesterov
On 06/01, Jason Wang wrote: > > On Wed, May 31, 2023 at 5:14 PM Oleg Nesterov wrote: > > > > > > I don't understand you. OK, to simplify, suppose we have 2 global vars > > > > > > > > void *PTR = something_non_null; > > > >

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-06 Thread Oleg Nesterov
On 06/05, Mike Christie wrote: > > On 6/5/23 10:10 AM, Oleg Nesterov wrote: > > On 06/03, michael.chris...@oracle.com wrote: > >> > >> On 6/2/23 11:15 PM, Eric W. Biederman wrote: > >> The problem is that as part of the flush the drivers/vhost/scsi.c code

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-22 Thread Oleg Nesterov
On 05/18, Eric W. Biederman wrote: > > void recalc_sigpending(void) > { > - if (!recalc_sigpending_tsk(current) && !freezing(current)) > + if ((!recalc_sigpending_tsk(current) && !freezing(current)) || > + ((current->signal->flags & SIGNAL_GROUP_EXIT) && > +

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/15, Mike Christie wrote: > > Oleg and Christian, > > > Below is an updated patch that doesn't check for PF_USER_WORKER in the > signal.c code and instead will check for if we have blocked the signal. Looks like I need to read the whole series... will try tomorrow. > --- a/kernel/fork.c >

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/15, Mike Christie wrote: > > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -75,13 +75,13 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), > void *arg, >const char *name) > { > struct kernel_clone_args args = { > -

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/16, Eric W. Biederman wrote: > > A kernel thread can block SIGKILL and that is supported. > > For a thread that is part of a process you can't block SIGKILL when the > task is part of a user mode process. Or SIGSTOP. Another thread can call do_signal_stop()->signal_wake_up/etc. > There is

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-17 Thread Oleg Nesterov
On 05/16, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > >> There is this bit in complete_signal when SIGKILL is delivered to any > >> thread in the process. > >> > >>t = p; > >>do { >

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
I too do not understand the 1st change in this patch ... On 05/18, Mike Christie wrote: > > In the other patches we do: > > if (get_signal(ksig)) > start_exit_cleanup_by_stopping_newIO() > flush running IO() > exit() > > But to do the flush running IO() part of this I need to

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Mike Christie wrote: > > On 5/18/23 11:25 AM, Oleg Nesterov wrote: > > I too do not understand the 1st change in this patch ... > > > > On 05/18, Mike Christie wrote: > >> > >> In the other patches we do: > >> > >> if (ge

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Christian Brauner wrote: > > Yeah, but these are issues that exist with PF_IO_WORKER then too This was my thought too but I am starting to think I was wrong. Of course I don't understand the code in io_uring/ but it seems that it always breaks the IO loops if get_signal() returns

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Christian Brauner wrote: > > On Thu, May 18, 2023 at 08:08:10PM +0200, Oleg Nesterov wrote: > > On 05/18, Christian Brauner wrote: > > > > > > Yeah, but these are issues that exist with PF_IO_WORKER then too > > > > This was my thought