Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-07 Thread Oleg Nesterov
On 09/06, Peter Zijlstra wrote:
>
> Anyway, do you want me to take these patches through -tip?

This would be great, thanks.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-07 Thread Oleg Nesterov
On 09/06, Peter Zijlstra wrote:

 Anyway, do you want me to take these patches through -tip?

This would be great, thanks.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 20:01 +0200, Oleg Nesterov wrote:
> Ping...

Right, email backlog :-)

> Peter, do you think you can do your make-it-lockless patch (hehe, I
> think this is not possible ;) on top?

Sure, I was trying to see if I could play games with the _cancel
semantics that would satisfy the two callsites and be possible. No joy
yet though.

Anyway, do you want me to take these patches through -tip?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-06 Thread Oleg Nesterov
Ping...

Al, will you agree with these changes?

Peter, do you think you can do your make-it-lockless patch (hehe, I
think this is not possible ;) on top?

On 08/26, Oleg Nesterov wrote:
>
> On 08/24, Oleg Nesterov wrote:
> >
> > Peter, if you think it can work for you and if you agree with
> > the implementation I will be happy to send the patch.
>
> I think I should try anyway ;)
>
> To simplify the review, I attached the resulting code below.
>
> Changes:
>
>   - Comments.
>
>   - Not sure this is really better, but task_work_run()
> does not need to actually take pi_lock, unlock_wait
> is enough.
>
> However, in this case the dummy entry is better than
> the fake pointer.
>
> Oleg.
>
> #include 
> #include 
> #include 
>
> static struct callback_head work_exited; /* all we need is ->next == NULL */
>
> int
> task_work_add(struct task_struct *task, struct callback_head *work, bool 
> notify)
> {
>   struct callback_head *head;
>
>   do {
>   head = ACCESS_ONCE(task->task_works);
>   if (unlikely(head == _exited))
>   return -ESRCH;
>   work->next = head;
>   } while (cmpxchg(>task_works, head, work) != head);
>
>   if (notify)
>   set_notify_resume(task);
>   return 0;
> }
>
> struct callback_head *
> task_work_cancel(struct task_struct *task, task_work_func_t func)
> {
>   struct callback_head **pprev = >task_works;
>   struct callback_head *work = NULL;
>   unsigned long flags;
>   /*
>* If cmpxchg() fails we continue without updating pprev.
>* Either we raced with task_work_add() which added the
>* new entry before this work, we will find it again. Or
>* we raced with task_work_run(), *pprev == NULL/exited.
>*/
>   raw_spin_lock_irqsave(>pi_lock, flags);
>   while ((work = ACCESS_ONCE(*pprev))) {
>   read_barrier_depends();
>   if (work->func != func)
>   pprev = >next;
>   else if (cmpxchg(pprev, work, work->next) == work)
>   break;
>   }
>   raw_spin_unlock_irqrestore(>pi_lock, flags);
>
>   return work;
> }
>
> void task_work_run(void)
> {
>   struct task_struct *task = current;
>   struct callback_head *work, *head, *next;
>
>   for (;;) {
>   /*
>* work->func() can do task_work_add(), do not set
>* work_exited unless the list is empty.
>*/
>   do {
>   work = ACCESS_ONCE(task->task_works);
>   head = !work && (task->flags & PF_EXITING) ?
>   _exited : NULL;
>   } while (cmpxchg(>task_works, work, head) != work);
>
>   if (!work)
>   break;
>   /*
>* Synchronize with task_work_cancel(). It can't remove
>* the first entry == work, cmpxchg(task_works) should
>* fail, but it can play with *work and other entries.
>*/
>   raw_spin_unlock_wait(>pi_lock);
>   smp_mb();
>
>   /* Reverse the list to run the works in fifo order */
>   head = NULL;
>   do {
>   next = work->next;
>   work->next = head;
>   head = work;
>   work = next;
>   } while (work);
>
>   work = head;
>   do {
>   next = work->next;
>   work->func(work);
>   work = next;
>   cond_resched();
>   } while (work);
>   }
> }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-06 Thread Oleg Nesterov
Ping...

Al, will you agree with these changes?

Peter, do you think you can do your make-it-lockless patch (hehe, I
think this is not possible ;) on top?

On 08/26, Oleg Nesterov wrote:

 On 08/24, Oleg Nesterov wrote:
 
  Peter, if you think it can work for you and if you agree with
  the implementation I will be happy to send the patch.

 I think I should try anyway ;)

 To simplify the review, I attached the resulting code below.

 Changes:

   - Comments.

   - Not sure this is really better, but task_work_run()
 does not need to actually take pi_lock, unlock_wait
 is enough.

 However, in this case the dummy entry is better than
 the fake pointer.

 Oleg.

 #include linux/spinlock.h
 #include linux/task_work.h
 #include linux/tracehook.h

 static struct callback_head work_exited; /* all we need is -next == NULL */

 int
 task_work_add(struct task_struct *task, struct callback_head *work, bool 
 notify)
 {
   struct callback_head *head;

   do {
   head = ACCESS_ONCE(task-task_works);
   if (unlikely(head == work_exited))
   return -ESRCH;
   work-next = head;
   } while (cmpxchg(task-task_works, head, work) != head);

   if (notify)
   set_notify_resume(task);
   return 0;
 }

 struct callback_head *
 task_work_cancel(struct task_struct *task, task_work_func_t func)
 {
   struct callback_head **pprev = task-task_works;
   struct callback_head *work = NULL;
   unsigned long flags;
   /*
* If cmpxchg() fails we continue without updating pprev.
* Either we raced with task_work_add() which added the
* new entry before this work, we will find it again. Or
* we raced with task_work_run(), *pprev == NULL/exited.
*/
   raw_spin_lock_irqsave(task-pi_lock, flags);
   while ((work = ACCESS_ONCE(*pprev))) {
   read_barrier_depends();
   if (work-func != func)
   pprev = work-next;
   else if (cmpxchg(pprev, work, work-next) == work)
   break;
   }
   raw_spin_unlock_irqrestore(task-pi_lock, flags);

   return work;
 }

 void task_work_run(void)
 {
   struct task_struct *task = current;
   struct callback_head *work, *head, *next;

   for (;;) {
   /*
* work-func() can do task_work_add(), do not set
* work_exited unless the list is empty.
*/
   do {
   work = ACCESS_ONCE(task-task_works);
   head = !work  (task-flags  PF_EXITING) ?
   work_exited : NULL;
   } while (cmpxchg(task-task_works, work, head) != work);

   if (!work)
   break;
   /*
* Synchronize with task_work_cancel(). It can't remove
* the first entry == work, cmpxchg(task_works) should
* fail, but it can play with *work and other entries.
*/
   raw_spin_unlock_wait(task-pi_lock);
   smp_mb();

   /* Reverse the list to run the works in fifo order */
   head = NULL;
   do {
   next = work-next;
   work-next = head;
   head = work;
   work = next;
   } while (work);

   work = head;
   do {
   next = work-next;
   work-func(work);
   work = next;
   cond_resched();
   } while (work);
   }
 }

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 20:01 +0200, Oleg Nesterov wrote:
 Ping...

Right, email backlog :-)

 Peter, do you think you can do your make-it-lockless patch (hehe, I
 think this is not possible ;) on top?

Sure, I was trying to see if I could play games with the _cancel
semantics that would satisfy the two callsites and be possible. No joy
yet though.

Anyway, do you want me to take these patches through -tip?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/4] (Was: lockdep trace from posix timers)

2012-08-26 Thread Oleg Nesterov
On 08/24, Oleg Nesterov wrote:
>
> Peter, if you think it can work for you and if you agree with
> the implementation I will be happy to send the patch.

I think I should try anyway ;)

To simplify the review, I attached the resulting code below.

Changes:

- Comments.

- Not sure this is really better, but task_work_run()
  does not need to actually take pi_lock, unlock_wait
  is enough.

  However, in this case the dummy entry is better than
  the fake pointer.

Oleg.

#include 
#include 
#include 

static struct callback_head work_exited; /* all we need is ->next == NULL */

int
task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
{
struct callback_head *head;

do {
head = ACCESS_ONCE(task->task_works);
if (unlikely(head == _exited))
return -ESRCH;
work->next = head;
} while (cmpxchg(>task_works, head, work) != head);

if (notify)
set_notify_resume(task);
return 0;
}

struct callback_head *
task_work_cancel(struct task_struct *task, task_work_func_t func)
{
struct callback_head **pprev = >task_works;
struct callback_head *work = NULL;
unsigned long flags;
/*
 * If cmpxchg() fails we continue without updating pprev.
 * Either we raced with task_work_add() which added the
 * new entry before this work, we will find it again. Or
 * we raced with task_work_run(), *pprev == NULL/exited.
 */
raw_spin_lock_irqsave(>pi_lock, flags);
while ((work = ACCESS_ONCE(*pprev))) {
read_barrier_depends();
if (work->func != func)
pprev = >next;
else if (cmpxchg(pprev, work, work->next) == work)
break;
}
raw_spin_unlock_irqrestore(>pi_lock, flags);

return work;
}

void task_work_run(void)
{
struct task_struct *task = current;
struct callback_head *work, *head, *next;

for (;;) {
/*
 * work->func() can do task_work_add(), do not set
 * work_exited unless the list is empty.
 */
do {
work = ACCESS_ONCE(task->task_works);
head = !work && (task->flags & PF_EXITING) ?
_exited : NULL;
} while (cmpxchg(>task_works, work, head) != work);

if (!work)
break;
/*
 * Synchronize with task_work_cancel(). It can't remove
 * the first entry == work, cmpxchg(task_works) should
 * fail, but it can play with *work and other entries.
 */
raw_spin_unlock_wait(>pi_lock);
smp_mb();

/* Reverse the list to run the works in fifo order */
head = NULL;
do {
next = work->next;
work->next = head;
head = work;
work = next;
} while (work);

work = head;
do {
next = work->next;
work->func(work);
work = next;
cond_resched();
} while (work);
}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/4] (Was: lockdep trace from posix timers)

2012-08-26 Thread Oleg Nesterov
On 08/24, Oleg Nesterov wrote:

 Peter, if you think it can work for you and if you agree with
 the implementation I will be happy to send the patch.

I think I should try anyway ;)

To simplify the review, I attached the resulting code below.

Changes:

- Comments.

- Not sure this is really better, but task_work_run()
  does not need to actually take pi_lock, unlock_wait
  is enough.

  However, in this case the dummy entry is better than
  the fake pointer.

Oleg.

#include linux/spinlock.h
#include linux/task_work.h
#include linux/tracehook.h

static struct callback_head work_exited; /* all we need is -next == NULL */

int
task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
{
struct callback_head *head;

do {
head = ACCESS_ONCE(task-task_works);
if (unlikely(head == work_exited))
return -ESRCH;
work-next = head;
} while (cmpxchg(task-task_works, head, work) != head);

if (notify)
set_notify_resume(task);
return 0;
}

struct callback_head *
task_work_cancel(struct task_struct *task, task_work_func_t func)
{
struct callback_head **pprev = task-task_works;
struct callback_head *work = NULL;
unsigned long flags;
/*
 * If cmpxchg() fails we continue without updating pprev.
 * Either we raced with task_work_add() which added the
 * new entry before this work, we will find it again. Or
 * we raced with task_work_run(), *pprev == NULL/exited.
 */
raw_spin_lock_irqsave(task-pi_lock, flags);
while ((work = ACCESS_ONCE(*pprev))) {
read_barrier_depends();
if (work-func != func)
pprev = work-next;
else if (cmpxchg(pprev, work, work-next) == work)
break;
}
raw_spin_unlock_irqrestore(task-pi_lock, flags);

return work;
}

void task_work_run(void)
{
struct task_struct *task = current;
struct callback_head *work, *head, *next;

for (;;) {
/*
 * work-func() can do task_work_add(), do not set
 * work_exited unless the list is empty.
 */
do {
work = ACCESS_ONCE(task-task_works);
head = !work  (task-flags  PF_EXITING) ?
work_exited : NULL;
} while (cmpxchg(task-task_works, work, head) != work);

if (!work)
break;
/*
 * Synchronize with task_work_cancel(). It can't remove
 * the first entry == work, cmpxchg(task_works) should
 * fail, but it can play with *work and other entries.
 */
raw_spin_unlock_wait(task-pi_lock);
smp_mb();

/* Reverse the list to run the works in fifo order */
head = NULL;
do {
next = work-next;
work-next = head;
head = work;
work = next;
} while (work);

work = head;
do {
next = work-next;
work-func(work);
work = next;
cond_resched();
} while (work);
}
}

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/