Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-21 Thread Will Deacon
On Tue, Jul 12, 2011 at 10:31:00AM +0100, Peter Zijlstra wrote:
 On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:
  On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
   
 The guarantee is that the task was sleeping just before the function is
 called.  Of course it's woken up to run the function.
   
 The idea is that you run the function in a known safe point to avoid
 extra synchronization.
   
  
   I'd much rather we didn't wake the task and let it sleep, that's usually
   a very safe place for tasks to be. All you'd need is a guarantee it
   won't be woken up while you're doing your thing.
  
  But it means that 'current' is not set to the right value.  If the 
  function depends on it, then it will misbehave.  And in fact 
  preempt_notifier_register(), which is the function we want to call here, 
  does depend on current.
  
  Of course we need to find more users for this, but I have a feeling this 
  will be generally useful.  The alternative is to keep adding bits to 
  thread_info::flags.
 
 Using TIF_bits sounds like a much better solution for this, wakeups are
 really rather expensive and its best to avoid extra if at all possible.

The problem with using a TIF bit to tell a task that it needs to perform
some preempt_notifier registrations is that you end up with something that
looks a lot like preempt notifiers! You also don't escape the concurrent
read/write to thelist of pending registrations.

One thing I tried was simply using an RCU protected hlist for the preempt
notifiers so that we don't have to worry about atomicity when reading the
notifiers in finish_task_switch. It's a bit odd, since we know we only ever
have a single reader, but I've included it below anyway.

If anybody has any better ideas, I'm all ears.

Will


diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 2e681d9..2e21ffe 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -132,6 +132,11 @@ struct preempt_notifier {
 void preempt_notifier_register(struct preempt_notifier *notifier);
 void preempt_notifier_unregister(struct preempt_notifier *notifier);
 
+void preempt_notifier_register_task(struct preempt_notifier *notifier,
+   struct task_struct *tsk);
+void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
+ struct task_struct *tsk);
+
 static inline void preempt_notifier_init(struct preempt_notifier *notifier,
 struct preempt_ops *ops)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..5530d91 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1233,6 +1233,7 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_NOTIFIERS
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
+   struct mutex preempt_notifiers_mutex;
 #endif
 
/*
diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..d3c46ca 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2784,6 +2784,7 @@ static void __sched_fork(struct task_struct *p)
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(p-preempt_notifiers);
+   mutex_init(p-preempt_notifiers_mutex);
 #endif
 }
 
@@ -2901,13 +2902,31 @@ void wake_up_new_task(struct task_struct *p)
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 
+void preempt_notifier_register_task(struct preempt_notifier *notifier,
+   struct task_struct *tsk)
+{
+   mutex_lock(tsk-preempt_notifiers_mutex);
+   hlist_add_head_rcu(notifier-link, tsk-preempt_notifiers);
+   mutex_unlock(tsk-preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
+
+void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
+ struct task_struct *tsk)
+{
+   mutex_lock(tsk-preempt_notifiers_mutex);
+   hlist_del_rcu(notifier-link);
+   mutex_unlock(tsk-preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
+
 /**
  * preempt_notifier_register - tell me when current is being preempted  
rescheduled
  * @notifier: notifier struct to register
  */
 void preempt_notifier_register(struct preempt_notifier *notifier)
 {
-   hlist_add_head(notifier-link, current-preempt_notifiers);
+   preempt_notifier_register_task(notifier, current);
 }
 EXPORT_SYMBOL_GPL(preempt_notifier_register);
 
@@ -2919,7 +2938,7 @@ EXPORT_SYMBOL_GPL(preempt_notifier_register);
  */
 void preempt_notifier_unregister(struct preempt_notifier *notifier)
 {
-   hlist_del(notifier-link);
+   preempt_notifier_unregister_task(notifier, current);
 }
 EXPORT_SYMBOL_GPL(preempt_notifier_unregister);
 
@@ -2928,8 +2947,12 @@ static void fire_sched_in_preempt_notifiers(struct 
task_struct *curr)
struct preempt_notifier *notifier;
struct hlist_node *node;
 
-   hlist_for_each_entry(notifier, node, curr-preempt_notifiers, link)
+   

Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-21 Thread Avi Kivity

On 07/21/2011 06:32 PM, Will Deacon wrote:


  Using TIF_bits sounds like a much better solution for this, wakeups are
  really rather expensive and its best to avoid extra if at all possible.

The problem with using a TIF bit to tell a task that it needs to perform
some preempt_notifier registrations is that you end up with something that
looks a lot like preempt notifiers! You also don't escape the concurrent
read/write to thelist of pending registrations.

One thing I tried was simply using an RCU protected hlist for the preempt
notifiers so that we don't have to worry about atomicity when reading the
notifiers in finish_task_switch. It's a bit odd, since we know we only ever
have a single reader, but I've included it below anyway.

If anybody has any better ideas, I'm all ears.



+void preempt_notifier_register_task(struct preempt_notifier *notifier,
+   struct task_struct *tsk)
+{
+   mutex_lock(tsk-preempt_notifiers_mutex);
+   hlist_add_head_rcu(notifier-link,tsk-preempt_notifiers);
+   mutex_unlock(tsk-preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
+
+void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
+ struct task_struct *tsk)
+{
+   mutex_lock(tsk-preempt_notifiers_mutex);
+   hlist_del_rcu(notifier-link);
+   mutex_unlock(tsk-preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
+
  /**
   * preempt_notifier_register - tell me when current is being preempted  
rescheduled
   * @notifier: notifier struct to register
   */
  void preempt_notifier_register(struct preempt_notifier *notifier)
  {
-   hlist_add_head(notifier-link,current-preempt_notifiers);
+   preempt_notifier_register_task(notifier, current);
  }
  EXPORT_SYMBOL_GPL(preempt_notifier_register);


This is (and must be) called from a preempt disabled context, no mutexes 
around here.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-21 Thread Will Deacon
On Thu, Jul 21, 2011 at 04:36:43PM +0100, Avi Kivity wrote:
 On 07/21/2011 06:32 PM, Will Deacon wrote:
  
Using TIF_bits sounds like a much better solution for this, wakeups are
really rather expensive and its best to avoid extra if at all possible.
 
  The problem with using a TIF bit to tell a task that it needs to perform
  some preempt_notifier registrations is that you end up with something that
  looks a lot like preempt notifiers! You also don't escape the concurrent
  read/write to thelist of pending registrations.
 
  One thing I tried was simply using an RCU protected hlist for the preempt
  notifiers so that we don't have to worry about atomicity when reading the
  notifiers in finish_task_switch. It's a bit odd, since we know we only ever
  have a single reader, but I've included it below anyway.
 
  If anybody has any better ideas, I'm all ears.
 
  +void preempt_notifier_register_task(struct preempt_notifier *notifier,
  +   struct task_struct *tsk)
  +{
  +   mutex_lock(tsk-preempt_notifiers_mutex);
  +   hlist_add_head_rcu(notifier-link,tsk-preempt_notifiers);
  +   mutex_unlock(tsk-preempt_notifiers_mutex);
  +}
  +EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
  +
  +void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
  + struct task_struct *tsk)
  +{
  +   mutex_lock(tsk-preempt_notifiers_mutex);
  +   hlist_del_rcu(notifier-link);
  +   mutex_unlock(tsk-preempt_notifiers_mutex);
  +}
  +EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
  +
/**
 * preempt_notifier_register - tell me when current is being preempted  
  rescheduled
 * @notifier: notifier struct to register
 */
void preempt_notifier_register(struct preempt_notifier *notifier)
{
  -   hlist_add_head(notifier-link,current-preempt_notifiers);
  +   preempt_notifier_register_task(notifier, current);
}
EXPORT_SYMBOL_GPL(preempt_notifier_register);
 
 This is (and must be) called from a preempt disabled context, no mutexes 
 around here.

Bah, yes, that is essential if you're dealing with current. Maybe use a
spinlock instead?

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-21 Thread Avi Kivity

On 07/21/2011 06:46 PM, Will Deacon wrote:


  This is (and must be) called from a preempt disabled context, no mutexes
  around here.

Bah, yes, that is essential if you're dealing with current. Maybe use a
spinlock instead?


Could work.  Not thrilled about adding it to the kvm hot path, but I 
can't say it will make a measurable impact.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-21 Thread Will Deacon
On Thu, Jul 21, 2011 at 04:59:00PM +0100, Avi Kivity wrote:
 On 07/21/2011 06:46 PM, Will Deacon wrote:
  
This is (and must be) called from a preempt disabled context, no mutexes
around here.
 
  Bah, yes, that is essential if you're dealing with current. Maybe use a
  spinlock instead?
 
 Could work.  Not thrilled about adding it to the kvm hot path, but I 
 can't say it will make a measurable impact.

Understood, but at least it doesn't contribute to finish_task_switch. I also
wouldn't expect to have multiple preempt registrations in parallel for the
same task so that lock should rarely (if ever) be contended.

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:07 AM, Will Deacon wrote:

On Mon, Jul 04, 2011 at 03:36:57PM +0100, Frederic Weisbecker wrote:
  On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
Another thing I would like to do in the even longer term is to not use 
perf anymore
for ptrace breakpoints, because that involves a heavy dependency and few 
people are
happy with that. Instead we should just have a generic hook into the 
sched_switch()
and handle pure ptrace breakpoints there. The central breakpoint API 
would still be
there to reserve/schedule breakpoint resources between ptrace and perf.

  
'struct preempt_notifier' may be the hook you're looking for.

  Yeah looks like a perfect fit as it's per task.

I had a quick look at this and I think the preempt_notifier stuff needs
slightly extending so that we can register a notifier for a task other than
current [e.g. the child of current on which we are installing breakpoints].

If the task in question is running, it looks like this will introduce a race
condition between notifier registration and rescheduling. For the purposes
of ptrace this shouldn't be a problem as the child will be stopped, but
others might also want to make use of the new functionality.

Any ideas on how this could be achieved, or am I better off just restricting
this to children that are being traced?


Maybe we need a generic run this function in this task's context 
mechanism instead.  Like an IPI, but targeting tasks instead of cpus.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Peter Zijlstra
On Tue, 2011-07-12 at 10:20 +0300, Avi Kivity wrote:

 Maybe we need a generic run this function in this task's context 
 mechanism instead.  Like an IPI, but targeting tasks instead of cpus.
 

kernel/event/core.c:task_function_call() like?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 11:38 AM, Peter Zijlstra wrote:

On Tue, 2011-07-12 at 10:20 +0300, Avi Kivity wrote:

  Maybe we need a generic run this function in this task's context
  mechanism instead.  Like an IPI, but targeting tasks instead of cpus.


kernel/event/core.c:task_function_call() like?


Similar, but with stronger guarantees: when the function is called, 
current == p, and the task was either sleeping or in userspace.


This can be used to reduce synchronization requirements between tasks.  
For example, you can set a task-local bit flag with this, without 
atomics.  If this is rare enough, it's a net win compared to adding atomics.


aio completions might also make use of this.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Peter Zijlstra
On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
 Similar, but with stronger guarantees: when the function is called, 
 current == p, and the task was either sleeping or in userspace. 

If the task is sleeping, current can never be p.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:14 PM, Peter Zijlstra wrote:

On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
  Similar, but with stronger guarantees: when the function is called,
  current == p, and the task was either sleeping or in userspace.

If the task is sleeping, current can never be p.


The guarantee is that the task was sleeping just before the function is 
called.  Of course it's woken up to run the function.


The idea is that you run the function in a known safe point to avoid 
extra synchronization.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:18 PM, Peter Zijlstra wrote:


  The guarantee is that the task was sleeping just before the function is
  called.  Of course it's woken up to run the function.

  The idea is that you run the function in a known safe point to avoid
  extra synchronization.


I'd much rather we didn't wake the task and let it sleep, that's usually
a very safe place for tasks to be. All you'd need is a guarantee it
won't be woken up while you're doing your thing.


But it means that 'current' is not set to the right value.  If the 
function depends on it, then it will misbehave.  And in fact 
preempt_notifier_register(), which is the function we want to call here, 
does depend on current.


Of course we need to find more users for this, but I have a feeling this 
will be generally useful.  The alternative is to keep adding bits to 
thread_info::flags.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Peter Zijlstra
On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:
 On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
  
The guarantee is that the task was sleeping just before the function is
called.  Of course it's woken up to run the function.
  
The idea is that you run the function in a known safe point to avoid
extra synchronization.
  
 
  I'd much rather we didn't wake the task and let it sleep, that's usually
  a very safe place for tasks to be. All you'd need is a guarantee it
  won't be woken up while you're doing your thing.
 
 But it means that 'current' is not set to the right value.  If the 
 function depends on it, then it will misbehave.  And in fact 
 preempt_notifier_register(), which is the function we want to call here, 
 does depend on current.
 
 Of course we need to find more users for this, but I have a feeling this 
 will be generally useful.  The alternative is to keep adding bits to 
 thread_info::flags.

Using TIF_bits sounds like a much better solution for this, wakeups are
really rather expensive and its best to avoid extra if at all possible.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:31 PM, Peter Zijlstra wrote:


  But it means that 'current' is not set to the right value.  If the
  function depends on it, then it will misbehave.  And in fact
  preempt_notifier_register(), which is the function we want to call here,
  does depend on current.

  Of course we need to find more users for this, but I have a feeling this
  will be generally useful.  The alternative is to keep adding bits to
  thread_info::flags.

Using TIF_bits sounds like a much better solution for this, wakeups are
really rather expensive and its best to avoid extra if at all possible.



We do need a way to make the task look at them.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Joerg Roedel
On Tue, Jul 12, 2011 at 11:31:00AM +0200, Peter Zijlstra wrote:
 On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:

  But it means that 'current' is not set to the right value.  If the 
  function depends on it, then it will misbehave.  And in fact 
  preempt_notifier_register(), which is the function we want to call here, 
  does depend on current.
  
  Of course we need to find more users for this, but I have a feeling this 
  will be generally useful.  The alternative is to keep adding bits to 
  thread_info::flags.
 
 Using TIF_bits sounds like a much better solution for this, wakeups are
 really rather expensive and its best to avoid extra if at all possible.

I would rather vote for Avi's solution too. Such a functionality is very
helpful for LWP-perf integration as well (because we need a way to call
do_mmap for a task != current).

Regards,

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Will Deacon
On Tue, Jul 12, 2011 at 10:36:05AM +0100, Avi Kivity wrote:
 On 07/12/2011 12:31 PM, Peter Zijlstra wrote:
  
But it means that 'current' is not set to the right value.  If the
function depends on it, then it will misbehave.  And in fact
preempt_notifier_register(), which is the function we want to call here,
does depend on current.
  
Of course we need to find more users for this, but I have a feeling this
will be generally useful.  The alternative is to keep adding bits to
thread_info::flags.
 
  Using TIF_bits sounds like a much better solution for this, wakeups are
  really rather expensive and its best to avoid extra if at all possible.
 
 
 We do need a way to make the task look at them.

... especially without pushing this down into the arch code where I don't
think it belongs.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:41 PM, Joerg Roedel wrote:

  Using TIF_bits sounds like a much better solution for this, wakeups are
  really rather expensive and its best to avoid extra if at all possible.

I would rather vote for Avi's solution too. Such a functionality is very
helpful for LWP-perf integration as well (because we need a way to call
do_mmap for a task != current).



It's not needed for that.  See use_mm() (caller must be a kernel thread).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Joerg Roedel
On Tue, Jul 12, 2011 at 12:44:17PM +0300, Avi Kivity wrote:
 On 07/12/2011 12:41 PM, Joerg Roedel wrote:
   Using TIF_bits sounds like a much better solution for this, wakeups are
   really rather expensive and its best to avoid extra if at all possible.

 I would rather vote for Avi's solution too. Such a functionality is very
 helpful for LWP-perf integration as well (because we need a way to call
 do_mmap for a task != current).


 It's not needed for that.  See use_mm() (caller must be a kernel thread).

But in our case the caller would be the process that wants to count, not
a kernel thread.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Peter Zijlstra
On Tue, 2011-07-12 at 12:16 +0300, Avi Kivity wrote:
 On 07/12/2011 12:14 PM, Peter Zijlstra wrote:
  On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
Similar, but with stronger guarantees: when the function is called,
current == p, and the task was either sleeping or in userspace.
 
  If the task is sleeping, current can never be p.
 
 The guarantee is that the task was sleeping just before the function is 
 called.  Of course it's woken up to run the function.
 
 The idea is that you run the function in a known safe point to avoid 
 extra synchronization.
 

I'd much rather we didn't wake the task and let it sleep, that's usually
a very safe place for tasks to be. All you'd need is a guarantee it
won't be woken up while you're doing your thing.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 12:48 PM, Joerg Roedel wrote:

On Tue, Jul 12, 2011 at 12:44:17PM +0300, Avi Kivity wrote:
  On 07/12/2011 12:41 PM, Joerg Roedel wrote:
 Using TIF_bits sounds like a much better solution for this, wakeups are
 really rather expensive and its best to avoid extra if at all possible.

  I would rather vote for Avi's solution too. Such a functionality is very
  helpful for LWP-perf integration as well (because we need a way to call
  do_mmap for a task != current).


  It's not needed for that.  See use_mm() (caller must be a kernel thread).

But in our case the caller would be the process that wants to count, not
a kernel thread.



Have a helper kernel thread do it for you.  Or extend use_mm() to return 
the old mm (without dropping its refcount) and add a way to restore it.


Regarding LWP - I thought the intent was self-profiling by the process 
for jits and the like?  If you also use it for perf, won't it be 
unusable for that?  Also, can't the process interfere, from userspace, 
by executing the unprivileged LWP instructions?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Joerg Roedel
On Tue, Jul 12, 2011 at 12:55:25PM +0300, Avi Kivity wrote:

 Have a helper kernel thread do it for you.  Or extend use_mm() to return
 the old mm (without dropping its refcount) and add a way to restore it.

Making use_mm usable for regular tasks too sounds like a good idea.
Thanks.

 Regarding LWP - I thought the intent was self-profiling by the process  
 for jits and the like?  If you also use it for perf, won't it be  
 unusable for that?  Also, can't the process interfere, from userspace,  
 by executing the unprivileged LWP instructions?

Ingo made perf-integration a merge-requirement for LWP. It is not really
well-suited for being integrated into perf because the design goal was
easy and efficient self-profiling of tasks (like you stated). So
integrating it into perf causes some pain. But lets see how it works
out.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 01:03 PM, Joerg Roedel wrote:

  Regarding LWP - I thought the intent was self-profiling by the process
  for jits and the like?  If you also use it for perf, won't it be
  unusable for that?  Also, can't the process interfere, from userspace,
  by executing the unprivileged LWP instructions?

Ingo made perf-integration a merge-requirement for LWP. It is not really
well-suited for being integrated into perf because the design goal was
easy and efficient self-profiling of tasks (like you stated). So
integrating it into perf causes some pain. But lets see how it works
out.


I don't think it's workable.  Having do_mmap() called in the task's 
context can change how it works.  And the task being able to kill/modify 
the profile, and not able to use LWP for itself, is a show stopper IMO.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Joerg Roedel
On Tue, Jul 12, 2011 at 01:07:25PM +0300, Avi Kivity wrote:
 On 07/12/2011 01:03 PM, Joerg Roedel wrote:

 Ingo made perf-integration a merge-requirement for LWP. It is not really
 well-suited for being integrated into perf because the design goal was
 easy and efficient self-profiling of tasks (like you stated). So
 integrating it into perf causes some pain. But lets see how it works
 out.

 I don't think it's workable.  Having do_mmap() called in the task's  
 context can change how it works.

According to Ingo that is already done at other places in the kernel and
should not be an issue.

 And the task being able to kill/modify  the profile, and not able to
 use LWP for itself, is a show stopper IMO.

I agree, that is a problem we have no solution for so far.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-12 Thread Avi Kivity

On 07/12/2011 01:24 PM, Joerg Roedel wrote:

On Tue, Jul 12, 2011 at 01:07:25PM +0300, Avi Kivity wrote:
  On 07/12/2011 01:03 PM, Joerg Roedel wrote:

  Ingo made perf-integration a merge-requirement for LWP. It is not really
  well-suited for being integrated into perf because the design goal was
  easy and efficient self-profiling of tasks (like you stated). So
  integrating it into perf causes some pain. But lets see how it works
  out.

  I don't think it's workable.  Having do_mmap() called in the task's
  context can change how it works.

According to Ingo that is already done at other places in the kernel and
should not be an issue.


kvm is one of those places, unfortunately.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-11 Thread Will Deacon
On Mon, Jul 04, 2011 at 03:36:57PM +0100, Frederic Weisbecker wrote:
 On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
  On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
  Another thing I would like to do in the even longer term is to not use 
  perf anymore
  for ptrace breakpoints, because that involves a heavy dependency and few 
  people are
  happy with that. Instead we should just have a generic hook into the 
  sched_switch()
  and handle pure ptrace breakpoints there. The central breakpoint API would 
  still be
  there to reserve/schedule breakpoint resources between ptrace and perf.
  
  
  'struct preempt_notifier' may be the hook you're looking for.
 
 Yeah looks like a perfect fit as it's per task.

I had a quick look at this and I think the preempt_notifier stuff needs
slightly extending so that we can register a notifier for a task other than
current [e.g. the child of current on which we are installing breakpoints].

If the task in question is running, it looks like this will introduce a race
condition between notifier registration and rescheduling. For the purposes
of ptrace this shouldn't be a problem as the child will be stopped, but
others might also want to make use of the new functionality.

Any ideas on how this could be achieved, or am I better off just restricting
this to children that are being traced?

Cheers,

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-05 Thread Will Deacon
Hi Frederic,

On Mon, Jul 04, 2011 at 02:58:24PM +0100, Frederic Weisbecker wrote:
 On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
  Hi Frederic,
  
  Thanks for including me on CC.
  
  On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
   On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure.  This is ugly and doesn't scale if a
single callback services many perf_events.
   
Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the 
perf_event.
The field can be accessed from the callback as 
event-overflow_handler_context.
All callers are updated.
   
Signed-off-by: Avi Kivity a...@redhat.com
   
   I believe it can micro-optimize ptrace through 
   register_user_hw_breakpoint() because
   we could store the index of the breakpoint that way, instead of iterating 
   through 4 slots.
   
   Perhaps it can help in arm too, adding Will in Cc.
  
  Yes, we could store the breakpoint index in there and it would save us
  walking over the breakpoints when one fires. Not sure this helps us for
  anything else though. My main gripe with the ptrace interface to
  hw_breakpoints is that we have to convert all the breakpoint information
  from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
  in the hw_breakpoint code. Yuck!
 
 Agreed, I don't like that either.
 
 Would you like to improve that? We probably need to be able to pass some arch 
 data
 through the whole call of breakpoint creation, including 
 perf_event_create_kernel_counter().

Sure, I'll make some time to look at this and try and get an RFC out in the
next few weeks.

 There can be a transition step where we can either take generic attr or arch 
 datas, until
 every archs are converted. So that you can handle the arm part and other arch 
 developers
 can relay.

Yup.

 
 Another thing I would like to do in the even longer term is to not use perf 
 anymore
 for ptrace breakpoints, because that involves a heavy dependency and few 
 people are
 happy with that. Instead we should just have a generic hook into the 
 sched_switch()
 and handle pure ptrace breakpoints there. The central breakpoint API would 
 still be
 there to reserve/schedule breakpoint resources between ptrace and perf.
 
 But beeing able to create ptrace breakpoints without converting to generic 
 perf attr
 is a necessary first step in order to achieve this.

Agreed, but I'll bear that in mind so I don't make it any more difficult
than it already is!

Cheers,

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-05 Thread Frederic Weisbecker
On Tue, Jul 05, 2011 at 03:30:26PM +0100, Will Deacon wrote:
 Hi Frederic,
 
 On Mon, Jul 04, 2011 at 02:58:24PM +0100, Frederic Weisbecker wrote:
  On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
   Hi Frederic,
   
   Thanks for including me on CC.
   
   On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
 The perf_event overflow handler does not receive any caller-derived
 argument, so many callers need to resort to looking up the perf_event
 in their local data structure.  This is ugly and doesn't scale if a
 single callback services many perf_events.

 Fix by adding a context parameter to 
 perf_event_create_kernel_counter()
 (and derived hardware breakpoints APIs) and storing it in the 
 perf_event.
 The field can be accessed from the callback as 
 event-overflow_handler_context.
 All callers are updated.

 Signed-off-by: Avi Kivity a...@redhat.com

I believe it can micro-optimize ptrace through 
register_user_hw_breakpoint() because
we could store the index of the breakpoint that way, instead of 
iterating through 4 slots.

Perhaps it can help in arm too, adding Will in Cc.
   
   Yes, we could store the breakpoint index in there and it would save us
   walking over the breakpoints when one fires. Not sure this helps us for
   anything else though. My main gripe with the ptrace interface to
   hw_breakpoints is that we have to convert all the breakpoint information
   from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back 
   again
   in the hw_breakpoint code. Yuck!
  
  Agreed, I don't like that either.
  
  Would you like to improve that? We probably need to be able to pass some 
  arch data
  through the whole call of breakpoint creation, including 
  perf_event_create_kernel_counter().
 
 Sure, I'll make some time to look at this and try and get an RFC out in the
 next few weeks.

Great! Thanks a lot!

 
  There can be a transition step where we can either take generic attr or 
  arch datas, until
  every archs are converted. So that you can handle the arm part and other 
  arch developers
  can relay.
 
 Yup.
 
  
  Another thing I would like to do in the even longer term is to not use perf 
  anymore
  for ptrace breakpoints, because that involves a heavy dependency and few 
  people are
  happy with that. Instead we should just have a generic hook into the 
  sched_switch()
  and handle pure ptrace breakpoints there. The central breakpoint API would 
  still be
  there to reserve/schedule breakpoint resources between ptrace and perf.
  
  But beeing able to create ptrace breakpoints without converting to generic 
  perf attr
  is a necessary first step in order to achieve this.
 
 Agreed, but I'll bear that in mind so I don't make it any more difficult
 than it already is!
 
 Cheers,
 
 Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-04 Thread Frederic Weisbecker
On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
 Hi Frederic,
 
 Thanks for including me on CC.
 
 On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
  On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
   The perf_event overflow handler does not receive any caller-derived
   argument, so many callers need to resort to looking up the perf_event
   in their local data structure.  This is ugly and doesn't scale if a
   single callback services many perf_events.
  
   Fix by adding a context parameter to perf_event_create_kernel_counter()
   (and derived hardware breakpoints APIs) and storing it in the perf_event.
   The field can be accessed from the callback as 
   event-overflow_handler_context.
   All callers are updated.
  
   Signed-off-by: Avi Kivity a...@redhat.com
  
  I believe it can micro-optimize ptrace through 
  register_user_hw_breakpoint() because
  we could store the index of the breakpoint that way, instead of iterating 
  through 4 slots.
  
  Perhaps it can help in arm too, adding Will in Cc.
 
 Yes, we could store the breakpoint index in there and it would save us
 walking over the breakpoints when one fires. Not sure this helps us for
 anything else though. My main gripe with the ptrace interface to
 hw_breakpoints is that we have to convert all the breakpoint information
 from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
 in the hw_breakpoint code. Yuck!

Agreed, I don't like that either.

Would you like to improve that? We probably need to be able to pass some arch 
data
through the whole call of breakpoint creation, including 
perf_event_create_kernel_counter().

There can be a transition step where we can either take generic attr or arch 
datas, until
every archs are converted. So that you can handle the arm part and other arch 
developers
can relay.


Another thing I would like to do in the even longer term is to not use perf 
anymore
for ptrace breakpoints, because that involves a heavy dependency and few people 
are
happy with that. Instead we should just have a generic hook into the 
sched_switch()
and handle pure ptrace breakpoints there. The central breakpoint API would 
still be
there to reserve/schedule breakpoint resources between ptrace and perf.

But beeing able to create ptrace breakpoints without converting to generic perf 
attr
is a necessary first step in order to achieve this.

Thanks. 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-04 Thread Avi Kivity

On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:

Another thing I would like to do in the even longer term is to not use perf 
anymore
for ptrace breakpoints, because that involves a heavy dependency and few people 
are
happy with that. Instead we should just have a generic hook into the 
sched_switch()
and handle pure ptrace breakpoints there. The central breakpoint API would 
still be
there to reserve/schedule breakpoint resources between ptrace and perf.



'struct preempt_notifier' may be the hook you're looking for.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-07-04 Thread Frederic Weisbecker
On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
 On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
 Another thing I would like to do in the even longer term is to not use perf 
 anymore
 for ptrace breakpoints, because that involves a heavy dependency and few 
 people are
 happy with that. Instead we should just have a generic hook into the 
 sched_switch()
 and handle pure ptrace breakpoints there. The central breakpoint API would 
 still be
 there to reserve/schedule breakpoint resources between ptrace and perf.
 
 
 'struct preempt_notifier' may be the hook you're looking for.

Yeah looks like a perfect fit as it's per task.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] perf: add context field to perf_event

2011-06-29 Thread Avi Kivity
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure.  This is ugly and doesn't scale if a
single callback services many perf_events.

Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event-overflow_handler_context.
All callers are updated.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/arm/kernel/ptrace.c|3 ++-
 arch/powerpc/kernel/ptrace.c|2 +-
 arch/sh/kernel/ptrace_32.c  |3 ++-
 arch/x86/kernel/kgdb.c  |2 +-
 arch/x86/kernel/ptrace.c|3 ++-
 drivers/oprofile/oprofile_perf.c|2 +-
 include/linux/hw_breakpoint.h   |   10 --
 include/linux/perf_event.h  |4 +++-
 kernel/events/core.c|   21 +++--
 kernel/events/hw_breakpoint.c   |   10 +++---
 kernel/watchdog.c   |2 +-
 samples/hw_breakpoint/data_breakpoint.c |2 +-
 12 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 9726006..4911c94 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -479,7 +479,8 @@ static struct perf_event *ptrace_hbp_create(struct 
task_struct *tsk, int type)
attr.bp_type= type;
attr.disabled   = 1;
 
-   return register_user_hw_breakpoint(attr, ptrace_hbptriggered, tsk);
+   return register_user_hw_breakpoint(attr, ptrace_hbptriggered, NULL,
+  tsk);
 }
 
 static int ptrace_gethbpregs(struct task_struct *tsk, long num,
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index cb22024..5249308 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -973,7 +973,7 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned 
long addr,
attr.bp_type);
 
thread-ptrace_bps[0] = bp = register_user_hw_breakpoint(attr,
-   ptrace_triggered, task);
+  ptrace_triggered, NULL, task);
if (IS_ERR(bp)) {
thread-ptrace_bps[0] = NULL;
ptrace_put_breakpoints(task);
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 3d7b209..930312f 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -91,7 +91,8 @@ static int set_single_step(struct task_struct *tsk, unsigned 
long addr)
attr.bp_len = HW_BREAKPOINT_LEN_2;
attr.bp_type = HW_BREAKPOINT_R;
 
-   bp = register_user_hw_breakpoint(attr, ptrace_triggered, tsk);
+   bp = register_user_hw_breakpoint(attr, ptrace_triggered,
+NULL, tsk);
if (IS_ERR(bp))
return PTR_ERR(bp);
 
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5f9ecff..473ab53 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -638,7 +638,7 @@ void kgdb_arch_late(void)
for (i = 0; i  HBP_NUM; i++) {
if (breakinfo[i].pev)
continue;
-   breakinfo[i].pev = register_wide_hw_breakpoint(attr, NULL);
+   breakinfo[i].pev = register_wide_hw_breakpoint(attr, NULL, 
NULL);
if (IS_ERR((void * __force)breakinfo[i].pev)) {
printk(KERN_ERR kgdb: Could not allocate hw
   breakpoints\nDisabling the kernel debugger\n);
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 807c2a2..28092ae 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -715,7 +715,8 @@ static int ptrace_set_breakpoint_addr(struct task_struct 
*tsk, int nr,
attr.bp_type = HW_BREAKPOINT_W;
attr.disabled = 1;
 
-   bp = register_user_hw_breakpoint(attr, ptrace_triggered, tsk);
+   bp = register_user_hw_breakpoint(attr, ptrace_triggered,
+NULL, tsk);
 
/*
 * CHECKME: the previous code returned -EIO if the addr wasn't
diff --git a/drivers/oprofile/oprofile_perf.c b/drivers/oprofile/oprofile_perf.c
index 9046f7b..59acf9e 100644
--- a/drivers/oprofile/oprofile_perf.c
+++ b/drivers/oprofile/oprofile_perf.c
@@ -79,7 +79,7 @@ static int op_create_counter(int cpu, int event)
 
pevent = perf_event_create_kernel_counter(counter_config[event].attr,
  cpu, NULL,
- op_overflow_handler);
+

Re: [PATCH 1/3] perf: add context field to perf_event

2011-06-29 Thread Frederic Weisbecker
On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
 The perf_event overflow handler does not receive any caller-derived
 argument, so many callers need to resort to looking up the perf_event
 in their local data structure.  This is ugly and doesn't scale if a
 single callback services many perf_events.
 
 Fix by adding a context parameter to perf_event_create_kernel_counter()
 (and derived hardware breakpoints APIs) and storing it in the perf_event.
 The field can be accessed from the callback as 
 event-overflow_handler_context.
 All callers are updated.
 
 Signed-off-by: Avi Kivity a...@redhat.com

I believe it can micro-optimize ptrace through register_user_hw_breakpoint() 
because
we could store the index of the breakpoint that way, instead of iterating 
through 4 slots.

Perhaps it can help in arm too, adding Will in Cc.

But for register_wide_hw_breakpoint, I'm not sure. kgdb is the main user, may 
be Jason
could find some use of it.

 ---
  arch/arm/kernel/ptrace.c|3 ++-
  arch/powerpc/kernel/ptrace.c|2 +-
  arch/sh/kernel/ptrace_32.c  |3 ++-
  arch/x86/kernel/kgdb.c  |2 +-
  arch/x86/kernel/ptrace.c|3 ++-
  drivers/oprofile/oprofile_perf.c|2 +-
  include/linux/hw_breakpoint.h   |   10 --
  include/linux/perf_event.h  |4 +++-
  kernel/events/core.c|   21 +++--
  kernel/events/hw_breakpoint.c   |   10 +++---
  kernel/watchdog.c   |2 +-
  samples/hw_breakpoint/data_breakpoint.c |2 +-
  12 files changed, 44 insertions(+), 20 deletions(-)
 
 diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
 index 9726006..4911c94 100644
 --- a/arch/arm/kernel/ptrace.c
 +++ b/arch/arm/kernel/ptrace.c
 @@ -479,7 +479,8 @@ static struct perf_event *ptrace_hbp_create(struct 
 task_struct *tsk, int type)
   attr.bp_type= type;
   attr.disabled   = 1;
  
 - return register_user_hw_breakpoint(attr, ptrace_hbptriggered, tsk);
 + return register_user_hw_breakpoint(attr, ptrace_hbptriggered, NULL,
 +tsk);
  }
  
  static int ptrace_gethbpregs(struct task_struct *tsk, long num,
 diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
 index cb22024..5249308 100644
 --- a/arch/powerpc/kernel/ptrace.c
 +++ b/arch/powerpc/kernel/ptrace.c
 @@ -973,7 +973,7 @@ int ptrace_set_debugreg(struct task_struct *task, 
 unsigned long addr,
   attr.bp_type);
  
   thread-ptrace_bps[0] = bp = register_user_hw_breakpoint(attr,
 - ptrace_triggered, task);
 +ptrace_triggered, NULL, task);
   if (IS_ERR(bp)) {
   thread-ptrace_bps[0] = NULL;
   ptrace_put_breakpoints(task);
 diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
 index 3d7b209..930312f 100644
 --- a/arch/sh/kernel/ptrace_32.c
 +++ b/arch/sh/kernel/ptrace_32.c
 @@ -91,7 +91,8 @@ static int set_single_step(struct task_struct *tsk, 
 unsigned long addr)
   attr.bp_len = HW_BREAKPOINT_LEN_2;
   attr.bp_type = HW_BREAKPOINT_R;
  
 - bp = register_user_hw_breakpoint(attr, ptrace_triggered, tsk);
 + bp = register_user_hw_breakpoint(attr, ptrace_triggered,
 +  NULL, tsk);
   if (IS_ERR(bp))
   return PTR_ERR(bp);
  
 diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
 index 5f9ecff..473ab53 100644
 --- a/arch/x86/kernel/kgdb.c
 +++ b/arch/x86/kernel/kgdb.c
 @@ -638,7 +638,7 @@ void kgdb_arch_late(void)
   for (i = 0; i  HBP_NUM; i++) {
   if (breakinfo[i].pev)
   continue;
 - breakinfo[i].pev = register_wide_hw_breakpoint(attr, NULL);
 + breakinfo[i].pev = register_wide_hw_breakpoint(attr, NULL, 
 NULL);
   if (IS_ERR((void * __force)breakinfo[i].pev)) {
   printk(KERN_ERR kgdb: Could not allocate hw
  breakpoints\nDisabling the kernel debugger\n);
 diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
 index 807c2a2..28092ae 100644
 --- a/arch/x86/kernel/ptrace.c
 +++ b/arch/x86/kernel/ptrace.c
 @@ -715,7 +715,8 @@ static int ptrace_set_breakpoint_addr(struct task_struct 
 *tsk, int nr,
   attr.bp_type = HW_BREAKPOINT_W;
   attr.disabled = 1;
  
 - bp = register_user_hw_breakpoint(attr, ptrace_triggered, tsk);
 + bp = register_user_hw_breakpoint(attr, ptrace_triggered,
 +  NULL, tsk);
  
   /*
* CHECKME: the previous code returned -EIO if the addr wasn't
 diff --git a/drivers/oprofile/oprofile_perf.c 
 

Re: [PATCH 1/3] perf: add context field to perf_event

2011-06-29 Thread Avi Kivity

On 06/29/2011 07:08 PM, Frederic Weisbecker wrote:

On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
  The perf_event overflow handler does not receive any caller-derived
  argument, so many callers need to resort to looking up the perf_event
  in their local data structure.  This is ugly and doesn't scale if a
  single callback services many perf_events.

  Fix by adding a context parameter to perf_event_create_kernel_counter()
  (and derived hardware breakpoints APIs) and storing it in the perf_event.
  The field can be accessed from the callback as 
event-overflow_handler_context.
  All callers are updated.

  Signed-off-by: Avi Kivitya...@redhat.com

I believe it can micro-optimize ptrace through register_user_hw_breakpoint() 
because
we could store the index of the breakpoint that way, instead of iterating 
through 4 slots.



Right, I noticed that while writing the patch.


Perhaps it can help in arm too, adding Will in Cc.

But for register_wide_hw_breakpoint, I'm not sure. kgdb is the main user, may 
be Jason
could find some use of it.


I think an API should not require its users to iterate in their 
callbacks, even if it doesn't affect current users for some reason.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] perf: add context field to perf_event

2011-06-29 Thread Will Deacon
Hi Frederic,

Thanks for including me on CC.

On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
 On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
  The perf_event overflow handler does not receive any caller-derived
  argument, so many callers need to resort to looking up the perf_event
  in their local data structure.  This is ugly and doesn't scale if a
  single callback services many perf_events.
 
  Fix by adding a context parameter to perf_event_create_kernel_counter()
  (and derived hardware breakpoints APIs) and storing it in the perf_event.
  The field can be accessed from the callback as 
  event-overflow_handler_context.
  All callers are updated.
 
  Signed-off-by: Avi Kivity a...@redhat.com
 
 I believe it can micro-optimize ptrace through register_user_hw_breakpoint() 
 because
 we could store the index of the breakpoint that way, instead of iterating 
 through 4 slots.
 
 Perhaps it can help in arm too, adding Will in Cc.

Yes, we could store the breakpoint index in there and it would save us
walking over the breakpoints when one fires. Not sure this helps us for
anything else though. My main gripe with the ptrace interface to
hw_breakpoints is that we have to convert all the breakpoint information
from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
in the hw_breakpoint code. Yuck!

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html