Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-22 Thread Avi Kivity

On 06/22/2010 05:08 AM, Zhang, Yanmin wrote:



Something that is worrying is that we don't expose group information.
perf will multiplex the events for us, but there will be a loss in accuracy.

 

   #ifdef CONFIG_HAVE_HW_BREAKPOINT
   #includeasm/hw_breakpoint.h
   #endif
@@ -753,6 +752,20 @@ struct perf_event {

perf_overflow_handler_t overflow_handler;

+   /*
+* pointers used by kvm perf paravirt interface.
+*
+* 1) Used in host kernel and points to host_perf_shadow which
+* has information about guest perf_event
+*/
+   void*host_perf_shadow;

   

Can we have real types instead of void pointers?
 

I just want perf generic codes have less dependency on KVM codes.
   


One way to do that and retain type safety is to have

struct perf_client {
  struct perf_client_ops *ops;
  ...
}

The client (kvm) can do

   struct kvm_perf_client {
 struct perf_client pc;
 // kvm specific stuff
   };

the callbacks receive struct perf_client and use container_of to reach 
the kvm_perf_client that contains it.



+   /*
+* 2) Used in guest kernel and points to guest_perf_shadow which
+* is used as a communication area with host kernel. Host kernel
+* copies overflow data to it when an event overflows.
+*/
+   void*guest_perf_shadow;

   

It's strange to see both guest and host parts in the same patch.
Splitting to separate patches will really help review.
 

It's a little hard to split the patches if they change the same file. Perhaps
I could add more statements before the patch when I send it out.
   


With git, it's easy (once you're used to it):

  # go back one commit:
  git reset HEAD^
  # selectively add bits:
  git add -p
  # commit first patch
  git commit -s
  # selectively add bits:
  git add -p
  # commit second patch
  git commit -s



@@ -1626,9 +1629,22 @@ void perf_event_task_tick(struct task_st
if (ctx   ctx-nr_events   ctx-nr_events != ctx-nr_active)
rotate = 1;

-   perf_ctx_adjust_freq(cpuctx-ctx);
-   if (ctx)
-   perf_ctx_adjust_freq(ctx);
+#ifdef CONFIG_KVM_PERF
+   if (kvm_para_available()) {
+   /*
+* perf_ctx_adjust_freq causes lots of pmu-read which would
+* trigger too many vmexit to host kernel. We disable it
+* under para virt situation
+*/
+   adjust_freq = 0;
+   }
+#endif

   

Perhaps we can have a batch read interface which will read many counters
at once.
 

It's a good idea. But that will touch many perf generic codes which causes it's 
hard
to maintain or follow future changes.
   


I'm talking about the guest/host interface.  So you have one vmexit and 
many host perf calls.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-22 Thread Zhang, Yanmin
On Tue, 2010-06-22 at 12:12 +0300, Avi Kivity wrote:
 On 06/22/2010 05:08 AM, Zhang, Yanmin wrote:
 
  Something that is worrying is that we don't expose group information.
  perf will multiplex the events for us, but there will be a loss in 
  accuracy.
 
   
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #includeasm/hw_breakpoint.h
 #endif
  @@ -753,6 +752,20 @@ struct perf_event {
 
perf_overflow_handler_t overflow_handler;
 
  + /*
  +  * pointers used by kvm perf paravirt interface.
  +  *
  +  * 1) Used in host kernel and points to host_perf_shadow which
  +  * has information about guest perf_event
  +  */
  + void*host_perf_shadow;
 
 
  Can we have real types instead of void pointers?
   
  I just want perf generic codes have less dependency on KVM codes.
 
 
 One way to do that and retain type safety is to have
 
  struct perf_client {
struct perf_client_ops *ops;
...
  }
 
 The client (kvm) can do
 
 struct kvm_perf_client {
   struct perf_client pc;
   // kvm specific stuff
 };
 
 the callbacks receive struct perf_client and use container_of to reach 
 the kvm_perf_client that contains it.
Let me double check it.

 
  + /*
  +  * 2) Used in guest kernel and points to guest_perf_shadow which
  +  * is used as a communication area with host kernel. Host kernel
  +  * copies overflow data to it when an event overflows.
  +  */
  + void*guest_perf_shadow;
 
 
  It's strange to see both guest and host parts in the same patch.
  Splitting to separate patches will really help review.
   
  It's a little hard to split the patches if they change the same file. 
  Perhaps
  I could add more statements before the patch when I send it out.
 
 
 With git, it's easy (once you're used to it):
 
# go back one commit:
git reset HEAD^
# selectively add bits:
git add -p
# commit first patch
git commit -s
# selectively add bits:
git add -p
# commit second patch
git commit -s
Thanks for your teaching.

 
 
  @@ -1626,9 +1629,22 @@ void perf_event_task_tick(struct task_st
if (ctx   ctx-nr_events   ctx-nr_events != ctx-nr_active)
rotate = 1;
 
  - perf_ctx_adjust_freq(cpuctx-ctx);
  - if (ctx)
  - perf_ctx_adjust_freq(ctx);
  +#ifdef CONFIG_KVM_PERF
  + if (kvm_para_available()) {
  + /*
  +  * perf_ctx_adjust_freq causes lots of pmu-read which would
  +  * trigger too many vmexit to host kernel. We disable it
  +  * under para virt situation
  +  */
  + adjust_freq = 0;
  + }
  +#endif
 
 
  Perhaps we can have a batch read interface which will read many counters
  at once.
   
  It's a good idea. But that will touch many perf generic codes which causes 
  it's hard
  to maintain or follow future changes.
 
 
 I'm talking about the guest/host interface.  So you have one vmexit and 
 many host perf calls.
I understood what you were speaking. I mean, perf generic codes operate 
perf_event
one by one. At low layer, we just know one perf_event before calling hypercall 
to
vmexit to host kernel.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-22 Thread Avi Kivity

On 06/22/2010 12:25 PM, Zhang, Yanmin wrote:



I'm talking about the guest/host interface.  So you have one vmexit and
many host perf calls.
 

I understood what you were speaking. I mean, perf generic codes operate 
perf_event
one by one. At low layer, we just know one perf_event before calling hypercall 
to
vmexit to host kernel.
   


Ah.

We might fix that by having the perf ops work on guest memory, and add a 
commit that transfers them to the host.  But this is complicated and can 
be left for later.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-21 Thread Zhang, Yanmin
The 2nd patch is to change the definition of perf_event to facilitate
perf attr copy when a hypercall happens.

Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com

---

--- linux-2.6_tip0620/include/linux/perf_event.h2010-06-21 
15:19:52.821999849 +0800
+++ linux-2.6_tip0620perfkvm/include/linux/perf_event.h 2010-06-21 
16:53:49.283999849 +0800
@@ -188,7 +188,10 @@ struct perf_event_attr {
__u64   sample_type;
__u64   read_format;
 
-   __u64   disabled   :  1, /* off by default*/
+   union {
+   __u64   flags;
+   struct {
+   __u64   disabled   :  1, /* off by default*/
inherit:  1, /* children inherit it   */
pinned :  1, /* must always be on PMU */
exclusive  :  1, /* only group on PMU */
@@ -217,6 +220,8 @@ struct perf_event_attr {
mmap_data  :  1, /* non-exec mmap data*/
 
__reserved_1   : 46;
+   };
+   };
 
union {
__u32   wakeup_events;/* wakeup every n events */
@@ -465,12 +470,6 @@ enum perf_callchain_context {
 # include asm/local64.h
 #endif
 
-struct perf_guest_info_callbacks {
-   int (*is_in_guest) (void);
-   int (*is_user_mode) (void);
-   unsigned long (*get_guest_ip) (void);
-};
-
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include asm/hw_breakpoint.h
 #endif
@@ -753,6 +752,20 @@ struct perf_event {
 
perf_overflow_handler_t overflow_handler;
 
+   /*
+* pointers used by kvm perf paravirt interface.
+*
+* 1) Used in host kernel and points to host_perf_shadow which
+* has information about guest perf_event
+*/
+   void*host_perf_shadow;
+   /*
+* 2) Used in guest kernel and points to guest_perf_shadow which
+* is used as a communication area with host kernel. Host kernel
+* copies overflow data to it when an event overflows.
+*/
+   void*guest_perf_shadow;
+
 #ifdef CONFIG_EVENT_TRACING
struct ftrace_event_call*tp_event;
struct event_filter *filter;
@@ -838,6 +851,16 @@ struct perf_output_handle {
int sample;
 };
 
+struct perf_guest_info_callbacks {
+   /* Support collect guest statistics from host side */
+   int (*is_in_guest) (void);
+   int (*is_user_mode) (void);
+   unsigned long (*get_guest_ip) (void);
+
+   /* Support paravirt interface */
+   void (*copy_event_to_shadow) (struct perf_event *event, int overflows);
+};
+
 #ifdef CONFIG_PERF_EVENTS
 
 /*
@@ -871,6 +894,10 @@ perf_event_create_kernel_counter(struct 
perf_overflow_handler_t callback);
 extern u64 perf_event_read_value(struct perf_event *event,
 u64 *enabled, u64 *running);
+extern void perf_event_output(struct perf_event *event, int nmi,
+   struct perf_sample_data *data, struct pt_regs *regs);
+void perf_event_attach(struct perf_event *event);
+void perf_event_detach(struct perf_event *event);
 
 struct perf_sample_data {
u64 type;
@@ -1023,6 +1050,14 @@ perf_event_task_sched_in(struct task_str
 static inline void
 perf_event_task_sched_out(struct task_struct *task,
struct task_struct *next)   { }
+
+static inline void
+perf_event_output(struct perf_event *event, int nmi,
+   struct perf_sample_data *data, struct pt_regs *regs){ }
+
+static inline void perf_event_attach(struct perf_event *event) { }
+static inline void perf_event_detach(struct perf_event *event) { }
+
 static inline void
 perf_event_task_tick(struct task_struct *task) { }
 static inline int perf_event_init_task(struct task_struct *child)  { 
return 0; }
--- linux-2.6_tip0620/kernel/watchdog.c 2010-06-21 15:20:48.517999849 +0800
+++ linux-2.6_tip0620perfkvm/kernel/watchdog.c  2010-06-21 15:21:39.315999849 
+0800
@@ -197,8 +197,6 @@ static struct perf_event_attr wd_hw_attr
.type   = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.size   = sizeof(struct perf_event_attr),
-   .pinned = 1,
-   .disabled   = 1,
 };
 
 /* Callback function for perf event subsystem */
@@ -361,6 +359,8 @@ static int watchdog_nmi_enable(int cpu)
/* Try to register using hardware perf events */
wd_attr = wd_hw_attr;
wd_attr-sample_period = hw_nmi_get_sample_period();
+   wd_attr-pinned = 1;
+   wd_attr-disabled = 1;
event = perf_event_create_kernel_counter(wd_attr, cpu, -1, 

Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-21 Thread Avi Kivity

On 06/21/2010 12:31 PM, Zhang, Yanmin wrote:

The 2nd patch is to change the definition of perf_event to facilitate
perf attr copy when a hypercall happens.

Signed-off-by: Zhang Yanminyanmin_zh...@linux.intel.com

---

--- linux-2.6_tip0620/include/linux/perf_event.h2010-06-21 
15:19:52.821999849 +0800
+++ linux-2.6_tip0620perfkvm/include/linux/perf_event.h 2010-06-21 
16:53:49.283999849 +0800
@@ -188,7 +188,10 @@ struct perf_event_attr {
__u64   sample_type;
__u64   read_format;

   


Assuming these flags are available to the guest?


-   __u64   disabled   :  1, /* off by default*/
+   union {
+   __u64   flags;
+   struct {
+   __u64   disabled   :  1, /* off by default*/
inherit:  1, /* children inherit it   */
   


inherit is meaningless for a guest.


pinned :  1, /* must always be on PMU */
   


We cannot allow a guest to pin a counter.

The other flags are also problematic.  I'd like to see virt-specific 
flags (probably we'll only need kernel/user and nested_hv for nested 
virtualization).


Something that is worrying is that we don't expose group information.  
perf will multiplex the events for us, but there will be a loss in accuracy.



  #ifdef CONFIG_HAVE_HW_BREAKPOINT
  #includeasm/hw_breakpoint.h
  #endif
@@ -753,6 +752,20 @@ struct perf_event {

perf_overflow_handler_t overflow_handler;

+   /*
+* pointers used by kvm perf paravirt interface.
+*
+* 1) Used in host kernel and points to host_perf_shadow which
+* has information about guest perf_event
+*/
+   void*host_perf_shadow;
   


Can we have real types instead of void pointers?


+   /*
+* 2) Used in guest kernel and points to guest_perf_shadow which
+* is used as a communication area with host kernel. Host kernel
+* copies overflow data to it when an event overflows.
+*/
+   void*guest_perf_shadow;
   


It's strange to see both guest and host parts in the same patch.  
Splitting to separate patches will really help review.



@@ -1626,9 +1629,22 @@ void perf_event_task_tick(struct task_st
if (ctx  ctx-nr_events  ctx-nr_events != ctx-nr_active)
rotate = 1;

-   perf_ctx_adjust_freq(cpuctx-ctx);
-   if (ctx)
-   perf_ctx_adjust_freq(ctx);
+#ifdef CONFIG_KVM_PERF
+   if (kvm_para_available()) {
+   /*
+* perf_ctx_adjust_freq causes lots of pmu-read which would
+* trigger too many vmexit to host kernel. We disable it
+* under para virt situation
+*/
+   adjust_freq = 0;
+   }
+#endif
   


Perhaps we can have a batch read interface which will read many counters 
at once.  This would reduce the number of exits.  Also adjust the 
frequency less frequently.



+
+   if (adjust_freq) {
+   perf_ctx_adjust_freq(cpuctx-ctx);
+   if (ctx)
+   perf_ctx_adjust_freq(ctx);
+   }

   


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guest os statistics collection in guest os

2010-06-21 Thread Zhang, Yanmin
On Mon, 2010-06-21 at 15:00 +0300, Avi Kivity wrote:
 On 06/21/2010 12:31 PM, Zhang, Yanmin wrote:
  The 2nd patch is to change the definition of perf_event to facilitate
  perf attr copy when a hypercall happens.
 
  Signed-off-by: Zhang Yanminyanmin_zh...@linux.intel.com
 
  ---
 
  --- linux-2.6_tip0620/include/linux/perf_event.h2010-06-21 
  15:19:52.821999849 +0800
  +++ linux-2.6_tip0620perfkvm/include/linux/perf_event.h 2010-06-21 
  16:53:49.283999849 +0800
  @@ -188,7 +188,10 @@ struct perf_event_attr {
  __u64   sample_type;
  __u64   read_format;
 
 
 
 Assuming these flags are available to the guest?
These flags are used by generic perf codes. To match with host kernel, we wish
guest os also use the flags.

 
  -   __u64   disabled   :  1, /* off by default*/
  +   union {
  +   __u64   flags;
  +   struct {
  +   __u64   disabled   :  1, /* off by default*/
  inherit:  1, /* children inherit it   */
 
 
 inherit is meaningless for a guest.
Right. host kernel will reset it to 0 before create perf_event for guest os.
Here we couldn't delete the flag as it's used by perf generic codes.
I need separate the patch a little better. All definitions in 
include/linux/perf_event.h
are mostly perf generic code related. I'm very careful to change it.

 
  pinned :  1, /* must always be on PMU */
 
 
 We cannot allow a guest to pin a counter.
Ok. I will reset it in function kvm_pv_perf_op_open.

 
 The other flags are also problematic.  I'd like to see virt-specific 
 flags (probably we'll only need kernel/user and nested_hv for nested 
 virtualization).
How about to add more comments around struct guest_perf_attr-flags that
guest os developers should take a look at include/linux/perf_event.h?
BTW, I will reset more flags to 0 in kvm_pv_perf_op_open.

 
 Something that is worrying is that we don't expose group information.  
 perf will multiplex the events for us, but there will be a loss in accuracy.
 
#ifdef CONFIG_HAVE_HW_BREAKPOINT
#includeasm/hw_breakpoint.h
#endif
  @@ -753,6 +752,20 @@ struct perf_event {
 
  perf_overflow_handler_t overflow_handler;
 
  +   /*
  +* pointers used by kvm perf paravirt interface.
  +*
  +* 1) Used in host kernel and points to host_perf_shadow which
  +* has information about guest perf_event
  +*/
  +   void*host_perf_shadow;
 
 
 Can we have real types instead of void pointers?
I just want perf generic codes have less dependency on KVM codes.

 
  +   /*
  +* 2) Used in guest kernel and points to guest_perf_shadow which
  +* is used as a communication area with host kernel. Host kernel
  +* copies overflow data to it when an event overflows.
  +*/
  +   void*guest_perf_shadow;
 
 
 It's strange to see both guest and host parts in the same patch.  
 Splitting to separate patches will really help review.
It's a little hard to split the patches if they change the same file. Perhaps
I could add more statements before the patch when I send it out.

 
  @@ -1626,9 +1629,22 @@ void perf_event_task_tick(struct task_st
  if (ctx  ctx-nr_events  ctx-nr_events != ctx-nr_active)
  rotate = 1;
 
  -   perf_ctx_adjust_freq(cpuctx-ctx);
  -   if (ctx)
  -   perf_ctx_adjust_freq(ctx);
  +#ifdef CONFIG_KVM_PERF
  +   if (kvm_para_available()) {
  +   /*
  +* perf_ctx_adjust_freq causes lots of pmu-read which would
  +* trigger too many vmexit to host kernel. We disable it
  +* under para virt situation
  +*/
  +   adjust_freq = 0;
  +   }
  +#endif
 
 
 Perhaps we can have a batch read interface which will read many counters 
 at once.
It's a good idea. But that will touch many perf generic codes which causes it's 
hard
to maintain or follow future changes.

   This would reduce the number of exits.  Also adjust the 
 frequency less frequently.
Here it depends on process scheduler frequency, CONFIG_HZ. 

 
  +
  +   if (adjust_freq) {
  +   perf_ctx_adjust_freq(cpuctx-ctx);
  +   if (ctx)
  +   perf_ctx_adjust_freq(ctx);
  +   }
 
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html