Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-28 Thread Steven Rostedt
On Tue, 28 Aug 2018 12:07:30 -0700
Song Liu  wrote:

> Hi all,
> 
> What's our plan with this work? Will this be routed via Steven's tree?
> 
> 

I can start pulling these in and testing them.

Thanks,

-- Steve


Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-28 Thread Song Liu
Hi all,

What's our plan with this work? Will this be routed via Steven's tree?

Thanks,
Song

On Wed, Aug 22, 2018 at 5:39 AM Srikar Dronamraju
 wrote:
>
> * Ravi Bangoria  [2018-08-20 10:12:47]:
>
> > Userspace Statically Defined Tracepoints[1] are dtrace style markers
> > inside userspace applications. Applications like PostgreSQL, MySQL,
> > Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> > have these markers embedded in them. These markers are added by developer
> > at important places in the code. Each marker source expands to a single
> > nop instruction in the compiled code but there may be additional
> > overhead for computing the marker arguments which expands to couple of
> > instructions. In case the overhead is more, execution of it can be
> > omitted by runtime if() condition when no one is tracing on the marker:
> >
> > if (reference_counter > 0) {
> > Execute marker instructions;
> > }
> >
> > Default value of reference counter is 0. Tracer has to increment the
> > reference counter before tracing on a marker and decrement it when
> > done with the tracing.
> >
> > Implement the reference counter logic in core uprobe. User will be
> > able to use it from trace_uprobe as well as from kernel module. New
> > trace_uprobe definition with reference counter will now be:
> >
> > <path>:<offset>[(ref_ctr_offset)]
> >
> > where ref_ctr_offset is an optional field. For kernel module, new
> > variant of uprobe_register() has been introduced:
> >
> > uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> >
> > No new variant for uprobe_unregister() because it's assumed to have
> > only one reference counter for one uprobe.
> >
> > [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> >
> > Note: 'reference counter' is called as 'semaphore' in original Dtrace
> > (or Systemtap, bcc and even in ELF) documentation and code. But the
> > term 'semaphore' is misleading in this context. This is just a counter
> > used to hold number of tracers tracing on a marker. This is not really
> > used for any synchronization. So we are calling it a 'reference counter'
> > in kernel / perf code.
> >
>
>
> Acked-by: Srikar Dronamraju 
>
> > Signed-off-by: Ravi Bangoria 
> > Reviewed-by: Masami Hiramatsu 
> > [Only trace_uprobe.c]
> > Reviewed-by: Oleg Nesterov 
> > ---
>


Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-22 Thread Srikar Dronamraju
* Ravi Bangoria  [2018-08-20 10:12:47]:

> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
> if (reference_counter > 0) {
> Execute marker instructions;
> }
> 
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
> 
> <path>:<offset>[(ref_ctr_offset)]
> 
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
> 
> uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> 
> No new variant for uprobe_unregister() because it's assumed to have
> only one reference counter for one uprobe.
> 
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> 
> Note: 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not really
> used for any synchronization. So we are calling it a 'reference counter'
> in kernel / perf code.
> 


Acked-by: Srikar Dronamraju 

> Signed-off-by: Ravi Bangoria 
> Reviewed-by: Masami Hiramatsu 
> [Only trace_uprobe.c]
> Reviewed-by: Oleg Nesterov 
> ---



Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-21 Thread Song Liu
On Sun, Aug 19, 2018 at 10:53 PM, Song Liu  wrote:
> On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
>  wrote:
>> Userspace Statically Defined Tracepoints[1] are dtrace style markers
>> inside userspace applications. Applications like PostgreSQL, MySQL,
>> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
>> have these markers embedded in them. These markers are added by developer
>> at important places in the code. Each marker source expands to a single
>> nop instruction in the compiled code but there may be additional
>> overhead for computing the marker arguments which expands to couple of
>> instructions. In case the overhead is more, execution of it can be
>> omitted by runtime if() condition when no one is tracing on the marker:
>>
>> if (reference_counter > 0) {
>> Execute marker instructions;
>> }
>>
>> Default value of reference counter is 0. Tracer has to increment the
>> reference counter before tracing on a marker and decrement it when
>> done with the tracing.
>>
>> Implement the reference counter logic in core uprobe. User will be
>> able to use it from trace_uprobe as well as from kernel module. New
>> trace_uprobe definition with reference counter will now be:
>>
>> <path>:<offset>[(ref_ctr_offset)]
>>
>> where ref_ctr_offset is an optional field. For kernel module, new
>> variant of uprobe_register() has been introduced:
>>
>> uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>>
>> No new variant for uprobe_unregister() because it's assumed to have
>> only one reference counter for one uprobe.
>>
>> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
>>
>> Note: 'reference counter' is called as 'semaphore' in original Dtrace
>> (or Systemtap, bcc and even in ELF) documentation and code. But the
>> term 'semaphore' is misleading in this context. This is just a counter
>> used to hold number of tracers tracing on a marker. This is not really
>> used for any synchronization. So we are calling it a 'reference counter'
>> in kernel / perf code.
>>
>> Signed-off-by: Ravi Bangoria 
>> Reviewed-by: Masami Hiramatsu 
>> [Only trace_uprobe.c]
>> Reviewed-by: Oleg Nesterov 
>
> Reviewed-by: Song Liu 

Reviewed-and-tested-by: Song Liu 

>
>> ---
>>  include/linux/uprobes.h |   5 +
>>  kernel/events/uprobes.c | 259 
>> ++--
>>  kernel/trace/trace.c|   2 +-
>>  kernel/trace/trace_uprobe.c |  38 ++-
>>  4 files changed, 293 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
>> index bb9d2084af03..103a48a48872 100644
>> --- a/include/linux/uprobes.h
>> +++ b/include/linux/uprobes.h
>> @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs 
>> *regs);
>>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct 
>> mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
>>  extern int uprobe_register(struct inode *inode, loff_t offset, struct 
>> uprobe_consumer *uc);
>> +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, 
>> loff_t ref_ctr_offset, struct uprobe_consumer *uc);
>>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
>> uprobe_consumer *uc, bool);
>>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
>> uprobe_consumer *uc);
>>  extern int uprobe_mmap(struct vm_area_struct *vma);
>> @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, 
>> struct uprobe_consumer *uc)
>>  {
>> return -ENOSYS;
>>  }
>> +static inline int uprobe_register_refctr(struct inode *inode, loff_t 
>> offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>> +{
>> +   return -ENOSYS;
>> +}
>>  static inline int
>>  uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer 
>> *uc, bool add)
>>  {
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 919c1ce32beb..35065febcb6c 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -73,6 +73,7 @@ struct uprobe {
>> struct uprobe_consumer  *consumers;
>> struct inode*inode; /* Also hold a ref to inode 
>> */
>> loff_t  offset;
>> +   loff_t  ref_ctr_offset;
>> unsigned long   flags;
>>
>> /*
>> @@ -88,6 +89,15 @@ struct uprobe {
>> struct arch_uprobe  arch;
>>  };
>>
>> +struct delayed_uprobe {
>> +   struct list_head list;
>> +   struct uprobe *uprobe;
>> +   struct mm_struct *mm;
>> +};
>> +
>> +static DEFINE_MUTEX(delayed_uprobe_lock);
>> +static LIST_HEAD(delayed_uprobe_list);
>> +
>>  /*
>>   * Execute out of line area: anonymous executable mapping installed
>>   * by the probed task to execute the copy of the original instruction
>> @@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned 
>> long vaddr, 

Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-19 Thread Song Liu
On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
 wrote:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
>
> if (reference_counter > 0) {
> Execute marker instructions;
> }
>
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
>
> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
>
> <path>:<offset>[(ref_ctr_offset)]
>
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
>
> uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>
> No new variant for uprobe_unregister() because it's assumed to have
> only one reference counter for one uprobe.
>
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
>
> Note: 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not really
> used for any synchronization. So we are calling it a 'reference counter'
> in kernel / perf code.
>
> Signed-off-by: Ravi Bangoria 
> Reviewed-by: Masami Hiramatsu 
> [Only trace_uprobe.c]
> Reviewed-by: Oleg Nesterov 

Reviewed-by: Song Liu 

> ---
>  include/linux/uprobes.h |   5 +
>  kernel/events/uprobes.c | 259 
> ++--
>  kernel/trace/trace.c|   2 +-
>  kernel/trace/trace_uprobe.c |  38 ++-
>  4 files changed, 293 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index bb9d2084af03..103a48a48872 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs 
> *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
> *mm, unsigned long vaddr, uprobe_opcode_t);
>  extern int uprobe_register(struct inode *inode, loff_t offset, struct 
> uprobe_consumer *uc);
> +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t 
> ref_ctr_offset, struct uprobe_consumer *uc);
>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
> uprobe_consumer *uc, bool);
>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
> uprobe_consumer *uc);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
> @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, 
> struct uprobe_consumer *uc)
>  {
> return -ENOSYS;
>  }
> +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, 
> loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +{
> +   return -ENOSYS;
> +}
>  static inline int
>  uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
> bool add)
>  {
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 919c1ce32beb..35065febcb6c 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -73,6 +73,7 @@ struct uprobe {
> struct uprobe_consumer  *consumers;
> struct inode*inode; /* Also hold a ref to inode */
> loff_t  offset;
> +   loff_t  ref_ctr_offset;
> unsigned long   flags;
>
> /*
> @@ -88,6 +89,15 @@ struct uprobe {
> struct arch_uprobe  arch;
>  };
>
> +struct delayed_uprobe {
> +   struct list_head list;
> +   struct uprobe *uprobe;
> +   struct mm_struct *mm;
> +};
> +
> +static DEFINE_MUTEX(delayed_uprobe_lock);
> +static LIST_HEAD(delayed_uprobe_list);
> +
>  /*
>   * Execute out of line area: anonymous executable mapping installed
>   * by the probed task to execute the copy of the original instruction
> @@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned 
> long vaddr, uprobe_opcode_t
> return 1;
>  }
>
> +static struct delayed_uprobe *
> +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +   struct delayed_uprobe *du;
> +
> +   

[PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-19 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are DTrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. Developers add these markers
at important places in the code. Each marker expands to a single
nop instruction in the compiled code, but computing the marker
arguments may add an overhead of a couple of instructions. When that
overhead is significant, the argument computation can be skipped by a
runtime if() condition when no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done tracing.
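
As an illustration only, the gating described above can be sketched in
plain userspace C. This is not the sys/sdt.h macro expansion and not code
from this patch: reference_counter, compute_marker_arg(), marker_fired(),
tracer_attach() and tracer_detach() are all hypothetical names, and the
"tracer" here is just a function call in the same process rather than the
kernel writing into the probed task's memory.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-marker reference counter. */
static unsigned short reference_counter;	/* 0 by default: nobody is tracing */
static unsigned long marker_hits;		/* how often the marker body ran */

/* Hypothetical "expensive" argument computation we want to skip. */
static unsigned long compute_marker_arg(unsigned long n)
{
	unsigned long i, sum = 0;

	for (i = 0; i < n; i++)
		sum += i;
	return sum;
}

/* Stands in for the marker site itself (a nop in a real SDT binary). */
static void marker_fired(unsigned long arg)
{
	marker_hits++;
	printf("marker hit, arg=%lu\n", arg);
}

/* Application code: arguments are computed only while someone traces. */
static void do_work(unsigned long n)
{
	if (reference_counter > 0)
		marker_fired(compute_marker_arg(n));
}

/* What a tracer is expected to do around a tracing session. */
static void tracer_attach(void)
{
	reference_counter++;
}

static void tracer_detach(void)
{
	assert(reference_counter > 0);
	reference_counter--;
}
```

Calling do_work() before tracer_attach() skips the argument computation
entirely; between tracer_attach() and tracer_detach() the marker body
runs on every call.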

Implement the reference counter logic in core uprobe. Users will be
able to use it from trace_uprobe as well as from a kernel module. The
new trace_uprobe definition with a reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

There is no new variant of uprobe_unregister() because a uprobe is
assumed to have only one reference counter.
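
To make the definition syntax concrete, here is a small userspace sketch
that splits a string of the form path:offset[(ref_ctr_offset)] into its
fields. It is an assumption-laden illustration, not the parser in
kernel/trace/trace_uprobe.c: struct probe_spec and parse_probe_spec() are
hypothetical names, and a missing ref_ctr_offset is reported as 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed probe definition. */
struct probe_spec {
	char path[256];
	unsigned long offset;
	unsigned long ref_ctr_offset;	/* 0 when the optional field is absent */
};

/* Returns 0 on success, -1 if the string is malformed. */
static int parse_probe_spec(const char *spec, struct probe_spec *out)
{
	const char *colon, *start;
	char *end;
	size_t path_len;

	assert(spec && out);

	/* The last ':' separates the path from the offset. */
	colon = strrchr(spec, ':');
	if (!colon || colon == spec)
		return -1;

	path_len = (size_t)(colon - spec);
	if (path_len >= sizeof(out->path))
		return -1;
	memcpy(out->path, spec, path_len);
	out->path[path_len] = '\0';

	/* Base 0 accepts both decimal and 0x-prefixed hex offsets. */
	out->offset = strtoul(colon + 1, &end, 0);
	if (end == colon + 1)
		return -1;

	/* Optional "(ref_ctr_offset)" suffix. */
	out->ref_ctr_offset = 0;
	if (*end == '(') {
		start = end + 1;
		out->ref_ctr_offset = strtoul(start, &end, 0);
		if (end == start || *end != ')')
			return -1;
		end++;
	}

	/* Anything left over means the definition is malformed. */
	return *end == '\0' ? 0 : -1;
}
```

For example, parse_probe_spec("/tmp/tick:0x6e4(0x10036)", &s) would fill
in the path, the probe offset and the reference counter offset, while
"/tmp/tick:0x6e4" leaves ref_ctr_offset at 0. The path and offsets here
are made up for the example.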

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
DTrace (and SystemTap, bcc and even ELF) documentation and code. But the
term 'semaphore' is misleading in this context: it is just a counter
that holds the number of tracers tracing a marker and is not used for
any synchronization. So we call it a 'reference counter' in kernel /
perf code.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Masami Hiramatsu 
[Only trace_uprobe.c]
Reviewed-by: Oleg Nesterov 
---
 include/linux/uprobes.h |   5 +
 kernel/events/uprobes.c | 259 ++--
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_uprobe.c |  38 ++-
 4 files changed, 293 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
    return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+   return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 919c1ce32beb..35065febcb6c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
    struct uprobe_consumer  *consumers;
    struct inode            *inode;     /* Also hold a ref to inode */
    loff_t                  offset;
+   loff_t                  ref_ctr_offset;
    unsigned long           flags;
 
    /*
@@ -88,6 +89,15 @@ struct uprobe {
    struct arch_uprobe      arch;
 };
 
+struct delayed_uprobe {
+   struct list_head list;
+   struct uprobe *uprobe;
+   struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
    return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   list_for_each_entry(du, &delayed_uprobe_list, list)
+       if (du->uprobe == uprobe && du->mm == mm)
+           return du;
+   return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   if (delayed_uprobe_check(uprobe, mm))
+   
