Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-25 Thread Ravi Bangoria
Thanks Oleg for the review,

On 05/24/2018 09:56 PM, Oleg Nesterov wrote:
> On 04/17, Ravi Bangoria wrote:
>>
>> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer 
>> *self,
>>  if (ret)
>>  goto err_buffer;
>>  
>> +if (tu->ref_ctr_offset)
>> +sdt_increment_ref_ctr(tu);
>> +
> 
> iiuc, this is probe_event_enable()...
> 
> Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
> change. However, it seems that probe_event_disable() can race with 
> trace_uprobe_mmap()
> too and the next 7/9 patch won't help,
> 
>> +if (tu->ref_ctr_offset)
>> +sdt_decrement_ref_ctr(tu);
>> +
>>  uprobe_unregister(tu->inode, tu->offset, >consumer);
>>  tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
> 
> so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
> Note that trace_probe_is_enabled() is T until we update tp.flags.

Sure, I'll look at your comments.

Apart from these, I've also found a deadlock between uprobe_lock and
mm->mmap_sem. trace_uprobe_mmap() takes these locks in

   mm->mmap_sem
  uprobe_lock

order but some other code path is taking these locks in reverse order. I've
mentioned sample lockdep warning at the end. The issue is, mm->mmap_sem is
not in control of trace_uprobe_mmap() and we have to take uprobe_lock to
loop over all trace_uprobes.

Any idea how this can be resolved?


Sample lockdep warning:

[  499.258006] ==
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] --
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: 
trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (>mmap_sem){}, at: 
vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (>mmap_sem){}:
[  499.260277]__lock_acquire+0x53c/0x910
[  499.260442]lock_acquire+0xf4/0x2f0
[  499.260595]down_write_killable+0x6c/0x150
[  499.260764]copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]_do_fork+0xf8/0x910
[  499.261090]ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (_mmap_sem){}:
[  499.261378]__lock_acquire+0x53c/0x910
[  499.261540]lock_acquire+0xf4/0x2f0
[  499.261669]down_write+0x6c/0x110
[  499.261793]percpu_down_write+0x48/0x140
[  499.261954]register_for_each_vma+0x6c/0x2a0
[  499.262116]uprobe_register+0x230/0x320
[  499.262277]probe_event_enable+0x1cc/0x540
[  499.262435]perf_trace_event_init+0x1e0/0x350
[  499.262587]perf_trace_init+0xb0/0x110
[  499.262750]perf_tp_event_init+0x38/0x90
[  499.262910]perf_try_init_event+0x10c/0x150
[  499.263075]perf_event_alloc+0xbb0/0xf10
[  499.263235]sys_perf_event_open+0x2a8/0xdd0
[  499.263396]system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (>register_rwsem){}:
[  499.263723]__lock_acquire+0x53c/0x910
[  499.263884]lock_acquire+0xf4/0x2f0
[  499.264002]down_write+0x6c/0x110
[  499.264118]uprobe_register+0x1ec/0x320
[  499.264283]probe_event_enable+0x1cc/0x540
[  499.264442]perf_trace_event_init+0x1e0/0x350
[  499.264603]perf_trace_init+0xb0/0x110
[  499.264766]perf_tp_event_init+0x38/0x90
[  499.264930]perf_try_init_event+0x10c/0x150
[  499.265092]perf_event_alloc+0xbb0/0xf10
[  499.265261]sys_perf_event_open+0x2a8/0xdd0
[  499.265424]system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]__lock_acquire+0x53c/0x910
[  499.265896]lock_acquire+0xf4/0x2f0
[  499.266019]__mutex_lock+0xa0/0xab0
[  499.266142]trace_add_event_call+0x44/0x100
[  499.266310]create_trace_uprobe+0x4a0/0x8b0
[  499.266474]trace_run_command+0xa4/0xc0
[  499.266631]trace_parse_run_command+0xe4/0x200
[  499.266799]probes_write+0x20/0x40
[  499.266922]__vfs_write+0x6c/0x240
[  499.267041]vfs_write+0xd0/0x240
[  499.267166]ksys_write+0x6c/0x110
[  499.267295]system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]validate_chain.isra.34+0xbd0/0x1000
[  499.267747]__lock_acquire+0x53c/0x910
[  499.267917]lock_acquire+0xf4/0x2f0
[  499.268048]__mutex_lock+0xa0/0xab0
[  499.268170]trace_uprobe_mmap+0x78/0x130
[  499.268335]uprobe_mmap+0x80/0x3b0
[  499.268464]mmap_region+0x290/0x660
[  499.268590]

Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-25 Thread Ravi Bangoria
Thanks Oleg for the review,

On 05/24/2018 09:56 PM, Oleg Nesterov wrote:
> On 04/17, Ravi Bangoria wrote:
>>
>> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer 
>> *self,
>>  if (ret)
>>  goto err_buffer;
>>  
>> +if (tu->ref_ctr_offset)
>> +sdt_increment_ref_ctr(tu);
>> +
> 
> iiuc, this is probe_event_enable()...
> 
> Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
> change. However, it seems that probe_event_disable() can race with 
> trace_uprobe_mmap()
> too and the next 7/9 patch won't help,
> 
>> +if (tu->ref_ctr_offset)
>> +sdt_decrement_ref_ctr(tu);
>> +
>>  uprobe_unregister(tu->inode, tu->offset, >consumer);
>>  tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
> 
> so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
> Note that trace_probe_is_enabled() is T until we update tp.flags.

Sure, I'll look at your comments.

Apart from these, I've also found a deadlock between uprobe_lock and
mm->mmap_sem. trace_uprobe_mmap() takes these locks in

   mm->mmap_sem
  uprobe_lock

order but some other code path is taking these locks in reverse order. I've
mentioned sample lockdep warning at the end. The issue is, mm->mmap_sem is
not in control of trace_uprobe_mmap() and we have to take uprobe_lock to
loop over all trace_uprobes.

Any idea how this can be resolved?


Sample lockdep warning:

[  499.258006] ==
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] --
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: 
trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (>mmap_sem){}, at: 
vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (>mmap_sem){}:
[  499.260277]__lock_acquire+0x53c/0x910
[  499.260442]lock_acquire+0xf4/0x2f0
[  499.260595]down_write_killable+0x6c/0x150
[  499.260764]copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]_do_fork+0xf8/0x910
[  499.261090]ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (_mmap_sem){}:
[  499.261378]__lock_acquire+0x53c/0x910
[  499.261540]lock_acquire+0xf4/0x2f0
[  499.261669]down_write+0x6c/0x110
[  499.261793]percpu_down_write+0x48/0x140
[  499.261954]register_for_each_vma+0x6c/0x2a0
[  499.262116]uprobe_register+0x230/0x320
[  499.262277]probe_event_enable+0x1cc/0x540
[  499.262435]perf_trace_event_init+0x1e0/0x350
[  499.262587]perf_trace_init+0xb0/0x110
[  499.262750]perf_tp_event_init+0x38/0x90
[  499.262910]perf_try_init_event+0x10c/0x150
[  499.263075]perf_event_alloc+0xbb0/0xf10
[  499.263235]sys_perf_event_open+0x2a8/0xdd0
[  499.263396]system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (>register_rwsem){}:
[  499.263723]__lock_acquire+0x53c/0x910
[  499.263884]lock_acquire+0xf4/0x2f0
[  499.264002]down_write+0x6c/0x110
[  499.264118]uprobe_register+0x1ec/0x320
[  499.264283]probe_event_enable+0x1cc/0x540
[  499.264442]perf_trace_event_init+0x1e0/0x350
[  499.264603]perf_trace_init+0xb0/0x110
[  499.264766]perf_tp_event_init+0x38/0x90
[  499.264930]perf_try_init_event+0x10c/0x150
[  499.265092]perf_event_alloc+0xbb0/0xf10
[  499.265261]sys_perf_event_open+0x2a8/0xdd0
[  499.265424]system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]__lock_acquire+0x53c/0x910
[  499.265896]lock_acquire+0xf4/0x2f0
[  499.266019]__mutex_lock+0xa0/0xab0
[  499.266142]trace_add_event_call+0x44/0x100
[  499.266310]create_trace_uprobe+0x4a0/0x8b0
[  499.266474]trace_run_command+0xa4/0xc0
[  499.266631]trace_parse_run_command+0xe4/0x200
[  499.266799]probes_write+0x20/0x40
[  499.266922]__vfs_write+0x6c/0x240
[  499.267041]vfs_write+0xd0/0x240
[  499.267166]ksys_write+0x6c/0x110
[  499.267295]system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]validate_chain.isra.34+0xbd0/0x1000
[  499.267747]__lock_acquire+0x53c/0x910
[  499.267917]lock_acquire+0xf4/0x2f0
[  499.268048]__mutex_lock+0xa0/0xab0
[  499.268170]trace_uprobe_mmap+0x78/0x130
[  499.268335]uprobe_mmap+0x80/0x3b0
[  499.268464]mmap_region+0x290/0x660
[  499.268590]

Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-24 Thread Oleg Nesterov
Hi Ravi,

sorry for delay!

I am trying to recall what this code should do ;) At first glance, I do
not see any serious problem in this version... except it doesn't apply
to Linus's tree. just one question for now.

On 04/17, Ravi Bangoria wrote:
>
> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer 
> *self,
>   if (ret)
>   goto err_buffer;
>  
> + if (tu->ref_ctr_offset)
> + sdt_increment_ref_ctr(tu);
> +

iiuc, this is probe_event_enable()...

Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
change. However, it seems that probe_event_disable() can race with 
trace_uprobe_mmap()
too and the next 7/9 patch won't help,

> + if (tu->ref_ctr_offset)
> + sdt_decrement_ref_ctr(tu);
> +
>   uprobe_unregister(tu->inode, tu->offset, >consumer);
>   tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;

so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
Note that trace_probe_is_enabled() is T until we update tp.flags.

Oleg.



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-24 Thread Oleg Nesterov
Hi Ravi,

sorry for delay!

I am trying to recall what this code should do ;) At first glance, I do
not see any serious problem in this version... except it doesn't apply
to Linus's tree. just one question for now.

On 04/17, Ravi Bangoria wrote:
>
> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer 
> *self,
>   if (ret)
>   goto err_buffer;
>  
> + if (tu->ref_ctr_offset)
> + sdt_increment_ref_ctr(tu);
> +

iiuc, this is probe_event_enable()...

Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
change. However, it seems that probe_event_disable() can race with 
trace_uprobe_mmap()
too and the next 7/9 patch won't help,

> + if (tu->ref_ctr_offset)
> + sdt_decrement_ref_ctr(tu);
> +
>   uprobe_unregister(tu->inode, tu->offset, >consumer);
>   tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;

so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
Note that trace_probe_is_enabled() is T until we update tp.flags.

Oleg.



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-08 Thread Ravi Bangoria
Hi Masami,

On 05/07/2018 09:26 PM, Masami Hiramatsu wrote:
> On Mon, 7 May 2018 13:51:21 +0530
> Ravi Bangoria  wrote:
>
>> Hi Masami,
>>
>> On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
> +}
> +
> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
> +{
> + struct uprobe_map_info *info;
> +
> + uprobe_down_write_dup_mmap();
> + info = uprobe_build_map_info(tu->inode->i_mapping,
> + tu->ref_ctr_offset, false);
> + if (IS_ERR(info))
> + goto out;
> +
> + while (info) {
> + down_write(>mm->mmap_sem);
> +
> + if (sdt_find_vma(tu, info->mm, info->vaddr))
> + sdt_update_ref_ctr(info->mm, info->vaddr, 1);
 Don't you have to handle the error to map pages here?
>>> Correct.. I think, I've to feedback error code to 
>>> probe_event_{enable|disable}
>>> and handler failure there.
>> I looked at this. Actually, It looks difficult to feedback errors to
>> probe_event_{enable|disable}, esp. in the mmap() case.
> Hmm, can't you roll that back if sdt_increment_ref_ctr() fails?
> If so, how does sdt_decrement_ref_ctr() work in that case?

Yes, it's easy to rollback in sdt_increment_ref_ctr(). But not much can
be done if trace_uprobe_mmap() fails.

What would be good is, if we can feedback uprobe_mmap() failures
to the perf infrastructure, which can finally be parsed by perf record.
But that should be done as a separate work.

>> Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
>> doing this in [PATCH 7]. (Though, it makes more sense to do that in
>> [PATCH 6], will change it in next version).
> Of course we need to warn it at least, but the best is rejecting to
> enable it.

Yes, we can reject it for sdt_increment_ref_ctr() failures.

>> Any better ideas?
>>
>> BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
>> there is no feedback to trace_uprobe and no warnigns in dmesg as
>> well !! There was a patch by Naveen to warn such failures in dmesg
>> but that didn't go in: https://lkml.org/lkml/2017/9/22/155
> Oops, that's a real bug. It seems the ball is in Naveen's hand.
> Naveen, could you update it according to Oleg's comment, and resend it?
>
>> Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
>> counter never goes to negative incase increment fails but decrement
>> succeeds. OTOH, if increment succeeds but decrement fails, the
>> counter remains >0 but there is no harm as such, except we will
>> execute some unnecessary code.
> I see. Please carefully clarify whether such case is kernel's bug or not.
> I would like to know what the condition causes that uneven behavior.

Sure, will do that.

Thanks,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-08 Thread Ravi Bangoria
Hi Masami,

On 05/07/2018 09:26 PM, Masami Hiramatsu wrote:
> On Mon, 7 May 2018 13:51:21 +0530
> Ravi Bangoria  wrote:
>
>> Hi Masami,
>>
>> On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
> +}
> +
> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
> +{
> + struct uprobe_map_info *info;
> +
> + uprobe_down_write_dup_mmap();
> + info = uprobe_build_map_info(tu->inode->i_mapping,
> + tu->ref_ctr_offset, false);
> + if (IS_ERR(info))
> + goto out;
> +
> + while (info) {
> + down_write(>mm->mmap_sem);
> +
> + if (sdt_find_vma(tu, info->mm, info->vaddr))
> + sdt_update_ref_ctr(info->mm, info->vaddr, 1);
 Don't you have to handle the error to map pages here?
>>> Correct.. I think, I've to feedback error code to 
>>> probe_event_{enable|disable}
>>> and handler failure there.
>> I looked at this. Actually, It looks difficult to feedback errors to
>> probe_event_{enable|disable}, esp. in the mmap() case.
> Hmm, can't you roll that back if sdt_increment_ref_ctr() fails?
> If so, how does sdt_decrement_ref_ctr() work in that case?

Yes, it's easy to rollback in sdt_increment_ref_ctr(). But not much can
be done if trace_uprobe_mmap() fails.

What would be good is, if we can feedback uprobe_mmap() failures
to the perf infrastructure, which can finally be parsed by perf record.
But that should be done as a separate work.

>> Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
>> doing this in [PATCH 7]. (Though, it makes more sense to do that in
>> [PATCH 6], will change it in next version).
> Of course we need to warn it at least, but the best is rejecting to
> enable it.

Yes, we can reject it for sdt_increment_ref_ctr() failures.

>> Any better ideas?
>>
>> BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
>> there is no feedback to trace_uprobe and no warnigns in dmesg as
>> well !! There was a patch by Naveen to warn such failures in dmesg
>> but that didn't go in: https://lkml.org/lkml/2017/9/22/155
> Oops, that's a real bug. It seems the ball is in Naveen's hand.
> Naveen, could you update it according to Oleg's comment, and resend it?
>
>> Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
>> counter never goes to negative incase increment fails but decrement
>> succeeds. OTOH, if increment succeeds but decrement fails, the
>> counter remains >0 but there is no harm as such, except we will
>> execute some unnecessary code.
> I see. Please carefully clarify whether such case is kernel's bug or not.
> I would like to know what the condition causes that uneven behavior.

Sure, will do that.

Thanks,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-08 Thread Naveen N. Rao

Masami Hiramatsu wrote:

On Mon, 7 May 2018 13:51:21 +0530
Ravi Bangoria  wrote:


BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
there is no feedback to trace_uprobe and no warnigns in dmesg as
well !! There was a patch by Naveen to warn such failures in dmesg
but that didn't go in: https://lkml.org/lkml/2017/9/22/155


Oops, that's a real bug. It seems the ball is in Naveen's hand.
Naveen, could you update it according to Oleg's comment, and resend it?


Yes, I've had to put that series on the backburner. I will try and get 
to it soon. Thanks for the reminder.


- Naveen




Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-08 Thread Naveen N. Rao

Masami Hiramatsu wrote:

On Mon, 7 May 2018 13:51:21 +0530
Ravi Bangoria  wrote:


BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
there is no feedback to trace_uprobe and no warnigns in dmesg as
well !! There was a patch by Naveen to warn such failures in dmesg
but that didn't go in: https://lkml.org/lkml/2017/9/22/155


Oops, that's a real bug. It seems the ball is in Naveen's hand.
Naveen, could you update it according to Oleg's comment, and resend it?


Yes, I've had to put that series on the backburner. I will try and get 
to it soon. Thanks for the reminder.


- Naveen




Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-07 Thread Masami Hiramatsu
On Mon, 7 May 2018 13:51:21 +0530
Ravi Bangoria  wrote:

> Hi Masami,
> 
> On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
> >
> >>> +}
> >>> +
> >>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
> >>> +{
> >>> + struct uprobe_map_info *info;
> >>> +
> >>> + uprobe_down_write_dup_mmap();
> >>> + info = uprobe_build_map_info(tu->inode->i_mapping,
> >>> + tu->ref_ctr_offset, false);
> >>> + if (IS_ERR(info))
> >>> + goto out;
> >>> +
> >>> + while (info) {
> >>> + down_write(>mm->mmap_sem);
> >>> +
> >>> + if (sdt_find_vma(tu, info->mm, info->vaddr))
> >>> + sdt_update_ref_ctr(info->mm, info->vaddr, 1);
> >> Don't you have to handle the error to map pages here?
> > Correct.. I think, I've to feedback error code to 
> > probe_event_{enable|disable}
> > and handler failure there.
> 
> I looked at this. Actually, It looks difficult to feedback errors to
> probe_event_{enable|disable}, esp. in the mmap() case.

Hmm, can't you roll that back if sdt_increment_ref_ctr() fails?
If so, how does sdt_decrement_ref_ctr() work in that case?

> Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
> doing this in [PATCH 7]. (Though, it makes more sense to do that in
> [PATCH 6], will change it in next version).

Of course we need to warn it at least, but the best is rejecting to
enable it.

> 
> Any better ideas?
> 
> BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
> there is no feedback to trace_uprobe and no warnigns in dmesg as
> well !! There was a patch by Naveen to warn such failures in dmesg
> but that didn't go in: https://lkml.org/lkml/2017/9/22/155

Oops, that's a real bug. It seems the ball is in Naveen's hand.
Naveen, could you update it according to Oleg's comment, and resend it?

> 
> Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
> counter never goes to negative incase increment fails but decrement
> succeeds. OTOH, if increment succeeds but decrement fails, the
> counter remains >0 but there is no harm as such, except we will
> execute some unnecessary code.

I see. Please carefully clarify whether such case is kernel's bug or not.
I would like to know what the condition causes that uneven behavior.

Thank you,

> 
> Thanks,
> Ravi
> 


-- 
Masami Hiramatsu 


Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-07 Thread Masami Hiramatsu
On Mon, 7 May 2018 13:51:21 +0530
Ravi Bangoria  wrote:

> Hi Masami,
> 
> On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
> >
> >>> +}
> >>> +
> >>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
> >>> +{
> >>> + struct uprobe_map_info *info;
> >>> +
> >>> + uprobe_down_write_dup_mmap();
> >>> + info = uprobe_build_map_info(tu->inode->i_mapping,
> >>> + tu->ref_ctr_offset, false);
> >>> + if (IS_ERR(info))
> >>> + goto out;
> >>> +
> >>> + while (info) {
> >>> + down_write(>mm->mmap_sem);
> >>> +
> >>> + if (sdt_find_vma(tu, info->mm, info->vaddr))
> >>> + sdt_update_ref_ctr(info->mm, info->vaddr, 1);
> >> Don't you have to handle the error to map pages here?
> > Correct.. I think, I've to feedback error code to 
> > probe_event_{enable|disable}
> > and handler failure there.
> 
> I looked at this. Actually, It looks difficult to feedback errors to
> probe_event_{enable|disable}, esp. in the mmap() case.

Hmm, can't you roll that back if sdt_increment_ref_ctr() fails?
If so, how does sdt_decrement_ref_ctr() work in that case?

> Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
> doing this in [PATCH 7]. (Though, it makes more sense to do that in
> [PATCH 6], will change it in next version).

Of course we need to warn it at least, but the best is rejecting to
enable it.

> 
> Any better ideas?
> 
> BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
> there is no feedback to trace_uprobe and no warnigns in dmesg as
> well !! There was a patch by Naveen to warn such failures in dmesg
> but that didn't go in: https://lkml.org/lkml/2017/9/22/155

Oops, that's a real bug. It seems the ball is in Naveen's hand.
Naveen, could you update it according to Oleg's comment, and resend it?

> 
> Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
> counter never goes to negative incase increment fails but decrement
> succeeds. OTOH, if increment succeeds but decrement fails, the
> counter remains >0 but there is no harm as such, except we will
> execute some unnecessary code.

I see. Please carefully clarify whether such case is kernel's bug or not.
I would like to know what the condition causes that uneven behavior.

Thank you,

> 
> Thanks,
> Ravi
> 


-- 
Masami Hiramatsu 


Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-07 Thread Ravi Bangoria
Hi Masami,

On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
>
>>> +}
>>> +
>>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>>> +{
>>> +   struct uprobe_map_info *info;
>>> +
>>> +   uprobe_down_write_dup_mmap();
>>> +   info = uprobe_build_map_info(tu->inode->i_mapping,
>>> +   tu->ref_ctr_offset, false);
>>> +   if (IS_ERR(info))
>>> +   goto out;
>>> +
>>> +   while (info) {
>>> +   down_write(>mm->mmap_sem);
>>> +
>>> +   if (sdt_find_vma(tu, info->mm, info->vaddr))
>>> +   sdt_update_ref_ctr(info->mm, info->vaddr, 1);
>> Don't you have to handle the error to map pages here?
> Correct.. I think, I've to feedback error code to probe_event_{enable|disable}
> and handler failure there.

I looked at this. Actually, It looks difficult to feedback errors to
probe_event_{enable|disable}, esp. in the mmap() case.

Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
doing this in [PATCH 7]. (Though, it makes more sense to do that in
[PATCH 6], will change it in next version).

Any better ideas?

BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
there is no feedback to trace_uprobe and no warnigns in dmesg as
well !! There was a patch by Naveen to warn such failures in dmesg
but that didn't go in: https://lkml.org/lkml/2017/9/22/155

Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
counter never goes to negative incase increment fails but decrement
succeeds. OTOH, if increment succeeds but decrement fails, the
counter remains >0 but there is no harm as such, except we will
execute some unnecessary code.

Thanks,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-07 Thread Ravi Bangoria
Hi Masami,

On 05/04/2018 07:51 PM, Ravi Bangoria wrote:
>
>>> +}
>>> +
>>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>>> +{
>>> +   struct uprobe_map_info *info;
>>> +
>>> +   uprobe_down_write_dup_mmap();
>>> +   info = uprobe_build_map_info(tu->inode->i_mapping,
>>> +   tu->ref_ctr_offset, false);
>>> +   if (IS_ERR(info))
>>> +   goto out;
>>> +
>>> +   while (info) {
>>> +   down_write(>mm->mmap_sem);
>>> +
>>> +   if (sdt_find_vma(tu, info->mm, info->vaddr))
>>> +   sdt_update_ref_ctr(info->mm, info->vaddr, 1);
>> Don't you have to handle the error to map pages here?
> Correct.. I think, I've to feedback error code to probe_event_{enable|disable}
> and handler failure there.

I looked at this. Actually, It looks difficult to feedback errors to
probe_event_{enable|disable}, esp. in the mmap() case.

Is it fine if we just warn sdt_update_ref_ctr() failures in dmesg? I'm
doing this in [PATCH 7]. (Though, it makes more sense to do that in
[PATCH 6], will change it in next version).

Any better ideas?

BTW, same issue exists for normal uprobe. If uprobe_mmap() fails,
there is no feedback to trace_uprobe and no warnigns in dmesg as
well !! There was a patch by Naveen to warn such failures in dmesg
but that didn't go in: https://lkml.org/lkml/2017/9/22/155

Also, I'll add a check in sdt_update_ref_ctr() to make sure reference
counter never goes to negative incase increment fails but decrement
succeeds. OTOH, if increment succeeds but decrement fails, the
counter remains >0 but there is no harm as such, except we will
execute some unnecessary code.

Thanks,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-04 Thread Ravi Bangoria
Hi Masami,

On 05/04/2018 10:18 AM, Masami Hiramatsu wrote:
>> +void uprobe_down_write_dup_mmap(void)
>> +{
>> +percpu_down_write(_mmap_sem);
>> +}
>> +
>> +void uprobe_up_write_dup_mmap(void)
>> +{
>> +percpu_up_write(_mmap_sem);
>> +}
>> +
> I'm not sure why these hunks are not done in previous patch.
> If you separate "uprobe_map_info" export patch, this also
> should be separated. (Or both merged into this patch)

Sure, I'll add separate patch for dup_mmap_sem.

>> +/*
>> + * Reference counter gate the invocation of probe. If present,
>> + * by default reference counter is 0. One needs to increment
>> + * it before tracing the probe and decrement it when done.
>> + */
>> +static int
>> +sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
>> +{
>> +void *kaddr;
>> +struct page *page;
>> +struct vm_area_struct *vma;
>> +int ret = 0;
>> +unsigned short *ptr;
>> +
>> +if (vaddr == 0)
>> +return -EINVAL;
>> +
>> +ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +FOLL_FORCE | FOLL_WRITE, , , NULL);
>> +if (ret <= 0)
>> +return ret;
> Hmm, get_user_pages_remote() said
>
> ===
> If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns 
> -errno.
> ===
>
> And you've passed 1 for nr_pages, so it must be 1 or -errno.
>
>> +
>> +kaddr = kmap_atomic(page);
>> +ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +*ptr += d;
>> +kunmap_atomic(kaddr);
>> +
>> +put_page(page);
>> +return 0;
> And obviously 0 means "success" for sdt_update_ref_ctr().
> I think if get_user_pages_remote returns 0, this should
> return -EBUSY (*) or something else.
>
> * It seems that if faultin_page() in __get_user_pages()
> returns -EBUSY, get_user_pages_remote() can return 0.

Ah good catch :). Will change it.

>> +}
>> +
>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>> +{
>> +struct uprobe_map_info *info;
>> +
>> +uprobe_down_write_dup_mmap();
>> +info = uprobe_build_map_info(tu->inode->i_mapping,
>> +tu->ref_ctr_offset, false);
>> +if (IS_ERR(info))
>> +goto out;
>> +
>> +while (info) {
>> +down_write(>mm->mmap_sem);
>> +
>> +if (sdt_find_vma(tu, info->mm, info->vaddr))
>> +sdt_update_ref_ctr(info->mm, info->vaddr, 1);
> Don't you have to handle the error to map pages here?

Correct.. I think, I've to feedback error code to probe_event_{enable|disable}
and handler failure there.

Thanks for the review,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-04 Thread Ravi Bangoria
Hi Masami,

On 05/04/2018 10:18 AM, Masami Hiramatsu wrote:
>> +void uprobe_down_write_dup_mmap(void)
>> +{
>> +percpu_down_write(_mmap_sem);
>> +}
>> +
>> +void uprobe_up_write_dup_mmap(void)
>> +{
>> +percpu_up_write(_mmap_sem);
>> +}
>> +
> I'm not sure why these hunks are not done in previous patch.
> If you separate "uprobe_map_info" export patch, this also
> should be separated. (Or both merged into this patch)

Sure, I'll add separate patch for dup_mmap_sem.

>> +/*
>> + * Reference counter gate the invocation of probe. If present,
>> + * by default reference counter is 0. One needs to increment
>> + * it before tracing the probe and decrement it when done.
>> + */
>> +static int
>> +sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
>> +{
>> +void *kaddr;
>> +struct page *page;
>> +struct vm_area_struct *vma;
>> +int ret = 0;
>> +unsigned short *ptr;
>> +
>> +if (vaddr == 0)
>> +return -EINVAL;
>> +
>> +ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +FOLL_FORCE | FOLL_WRITE, , , NULL);
>> +if (ret <= 0)
>> +return ret;
> Hmm, get_user_pages_remote() said
>
> ===
> If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns 
> -errno.
> ===
>
> And you've passed 1 for nr_pages, so it must be 1 or -errno.
>
>> +
>> +kaddr = kmap_atomic(page);
>> +ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +*ptr += d;
>> +kunmap_atomic(kaddr);
>> +
>> +put_page(page);
>> +return 0;
> And obviously 0 means "success" for sdt_update_ref_ctr().
> I think if get_user_pages_remote returns 0, this should
> return -EBUSY (*) or something else.
>
> * It seems that if faultin_page() in __get_user_pages()
> returns -EBUSY, get_user_pages_remote() can return 0.

Ah good catch :). Will change it.

>> +}
>> +
>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>> +{
>> +struct uprobe_map_info *info;
>> +
>> +uprobe_down_write_dup_mmap();
>> +info = uprobe_build_map_info(tu->inode->i_mapping,
>> +tu->ref_ctr_offset, false);
>> +if (IS_ERR(info))
>> +goto out;
>> +
>> +while (info) {
>> +down_write(>mm->mmap_sem);
>> +
>> +if (sdt_find_vma(tu, info->mm, info->vaddr))
>> +sdt_update_ref_ctr(info->mm, info->vaddr, 1);
> Don't you have to handle the error to map pages here?

Correct.. I think, I've to feedback error code to probe_event_{enable|disable}
and handler failure there.

Thanks for the review,
Ravi



Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-03 Thread Masami Hiramatsu
Hi Ravi,

I have some comments, please see below.

On Tue, 17 Apr 2018 10:02:41 +0530
Ravi Bangoria  wrote:\

> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 7bd2760..2db3ed1 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -122,6 +122,8 @@ struct uprobe_map_info {
>   unsigned long vaddr;
>  };
>  
> +extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
> +
>  extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned 
> long vaddr);
>  extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, 
> unsigned long vaddr);
>  extern bool is_swbp_insn(uprobe_opcode_t *insn);
> @@ -136,6 +138,8 @@ struct uprobe_map_info {
>  extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, 
> unsigned long end);
>  extern void uprobe_start_dup_mmap(void);
>  extern void uprobe_end_dup_mmap(void);
> +extern void uprobe_down_write_dup_mmap(void);
> +extern void uprobe_up_write_dup_mmap(void);
>  extern void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct 
> *newmm);
>  extern void uprobe_free_utask(struct task_struct *t);
>  extern void uprobe_copy_process(struct task_struct *t, unsigned long flags);
> @@ -192,6 +196,12 @@ static inline void uprobe_start_dup_mmap(void)
>  static inline void uprobe_end_dup_mmap(void)
>  {
>  }
> +static inline void uprobe_down_write_dup_mmap(void)
> +{
> +}
> +static inline void uprobe_up_write_dup_mmap(void)
> +{
> +}
>  static inline void
>  uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
>  {
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 096d1e6..e26ad83 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1044,6 +1044,9 @@ static void build_probe_list(struct inode *inode,
>   spin_unlock(_treelock);
>  }
>  
> +/* Rightnow the only user of this is trace_uprobe. */
> +void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
> +
>  /*
>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>   *
> @@ -1056,7 +1059,13 @@ int uprobe_mmap(struct vm_area_struct *vma)
>   struct uprobe *uprobe, *u;
>   struct inode *inode;
>  
> - if (no_uprobe_events() || !valid_vma(vma, true))
> + if (no_uprobe_events())
> + return 0;
> +
> + if (uprobe_mmap_callback)
> + uprobe_mmap_callback(vma);
> +
> + if (!valid_vma(vma, true))
>   return 0;
>  
>   inode = file_inode(vma->vm_file);
> @@ -1247,6 +1256,16 @@ void uprobe_end_dup_mmap(void)
>   percpu_up_read(_mmap_sem);
>  }
>  
> +void uprobe_down_write_dup_mmap(void)
> +{
> + percpu_down_write(_mmap_sem);
> +}
> +
> +void uprobe_up_write_dup_mmap(void)
> +{
> + percpu_up_write(_mmap_sem);
> +}
> +

I'm not sure why these hunks are not done in previous patch.
If you separate "uprobe_map_info" export patch, this also
should be separated. (Or both merged into this patch)


>  void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
>  {
>   if (test_bit(MMF_HAS_UPROBES, >flags)) {
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 0d450b4..1a48b04 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -25,6 +25,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "trace_probe.h"
>  
> @@ -58,6 +60,7 @@ struct trace_uprobe {
>   struct inode*inode;
>   char*filename;
>   unsigned long   offset;
> + unsigned long   ref_ctr_offset;
>   unsigned long   nhit;
>   struct trace_probe  tp;
>  };
> @@ -364,10 +367,10 @@ static int create_trace_uprobe(int argc, char **argv)
>  {
>   struct trace_uprobe *tu;
>   struct inode *inode;
> - char *arg, *event, *group, *filename;
> + char *arg, *event, *group, *filename, *rctr, *rctr_end;
>   char buf[MAX_EVENT_NAME_LEN];
>   struct path path;
> - unsigned long offset;
> + unsigned long offset, ref_ctr_offset;
>   bool is_delete, is_return;
>   int i, ret;
>  
> @@ -377,6 +380,7 @@ static int create_trace_uprobe(int argc, char **argv)
>   is_return = false;
>   event = NULL;
>   group = NULL;
> + ref_ctr_offset = 0;
>  
>   /* argc must be >= 1 */
>   if (argv[0][0] == '-')
> @@ -456,6 +460,26 @@ static int create_trace_uprobe(int argc, char **argv)
>   goto fail_address_parse;
>   }
>  
> + /* Parse reference counter offset if specified. */
> + rctr = strchr(arg, '(');
> + if (rctr) {
> + rctr_end = strchr(rctr, ')');
> + if (rctr > rctr_end || *(rctr_end + 1) != 0) {
> + ret = -EINVAL;
> + pr_info("Invalid reference counter offset.\n");
> + goto fail_address_parse;
> + }
> +
> +   

Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-03 Thread Masami Hiramatsu
Hi Ravi,

I have some comments, please see below.

On Tue, 17 Apr 2018 10:02:41 +0530
Ravi Bangoria  wrote:\

> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 7bd2760..2db3ed1 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -122,6 +122,8 @@ struct uprobe_map_info {
>   unsigned long vaddr;
>  };
>  
> +extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
> +
>  extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned 
> long vaddr);
>  extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, 
> unsigned long vaddr);
>  extern bool is_swbp_insn(uprobe_opcode_t *insn);
> @@ -136,6 +138,8 @@ struct uprobe_map_info {
>  extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, 
> unsigned long end);
>  extern void uprobe_start_dup_mmap(void);
>  extern void uprobe_end_dup_mmap(void);
> +extern void uprobe_down_write_dup_mmap(void);
> +extern void uprobe_up_write_dup_mmap(void);
>  extern void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct 
> *newmm);
>  extern void uprobe_free_utask(struct task_struct *t);
>  extern void uprobe_copy_process(struct task_struct *t, unsigned long flags);
> @@ -192,6 +196,12 @@ static inline void uprobe_start_dup_mmap(void)
>  static inline void uprobe_end_dup_mmap(void)
>  {
>  }
> +static inline void uprobe_down_write_dup_mmap(void)
> +{
> +}
> +static inline void uprobe_up_write_dup_mmap(void)
> +{
> +}
>  static inline void
>  uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
>  {
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 096d1e6..e26ad83 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1044,6 +1044,9 @@ static void build_probe_list(struct inode *inode,
>   spin_unlock(_treelock);
>  }
>  
> +/* Rightnow the only user of this is trace_uprobe. */
> +void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
> +
>  /*
>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>   *
> @@ -1056,7 +1059,13 @@ int uprobe_mmap(struct vm_area_struct *vma)
>   struct uprobe *uprobe, *u;
>   struct inode *inode;
>  
> - if (no_uprobe_events() || !valid_vma(vma, true))
> + if (no_uprobe_events())
> + return 0;
> +
> + if (uprobe_mmap_callback)
> + uprobe_mmap_callback(vma);
> +
> + if (!valid_vma(vma, true))
>   return 0;
>  
>   inode = file_inode(vma->vm_file);
> @@ -1247,6 +1256,16 @@ void uprobe_end_dup_mmap(void)
>   percpu_up_read(_mmap_sem);
>  }
>  
> +void uprobe_down_write_dup_mmap(void)
> +{
> + percpu_down_write(_mmap_sem);
> +}
> +
> +void uprobe_up_write_dup_mmap(void)
> +{
> + percpu_up_write(_mmap_sem);
> +}
> +

I'm not sure why these hunks are not done in previous patch.
If you separate "uprobe_map_info" export patch, this also
should be separated. (Or both merged into this patch)


>  void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
>  {
>   if (test_bit(MMF_HAS_UPROBES, >flags)) {
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 0d450b4..1a48b04 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -25,6 +25,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "trace_probe.h"
>  
> @@ -58,6 +60,7 @@ struct trace_uprobe {
>   struct inode*inode;
>   char*filename;
>   unsigned long   offset;
> + unsigned long   ref_ctr_offset;
>   unsigned long   nhit;
>   struct trace_probe  tp;
>  };
> @@ -364,10 +367,10 @@ static int create_trace_uprobe(int argc, char **argv)
>  {
>   struct trace_uprobe *tu;
>   struct inode *inode;
> - char *arg, *event, *group, *filename;
> + char *arg, *event, *group, *filename, *rctr, *rctr_end;
>   char buf[MAX_EVENT_NAME_LEN];
>   struct path path;
> - unsigned long offset;
> + unsigned long offset, ref_ctr_offset;
>   bool is_delete, is_return;
>   int i, ret;
>  
> @@ -377,6 +380,7 @@ static int create_trace_uprobe(int argc, char **argv)
>   is_return = false;
>   event = NULL;
>   group = NULL;
> + ref_ctr_offset = 0;
>  
>   /* argc must be >= 1 */
>   if (argv[0][0] == '-')
> @@ -456,6 +460,26 @@ static int create_trace_uprobe(int argc, char **argv)
>   goto fail_address_parse;
>   }
>  
> + /* Parse reference counter offset if specified. */
> + rctr = strchr(arg, '(');
> + if (rctr) {
> + rctr_end = strchr(rctr, ')');
> + if (rctr > rctr_end || *(rctr_end + 1) != 0) {
> + ret = -EINVAL;
> + pr_info("Invalid reference counter offset.\n");
> + goto fail_address_parse;
> + }
> +
> + *rctr++ = '\0';
> +