from:"Thomas\-Mich Richter"

Re: [PATCH 1/3] perf report: Fix OOM error in TUI mode on s390

2019-05-23 Thread Thomas-Mich Richter

On 5/22/19 8:08 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, May 22, 2019 at 04:45:59PM +0200, Thomas Richter escreveu:
>


.

>>
>> This size kills the TUI interface when executing the following
>> code:
>>
>>   process_sample_event()
>> hist_entry_iter__add()
>>   hist_iter__report_callback()
>> hist_entry__inc_addr_samples()
>>   symbol__inc_addr_samples(symbol = __bss_stop)
>> symbol__cycles_hist()
>>annotated_source__alloc_histograms(...,
>>  symbol__size(sym),
>>  ...)
>>
>> This function allocates memory to save sample histograms.
>> The symbol_size() marco is defined as sym->end - sym->start, which
>> results in above value of 0x3fe6e64a850 bytes and
>> the call to calloc() in annotated_source__alloc_histograms() fails.
> 
> Humm, why are we getting samples in that area? Is it some JITted thing
> like BPF? What is it?
> 
> Why not just not consider the calloc failure as fatal, and when/if the
> user asks for annotation on such symbol, tell the user that it wasn't
> possible to allocate N bytes for it?
> 
> - Arnaldo


I have not debugged why this sample address was recorded. BPF code was not
running.

I have no problem with making calloc() failure no fatal, however we might
still allocate large memory...

I will send a new patch.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-10 Thread Thomas-Mich Richter

On 4/9/19 10:53 AM, Mark Rutland wrote:
> On Mon, Apr 08, 2019 at 11:50:31AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
>>

.


>>
>> Instead encode the CPU number in @pending_disable, such that we can
>> tell which CPU requested the disable. This then allows us to detect
>> the above scenario and even redirect the IPI to make up for the failed
>> queue.
>>
>> Cc: Heiko Carstens 
>> Cc: Kees Cook 
>> Cc: Hendrik Brueckner 
>> Cc: a...@redhat.com
>> Cc: Martin Schwidefsky 
>> Cc: Mark Rutland 
>> Cc: Jiri Olsa 
>> Reported-by: Thomas-Mich Richter 
>> Signed-off-by: Peter Zijlstra (Intel) 
> 
> I can't think of a nicer way of handling this, so FWIW:
> 
> Acked-by: Mark Rutland 
> 
> Mark.
> 
>> ---

Thanks for the fix with commit id 86071b11317550d994b55ce5e31aa06bcad783b5.

However doing an fgrep on the pending_disable member of struct perf_event
reveals two more hits in file kernel/events/ringbuffer.c when events
have auxiliary data.

Not sure if this needs to be addresses too, just wanted to let you know.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf/report: [RFC] Handling OOM in perf report

2019-04-10 Thread Thomas-Mich Richter

On 4/9/19 12:42 PM, Jiri Olsa wrote:
> On Mon, Apr 01, 2019 at 04:20:00PM +0200, Thomas Richter wrote:
> 
> SNIP
> 
>> perf_session__process_event() returns to its caller, where -ENOMEM is
>> changed to -EINVAL and processing stops:
>>
>>  if ((skip = perf_session__process_event(session, event, head)) < 0) {
>>   pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
>>   head, event->header.size, event->header.type);
>>   err = -EINVAL;
>>   goto out_err;
>>  }
>>
>> This occured in the FINISHED_ROUND event when it has to process some
>> 1 entries and ran out of memory.
>>
>> I understand that my perf.data file might just be too big, but I would
>> like to see some error message indicating OOM error instead of
>> processing failure of a unrelated event.
> 
> you can limit the size of the report queue via ~/.perfconfig file:
> 
> [report]
> queue-size=1M
> 
> above should use only 1M for the queue management data the data
> for sample still get allocated thought.. but it could help
> 
>>
>> However this patch just does what the pr_debug() statement indicates,
>> the event is skipped and processing continues.
>> But at least the root cause is indicated and also shows up in the
>> GUI.
>>
>> Signed-off-by: Thomas Richter 
>> ---
>>  tools/perf/builtin-report.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index 4054eb1f98ac..7a27b815f7a8 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -283,8 +283,13 @@ static int process_sample_event(struct perf_tool *tool,
>>  al.map->dso->hit = 1;
>>  
>>  ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
>> -if (ret < 0)
>> +if (ret < 0) {
>>  pr_debug("problem adding hist entry, skipping event\n");
>> +if (ret == -ENOMEM) {
>> +pr_err("Running out of memory\n");
>> +ret = 0;
>> +}
>> +}
> 
> 
> I think we can propagate the error completely (like below),
> and even warn about ENOMEM specificaly
> 
> but I think we should bail out in case of ENOMEM, because
> the data are incomplete and misleading
> 
> jirka
> 
> 
> ---
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index b17f1c9bc965..eea247a26ad8 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1933,7 +1933,7 @@ reader__process_events(struct reader *rd, struct 
> perf_session *session,
>   pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
>  file_offset + head, event->header.size,
>  event->header.type);
> - err = -EINVAL;
> + err = skip;
>   goto out;
>   }
>  
> 

Above patch does not help, you simply return -ENOMEM instead of -EINVAL and 
processing
stops with no indication that perf ran out of memory. Bailing out in this case 
is ok.

I am fine with your patch, as long as it gives a reason why processing stopped.
In the GUI it shows on the bottom line the reason:

  0xf4198 [0x8]: failed to process type: 68 [Cannot allocate memory]


diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b17f1c9bc965..e89716175588 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1930,10 +1930,10 @@ reader__process_events(struct reader *rd, struct 
perf_session *session,
 
if (size < sizeof(struct perf_event_header) ||
(skip = rd->process(session, event, file_pos)) < 0) {
-   pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
+   pr_err("%#" PRIx64 " [%#x]: failed to process type: %d [%s]\n",
   file_offset + head, event->header.size,
-  event->header.type);
-   err = -EINVAL;
+  event->header.type, strerror(-skip));
+   err = skip;
goto out;
}
 
[root@m35lp76 perf]#
-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-08 Thread Thomas-Mich Richter

On 4/8/19 11:50 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
>>> very good news, your fix ran over the weekend without any hit!!!
>>>
>>> Thanks very much for your help. Do you submit this patch to the kernel 
>>> mailing list?
>>
>> Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> differet patch again, could I bother you to test this one too?
> 

Hi Peter,

I verified your second patch, it ran overnight without any message in the
kernel log. Thanks again.

Tested-by: Thomas Richter 

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-08 Thread Thomas-Mich Richter

On 4/8/19 11:50 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
>>> very good news, your fix ran over the weekend without any hit!!!
>>>
>>> Thanks very much for your help. Do you submit this patch to the kernel 
>>> mailing list?
>>
>> Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> differet patch again, could I bother you to test this one too?
> 


Hi Peter, no problem, test is now running overnight.
Let you know the outcome tomorrow...

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-08 Thread Thomas-Mich Richter

On 4/8/19 10:22 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
>>> Does the below cure things? It's not exactly pretty, but it could just
>>> do the trick.
>>>
>>> ---
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index dfc4bab0b02b..d496e6911442 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event,
>>> event->pmu->del(event, 0);
>>> event->oncpu = -1;
>>>  
>>> -   if (event->pending_disable) {
>>> -   event->pending_disable = 0;
>>> +   if (event->pending_disable == smp_processor_id()) {
>>> +   event->pending_disable = -1;
>>> state = PERF_EVENT_STATE_OFF;
>>> }
>>> perf_event_set_state(event, state);
>>> @@ -2198,7 +2198,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>>>  
>>>  void perf_event_disable_inatomic(struct perf_event *event)
>>>  {
>>> -   event->pending_disable = 1;
>>> +   event->pending_disable = smp_processor_id();
>>> irq_work_queue(&event->pending);
>>>  }
>>>  
>>> @@ -5822,8 +5822,8 @@ static void perf_pending_event(struct irq_work *entry)
>>>  * and we won't recurse 'further'.
>>>  */
>>>  
>>> -   if (event->pending_disable) {
>>> -   event->pending_disable = 0;
>>> +   if (event->pending_disable == smp_processor_id()) {
>>> +   event->pending_disable = -1;
>>> perf_event_disable_local(event);
>>> }
>>>  
>>> @@ -10236,6 +10236,7 @@ perf_event_alloc(struct perf_event_attr *attr, int 
>>> cpu,
>>>  
>>>  
>>> init_waitqueue_head(&event->waitq);
>>> +   event->pending_disable = -1;
>>> init_irq_work(&event->pending, perf_pending_event);
>>>  
>>> mutex_init(&event->mmap_mutex);
>>>
>>
>> Peter,
>>
>> very good news, your fix ran over the weekend without any hit!!!
>>
>> Thanks very much for your help. Do you submit this patch to the kernel 
>> mailing list?
> 
> Most excellent, let me go write a Changelog.
> 
> Could I convince you to implement arch_irq_work_raise() for s390?
> 

Yes, I am convinced, however I need to discuss this with the s390 maintainers
Martin Schwidesfky and Heiko Carstens first.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-08 Thread Thomas-Mich Richter

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
>> That is not entirely the scenario I talked about, but *groan*.
>>
>> So what I meant was:
>>
>>  CPU-0   CPU-n
>>
>>  __schedule()
>>local_irq_disable()
>>
>>...
>>  deactivate_task(prev);
>>
>>  
>> try_to_wake_up(@p)
>>...
>>
>> smp_cond_load_acquire(&p->on_cpu, !VAL);
>>
>>
>>  ..
>>  perf_event_disable_inatomic()
>>event->pending_disable = 1;
>>irq_work_queue() /* self-IPI */
>>
>>
>>context_switch()
>>  prepare_task_switch()
>>perf_event_task_sched_out()
>>  // the above chain that clears pending_disable
>>
>>  finish_task_switch()
>>finish_task()
>>  smp_store_release(prev->on_cpu, 0);
>>/* 
>> finally */
>>  // take woken
>>  // 
>> context_switch to @p
>>finish_lock_switch()
>>  raw_spin_unlock_irq()
>>  /* w00t, IRQs enabled, self-IPI time */
>>  
>>perf_pending_event()
>>  // event->pending_disable == 0
>>  
>>
>>
>> What you're suggesting, is that the time between:
>>
>>   smp_store_release(prev->on_cpu, 0);
>>
>> and
>>
>>   
>>
>> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
>> the event there, trigger a PMI that calls perf_event_disable_inatomic()
>> _again_ (this would mean irq_work_queue() failing, which we don't check)
>> (and schedule out again, although that's not required).
>>
>> This being virt that might actually be possible if (v)CPU-0 takes a nap
>> I suppose.
>>
>> Let me think about this a little more...
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dfc4bab0b02b..d496e6911442 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event,
>   event->pmu->del(event, 0);
>   event->oncpu = -1;
>  
> - if (event->pending_disable) {
> - event->pending_disable = 0;
> + if (event->pending_disable == smp_processor_id()) {
> + event->pending_disable = -1;
>   state = PERF_EVENT_STATE_OFF;
>   }
>   perf_event_set_state(event, state);
> @@ -2198,7 +2198,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>  
>  void perf_event_disable_inatomic(struct perf_event *event)
>  {
> - event->pending_disable = 1;
> + event->pending_disable = smp_processor_id();
>   irq_work_queue(&event->pending);
>  }
>  
> @@ -5822,8 +5822,8 @@ static void perf_pending_event(struct irq_work *entry)
>* and we won't recurse 'further'.
>*/
>  
> - if (event->pending_disable) {
> - event->pending_disable = 0;
> + if (event->pending_disable == smp_processor_id()) {
> + event->pending_disable = -1;
>   perf_event_disable_local(event);
>   }
>  
> @@ -10236,6 +10236,7 @@ perf_event_alloc(struct perf_event_attr *attr, int 
> cpu,
>  
>  
>   init_waitqueue_head(&event->waitq);
> + event->pending_disable = -1;
>   init_irq_work(&event->pending, perf_pending_event);
>  
>   mutex_init(&event->mmap_mutex);
> 

Peter,

very good news, your fix ran over the weekend without any hit!!!

Thanks very much for your help. Do you submit this patch to the kernel mailing 
list?

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-05 Thread Thomas-Mich Richter

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
>> That is not entirely the scenario I talked about, but *groan*.
>>
>> So what I meant was:
>>
>>  CPU-0   CPU-n
>>
>>  __schedule()
>>local_irq_disable()
>>
>>...
>>  deactivate_task(prev);
>>
>>  
>> try_to_wake_up(@p)
>>...
>>
>> smp_cond_load_acquire(&p->on_cpu, !VAL);
>>
>>
>>  ..
>>  perf_event_disable_inatomic()
>>event->pending_disable = 1;
>>irq_work_queue() /* self-IPI */
>>
>>
>>context_switch()
>>  prepare_task_switch()
>>perf_event_task_sched_out()
>>  // the above chain that clears pending_disable
>>
>>  finish_task_switch()
>>finish_task()
>>  smp_store_release(prev->on_cpu, 0);
>>/* 
>> finally */
>>  // take woken
>>  // 
>> context_switch to @p
>>finish_lock_switch()
>>  raw_spin_unlock_irq()
>>  /* w00t, IRQs enabled, self-IPI time */
>>  
>>perf_pending_event()
>>  // event->pending_disable == 0
>>  
>>
>>
>> What you're suggesting, is that the time between:
>>
>>   smp_store_release(prev->on_cpu, 0);
>>
>> and
>>
>>   
>>
>> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
>> the event there, trigger a PMI that calls perf_event_disable_inatomic()
>> _again_ (this would mean irq_work_queue() failing, which we don't check)
>> (and schedule out again, although that's not required).
>>
>> This being virt that might actually be possible if (v)CPU-0 takes a nap
>> I suppose.
>>
>> Let me think about this a little more...
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 

Thanks a lot for the patch, I have built a new kernel and let it run over the 
week end.

s390 does not have a PMI, all interrupts (including the measurement interrupts 
from 
the PMU) are normal, maskable interrupts.


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-04 Thread Thomas-Mich Richter

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
>> That is not entirely the scenario I talked about, but *groan*.
>>
>> So what I meant was:
>>
>>  CPU-0   CPU-n
>>
>>  __schedule()
>>local_irq_disable()
>>
>>...
>>  deactivate_task(prev);
>>
>>  
>> try_to_wake_up(@p)
>>...
>>
>> smp_cond_load_acquire(&p->on_cpu, !VAL);
>>
>>
>>  ..
>>  perf_event_disable_inatomic()
>>event->pending_disable = 1;
>>irq_work_queue() /* self-IPI */
>>
>>
>>context_switch()
>>  prepare_task_switch()
>>perf_event_task_sched_out()
>>  // the above chain that clears pending_disable
>>
>>  finish_task_switch()
>>finish_task()
>>  smp_store_release(prev->on_cpu, 0);
>>/* 
>> finally */
>>  // take woken
>>  // 
>> context_switch to @p
>>finish_lock_switch()
>>  raw_spin_unlock_irq()
>>  /* w00t, IRQs enabled, self-IPI time */
>>  
>>perf_pending_event()
>>  // event->pending_disable == 0
>>  
>>
>>
>> What you're suggesting, is that the time between:
>>
>>   smp_store_release(prev->on_cpu, 0);
>>
>> and
>>
>>   
>>
>> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
>> the event there, trigger a PMI that calls perf_event_disable_inatomic()
>> _again_ (this would mean irq_work_queue() failing, which we don't check)
>> (and schedule out again, although that's not required).
>>
>> This being virt that might actually be possible if (v)CPU-0 takes a nap
>> I suppose.
>>
>> Let me think about this a little more...
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dfc4bab0b02b..d496e6911442 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event,
>   event->pmu->del(event, 0);
>   event->oncpu = -1;
>  
> - if (event->pending_disable) {
> - event->pending_disable = 0;
> + if (event->pending_disable == smp_processor_id()) {
> + event->pending_disable = -1;
>   state = PERF_EVENT_STATE_OFF;
>   }
>   perf_event_set_state(event, state);
> @@ -2198,7 +2198,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>  
>  void perf_event_disable_inatomic(struct perf_event *event)
>  {
> - event->pending_disable = 1;
> + event->pending_disable = smp_processor_id();
>   irq_work_queue(&event->pending);
>  }
>  
> @@ -5822,8 +5822,8 @@ static void perf_pending_event(struct irq_work *entry)
>* and we won't recurse 'further'.
>*/
>  
> - if (event->pending_disable) {
> - event->pending_disable = 0;
> + if (event->pending_disable == smp_processor_id()) {
> + event->pending_disable = -1;
>   perf_event_disable_local(event);
>   }
>  
> @@ -10236,6 +10236,7 @@ perf_event_alloc(struct perf_event_attr *attr, int 
> cpu,
>  
>  
>   init_waitqueue_head(&event->waitq);
> + event->pending_disable = -1;
>   init_irq_work(&event->pending, perf_pending_event);
>  
>   mutex_init(&event->mmap_mutex);
> 

Thanks for the quick reply, will give it a try...


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-04 Thread Thomas-Mich Richter

On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
>> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
>>
>> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>> event_function_local.constprop.79+0xe2/0xe8
>>
>> which was introduced with
>>commit cca2094605ef ("perf/core: Fix event_function_local()").
>> ..snip 
>>
>> Any ideas or hints who to avoid/fix this warning?
> 
> Some thoughts here:
> 
>   
> https://lkml.kernel.org/r/20190213101644.gn32...@hirez.programming.kicks-ass.net
> 
> tl;dr, I've no frigging clue.
> 

I have read this thread and at the end you mentioned:

Humm, but in that case:

   context_switch()
prepare_task_switch()
  perf_event_task_sched_out()
__perf_event_task_sched_out()
  perf_event_context_sched_out()
task_ctx_sched_out()
  ctx_sched_out()
group_sched_out()
  event_sched_out()
if (event->pending_disable)

   Would have already cleared the pending_disable state, so the IPI would
   not have ran perf_event_disable_local() in the first place.

Our test system is configured to panic in WARN_ON_ONCE(). I looked
at the dump. The event triggering WARN_ON_ONCE:

crash> struct perf_event.oncpu 0x1f9b24800
  oncpu = 0
crash> struct perf_event.state 0x1f9b24800
  state = PERF_EVENT_STATE_ACTIVE
crash> 

This means the code in 
static void event_sched_out()
{

event->pmu->del(event, 0);
event->oncpu = -1;

if (event->pending_disable) {
event->pending_disable = 0;
state = PERF_EVENT_STATE_OFF;
}
perf_event_set_state(event, state);
...
}

has not finished and returned from this function. So the task was not 
completely context-switched
out from CPU 0 while the interrupt handler was executing on CPU 15:

static void perf_pending_event(...)
{

if (event->pending_disable) {
event->pending_disable = 0;
perf_event_disable_local(event);  <--- Causes the WARN_ON_ONCE()
}
.
}

I think there is a race, especially when the interrupt handler on CPU 15
was invoked via timer interrupt an runs on a different CPU.

> 
> Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> test things on.
> 

s390 LPARs run under hipervisor control, no chance to run any OS without it.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-03 Thread Thomas-Mich Richter

On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
>> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
>>
>> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>> event_function_local.constprop.79+0xe2/0xe8
>>
>> which was introduced with
>>commit cca2094605ef ("perf/core: Fix event_function_local()").
>>
>> This is the WARN_ON_ONCE message, which sometimes shows up in the kernel
>> log:
>>  [ 4598.316519] WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>> event_function_local.constprop.79+0xe2/0xe8
>>  [ 4598.316524] Kernel panic - not syncing: panic_on_warn set ...
>>  [ 4598.316527] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G   OE
>> 5.1.0-20190402.rc3.git0.5e7a8ca31926.300.fc29.s390x+git #1
>>  [ 4598.316529] Hardware name: IBM 2964 NC9 712 (LPAR)
>>  [ 4598.316531] Call Trace:
>>  [ 4598.316534] ([<00112eb8>] show_stack+0x58/0x70)
>>  [ 4598.316538]  [<00a820aa>] dump_stack+0x7a/0xa8
>>  [ 4598.316541]  [<00143b52>] panic+0x11a/0x2d0
>>  [ 4598.316543]  [<001439b0>] __warn+0xf8/0x118
>>  [ 4598.316545]  [<00a811a8>] report_bug+0xd8/0x150
>>  [ 4598.316547]  [<001014ac>] do_report_trap+0xc4/0xe0
>>  [ 4598.316549]  [<00101680>] illegal_op+0x138/0x150
>>  [ 4598.316552]  [<00aa233c>] pgm_check_handler+0x1cc/0x220
>>  [ 4598.316554]  [<002a6d02>] event_function_local.constprop.79+
>> 0xe2/0xe8
>>  [ 4598.316556] ([<002a6c7a>] event_function_local.constprop.79+
>> 0x5a/0xe8)
>>  [ 4598.316559]  [<002aa400>] perf_pending_event+0x88/0xb0
>>  [ 4598.316561]  [<00271a8c>] irq_work_run_list+0x8c/0xb8
>>  [ 4598.316563]  [<00271d78>] irq_work_tick+0x48/0x68
>>  [ 4598.316566]  [<001d6058>] update_process_times+0x68/0x80
>>  [ 4598.316568]  [<001e6c10>] tick_sched_handle.isra.6+0x50/0x60
>>  [ 4598.316570]  [<001e6c7e>] tick_sched_timer+0x5e/0xb8
>>  [ 4598.316573]  [<001d6b6a>] __hrtimer_run_queues+0x10a/0x2c0
>>  [ 4598.316575]  [<001d7a88>] hrtimer_interrupt+0x138/0x2a8
>>  [ 4598.316577]  [<0010c3e4>] do_IRQ+0xac/0xb0
>>  [ 4598.316597]  [<00aa2744>] ext_int_handler+0x128/0x12c
>>  [ 4598.316600]  [<001034f6>] enabled_wait+0x46/0xd0
>>  [ 4598.316602] ([<03e000d2fe10>] 0x3e000d2fe10)
>>  [ 4598.316604]  [<00103842>] arch_cpu_idle+0x3a/0x50
>>  [ 4598.316606]  [<0017b808>] do_idle+0x180/0x1b0
>>  [ 4598.316608]  [<0017ba06>] cpu_startup_entry+0x36/0x40
>>  [ 4598.316611]  [<00115d16>] smp_init_secondary+0xd6/0xf0
>>  [ 4598.316613]  [<0011521e>] smp_start_secondary+0x86/0x98
>>
>> Any ideas or hints who to avoid/fix this warning?
> 
> Some thoughts here:
> 
>   
> https://lkml.kernel.org/r/20190213101644.gn32...@hirez.programming.kicks-ass.net
> 
> tl;dr, I've no frigging clue.
> 
>> I ran a user space program which uses
>>
>>   1. perf_event_open() system call with type HARDWARE
>>  and PERF_COUNT_HW_CPU_CYCLES in sampling mode.
>>   2. Assigns signal SIGIO to the file descriptor returned from
>>  perf_event_open() using:
>>rc = fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
>>rc |= fcntl(fd, F_SETSIG, signo);
>>rc |= fcntl(fd, F_SETOWN, getpid());
>>
>>   3. The signal handler increments some variables and issues
>>ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
>>  to restart signal delivery.
>>
>> I can send you the test program if needed.
> 
> Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> test things on.
> 

It happens on an s390 mainframe (not PowerPC) without KVM or z/VM 
virtualisation.
I'll try to repro it on x86.

PS: Put my team members Heiko and Hendrik on cc.
-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-03 Thread Thomas-Mich Richter

I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:

WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
event_function_local.constprop.79+0xe2/0xe8

which was introduced with
   commit cca2094605ef ("perf/core: Fix event_function_local()").

This is the WARN_ON_ONCE message, which sometimes shows up in the kernel
log:
 [ 4598.316519] WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
event_function_local.constprop.79+0xe2/0xe8
 [ 4598.316524] Kernel panic - not syncing: panic_on_warn set ...
 [ 4598.316527] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G   OE
5.1.0-20190402.rc3.git0.5e7a8ca31926.300.fc29.s390x+git #1
 [ 4598.316529] Hardware name: IBM 2964 NC9 712 (LPAR)
 [ 4598.316531] Call Trace:
 [ 4598.316534] ([<00112eb8>] show_stack+0x58/0x70)
 [ 4598.316538]  [<00a820aa>] dump_stack+0x7a/0xa8
 [ 4598.316541]  [<00143b52>] panic+0x11a/0x2d0
 [ 4598.316543]  [<001439b0>] __warn+0xf8/0x118
 [ 4598.316545]  [<00a811a8>] report_bug+0xd8/0x150
 [ 4598.316547]  [<001014ac>] do_report_trap+0xc4/0xe0
 [ 4598.316549]  [<00101680>] illegal_op+0x138/0x150
 [ 4598.316552]  [<00aa233c>] pgm_check_handler+0x1cc/0x220
 [ 4598.316554]  [<002a6d02>] event_function_local.constprop.79+
0xe2/0xe8
 [ 4598.316556] ([<002a6c7a>] event_function_local.constprop.79+
0x5a/0xe8)
 [ 4598.316559]  [<002aa400>] perf_pending_event+0x88/0xb0
 [ 4598.316561]  [<00271a8c>] irq_work_run_list+0x8c/0xb8
 [ 4598.316563]  [<00271d78>] irq_work_tick+0x48/0x68
 [ 4598.316566]  [<001d6058>] update_process_times+0x68/0x80
 [ 4598.316568]  [<001e6c10>] tick_sched_handle.isra.6+0x50/0x60
 [ 4598.316570]  [<001e6c7e>] tick_sched_timer+0x5e/0xb8
 [ 4598.316573]  [<001d6b6a>] __hrtimer_run_queues+0x10a/0x2c0
 [ 4598.316575]  [<001d7a88>] hrtimer_interrupt+0x138/0x2a8
 [ 4598.316577]  [<0010c3e4>] do_IRQ+0xac/0xb0
 [ 4598.316597]  [<00aa2744>] ext_int_handler+0x128/0x12c
 [ 4598.316600]  [<001034f6>] enabled_wait+0x46/0xd0
 [ 4598.316602] ([<03e000d2fe10>] 0x3e000d2fe10)
 [ 4598.316604]  [<00103842>] arch_cpu_idle+0x3a/0x50
 [ 4598.316606]  [<0017b808>] do_idle+0x180/0x1b0
 [ 4598.316608]  [<0017ba06>] cpu_startup_entry+0x36/0x40
 [ 4598.316611]  [<00115d16>] smp_init_secondary+0xd6/0xf0
 [ 4598.316613]  [<0011521e>] smp_start_secondary+0x86/0x98

Any ideas or hints who to avoid/fix this warning?

I ran a user space program which uses

  1. perf_event_open() system call with type HARDWARE
 and PERF_COUNT_HW_CPU_CYCLES in sampling mode.
  2. Assigns signal SIGIO to the file descriptor returned from
 perf_event_open() using:
   rc = fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
   rc |= fcntl(fd, F_SETSIG, signo);
   rc |= fcntl(fd, F_SETOWN, getpid());

  3. The signal handler increments some variables and issues
   ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
 to restart signal delivery.

I can send you the test program if needed.

Thanks a lot for your help.


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCHv2 1/3] perf report: Display s390 diagnostic counter sets

2019-01-21 Thread Thomas-Mich Richter

On 01/21/2019 02:13 PM, Jiri Olsa wrote:
> On Sun, Jan 20, 2019 at 07:18:14PM +0100, Jiri Olsa wrote:
>> On Thu, Jan 17, 2019 at 11:00:53AM -0300, Arnaldo Carvalho de Melo wrote:
>>
>> SNIP
>>
>>> --- a/tools/perf/util/python-ext-sources
>>> +++ b/tools/perf/util/python-ext-sources
>>> @@ -25,6 +25,7 @@ util/parse-branch-options.c
>>>  util/rblist.c
>>>  util/counts.c
>>>  util/print_binary.c
>>> +util/s390-sample-raw.c
>>>  util/strlist.c
>>>  util/trace-event.c
>>>  ../lib/rbtree.c
>>
>> hi,
>> this change breaks the python module:
>>
>>   >>> import perf
>>   Traceback (most recent call last):
>> File "", line 1, in 
>>   ImportError: ./perf.so: undefined symbol: color_fprintf
>>
>> changelog doesn't say anything about python related change
> 
> I made some chenages and movedthat raw sample code
> under s390.. which cured the python module, but
> I haven't tested it on s390.
> 
> Could you please check if code in here works for you:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/s390
> 
> if it works, I'll post it
> 
> thanks,
> jirka
> 

I downloaded your repository and checked out branch perf/s390. 
It works nicely, you have my tested by:
-
Tested-by: Thomas Richter 

PS: This looks very similar to my version 1 submitted around Jan 11th.
Arnaldo wanted a rework to be able to get these counter values on 
non-s390 platforms, that why we came up with version 2(which unfortunately
broke the perf.so python module).

PS2: This was my first encounter to python and I played around a bit
to include the necessary modules to get rid of the undefined warning.
However this ended up in a larger list of C files to include and
stopped at modules
1. util/pmu.c which refers to symbol perf_buf_in which is a FILE pointer
   used by some yacc input file.
2. perf_config_bool() which is located in util/config.c and emits a 
   compile error when included in the util/python-ext-sources file
   for python/perf.so module.

So I think this road is a dead end.

Thanks for fixing this.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCHv2 1/3] perf report: Display s390 diagnostic counter sets

2019-01-18 Thread Thomas-Mich Richter

On 01/17/2019 03:00 PM, Arnaldo Carvalho de Melo wrote:
> erf report: Display arch specific diagnostic counter sets, starting with s390
> 
> On s390 the event bc000 (also named CF_DIAG) extracts the CPU
> Measurement Facility diagnostic counter sets and displays them as
> counter number and counter value pairs sorted by counter set number.
> 
> Output:
>  [root@s35lp76 perf]# ./perf report -D --stdio
> 
>  [] Counterset:0 Counters:6
>Counter:000 Value:0x0085ec36 Counter:001 
> Value:0x00796c94
>Counter:002 Value:0x5ada Counter:003 
> Value:0x00092460
>Counter:004 Value:0x6073 Counter:005 
> Value:0x001a9a73
>  [0x38] Counterset:1 Counters:2
>Counter:000 Value:0x0007c59f Counter:001 
> Value:0x0002fad6
>  [0x50] Counterset:2 Counters:16
>Counter:000 Value:00 Counter:001 
> Value:00
>Counter:002 Value:00 Counter:003 
> Value:00
>Counter:004 Value:00 Counter:005 
> Value:00
>Counter:006 Value:00 Counter:007 
> Value:00
>Counter:008 Value:00 Counter:009 
> Value:00
>Counter:010 Value:00 Counter:011 
> Value:00
>Counter:012 Value:00 Counter:013 
> Value:00
>Counter:014 Value:00 Counter:015 
> Value:00
>  [0xd8] Counterset:3 Counters:128
>Counter:000 Value:0x020f Counter:001 
> Value:0x01d8
>Counter:002 Value:0xd7fa Counter:003 
> Value:0x008b
>...
> 
> The number in brackets is the offset into the raw data field of the
> sample.
> 
> New functions trace_event_sample_raw__init() and s390_sample_raw() are
> introduced in the code path to enable interpretation on non s390
> platforms. This event bc000 attached raw data is generated only on s390
> platform. Correct display on other platforms requires correct endianness
> handling.
> 
> Committer notes:
> 
> Added a init function that sets up a evlist function pointer to avoid
> repeated tests on evlist->env and calls to perf_env__name() that
> involves normalizing, etc, for each PERF_RECORD_SAMPLE.
> 
> Removed needless __maybe_unused from the trace_event_raw()
> prototype in session.h, move it to be an static function in evlist.
> 
> Reviewed-by: Hendrik Brueckner 

I have applied your changed patch. Works great. Thanks a lot.

Tested-by: Thomas Richter 

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [Patch 1/4] perf report: Add function for verbose dump of raw data

2019-01-14 Thread Thomas-Mich Richter

On 01/11/2019 03:00 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 11, 2019 at 12:52:56PM +0100, Thomas Richter escreveu:
>> Add support to call an architecture dependend function to interpret
>> raw data verbatim when dumping the perf.data file with
>> option -D.
> 
> Please add "per-arch" to the summary, so that just by looking at my
> inbox main page I can see what this is about.
> 

Will do.

> Also how this will deal with processing a perf.data file generated on a
> S/390 machine in another arch? I.e.
> 
> on s/390:
> 
> perf record -a sleep 10
> 
> on a x86_64:
> 
> scp that file
> perf report -i perf.data.from.s390
> 
> ?
> 

This is then the raw data in byte format, as in 
   [root@f29 perf]# ./perf report -D \
 -i ~/perf-s390-ctrset/perf.data.s390.ctrset

0x750 [0x6e8]: event: 9
.
. ... raw event: size 1768 bytes
.  :  09 00 00 00 01 00 e8 06 1d 00 00 00 00 00 00 00  
.  0010:  00 00 00 00 00 00 00 00 34 82 00 00 34 82 00 00  4...4...
.  0020:  04 a3 d3 1e c4 01 00 00 00 00 00 00 00 00 00 00  
.  0030:  a0 0f 00 00 00 00 00 00 00 00 06 ac fe ef 00 00  
.  0040:  00 06 00 00 00 00 00 00 01 6c c8 ad 00 00 00 00  .l..
.  0050:  00 d2 cf a5 00 00 00 00 00 00 39 93 00 00 00 00  ..9.
.  0060:  00 04 0c df 00 00 00 00 00 00 2c 97 00 00 00 00  ..,.
.  0070:  00 08 82 01 fe ef 00 01 00 02 00 00 00 00 00 00  
.  06b0:  00 00 14 58 d5 80 20 9a 29 34 a5 53 d5 80 20 9a  ...X.. .)4.S.. .
.  06c0:  28 15 83 53 00 00 00 00 00 00 00 00 00 00 00 00  (..S
.  06d0:  00 00 00 00 d5 80 19 5e 46 96 8a 74 39 06 00 00  ...^F..t9...
.  06e0:  00 00 00 00 00 00 00 00  

0 1941842404100 0x750 [0x6e8]: PERF_RECORD_SAMPLE(IP, 0x1): 2/2: 0 peri>
 ... thread: facultaet:2
 .. dso: 



> report code should lookup a function for the architecture the perf.data
> was recorded on, using the perf.data file header, etc.
> 
> const char *arch_name = perf_env__arch(session->header->env);
> 
> Then lookup a table to find the right function, ok? See arch__find() for
> an example used in the annotation code.
> 
> - Arnaldo
>  

I can make this platform independent and call a function to display
the event raw data collected on a s/390 and any other platform.
I will start with s390 and x86.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf/stat: Handle different PMU names with common prefix

2018-10-29 Thread Thomas-Mich Richter

On 10/26/2018 05:04 PM, Sébastien Boisvert wrote:
> On 2018-10-23 11:16 a.m., Thomas Richter wrote:
>> On s390 the CPU Measurement Facility for counters now supports
>> 2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
>> cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
>> for one and the same CPU.
>>
>> Running command
>>
>>  [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
>>   -- ~/mytests/cf-tx-events 1
>>
> 
> Is tx_c_tend related to these ?
> 
>   tx-abort OR cpu/tx-abort/  [Kernel PMU event]
>   tx-capacity OR cpu/tx-capacity/[Kernel PMU event]
>   tx-commit OR cpu/tx-commit/[Kernel PMU event]
>   tx-conflict OR cpu/tx-conflict/[Kernel PMU event]
>   tx-start OR cpu/tx-start/  [Kernel PMU event]
> 
> 

Yes, these are the transaction counters on intel platforms, 
tx_c_tend is a transaction counter for s390.

>>  Measuring transactions
>>  TX_C_TABORT_NO_SPECIAL: 0 expected:0
>>  TX_C_TABORT_SPECIAL: 0 expected:0
>>  TX_C_TEND: 1 expected:1
>>  TX_NC_TABORT: 11 expected:11
>>  TX_NC_TEND: 1 expected:1
>>
>>  Performance counter stats for '/root/mytests/cf-tx-events 1':
>>
>>   2  tx_c_tend
>>
>>   0.002120091 seconds time elapsed
>>
>>   0.000121000 seconds user
>>   0.002127000 seconds sys
>>
>>  [root@s35lp76 perf]#
>>
>> displays output which is unexpected (and wrong):
>>
>>   2  tx_c_tend
>>
>> The test program definitely triggers only one transaction, as shown
>> in line 'TX_C_TEND: 1 expected:1'.
> 
> OK, something is done twice in perf stat, but should be done once.
> 

Correct

>>
>> This is caused by the following call sequence:
>>
>> pmu_lookup() scans and installs a PMU.
>> +--> pmu_aliases() parses all aliases in directory
>>  ...//events/* which are file names.
>>  +--> pmu_aliases_parse() Read each file in directory and create
>>   an new alias entry. This is done with
>>   +--> perf_pmu__new_alias() and
>> +--> __perf_pmu__new_alias() which also check for
>> identical alias names.
>>
>> After pmu_aliases() returns, a complete list of event names
>> for this pmu has been created. Now function
>>
>> pmu_add_cpu_aliases()   is called to add the events listed in the json
>> |   files to the alias list of the cpu.
>> +--> perf_pmu__find_map()  Returns a pointer to the json events.
>>
>> Now function pmu_add_cpu_aliases() scans through all events listed
>> in the JSON files for this CPU.
> 
> Are these JSON files temporary in nature ?

No they are not, they are part of the perf tool, directory

[root@s35lp76 linux]# ls tools/perf/pmu-events/arch/s390/cf_z14/*
tools/perf/pmu-events/arch/s390/cf_z14/basic.json
tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
tools/perf/pmu-events/arch/s390/cf_z14/extended.json
tools/perf/pmu-events/arch/s390/cf_z14/transaction.json
[root@s35lp76 linux]# 


>> Each json event pmu name is compared with the current PMU being
>> built up and if they mismatch, the json event is added to the
>> current PMUs alias list.
>> To avoid duplicate entries the following comparison is done:
>>
>>  if (!is_arm_pmu_core(name)) {
>>   pname = pe->pmu ? pe->pmu : "cpu";
>>   if (strncmp(pname, name, strlen(pname)))
>>   continue;
>>  }
> 
> Is this the test to know whether a PMU event must be added or not ?

Correct.

> 
>>
>> The culprit is the strncmp() function.
>>
>> Using current s390 PMU naming, the first PMU is 'cpum_cf'
>> and a long list of events is added, among them 'tx_c_tend'
>>
>> When the second PMU named 'cpum_cf_diag' is added, only one event
>> named 'CF_DIAG' is added by the pmu_aliases()  function.
>>
>> Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
>> Since the CPUID string is the same for both PMUs, json file events
>> for PMU named 'cpum_cf' are added to the PMU 'cpm_cf_diag'
>>
>> This happens because the strncmp() actually compares:
>>>  strncmp("cpum_cf", "cpum_cf_diag", 6);
> 
> I fail to see how these argument values can be generated by this code:
> 
>pname = pe->pmu ? pe->pmu : "cpu";
>if (strncmp(pname, name, strlen(pname)))
> 
> The 3rd argument is the length of the first argument, pname.
> 
> With "cpum_cf", it should be 7, not 6.

Yes, mistake on my side...
but it is the same type of error , regardless if 6 or 7 characters are 
compared:

  strncmp("cpmu_cf", "cpum_cf_diag", 7) 

returns 0 when it should not. The device names of both PMUs are different.

> 
> 
> 
>>
>> The first parameter is the pmu name taken from the event in
>> the json file. The second parameter is the pmu name of the PMU
>> currently being built.
>> They are different, but the length of the compare only tests the
>> common prefix and this returns 0(true) when it should return false.
>>
> 
> The 6 comes from strlen(pname

Linux 4.19.0 Build Error when CONFIG_ACPI not set.

2018-10-29 Thread Thomas-Mich Richter

When I compile the 4.19.0 Linux kernel, I get this build error:

[root@f28 linux]# fgrep -r CONFIG_ACPI .config
# CONFIG_ACPI is not set
[root@f28 linux]#

[root@f28 linux]# make
  CALLscripts/checksyscalls.sh
  DESCEND  objtool
  CHK include/generated/compile.h
  CC  drivers/cpufreq/intel_pstate.o
drivers/cpufreq/intel_pstate.c: In function ‘show_base_frequency’:
drivers/cpufreq/intel_pstate.c:726:10: error: implicit declaration of
function ‘intel_pstate_get_cppc_guranteed’;
did you mean ‘intel_pstate_get_epp’?
[-Werror=implicit-function-declaration]
ratio = intel_pstate_get_cppc_guranteed(policy->cpu);
^~~
intel_pstate_get_epp
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:306: drivers/cpufreq/intel_pstate.o]
Error 1
make[1]: *** [scripts/Makefile.build:546: drivers/cpufreq] Error 2
make: *** [Makefile:1052: drivers] Error 2
[root@f28 linux]# 

I am building on a virtual machine.

This was introduced with
commit 86d333a8cc7f ("cpufreq: intel_pstate: Add base_frequency attribute")

The function intel_pstate_get_cppc_guranteed() is called but the function
definition is within #ifdef CONFIG_ACPI/#endif conditional compile.

Any ideas how to fix this?
Thanks

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22

2018-09-28 Thread Thomas-Mich Richter

On 09/28/2018 12:17 PM, Ravi Bangoria wrote:
> Hi Thomas,
> 
> On 09/28/2018 03:32 PM, Thomas-Mich Richter wrote:
>> I can rework the patch to use the is_supported() member function. The
>> down side is that the test does not show up in the list of executed tests 
>> anymore,
>> unless some debug output is enabled:
> 
> Which should be fine because s390 is doing that in all breakpoint related 
> tests.
> But more important thing to suggest .is_supported field is, this patch has a
> regression on x86:
> 
> Before:
>   $ sudo ./perf test 22
>   22: Watchpoint:
>   22.1: Read Only Watchpoint: Skip
>   22.2: Write Only Watchpoint   : Ok
>   22.3: Read / Write Watchpoint : Ok
>   22.4: Modify Watchpoint   : Ok
> 
> After:
>   $ git apply 1.patch
>   $ make
>   $ sudo ./perf test 22
>   22: Watchpoint:
>   22.1: Read Only Watchpoint: Skip
>   22.2: Write Only Watchpoint   : Skip
>   22.3: Read / Write Watchpoint : Skip
>   22.4: Modify Watchpoint   : Skip
> 
> Intel hw does not support read only watchpoitns. If you use skip_if_fail=true,
> all subsequent testcases are skipped as well.
> 
> Thanks,
> Ravi
> 

Ahhh    thanks a lot for the hint, will post version 3.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22

2018-09-28 Thread Thomas-Mich Richter

On 09/28/2018 10:37 AM, Ravi Bangoria wrote:
> Hi Thomas,
> 
> On 09/28/2018 01:13 PM, Thomas Richter wrote:
>> diff --git a/tools/perf/tests/builtin-test.c 
>> b/tools/perf/tests/builtin-test.c
>> index 54ca7d87236f..e83c15a95c43 100644
>> --- a/tools/perf/tests/builtin-test.c
>> +++ b/tools/perf/tests/builtin-test.c
>> @@ -124,7 +124,7 @@ static struct test generic_tests[] = {
>>  .desc = "Watchpoint",
>>  .func = test__wp,
>>  .subtest = {
>> -.skip_if_fail   = false,
>> +.skip_if_fail   = true,
> 
> Can you instead use ".is_supported"?
> 
> Thanks,
> Ravi
> 

Hi Ravi,

I can rework the patch to use the is_supported() member function. The
down side is that the test does not show up in the list of executed tests 
anymore,
unless some debug output is enabled:

[root@s8360046 perf]# ./perf test -F 22
[root@s8360046 perf]# ./perf test -Fv 22
22: Watchpoint: Disabled
[root@s8360046 perf]#

I leave it to Arnaldo to decide...

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 77/77] perf tests: Fix record+probe_libc_inet_pton.sh without ping's debuginfo

2018-09-07 Thread Thomas-Mich Richter

On 09/06/2018 12:04 AM, Arnaldo Carvalho de Melo wrote:
> rpm -e iputils-debuginfo

I have done the same on my s390 this morning:

[root@p23lp27 perf]# ./perf test ping
60: probe libc's inet_pton & backtrace it with ping   : Ok
[root@p23lp27 perf]# rpm -e iputils-debuginfo
[root@p23lp27 perf]# ./perf test ping
60: probe libc's inet_pton & backtrace it with ping   : Ok
[root@p23lp27 perf]#

Tested-by: Thomas Richter 

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf annotate: Handle arm64 move instructions

2018-08-27 Thread Thomas-Mich Richter

On 08/27/2018 10:08 PM, Kim Phillips wrote:
> Add default handler for non-jump instructions.  This really only has an
> effect on instructions that compute a PC-relative address, such as 'adrp,'
> as seen in these couple of examples:
> 
> BEFORE: adrp   x0, 2aa11000 
> AFTER:  adrp   x0, kallsyms_token_index+0xce000
> 
> BEFORE: adrp   x23, 2ae94000 <__per_cpu_load>
> AFTER:  adrp   x23, __per_cpu_load
> 
> The implementation is identical to that of s390, but with a slight
> adjustment for objdump whitespace propagation (arm64 objdump puts
> spaces after commas, whereas s390's presumably doesn't).
> 
> The mov__scnprintf() declaration is moved from s390's to arm64's
> instructions.c because arm64's gets included before s390's.
> 

Tested-by: Thomas Richter 

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf annotate: fix parsing aarch64 branch instructions after objdump update

2018-08-24 Thread Thomas-Mich Richter

On 08/24/2018 02:10 AM, Kim Phillips wrote:
> Starting with binutils 2.28, aarch64 objdump adds comments to the
> disassembly output to show the alternative names of a condition code [1].
> 
> It is assumed that commas in objdump comments could occur in other arches
> now or in the future, so this fix is arch-independent.
> 
> The fix could have been done with arm64 specific jump__parse and
> jump__scnprintf functions, but the jump__scnprintf instruction would
> have to have its comment character be a literal, since the scnprintf
> functions cannot receive a struct arch easily.
> 
> This inconvenience also applies to the generic jump__scnprintf, which
> is why we add a raw_comment pointer to struct ins_operands, so the
> __parse function assigns it to be re-used by its corresponding __scnprintf
> function.
> 
> Example differences in 'perf annotate --stdio2' output on an
> aarch64 perf.data file:
> 
> BEFORE: → b.cs   28133d1c   // b.hs, d7ecc47b
> AFTER : ↓ b.cs   18c
> 
> BEFORE: → b.cc   28d8d9cc   // b.lo, b.ul, 
> d727295b
> AFTER : ↓ b.cc   31c
> 
> The branch target labels 18c and 31c also now appear in the output:
> 
> BEFORE:addx26, x29, #0x80
> AFTER : 18c:   addx26, x29, #0x80
> 
> BEFORE:addx21, x21, #0x8
> AFTER : 31c:   addx21, x21, #0x8
> 
> The Fixes: tag below is added so stable branches will get the update; it
> doesn't necessarily mean that commit was broken at the time, rather it
> didn't withstand the aarch64 objdump update.
> 
> Tested no difference in output for sample x86_64, power arch perf.data files.

Tested,  no difference in output on s390. Just to let you know.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 2/3] perf report: Add raw report support for s390 auxiliary trace

2018-08-08 Thread Thomas-Mich Richter

On 08/08/2018 06:42 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Aug 08, 2018 at 01:14:51PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Aug 08, 2018 at 01:08:43PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> No need for __packed.
>>  
>>> I'm removing that to avoid having this failling in compilers that have
>>> such a warning, since we default to Werror.
>>  
>>> Holler if there is something I'missing :-)
>>
>> In file included from util/cpumap.h:10,
>>  from util/s390-cpumsf.c:150:
>> util/s390-cpumsf.c: In function 's390_cpumsf_diag_show':
>> util/s390-cpumsf.c:208:10: error: format '%lx' expects argument of type 
>> 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} 
>> [-Werror=format=]
>>pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos);
>>   ^~~
>> /git/linux/tools/perf/util/debug.h:20:21: note: in definition of macro 
>> 'pr_fmt'
>>  #define pr_fmt(fmt) fmt
>>  ^~~
>> util/s390-cpumsf.c:208:3: note: in expansion of macro 'pr_err'
>>pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos);
>>^~
>> util/s390-cpumsf.c: In function 's390_cpumsf_trailer_show':
>> util/s390-cpumsf.c:233:10: error: format '%lx' expects argument of type 
>> 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} 
>> [-Werror=format=]
>>pr_err("Invalid AUX trace trailer entry [%#08lx]\n", pos);
> 
> So for those I applied this, seems to pass the ones that were failing,
> restarting tests...
> 
> diff --git a/tools/perf/util/s390-cpumsf.c b/tools/perf/util/s390-cpumsf.c
> index 2bcb160a08f0..d2c78ffd9fee 100644
> --- a/tools/perf/util/s390-cpumsf.c
> +++ b/tools/perf/util/s390-cpumsf.c
> @@ -187,7 +187,7 @@ static bool s390_cpumsf_basic_show(const char *color, 
> size_t pos,
>   pr_err("Invalid AUX trace basic entry [%#08zx]\n", pos);
>   return false;
>   }
> - color_fprintf(stdout, color, "[%#08x] Basic   Def:%04x Inst:%#04x"
> + color_fprintf(stdout, color, "[%#08zx] Basic   Def:%04x Inst:%#04x"
> " %c%c%c%c AS:%d ASN:%#04x IA:%#018llx\n"
> "\t\tCL:%d HPP:%#018llx GPP:%#018llx\n",
> pos, basic->def, basic->U,
> @@ -205,10 +205,10 @@ static bool s390_cpumsf_diag_show(const char *color, 
> size_t pos,
> struct hws_diag_entry *diag)
>  {
>   if (diag->def < S390_CPUMSF_DIAG_DEF_FIRST) {
> - pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos);
> + pr_err("Invalid AUX trace diagnostic entry [%#08zx]\n", pos);
>   return false;
>   }
> - color_fprintf(stdout, color, "[%#08x] DiagDef:%04x %c\n",
> + color_fprintf(stdout, color, "[%#08zx] DiagDef:%04x %c\n",
> pos, diag->def, diag->I ? 'I' : ' ');
>   return true;
>  }
> @@ -230,10 +230,10 @@ static bool s390_cpumsf_trailer_show(const char *color, 
> size_t pos,
>struct hws_trailer_entry *te)
>  {
>   if (te->bsdes != sizeof(struct hws_basic_entry)) {
> - pr_err("Invalid AUX trace trailer entry [%#08lx]\n", pos);
> + pr_err("Invalid AUX trace trailer entry [%#08zx]\n", pos);
>   return false;
>   }
> - color_fprintf(stdout, color, "[%#08x] Trailer %c%c%c bsdes:%d"
> + color_fprintf(stdout, color, "[%#08zx] Trailer %c%c%c bsdes:%d"
> " dsdes:%d Overflow:%lld Time:%#llx\n"
> "\t\tC:%d TOD:%#lx 1:%#llx 2:%#llx\n",
> pos,
> @@ -418,7 +418,7 @@ static bool s390_cpumsf_make_event(size_t pos,
>   event.sample.header.misc = sample.cpumode;
>   event.sample.header.size = sizeof(struct perf_event_header);
>  
> - pr_debug4("%s pos:%#zx ip:%#lx P:%d CL:%d pid:%d.%d cpumode:%d 
> cpu:%d\n",
> + pr_debug4("%s pos:%#zx ip:%#" PRIx64 " P:%d CL:%d pid:%d.%d cpumode:%d 
> cpu:%d\n",
>__func__, pos, sample.ip, basic->P, basic->CL, sample.pid,
>sample.tid, sample.cpumode, sample.cpu);
>   if (perf_session__deliver_synth_event(sfq->sf->session, &event,
> @@ -498,7 +498,7 @@ static int s390_cpumsf_samples(struct s390_cpumsf_queue 
> *sfq, u64 *ts)
>*/
>   aux_ts = get_trailer_time(buf);
>   if (!aux_ts) {
> - pr_err("[%#08lx] Invalid AUX trailer entry TOD clock base\n",
> + pr_err("[%#08" PRIx64 "] Invalid AUX trailer entry TOD clock 
> base\n",
>  sfq->buffer->data_offset);
>   aux_ts = ~0ULL;
>   goto out;
> @@ -607,7 +607,7 @@ static int s390_cpumsf_run_decoder(struct 
> s390_cpumsf_queue *sfq,
>   buffer->use_size = buffer->size;
>   buffer->use_data = buffer->data;
>   }
> - pr_debug4("%s queue_nr:%d buffer:%ld offset:%#lx size:%#zx rest:%#zx\n",
> + pr_debug4("%s que

Re: [PATCH 2/3] perf report: Add raw report support for s390 auxiliary trace

2018-08-07 Thread Thomas-Mich Richter

On 08/08/2018 05:37 AM, m...@ellerman.id.au wrote:
> Thomas Richter  writes:
>> Add support for s390 auxiliary trace support.
>> Use 'perf record -e rbd000' to create the perf.data file.
>> The event also has the symbolic name SF_CYCLES_BASIC_DIAG,
>> using 'perf record -e SF_CYCLES_BASIC_DIAG' is equivalent.
> ...
>>
>> Signed-off-by: Thomas Richter 
>> Reviewed-by: Hendrik Brueckner 
>> ---
>>  tools/perf/util/s390-cpumsf-kernel.h |  71 
>>  tools/perf/util/s390-cpumsf.c| 244 ++-
>>  2 files changed, 314 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/perf/util/s390-cpumsf-kernel.h
> 
> 
> I'm seeing a build break on ppc64le which seems to be caused by this
> commit (I haven't bisected):
> 
>   util/s390-cpumsf.c: In function ‘trailer_timestamp’:
>   util/s390-cpumsf.c:222:2: error: dereferencing type-punned pointer will 
> break strict-aliasing rules [-Werror=strict-aliasing]
> return *((unsigned long long *) &te->timestamp[te->t]);
> ^
> 
> 
> And on powerpc big endian:
>   In file included from util/cpumap.h:10:0,
>from util/s390-cpumsf.c:150:
>   util/s390-cpumsf.c: In function ‘s390_cpumsf_basic_show’:
>   util/s390-cpumsf.c:187:10: error: format ‘%lx’ expects argument of type 
> ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ 
> [-Werror=format=]
>  pr_err("Invalid AUX trace basic entry [%#08lx]\n", pos);
> ^
> 
> cheers
> 

Can you please try this patch. Thanks a lot

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294
>From add6a709f79e20e8b4eaa6ded2bd1af043f8a235 Mon Sep 17 00:00:00 2001
From: Thomas Richter 
Date: Wed, 8 Aug 2018 07:11:23 +0100
Subject: [PATCH] perf report: Fix build error for s390 auxiliary trace support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 73a48b7e ("perf report: Add raw report support for s390 auxiliary trace")
introduced a compile errors on powerpc:
  util/s390-cpumsf.c: In function âtrailer_timestampâ:
  util/s390-cpumsf.c:222:2: error: dereferencing type-punned pointer will
	break strict-aliasing rules [-Werror=strict-aliasing]
	return *((unsigned long long *) &te->timestamp[te->t]);
	^

  In file included from util/cpumap.h:10:0,
   from util/s390-cpumsf.c:150:
  util/s390-cpumsf.c: In function âs390_cpumsf_basic_showâ:
  util/s390-cpumsf.c:187:10: error: format â%lxâ expects argument of type
	âlong unsigned intâ, but argument 4 has type
	âsize_t {aka unsigned int}â [-Werror=format=]
 pr_err("Invalid AUX trace basic entry [%#08lx]\n", pos);
^

Fix these errors:
1. Use memcpy to extract value from character array.
2. Use %zu as format string in pr_err

Fixes: 73a48b7e
Signed-off-by: Thomas Richter 
---
 tools/perf/util/s390-cpumsf.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/s390-cpumsf.c b/tools/perf/util/s390-cpumsf.c
index 1604ddb3..2bcb160a08f0 100644
--- a/tools/perf/util/s390-cpumsf.c
+++ b/tools/perf/util/s390-cpumsf.c
@@ -184,7 +184,7 @@ static bool s390_cpumsf_basic_show(const char *color, size_t pos,
    struct hws_basic_entry *basic)
 {
 	if (basic->def != 1) {
-		pr_err("Invalid AUX trace basic entry [%#08lx]\n", pos);
+		pr_err("Invalid AUX trace basic entry [%#08zx]\n", pos);
 		return false;
 	}
 	color_fprintf(stdout, color, "[%#08x] Basic   Def:%04x Inst:%#04x"
@@ -219,7 +219,10 @@ static unsigned long long trailer_timestamp(struct hws_trailer_entry *te)
 	/* te->t set: TOD in STCKE format, bytes 8-15
 	 * to->t not set: TOD in STCK format, bytes 0-7
 	 */
-	return *((unsigned long long *) &te->timestamp[te->t]);
+	unsigned long long ts;
+
+	memcpy(&ts, &te->timestamp[te->t], sizeof(ts));
+	return ts;
 }
 
 /* Display s390 CPU measurement facility trailer entry */
-- 
2.14.3

Re: BPF relocation 'perf test' entries failing was: Re: [GIT PULL 00/27] perf/core improvements and fixes

2018-07-26 Thread Thomas-Mich Richter

On 07/26/2018 09:14 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 26, 2018 at 03:58:05PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Thu, Jul 26, 2018 at 11:04:08PM +0530, Sandipan Das escreveu:
>>> I came across the same problem. Does the following patch fix it?
> 
>>> https://lkml.org/lkml/2018/7/26/669
>  
>> Oh my, that one was subtle... Checking...
> 
> Right, when it said it was running:
> 
>   40.3: BPF prologue generation
> 
> In fact it was running:
> 
>   40.4: BPF relocation checker
> 
> That error message in the 'perf test -v' output:
> 
>   libbpf: Program 'func=sys_write' contains non-map related relo data 
> pointing to section 65522
> 
> Was the _expected_ one for "40.4: BPF relocation checker", then it
> failed when it couldn't run the off by one "last" subtest, duh.
> 
> Thomas, this was a problem introduced by a patch from you, just for
> reference, this one:
> 
> Fixes: 9ef0112442bd ("perf test: Fix subtest number when showing results")
> 
> I should have caught this by running 'perf test' before/after applying
> that patch, I'll now make sure this is done before sending pull reqs
> upstream.
> 
> In fact I did after, and thought, hey, some BPF related regression, I
> must've updated clang/llvm and something new appeared on the radar, so I
> went that direction and went nowhere, well, now I have an uptodate
> llvm/clang combo to play with BTF, pahole, etc 8-)
> 
> Thanks!
> 
> - Arnaldo



Sandipan,
thanks for fixing what I have messed up.

Arnaldo,
sorry for the hickup, I should have caught this too. In fact I ran
perf test before and after, but redirected the output and just 
looked at the grep'ed lines on the screen
I should be looked more carefully...

Apologies for the mess and thanks again for fixing this.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2 1/4] perf test shell: Replace '|&' with '2>&1 |' to work with more shells

2018-07-01 Thread Thomas-Mich Richter

On 06/29/2018 07:46 PM, Kim Phillips wrote:
> Since we do not specify bash (and/or zsh) as a requirement, use the
> standard error redirection that is more widely supported.
> 
> BEFORE:
> 
>  $ sudo ./perf test -v 62
>  62: Check open filename arg using perf trace + vfs_getname:
>  --- start ---
>  test child forked, pid 27305
>  ./tests/shell/trace+probe_vfs_getname.sh: 20: 
> ./tests/shell/trace+probe_vfs_getname.sh: Syntax error: "&" unexpected
>  test child finished with -2
>   end 
>  Check open filename arg using perf trace + vfs_getname: Skip
> 
> AFTER:
> 
>  $ sudo ./perf test -v 62
>  64: Check open filename arg using perf trace + vfs_getname   :
>  --- start ---
>  test child forked, pid 23008
>  Added new event:
>probe:vfs_getname(on getname_flags:72 with 
> pathname=result->name:string)
> 
>  You can now use it in all perf tools, such as:
> 
>  perf record -e probe:vfs_getname -aR sleep 1
> 
>   0.361 ( 0.008 ms): touch/23032 openat(dfd: CWD, filename: 
> /tmp/temporary_file.VEh0n, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 
> IRUGO|IWUGO) = 4
>  test child finished with 0
>   end 
>  Check open filename arg using perf trace + vfs_getname: Ok
> 
> Similar to commit 35435cd06081, with the same title.
> 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Thomas Richter 
> Cc: Michael Petlan 
> Signed-off-by: Kim Phillips 
> ---
> v2: indent terminal session logs with a space to avoid git-am parsing
> '--- start ---' as the end of the description text.
> 
>  tools/perf/tests/shell/trace+probe_vfs_getname.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/shell/trace+probe_vfs_getname.sh 
> b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> index 55ad9793d544..4ce276efe6b4 100755
> --- a/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> +++ b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> @@ -17,7 +17,7 @@ skip_if_no_perf_probe || exit 2
>  file=$(mktemp /tmp/temporary_file.X)
> 
>  trace_open_vfs_getname() {
> - evts=$(echo $(perf list syscalls:sys_enter_open* |& egrep 'open(at)? ' 
> | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
> + evts=$(echo $(perf list syscalls:sys_enter_open* 2>&1 | egrep 
> 'open(at)? ' | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
>   perf trace -e $evts touch $file 2>&1 | \
>   egrep " +[0-9]+\.[0-9]+ +\( +[0-9]+\.[0-9]+ ms\): +touch\/[0-9]+ 
> open(at)?\((dfd: +CWD, +)?filename: +${file}, +flags: 
> CREAT\|NOCTTY\|NONBLOCK\|WRONLY, +mode: +IRUGO\|IWUGO\) += +[0-9]+$"
>  }
> 


Applied and tested for s390. You have my tested by.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 4/4] perf stat: Add transaction flag (-T) support for s390

2018-06-22 Thread Thomas-Mich Richter

On 06/22/2018 04:36 AM, Andi Kleen wrote:
> Thomas Richter  writes:
>>  
>> +/* Handle -T as -M transaction. Once platform specific metrics
>> + * support has been added to the json files, all archiectures
>> + * will use this approach.
>> + */
>> +if (!strcmp(perf_env__arch(NULL), "s390")) {
> 
> Use pmu_have_event() instead.

Done see version v2

> 
> You may need to add support for wildcard pmus to it.
> 
> -Andi
> 

Currently only one PMU on s390 does support transactions, its the PMU with 
counters
named cpum_cf.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 2/2] perf test shell: make perf inet_pton test more portable

2018-06-20 Thread Thomas-Mich Richter

On 06/20/2018 01:49 AM, Kim Phillips wrote:
> Debian based systems such as Ubuntu have dash as their default shell.
> Even if the normal or root user's shell is bash, certain scripts still
> call /bin/sh, which points to dash, so we fix this perf test by
> rewriting it in a more portable way.
> 
> BEFORE:
> 
> $ sudo ./perf test -v 64
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 31942
> ./tests/shell/record+probe_libc_inet_pton.sh: 18: 
> ./tests/shell/record+probe_libc_inet_pton.sh: expected[0]=ping[][0-9 
> \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\): not found
> ./tests/shell/record+probe_libc_inet_pton.sh: 19: 
> ./tests/shell/record+probe_libc_inet_pton.sh: 
> expected[1]=.*inet_pton\+0x[[:xdigit:]]+[[:space:]]\(/lib/x86_64-linux-gnu/libc-2.27.so|inlined\)$:
>  not found
> ./tests/shell/record+probe_libc_inet_pton.sh: 29: 
> ./tests/shell/record+probe_libc_inet_pton.sh: 
> expected[2]=getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\(/lib/x86_64-linux-gnu/libc-2.27.so\)$:
>  not found
> ./tests/shell/record+probe_libc_inet_pton.sh: 30: 
> ./tests/shell/record+probe_libc_inet_pton.sh: 
> expected[3]=.*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$: not found
> ping 31963 [004] 83577.670613: probe_libc:inet_pton: (7fe15f87f4b0)
> ./tests/shell/record+probe_libc_inet_pton.sh: 39: 
> ./tests/shell/record+probe_libc_inet_pton.sh: Bad substitution
> ./tests/shell/record+probe_libc_inet_pton.sh: 41: 
> ./tests/shell/record+probe_libc_inet_pton.sh: Bad substitution
> test child finished with -2
>  end 
> probe libc's inet_pton & backtrace it with ping: Skip
> 
> AFTER:
> 
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 32277
> ping 32295 [001] 83679.690020: probe_libc:inet_pton: (7ff244f504b0)
> 7ff244f504b0 __GI___inet_pton+0x0 (/lib/x86_64-linux-gnu/libc-2.27.so)
> 7ff244f14ce4 getaddrinfo+0x124 (/lib/x86_64-linux-gnu/libc-2.27.so)
> 556ac036b57d _init+0xb75 (/bin/ping)
> test child finished with 0
>  end 
> probe libc's inet_pton & backtrace it with ping: Ok
> 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Thomas Richter 
> Cc: Michael Petlan 
> Cc: "Hendrik Brückner" 
> Cc: Sandipan Das 
> Signed-off-by: Kim Phillips 
> ---
> Fixed using fake array technique described here:
> 
> https://www.linuxquestions.org/questions/programming-9/basic-bash-how-to-use-eval-to-evaluate-variable-names-made-of-arbitrary-strings-775622/
> 
> Note that bashisms can also be found with 'checkbashisms':
> 
> $ checkbashisms record+probe_libc_inet_pton.sh
> script record+probe_libc_inet_pton.sh does not appear to have a #! 
> interpreter line;
> you may get strange results
> possible bashism in record+probe_libc_inet_pton.sh line 18 (bash arrays, 
> H[0]):
>   expected[0]="ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)"
> possible bashism in record+probe_libc_inet_pton.sh line 19 (bash arrays, 
> H[0]):
>   expected[1]=".*inet_pton\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 23 (bash arrays, 
> H[0]):
>   
> expected[2]="gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 24 (bash arrays, 
> H[0]):
>   
> expected[3]="(__GI_)?getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 25 (bash arrays, 
> H[0]):
>   expected[4]="main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 29 (bash arrays, 
> H[0]):
>   expected[2]="getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 30 (bash arrays, 
> H[0]):
>   expected[3]=".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
> possible bashism in record+probe_libc_inet_pton.sh line 39 (bash arrays, 
> ${name[0|*|@]}):
>   echo "$line" | egrep -q "${expected[$idx]}"
> possible bashism in record+probe_libc_inet_pton.sh line 41 (bash arrays, 
> ${name[0|*|@]}):
>   printf "FAIL: expected backtrace entry %d \"%s\" got 
> \"%s\"\n" $idx "${expected[$idx]}" "$line"
> possible bashism in record+probe_libc_inet_pton.sh line 44 (let ...):
>   let idx+=1
> possible bashism in record+probe_libc_inet_pton.sh line 45 (bash arrays, 
> ${name[0|*|@]}):
>   [ -z "${expected[$idx]}" ] && break
> 
>  .../shell/record+probe_libc_inet_pton.sh  | 23 ++-
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh 
> b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
> index 263057039693..41124fa12913 100755
> --- a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
> +++ b/tools/per

Re: [PATCH 2/3] perf alias: Rebuild alias expression string to make it comparable

2018-06-15 Thread Thomas-Mich Richter

On 06/15/2018 10:12 AM, Jiri Olsa wrote:
> On Thu, Jun 14, 2018 at 08:53:14AM -0500, Paul Clarke wrote:
> 
> SNIP
> 
>>> +   if (ret)
>>> +   ret += scnprintf(newval + ret, sizeof(newval) - ret,
>>> +",");
>>> +   if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
>>> +   ret += scnprintf(newval + ret, sizeof(newval) - ret,
>>> +"%s=%#x", term->config, term->val.num);
>>> +   else if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR)
>>> +   ret += scnprintf(newval + ret, sizeof(newval) - ret,
>>> +"%s=%s", term->config, term->val.str);
>>
>> If we exceed 256, we just suddenly terminate the rebuilding without 
>> reporting any issues.
>>
>>> +   }
>>> +
>>> alias->name = strdup(name);
>>> if (dir) {
>>> /*
>>> @@ -285,7 +308,7 @@ static int __perf_pmu__new_alias(struct list_head 
>>> *list, char *dir, char *name,
>>> snprintf(alias->unit, sizeof(alias->unit), "%s", unit);
>>> }
>>> alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
>>> -   alias->str = strdup(val);
>>> +   alias->str = strdup(newval);
> 
> hum, how is newval different from val? AFAICS it's the same
> 

Not really, depends on the platform, here is some debug output from s390:
root@s35lp76 perf]# ./perf stat -e tx_nc_tend -- ~/mytesttx 1 >/tmp/111

 Performance counter stats for '/root/mytesttx 1':

 1  tx_nc_tend  


   0.001050150 seconds time elapsed

[root@s35lp76 perf]# fgrep -i tx_nc_tend /tmp/111
__perf_pmu__new_alias TX_NC_TEND val:event=0x008d newval:event=0x8d
__perf_pmu__new_alias tx_nc_tend val:event=0x8d newval:event=0x8d
TX_NC_TEND  1 rc 8
[root@s35lp76 perf]# 

On s390 the events in the PMU sysfs file are printed with leading zeroes.
This means the strcmp() of alias->str differs, but the contents is logically
identical. (Same of some files contains spaces).

Thats why I do the rewrite of val into newval.

The alias name does not match too, but we use strcasecmp() to ignore case.

Hope this helps.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 3/3] perf stat: Remove duplicate event counting

2018-06-15 Thread Thomas-Mich Richter

On 06/15/2018 10:21 AM, Jiri Olsa wrote:
> On Thu, Jun 14, 2018 at 01:48:45PM +0200, Thomas Richter wrote:
> 
> SNIP
> 
>> +static void perf_pmu_assign_str(char *name, const char *field, char 
>> **old_str,
>> +char **new_str)
>> +{
>> +if (!*old_str)
>> +goto set_new;
>> +
>> +if (*new_str) { /* Have new string, check with old */
>> +if (strcasecmp(*old_str, *new_str))
>> +pr_debug("alias %s differs in field '%s'\n",
>> + name, field);
>> +zfree(old_str);
>> +} else  /* Nothing new --> keep old string */
>> +return;
>> +set_new:
>> +*old_str = *new_str;
>> +*new_str = NULL;
>> +}
>> +
>> +static void perf_pmu_update_alias(struct perf_pmu_alias *old,
>> +  struct perf_pmu_alias *newalias)
>> +{
>> +perf_pmu_assign_str(old->name, "desc", &old->desc, &newalias->desc);
>> +perf_pmu_assign_str(old->name, "long_desc", &old->long_desc,
>> +&newalias->long_desc);
>> +perf_pmu_assign_str(old->name, "topic", &old->topic, &newalias->topic);
>> +perf_pmu_assign_str(old->name, "metric_expr", &old->metric_expr,
>> +&newalias->metric_expr);
>> +perf_pmu_assign_str(old->name, "metric_name", &old->metric_name,
>> +&newalias->metric_name);
>> +perf_pmu_assign_str(old->name, "value", &old->str, &newalias->str);
>> +old->scale = newalias->scale;
>> +old->per_pkg = newalias->per_pkg;
>> +old->snapshot = newalias->snapshot;
>> +memcpy(old->unit, newalias->unit, sizeof(old->unit));
>> +}
>> +
>> +/* Delete an alias entry. */
>> +static void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
>> +{
>> +zfree(&newalias->name);
>> +zfree(&newalias->desc);
>> +zfree(&newalias->long_desc);
>> +zfree(&newalias->topic);
>> +zfree(&newalias->str);
>> +zfree(&newalias->metric_expr);
>> +zfree(&newalias->metric_name);
>> +parse_events_terms__purge(&newalias->terms);
>> +free(newalias);
>> +}
>> +
>> +/* Merge an alias, search in alias list. If this name is already
>> + * present merge both of them to combine all information.
>> + */
>> +static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias,
>> + struct list_head *alist)
>> +{
>> +struct perf_pmu_alias *a;
>> +
>> +list_for_each_entry(a, alist, list) {
>> +if (!strcasecmp(newalias->name, a->name)) {
>> +perf_pmu_update_alias(a, newalias);
>> +perf_pmu_free_alias(newalias);
>> +return true;
>> +}
>> +}
>> +return false;
>> +}
> 
> ok, I like your change better.. we can rebuilt it to use
> rb tree later when we have this fixed
> 
> thanks,
> jirka
> 

Thanks for the review. I will resend the patch set later today when I have added
Arnaldo's comments.
After that we add your rb tree stuff.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 2/3] perf alias: Rebuild alias expression string to make it comparable

2018-06-14 Thread Thomas-Mich Richter

On 06/14/2018 03:53 PM, Paul Clarke wrote:
> On 06/14/2018 06:48 AM, Thomas Richter wrote:
>> PMU alias definitions in sysfs files may have spaces, newlines
>> and number with leading zeroes. Same alias definitions may
>> also appear in JSON files without spaces, etc.
>>
>> Scan alias definitions and remove leading zeroes, spaces,
>> newlines, etc and rebuild string to make alias->str member
>> comparable.
>>
>> s390 for example  has terms specified as
>> event=0x0091 (read from files ..//events/
>> and terms specified as event=0x91 (read from JSON files).
>>
>> Signed-off-by: Thomas Richter 
>> ---
>>  tools/perf/util/pmu.c | 25 -
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 26c79a9c4142..da8f243743d3 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -241,9 +241,11 @@ static int __perf_pmu__new_alias(struct list_head 
>> *list, char *dir, char *name,
>>   char *metric_expr,
>>   char *metric_name)
>>  {
>> +struct parse_events_term *term;
>>  struct perf_pmu_alias *alias;
>>  int ret;
>>  int num;
>> +char newval[256];
> 
> How was 256 chosen?

I copied this from function perf_pmu__new_alias() which also uses:

char buf[256];

Looking a the sysfs file contents this seems to be sufficient.
This number comes from commit long time ago.

> 
>>
>>  alias = malloc(sizeof(*alias));
>>  if (!alias)
>> @@ -262,6 +264,27 @@ static int __perf_pmu__new_alias(struct list_head 
>> *list, char *dir, char *name,
>>  return ret;
>>  }
>>
>> +/* Scan event and remove leading zeroes, spaces, newlines, some
>> + * platforms have terms specified as
>> + * event=0x0091 (read from files ..//events/
>> + * and terms specified as event=0x91 (read from JSON files).
>> + *
>> + * Rebuild string to make alias->str member comparable.
>> + */
>> +memset(newval, 0, sizeof(newval));
>> +ret = 0;
>> +list_for_each_entry(term, &alias->terms, list) {
>> +if (ret)
>> +ret += scnprintf(newval + ret, sizeof(newval) - ret,
>> + ",");
>> +if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
>> +ret += scnprintf(newval + ret, sizeof(newval) - ret,
>> + "%s=%#x", term->config, term->val.num);
>> +else if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR)
>> +ret += scnprintf(newval + ret, sizeof(newval) - ret,
>> + "%s=%s", term->config, term->val.str);
> 
> If we exceed 256, we just suddenly terminate the rebuilding without reporting 
> any issues.

Correct the string would be truncated, but see above, it would not have been 
read correctly
anyway.

> 
>> +}
>> +
>>  alias->name = strdup(name);
>>  if (dir) {
>>  /*
>> @@ -285,7 +308,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
>> char *dir, char *name,
>>  snprintf(alias->unit, sizeof(alias->unit), "%s", unit);
>>  }
>>  alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
>> -alias->str = strdup(val);
>> +alias->str = strdup(newval);
>>
>>  list_add_tail(&alias->list, list);
>>
> 
> PC
> 

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH 1/3] perf alias: Remove trailing newline when reading sysfs files

2018-06-14 Thread Thomas-Mich Richter

On 06/14/2018 03:17 PM, Paul Clarke wrote:
> On 06/14/2018 06:48 AM, Thomas Richter wrote:
>> Remove a trailing newline when reading sysfs file contents
>> such as /sys/devices/cpum_cf/events/TX_NC_TEND.
>> This show when verbose option -v is used.
>>
>> Output before:
>> tx_nc_tend -> 'cpum_cf'/'event=0x008d
>> '/
>>
>> Output after:
>> tx_nc_tend -> 'cpum_cf'/'event=0x8d'/
>>
>> Signed-off-by: Thomas Richter 
>> ---
>>  tools/perf/util/pmu.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 7878934ebb23..26c79a9c4142 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -294,7 +294,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
>> char *dir, char *name,
>>
>>  static int perf_pmu__new_alias(struct list_head *list, char *dir, char 
>> *name, FILE *file)
>>  {
>> -char buf[256];
>> +char *cp, buf[256];
>>  int ret;
>>
>>  ret = fread(buf, 1, sizeof(buf), file);
>> @@ -303,6 +303,11 @@ static int perf_pmu__new_alias(struct list_head *list, 
>> char *dir, char *name, FI
>>
>>  buf[ret] = 0;
>>
>> +/* Remove trailing newline from sysfs file */
>> +cp = strrchr(buf, '\n');
>> +if (cp)
>> +*cp = '\0';
> 
> A nit, perhaps, but this will search backwards through the entire string if a 
> newline is not found, which is the most common case, I presume.  Would it be 
> more efficient to just look at the last character?  Something like:
> i = strlen(buf);
> if (i > 0 && buf[i-1] == '\n')
>   buf[i-1] = '\0';
> 
>> +
>>  return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, 
>> NULL,
>>   NULL, NULL, NULL);
>>  }
>>
> 
> PC
> 

Arnaldo suggested rtrim() which I will use.
I agree that your approach is a bit faster...



-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf test: Test 6 dumps core on s390

2018-06-11 Thread Thomas-Mich Richter

On 06/08/2018 04:53 PM, Kim Phillips wrote:
> On Fri, 8 Jun 2018 15:17:28 +0200
> Thomas Richter  wrote:
> 
>> Perf test case 6 "Parse event definition strings"
>> dumps core when executed on s390.
> 
> I reported it actually fails on any $ARCH system without
> Intel Processor Trace (PT) h/w:
> 
> https://www.spinics.net/lists/linux-perf-users/msg06020.html
> 
> There was a follow-up patch sent here:
> 
> https://www.spinics.net/lists/linux-perf-users/msg06029.html
> 
> which worked for me, but I don't know what has/has-not transpired
> since, other than it is still broken, as you found.
> 
> Kim


Looks like this is still broken. I also can not find this patch on
git clone -b perf/core git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

We actually need 2 patches
- one to fix the core dump
- one to test for x86 platform


-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: regression: "95cde3c59966 debugfs: inode: debugfs_create_dir uses mode permission from parent" terminally annoys libvirt

2018-06-08 Thread Thomas-Mich Richter

On 06/08/2018 09:16 AM, Mike Galbraith wrote:
> Greetings,
> 
> $subject bisected and verified via revert.  Box is garden variety
> i4790, distro is openSUSE Leap 15.0.
> 
> Error starting domain: internal error: process exited while connecting to 
> monitor: ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
> 2018-06-08T03:18:00.453006Z qemu-system-x86_64: failed to initialize KVM: 
> Cannot allocate memory
> 
> Traceback (most recent call last):
>   File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in 
> cb_wrapper
> callback(asyncjob, *args, **kwargs)
>   File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb
> callback(*args, **kwargs)
>   File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 82, in 
> newfn
> ret = fn(self, *args, **kwargs)
>   File "/usr/share/virt-manager/virtManager/domain.py", line 1508, in startup
> self._backend.create()
>   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1069, in create
> if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
> libvirt.libvirtError: internal error: process exited while connecting to 
> monitor: ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
> 2018-06-08T03:18:00.453006Z qemu-system-x86_64: failed to initialize KVM: 
> Cannot allocate memory
> 
> 95cde3c59966f6371b6bcd9e4e2da2ba64ee9775 is the first bad commit
> commit 95cde3c59966f6371b6bcd9e4e2da2ba64ee9775
> Author: Thomas Richter 
> Date:   Fri Apr 27 14:35:47 2018 +0200
> 
> debugfs: inode: debugfs_create_dir uses mode permission from parent
> 
...

Hi,

that patch does not allocate any memory, it just sets permission on newly
created subdirectories in /sys/kernel/debug.
I do not know what ioctl(KVM_CREATE_VM) does, but maybe requires permissions
for group and others in that directory.

Best thing is to revert this patch.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf docu: Update section on cpu topology

2018-05-29 Thread Thomas-Mich Richter

On 05/28/2018 09:54 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 28, 2018 at 09:44:33AM +0200, Thomas Richter escreveu:
>> Add an explanation of each cpu's core and socket
>> identifier to the documentation.
> 
> Thanks, applying. I guess it is not that worth to mention that older
> files may have just the string lists, right?
> 
> - Arnaldo

Ah, sorry but that did not cross my mind

It was introduced with commit 
2bb00d2f95193 ("perf tools: Store the cpu socket and core ids in the perf.data 
header")
and is available since Linux 4.4

Hope this helps...

>  
>> Signed-off-by: Thomas Richter 
>> ---
>>  tools/perf/Documentation/perf.data-file-format.txt | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf.data-file-format.txt 
>> b/tools/perf/Documentation/perf.data-file-format.txt
>> index d00f0d51cab8..c57904a526ce 100644
>> --- a/tools/perf/Documentation/perf.data-file-format.txt
>> +++ b/tools/perf/Documentation/perf.data-file-format.txt
>> @@ -153,10 +153,18 @@ struct {
>>  HEADER_CPU_TOPOLOGY = 13,
>>  
>>  String lists defining the core and CPU threads topology.
>> +The string lists are followed by a variable length array
>> +which contains core_id and socket_id of each cpu.
>> +The number of entries can be determined by the size of the
>> +section minus the sizes of both string lists.
>>  
>>  struct {
>> struct perf_header_string_list cores; /* Variable length */
>> struct perf_header_string_list threads; /* Variable length */
>> +   struct {
>> +  uint32_t core_id;
>> +  uint32_t socket_id;
>> +   } cpus[nr]; /* Variable length records */
>>  };
>>  
>>  Example:
>> -- 
>> 2.14.3
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf test 39 (Session topology) dumps core on s390

2018-05-27 Thread Thomas-Mich Richter

On 05/28/2018 05:49 AM, Ravi Bangoria wrote:
> Hi Thomas,
> 
> On 05/24/2018 07:26 PM, Thomas Richter wrote:
>> @@ -95,7 +98,7 @@ int test__session_topology(struct test *test 
>> __maybe_unused, int subtest __maybe
>>  {
>>  char path[PATH_MAX];
>>  struct cpu_map *map;
>> -int ret = -1;
>> +int ret;
> 
> This is failing for me:
> 
> tests/topology.c: In function ‘test__session_topology’:
> tests/topology.c:101:6: error: ‘ret’ may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>   int ret;
>   ^
> cc1: all warnings being treated as errors
>   CC   util/intlist.o
> mv: cannot stat 'tests/.topology.o.tmp': No such file or directory
> /home/ravi/Workspace/linux/tools/build/Makefile.build:96: recipe for target 
> 'tests/topology.o' failed
> make[4]: *** [tests/topology.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> 
> 
> Thanks,
> Ravi
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Thanks for the pointer, will provide V2.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-21 Thread Thomas-Mich Richter

On 05/18/2018 04:14 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 18, 2018 at 01:09:48PM +0200, Thomas-Mich Richter escreveu:
>> On 05/18/2018 12:29 PM, Sandipan Das wrote:
>>> Can you please apply these two patches as well and then re-test?
> 
>>> [1] https://lkml.org/lkml/2018/5/17/112
>>> [2] https://lkml.org/lkml/2018/5/17/113
> 
>> Ahhh, yes that helped. Must have missed it. Thanks for the pointer.
> 
> Cool, can I take that as a Tested-by: Thomas?
> 
> - Arnaldo
> 
Sure

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Thomas-Mich Richter

On 05/18/2018 12:29 PM, Sandipan Das wrote:
> Hi Thomas,
> 
> On 05/18/2018 03:51 PM, Thomas-Mich Richter wrote:
> [...]
>>
>> This patch fails on s390. I used 4.17.0rc5 + fedora 27 and I get this output:
>>
>>
>> [root@p23lp27 perf]# ./perf test 59
>> 59: probe libc's inet_pton & backtrace it with ping   : Ok
>> [root@p23lp27 linux]# cd ~/linux; patch -p1 < ../inet_pton1 
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file tools/perf/tests/shell/record+probe_libc_inet_pton.sh
>> [root@p23lp27 linux]# cd -; 
>> [root@p23lp27 perf]# ./perf test 59
>> 59: probe libc's inet_pton & backtrace it with ping   : FAILED!
>> [root@p23lp27 perf]#
>>
>> Debugging revealed this line as cause of failure:
>>
>>
>> FAIL: expected backtrace entry 2 
>> "gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$"
>>  got "fdcb1 gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)"
>>
>> Here is the output of the trace file
>>
>> [root@p23lp27 perf]# perf script 
>> ping 87291 [001] 96936.231618: probe_libc:inet_pton: (3ff96342378)
>>   142378 __inet_pton (inlined)
>>fdcb1 gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
>>   100189 __GI_getaddrinfo (inlined)
>> 398d main (/usr/bin/ping)
>>2303d __libc_start_main (/usr/lib64/libc-2.26.so)
>> 457b [unknown] (/usr/bin/ping)
>>
>> [root@p23lp27 perf]# 
>>
>> Hope this helps
>>
> 
> Can you please apply these two patches as well and then re-test?
> 
> [1] https://lkml.org/lkml/2018/5/17/112
> [2] https://lkml.org/lkml/2018/5/17/113
> 
> - Sandipan
> 

Ahhh, yes that helped. Must have missed it. Thanks for the pointer.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Thomas-Mich Richter

On 05/18/2018 09:24 AM, Sandipan Das wrote:
> This test currently fails because the regular expressions for
> matching the output of perf script do not consider the symbol
> offsets to be part of the output.
> 
> The symbol offsets are seen because of the default behaviour
> introduced by commit 4140d2ea74b3 ("perf script: Show symbol
> offsets by default").
> 
> Before applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30389
>   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
>   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
> "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
>   test child finished with -1
>    end 
>   probe libc's inet_pton & backtrace it with ping: FAILED!
> 
> After applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30539
>   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
>   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
>   56351e3c1c71 main+0x891 (/usr/bin/ping)
>   test child finished with 0
>    end 
>   probe libc's inet_pton & backtrace it with ping: Ok
> 
> Signed-off-by: Sandipan Das 
> ---
>  tools/perf/tests/shell/record+probe_libc_inet_pton.sh | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh 
> b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
> index ee86473643be..650b208f700f 100755
> --- a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
> +++ b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
> @@ -16,18 +16,18 @@ nm -g $libc 2>/dev/null | fgrep -q inet_pton || exit 254
>  trace_libc_inet_pton_backtrace() {
>   idx=0
>   expected[0]="ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)"
> - expected[1]=".*inet_pton[[:space:]]\($libc|inlined\)$"
> + expected[1]=".*inet_pton\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
>   case "$(uname -m)" in
>   s390x)
>   eventattr='call-graph=dwarf,max-stack=4'
> - expected[2]="gaih_inet.*[[:space:]]\($libc|inlined\)$"
> - expected[3]="(__GI_)?getaddrinfo[[:space:]]\($libc|inlined\)$"
> - expected[4]="main[[:space:]]\(.*/bin/ping.*\)$"
> + 
> expected[2]="gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
> + 
> expected[3]="(__GI_)?getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$"
> + expected[4]="main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
>   ;;
>   *)
>   eventattr='max-stack=3'
> - expected[2]="getaddrinfo[[:space:]]\($libc\)$"
> - expected[3]=".*\(.*/bin/ping.*\)$"
> + expected[2]="getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$"
> + expected[3]=".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
>   ;;
>   esac
>  
> 

This patch fails on s390. I used 4.17.0rc5 + fedora 27 and I get this output:


[root@p23lp27 perf]# ./perf test 59
59: probe libc's inet_pton & backtrace it with ping   : Ok
[root@p23lp27 linux]# cd ~/linux; patch -p1 < ../inet_pton1 
(Stripping trailing CRs from patch; use --binary to disable.)
patching file tools/perf/tests/shell/record+probe_libc_inet_pton.sh
[root@p23lp27 linux]# cd -; 
[root@p23lp27 perf]# ./perf test 59
59: probe libc's inet_pton & backtrace it with ping   : FAILED!
[root@p23lp27 perf]#

Debugging revealed this line as cause of failure:


FAIL: expected backtrace entry 2 
"gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" 
got "fdcb1 gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)"

Here is the output of the trace file

[root@p23lp27 perf]# perf script 
ping 87291 [001] 96936.231618: probe_libc:inet_pton: (3ff96342378)
  142378 __inet_pton (inlined)
   fdcb1 gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
  100189 __GI_getaddrinfo (inlined)
398d main (/usr/bin/ping)
   2303d __libc_start_main (/usr/lib64/libc-2.26.so)
457b [unknown] (/usr/bin/ping)

[root@p23lp27 perf]# 

Hope this helps
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v3] module: Fix display of wrong module .text address

2018-05-02 Thread Thomas-Mich Richter

On 05/02/2018 04:20 AM, Kees Cook wrote:
> On Wed, Apr 18, 2018 at 12:14 AM, Thomas Richter  
> wrote:
>> Reading file /proc/modules shows the correct address:
>> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
>> qeth_l2 94208 1 - Live 0x03ff80401000
>>
>> and reading file /sys/module/qeth_l2/sections/.text
>> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
>> 0x18ea8363
>> displays a random address.
>>
>> This breaks the perf tool which uses this address on s390
>> to calculate start of .text section in memory.
>>
>> Fix this by printing the correct (unhashed) address.
>>
>> Thanks to Jessica Yu for helping on this.
>>
>> Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not 
>> restricting")
>> Cc:  # v4.15+
>> Suggested-by: Linus Torvalds 
>> Signed-off-by: Thomas Richter 
>> Cc: Jessica Yu 
>> ---
>>  kernel/module.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index a6e43a5806a1..40b42000bd80 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -1472,7 +1472,8 @@ static ssize_t module_sect_show(struct 
>> module_attribute *mattr,
>>  {
>> struct module_sect_attr *sattr =
>> container_of(mattr, struct module_sect_attr, mattr);
>> -   return sprintf(buf, "0x%pK\n", (void *)sattr->address);
>> +   return sprintf(buf, "0x%px\n", kptr_restrict < 2 ?
>> +  (void *)sattr->address : NULL);
> 
> Errr... this looks reversed to me.
> 
> I would expect: "kptr_restrict < 2 ? NULL : (void *)sattr->address"
> 
> -Kees
> 

I am confused:
In my patch, if kptr_restrict == 2 it prints NULL, which kptr_restrict 
being 0 or 1 it prints the address.

In your comment if kptr_restrict == 2 it prints the address, which 
kptr_restrict being 0 or 1 it prints NULL.

Looking into Documentation/sysctl/kernel.txt:
  When kptr_restrict is set to (2), kernel pointers printed using
  %pK will be replaced with 0's regardless of privileges.

With my patch, setting kptr_restrict to 0 or 1
prints the real kernel address (format %px, unmodified address
according to Documentation/printk-formats.txt).

I have tested this on s390 (which is the only arch using file
/sys/module//sections/.text) in the perf tool.

root@s8360047 ~]# sysctl  kernel.kptr_restrict
kernel.kptr_restrict = 0
[root@s8360047 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 102400 1 - Live 0x03ff8034d000
[root@s8360047 ~]# cat /sys/module/qeth_l2/sections/.text 
0x03ff8034da68
[root@s8360047 ~]# sysctl  -w kernel.kptr_restrict=2
kernel.kptr_restrict = 2
[root@s8360047 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 102400 1 - Live 0x
[root@s8360047 ~]# cat /sys/module/qeth_l2/sections/.text 
0x
[root@s8360047 ~]# uname -a
Linux s8360047 4.17.0-rc3m-perf+ #6 SMP PREEMPT Wed May 2 10:02:38 CEST 2018 
s390x s390x s390x GNU/Linux
[root@s8360047 ~]# 

Hope this helps.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] inode: debugfs_create_dir uses mode permission from parent

2018-05-02 Thread Thomas-Mich Richter

On 04/27/2018 04:58 PM, Kees Cook wrote:
> On Fri, Apr 27, 2018 at 6:49 AM, Greg KH  wrote:
>> I'm going to add Kees and the kernel-hardning list here, as I'd like
>> their opinions for the patch below.
>>
>> Kees, do you have any problems with this patch?  I know you worked on
>> making debugfs more "secure" from non-root users, this should still keep
>> the intial mount permissions all fine, right?  Anything I'm not
>> considering here?
> 
> This appears correct to me. I'd like to see some stronger rationale
> for why this is needed, just so I have a "design" to compare the
> implementation against. :)
> 
> Normally, the top-level directory permissions should block all the
> subdirectories too. The only time I think of this being needed is if
> someone is explicitly bind-mounting a subdirectory to another location
> (e.g. Chrome OS does this for the i915 subdirectory). In that case,
> I'd expect them to tweak permissions too. Thomas, what's your
> use-case?
> 
> -Kees
> 

There is no 'real use case'. I wrote the patch because of discussions
regarding file permissions for files located deeply in the
directory tree, for example

  -r--r--r-- 1 root root 0 Apr 27 14:23 /sys/kernel/debug/kprobes/blacklist

which gives the impression it is world readable.
This happened to me in recent discussions when I wrote patches to fix some
of the address randomized output of /sys files which broke the perf tool.

During discussion people often forgot that the root /sys/kernel/debug is rwx for
root only and blocks non root access to subdirectories and files. They simply
looked at the file permissions.

I have not thougth about the bind-mount case nor did I test this scenario.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] inode: debugfs_create_dir uses mode permission from parent

2018-04-27 Thread Thomas-Mich Richter

On 04/27/2018 12:06 PM, Greg KH wrote:
> On Fri, Apr 27, 2018 at 11:14:26AM +0200, Thomas-Mich Richter wrote:
>> On 04/27/2018 10:27 AM, Greg KH wrote:
>>> On Fri, Apr 27, 2018 at 10:07:12AM +0200, Thomas Richter wrote:
>>>> Currently function debugfs_create_dir() creates a new
>>>> directory in the debugfs (usually mounted /sys/kernel/debug)
>>>> with permission rwxr-xr-x. This is hard coded.
>>>>
>>>> Change this to use the parent directory permission.
>>>>
>>>> Fixes: edac65eaf8d5c ("debugfs: take mode-dependent parts of 
>>>> debugfs_get_inode() into callers")
>>>> Signed-off-by: Thomas Richter 
>>>> Cc: Greg Kroah-Hartman 
>>>> ---
>>>>  fs/debugfs/inode.c | 5 -
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
>>>> index 13b01351dd1c..80618330d86a 100644
>>>> --- a/fs/debugfs/inode.c
>>>> +++ b/fs/debugfs/inode.c
>>>> @@ -512,7 +512,10 @@ struct dentry *debugfs_create_dir(const char *name, 
>>>> struct dentry *parent)
>>>>if (unlikely(!inode))
>>>>return failed_creating(dentry);
>>>>  
>>>> -  inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
>>>> +  if(!parent)
>>>> +  parent = debugfs_mount->mnt_root;
>>>> +  inode->i_mode = S_IFDIR | (d_inode(parent)->i_mode
>>>> + & (S_IRWXU | S_IRWXG));
>>>>inode->i_op = &simple_dir_inode_operations;
>>>>inode->i_fop = &simple_dir_operations;
>>>>  
>>>
>>> This looks ok, but is it going to change the permissions of existing
>>> stuff in ways that might breaks things, right?
>>
>> Right, but debugfs is usually mounted on /sys/kernel/debug with
>> permissions rwx to root owner. It can be changed after the mount, of course.
>> Unless this is done, the directory permissions for /sys/kernel/debug
>> will stop any descend regardless  of the subdirectory permissions.
>>
>>>
>>> Have you done a before/after comparison?
>>
>> I have tested this patch on my Linux 4.17.0rc2 kernel on s390.
>> That worked well, I have not tested other systems.
> 
> What do you mean by "worked well"?  What were the full tree differences
> between before and after?  You should be able to get this by using:
>   tree -dp /sys/kernel/debug/
> and then doing a diff on the two files.
> 
> thanks,
> 
> greg k-h
> 

Ok, this is the tree output

Before the patch:
root@s8360047 ~]# tree -dp -L 1 /sys/kernel/debug/ 
/sys/kernel/debug/
├── [drwxr-xr-x]  bdi
├── [drwxr-xr-x]  block
├── [drwxr-xr-x]  dasd
├── [drwxr-xr-x]  device_component
├── [drwxr-xr-x]  extfrag
├── [drwxr-xr-x]  hid
├── [drwxr-xr-x]  kprobes
├── [drwxr-xr-x]  kvm
├── [drwxr-xr-x]  memblock
├── [drwxr-xr-x]  pm_qos
├── [drwxr-xr-x]  qdio
├── [drwxr-xr-x]  s390
├── [drwxr-xr-x]  s390dbf
└── [drwx--]  tracing

14 directories

After the patch:
[root@s8360047 ~]# tree -dp -L 1 /sys/kernel/debug/
sys/kernel/debug/
├── [drwx--]  bdi
├── [drwx--]  block
├── [drwx--]  dasd
├── [drwx--]  device_component
├── [drwx--]  extfrag
├── [drwx--]  hid
├── [drwx--]  kprobes
├── [drwx--]  kvm
├── [drwx--]  memblock
├── [drwx--]  pm_qos
├── [drwx--]  qdio
├── [drwx--]  s390
├── [drwx--]  s390dbf
└── [drwx--]  tracing

14 directories
[root@s8360047 ~]#

I attached the diff of the full tree before and after the patch.


-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294
2,52c2,52
< âââ [drwxr-xr-x]  bdi
< âÂ Â  âââ [drwxr-xr-x]  1:0
< âÂ Â  âââ [drwxr-xr-x]  1:1
< âÂ Â  âââ [drwxr-xr-x]  1:10
< âÂ Â  âââ [drwxr-xr-x]  1:11
< âÂ Â  âââ [drwxr-xr-x]  1:12
< âÂ Â  âââ [drwxr-xr-x]  1:13
< âÂ Â  âââ [drwxr-xr-x]  1:14
< âÂ Â  âââ [drwxr-xr-x]  1:15
< âÂ Â  âââ [drwxr-xr-x]  1:2
< âÂ Â  âââ [drwxr-xr-x]  1:3
< âÂ Â  âââ [drwxr-xr-x]  1:4
< âÂ Â  âââ [drwxr-xr-x]  1:5
< âÂ Â  âââ [drwxr-xr-x]  1:6
< âÂ Â  âââ [drwxr-xr-x]  1:7
< âÂ Â  âââ [drwxr-xr-x]  1:8
< âÂ Â  âââ [drwxr-xr-x]  1:9
< âÂ Â  âââ [drwxr-xr-x]  94:0
< âââ [drwxr-xr-x]  block
< âââ [drwxr-xr-x]  dasd
< âÂ Â  âââ [drwx

Re: [PATCH] inode: debugfs_create_dir uses mode permission from parent

2018-04-27 Thread Thomas-Mich Richter

On 04/27/2018 10:27 AM, Greg KH wrote:
> On Fri, Apr 27, 2018 at 10:07:12AM +0200, Thomas Richter wrote:
>> Currently function debugfs_create_dir() creates a new
>> directory in the debugfs (usually mounted /sys/kernel/debug)
>> with permission rwxr-xr-x. This is hard coded.
>>
>> Change this to use the parent directory permission.
>>
>> Fixes: edac65eaf8d5c ("debugfs: take mode-dependent parts of 
>> debugfs_get_inode() into callers")
>> Signed-off-by: Thomas Richter 
>> Cc: Greg Kroah-Hartman 
>> ---
>>  fs/debugfs/inode.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
>> index 13b01351dd1c..80618330d86a 100644
>> --- a/fs/debugfs/inode.c
>> +++ b/fs/debugfs/inode.c
>> @@ -512,7 +512,10 @@ struct dentry *debugfs_create_dir(const char *name, 
>> struct dentry *parent)
>>  if (unlikely(!inode))
>>  return failed_creating(dentry);
>>  
>> -inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
>> +if(!parent)
>> +parent = debugfs_mount->mnt_root;
>> +inode->i_mode = S_IFDIR | (d_inode(parent)->i_mode
>> +   & (S_IRWXU | S_IRWXG));
>>  inode->i_op = &simple_dir_inode_operations;
>>  inode->i_fop = &simple_dir_operations;
>>  
> 
> This looks ok, but is it going to change the permissions of existing
> stuff in ways that might breaks things, right?

Right, but debugfs is usually mounted on /sys/kernel/debug with
permissions rwx to root owner. It can be changed after the mount, of course.
Unless this is done, the directory permissions for /sys/kernel/debug
will stop any descend regardless  of the subdirectory permissions.

> 
> Have you done a before/after comparison?

I have tested this patch on my Linux 4.17.0rc2 kernel on s390.
That worked well, I have not tested other systems.


-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf test: Adapt test case record+probe_libc_inet_pton.sh for s390

2018-04-27 Thread Thomas-Mich Richter

On 04/26/2018 05:26 PM, Martin Vuille wrote:
> 
> 
> On 04/26/18 04:09, Thomas-Mich Richter wrote:
>> was different. With you patch it changed from /usr/lib64/libc.so (old) to
>> /usr/lib/debug/lib64/libc-2.26.so.debug (new)
>>
> Thomas,
> 
> Can you tell me what 'file' reports for the old and new files?
> 
> Regards,
> MV
> 

Martin,

not sure I understand your question:

Before your patch the file which was picked by dwarf was
/usr/lib64/libc.so

With your patch file which was picked by dwarf was 
/usr/lib/debug/lib64/libc-2.26.so.debug

Was this your question?
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf test: Adapt test case record+probe_libc_inet_pton.sh for s390

2018-04-26 Thread Thomas-Mich Richter

On 04/25/2018 04:19 PM, Martin Vuille wrote:
> Apologies for any problems my patch may be causing.
> 
> I'm unclear on what is the proposed fix, other than reverting the commit.
> 
> In the problem scenario, is a --symfs option used? Is the debug info being 
> obtained from the symfs directory?
> 
> Unfortunately, I've had to change my focus for the time being, so won't be 
> able to investigate further for a while.
> 
> Arnaldo, I'm fine with you reverting this change for now.
> 
> Regards,
> 
> MV
> 
> 

Martin,

there is no need to revert the patch. I have adopted the test case for s390 and 
it
now stops at main, which is identical to x86.

The --symfs option was not used, however the debug file selected for dwarf 
examination
was different. With you patch it changed from /usr/lib64/libc.so (old) to
/usr/lib/debug/lib64/libc-2.26.so.debug (new)
and I think that this file has not all the info (at least on s390).

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: s390 perf events JSONs query

2018-04-20 Thread Thomas-Mich Richter

On 04/20/2018 12:51 PM, John Garry wrote:
> Hi Hendrik, Thomas,
> 
> I noticed that in 4.17-rc1 support was included for s390 perf pmu-events. I 
> also notice that the JSONs contain many common (identical actually) events 
> between different chips for this arch.
> 
> Support was added for factoring out common arch events in 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/tools/perf/pmu-events?h=next-20180420&id=e9d32c1bf0cd7a98358ec4aa1625bf2b3459b9ac
> 
> ARM64 chips use this feature. I am not familiar with the s390 arch, but do 
> you think you could also use this feature?
> 
> Thanks,
> John
> 

I have just played with this feature. I was caught off by this error message:

[root@s35lp76 pmu-events]# ./jevents s390 arch /tmp/xxx 10
d 04096 s390 arch/s390
d 14096 cf_z14   arch/s390/cf_z14
f 21338 basic.json   arch/s390/cf_z14/basic.json

jevents: Ignoring file arch/s390/archevent.json  < confusing error message

jevents: Processing mapfile arch/s390/mapfile.csv
[root@s35lp76 pmu-events]# 

I started debugging, until I realized this file is still processed.
(Just a side remark).

Anyway the features is nice, but it does not save anything in the resulting
pmu-events.c file, correct? The events defined in the common archevent.json
files are just copied into the structures of a specific machine.

The feature saves time and space when you create the machine specific json
files because it allows you to refer to a common event by name. Cool!

On s390 we do not create the json files manually, but have some scripts to
create them based on s390 type/model hardware specific input files.

@Hendirk,
we could rework our internal tool chain to emit the new "ArchStdEvent"
keyword for common events, but in the end we do not save anything in the
resulting pmu-events.c file. And it requires considerable rework to 
support it.
Given that, I would put it very low priority on your todo list, comments?

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: s390 perf events JSONs query

2018-04-20 Thread Thomas-Mich Richter

On 04/20/2018 12:51 PM, John Garry wrote:
> Hi Hendrik, Thomas,
> 
> I noticed that in 4.17-rc1 support was included for s390 perf pmu-events. I 
> also notice that the JSONs contain many common (identical actually) events 
> between different chips for this arch.
> 
> Support was added for factoring out common arch events in 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/tools/perf/pmu-events?h=next-20180420&id=e9d32c1bf0cd7a98358ec4aa1625bf2b3459b9ac
> 
> ARM64 chips use this feature. I am not familiar with the s390 arch, but do 
> you think you could also use this feature?
> 
> Thanks,
> John
> 

Thanks John,

for bringing this to my attention. Yes I will definitely look into this feature 
and will try to
rework our s390 json files when this seems beneficial.

I did not notice this patch on the linux-perf-users mailing list. Was it there 
for review?
I do ask this because recently I have nearly no traffic on this list in my 
reader, so I wonder
if there is something wrong with my mailing list subscription or setup.

Thanks.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf list: Add s390 support for detailed/verbose pmu event description

2018-04-18 Thread Thomas-Mich Richter

On 04/16/2018 04:43 PM, Mark Rutland wrote:
> On Mon, Apr 16, 2018 at 03:23:14PM +0200, Thomas Richter wrote:
>> From: Thomas Richter 
>>
>> Perf list with flags -d and -v print a description (-d) or
>> a very verbose explanation (-v) of CPU specific counter events.
>> These descriptions are provided with the json files in
>> directory pmu-events/arch/s390/*.json.
>>
>> Display of these descriptions on s390 requires the
>> corresponding json files.
>>
>> On s390 this does not work because function is_pmu_core()
>> does not detect the s390 directory name where the
>> CPU specific events are listed. On x86 it is
>>   /sys/bus/event_source/devices/cpu
>> whereas on s390 it is
>>   /sys/bus/event_source/devices/cpum_cf
>>   /sys/bus/event_source/devices/cpum_sf
>>
>> Fix this by adding s390 directory name testing to
>> function is_pmu_core(). This is the same approach as taken for
>> arm platform.
> 
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index d5bf15ca..8675ddf558c6 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -562,6 +562,12 @@ static int is_pmu_core(const char *name)
>>  if (stat(path, &st) == 0)
>>  return 1;
>>  
>> +/* Look for cpu sysfs (specific to s390) */
>> +scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s",
>> +  sysfs, name);
>> +if (stat(path, &st) == 0 && !strncmp(name, "cpum_", 5))
>> +return 1;
>> +
> 
> This shouldn't adversely affect ARM, so FWIW:
> 
> Acked-by: Mark Rutland 
> 

Arnaldo,

with Mark Acked-by can you please apply the patch.

Thanks a lot.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v3] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas-Mich Richter

On 04/18/2018 09:17 AM, Tobin C. Harding wrote:
> On Wed, Apr 18, 2018 at 09:14:36AM +0200, Thomas Richter wrote:
>> Reading file /proc/modules shows the correct address:
>> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
>> qeth_l2 94208 1 - Live 0x03ff80401000
>>
>> and reading file /sys/module/qeth_l2/sections/.text
>> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
>> 0x18ea8363
>> displays a random address.
>>
>> This breaks the perf tool which uses this address on s390
>> to calculate start of .text section in memory.
>>
>> Fix this by printing the correct (unhashed) address.
>>
>> Thanks to Jessica Yu for helping on this.
>>
>> Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not 
>> restricting")
>> Cc:  # v4.15+
>> Suggested-by: Linus Torvalds 
>> Signed-off-by: Thomas Richter 
>> Cc: Jessica Yu 
>> ---
> 
> What's changed in each version please?
> 
> 
> thanks,
> Tobin.
> 

V2: Changed sprintf format string from %#lx to 0x%px (suggested by Kees Cook).
V3: Changed sprintf agrument from 0 to NULL to avoid sparse warning.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: Wrong module .text address in 4.16.0

2018-04-16 Thread Thomas-Mich Richter

On 04/16/2018 08:23 AM, Christian Borntraeger wrote:
> FWIW, this breaks at least perf capability to resolve module symbols.
> Adding some more CCs for perf and module.
> 
> 
> On 04/16/2018 07:51 AM, Thomas-Mich Richter wrote:
>> I just installed 4.16.0 and discovered the module .text address is
>> wrong. It happens on s390 and x86 platforms. I have not tested others.
>>
>> Here is the issue, I have used module qeth_l2 on s390 which is the
>> ethernet device driver:
>>
>> root@s35lp76 ~]# lsmod
>> Module  Size  Used by
>> qeth_l294208  1
>> ...
>>
>> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
>> qeth_l2 94208 1 - Live 0x03ff80401000   < This is the correct 
>> address in memory
>> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text 
>> 0x18ea8363  < This is the wrong address
>> [root@s35lp76 ~]# 
>>
>> File /sys/module/qeth_l2/sections/.text displays a very strange
>> address which is definitely wrong. It should be something like
>> 0x03ff80401xxx.
>>
>> Same on x86.
>>
>> I have checked file kernel/module.c function add_sect_attrs()
>> and it calls module_sect_show() when the sysfs file is read.
>> And module_sect_show() uses 
>>
>>   sprintf(buf, "0x%pK\n", (void *)sattr->address);
>>
>> and my sysctl setting should be correct:
>> [root@s35lp76 linux]# sysctl -a | fgrep kernel.kptr_restrict
>> kernel.kptr_restrict = 0
>> [root@s35lp76 linux]#
>>
>> I wonder if somebody else has seen this issue?
>> Ideas how to fix this?
>>
>> Thanks
>>

This new behavior actually break perf report/record/top on s390 for
module address resolution. On s390 each module .text segment actually
start at some offset after the module load addess shown with
/proc/modules. That is the reason why we need 
/sys/module//sections/.text to get the actual
.text segment start address.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Wrong module .text address in 4.16.0

2018-04-15 Thread Thomas-Mich Richter

I just installed 4.16.0 and discovered the module .text address is
wrong. It happens on s390 and x86 platforms. I have not tested others.

Here is the issue, I have used module qeth_l2 on s390 which is the
ethernet device driver:

root@s35lp76 ~]# lsmod
Module  Size  Used by
qeth_l294208  1
...

[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000   < This is the correct address 
in memory
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text 
0x18ea8363  < This is the wrong address
[root@s35lp76 ~]# 

File /sys/module/qeth_l2/sections/.text displays a very strange
address which is definitely wrong. It should be something like
0x03ff80401xxx.

Same on x86.

I have checked file kernel/module.c function add_sect_attrs()
and it calls module_sect_show() when the sysfs file is read.
And module_sect_show() uses 

  sprintf(buf, "0x%pK\n", (void *)sattr->address);

and my sysctl setting should be correct:
[root@s35lp76 linux]# sysctl -a | fgrep kernel.kptr_restrict
kernel.kptr_restrict = 0
[root@s35lp76 linux]#

I wonder if somebody else has seen this issue?
Ideas how to fix this?

Thanks
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf record: Remove unnecessary warning for missing sysfs entry

2018-04-12 Thread Thomas-Mich Richter

On 04/12/2018 03:05 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Apr 12, 2018 at 01:47:23PM +0200, Thomas Richter escreveu:
>> Using perf on 4.16.0 kernel on s390 shows warning
>>failed: can't open node sysfs data
>> each time I run command perf record ... for example:
>>
>> [root@s35lp76 perf]# ./perf record -e rB -- sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> failed: can't open node sysfs data
>> [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ]
>> [root@s35lp76 perf]#
>>
>> BTW: I find this error message not very informative.
> 
> What an understatement :-)
>  
>> It turns out commit e2091cedd51bf ("perf tools: Add MEM_TOPOLOGY
>> feature to perf data file") tries to open directory named
>> /sys/devices/system/node/ which does not exist on s390.
>>
>> This is the call stack:
>>  __cmd_record
>>  +---> perf_session__write_header
>>+---> perf_header__adds_write
>>  +---> do_write_feat
>> +---> write_mem_topology
>>   +---> build_mem_topology
>> prints warning
>> The issue starts in do_write_feat() which unconditionally
>> loops over all features and now includes HEADER_MEM_TOPOLOGY and calls
>> write_mem_topology().
>> Function record__init_features() at the beginning of __cmd_record()
>> sets all features and then turns off some.
>>
>> Fix this by removed the warning, if the directory is not present
>> memory node information is not available and won't be displayed.
> 
> Can't we instead improve the error message and turn this into a
> pr_debug2? Isn't it a reasonable scenario that the user expects this
> topology information to be present and then ends up without it?
> 
> Perhaps something like:
> 
>   pr_debug2("%s: could't read %s, does this arch have topology 
> information?\n", __func__, path);
> 
> - Arnaldo
>  

Fine with me, I will provide a version 2

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf list: Add s390 support for detailed/verbose pmu event description

2018-04-10 Thread Thomas-Mich Richter

On 04/09/2018 04:18 PM, Mark Rutland wrote:
> On Mon, Apr 09, 2018 at 03:03:32PM +0200, Thomas-Mich Richter wrote:
>> On 04/09/2018 01:37 PM, Mark Rutland wrote:
>>> On Mon, Apr 09, 2018 at 01:00:15PM +0200, Thomas Richter wrote:
>>>> @@ -562,6 +562,12 @@ static int is_pmu_core(const char *name)
>>>>if (stat(path, &st) == 0)
>>>>return 1;
>>>>  
>>>> +  /* Look for cpu sysfs (specific to s390) */
>>>> +  scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s",
>>>> +sysfs, name);
>>>> +  if (stat(path, &st) == 0)
>>>> +  return 1;
>>>
>>> IIUC here we return true if the PMU has a sysfs directory, but all PMUs
>>> (including uncore PMUs) should have such a directory, so this will make
>>> is_pmu_core() always return true.
>>>
>>> Can you match the "cpum_" prefix specifically, instead?
>>>
>>> Thanks,
>>> Mark.
>>
>> I am sorry, I don't follow you.
>>
>> When I look at the code function sequence
>>
>> perf_pmu__scan()
>> +---> pmu_read_sysfs()
>>   This functions scans directory /sys/bus/event_source/devices/
>>   and calls for each entry in this directory. For s390 these entries 
>> exist:
>>   cpum_sf cpum_cf tracepoint and software
> 
> ... and we want is:
> 
>   is_pmu_core("cpum_sf") == true
>   is_pmu_core("cpum_cf") == true
>   is_pmu_core("tracepoint") == false
>   is_pmu_core("software") == false
> 
>>   +---> perf_pmu__find();
>> +---> pmu_lookup()
> 
>>   +---> pmu_add_cpu_aliases() adds the list of aliases such 
>> as cpum_sf/SF_CYCLES_BASIC/
>> to the global list of aliases. But ths happens only 
>> when
>> +---> is_pmu_core()
>>   function returns true.
>>   And for s390 it needs to test for 
>> /sys/bus/event_source/devices/cpum_sf/ and
>>   /sys/bus/event_source/devices/cpum_cf/
>>   directories.
>>   These directories are used to read the alias 
>> names in function
>>   pmu_aliases() above.
> 
> However, your code also returns true for "tracepoint" and "software".
> 
> You check if there's a directory under event_source/devices/ for the PMU
> name, and every PMU should have such a directory.
> 
> For example, on my x86 box I have
> 
> [mark@lakrids:~]% ls /sys/bus/event_source/devices 
> breakpoint   power   uncore_cbox_13  uncore_cbox_9  uncore_qpi_0
> cpu  softwareuncore_cbox_2   uncore_ha_0uncore_qpi_1
> cstate_core  tracepoint  uncore_cbox_3   uncore_ha_1uncore_r2pcie
> cstate_pkg   uncore_cbox_0   uncore_cbox_4   uncore_imc_0   uncore_r3qpi_0
> intel_btsuncore_cbox_1   uncore_cbox_5   uncore_imc_1   uncore_r3qpi_1
> intel_cqmuncore_cbox_10  uncore_cbox_6   uncore_imc_4   uncore_ubox
> intel_pt uncore_cbox_11  uncore_cbox_7   uncore_imc_5
> msr  uncore_cbox_12  uncore_cbox_8   uncore_pcu
> 
> 
> ... and with your patch and the below hack applied:
> 
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index bd0a32b03523..cec6bf551fe3 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -675,6 +675,12 @@ static void pmu_add_cpu_aliases(struct list_head *head, 
> struct perf_pmu *pmu)
> break;
> }
>  
> +   if (is_pmu_core(name)) {
> +   printf ("is_pmu_core(\"%s\") is true\n", name);
> +   } else {
> +   printf ("is_pmu_core(\"%s\") is false\n", name);
> +   }
> +
> if (!is_pmu_core(name)) {
> /* check for uncore devices */
> if (pe->pmu == NULL)
> 
> 
> ... is_pmu_core() returns true for every PMU:
> 
> [mark@lakrids:~/src/linux/tools/perf]% ./perf list | grep 'is true' | uniq
> is_pmu_core("uncore_imc_4") is true
> is_pmu_core("uncore_cbox_5") is true
> is_pmu_core("uncore_ha_0") is true
> is_pmu_core("uncore_cbox_3") is true
> is_pmu_core("uncore_qpi_0") is true
> is_pmu_core("cstate_pkg") is true
> is_pmu_core("breakpoint")

Re: [PATCH] perf list: Add s390 support for detailed/verbose pmu event description

2018-04-09 Thread Thomas-Mich Richter

On 04/09/2018 01:37 PM, Mark Rutland wrote:
> Hi,
> 
> On Mon, Apr 09, 2018 at 01:00:15PM +0200, Thomas Richter wrote:
>> From: Thomas Richter 
>>
>> Perf list with flags -d and -v print a description (-d) or
>> a very verbose explanation (-v) of CPU specific counter events.
>> These descriptions are provided with the json files in
>> directory pmu-events/arch/s390/*.json.
>>
>> Display of these descriptions on s390 requires the
>> corresponding json files.
>>
>> On s390 this does not work because function is_pmu_core()
>> does not detect the s390 directory name where the
>> CPU specific events are listed. On x86 it is
>>   /sys/bus/event_source/devices/cpu
>> whereas on s390 it is
>>   /sys/bus/event_source/devices/cpum_cf
>>   /sys/bus/event_source/devices/cpum_sf
>>
>> Fix this by adding s390 directory name testing to
>> function is_pmu_core(). This is the same approach as taken for
>> arm platform.
> 
> I don't think this is quite right.
> 
> On arm, we specifically look for PMU directories which have a
> file called CPUs. e.g.
> 
>   /sys/bus/event_source/devices/arm_pmuv3/cpus
> 
> ... where "arm_pmuv3" is the PMU name.
> 
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index d5bf15ca..8675ddf558c6 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -562,6 +562,12 @@ static int is_pmu_core(const char *name)
>>  if (stat(path, &st) == 0)
>>  return 1;
>>  
>> +/* Look for cpu sysfs (specific to s390) */
>> +scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s",
>> +  sysfs, name);
>> +if (stat(path, &st) == 0)
>> +return 1;
> 
> IIUC here we return true if the PMU has a sysfs directory, but all PMUs
> (including uncore PMUs) should have such a directory, so this will make
> is_pmu_core() always return true.
> 
> Can you match the "cpum_" prefix specifically, instead?
> 
> Thanks,
> Mark.
> 

I am sorry, I don't follow you.
When I look at the code function sequence

perf_pmu__scan()
+---> pmu_read_sysfs()
  This functions scans directory /sys/bus/event_source/devices/
  and calls for each entry in this directory. For s390 these entries exist:
  cpum_sf cpum_cf tracepoint and software
  +---> perf_pmu__find();
+---> pmu_lookup()
  +---> pmu_format()
Looks for file 
/sys/bus/event_source/devices/cpum_cf/format/event
and parses its contents. This returns 0 on s390.
  +---> pmu_type() 
Looks for file 
/sys/bus/event_source/devices/cpum_cf/type and
and parses its contents. This returns 0 on s390.
  +---> pmu_aliases() 
Looks for directory 
/sys/bus/event_source/devices/cpum_cf/events and
reads out every entry name which is treated as a event 
name and added
to the list of PMU-Aliases together with the file 
contents. For example

/sys/bus/event_source/devices/cpum_cf/events/SF_CYCLES_BASIC with the
file contents event=0xb. This is the raw event 
number.
+> pmu_aliases_parse() + perf_pmu__new_alias()
   Check and add alias name to a list.
  +---> pmu_cpumask()
This tests for files 
/sys/bus/event_source/devices/cpum_cf/cpus or
/sys/bus/event_source/devices/cpum_cf/cpumask which do 
not exist on
s390.
  +---> pmu_add_cpu_aliases() adds the list of aliases such as 
cpum_sf/SF_CYCLES_BASIC/
to the global list of aliases. But ths happens only when
+---> is_pmu_core()
  function returns true.
  And for s390 it needs to test for 
/sys/bus/event_source/devices/cpum_sf/ and
  /sys/bus/event_source/devices/cpum_cf/
  directories.
  These directories are used to read the alias 
names in function
  pmu_aliases() above.

Maybe I misunderstand this whole scheme, but with this patch the perf list 
commands works...

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf stat: Add support for s390 transaction counters

2018-03-14 Thread Thomas-Mich Richter

On 03/14/2018 02:18 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 14, 2018 at 09:34:48AM +0100, Thomas-Mich Richter escreveu:
>> On 03/13/2018 04:23 AM, Andi Kleen wrote:
>>> Thomas Richter  writes:
> 
>>>> Right now there is only hard coded support for x86.
> 
>>> That's not true. There is support for generic transaction events in perf.
> 
>>> As far as I can tell your events would map 1:1 to the generic tx-* events.
> 
>> I might be wrong, but when I look at function add_default_attributes()
>> in file buildin-stat.c the string variables transaction_attrs
>> and transaction_limited_attrs are used when flag T is specified on command 
>> line:
> 
>> /* Default events used for perf stat -T */
>> static const char *transaction_attrs = {
>> "task-clock,"
>> "{"
>> "instructions,"
>> "cycles,"
>> "cpu/cycles-t/,"
>> "cpu/tx-start/,"
>> "cpu/el-start/,"
>> "cpu/cycles-ct/"
>> "}"
>> };
> 
>> These PMU events show up on my x86 notebook but no on the s390.
>> That's why I came to this conclusion. I have not tried other architectures.
> 
> So, I think Andi is saying that the s/390 kernel should map the generic
> transaction events (cpu/cycles-t/, cpu/tx-start/, etc) to the events you
> want to make 'perf stat -T' use, that way 'perf stat' doesn't have to be
> changed, it will continue asking for the generic transaction event names
> and then the kernel does the translation.
> 
> Just like we map "cycles" to different underlying events in different
> architectures, right?
> 
> - Arnaldo

S390 has no support for Elision and uses transaction begin/end/abort
instructions. The CPU measurement counter facility provides counters for
transaction end and transaction abort.

This means s390 counter facility device driver in 
arch/s390/kernel/perf_cpum_cf.c
could support the tx_abort and tx_commit symbolic counter names.

I have used this table (taken from arch/x86/events/intel/core.c) as giudeline:
/* Haswell special events */
EVENT_ATTR_STR(tx-start,tx_start,   "event=0xc9,umask=0x1");
EVENT_ATTR_STR(tx-commit,   tx_commit,  "event=0xc9,umask=0x2");
EVENT_ATTR_STR(tx-abort,tx_abort,   "event=0xc9,umask=0x4");
EVENT_ATTR_STR(tx-capacity, tx_capacity,"event=0x54,umask=0x2");
EVENT_ATTR_STR(tx-conflict, tx_conflict,"event=0x54,umask=0x1");
EVENT_ATTR_STR(el-start,el_start,   "event=0xc8,umask=0x1");
EVENT_ATTR_STR(el-commit,   el_commit,  "event=0xc8,umask=0x2");
EVENT_ATTR_STR(el-abort,el_abort,   "event=0xc8,umask=0x4");
EVENT_ATTR_STR(el-capacity, el_capacity,"event=0x54,umask=0x2");
EVENT_ATTR_STR(el-conflict, el_conflict,"event=0x54,umask=0x1");
EVENT_ATTR_STR(cycles-t,cycles_t,   "event=0x3c,in_tx=1");
EVENT_ATTR_STR(cycles-ct,   cycles_ct,  
"event=0x3c,in_tx=1,in_tx_cp=1");


So s390 can only support tx_commit and tx-abort symbolic names.
None of them show up in the transactions_attrs and transaction_limited_attrs
string variables used in builtin-stat.c file.

That is the reason why I introduced the patch set v2.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf stat: Add support for s390 transaction counters

2018-03-14 Thread Thomas-Mich Richter

On 03/13/2018 04:23 AM, Andi Kleen wrote:
> Thomas Richter  writes:
> 
>> Right now there is only hard coded support for x86.
> 
> That's not true. There is support for generic transaction events in perf.
> 
> As far as I can tell your events would map 1:1 to the generic tx-* events.
> 
> -Andi
> 

I might be wrong, but when I look at function add_default_attributes()
in file buildin-stat.c the string variables transaction_attrs
and transaction_limited_attrs are used when flag T is specified on command line:

/* Default events used for perf stat -T */
static const char *transaction_attrs = {
"task-clock,"
"{"
"instructions,"
"cycles,"
"cpu/cycles-t/,"
"cpu/tx-start/,"
"cpu/el-start/,"
"cpu/cycles-ct/"
"}"
};

These PMU events show up on my x86 notebook but no on the s390.
That's why I came to this conclusion. I have not tried other architectures.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH Resent] perf annotate: Fix s390 target function disassembly

2018-03-06 Thread Thomas-Mich Richter

On 03/06/2018 03:04 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 06, 2018 at 01:39:55PM +0100, Thomas Richter escreveu:
>> Perf annotate displays function call assembler instructions
>> with a right arrow. Hitting enter on this line/instruction
>> causes the browser to disassemble this target function and
>> show it on the screen.  On s390 this results in an error
>> message 'The called function was not found.'
>>
>> The function call assembly line parsing does not handle
>> the s390 bras and brasl instructions. Function call__parse
>> expects the target as first operand:
>>  callq   e9140 <__fxstat>
>> S390 has a register number as first operand:
>>  brasl   %r14,41d60 
>> Therefore the target addresses on s390 are always zero
>> which is an invalid address.
>>
>> Fix this by skipping the first operand on s390.
>>
>> Signed-off-by: Thomas Richter 
>> Tested-by: Christian Borntraeger 
>> Reviewed-by: Hendrik Brueckner 
>> ---
>>  tools/perf/util/annotate.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 49ff825f745c..feb6006b676d 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -192,6 +192,14 @@ static int call__parse(struct arch *arch, struct 
>> ins_operands *ops, struct map *
>>  };
>>  
>>  ops->target.addr = strtoull(ops->raw, &endptr, 16);
>> +if (!strcmp(arch->name, "s390")) {
>> +/* s390 function call 1st operand is register */
>> +tok = strchr(ops->raw, ',');
>> +if (tok)
>> +ops->target.addr = strtoull(tok + 1, &endptr, 16);
>> +else
>> +ops->target.addr = 0;
>> +} else
>> +ops->target.addr = strtoull(ops->raw, &endptr, 16);
> 
> Do we have to do this here? Can't we have a s390_call__parse() and set
> that in the s/390 instructions?


Originally I wanted to add an architecture specific weak function to handle s390
This did not work because file util/annotate.c includes the architecture 
specific versions
around line 100:

#include "arch/arm/annotate/instructions.c"
#include "arch/arm64/annotate/instructions.c"
#include "arch/x86/annotate/instructions.c"
#include "arch/powerpc/annotate/instructions.c"
#include "arch/s390/annotate/instructions.c"

This includes the C file for s390 so we end up having function call__parse 
defined twice,
one with a weak definition and one without which results in a compiler error.

I will use  a s390 specific call__parse function and sent another patch.

>> -- 
>> 2.14.3
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [RFC 0/5] per-event settings fixes

2018-01-16 Thread Thomas-Mich Richter

On 01/16/2018 03:24 PM, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
> 
> Hi guys,
> 
>   Jiri asked me to post this series, so here it is, please take a
> look, I'd love to harvest your Reviewed-by/Tested-by/Acked-by,
> 
> - Arnaldo
> 
> Arnaldo Carvalho de Melo (5):
>   perf callchain: Fix attr.sample_max_stack setting
>   perf unwind: Do not look at globals
>   perf trace: Setup DWARF callchains for non-syscall events when --max-stack 
> is used
>   perf trace: Allow overriding global --max-stack per event
>   perf callchains: Ask for PERF_RECORD_MMAP for data mmaps for DWARF unwinding
> 
>  tools/perf/builtin-trace.c   | 19 ---
>  tools/perf/util/evsel.c  | 24 +---
>  tools/perf/util/unwind-libunwind-local.c |  9 -
>  3 files changed, 33 insertions(+), 19 deletions(-)
> 

I have tested your patches on my s390x. I applied
them on top of your perf/core branch.

Here is the output:
[root@s8360047 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.046 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.046/0.046/0.046/0.000 ms
 0.000 probe_libc:inet_pton:(3ffa4ec2060))
[root@s8360047 perf]#

Test ok !

[root@s8360047 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/max-stack=5,call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.054 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.054/0.054/0.054/0.000 ms
 0.000 probe_libc:inet_pton:(3ff7e742060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main (/usr/lib64/libc-2.26.so)
[root@s8360047 perf]#

Test ok!

[root@s8360047 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.062 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.062/0.062/0.062/0.000 ms
 0.000 probe_libc:inet_pton:(3ff86642060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main (/usr/lib64/libc-2.26.so)
   _start (/usr/bin/ping)
[root@s8360047 perf]#

Test ok:

[root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf
--max-stack 4  -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.031 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.031/0.031/0.031/0.000 ms
 0.000 probe_libc:inet_pton:(3ffa1bc2060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
[root@s8360047 perf]#

Test ok, also when --max-stack 4 is omitted the complete
backchain is printed.

[root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.047 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.047/0.047/0.047/0.000 ms
 0.000 probe_libc:inet_pton:(3ffb4bc2060))
[root@s8360047 perf]#

Test fails, I would have expected some callchain output.

[root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf --max-stack 
4
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.055 ms

 --- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.055/0.055/0.055/0.000 ms
 0.000 probe_libc:inet_pton:(3ff7e6c2060))
[root@s8360047 perf]#

Test fails, I would have expected some callchain output.

It looks like /max-stack=3/ without ,call-graph=dwarf
does not trigger the stack unwinding.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

2018-01-15 Thread Thomas-Mich Richter

On 01/15/2018 03:16 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jan 15, 2018 at 10:57:52AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
>>> -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
>>> PING ::1(::1) 56 data bytes
>>> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms
>>>
>>> --- ::1 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
>>>  0.000 probe_libc:inet_pton:(7ffbc5f768a0))
>>> __inet_pton (inlined)
>>> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
>>> __GI_getaddrinfo (inlined)
>>> main (/usr/bin/ping)
>>> [root@f27 perf]#
>>>
>>>
>>> --> Dwarf call graph and --max-stack 4 is also honoured.
>>
>> [root@jouet ~]# perf trace --no-syscalls -e 
>> probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
>> perf trace --no-syscalls --max-stack 4 -e 
>> probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
>> Value too large for defined data type
>> [root@jouet ~]# 
>>
>> Grrr.
> 
> 
> Got this one fixed with the following patch:
> 
> commit b78278e11f6992ca348a4b96aad3b2c0a9ecf0f0

[...]

> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index efa2e629a669..8f971a2301d1 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -731,14 +731,14 @@ static void apply_config_terms(struct perf_evsel *evsel,
>   struct perf_evsel_config_term *term;
>   struct list_head *config_terms = &evsel->config_terms;
>   struct perf_event_attr *attr = &evsel->attr;
> - struct callchain_param param;
> + /* callgraph default */
> + struct callchain_param param = {
> + .record_mode = callchain_param.record_mode,
> + };
>   u32 dump_size = 0;
>   int max_stack = 0;
>   const char *callgraph_buf = NULL;
> 
> - /* callgraph default */
> - param.record_mode = callchain_param.record_mode;
> -
>   list_for_each_entry(term, config_terms, list) {
>   switch (term->type) {
>   case PERF_EVSEL__CONFIG_TERM_PERIOD:

This patch works for me.
Here is the output on my s390x:

[root@s8360047 perf]# ./perf trace --no-syscalls --max-stack 4 
-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
 0.000 probe_libc:inet_pton:(3ffa70c2060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
[root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf 
-e probe_libc:inet_pton -- ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
 0.000 probe_libc:inet_pton:(3ff93fc2060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main (/usr/lib64/libc-2.26.so)
   _start (/usr/bin/ping)
[root@s8360047 perf]#

[root@s8360047 perf]# ./perf trace --no-syscalls 
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.066 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
 0.000 probe_libc:inet_pton:(3ffb82c2060))
   __GI___inet_pton (/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main (/usr/lib64/libc-2.26.so)
   _start (/usr/bin/ping)
[root@s8360047 perf]# 

Not sure if this can work at all. Since dwarf stack unwinding is done in user 
space
the attr.sample_max_stack set to 3 is useless in this case.

Yout have my Tested-by.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

2018-01-15 Thread Thomas-Mich Richter

On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
>> There is still room for improvement, I noticed overriding is not working
>> for the probe event, investigating it now.
> 
> So, I had to fix this another way to get the possibility of overwriting
> the global options (--max-stack, --call-graph) in an specific tracepoint
> event:
> 
> http://git.kernel.org/acme/c/08e26396c6f2
> 
> replaced that HEAD.
> 
> This cset may take some more minutes to show up, just pushed.
> 

I have installed your perf/core tree on my Fedora 27 Virtual Machine
running on my Intel notebook.

Here are some commands and the output on an Intel platform:

[root@f27 perf]# uname -a
Linux f27 4.15.0-rc6acme+ #1 SMP Mon Jan 15 12:35:23 CET 2018 x86_64 x86_64 
x86_64 GNU/Linux
[root@f27 perf]#

[root@f27 perf]# ./perf trace --no-syscalls --call-graph fp --max-stack 3
-e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.022 ms
 0.000 probe_libc:inet_pton:
 --- ::1 ping statistics ---
(7f8fc407d8a0))
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.022/0.022/0.022/0.000 ms
[root@f27 perf]#

--> No call graph at all, the kernel as been compiled with ORC unwinder.

[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf --max-stack 3
-e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.024 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.024/0.024/0.024/0.000 ms
0.000 probe_libc:inet_pton:(7f7ff38488a0))
[root@f27 perf]#

--> No call graph at all, the kernel as been compiled with OCR unwinder.

[root@f27 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
ping -6 -c 1 ::1PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.019 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms
 0.000 probe_libc:inet_pton:(7fc985d658a0))
   __inet_pton (inlined)
   gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main (/usr/lib64/libc-2.26.so)
   _start (/usr/bin/ping)
[root@f27 perf]#

--> Dwarf call graph, but max-stack=4 not honoured when specified as
event specific restriction.

[root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
 0.000 probe_libc:inet_pton:(7ffbc5f768a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
[root@f27 perf]#


--> Dwarf call graph and --max-stack 4 is also honoured.

I have the feeling that your fix does not work very well when
used with the --no-syscalls option.
Omitting --no-syscalls shows your explained behavior.

So there must be a difference between --no-syscalls and --syscalls
invocation. 

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

2018-01-15 Thread Thomas-Mich Richter

On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
>> There is still room for improvement, I noticed overriding is not working
>> for the probe event, investigating it now.
> 
> So, I had to fix this another way to get the possibility of overwriting
> the global options (--max-stack, --call-graph) in an specific tracepoint
> event:
> 
> http://git.kernel.org/acme/c/08e26396c6f2
> 
> replaced that HEAD.
> 
> This cset may take some more minutes to show up, just pushed.
> 

Sorry this does *not* work on my s390x.

I have cloned your perf/core tree and above commit is included.
Here is the command I tried:

[root@s8360047 perf]# ./perf trace -vv --no-syscalls  --max-stack 4 
--call-graph dwarf 
   -e probe_libc:inet_pton -- ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192

perf_event_attr:
  type 2
  size 112
  config   0x45f
  { sample_period, sample_freq }   1
  sample_type  IP|TID|TIME|ADDR|CPU|PERIOD|RAW|DATA_SRC
  disabled 1
  inherit  1
  mmap 1
  comm 1
  enable_on_exec   1
  task 1
  mmap_data1
  sample_id_all1
  exclude_guest1
  mmap21
  comm_exec1
  { wakeup_events, wakeup_watermark } 1

sys_perf_event_open: pid 6735  cpu 0  group_fd -1  flags 0x8 = 3
sys_perf_event_open: pid 6735  cpu 1  group_fd -1  flags 0x8 = 4
mmap size 2101248B
perf event ring buffer mmapped per cpu
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
 0.000 probe_libc:inet_pton:(3ffada42060))
[root@s8360047 perf]# 

I do miss in sample_type bits for CALLCHAIN, STACK_USER and REG_USER
in the debug output of the perf_event_open() attribute printout.

When I invoke the command with
[root@s8360047 perf]# ./perf trace -vv --no-syscalls 
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ -- ping -6 -c 1 
::1
callchain: type DWARF
callchain: stack dump size 8192

perf_event_attr:
  type 2
  size 112
  config   0x45f
  { sample_period, sample_freq }   1
  sample_type  
IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER
  disabled 1
  inherit  1
  mmap 1
  comm 1
  enable_on_exec   1
  task 1
  sample_id_all1
  exclude_guest1
  exclude_callchain_user   1
  mmap21
  comm_exec1
  { wakeup_events, wakeup_watermark } 1
  sample_regs_user 0x3
  sample_stack_user8192
  sample_max_stack 3

sys_perf_event_open: pid 6768  cpu 0  group_fd -1  flags 0x8 = 3
sys_perf_event_open: pid 6768  cpu 1  group_fd -1  flags 0x8 = 4
mmap size 528384B
perf event ring buffer mmapped per cpu
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.074 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.074/0.074/0.074/0.000 ms
[... snip ]

unwind: _start:ip = 0x2aa1e38457b (0x457b)
 0.000 probe_libc:inet_pton:(3ff9b142060))
   __GI___inet_pton 
(/usr/lib64/libc-2.26.so)
   gaih_inet (inlined)
   __GI_getaddrinfo (inlined)
   main (/usr/bin/ping)
   __libc_start_main 
(/usr/lib64/libc-2.26.so)
   _start (/usr/bin/ping)
[root@s8360047 perf]# 

I see the proper result as expected.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] trace/uprobes: fix output issue with address randomization

2017-12-19 Thread Thomas-Mich Richter

On 12/18/2017 07:16 PM, Linus Torvalds wrote:
> On Mon, Dec 18, 2017 at 9:59 AM, Steven Rostedt  wrote:
>>
>> Hmm, since the scrambling of %p is to prevent kernel addresses from
>> leaking, I wonder if it would be OK to make it only scramble the address
>> if the address is a kernel address. It should be fine to print user
>> space addresses unscrambled. Shouldn't it?
> 
> So %p itself shouldn't have logic like that, because some of those
> addresses can be sensitive even if they aren't strictly kernel
> addresses.
> 
> For example, anything that prints out sensitive physical addresses
> wouldn't look like a kernel virtual address, but it could still expose
> very sensitive data.
> 
> So that check would have to be done by the user of %p, not by %p itself.
> 
> So generally, the fix for "oops %p hashing now breaks xyz" should
> generally be to make sure that the protections are correct, and then
> it can be turned into %px (when it can't just be removed).
> 
> In this particular case, we already have the magical case of "don't
> print (null)", and I think _that_ the case that could be just extended
> to say "don't print sensitive data" and then the %p can be changed
> into a %px.
> 
> So one approach would be to simply checking (at _open_ time, not at IO
> time! That was one of the things that I absolutely detested about %pK
> - getting that fundamentally wrong) whether the opener could write a
> kernel address to the file, and if the opener has those permissions,
> then it obviously can read the values too.
> 
> But in this case I would suggest just making "uprobe_events" be 0600
> rather than 0644. Why the hell should a normal user be able to read
> whatever events that root has set?
> 
> Same goes for  "uprobe_profile". Is it really sensible that anybody
> can read that file? It sounds insane to me. Why should a random
> regular user be able to see what probes are installed?
> 
> Honestly, very few of the "let's just change %p to %px" has been
> correct. There is pretty much _always_ some permission problem or
> other that says "hell no, that data should not have been so widely
> available before".
> 
> The only time a pure %p -> %px is warranted is likely for oops reports
> etc. We did have a couple of those.
> 
> Linus
> 

As Steven Rostedt already pointed out, the 2 file systems debugfs and
tracefs are accessible for root only:

[root@s35lp76 ~]# mount | egrep 'debug|trace'
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
[root@s35lp76 ~]# ls -ld /sys/kernel/debug/ /sys/kernel/debug/tracing/
drwx-- 14 root root 0 Dec 18 11:34 /sys/kernel/debug/
drwx--  7 root root 0 Dec 18 11:34 /sys/kernel/debug/tracing/
[root@s35lp76 ~]# 

Surely the file permissions for uprobe_events and uprobe_profile
in directory /sys/kernel/debug/tracing 
can be changed to be accessible by owner (root) only, but does this really
help?
I have looked at all the other files in this directory and almost
all of them have read permissions for group and others.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: Linux 4.15.0-rc3 perf probe/uprobe issue with address randomization

2017-12-15 Thread Thomas-Mich Richter

On 12/15/2017 03:21 PM, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 03:14:37PM +0100, Thomas-Mich Richter wrote:
>> During debugging of perf probe tool I discovered an issue with
>> uprobes and address randomization.
>>
>> To set a uprobe on a function named inet_pton in libc library, you
>> obtain the address of the symbol inet_pton using command nm and
>> then use the following command to set the uprobe:
>>
>> # echo "p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x142060"
>>  > /sys/kernel/debug/tracing/uprobe_events
>>
>> 0x142060 is the address of inet_pton on my system.
>> This works nicely and the uprobe is usable.
>>
>> The issue is with the output:
>> # cat /sys/kernel/debug/tracing/uprobe_events
>> p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x2d0f8952
>> #
>>
>> The displayed address 0x2d0f8952 is wrong, probably
>> randomized and post processing of this output with the perf
>> probe tool fails due to this random address:
>>
>> # linux/tools/perf/perf probe -l
>> Failed to find debug information for address 2d0f8952
>>   probe_libc:inet_pton (on 0x2d0f8952 in /usr/lib64/libc-2.26.so)
>> # 
>>
>> So how to fix this (if at all)?
>> Is replacing %p by %llx in line 612 of file kernel/trace/trace_uprobe.c
>>seq_printf(m, "0x%p", (void *)tu->offset)
>> an option?
>> Or is this broken by design and intention?
> 
> So recently %p got changed to hash pointers in order to avoid leaking
> kernel addresses.
> 
>   ad67b74d2469 ("printk: hash addresses printed with %p")
> 
> I'm not sure what privilidges are required for reading that kprobe
> state, but I suspect its root only, so changing this to %px might be
> what is needed.
> 

Thanks for the hint. The %px printk format string fixes the issue:


# echo "p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x142060" 
> /sys/kernel/debug/tracing/uprobe_events
# cat /sys/kernel/debug/tracing/uprobe_events
p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x00142060
# ./tools/perf/perf probe -l
  probe_libc:inet_pton (on __inet_pton@resolv/inet_pton.c in 
/usr/lib64/libc-2.26.so)
# 

I will send a patch to fix this.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Linux 4.15.0-rc3 perf probe/uprobe issue with address randomization

2017-12-15 Thread Thomas-Mich Richter

During debugging of perf probe tool I discovered an issue with
uprobes and address randomization.

To set a uprobe on a function named inet_pton in libc library, you
obtain the address of the symbol inet_pton using command nm and
then use the following command to set the uprobe:

# echo "p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x142060"
> /sys/kernel/debug/tracing/uprobe_events

0x142060 is the address of inet_pton on my system.
This works nicely and the uprobe is usable.

The issue is with the output:
# cat /sys/kernel/debug/tracing/uprobe_events
p:probe_libc/inet_pton /usr/lib64/libc-2.26.so:0x2d0f8952
#

The displayed address 0x2d0f8952 is wrong, probably
randomized and post processing of this output with the perf
probe tool fails due to this random address:

# linux/tools/perf/perf probe -l
Failed to find debug information for address 2d0f8952
  probe_libc:inet_pton (on 0x2d0f8952 in /usr/lib64/libc-2.26.so)
# 

So how to fix this (if at all)?
Is replacing %p by %llx in line 612 of file kernel/trace/trace_uprobe.c
   seq_printf(m, "0x%p", (void *)tu->offset)
an option?
Or is this broken by design and intention?

Thanks for hints and suggestions.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: container test for cross building s390 tools failing

2017-12-14 Thread Thomas-Mich Richter

On 12/13/2017 05:31 PM, Arnaldo Carvalho de Melo wrote:
> Hi, noticed this with my perf/core branch, will investigate later.
> 
> [root@jouet ubuntu]# cat /tmp/dm.log.JAK3XV/ubuntu\:16.04-x-s390 
> ubuntu:16.04-x-s390
> Downloading http://192.168.124.1/perf/perf-4.15.0-rc3.tar.xz...
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
> 100 1237k  100 1237k0 0   236M  0 --:--:-- --:--:-- --:--:--  402M
> make: Entering directory '/git/linux/tools/perf'
>   BUILD:   Doing 'make -j4' parallel build
>   HOSTCC   /tmp/build/perf/fixdep.o
>   HOSTLD   /tmp/build/perf/fixdep-in.o
>   LINK /tmp/build/perf/fixdep
> sh: 1: command: Illegal option -c
> 
> Auto-detecting system features:
> ... dwarf: [ on  ]
> ...dwarf_getlocations: [ on  ]
> ... glibc: [ on  ]
> ...  gtk2: [ OFF ]
> ...  libaudit: [ OFF ]
> ...libbfd: [ OFF ]
> ...libelf: [ on  ]
> ...   libnuma: [ OFF ]
> ...numa_num_possible_cpus: [ OFF ]
> ...   libperl: [ OFF ]
> ... libpython: [ OFF ]
> ...  libslang: [ OFF ]
> ... libcrypto: [ OFF ]
> ... libunwind: [ OFF ]
> ...libdw-dwarf-unwind: [ on  ]
> ...  zlib: [ on  ]
> ...  lzma: [ OFF ]
> ... get_cpuid: [ OFF ]
> ...   bpf: [ on  ]
> 
> Makefile.config:414: No sys/sdt.h found, no SDT events are defined, please 
> install systemtap-sdt-devel or systemtap-sdt-dev
> Makefile.config:537: No libaudit.h found, disables 'trace' tool, please 
> install audit-libs-devel or libaudit-dev
> Makefile.config:548: No libcrypto.h found, disables jitted code injection, 
> please install libssl-devel or libssl-dev
> Makefile.config:563: slang not found, disables TUI support. Please install 
> slang-devel, libslang-dev or libslang2-dev
> Makefile.config:577: GTK2 not found, disables GTK2 support. Please install 
> gtk2-devel or libgtk2.0-dev
> Makefile.config:604: Missing perl devel files. Disabling perl scripting 
> support, please install perl-ExtUtils-Embed/libperl-dev
> Makefile.config:630: No python interpreter was found: disables Python support 
> - please install python-devel/python-dev
> Makefile.config:699: No bfd.h/libbfd found, please install 
> binutils-dev[el]/zlib-static/libiberty-dev to gain symbol demangling
> Makefile.config:728: No liblzma found, disables xz kernel module 
> decompression, please install xz-devel/liblzma-dev
> Makefile.config:741: No numa.h found, disables 'perf bench numa mem' 
> benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
> Makefile.config:818: No alternatives command found, you need to set JDIR= to 
> point to the root of your Java directory
>   GEN  /tmp/build/perf/common-cmds.h
>   PERF_VERSION = 4.15.rc3.g5e73c6
>   MKDIR/tmp/build/perf/fd/
>   CC   /tmp/build/perf/fd/array.o
>   CC   /tmp/build/perf/event-parse.o
>   CC   /tmp/build/perf/exec-cmd.o
>   CC   /tmp/build/perf/libbpf.o
>   LD   /tmp/build/perf/fd/libapi-in.o
>   CC   /tmp/build/perf/help.o
>   MKDIR/tmp/build/perf/fs/
>   CC   /tmp/build/perf/fs/fs.o
>   CC   /tmp/build/perf/bpf.o
>   MKDIR/tmp/build/perf/fs/
>   CC   /tmp/build/perf/fs/tracing_path.o
>   LD   /tmp/build/perf/libbpf-in.o
>   LINK /tmp/build/perf/libbpf.a
>   CC   /tmp/build/perf/pager.o
>   LD   /tmp/build/perf/fs/libapi-in.o
>   CC   /tmp/build/perf/cpu.o
>   CC   /tmp/build/perf/debug.o
>   CC   /tmp/build/perf/str_error_r.o
>   CC   /tmp/build/perf/event-plugin.o
>   LD   /tmp/build/perf/libapi-in.o
>   AR   /tmp/build/perf/libapi.a
>   MKDIR/tmp/build/perf/pmu-events/
>   HOSTCC   /tmp/build/perf/pmu-events/json.o
>   CC   /tmp/build/perf/parse-options.o
>   CC   /tmp/build/perf/trace-seq.o
>   CC   /tmp/build/perf/parse-filter.o
>   CC   /tmp/build/perf/parse-utils.o
>   CC   /tmp/build/perf/kbuffer-parse.o
>   MKDIR/tmp/build/perf/pmu-events/
>   HOSTCC   /tmp/build/perf/pmu-events/jsmn.o
>   HOSTCC   /tmp/build/perf/pmu-events/jevents.o
>   LD   /tmp/build/perf/libtraceevent-in.o
>   LINK /tmp/build/perf/libtraceevent.a
>   CC   /tmp/build/perf/run-command.o
>   CC   /tmp/build/perf/plugin_jbd2.o
>   CC   /tmp/build/perf/sigchain.o
>   HOSTLD   /tmp/build/perf/pmu-events/jevents-in.o
>   GEN  perf-archive
>   CC   /tmp/build/perf/subcmd-config.o
>   GEN  perf-with-kcore
>   LD   /tmp/build/perf/plugin_jbd2-in.o
>   MKDIR/tmp/build/perf/util/
>   CC   /tmp/build/perf/util/annotate.o
>   CC   /tmp/build/perf/plugin_hrtimer.o
>   CC   /tmp/build/perf/

Re: [PATCH v2] perf test shell: Fix check open filename arg using 'perf trace

2017-12-14 Thread Thomas-Mich Richter

On 12/12/2017 06:12 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 12, 2017 at 05:24:43PM +0100, Michael Petlan escreveu:
>> Hey Arnaldo, this one did not remove the s390x hack and also, won't work
>> on arm. Please use the one I just have sent, few seconds ago...
> 
> Ok, I have you latest in, with the cset log I wrote, which clarifies
> that this is not a arch issue, but a glibc one, its only that some
> arches have distros where glibc was already at 2.26, where open() calls
> are mapped to openat() syscalls.
> 
> - Arnaldo
> 
>> On Tue, 12 Dec 2017, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Dec 12, 2017 at 11:57:07AM -0300, Arnaldo Carvalho de Melo escreveu:
 So the patch below does the trick for me, can you please check if does
 for you?
>>>
>>> So I've put this with a long explanation at:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core&id=ba15be2acc9e35dc9f034fef3aa292406b518b04
>>>
>>> - Arnaldo
>>>
> 

I have extracted the latest patch with commit id 6c23b4f 
("perf test shell: Fix check open filename arg using 'perf trace'") from the git
repository perf/core today and it works for me.


-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2 0/5] perf-probe: Improve probing on versioned symbols

2017-12-08 Thread Thomas-Mich Richter

On 12/07/2017 08:19 AM, Masami Hiramatsu wrote:
> Hi,
> 
> Here is the 2nd version of the series for probing on
> versioned symbols in libraries. This includes 5 patches
> to fix the issues discussed on perf-users ML 
> (https://www.spinics.net/lists/linux-perf-users/msg04637.html)
> 
> The first version is here; https://lkml.org/lkml/2017/12/5/1124
> 
> Here is the updates;
> 
> [3/5] Add a description in Documentation/perf-probe.txt
> [5/5] Add a description and examples in Documentation/perf-probe.txt
> 
> Thank you,
> ---
> 
> Masami Hiramatsu (5):
>   perf-probe: Add warning message if there is unexpected event name
>   perf-probe: Cut off the version suffix from event name
>   perf-probe: Add __return suffix for return events
>   perf-probe: Find versioned symbols from map
>   perf-probe: Support escaped character in parser
> 
> 
>  tools/perf/Documentation/perf-probe.txt |   18 ++
>  tools/perf/arch/powerpc/util/sym-handling.c |8 +++
>  tools/perf/util/probe-event.c   |   81 
> +++
>  tools/perf/util/string.c|   46 +++
>  tools/perf/util/string2.h   |2 +
>  tools/perf/util/symbol.c|5 ++
>  tools/perf/util/symbol.h|1 
>  7 files changed, 137 insertions(+), 24 deletions(-)
> 
> --
> Masami Hiramatsu (Linaro Ltd.) 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

If you fix the findings for patch 4 and 5 you have my 

Reviewed-by: Thomas Richter 
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2 5/5] perf-probe: Support escaped character in parser

2017-12-08 Thread Thomas-Mich Richter

On 12/07/2017 08:21 AM, Masami Hiramatsu wrote:
> Support the special characters escaped by '\' in parser.
> This allows user to specify versions directly like below.
> 
>   =
>   # ./perf probe -x /lib64/libc-2.25.so malloc_get_state\\@GLIBC_2.2.5
>   Added new event:
> probe_libc:malloc_get_state (on malloc_get_state@GLIBC_2.2.5 in 
> /usr/lib64/libc-2.25.so)
> 
>   You can now use it in all perf tools, such as:
> 
> perf record -e probe_libc:malloc_get_state -aR sleep 1
> 
>   =
> 
> Or, you can use separators in source filename, e.g.
> 
>   =
>   # ./perf probe -x /opt/test/a.out foo+bar.c:3
>   Semantic error :There is non-digit character in offset.
> Error: Command Parse Error.
>   =
> 
> Usually "+" in source file cause parser error, but
> 
>   =
>   # ./perf probe -x /opt/test/a.out foo\\+bar.c:4
>   Added new event:
> probe_a:main (on @foo+bar.c:4 in /opt/test/a.out)
> 
>   You can now use it in all perf tools, such as:
> 
> perf record -e probe_a:main -aR sleep 1
>   =
> 
> escaped "\+" allows you to specify that.
> 
> Signed-off-by: Masami Hiramatsu 
> ---
>  Changes in v2:
>   - Add a description and samples in Documentation/perf-probe.txt
> ---
>  tools/perf/Documentation/perf-probe.txt |   16 +
>  tools/perf/util/probe-event.c   |   54 
> +++
>  tools/perf/util/string.c|   46 ++
>  tools/perf/util/string2.h   |2 +
>  4 files changed, 97 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-probe.txt 
> b/tools/perf/Documentation/perf-probe.txt
> index f96382692f42..b6866a05edd2 100644
> --- a/tools/perf/Documentation/perf-probe.txt
> +++ b/tools/perf/Documentation/perf-probe.txt
> @@ -182,6 +182,14 @@ Note that before using the SDT event, the target binary 
> (on which SDT events are
>  For details of the SDT, see below.
>  https://sourceware.org/gdb/onlinedocs/gdb/Static-Probe-Points.html
> 
> +ESCAPED CHARACTER
> +-
> +
> +In the probe syntax, '=', '@', '+', ':' and ';' are treated as a special 
> character. You can use a backslash ('\') to escape the special characters.
> +This is useful if you need to probe on a specific versioned symbols, like 
> @GLIBC_... suffixes, or also you need to specify a source file which includes 
> the special characters.
> +Note that usually single backslash is consumed by shell, so you might need 
> to pass double backslash (\\) or wrapping with single quotes (\'AAA\@BBB').
> +See EXAMPLES how it is used.
> +
>  PROBE ARGUMENT
>  --
>  Each probe argument follows below syntax.
> @@ -277,6 +285,14 @@ Add a USDT probe to a target process running in a 
> different mount namespace
> 
>   ./perf probe --target-ns  -x 
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/jre/lib/amd64/server/libjvm.so
>  %sdt_hotspot:thread__sleep__end
> 
> +Add a probe on specific versioned symbol by backslash escape
> +
> + ./perf probe -x /lib64/libc-2.25.so 'malloc_get_state\@GLIBC_2.2.5'
> +
> +Add a probe in a source file using special characters by backslash escape
> +
> + ./perf probe -x /opt/test/a.out 'foo\+bar.c:4'
> +
> 
>  SEE ALSO
>  
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 94acc5846e2a..c082a27982e5 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1325,27 +1325,30 @@ static int parse_perf_probe_event_name(char **arg, 
> struct perf_probe_event *pev)
>  {
>   char *ptr;
> 
> - ptr = strchr(*arg, ':');
> + ptr = strpbrk_esc(*arg, ":");
>   if (ptr) {
>   *ptr = '\0';
>   if (!pev->sdt && !is_c_func_name(*arg))
>   goto ng_name;
> - pev->group = strdup(*arg);
> + pev->group = strdup_esc(*arg);
>   if (!pev->group)
>   return -ENOMEM;
>   *arg = ptr + 1;
>   } else
>   pev->group = NULL;
> - if (!pev->sdt && !is_c_func_name(*arg)) {
> +
> + pev->event = strdup_esc(*arg);
> + if (pev->event == NULL)
> + return -ENOMEM;
> +
> + if (!pev->sdt && !is_c_func_name(pev->event)) {
> + zfree(&pev->event);
>  ng_name:
> + zfree(&pev->group);
>   semantic_error("%s is bad for event name -it must "
>  "follow C symbol-naming rule.\n", *arg);
>   return -EINVAL;
>   }
> - pev->event = strdup(*arg);
> - if (pev->event == NULL)
> - return -ENOMEM;
> -
>   return 0;
>  }
> 
> @@ -1373,7 +1376,7 @@ static int parse_perf_probe_point(char *arg, struct 
> perf_probe_event *pev)
>   arg++;
>   }
> 
> - ptr = strpbrk(arg, ";=@+%");
> + ptr = strpbrk_esc(arg, ";=@+%");
>   if (pev->sdt) {
>   if (ptr) {
>   if (*ptr != '@') {
> @@ -1387,7 +1390,

Re: [PATCH v2 4/5] perf-probe: Find versioned symbols from map

2017-12-08 Thread Thomas-Mich Richter

On 12/07/2017 08:21 AM, Masami Hiramatsu wrote:
> Find versioned symbols correctly from map.
> Commit d80406453ad4 ("perf symbols: Allow user probes on
> versioned symbols") allows user to find default versioned
> symbols (with "@@") in map. However, it did not enable
> normal versioned symbol (with "@") for perf-probe.
> E.g.
> 
>   =
>   # ./perf probe -x /lib64/libc-2.25.so malloc_get_state
>   Failed to find symbol malloc_get_state in /usr/lib64/libc-2.25.so
> Error: Failed to add events.
>   =
> 
> This solves above issue by improving perf-probe symbol
> search function, as below.
> 
>   =
>   # ./perf probe -x /lib64/libc-2.25.so malloc_get_state
>   Added new event:
> probe_libc:malloc_get_state (on malloc_get_state in 
> /usr/lib64/libc-2.25.so)
> 
>   You can now use it in all perf tools, such as:
> 
> perf record -e probe_libc:malloc_get_state -aR sleep 1
> 
>   # ./perf probe -l
> probe_libc:malloc_get_state (on malloc_get_state@GLIBC_2.2.5 in 
> /usr/lib64/libc-2.25.so)
>   =
> 
> Signed-off-by: Masami Hiramatsu 
> ---
>  tools/perf/arch/powerpc/util/sym-handling.c |8 
>  tools/perf/util/probe-event.c   |   16 +++-
>  tools/perf/util/symbol.c|5 +
>  tools/perf/util/symbol.h|1 +
>  4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/sym-handling.c 
> b/tools/perf/arch/powerpc/util/sym-handling.c
> index 9c4e23d8c8ce..a3613c8d97b6 100644
> --- a/tools/perf/arch/powerpc/util/sym-handling.c
> +++ b/tools/perf/arch/powerpc/util/sym-handling.c
> @@ -64,6 +64,14 @@ int arch__compare_symbol_names_n(const char *namea, const 
> char *nameb,
> 
>   return strncmp(namea, nameb, n);
>  }
> +
> +const char *arch__normalize_symbol_name(const char *name)
> +{
> + /* Skip over initial dot */
> + if (*name == '.')
> + name++;
> + return name;
> +}
>  #endif
> 
>  #if defined(_CALL_ELF) && _CALL_ELF == 2
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 959c4d2ef455..94acc5846e2a 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -2801,16 +2801,30 @@ static int find_probe_functions(struct map *map, char 
> *name,
>   int found = 0;
>   struct symbol *sym;
>   struct rb_node *tmp;
> + const char *norm, *ver;
> + char *buf = NULL;
> 
>   if (map__load(map) < 0)
>   return 0;
> 
>   map__for_each_symbol(map, sym, tmp) {
> - if (strglobmatch(sym->name, name)) {
> + norm = arch__normalize_symbol_name(sym->name);

The weak default function arch__normalize_symbol_name() simply returns
its parameter, so norm can be a NULL ptr when parameter sym->name is a 
NULL pointer.
However when sym->name is a NULL pointer (or can be one) then the
Powerpc version of arch__normalize_symbol_name() dereferences a NULL
ptr (by checking the symbol's first character is a '.').

> + if (!norm)
> + continue;
> +
> + /* We don't care about default symbol or not */
> + ver = strchr(norm, '@');
> + if (ver) {
> + buf = strndup(norm, ver - norm);
> + norm = buf;

if strndup() returns a NULL pointer (due to lack of memory)
then variable norm is a NULL ptr and strglobmatch() --> __match_glob()
derefenences that NULL pointer (1. parameter).

> + }
> + if (strglobmatch(norm, name)) {
>   found++;
>   if (syms && found < probe_conf.max_probes)
>   syms[found - 1] = sym;
>   }
> + if (buf)
> + zfree(&buf);
>   }
> 
>   return found;
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 1b67a8639dfe..cc065d4bfafc 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -94,6 +94,11 @@ static int prefix_underscores_count(const char *str)
>   return tail - str;
>  }
> 
> +const char * __weak arch__normalize_symbol_name(const char *name)
> +{
> + return name;
> +}
> +
>  int __weak arch__compare_symbol_names(const char *namea, const char *nameb)
>  {
>   return strcmp(namea, nameb);
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index a4f0075b4e5c..0563f33c1eb3 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -349,6 +349,7 @@ bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
>  void arch__sym_update(struct symbol *s, GElf_Sym *sym);
>  #endif
> 
> +const char *arch__normalize_symbol_name(const char *name);
>  #define SYMBOL_A 0
>  #define SYMBOL_B 1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Thomas Richte

Re: [PATCH] perf annotate: Fix objdump comment parsing for Intel mov dissassembly

2017-11-29 Thread Thomas-Mich Richter

On 11/29/2017 02:24 PM, Ravi Bangoria wrote:
> 
> 
> On 11/28/2017 01:26 PM, Thomas Richter wrote:
>> The command 'perf annotate' parses the output of objdump and also
>> investigates the comments produced by objdump. For example the
>> output of objdump produces (on x86):
>>
>> 23eee:  4c 8b 3d 13 01 21 00 mov 0x210113(%rip),%r15
>> # 234008 
>>
>> and the function mov__parse() is called to investigate the complete
>> line. Mov__parse() breaks this line into several parts and finally
>> calls function comment__symbol() to parse the data after the comment
>> character '#'. Comment__symbol() expects a hexadecimal address followed
>> by a symbol in '<' and '>' brackets.
>>
>> However the 2nd parameter given to function comment__symbol()
>> always points to the comment character '#'. The address parsing
>> always returns 0 because the character '#' is not a digit and
>> strtoull() fails without being noticed.
>>
>> Fix this by advancing the second parameter to function comment__symbol()
>> by one byte before invocation and add an error check after strtoull()
>> has been called.
> 
> Yeah, looks like it fails to get correct value in 'addrp'.
> 
> Can you please show the difference in perf annotate output before
> and after patch.
> 
> Thanks,
> Ravi
> 


There is no difference in output of --stdio. The adress value is not
read and remains 0x0 in ops->source.addr or ops->target.addr.
That is not visible because in function mov__scnprintf() that wrong
address is not printed:

static int mov__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
{
return scnprintf(bf, size, "%-6.6s %s,%s", ins->name,
 ops->source.name ?: ops->source.raw,
 ops->target.name ?: ops->target.raw);
}


-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: Question to perf annotate handling mov ...(%rip) instructions

2017-11-29 Thread Thomas-Mich Richter

On 11/28/2017 03:50 PM, Arnaldo Carvalho de Melo wrote:

.

>>  
>> 60b4:   48 8b 05 35 cd 22 00mov 0x22cd35(%rip),%rax   # 232df0 
>> <__gmon_start__> 
>>
>> Commit 6de783b6f50f7f1db18a3fda0aa34b2e84b5771d ("perf annotate: Resolve 
>> symbols 
>> using objdump comment") added this support.
>>
>> Special code for Intel platform handles the mov  at address 60b4: 
>> This is dynamic linkage against the PLT. Function mov__parse() is always 
>> called 
>> to parse the objdump comment following the '#' character. 
>> However the function mov__scnprintf() to replace the text '0x22cd35(%rip)' 
>> by the 
>> target function name __gmon_start__ is only called in tui mode and not in 
>> stdio mode.
>>
>> Now to the confusion:
>> Function mov__parse() calls comment__symbol() which contains:
>>
>>static int comment__symbol(char *raw, char *comment, u64 *addrp, char 
>> **namep)
>>{
>> char *endptr, *name, *t;
>>
>> if (strstr(raw, "(%rip)") == NULL)
>> return 0;
>>
>> This is architecture specific and does not work for non-Intel platforms.
>>
>> I would like to fix perf annotate for s390x and above move instruction on 
>> s390x
>> is
>>  
>> 655a:   c0 10 00 01 9c eb   larl%r1,39f30 <__gmon_start__>
>>
>> There is a need to handle PLT resolution in an architecture independent way.
>>
>> Ideas and suggestions?
> 
> Some historical background there, busy now, but you seem to be on the
> right track and IIRC you already sent a patch for this, right? I'll try
> to look at it.

The two patches I sent are unrelated to this issue.

> Jiri may as well, since he worked a lot recently in this codebase, to
> refactor it some more to make it useful for annotating python code, perl
> next, other scripted languages should follow too.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf test: fix test case probe libc's inet_pton on s390x

2017-11-14 Thread Thomas-Mich Richter

On 11/14/2017 02:47 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 14, 2017 at 10:34:09AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Different name, same contents, need to look at the inode... ;-\
>>
>> Nah, lets ask the kernel how is it that it sees libc, please test the
>> following, works for me:

works for me too.

> 
> BTW, this is what I sticked on that cset:
> 
> Committer changes:
> 
> We can't really use ldd for libc, as in some systems, such as x86_64, it
> has hardlinks and then ldd sees one and the kernel the other, so grep
> for libc in /proc/self/maps to get the one we'll receive from
> PERF_RECORD_MMAP.
> 
> - Arnaldo
> 

Yes I am fine with this changes.
Go ahead.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCHv3] perf bpf: Fix endianness problem when loading parameters in prologue

2017-08-15 Thread Thomas-Mich Richter

On 08/15/2017 05:25 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Aug 15, 2017 at 11:21:59AM +0200, Thomas Richter escreveu:
> 
> Ok, I'm applying this, the only missing bit was the following line,
> right at the start of the patch message body;
> 
> From: Wang Nan 
> 
> To state that the patch was wrote by Wang, the fact that you
> participated in it with some adjustment is implied by having you sign
> off the patch, ok?
> 
> - Arnaldo


Thanks Arnaldo,

will /hopefully) remember next time.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCHv2] perf bpf: Fix endianness problem when loading parameters in prologue

2017-08-14 Thread Thomas-Mich Richter

On 08/14/2017 06:39 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 14, 2017 at 01:46:44PM +0200, Thomas Richter escreveu:
>> Perf BPF prologue generator unconditionally fetches 8 bytes for function
>> parameters, which causes problem on big endian machine. Thomas gives a
>> detail analysis for this problem:
>>
>>  
>> http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d...@linux.vnet.ibm.com
>>
>> This patch parses the type of each argument and converts data from
>> memory to expected type.
>>
>> Now the test runs successfully on 4.13.0-rc5:
>> [root@s8360046 perf]# ./perf test  bpf
>> 38: BPF filter :
>> 38.1: Basic BPF filtering  : Ok
>> 38.2: BPF pinning  : Ok
>> 38.3: BPF prologue generation  : Ok
>> 38.4: BPF relocation checker   : Ok
>> [root@s8360046 perf]#
>>
>> Signed-off-by: Thomas Richter 
>> Signed-off-by: Wang Nan 
>> Acked-by: Wang Nan 
>> Tested-by: Wang Nan 
> 
> 
> That is strange, who is the author of the patch? Also I think Tested-by
> is enough, being an even stronger form of Acked-by?
> 
> But then you also have Signed-off-by: Wang in there...
> 
> From Documentation/process/submitting-patches.rst:
> 
> -
> 
> 12) When to use Acked-by: and Cc:
> -
> 
> The Signed-off-by: tag indicates that the signer was involved in the
> development of the patch, or that he/she was in the patch's delivery path.
> 
> If a person was not directly involved in the preparation or handling of a
> patch but wishes to signify and record their approval of it then they can
> ask to have an Acked-by: line added to the patch's changelog.
> 
> Acked-by: is often used by the maintainer of the affected code when that
> maintainer neither contributed to nor forwarded the patch.
> 
> --
> 
> If Wang wrote the original patch and you made it better working together
> with him, probably having both of you in Signed-off-by lines should be
> enough?
> 
> - Arnaldo
> 

Ok, my fault then.
Wang wrote to patch in the first place, I just fixed one line.
Should I resend the patch and delete the Acked-by/Tested-by lines
in the commit message?

Thanks

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: perf bpf: Reworked fix endianness problem when loading parameters in prologue

2017-08-14 Thread Thomas-Mich Richter

On 08/14/2017 12:30 PM, Wangnan (F) wrote:
> Hi Thomas,
> 
> Your patch looks good to me. I've tested in my environment and it works.
> Please resend it to lkml and let Arnaldo to collect it.
> 
> Thank you.
> 

Will do, I take this as an Acked-by and Tested-by, ok?

Thanks
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf bpf: Fix endianness problem when loading parameters in prologue

2017-08-14 Thread Thomas-Mich Richter

On 08/11/2017 09:23 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 11, 2017 at 06:47:56PM +0800, Wangnan (F) escreveu:
>> Hi Thomas,
>>
>> Please try this patch on your machine and give me the result.
> 
> Right, I'm waiting for test results for the last two patches from Wang:
> 
>   (3.0K) [PATCH] perf bpf: Fix endianness problem when loading parameters 
> in prologue

This patch does not solve the issue completely. It solves load of parameter 1 
which is
loaded from 132(%r2):x32 from memory via function BPF_FUNC_probe_read.

How another issue is introduced with register r3:
The probe_events: p:perf_bpf_probe/func _text+8769704 f_mode=+132(%r2):x32 
   offset=%r3:s64 orig=%r4:s32
Now r4 does not have any offset at all and is not read from memory via
function BPF_FUNC_probe_read. The value is instead stored in a 8 byte register.

When I look into the generated eBPF bytes code I see

   (1) r3 = *(r6 + 56)  <-- R3 loaded from ctx pointer + 56
which is value of register r3 (origin of lseek)
   
   (2) *(r10 - 24) = r3  <-- R3 is stored on stack position for function call)
   ...
   (3) r3 = *(r10 - 8)  <-- load f_mode into r3
   (4) r4 = *(r10 - 16) <-- load offset into r4
   (5) r5 = *(r10 - 24) <-- load origin into r5

I have provided a reworked patch to solve this issue. Please review it because
I do not know if it will work in all cases.
See below.


>   (1.2K) [PATCH] perf test llvm: Fix f_mode endianness problem
> 
> Thanks,
> 
> - Arnaldo
> 
>> Thank you.
>>
>> On 2017/8/13 2:49, Wang Nan wrote:
>>> Perf BPF prologue generator unconditionally fetches 8 bytes for function
>>> parameters, which causes problem on big endian machine. Thomas gives a
>>> detail analysis for this problem:
>>>
>>>   
>>> http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d...@linux.vnet.ibm.com
>>>
>>> This patch parses the type of each argument and converts data from
>>> memory to expected type.
>>>
>>> Signed-off-by: Wang Nan 
>>> Cc: Arnaldo Carvalho de Melo 
>>> Cc: Thomas Richter 
>>> Cc: Alexei Starovoitov 
>>> Cc: Hendrik Brueckner 
>>> Cc: Li Zefan 
>>> ---
>>>   tools/perf/tests/bpf-script-test-prologue.c |  4 ++-
>>>   tools/perf/util/bpf-prologue.c  | 49 
>>> +++--
>>>   2 files changed, 50 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/perf/tests/bpf-script-test-prologue.c 
>>> b/tools/perf/tests/bpf-script-test-prologue.c
>>> index b4ebc75..43f1e16 100644
>>> --- a/tools/perf/tests/bpf-script-test-prologue.c
>>> +++ b/tools/perf/tests/bpf-script-test-prologue.c
>>> @@ -26,9 +26,11 @@ static void (*bpf_trace_printk)(const char *fmt, int 
>>> fmt_size, ...) =
>>> (void *) 6;
>>>   SEC("func=null_lseek file->f_mode offset orig")
>>> -int bpf_func__null_lseek(void *ctx, int err, unsigned long f_mode,
>>> +int bpf_func__null_lseek(void *ctx, int err, unsigned long _f_mode,
>>>  unsigned long offset, unsigned long orig)
>>>   {
>>> +   fmode_t f_mode = (fmode_t)_f_mode;
>>> +
>>> if (err)
>>> return 0;
>>> if (f_mode & FMODE_WRITE)
>>> diff --git a/tools/perf/util/bpf-prologue.c b/tools/perf/util/bpf-prologue.c
>>> index 6cdbee1..ce28993 100644
>>> --- a/tools/perf/util/bpf-prologue.c
>>> +++ b/tools/perf/util/bpf-prologue.c
>>> @@ -57,6 +57,46 @@ check_pos(struct bpf_insn_pos *pos)
>>> return 0;
>>>   }
>>> +/*
>>> + * Convert type string (u8/u16/u32/u64/s8/s16/s32/s64 ..., see
>>> + * Documentation/trace/kprobetrace.txt) to size field of BPF_LDX_MEM
>>> + * instruction (BPF_{B,H,W,DW}).
>>> + */
>>> +static int
>>> +argtype_to_ldx_size(const char *type)
>>> +{
>>> +   int arg_size = type ? atoi(&type[1]) : 64;
>>> +
>>> +   switch (arg_size) {
>>> +   case 8:
>>> +   return BPF_B;
>>> +   case 16:
>>> +   return BPF_H;
>>> +   case 32:
>>> +   return BPF_W;
>>> +   case 64:
>>> +   default:
>>> +   return BPF_DW;
>>> +   }
>>> +}
>>> +
>>> +static const char *
>>> +insn_sz_to_str(int insn_sz)
>>> +{
>>> +   switch (insn_sz) {
>>> +   case BPF_B:
>>> +   return "BPF_B";
>>> +   case BPF_H:
>>> +   return "BPF_H";
>>> +   case BPF_W:
>>> +   return "BPF_W";
>>> +   case BPF_DW:
>>> +   return "BPF_DW";
>>> +   default:
>>> +   return "UNKNOWN";
>>> +   }
>>> +}
>>> +
>>>   /* Give it a shorter name */
>>>   #define ins(i, p) append_insn((i), (p))
>>> @@ -257,9 +297,14 @@ gen_prologue_slowpath(struct bpf_insn_pos *pos,
>>> }
>>> /* Final pass: read to registers */
>>> -   for (i = 0; i < nargs; i++)
>>> -   ins(BPF_LDX_MEM(BPF_DW, BPF_PROLOGUE_START_ARG_REG + i,
>>> +   for (i = 0; i < nargs; i++) {
>>> +   int insn_sz = argtype_to_ldx_size(args[i].type);

int insn_sz = (args[i].ref) ? argtype_to_ldx_size(args[i].type) 
: BPF_DW;

>>> +
>>> +   pr_debug("prologue: load arg %d, insn_sz is %s\n",
>>> +i, insn_sz_to_str(insn_sz));
>>> +

Re: perf test BPF subtest bpf-prologue test fails on s390x

2017-08-11 Thread Thomas-Mich Richter

On 08/11/2017 11:57 AM, Wangnan (F) wrote:
> 
> 
> On 2017/8/11 17:17, Thomas-Mich Richter wrote:
>> On 08/11/2017 07:19 AM, Wangnan (F) wrote:


[]

> Your analysis is correct.
> 
>>
>> Maybe the solution is to add an endianness convertion into the gen_read_mem()
>> function.
> 
> That means alert the prologue. We can utilize the size field in BPF_LDX_MEM.
> 
> Could you please try this patch:
> 
> diff --git a/tools/perf/util/bpf-prologue.c b/tools/perf/util/bpf-prologue.c
> index 6cdbee1..aadbda4 100644
> --- a/tools/perf/util/bpf-prologue.c
> +++ b/tools/perf/util/bpf-prologue.c
> @@ -258,7 +258,7 @@ gen_prologue_slowpath(struct bpf_insn_pos *pos,
> 
> /* Final pass: read to registers */
> for (i = 0; i < nargs; i++)
> -   ins(BPF_LDX_MEM(BPF_DW, BPF_PROLOGUE_START_ARG_REG + i,
> +   ins(BPF_LDX_MEM(BPF_W, BPF_PROLOGUE_START_ARG_REG + i,
> BPF_REG_FP, -BPF_REG_SIZE * (i + 1)), pos);
> 
> ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_SUCCESS_CODE), pos);
> 
> 
> On your platform and check if the value of f_mode is correct?
> 
> NOTE: 64 bit load becomes incorrect after applying this patch. I want to make 
> sure
> the behavior of 32 bit BPF load instruction works correct on big endian 
> machine.
> 
> Thank you.
> 

This patch solves the issue with the first parameter. Now I see the correct
value of f_mode member of struct-file.
As you said the other loads are wrong. Here is the output of the 
trace:

   perf-13438 [000] d..2  2102.392312: : f_mode 6001f offset:0 orig:0
perf-13438 [000] d..2  2102.392414: : f_mode 2001d offset:0 orig:0
perf-13438 [000] d..2  2102.392440: : f_mode 2001d offset:0 orig:0
perf-13438 [000] d..2  2102.392453: : f_mode 6001f offset:0 orig:0

PS: When I ran this test I undid the changes to 
tests/bpf-test-script-prologue.c.
It was in its original state when this test was executed.


-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Fwd: Re: perf test BPF subtest bpf-prologue test fails on s390x

2017-08-11 Thread Thomas-Mich Richter


On 08/11/2017 07:19 AM, Wangnan (F) wrote:
> 
> 
> On 2017/8/11 2:13, Arnaldo Carvalho de Melo wrote:
>> Thomas, please try to find who wrote that test and CC him, I'm doing it
>> this time, Wang, can you please take a look at this?
>>
>> I also added lkml to the CC list, here we have more users of perf, lkml
>> is the more developer centric perf list, as perf touches way too many
>> places in the kernel.
>>
>> Best regards,
>>
>> - Arnaldo
>>
>> Em Wed, Aug 09, 2017 at 11:24:19AM +0200, Thomas Richter escreveu:
>>> I investigate perf test BPF for s390x and have a question regarding
>>> the 38.3 subtest (bpf-prologue test) which fails on s390x.
>>>
>>> When I turn on trace_printk in tests/bpf-script-test-prologue.c
>>> I see this output in /sys/kernel/debug/tracing/trace:
>>>
>>> [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
>>> perf-30229 [000] d..2 170161.535791: : f_mode 2001d offset:0 orig:0
>>> perf-30229 [000] d..2 170161.535809: : f_mode 6001f offset:0 orig:0
>>> perf-30229 [000] d..2 170161.535815: : f_mode 6001f offset:1 orig:0
>>> perf-30229 [000] d..2 170161.535819: : f_mode 2001d offset:1 orig:0
>>> perf-30229 [000] d..2 170161.535822: : f_mode 2001d offset:2 orig:1
>>> perf-30229 [000] d..2 170161.535825: : f_mode 6001f offset:2 orig:1
>>> perf-30229 [000] d..2 170161.535828: : f_mode 6001f offset:3 orig:1
>>> perf-30229 [000] d..2 170161.535832: : f_mode 2001d offset:3 orig:1
>>> perf-30229 [000] d..2 170161.535835: : f_mode 2001d offset:4 orig:0
>>> perf-30229 [000] d..2 170161.535841: : f_mode 6001f offset:4 orig:0
>>>
>>> [...]
>>>
>>> There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
>>> accesses: f_mode (member of struct file at offset 140) offset and orig.
>>> They are parameters of the lseek() system call triggered in this
>>> test case in function llseek_loop().
>>>
>>> What is really strange is the value of f_mode. It is an 8 byte
>>> value, whereas in the probe event it is defined as a 4 byte value.
>>> The lower 4 bytes are all zero and do not belong to member f_mode.
>>> The correct value should be 2001d for read-only and 6001f for
>>> read-write open mode.
>>>
>>> Here is the output of the 'perf test -vv bpf' trace:
>>> Try to find probe point from debuginfo.
>>> Matched function: null_lseek [2d9310d]
>>> Probe point found: null_lseek+0
>>> Searching 'file' variable in context.
>>> Converting variable file into trace event.
>>> converting f_mode in file
>>> f_mode type is unsigned int.
>>> Opening /sys/kernel/debug/tracing//README write=0
>>> Searching 'offset' variable in context.
>>> Converting variable offset into trace event.
>>> offset type is long long int.
>>> Searching 'orig' variable in context.
>>> Converting variable orig into trace event.
>>> orig type is int.
>>> Found 1 probe_trace_events.
>>> Opening /sys/kernel/debug/tracing//kprobe_events write=1
>>> Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
>>> offset=%r3:s64 orig=%r4:s32
> 
> Thank you for your information. This is an endianess problem.
> 
> Here f_mode is x32, so perf probe and debuginfo in vmlinux is correct.
> However, BPF prologue generator doesn't obey type, but unconditionally
> fetch 8 bytes (on 64-bit machine) and pass it to parameter of
> bpf_func__null_lseek. This is reasonable, because all args should be
> unsigned long. However, to recover its original value, we need a casting.
> 
> Please help me verify if the following fix works on your platform:
> 
> diff --git a/tools/perf/tests/bpf-script-test-prologue.c 
> b/tools/perf/tests/bpf-script-test-prologue.c
> index b4ebc75..43f1e16 100644
> --- a/tools/perf/tests/bpf-script-test-prologue.c
> +++ b/tools/perf/tests/bpf-script-test-prologue.c
> @@ -26,9 +26,11 @@ static void (*bpf_trace_printk)(const char *fmt, int 
> fmt_size, ...) =
> (void *) 6;
> 
>  SEC("func=null_lseek file->f_mode offset orig")
> -int bpf_func__null_lseek(void *ctx, int err, unsigned long f_mode,
> +int bpf_func__null_lseek(void *ctx, int err, unsigned long _f_mode,
>  unsigned long offset, unsigned long orig)
>  {
> +   fmode_t f_mode = (fmode_t)_f_mode;
> +
> if (err)
> return 0;
> if (f_mode & FMODE_WRITE)
> 
> Thank you.
> 

Thanks for this patch, unfortunately it does not work. The result is the same
as before. Here is the trace output from /sys/kernel/debug/tracing/trace:   
 
   perf-18119 [001] d..2 66832.765154: : f_mode 0 offset:0 orig:0
   perf-18119 [001] d..2 66832.765189: : f_mode 0 offset:0 orig:0
   perf-18119 [001] d..2 66832.765203: : f_mode 0 offset:1 orig:0
   perf-18119 [001] d..2 66832.765216: : f_mode 0 offset:1 orig:0
   perf-18119 [001] d..2 66832.765229: : f_mode 0 offset:2 orig:1
   perf-18119 [001] d..2 66832.765242: : f_mode 0 offset:2 orig:1

However w

Re: perf test BPF subtest bpf-prologue test fails on s390x

2017-08-11 Thread Thomas-Mich Richter

On 08/11/2017 07:19 AM, Wangnan (F) wrote:
> 
> 
> On 2017/8/11 2:13, Arnaldo Carvalho de Melo wrote:
>> Thomas, please try to find who wrote that test and CC him, I'm doing it
>> this time, Wang, can you please take a look at this?
>>
>> I also added lkml to the CC list, here we have more users of perf, lkml
>> is the more developer centric perf list, as perf touches way too many
>> places in the kernel.
>>
>> Best regards,
>>
>> - Arnaldo
>>
>> Em Wed, Aug 09, 2017 at 11:24:19AM +0200, Thomas Richter escreveu:
>>> I investigate perf test BPF for s390x and have a question regarding
>>> the 38.3 subtest (bpf-prologue test) which fails on s390x.
>>>
>>> When I turn on trace_printk in tests/bpf-script-test-prologue.c
>>> I see this output in /sys/kernel/debug/tracing/trace:
>>>
>>> [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
>>> perf-30229 [000] d..2 170161.535791: : f_mode 2001d offset:0 orig:0
>>> perf-30229 [000] d..2 170161.535809: : f_mode 6001f offset:0 orig:0
>>> perf-30229 [000] d..2 170161.535815: : f_mode 6001f offset:1 orig:0
>>> perf-30229 [000] d..2 170161.535819: : f_mode 2001d offset:1 orig:0
>>> perf-30229 [000] d..2 170161.535822: : f_mode 2001d offset:2 orig:1
>>> perf-30229 [000] d..2 170161.535825: : f_mode 6001f offset:2 orig:1
>>> perf-30229 [000] d..2 170161.535828: : f_mode 6001f offset:3 orig:1
>>> perf-30229 [000] d..2 170161.535832: : f_mode 2001d offset:3 orig:1
>>> perf-30229 [000] d..2 170161.535835: : f_mode 2001d offset:4 orig:0
>>> perf-30229 [000] d..2 170161.535841: : f_mode 6001f offset:4 orig:0
>>>
>>> [...]
>>>
>>> There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
>>> accesses: f_mode (member of struct file at offset 140) offset and orig.
>>> They are parameters of the lseek() system call triggered in this
>>> test case in function llseek_loop().
>>>
>>> What is really strange is the value of f_mode. It is an 8 byte
>>> value, whereas in the probe event it is defined as a 4 byte value.
>>> The lower 4 bytes are all zero and do not belong to member f_mode.
>>> The correct value should be 2001d for read-only and 6001f for
>>> read-write open mode.
>>>
>>> Here is the output of the 'perf test -vv bpf' trace:
>>> Try to find probe point from debuginfo.
>>> Matched function: null_lseek [2d9310d]
>>> Probe point found: null_lseek+0
>>> Searching 'file' variable in context.
>>> Converting variable file into trace event.
>>> converting f_mode in file
>>> f_mode type is unsigned int.
>>> Opening /sys/kernel/debug/tracing//README write=0
>>> Searching 'offset' variable in context.
>>> Converting variable offset into trace event.
>>> offset type is long long int.
>>> Searching 'orig' variable in context.
>>> Converting variable orig into trace event.
>>> orig type is int.
>>> Found 1 probe_trace_events.
>>> Opening /sys/kernel/debug/tracing//kprobe_events write=1
>>> Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
>>> offset=%r3:s64 orig=%r4:s32
> 
> Thank you for your information. This is an endianess problem.
> 
> Here f_mode is x32, so perf probe and debuginfo in vmlinux is correct.
> However, BPF prologue generator doesn't obey type, but unconditionally
> fetch 8 bytes (on 64-bit machine) and pass it to parameter of
> bpf_func__null_lseek. This is reasonable, because all args should be
> unsigned long. However, to recover its original value, we need a casting.
> 
> Please help me verify if the following fix works on your platform:
> 
> diff --git a/tools/perf/tests/bpf-script-test-prologue.c 
> b/tools/perf/tests/bpf-script-test-prologue.c
> index b4ebc75..43f1e16 100644
> --- a/tools/perf/tests/bpf-script-test-prologue.c
> +++ b/tools/perf/tests/bpf-script-test-prologue.c
> @@ -26,9 +26,11 @@ static void (*bpf_trace_printk)(const char *fmt, int 
> fmt_size, ...) =
> (void *) 6;
> 
>  SEC("func=null_lseek file->f_mode offset orig")
> -int bpf_func__null_lseek(void *ctx, int err, unsigned long f_mode,
> +int bpf_func__null_lseek(void *ctx, int err, unsigned long _f_mode,
>  unsigned long offset, unsigned long orig)
>  {
> +   fmode_t f_mode = (fmode_t)_f_mode;
> +
> if (err)
> return 0;
> if (f_mode & FMODE_WRITE)
> 
> Thank you.
> 

Thanks for this patch, unfortunately it does not work. The result is the same
as before. Here is the trace output from /sys/kernel/debug/tracing/trace:   
 
   perf-18119 [001] d..2 66832.765154: : f_mode 0 offset:0 orig:0
   perf-18119 [001] d..2 66832.765189: : f_mode 0 offset:0 orig:0
   perf-18119 [001] d..2 66832.765203: : f_mode 0 offset:1 orig:0
   perf-18119 [001] d..2 66832.765216: : f_mode 0 offset:1 orig:0
   perf-18119 [001] d..2 66832.765229: : f_mode 0 offset:2 orig:1
   perf-18119 [001] d..2 66832.765242: : f_mode 0 offset:2 orig:1

However wh

Re: Fwd: struct pt_regs missing in /usr/include/ tree for eBPF program compile

2017-08-08 Thread Thomas-Mich Richter

On 08/04/2017 05:28 PM, Daniel Borkmann wrote:
> From: Thomas-Mich Richter 
> Date: Wed, Aug 2, 2017 at 1:22 AM
> [...]
>> I work on the perf tool and its bpf support for IBM s390 and came across a
>> strange issue compiling tools/testing/selftests/bpf/test_verifier.c on s390x.
>>
>> This is the compile error:
>> gcc -Wall -O2 -I../../../include/uapi -I../../../lib
>> -I../../../../include/generated
>>-DHAVE_GENHDR -I../../../includetest_verifier.c
>>/root/linux-devel/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -o
>>/root/linux-devel/tools/testing/selftests/bpf/test_verifier
>> In file included from test_verifier.c:63:0:
>> ../../../include/uapi/linux/bpf_perf_event.h:14:17: error: field ‘regs’ has
>>incomplete type struct pt_regs regs;
> 
> I actually came across the same issue today on s390
> while testing for something else.
> 
>> This shows up in test case "unpriv: spill/fill of different pointers ldx"
>> at line 1811.
>> This issue is located in file /usr/include/linux/bpf_perf_event.h which is a
>> copy of the linux kernels include/uapi/linux/bpf_perf_event.h.
>>
>> It contains:
>> struct bpf_perf_event_data {
>>  struct pt_regs regs;
>>  __u64 sample_period;
>> };
> 
> Yeah, correct.
> 
>> On s390 struct pt_regs is not exported to user space and does not appear
>> anywhere in /usr/include.
>> How about other architectures beside Intel?
>> As far as I know
>> 1. the struct pt_regs contains only kernel registers, no user space 
>> registers?
>> 2. Is part of the kernel API and should not be exported at all?
> 
> Looking into the tree, it appears that the following archs
> export a definition of struct pt_regs as uapi typically in
> arch/*/include/uapi/asm/ptrace.h: x86, sparc, power, mips,
> microblaze, alpha, unicore32, parisc, score, sh, mn10300,
> tile, m68k, m32r, ia64, hexagon, h8300, frv, cris, c6x.
> And for these I couldn't find it: s390, arc, arm64, nios2.
> 
> Anyone knows if there's any guidance on whether they must
> be exported or not? It appears at least the majority of
> archs is exporting them in one way or another.
> 
> Looking at 2dbb4c05d048 ("bpf/samples: Fix PT_REGS_IP on
> s390x and use it") and d912557b3460 ("samples: bpf: enable
> trace samples for s390x"), this was added by Michael for
> the programs themselves, which were using kernel headers
> for walking structs in BPF tracing programs, so a bit
> unrelated to the uapi issue actually, but given the
> test_verifier has couple of test cases containing pt_regs,
> they should probably do the same thing and be using kernel
> headers given tracing programs inspect kernel-internal
> data structures typically (see BPF tracing samples).

I have looked at 2dbb4c05d048 ("bpf/samples: Fix PT_REGS_IP on
> s390x and use it"). This usage scenario is a bit different.
True it requires access to individual registers via context
pointer stored in register R1. The context pointer is
of type struct bpf_perf_event_data and the first member
is of type struct pt_regs.

The big difference is the compilation of samples/bpf/sampleip_kern.c.
It is built inside the kernel root directory .../linux and includes
./arch/s390/include/asm/ptrace.h (which defines struct pt_regs).

This is different from compiling 
tools/testing/selftests/bpf/test_verifier.c. This compilation
does not happen inside the kernel root directory /linux
and thus needs some type of struct pt_regs definition in 
user space.

Interestingly the test_verifier.c only needs the
size of the struct pt_regs. Both failing lines of code use
offset_of:

  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
-(__s32)offsetof(struct bpf_perf_event_data,
 sample_period) - 8)),
which in fact subtracts the sizeof (struct pt_regs) from R2.

> 
> Now, I would like to avoid going down that road to pull
> in kernel internal headers into test_verifier.c, could
> we instead add a bpf_ptregs.h helper in tools/testing/selftests/bpf/,
> where s390 and arm64 would put a definition to fallback when
> otherwise not available? Admittedly, it's a bit of a hack
> if exporting them is not an option, but 'normal' tracing
> progs would consult kernel headers anyway. Thoughts?
> 

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

struct pt_regs missing in /usr/include/ tree for eBPF program compile

2017-08-02 Thread Thomas-Mich Richter


I work on the perf tool and its bpf support for IBM s390 and came across a
strange issue compiling tools/testing/selftests/bpf/test_verifier.c on s390x.

This is the compile error:
gcc -Wall -O2 -I../../../include/uapi -I../../../lib 
-I../../../../include/generated
  -DHAVE_GENHDR -I../../../includetest_verifier.c
  /root/linux-devel/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -o
  /root/linux-devel/tools/testing/selftests/bpf/test_verifier
In file included from test_verifier.c:63:0:
../../../include/uapi/linux/bpf_perf_event.h:14:17: error: field ‘regs’ has
  incomplete type struct pt_regs regs;

This shows up in test case "unpriv: spill/fill of different pointers ldx"
at line 1811.
This issue is located in file /usr/include/linux/bpf_perf_event.h which is a
copy of the linux kernels include/uapi/linux/bpf_perf_event.h.

It contains:
struct bpf_perf_event_data {
struct pt_regs regs;
__u64 sample_period;
};

On s390 struct pt_regs is not exported to user space and does not appear
anywhere in /usr/include.
How about other architectures beside Intel?
As far as I know 
1. the struct pt_regs contains only kernel registers, no user space registers?
2. Is part of the kernel API and should not be exported at all?

When I investigated the kernel side of the bpf() system call, the test case ends
up in functions pe_prog_is_valid_access() and pe_prog_convert_ctx_access()
via syscall(bpf)
+--> bpf_prog_load()
 +--> find_prog_type() to load eBPF type specific verifiers
 |   pe_prog_is_valid_access() and pe_prog_convert_ctx_access()
 +--> bpf_check() to verify (and modify) the eBPF
  +--> check_vfg()
   +--> do_check()
+--> check_xadd()
 +--> check_mem_access()
  +--> check_ctx_access()
   +--> env->prog->aux->ops->is_valid_access
which is set to
pe_prog_is_valid_access()

Now this last function expects and verifies struct pt_regs via struct member
offsets which needs a correct struct pt_regs previously setup by user space
eBPF program.

This also requires a correct struct pt_regs in 
/usr/include/linux/bpf_perf_event.h
(which includes /usr/include/{linux,asm,sym}/ptrace.h

How to achieve this on a platform which does not export struct pt_regs to the
user?

Thanks a lot for your help.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: perf report does not resolve symbols on s390x

2017-07-12 Thread Thomas-Mich Richter

On 07/11/2017 09:48 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jul 11, 2017 at 04:38:28PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Jul 11, 2017 at 04:03:04PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Fri, Jul 07, 2017 at 02:17:25PM +0200, Thomas-Mich Richter escreveu:
.

> 
>> Argh, this is also broken:
> 
>> static inline bool machine__kernel_ip(struct machine *machine, u64 ip)
>> {
>> u64 kernel_start = machine__kernel_start(machine);
>>
>> return ip >= kernel_start;
>> }
>>
>> We can't judge if a address is in the kernel like that :-\
> 
> So, this is used by:
> 
> [acme@jouet linux]$ find tools/ -name "*.[ch]" | xargs grep -w 
> machine__kernel_ip
> tools/perf/builtin-script.c:  kernel = machine__kernel_ip(machine, start);
> tools/perf/builtin-script.c:  if (kernel != machine__kernel_ip(machine, end)) 
> {
> 
> That is just for "brstackinsn", would that make sense for Sparc, S/390?

No we don't have that on s/390
> 
> tools/perf/util/intel-bts.c:  if (machine__kernel_ip(machine, ip))
> tools/perf/util/intel-bts.c:  if 
> (!machine__kernel_ip(btsq->bts->machine, branch->from) &&
> tools/perf/util/intel-bts.c:  
> machine__kernel_ip(btsq->bts->machine, branch->to) &&
> 
> Intel specific stuff, so should be ok.
> 
> tools/perf/util/event.c:  machine__kernel_ip(machine, 
> al->addr)) {
> 
> For this last one, that affects all arches, I think we can just remove
> this check and look at the kernel when not finding it anywhere else?
> 
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index dc5c3bb69d73..8e435baaae6a 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -1432,8 +1432,7 @@ void thread__find_addr_map(struct thread *thread, u8 
> cpumode,
>* in the whole kernel symbol list.
>*/
>   if (cpumode == PERF_RECORD_MISC_USER && machine &&
> - mg != &machine->kmaps &&
> - machine__kernel_ip(machine, al->addr)) {
> + mg != &machine->kmaps) {
>   mg = &machine->kmaps;
>   load_map = true;
>   goto try_again;
> 
>>>> This raises 2 questions:
>>>> 1. s390 has a 64 bit address space for user and kernel. The processor 
>>>> status word (PSW)
>>>>determines which address space to use. That requires the PSW in the 
>>>> sample. Not sure
>>>>this is the case?
>>>> 2. How does this work on sparc and other architectures with the same 
>>>> addressing scheme?
>>>>
>>>> Thanks.
>>>> --
>>>> Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH v2 tip/perf/core 1/6] perf symbols: find symbols in different mount namespace

2017-07-09 Thread Thomas-Mich Richter

On 07/07/2017 09:36 PM, Krister Johansen wrote:
> On Thu, Jul 06, 2017 at 04:41:30PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Wed, Jul 05, 2017 at 06:48:08PM -0700, Krister Johansen escreveu:
>>> Teach perf how to resolve symbols from binaries that are in a different
>>> mount namespace from the tool.  This allows perf to generate meaningful
>>> stack traces even if the binary resides in a different mount namespace
>>> from the tool.
>>
>> I was trying to find a way to test after applying each of the patches in
>> this series, when it ocurred to me that if a process that appears on a
>> perf.data file has exit, how can we access /proc/%ITS_PID/something?
> 
> You're correct.  We can't access /proc//whatever once the process
> has exited.  That was the impeteus for patches 4 and 6, which allow us
> to capture the binary (and debuginfo, if it exists) into the buildid
> cache so that if we do have a trace that exists after a process or
> container exists, we'll still be able to resolve some of the symbols.
> 
> -K
> 

Any ideas on how to extend this to be able to resolve symbols after
the process/container exited?
I believe it boils down on how to interpret the mnt inode number in the
PERF_RECORD_NAMESPACE record...
Can this be done post-mortem? Maybe the PERF_RECORD_NAMESPACE record
has to contain more data than just the inode number?

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf: fix perf test case 14 result reporting

2017-06-05 Thread Thomas-Mich Richter

On 06/02/2017 04:11 PM, Arnaldo Carvalho de Melo wrote:

[]

>>
>> If you have specific patches in Jiri's branch that you think are good to
>> go, just point me to them and I'll cherry-pick them.
>>
>> I'm looking now at the one you pointed out above (070b9644981e).
> 
> Just looked, but the cset comment should state what is the problem and
> how it is solved, right now it has just a one line summary :-\
> 
> - Arnaldo
> 

Looks like a misunderstanding. When I clone Jiri's tree and checkout
branch perf/attr_test:

url = git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git

oc2666213455 12 $ git branch
  master
* perf/attr_test
  perf/stat
oc2666213455 13 $ git show 070b9644981e
commit 070b9644981e2dd160a6aae2723d0ec2d8b4c0b5
Author: Jiri Olsa 
Date:   Fri Mar 3 15:59:45 2017 +0100

perf tests attr: Do not store failed events

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index 0dd7749..0f2b619 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -138,7 +138,7 @@ void test_attr__open(struct perf_event_attr *attr, pid_t 
pid, int cpu,
 {
int errno_saved = errno;
 
-   if (store_event(attr, pid, cpu, fd, group_fd, flags))
+   if ((fd != -1) && store_event(attr, pid, cpu, fd, group_fd, flags))
die("test attr FAILED");
 
errno = errno_saved;
oc2666213455 14 $ 

I get this commit which is another fix for an issue I discovered last week
while working on test case 14.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

Re: [PATCH] perf: fix perf test case 14 result reporting

2017-06-02 Thread Thomas-Mich Richter

On 06/01/2017 11:04 PM, Jiri Olsa wrote:
> On Thu, Jun 01, 2017 at 10:20:38AM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Thu, Jun 01, 2017 at 02:34:41PM +0200, Thomas Richter escreveu:
>>> Command perf test -v 14 (Setup struct perf_event_attr test)
>>> always reports success even if the test case fails.
>>> It works correctly if you also specify -F (for don't fork).
>>
>> Thanks for working on this, adding Jiri Olsa, that wrote this test
>> harness, so that he can check and provide his Acked-by or Reviewed-by,
>> Jiri?
>>
>> - Arnaldo
>>  
>>>root@s35lp76 perf]# ./perf test -v 14
>>>14: Setup struct perf_event_attr   :
>>>--- start ---
>>>running './tests/attr/test-record-no-delay'
>>>[ perf record: Woken up 1 times to write data ]
>>>[ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data
>>>  (1 samples) ]
>>>expected task=0, got 1
>>>expected precise_ip=0, got 3
>>>expected wakeup_events=1, got 0
>>>FAILED './tests/attr/test-record-no-delay' - match failure
>>>test child finished with 0
>>> end 
>>>Setup struct perf_event_attr: Ok
>>>
>>> The reason for the wrong error reporting is the return value of the
>>> system() library call. It is called in run_dir() file tests/attr.c
>>> and returns the exit status, in above case 0xff00.
>>> This value is given as parameter to the exit() function which
>>> can only handle values 0-0xff.
>>> The child process terminates with exit value of 0 and the parent
>>> does not detect any error.
>>>
>>> This patch corrects the error reporting and prints the
>>> correct test result.
>>>
>>> Signed-off-by: Thomas Richter 
>>> Reviewed-by: Hendrik Brueckner 
>>> ---
>>>  tools/perf/tests/attr.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
>>> index 88dc51f..131b510 100644
>>> --- a/tools/perf/tests/attr.c
>>> +++ b/tools/perf/tests/attr.c
>>> @@ -150,7 +150,7 @@ static int run_dir(const char *d, const char *perf)
>>> snprintf(cmd, 3*PATH_MAX, PYTHON " %s/attr.py -d %s/attr/ -p %s %.*s",
>>>  d, d, perf, vcnt, v);
>>>  
>>> -   return system(cmd);
>>> +   return system(cmd) ? TEST_FAIL : TEST_OK;
>>>  }
>>>  
>>>  int test__attr(int subtest __maybe_unused)
>>> -- 
>>> 2.9.3
> 
> seems ok, however "perf test attr" is broken ATM, since it wasn't updated
> for some time as it showed false 'Ok'
> 
> I started fixing it some time ago, but got distracted, if you are
> interested, you're welcome to pick up from my branch ;-)
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/attr_test
> 
> thanks,
> jirka
> 

I have started work on perf tool very recently to get s390 support
working and up to date.

I downloaded your branch and discovered you have already fixed
another issue I run into this week. For example 
commit 070b9644981e ("perf tests attr: Do not store failed events")

I can certainly help you to get this test case operational again.
Maybe you need to pull some of your patches out of your backlog
and submit them the kernel to get to a common base to work on.

I suggest we should move the discussion to the linux-perf-users
mailing list.

Your thoughts?

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294

85 matches

Mail list logo