from:"Pawel Moll"

Re: [PATCHv2 1/2] bus: arm-ccn: Enable stats for CCN-512 interconnect

2019-10-16 Thread Pawel Moll

On Wed, 2019-10-16 at 10:06 +0100, Marek Bykowski wrote:
> Add compatible string for the ARM CCN-502 interconnect
> 
> Signed-off-by: Marek Bykowski 
> Signed-off-by: Boleslaw Malecki 
> ---
> Changelog v1->v2:
> - Change the subject to reflect where the driver got moved to (drivers/perf) 
> from
>   (drivers/bus).
> - Add credit to my work mate that helped me validate the counts from
>   the interconnect.
> ---
>  drivers/perf/arm-ccn.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
> index 7dd850e02f19..b6e00f35a448 100644
> --- a/drivers/perf/arm-ccn.c
> +++ b/drivers/perf/arm-ccn.c
> @@ -1545,6 +1545,7 @@ static int arm_ccn_remove(struct platform_device *pdev)
>  static const struct of_device_id arm_ccn_match[] = {
> { .compatible = "arm,ccn-502", },
> { .compatible = "arm,ccn-504", },
> +   { .compatible = "arm,ccn-512", },
> {},
>  };
>  MODULE_DEVICE_TABLE(of, arm_ccn_match);

Acked-by: Pawel Moll 

Thanks!

Pawel

Re: [PATCH] bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.

2017-10-12 Thread Pawel Moll

On Wed, 2017-10-11 at 22:33 +0100, Kim Phillips wrote:
> Unregistering the driver before calling cpuhp_remove_multi_state()
> removes
> any remaining hotplug cpu instances so
> __cpuhp_remove_state_cpuslocked()
> doesn't emit this warning:

Cool, thanks!

I'm getting together a series of fixes now, so I'll get this merged
soon.

Pawel

Re: [PATCH v2 1/2] bus: arm-ccn: Fix use of smp_processor_id() in preemptible context

2017-10-04 Thread Pawel Moll

On Wed, 2017-10-04 at 13:32 +0100, Marc Zyngier wrote:
> 
> > Acked-by: Pawel Moll 
> > 
> > I assume you'll get this merged yourself? Or do you want me to
> relay
> > the CCN one (I've got a couple of other small changes to the driver
> in
> > the queue).
>
> I'd rather you take care of it if you already have a queue of
> patches.
> Otherwise, I'll bundle the two patches and ask the armsoc folks to do
> something with them.

Ok. I guess you can include CCI as well. Not today, not tomorrow, but
Friday or early next week.

Cheers!

Paweł
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

Re: [PATCH v2 1/2] bus: arm-ccn: Fix use of smp_processor_id() in preemptible context

2017-10-04 Thread Pawel Moll

On Tue, 2017-10-03 at 18:14 +0100, Marc Zyngier wrote:
> Booting a DEBUG_PREEMPT enabled kernel on a CCN-based system
> results in the following splat:
>
> [...]
> arm-ccn e800.ccn: No access to interrupts, using timer.
> BUG: using smp_processor_id() in preemptible [] code:
> swapper/0/1
> caller is debug_smp_processor_id+0x1c/0x28
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.13.0 #6111
> Hardware name: AMD Seattle/Seattle, BIOS 17:08:23 Jun 26 2017
> Call trace:
> [] dump_backtrace+0x0/0x278
> [] show_stack+0x24/0x30
> [] dump_stack+0x8c/0xb0
> [] check_preemption_disabled+0xfc/0x100
> [] debug_smp_processor_id+0x1c/0x28
> [] arm_ccn_probe+0x358/0x4f0
> [...]
>
> as we use smp_processor_id() in the wrong context.
>
> Turn this into a get_cpu()/put_cpu() that extends over the CPU
> hotplug
> registration, making sure that we don't race against a CPU down
> operation.
>
> Signed-off-by: Marc Zyngier 

Acked-by: Pawel Moll 

I assume you'll get this merged yourself? Or do you want me to relay
the CCN one (I've got a couple of other small changes to the driver in
the queue).

Paweł
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

Re: [PATCH] bus: arm-ccn: constify attribute_group structures.

2017-07-04 Thread Pawel Moll

On Mon, 2017-07-03 at 13:01 +0530, Arvind Yadav wrote:
> attribute_groups are not supposed to change at runtime. All functions
> working with attribute_groups provided by  work with const
> attribute_group. So mark the non-const structs as const.
> 
> File size before:
>    text      data bss dec hex filename
>    9074      5592 416   15082    3aea drivers/bus/arm-ccn.o
> 
> File size After adding 'const':
>    text      data bss dec hex filename
>    9327      5336 416   15079    3ae7 drivers/bus/arm-ccn.o
> 
> Signed-off-by: Arvind Yadav 

Looks fine to me. I'll queue it for the next time I push out CCN driver
fixes (no dates promised, though...)

Thanks!

Paweł

Re: [PATCH RESEND 0/2] Add support for ARM CCN-502 interconnect

2017-06-19 Thread Pawel Moll

On Fri, 2017-06-16 at 13:39 -0700, Scott Branden wrote:
> Arnd,
> 
> Should this patchset go through the ARM maintainers?

Unless something changed in the last months, these patches should go
through arm-soc. Arnd is probably just waiting for me to send a pull
request. And indeed shame on me that I still haven't got around doing
this. Promise to do so tomorrow.

Regards and apologies

Pawel

Re: [PATCH] bus: arm-ccn: Use devm_kcalloc() in arm_ccn_probe()

2017-04-19 Thread Pawel Moll

On Wed, 2017-04-19 at 09:29 +0200, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 19 Apr 2017 09:12:43 +0200
> 
> Multiplications for the size determination of memory allocations
> indicated that array data structures should be processed.
> Thus use the corresponding function "devm_kcalloc".
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

Fine with me. I'll queue it with a couple other CCN driver changes.
Will probably get into 4.13.

Regards!

Pawel

Re: [PATCH 1/2] dt-bindings: arm-ccn: Add bindings info for CCN-502 compatible string

2017-02-20 Thread Pawel Moll

On Tue, 2017-02-14 at 17:48 +, Mark Rutland wrote:
> On Fri, Feb 10, 2017 at 12:42:47PM -0800, Velibor Markovski wrote:
> > 
> > Add CCN-502 to the list of supported devices by ARM CCN PMU driver.
> > 
> > Signed-off-by: Velibor Markovski 
> Acked-by: Mark Rutland 
> 
> I assume Pawel will take this along with the driver patch.

Will do, thanks!

Pawel

Re: [PATCH] bus: arm-ccn: Fix module autoload

2017-02-06 Thread Pawel Moll

On Fri, 2017-02-03 at 18:31 -0300, Javier Martinez Canillas wrote:
> On 01/02/2017 10:02 AM, Javier Martinez Canillas wrote:
> > 
> > If the driver is built as a module, autoload won't work because the module
> > alias information is not filled. So user-space can't match the registered
> > device with the corresponding module.
> > 
> > Export the module alias information using the MODULE_DEVICE_TABLE()
> > macro.
> > 
> > Before this patch:
> > 
> > $ modinfo drivers/bus/arm-ccn.ko | grep alias
> > $
> > 
> > After this patch:
> > 
> > $ modinfo drivers/bus/arm-ccn.ko | grep alias
> > alias:  of:N*T*Carm,ccn-504C*
> > alias:  of:N*T*Carm,ccn-504
> > 
> > Signed-off-by: Javier Martinez Canillas 
> > ---
> Any comments about this patch?

Sorry, I clearly missed that in my mailbox at the beginning of the
year.

It all makes sense to me. I'll try to test it out soon and will make
sure to include it next time I send out CCN update.

Thanks!

Paweł

Re: [PATCH] bus: arm-ccn: Fix module autoload

2017-02-06 Thread Pawel Moll

On Fri, 2017-02-03 at 18:31 -0300, Javier Martinez Canillas wrote:
> On 01/02/2017 10:02 AM, Javier Martinez Canillas wrote:
> >
> > If the driver is built as a module, autoload won't work because the module
> > alias information is not filled. So user-space can't match the registered
> > device with the corresponding module.
> >
> > Export the module alias information using the MODULE_DEVICE_TABLE()
> > macro.
> >
> > Before this patch:
> >
> > $ modinfo drivers/bus/arm-ccn.ko | grep alias
> > $
> >
> > After this patch:
> >
> > $ modinfo drivers/bus/arm-ccn.ko | grep alias
> > alias:  of:N*T*Carm,ccn-504C*
> > alias:  of:N*T*Carm,ccn-504
> >
> > Signed-off-by: Javier Martinez Canillas 
> > ---
> Any comments about this patch?

Sorry, I clearly missed that in my mailbox at the beginning of the
year.

It all makes sense to me. I'll try to test it out soon and will make
sure to include it next time I send out CCN update.

Thanks!

Paweł
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

Re: [patch V2 23/67] bus/arm-ccn: Convert to hotplug statemachine

2016-07-18 Thread Pawel Moll

Dnia 2016-07-13, Wed o godzinie 17:16 +, Anna-Maria Gleixner pisze:
> From: Sebastian Andrzej Siewior 
> 
> Install the callbacks via the state machine and let the core invoke
> the callbacks on the already online CPUs.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Cc: Arnd Bergmann 
> Cc: Dan Carpenter 
> Cc: Linus Torvalds 
> Cc: Pawel Moll 
> Cc: Peter Zijlstra 
> Cc: Suzuki K Poulose 
> Cc: Thomas Gleixner 
> Signed-off-by: Anna-Maria Gleixner 

> @@ -1316,7 +1309,6 @@ static void arm_ccn_pmu_cleanup(struct a
>   ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id);
>  }
>  
> -
>  static int arm_ccn_for_each_valid_region(struct arm_ccn *ccn,
>   int (*callback)(struct arm_ccn *ccn, int region,
>   void __iomem *base, u32 type, u32 id))

This empty line doesn't impact hotplug, does it? ;-)

With this chunk dropped

Acked-by: Pawel Moll 

Cheers!

Pawel

Re: [patch 22/66] bus: arm-ccn: convert to hotplug statemachine

2016-07-12 Thread Pawel Moll

On Mon, 2016-07-11 at 12:28 +, Anna-Maria Gleixner wrote:
> @@ -1270,9 +1262,10 @@ static int arm_ccn_pmu_init(struct arm_c
>* ... and change the selection when it goes offline.
> Priority is
>* picked to have a chance to migrate events before perf is
> notified.
>*/
> - ccn->dt.cpu_nb.notifier_call = arm_ccn_pmu_cpu_notifier;
> - ccn->dt.cpu_nb.priority = CPU_PRI_PERF + 1,
> - err = register_cpu_notifier(&ccn->dt.cpu_nb);
> + cpuhp_armccn_dt = &ccn->dt;
> + err = cpuhp_setup_state(CPUHP_AP_PERF_ARM_CCN_ONLINE,
> + "AP_PERF_ARM_CCN_ONLINE", NULL,
> + arm_ccn_pmu_offline_cpu);
>   if (err)
>   goto error_cpu_notifier;

Also, unless I'm missing something obvious, it seems that the callback
will be executed for CPUs going online? I'm definitely interested in my
"current" CPU going down, in order to migrate my handlers somewhere
else.

Help?

Pawel

Re: [patch 22/66] bus: arm-ccn: convert to hotplug statemachine

2016-07-12 Thread Pawel Moll

On Mon, 2016-07-11 at 12:28 +, Anna-Maria Gleixner wrote:
> From: Sebastian Andrzej Siewior 
> 
> Install the callbacks via the state machine and let the core invoke
> the callbacks on the already online CPUs.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Cc: Pawel Moll 
> Signed-off-by: Anna-Maria Gleixner 

Although I won't shed a tear over the notifiers going, there's a
problem with this patch...

> ---
>  drivers/bus/arm-ccn.c  |   47 --
> ---
>  include/linux/cpuhotplug.h |2 +
>  2 files changed, 23 insertions(+), 26 deletions(-)
> 
> --- a/drivers/bus/arm-ccn.c
> +++ b/drivers/bus/arm-ccn.c
> @@ -167,7 +167,6 @@ struct arm_ccn_dt {
>   struct hrtimer hrtimer;
>  
>   cpumask_t cpu;
> - struct notifier_block cpu_nb;
>  
>   struct pmu pmu;
>  };

Notice that here each instance of CCN (unlikely as it is today, the
code was written with the assumption that there's more than one
interconnect ring in the system) get its own notifier block...

> @@ -1171,30 +1170,23 @@ static enum hrtimer_restart arm_ccn_pmu_
>  }
>  
>  
> -static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb,
> - unsigned long action, void *hcpu)
> +static struct arm_ccn_dt *cpuhp_armccn_dt;
> +static int arm_ccn_pmu_offline_cpu(unsigned int cpu)
>  {
> - struct arm_ccn_dt *dt = container_of(nb, struct arm_ccn_dt,
> cpu_nb);
> + struct arm_ccn_dt *dt = cpuhp_armccn_dt;
>   struct arm_ccn *ccn = container_of(dt, struct arm_ccn, dt);
> - unsigned int cpu = (long)hcpu; /* for (long) see


... but here (and in all the rest of this change) it's replaced by a
static pointer to a single instance.

> @@ -1270,9 +1262,10 @@ static int arm_ccn_pmu_init(struct arm_c
>* ... and change the selection when it goes offline.
> Priority is
>* picked to have a chance to migrate events before perf is
> notified.
>*/
> - ccn->dt.cpu_nb.notifier_call = arm_ccn_pmu_cpu_notifier;
> - ccn->dt.cpu_nb.priority = CPU_PRI_PERF + 1,
> - err = register_cpu_notifier(&ccn->dt.cpu_nb);
> + cpuhp_armccn_dt = &ccn->dt;

Even without checking if the pointer has been already set.

> + err = cpuhp_setup_state(CPUHP_AP_PERF_ARM_CCN_ONLINE,
> + "AP_PERF_ARM_CCN_ONLINE", NULL,
> + arm_ccn_pmu_offline_cpu);
>   if (err)
>   goto error_cpu_notifier;
>  
> @@ -1293,7 +1286,8 @@ static int arm_ccn_pmu_init(struct arm_c
>  
>  error_pmu_register:
>  error_set_affinity:
> - unregister_cpu_notifier(&ccn->dt.cpu_nb);
> + cpuhp_remove_state_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE);
> + cpuhp_armccn_dt = NULL;
  error_cpu_notifier:
>   ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id);
>   for (i = 0; i < ccn->num_xps; i++)
> @@ -1308,7 +1302,8 @@ static void arm_ccn_pmu_cleanup(struct a
>  
>   if (ccn->irq)
>   irq_set_affinity_hint(ccn->irq, NULL);
> - unregister_cpu_notifier(&ccn->dt.cpu_nb);
> + cpuhp_remove_state_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE);
> + cpuhp_armccn_dt = NULL;

Same (only the other way round) here...

> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -30,6 +30,7 @@ enum cpuhp_state {
>   CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
>   CPUHP_AP_PERF_X86_CQM_STARTING,
>   CPUHP_AP_PERF_X86_CSTATE_STARTING,
> + CPUHP_AP_PERF_XTENSA_STARTING,
>   CPUHP_AP_NOTIFY_STARTING,
>   CPUHP_AP_ONLINE,
>   CPUHP_TEARDOWN_CPU,

That chunk does not belong here, does it?

Regards

Pawel

Re: [PATCH 4/4] drivers/bus: make arm-ccn.c driver explicitly non-modular

2016-03-29 Thread Pawel Moll

On Tue, 2016-03-29 at 13:30 +0100, Will Deacon wrote:
> On Tue, Mar 29, 2016 at 12:53:06PM +0100, Pawel Moll wrote:
> > Dnia 2016-03-29, Tue o godzinie 12:45 +0100, Will Deacon pisze:
> > > I'd much rather fix the driver to build as a module, if at all
> > > possible.
> > > Suzuki (CC'd) is taking a look at that, so please drop this patch
> > > for
> > > now.
> > 
> > There's no problem with building arm-ccn.c as a module - all it's
> > really doing today is providing a PMU driver. I don't even know why
> > have I made it bool-only in the first place...
> 
> Probably because it doesn't compile due to the irq_set_affinity call.

The original arm-ccn.c did not call it at all. My guess is that I
copied ARM_CCI stanza without thinking. But I'm glad Suzuki will
straighten it out :-)

Paweł

Re: [PATCH 4/4] drivers/bus: make arm-ccn.c driver explicitly non-modular

2016-03-29 Thread Pawel Moll

Dnia 2016-03-29, Tue o godzinie 12:45 +0100, Will Deacon pisze:
> I'd much rather fix the driver to build as a module, if at all
> possible.
> Suzuki (CC'd) is taking a look at that, so please drop this patch for
> now.

There's no problem with building arm-ccn.c as a module - all it's
really doing today is providing a PMU driver. I don't even know why
have I made it bool-only in the first place...

Pawel

Re: [PATCH v5 2/2] arm: perf: Add event descriptions

2015-10-15 Thread Pawel Moll

On Thu, 2015-10-15 at 16:42 +0100, Mark Rutland wrote:
> On Thu, Oct 15, 2015 at 04:20:37PM +0100, Pawel Moll wrote:
> > On Thu, 2015-10-15 at 08:15 -0700, Drew Richardson wrote:
> > > On Thu, Oct 15, 2015 at 02:21:12PM +0100, Will Deacon wrote:
> > > > On Tue, Oct 13, 2015 at 08:36:45AM -0700, Drew Richardson wrote:
> > > > > Add additional information about the ARM architected hardware events
> > > > > to make counters self describing. This makes the hardware PMUs easier
> > > > > to use as perf list contains possible events instead of users having
> > > > > to refer to documentation like the ARM TRMs.
> > > > > 
> > > > > Signed-off-by: Drew Richardson 
> > > > > ---
> > > > >  arch/arm/kernel/perf_event_v7.c | 121 
> > > > > 
> > > > >  drivers/perf/arm_pmu.c  |   1 +
> > > > >  2 files changed, 122 insertions(+)
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > > > > index 2365a32a595e..e933d2dd71c0 100644
> > > > > --- a/drivers/perf/arm_pmu.c
> > > > > +++ b/drivers/perf/arm_pmu.c
> > > > > @@ -548,6 +548,7 @@ static void armpmu_init(struct arm_pmu *armpmu)
> > > > >   .stop   = armpmu_stop,
> > > > >   .read   = armpmu_read,
> > > > >   .filter_match   = armpmu_filter_match,
> > > > > + .attr_groups= armpmu->pmu.attr_groups,
> > > > 
> > > > I don't understand this hunk. What's it doing?
> > > 
> > > I'm not 100% clear either on what it's doing. But without this line
> > > the attr_groups don't get passed on and I don't see them on my TC2. I
> > > debugged the issue down to this but it may not be the proper way to
> > > solve the problem.
> > 
> > The perf core creates attributes passed as pmu.attr_groups
> > in /sys/bus/events/devices/.
> 
> Sure, but the confusion here had to do with why we were re-assigning the
> field to itself rather than what the purpose of said field was.

Ah, right. Sorry. I must admit I haven't looked to the right from the =
sign...

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 2/2] arm: perf: Add event descriptions

2015-10-15 Thread Pawel Moll

On Thu, 2015-10-15 at 08:15 -0700, Drew Richardson wrote:
> On Thu, Oct 15, 2015 at 02:21:12PM +0100, Will Deacon wrote:
> > On Tue, Oct 13, 2015 at 08:36:45AM -0700, Drew Richardson wrote:
> > > Add additional information about the ARM architected hardware events
> > > to make counters self describing. This makes the hardware PMUs easier
> > > to use as perf list contains possible events instead of users having
> > > to refer to documentation like the ARM TRMs.
> > > 
> > > Signed-off-by: Drew Richardson 
> > > ---
> > >  arch/arm/kernel/perf_event_v7.c | 121 
> > > 
> > >  drivers/perf/arm_pmu.c  |   1 +
> > >  2 files changed, 122 insertions(+)
> > 
> > [...]
> > 
> > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > > index 2365a32a595e..e933d2dd71c0 100644
> > > --- a/drivers/perf/arm_pmu.c
> > > +++ b/drivers/perf/arm_pmu.c
> > > @@ -548,6 +548,7 @@ static void armpmu_init(struct arm_pmu *armpmu)
> > >   .stop   = armpmu_stop,
> > >   .read   = armpmu_read,
> > >   .filter_match   = armpmu_filter_match,
> > > + .attr_groups= armpmu->pmu.attr_groups,
> > 
> > I don't understand this hunk. What's it doing?
> 
> I'm not 100% clear either on what it's doing. But without this line
> the attr_groups don't get passed on and I don't see them on my TC2. I
> debugged the issue down to this but it may not be the proper way to
> solve the problem.

The perf core creates attributes passed as pmu.attr_groups
in /sys/bus/events/devices/.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 02/38] bus: arm-ccn: fix handling cpumask_any_but return value

2015-09-21 Thread Pawel Moll

Dnia 2015-09-21, Mon o godzinie 15:33 +0200, Andrzej Hajda pisze:
> cpumask_any_but returns value >= nr_cpu_ids if there are no more CPUs.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2038576
> 
> Signed-off-by: Andrzej Hajda 

This has been already reported and discussed:

http://thread.gmane.org/gmane.linux.kernel.janitors/34058

but apparently slipped through cracks :-( therefore:

Acked-by: Pawel Moll 

Thanks for bringing it up again!

Pawel



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents

2015-08-06 Thread Pawel Moll

On Wed, 2015-08-05 at 18:56 +0100, Stephen Boyd wrote:
> On 08/05/2015 03:29 AM, Pawel Moll wrote:
> >
> > Right, that's probably my fault. The sp810 nodes appeared in the VE
> > trees very early (in terms of the "DT era"), when we weren't so strict
> > about documentation. So far, it would be fairly simple, something like
> > the text below - feel free to take it, mend it, extend it with your
> > changes and include in the series:
> >
> > Thanks!
> 
> Great! Can you please provide your signed-off-by? 

Sure. 

8<

>From cddb00e4c8b6b57e82008fa23a7359560e5c997c Mon Sep 17 00:00:00 2001
From: Pawel Moll 
Date: Thu, 6 Aug 2015 16:04:10 +0100
Subject: [PATCH] clk: versatile: Add SP810 device tree bindings document

Signed-off-by: Pawel Moll 
---
 Documentation/devicetree/bindings/arm/sp810.txt | 35 +
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/sp810.txt

diff --git a/Documentation/devicetree/bindings/arm/sp810.txt 
b/Documentation/devicetree/bindings/arm/sp810.txt
new file mode 100644
index 000..440ee08
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/sp810.txt
@@ -0,0 +1,35 @@
+SP810 System Controller
+---
+
+Required properties:
+
+- compatible:  standard compatible string for a Primecell peripheral,
+   see Documentation/devicetree/bindings/arm/primecell.txt
+   for more details
+   should be: "arm,sp810", "arm,primecell"
+
+- reg: standard registers property, physical address and size
+   of the control registers
+
+- clock-names: from the common clock bindings, for more details see
+   Documentation/devicetree/bindings/clock/clock-bindings.txt;
+   should be: "refclk", "timclk", "apb_pclk"
+
+- clocks:  from the common clock bindings, phandle and clock
+   specifier pairs for the entries of clock-names property
+
+- #clock-cells: from the common clock bindings;
+   should be: <1>
+
+- clock-output-names: from the common clock bindings;
+   should be: "timerclken0", "timerclken1", "timerclken2", 
"timerclken3"
+
+Example:
+   sysctl@02 {
+   compatible = "arm,sp810", "arm,primecell";
+   reg = <0x02 0x1000>;
+   clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>;
+   clock-names = "refclk", "timclk", "apb_pclk";
+   #clock-cells = <1>;
+   clock-output-names = "timerclken0", "timerclken1", 
"timerclken2", "timerclken3";
+   };
-- 
2.1.4



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents

2015-08-05 Thread Pawel Moll

On Mon, 2015-08-03 at 18:55 +0100, Stephen Boyd wrote:
> On 08/03/2015 07:01 AM, Pawel Moll wrote:
> > On Sat, 2015-08-01 at 00:44 +0100, Stephen Boyd wrote:
> >> @@ -179,6 +124,15 @@ static void __init clk_sp810_of_setup(struct 
> >> device_node *node)
> >>sp810->timerclken[i].channel = i;
> >>sp810->timerclken[i].hw.init = &init;
> >>   
> >> +  /*
> >> +   * If DT isn't setting the parent, force it to be
> >> +   * the 1 MHz clock without going through the framework.
> >> +   * We do this before clk_register() so that it can determine
> >> +   * the parent and setup the tree properly.
> >> +   */
> >> +  if (deprecated)
> >> +  init.ops->set_parent(&sp810->timerclken[i].hw, 1);
> >> +
> >>sp810->timerclken[i].clk = clk_register(NULL,
> >>&sp810->timerclken[i].hw);
> >>WARN_ON(IS_ERR(sp810->timerclken[i].clk));
> > So that's one thing I've got a (small) problem here...
> >
> > The above change assumes that SP810 always have 32kHz clock on input 0
> > and 1MHz clock on input 1. Yes, this is how it made on VExpress, but it
> > doesn't have to be the case. The magic picking up the faster clock was
> > added to handle all possible cases.
> >
> > The bottom line is: if all we care is VExpress than it works, but it's
> > still a hack. Personally I don't like it, however I won't nak the patch
> > because of this.
> 
> All current dts files in the kernel tree have 1MHz on input 1, so we 
> make this change here to set the parent to input 1 if there isn't an 
> assigned-clock-parents property.

That's what I said - all VE platforms known to me will work, because
they all have the clocks wired up the same way. And I still don't like
code in a "generic SP810 driver" assuming this. Call me what you
want ;-)

> Presumably new dts files should have 
> the new property so that things work properly. 

Sure, no argument here.

> I tried to find the 
> binding document, but it doesn't look to exist, so I didn't have 
> anything to update.

Right, that's probably my fault. The sp810 nodes appeared in the VE
trees very early (in terms of the "DT era"), when we weren't so strict
about documentation. So far, it would be fairly simple, something like
the text below - feel free to take it, mend it, extend it with your
changes and include in the series:

Thanks!

Pawel

8<-

SP810 System Controller
---

Required properties:

- compatible:   standard compatible string for a Primecell peripheral,
see Documentation/devicetree/bindings/arm/primecell.txt
for more details
should be: "arm,sp810", "arm,primecell"

- reg:  standard registers property, physical address and size
of the control registers

- clock-names:  from the common clock bindings, for more details see
Documentation/devicetree/bindings/clock/clock-bindings.txt;
should be: "refclk", "timclk", "apb_pclk"

- clocks:   from the common clock bindings, phandle and clock
specifier pairs for the entries of clock-names property

- #clock-cells: from the common clock bindings;
should be: <1>

- clock-output-names:   from the common clock bindings;
should be: "timerclken0", "timerclken1", "timerclken2", 
"timerclken3"

Example:
sysctl@02 {
compatible = "arm,sp810", "arm,primecell";
reg = <0x02 0x1000>;
clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>;
clock-names = "refclk", "timclk", "apb_pclk";
#clock-cells = <1>;
clock-output-names = "timerclken0", "timerclken1", 
"timerclken2", "timerclken3";
};

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents

2015-08-03 Thread Pawel Moll

On Sat, 2015-08-01 at 00:44 +0100, Stephen Boyd wrote:
> @@ -179,6 +124,15 @@ static void __init clk_sp810_of_setup(struct device_node 
> *node)
>   sp810->timerclken[i].channel = i;
>   sp810->timerclken[i].hw.init = &init;
>  
> + /*
> +  * If DT isn't setting the parent, force it to be
> +  * the 1 MHz clock without going through the framework.
> +  * We do this before clk_register() so that it can determine
> +  * the parent and setup the tree properly.
> +  */
> + if (deprecated)
> + init.ops->set_parent(&sp810->timerclken[i].hw, 1);
> +
>   sp810->timerclken[i].clk = clk_register(NULL,
>   &sp810->timerclken[i].hw);
>   WARN_ON(IS_ERR(sp810->timerclken[i].clk));

So that's one thing I've got a (small) problem here...

The above change assumes that SP810 always have 32kHz clock on input 0
and 1MHz clock on input 1. Yes, this is how it made on VExpress, but it
doesn't have to be the case. The magic picking up the faster clock was
added to handle all possible cases.

The bottom line is: if all we care is VExpress than it works, but it's
still a hack. Personally I don't like it, however I won't nak the patch
because of this.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 24/26] clk: versatile: Convert to clk_hw based provider APIs

2015-07-31 Thread Pawel Moll

On Fri, 2015-07-31 at 18:04 +0100, Stephen Boyd wrote:
> We're removing struct clk from the clk provider API, so switch
> this code to using the clk_hw based provider APIs.
> 
> Cc: Pawel Moll 
> Cc: Linus Walleij 
> Signed-off-by: Stephen Boyd 
> ---
>  drivers/clk/versatile/clk-sp810.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/versatile/clk-sp810.c 
> b/drivers/clk/versatile/clk-sp810.c
> index 7fbe4d4bf35e..af653bfd4901 100644
> --- a/drivers/clk/versatile/clk-sp810.c
> +++ b/drivers/clk/versatile/clk-sp810.c
> @@ -80,7 +80,7 @@ static int clk_sp810_timerclken_prepare(struct clk_hw *hw)
>  {
>   struct clk_sp810_timerclken *timerclken = to_clk_sp810_timerclken(hw);
>   struct clk_sp810 *sp810 = timerclken->sp810;
> - struct clk *old_parent = __clk_get_parent(hw->clk);
> + struct clk_hw *old_parent = clk_hw_get_parent(hw);
>   struct clk *new_parent;
>  
>   if (!sp810->refclk)

Acked-by: Pawel Moll 

(disclaimer: not tested ;-)

Thanks!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:perf/core] perf tools: Fix test build error when bindir contains double slash

2015-07-31 Thread tip-bot for Pawel Moll

Commit-ID:  0927beeca5f9d1a7978f8da9c9d28647859816d3
Gitweb: http://git.kernel.org/tip/0927beeca5f9d1a7978f8da9c9d28647859816d3
Author: Pawel Moll 
AuthorDate: Tue, 28 Jul 2015 15:10:13 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 28 Jul 2015 13:03:49 -0300

perf tools: Fix test build error when bindir contains double slash

When building with a prefix ending with a slash, for example:

$ make prefix=/usr/local/

one of the perf tests fail to compile due to BUILD_STR macro mishandling
bindir_SQ string containing with two slashes:

-DBINDIR="BUILD_STR(/usr/local//bin)"

with the following error:

  CC   tests/attr.o
tests/attr.c: In function ‘test__attr’:
tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token
  snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR);
  ^
tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token
 }
 ^
tests/attr.c:176:1: error: control reaches end of non-void function 
[-Werror=return-type]
 }
 ^
cc1: all warnings being treated as errors

This patch works around the problem by "cleaning" the bindir string
using make's abspath function.

Signed-off-by: Pawel Moll 
Acked-by: Jiri Olsa 
Cc: Adrian Hunter 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1438092613-21014-1-git-send-email-pawel.m...@arm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/config/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 094ddae..d31fac1 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -638,7 +638,7 @@ ifndef DESTDIR
 prefix ?= $(HOME)
 endif
 bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+bindir = $(abspath $(prefix)/$(bindir_relative))
 mandir = share/man
 infodir = share/info
 perfexecdir = libexec/perf-core
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] tools: perf: Fix test build error when bindir contains double slash

2015-07-28 Thread Pawel Moll

When building with a prefix ending with a slash, for example:

$ make prefix=/usr/local/

one of the perf tests fail to compile due to BUILD_STR macro mishandling
bindir_SQ string containing with two slashes:

-DBINDIR="BUILD_STR(/usr/local//bin)"

with the following error:

  CC   tests/attr.o
tests/attr.c: In function ‘test__attr’:
tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token
  snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR);
  ^
tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token
 }
 ^
tests/attr.c:176:1: error: control reaches end of non-void function 
[-Werror=return-type]
 }
 ^
cc1: all warnings being treated as errors

This patch works around the problem by "cleaning" the bindir string
using make's abspath function.

Signed-off-by: Pawel Moll 
---
 tools/perf/config/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index b4a9c29..e8d987a 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -638,7 +638,7 @@ ifndef DESTDIR
 prefix ?= $(HOME)
 endif
 bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+bindir = $(abspath $(prefix)/$(bindir_relative))
 mandir = share/man
 infodir = share/info
 perfexecdir = libexec/perf-core
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tests: Adding build test for having ending slash in double slash

2015-07-28 Thread Pawel Moll

On Tue, 2015-07-28 at 13:40 +0100, Pawel Moll wrote:
> Interesting. Building from a perf-tar-src-pkg tarball works:
> 
> $ rm -rf /tmp/krava/ && make install prefix=/tmp/krava
> [...]
>   LINK perf
>   LINK libperf-gtk.so
>   INSTALL  GTK UI
>   INSTALL  binaries
>   INSTALL  tests
>   INSTALL  libexec
>   INSTALL  perf-archive
>   INSTALL  perf-with-kcore
>   INSTALL  perl-scripts
>   INSTALL  python-scripts
>   INSTALL  perf_completion-script
> 
> but it breaks indeed the way you pointed out when building in-tree...

Ignore what I said above. The real issue is that $(realpath PATH)
returns nothing if PATH does not actually exist. Didn't know that...

Will send v2 soon.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tools: perf: Avoid error message when building with no python installed

2015-07-28 Thread Pawel Moll

On Mon, 2015-07-27 at 19:36 +0100, Jiri Olsa wrote:
> > @@ -150,7 +150,7 @@ ifndef NO_LIBPYTHON
> >PYTHON2 := $(if $(call get-executable,python2),python2,python)
> >override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
> >PYTHON2_CONFIG := \
> > -$(if $(call 
> > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
> > +$(if $(PYTHON),$(if $(call 
> > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config))
> >override PYTHON_CONFIG := \
> >  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
> 
> haven't tried, but seems like we want to add also the else part
> to get default into PYTHON_CONFIG?
> -$(if $(PYTHON),$(if $(call 
> get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config))
> +$(if $(PYTHON),$(if $(call 
> get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config),python-config)

It's not strictly necessary, because the issue here is just executing
"command -v -c" (followed by "onfig" :-), but of course it will still
work.

> it fails anyway later either way, but seems like the 'python-config'
> was the default case that was meant to fail later

Well, yes, I had the same impression. But at the same time
get-executable-or-default is constructed in such a way that either of
the variants must be a real executable:

--8<---
# get-supplied-or-default-executable
#
# Usage: absolute-executable-path-or-empty = $(call 
get-executable-or-default,variable,default)
#
define get-executable-or-default
$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2)))
endef
_ge_attempt = $(if $(get-executable),$(get-executable),$(call _gea_err,$(2)))
_gea_err  = $(if $(1),$(error Please set '$(1)' appropriately))
--8<---

See? Both alternatives in the $(if $($(1)) monster do _ge_attempt... So
one could say that the change may rather look like this (untested):

--8<---
diff --git a/tools/perf/config/utilities.mak
b/tools/perf/config/utilities.mak
index 0ebef09..78135dc 100644
--- a/tools/perf/config/utilities.mak
+++ b/tools/perf/config/utilities.mak
@@ -173,7 +173,7 @@ _ge-abspath = $(if $(is-executable),$(1))
 # Usage: absolute-executable-path-or-empty = $(call
get-executable-or-default,variable,default)
 #
 define get-executable-or-default
-$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2)))
+$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(2))
 endef
 _ge_attempt = $(if $(get-executable),$(get-executable),$(call _gea_err,
$(2)))
 _gea_err  = $(if $(1),$(error Please set '$(1)' appropriately))
--8<---

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tests: Adding build test for having ending slash in double slash

2015-07-28 Thread Pawel Moll

On Mon, 2015-07-27 at 19:24 +0100, Jiri Olsa wrote:
> I'm getting install error with your patch:
> 
> [jolsa@krava perf]$ rm -rf /tmp/krava/ && make install prefix=/tmp/krava
> 
> SNIP
> 
>   LINK perf
>   INSTALL  binaries
>   INSTALL  tests
> install: cannot create directory ‘’: No such file or directory
> install: target ‘’ is not a directory: No such file or directory
> ln: failed to access ‘/perf’: No such file or directory
> Makefile.perf:492: recipe for target 'install-tools' failed
> make[1]: *** [install-tools] Error 1
> make[1]: *** Waiting for unfinished jobs
> Makefile:87: recipe for target 'install' failed
> make: *** [install] Error 2
> 
> somethings wrong with install targets dealing with bindir_SQ

Interesting. Building from a perf-tar-src-pkg tarball works:

$ rm -rf /tmp/krava/ && make install prefix=/tmp/krava
[...]
  LINK perf
  LINK libperf-gtk.so
  INSTALL  GTK UI
  INSTALL  binaries
  INSTALL  tests
  INSTALL  libexec
  INSTALL  perf-archive
  INSTALL  perf-with-kcore
  INSTALL  perl-scripts
  INSTALL  python-scripts
  INSTALL  perf_completion-script

but it breaks indeed the way you pointed out when building in-tree...

Having a look now.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] tools: perf: Fix test build error when bindir contains double slash

2015-07-27 Thread Pawel Moll

When building with a prefix ending with a slash, for example:

$ make prefix=/usr/local/

one of the perf tests fail to compile due to BUILD_STR macro mishandling
bindir_SQ string containing with two slashes:

-DBINDIR="BUILD_STR(/usr/local//bin)"

with the following error:

  CC   tests/attr.o
tests/attr.c: In function ‘test__attr’:
tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token
  snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR);
  ^
tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token
 }
 ^
tests/attr.c:176:1: error: control reaches end of non-void function 
[-Werror=return-type]
 }
 ^
cc1: all warnings being treated as errors

This patch works around the problem by "cleaning" the bindir string
using make's realpath function.

Signed-off-by: Pawel Moll 
---
 tools/perf/config/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index b4a9c29..59e8376 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -638,7 +638,7 @@ ifndef DESTDIR
 prefix ?= $(HOME)
 endif
 bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+bindir = $(realpath $(prefix)/$(bindir_relative))
 mandir = share/man
 infodir = share/info
 perfexecdir = libexec/perf-core
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] tools: perf: Avoid error message when building with no python installed

2015-07-27 Thread Pawel Moll

When run on a system without a python interpreter installed, the makefile
probing python will try to execute

/bin/sh -c 'command -v -config [...]'

instead of

/bin/sh -c 'command -v python-config [...]'

resulting in the following error message:

sh: 1: command: Illegal option -c

Although harmless, the message can be quite confusing and hard to track.

Signed-off-by: Pawel Moll 
---
 tools/perf/config/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 094ddae..b4a9c29 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -150,7 +150,7 @@ ifndef NO_LIBPYTHON
   PYTHON2 := $(if $(call get-executable,python2),python2,python)
   override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
   PYTHON2_CONFIG := \
-$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
+$(if $(PYTHON),$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config))
   override PYTHON_CONFIG := \
 $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches

2015-06-12 Thread Pawel Moll

On Fri, 2015-06-12 at 14:28 +0100, Pawel Moll wrote:
> On Fri, 2015-06-12 at 14:15 +0100, Adrian Hunter wrote:
> > >>>> all 3 are already part of sample_id.
> > >>>
> > >>> You have to decide whether you expect to be able to use an event without
> > >>> sample_id. MMAP and MMAP2 both have pid, tid which are in sample_id, 
> > >>> LOST
> > >>> has id, EXIT and FORK have time, all of the THROTTLE/UNTHROTTLE members 
> > >>> are
> > >>> in sample_id etc. So it currently looks like we expect to be able to 
> > >>> use an
> > >>> event without requiring sample_id.
> > > 
> > > The fact that there is this duplication is because sample_id_all came
> > > after those events, but this new one being proposed doesn't have to do
> > > it :-)
> > 
> > Thanks, that's clear then.  There will just need to be a flag to indicate
> > whether it is scheduling in or out.
> 
> Just a thought: wouldn't it be good to know what CPU have we been
> scheduled from/to? This kind of information would be especially valuable
> in heterogeneous systems.

Of course, cpu is a part of sample_id as well. I'm sorted out then :-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches

2015-06-12 Thread Pawel Moll

On Fri, 2015-06-12 at 14:15 +0100, Adrian Hunter wrote:
>  all 3 are already part of sample_id.
> >>>
> >>> You have to decide whether you expect to be able to use an event without
> >>> sample_id. MMAP and MMAP2 both have pid, tid which are in sample_id, LOST
> >>> has id, EXIT and FORK have time, all of the THROTTLE/UNTHROTTLE members 
> >>> are
> >>> in sample_id etc. So it currently looks like we expect to be able to use 
> >>> an
> >>> event without requiring sample_id.
> > 
> > The fact that there is this duplication is because sample_id_all came
> > after those events, but this new one being proposed doesn't have to do
> > it :-)
> 
> Thanks, that's clear then.  There will just need to be a flag to indicate
> whether it is scheduling in or out.

Just a thought: wouldn't it be good to know what CPU have we been
scheduled from/to? This kind of information would be especially valuable
in heterogeneous systems.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 14/19] clk: versatile: Silence sparse warnings

2015-05-06 Thread Pawel Moll

On Wed, 2015-05-06 at 08:39 +0100, Stephen Boyd wrote:
> drivers/clk/versatile/clk-sp810.c:159:29: error: incompatible types for 
> operation (<=)
> drivers/clk/versatile/clk-sp810.c:159:29:left side has type char const 
> *
> drivers/clk/versatile/clk-sp810.c:159:29:right side has type int
> drivers/clk/versatile/clk-sp810.c:159:53: error: incompatible types for 
> operation (<=)
> drivers/clk/versatile/clk-sp810.c:159:53:left side has type char const 
> *
> drivers/clk/versatile/clk-sp810.c:159:53:right side has type int
> rivers/clk/versatile/clk-sp810.c:138:13: warning: symbol 'clk_sp810_of_setup' 
> was not declared. Should it be static?
> 
> Cc: Pawel Moll 
> Signed-off-by: Stephen Boyd 

Acked-by: Pawel Moll 

> @@ -156,7 +156,7 @@ void __init clk_sp810_of_setup(struct device_node *node)
>   "timclk");
>   parent_names[1] = of_clk_get_parent_name(node, sp810->timclk_index);
>  
> - if (parent_names[0] <= 0 || parent_names[1] <= 0) {
> + if (!parent_names[0] || !parent_names[1]) {
>   pr_warn("Failed to obtain parent clocks for SP810!\n");
>   return;
>   }

I stared at it (and at git blame output) and was thinking what was I
smoking typing the code... Fortunately mail history suggest that I had
"!parent..." in my original patch, and it was modified by Mike ;-) No
harm done :-)

Thanks for fixing this!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tracing: Export key trace event symbols

2015-04-23 Thread Pawel Moll

On Thu, 2015-04-23 at 16:28 +0100, Pawel Moll wrote:
> On Wed, 2015-04-22 at 16:36 +0100, David Ahern wrote:
> > On 4/22/15 8:47 AM, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Apr 22, 2015 at 08:53:14AM -0400, Steven Rostedt escreveu:
> > >> >On Tue, 21 Apr 2015 21:24:51 -0500
> > >> >Ron Rechenmacher  wrote:
> > >>> > >I've looked at the above reference briefly and it appears that 
> > >>> > >user-space
> > >>> > >would be mmapping the buffer read-only. Is that correct?
> > >> >
> > >> >Correct, but I'm sure we could still add something (if it doesn't
> > >> >already exist) to have userspace write into the buffer. Ftrace has that
> > >> >with the trace_marker file.
> > > There is something in the works, I guess Pawell Moll (sp) was working on 
> > > it, and
> > > David Ahern (CCed) should know, David?
> > >
> > 
> > I played around with generating perf events in userspace with the 
> > intention of having the userspace events get merged with kernel events 
> > during the processing stage, but I did not take it to the point of 
> > integrating into perf. This was around October 2013. I got distracted 
> > with other topics and have not come back to it.
> > 
> > Pawel has a patch that allows userspace to inject events into the stream 
> > via ioctl calls.
> 
> In the last version it was even a prctl - no need for a perf file
> descriptor any more :-)
> 
> But the patch requires more care if it's to go in, so I'm open to people
> screaming "yes, we need it!" ;-)

Forgot to quote the link:

http://article.gmane.org/gmane.linux.kernel.api/5851

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tracing: Export key trace event symbols

2015-04-23 Thread Pawel Moll

On Wed, 2015-04-22 at 16:36 +0100, David Ahern wrote:
> On 4/22/15 8:47 AM, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Apr 22, 2015 at 08:53:14AM -0400, Steven Rostedt escreveu:
> >> >On Tue, 21 Apr 2015 21:24:51 -0500
> >> >Ron Rechenmacher  wrote:
> >>> > >I've looked at the above reference briefly and it appears that 
> >>> > >user-space
> >>> > >would be mmapping the buffer read-only. Is that correct?
> >> >
> >> >Correct, but I'm sure we could still add something (if it doesn't
> >> >already exist) to have userspace write into the buffer. Ftrace has that
> >> >with the trace_marker file.
> > There is something in the works, I guess Pawell Moll (sp) was working on 
> > it, and
> > David Ahern (CCed) should know, David?
> >
> 
> I played around with generating perf events in userspace with the 
> intention of having the userspace events get merged with kernel events 
> during the processing stage, but I did not take it to the point of 
> integrating into perf. This was around October 2013. I got distracted 
> with other topics and have not come back to it.
> 
> Pawel has a patch that allows userspace to inject events into the stream 
> via ioctl calls.

In the last version it was even a prctl - no need for a perf file
descriptor any more :-)

But the patch requires more care if it's to go in, so I'm open to people
screaming "yes, we need it!" ;-)

> Stephane also injects events for JIT.

But that, as far as I understand, happens in userspace (as in: nothing
goes down to kernel)?

> One of the key requirements is a common time basis (e.g., 
> CLOCK_MONOTONIC or PERF_CLOCK) to be able to merge the events properly. 
> I have a kernel module that exports perf_clock to userspace via 
> clock_gettime; the 4.1 kernel should have the code that allows the clock 
> id to be specified providing a solution to this problem.

It's in! :-) After all these years...

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34f439278cef7b1177f8ce24f9fc81dfc6221d3b

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/6] virtio_mmio: support non-legacy balloon devices

2015-03-31 Thread Pawel Moll

On Mon, 2015-03-30 at 18:37 +0100, Michael S. Tsirkin wrote:
> virtio_device_is_legacy_only is always false now,
> drop the test from virtio mmio.
> 
> Signed-off-by: Michael S. Tsirkin 

Slightly ironic ack ;-) after all the battle you fought for this:

Acked-by: Pawel Moll 

Thanks!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 19/35 linux-next] virtio_mmio: constify of_device_id array

2015-03-23 Thread Pawel Moll

On Mon, 2015-03-16 at 19:20 +, Fabian Frederick wrote:
> of_device_id is always used as const.
> (See driver.of_match_table and open firmware functions)
> 
> Signed-off-by: Fabian Frederick 
> ---
>  drivers/virtio/virtio_mmio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 9c877d2..0ce8cda 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -671,7 +671,7 @@ static void vm_unregister_cmdline_devices(void)
>  
>  /* Platform driver */
>  
> -static struct of_device_id virtio_mmio_match[] = {
> +static const struct of_device_id virtio_mmio_match[] = {
>   { .compatible = "virtio,mmio", },
>   {},
>  };

Acked-by: Pawel Moll 

Thanks!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio_mmio: generation support

2015-03-23 Thread Pawel Moll

On Thu, 2015-03-12 at 02:07 +, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > virtio_mmio currently lacks generation support which
> > makes multi-byte field access racy.
> > Fix by getting the value at offset 0xfc for version 2
> > devices. Nothing we can do for version 1, so return
> > generation id 0.
> >
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >
> > Pawel, you mentioned you have a working virtio 1.0
> > hypervisor, can you pls confirm this works correctly?

Again, belated

Acked-by: Pawel Moll 

although not yet tested-by...

Paweł

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio_mmio: fix endian-ness for mmio

2015-03-23 Thread Pawel Moll

On Fri, 2015-03-13 at 01:22 +, Rusty Russell wrote:
> I'm sure Pawel is on a beach somewhere sipping cocktails, 

Ehm... There's still a some more prohibition for me (granted, by choice,
just to share the "pain" ;-), so it wasn't exactly 

> so I'll apply
> this immediately (with your updated commit message) to my
> pending-rebases and give him the weekend to Ack.

Not that it matters any more, but a belated

Acked-by: Pawel Moll 

Thanks, Michael, for fixing this! (although, I must admit, I haven't
tested it yet - all in right time :-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register

2015-03-05 Thread Pawel Moll

On Wed, 2015-03-04 at 00:37 +, Arun Ramamurthy wrote:
> > That way you're precisely describing the way the hardware is wired up.
> > And the driver simply tries to get clcdclk first, if it's defined -
> > cool, set clksel to 1, if not - try hclk and set clksel to 0. If neither
> > of them is present - bail out.
> >
> > Does this make any sense?
> >
> This makes sense to me, thank you for the suggestions. I will fix it all 
> up in V2

Cool. Just a word of comment to my own words ;-) The "bail out" case was
a bad idea - the non-DT use cases more likely than not will have no
clock name defined. So, to maintain backward compatibility, the driver
will still have to work when no named clock is available. I think the
simplest way to do that is to check if "hclk" is available, and use it
if it is (setting the clksel accordingly). Otherwise - proceed as it was
the case previously.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register

2015-03-03 Thread Pawel Moll

On Tue, 2015-03-03 at 10:02 +, Pawel Moll wrote:
> On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote:
> > > The existing bindings intentionally avoided quoting internal registers -
> > > they are supposed to describe how the hardware is wired up...
> > >
> > > So how about something like "arm,pl11x,tft-invert-clac"? Then the driver
> > > sets the bit or not, depending on the property existance?
> > >
> > Sure, I can change it to two properties called arm,pl11x,tft-invert-clac 
> > and arm,pl11x,tft-clksel. Would that be acceptable?
> 
> That would be fine by me :-)

Or (after having a look at the TRM) I should rather say: the invert-clac
is fine by me :-) but the tft-clksel doesn't work, I afraid.

If I'm not mistaken, there are two problems with it.

Number one: it's not TFT-specific, is it? So it certainly should not
have the "tft-" bit.

Number two: setting this bit says "do not use CLCDCLK for the logic; use
HCLK instead", correct? If so, have a look at the clock properties. They
say:

- clock-names: should contain "clcdclk" and "apb_pclk"

- clocks: contains phandle and clock specifier pairs for the entries
in the clock-names property. See

So if your hardware has the reference clock wired to HCLK, and you
defining the clocks as "clcdclk", you are (no offence meant ;-)
lying :-)

So how about solving the problem by extending the clock-names definition
like this (feel free to use own wording):

- clock-names: should contain two clocks, either "clcdclk" or "hclk"
   (depending on which input is to be used as a reference
   clock by the controller logic) and "apb_pclk"

That way you're precisely describing the way the hardware is wired up.
And the driver simply tries to get clcdclk first, if it's defined -
cool, set clksel to 1, if not - try hclk and set clksel to 0. If neither
of them is present - bail out.

Does this make any sense?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register

2015-03-03 Thread Pawel Moll

On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote:
> > The existing bindings intentionally avoided quoting internal registers -
> > they are supposed to describe how the hardware is wired up...
> >
> > So how about something like "arm,pl11x,tft-invert-clac"? Then the driver
> > sets the bit or not, depending on the property existance?
> >
> Sure, I can change it to two properties called arm,pl11x,tft-invert-clac 
> and arm,pl11x,tft-clksel. Would that be acceptable?

That would be fine by me :-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added support for FBIO_WAITFORVSYNC

2015-03-03 Thread Pawel Moll

On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote:
> On 15-03-02 08:00 AM, Pawel Moll wrote:
> > On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote:
> >> Added ioctl and interrupt handler functions to support FBIO_WAITFORVSYNC
> >> Also corrected documentation to make interrupts and interrupt-names
> >> optional as they are not required properties.
> >
> > You may not be aware of this fact, but its the "documentation" what
> > defines what properties are required...
> >
> Pawel, I was not aware of that. Since the driver code did not require 
> the interrupts or interrupt-names bindings to load properly, I moved it 
> out of the required properties. I can remove that change.

Cool. Drivers don't have to use all available properties :-) The
interrupt names are required because CLCD can be wired up with a single,
combined interrupt or with 4 separate interrupt lines (see "A.4. On-chip
signals", in the TRM) and the binding has to provide ways of describing
this. Yes, one could say "1 number == combined, 4 numbers == split", but
I personally prefer to be explicit than implicit.

Just request the interrupt by name and you'll be fine.

> Any other comments on this change? Thanks

I have no experience with the vsync ioctl, so can't really comment on
it. One minor thing I did spot is your use of curly brackets in one of
the switch cases:

On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote:
@@ -466,6 +468,73 @@ static int clcdfb_pan_display(struct fb_var_screeninfo 
*var,
>   return 0;
>  }
>  
> +static int clcdfb_ioctl(struct fb_info *info,
> + unsigned int cmd, unsigned long args)
> +{
> + struct clcd_fb *fb = to_clcd(info);
> + int retval = 0;
> + u32 val, ienb_val;
> +
> + switch (cmd) {
> + case FBIO_WAITFORVSYNC:{

In the line above...

> + if (fb->lcd_irq <= 0) {
> + retval = -EINVAL;
> + break;
> + }
> + /* disable Vcomp interrupts */
> + ienb_val = readl(fb->regs + fb->off_ienb);
> + ienb_val &= ~CLCD_PL111_IENB_VCOMP;
> + writel(ienb_val, fb->regs + fb->off_ienb);
> +
> + /* clear Vcomp interrupt */
> + writel(CLCD_PL111_IENB_VCOMP, fb->regs + CLCD_PL111_ICR);
> +
> + /* Generate Interrupt at the start of Vsync */
> + reinit_completion(&fb->wait);
> + val = readl(fb->regs +  fb->off_cntl);
> + val &= ~(CNTL_LCDVCOMP(3));
> + writel(val, fb->regs + fb->off_cntl);
> +
> + /* enable Vcomp interrupt */
> + ienb_val = readl(fb->regs + fb->off_ienb);
> + ienb_val |= CLCD_PL111_IENB_VCOMP;
> + writel(ienb_val, fb->regs + fb->off_ienb);
> + if (!wait_for_completion_interruptible_timeout
> + (&fb->wait, HZ/10))
> + retval = -ETIMEDOUT;
> + break;
> + }

... and here.

Not sure it's needed?

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register

2015-03-02 Thread Pawel Moll

On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote:
> Added code based on linaro tree:
> http://git.linaro.org/kernel/linux-linaro-stable.git
> with commit id:6846e7822c4cab5a84672baace3b768c2d0db142
> at drivers/video/amba-clcd.c. This lets the driver set
> certain tim2 register bits after reading them from
> device tree.
> 
> Reviewed-by: Ray Jui 
> Reviewed-by: Scott Branden 
> Signed-off-by: Arun Ramamurthy 
> ---
>  .../devicetree/bindings/video/arm,pl11x.txt| 17 -
>  drivers/video/fbdev/amba-clcd.c| 41 
> ++
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt 
> b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> index 3e3039a..14d6f87 100644
> --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt
> +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> @@ -35,6 +35,21 @@ Optional properties:
>   cell's memory interface can handle; if not present, the memory
>   interface is fast enough to handle all possible video modes
>  
> +- tim2: Used to set certain bits in LCDTiming2 register.
> + It can be TIM2_CLKSEL or TIM2_IOE or both
> +
> + TIM2_CLKSEL: This bit drives the CLCDCLKSEL signal. It is the select
> + signal for the external LCD clock multiplexor.
> +
> + TIM2_IOE: Invert output enable:
> + 0 = CLAC output pin is active HIGH in TFT mode
> + 1 = CLAC output pin is active LOW in TFT mode.
> + This bit selects the active polarity of the output enable signal in
> + TFT mode. In this mode, the CLAC pin is an enable that indicates to
> + the LCD panel when valid display data is available. In active
> + display mode, data is driven onto the LCD data lines at the
> + programmed edge of CLCP when CLAC is in its active state.
> +

The existing bindings intentionally avoided quoting internal registers -
they are supposed to describe how the hardware is wired up...

So how about something like "arm,pl11x,tft-invert-clac"? Then the driver
sets the bit or not, depending on the property existance?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added support for FBIOPAN_DISPLAY and virtual y resolution

2015-03-02 Thread Pawel Moll

On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote:
> Added code to support FBIOPAN_DISPLAY. Also added yres_virtual
> parameter to device tree to set the virtual y resolution
> 
> Reviewed-by: Ray Jui 
> Reviewed-by: Scott Branden 
> Signed-off-by: Arun Ramamurthy 
> ---
>  .../devicetree/bindings/video/arm,pl11x.txt|  4 +++
>  drivers/video/fbdev/amba-clcd.c| 31 
> +++---
>  2 files changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt 
> b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> index 14d6f87..2262cdb 100644
> --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt
> +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> @@ -50,6 +50,10 @@ Optional properties:
>   display mode, data is driven onto the LCD data lines at the
>   programmed edge of CLCP when CLAC is in its active state.
>  
> +- yres_virtual: Virtual Y resolution,
> + It can be used to configure a virtual y resolution. It
> + must be a value larger than the actual y resolution.
> +

I'm not sure about this... The word "virtual" never works well with
device tree nodes defined as "hardware description".

I understand what you're doing, but adding this property to the display
controller's node doesn't sound right. How does this describe hardware?
If anywhere, it's more like a job for the panel node?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: ARM CLCD: Added support for FBIO_WAITFORVSYNC

2015-03-02 Thread Pawel Moll

On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote:
> Added ioctl and interrupt handler functions to support FBIO_WAITFORVSYNC
> Also corrected documentation to make interrupts and interrupt-names
> optional as they are not required properties.

You may not be aware of this fact, but its the "documentation" what
defines what properties are required...

> Reviewed-by: Ray Jui 
> Reviewed-by: Scott Branden 
> Signed-off-by: Arun Ramamurthy 0
> ---
>  .../devicetree/bindings/video/arm,pl11x.txt| 11 +--
>  drivers/video/fbdev/amba-clcd.c| 82 
> ++
>  include/linux/amba/clcd.h  |  4 ++
>  3 files changed, 89 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt 
> b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> index 2262cdb..7d19024 100644
> --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt
> +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt
> @@ -10,14 +10,6 @@ Required properties:
>  
>  - reg: base address and size of the control registers block
>  
> -- interrupt-names: either the single entry "combined" representing a
> - combined interrupt output (CLCDINTR), or the four entries
> - "mbe", "vcomp", "lnbu", "fuf" representing the individual
> - CLCDMBEINTR, CLCDVCOMPINTR, CLCDLNBUINTR, CLCDFUFINTR interrupts
> -
> -- interrupts: contains an interrupt specifier for each entry in
> - interrupt-names
> -
>  - clock-names: should contain "clcdclk" and "apb_pclk"
>  
>  - clocks: contains phandle and clock specifier pairs for the entries

So no, you can't do that.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH 2/2] perf: Add per event clockid support

2015-02-20 Thread Pawel Moll

On Fri, 2015-02-20 at 14:29 +, Peter Zijlstra wrote:
> The below patch makes the distinction between these two cases by
> adding perf_event_clock() which is used for the second case. It
> further makes this configurable on a per-event basis, but adds a few
> sanity checks such that we cannot combine events with different clocks
> in confusing ways.

The idea works for me (obviously :-)

> And since we then have per-event configurability we might as well
> retain the 'legacy' behaviour as a default.

Don't mind that at all.

> @@ -334,8 +335,7 @@ struct perf_event_attr {
>*/
>   __u32   sample_stack_user;
>  
> - /* Align to u64. */
> - __u32   __reserved_2;
> + __u32   clockid;

I thought about it, but was sort-of-afraid to propose it :-)

Now, one thing I'm not 100% sure about it is it being unsigned, as
clockid_t is signed for a reason (negative values have meaning - eg.
dynamic clocks, which could be useful in some circumstances). Of course
casting could be an answer, but is there any reason not to make it
__s32?

> + default:
> + /* XXX add: clock_id_valid() && clock_gettime_ns() ? */
> + err = -EINVAL;
> + goto err_alloc;
> + }

If you asked me, I'd say -EINVAL, no default.

Cheers!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] perf/x86: Add ability to sample TSC

2015-02-19 Thread Pawel Moll

On Thu, 2015-02-19 at 17:58 +, Pawel Moll wrote:
> and what Adrian did, explicitly
> defining possible timestamps at perf_event_attr level instead of
> relating them to to posix clock ids is the way to go. No strong opinion
> here.

One note here: I'd rather make it "processor trace clock", rather than
"architecturally defined" one. That way you relate it directly to the
time domain and allow userspace decoder to as for it in a
architecturally independent way (yes, I'm assuming that there will be
only one "type of processor trace in a system :-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] perf/x86: Add ability to sample TSC

2015-02-19 Thread Pawel Moll

On Thu, 2015-02-19 at 17:50 +, John Stultz wrote:
> On Thu, Feb 19, 2015 at 9:40 AM, Adrian Hunter  
> wrote:
> > On 19/02/2015 7:24 p.m., John Stultz wrote:
> >>
> >> On Thu, Feb 19, 2015 at 4:11 AM, Adrian Hunter 
> >> wrote:
> >>>
> >>> Hi
> >>>
> >>> With the advent of switching perf_clock to CLOCK_MONOTONIC,
> >>> it will not be possible to convert perf_clock directly to/from
> >>> TSC. So add the ability to sample TSC instead.
> >>>
> >>>
> >>> Adrian Hunter (2):
> >>>perf: Sample additional clock value
> >>>perf/x86: Provide TSC for PERF_SAMPLE_CLOCK_ARCH
> >>
> >>
> >> This doesn't seem very portable. The CLOCK_MONOTONIC_RAW clockid was
> >> added to provide a arch-neutral abstraction of a free-running hardware
> >> counter that isn't affected by adjtimex slewing (though like any
> >> counter, it will be affected by non-constant drift).
> >>
> >> You might consider looking at that if the short term slew adjustments
> >> (which result in more accurate timings in the long term) are
> >> problematic for you.
> >
> >
> > This is for Intel Processor Trace - which Peter has already
> > rightly chastised me for not making plain.
> >
> > Intel Processor Trace (Intel PT) is a trace that is hardware generated. The
> > hardware does not know about linux or its clocks, so the timestamps
> > are TSC. I think ARM might have the same issue with ETM or such. i.e. the
> > need to synchronize a hardware timestamp with a perf event.
> >
> > There is a description of Intel PT in the Intel Architecture manuals.
> >
> > http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
> 
> Cc'ing Mathieu since he might be familiar w/ any similar issues w/ Coresight.

We haven't got to this yet, but it was discussed (briefly) at Plumbers
in Dusseldorf. In our case the CS processor trace can be timestamped
from yet another clock source, probably hidden behind memory mapped
register. Thus the additional time sample of some sort will be required
in some form.

Peter pointed out that there may be problem with obtaining the "other"
timestamps in NMI context, so I was thinking about injecting the
synchronisation points as separate records:

http://article.gmane.org/gmane.linux.kernel.api/7726/match=perf+sample+additional+clock+value

but maybe I'm making it too complicated and what Adrian did, explicitly
defining possible timestamps at perf_event_attr level instead of
relating them to to posix clock ids is the way to go. No strong opinion
here.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support

2015-02-13 Thread Pawel Moll

On Fri, 2015-02-13 at 02:52 +, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
> >> Jason Wang  writes:
> >> > This patch enables the interrupt coalescing setting through ethtool.
> >> 
> >> The problem is that there's nothing network specific about interrupt
> >> coalescing.  I can see other devices wanting exactly the same thing,
> >> which means we'd deprecate this in the next virtio standard.
> >> 
> >> I think the right answer is to extend like we did with
> >> vring_used_event(), eg:
> >> 
> >> 1) Add a new feature VIRTIO_F_RING_COALESCE.
> >> 2) Add another a 32-bit field after vring_used_event(), eg:
> >> #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num 
> >> + 2]))
> >> 
> >> This loses the ability to coalesce by number of frames, but we can still
> >> do number of sg entries, as we do now with used_event, and we could
> >> change virtqueue_enable_cb_delayed() to take a precise number if we
> >> wanted.
> >
> > But do we expect delay to be update dynamically?
> > If not, why not stick it in config space?
> 
> Hmm, we could update it dynamically (and will, in the case of ethtool).
> But it won't be common, so we could append a field to
> virtio_pci_common_cfg for PCI.
> 
> I think MMIO and CCW would be easy to extend too, but CC'd to check.

As far as I understand the virtio_pci_common_cfg principle (just had a
look, for the first time ;-), it's now an equivalent of the MMIO control
registers block. I see no major problem with adding another one.

Or were you thinking about introducing some standard for the "real"
config space? (fine with me as well - the transport will have nothing to
do :-)

Paweł

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5] perf: Use monotonic clock as a source for timestamps

2015-02-03 Thread Pawel Moll

On Tue, 2015-02-03 at 08:30 +, ajh mls wrote:
> There is still
>
> http://marc.info/?l=linux-kernel&m=142141223902303

Uh. I have no idea why, but I haven't got this mail at all :-(

Thanks for pointing it out!

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5] perf: Use monotonic clock as a source for timestamps

2015-02-02 Thread Pawel Moll

Afternoon, Peter,

On Wed, 2015-01-21 at 20:27 +, Pawel Moll wrote:
> Until now, perf framework never defined the meaning of the timestamps
> captured as PERF_SAMPLE_TIME sample type. The values were obtaining
> from local (sched) clock, which is unavailable in userspace. This made
> it impossible to correlate perf data with any other events. Other
> tracing solutions have the source configurable (ftrace) or just share
> a common time domain between kernel and userspace (LTTng).
> 
> Follow the trend by using monotonic clock, which is readily available
> as POSIX CLOCK_MONOTONIC.
> 
> Also add a sysctl "perf_sample_time_clk_id" attribute (usually available
> as "/proc/sys/kernel/perf_sample_time_clk_id") which can be used by the
> user to obtain the clk_id to be used with POSIX clock API (eg.
> clock_gettime()) to obtain a time value comparable with perf samples.
> 
> Old behaviour can be restored by using "perf_use_local_clock" kernel
> parameter.
> 
> Signed-off-by: Pawel Moll 

I know that you're busy with other stuff, but it's already rc7 time
again... We can leave the other two patches from the series for later,
but how about getting this one merged for 3.20 and ending the 2 or 3
years long struggle? I'm not saying that everyone is happy about it, but
no one seems to be unhappy enough to speak :-)

Cheers!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 3/3] perf: Sample additional clock value

2015-01-23 Thread Pawel Moll

On Mon, 2015-01-05 at 13:45 +, Peter Zijlstra wrote:
> Also, one would expect something like:
> 
>   default: {
>   struct k_clock *kc = clockid_to_kclock(event->attr.clock);
>   struct timespec ts;
>   if (kc) {
>   kc->clock_get(event->attr.clock, &ts);
>   data->clock = ktime_to_ns(timespec_to_ktime(ts));
>   } else {
>   data->clock = 0;
>   }
>   }
> 
> Albeit preferably slightly less horrible -- of course, one would first
> need to deal with the NMI issue.

I was thinking about it... Maybe the solution is approaching the problem
in a completely different way.

As far as I understand (John?) POSIX timers can be used on any clockid?
So it would be possible to obtain a dynamic clock id, for example for my
exotic trace hardware (by any means necessary, like opening a char
device) and create a timer firing every 1 ms (in the trace time domain).
Than this event would be somehow associated with a perf session (for
example, by passing the timerid via perf's ioctl) and then, every when
timer fires, a perf record (something like PERF_RECORD_TIMER?)
containing the timer/clock's value *and* the normal perf timestamp,
would be injected into the circular buffer.

No issue with NMI, no issue with passing clockid through
perf_event_attr...

Does it make any sense to anyone else but me? ;-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5] perf: Use monotonic clock as a source for timestamps

2015-01-21 Thread Pawel Moll

Until now, perf framework never defined the meaning of the timestamps
captured as PERF_SAMPLE_TIME sample type. The values were obtaining
from local (sched) clock, which is unavailable in userspace. This made
it impossible to correlate perf data with any other events. Other
tracing solutions have the source configurable (ftrace) or just share
a common time domain between kernel and userspace (LTTng).

Follow the trend by using monotonic clock, which is readily available
as POSIX CLOCK_MONOTONIC.

Also add a sysctl "perf_sample_time_clk_id" attribute (usually available
as "/proc/sys/kernel/perf_sample_time_clk_id") which can be used by the
user to obtain the clk_id to be used with POSIX clock API (eg.
clock_gettime()) to obtain a time value comparable with perf samples.

Old behaviour can be restored by using "perf_use_local_clock" kernel
parameter.

Signed-off-by: Pawel Moll 
---
Changes since v4:

- using jump labels to reduce runtime overhead of the local/mono check

 Documentation/kernel-parameters.txt |  9 
 kernel/events/core.c| 43 -
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 4df73da..3cccd0e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -91,6 +91,7 @@ parameter is applicable:
NUMANUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
+   PERFPerformance events and counters support is enabled.
PV_OPS  A paravirtualized kernel is enabled.
PARIDE  The ParIDE (parallel port IDE) subsystem is enabled.
PARISC  The PA-RISC architecture is enabled.
@@ -2795,6 +2796,14 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
allocator.  This parameter is primarily for debugging
and performance comparison.
 
+   perf_use_local_clock
+   [PERF]
+   Use local_clock() as a source for perf timestamps
+   generation. This was be the default behaviour and
+   this parameter can be used to maintain backward
+   compatibility or on older hardware with expensive
+   monotonic clock source.
+
pf. [PARIDE]
See Documentation/blockdev/paride.txt.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 882f835..f45434d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "internal.h"
 
@@ -322,9 +324,43 @@ extern __weak const char *perf_pmu_name(void)
return "pmu";
 }
 
+static struct static_key perf_use_local_clock_key = STATIC_KEY_INIT_FALSE;
+static bool perf_use_local_clock_param __initdata;
+static int __init perf_use_local_clock_setup(char *__unused)
+{
+   perf_use_local_clock_param = true;
+   return 1;
+}
+__setup("perf_use_local_clock", perf_use_local_clock_setup);
+
+static int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
+
+static struct ctl_table perf_sample_time_kern_table[] = {
+   {
+   .procname   = "perf_sample_time_clk_id",
+   .data   = &sysctl_perf_sample_time_clk_id,
+   .maxlen = sizeof(int),
+   .mode   = 0444,
+   .proc_handler   = proc_dointvec,
+   },
+   {}
+};
+
+static struct ctl_table perf_sample_time_root_table[] = {
+   {
+   .procname   = "kernel",
+   .mode   = 0555,
+   .child  = perf_sample_time_kern_table,
+   },
+   {}
+};
+
 static inline u64 perf_clock(void)
 {
-   return local_clock();
+   if (static_key_false(&perf_use_local_clock_key))
+   return local_clock();
+   else
+   return ktime_get_mono_fast_ns();
 }
 
 static inline struct perf_cpu_context *
@@ -8271,6 +8307,11 @@ void __init perf_event_init(void)
 */
BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head))
 != 1024);
+
+   if (perf_use_local_clock_param)
+   static_key_slow_inc(&perf_use_local_clock_key);
+   else
+   register_sysctl_table(perf_sample_time_root_table);
 }
 
 static int __init perf_event_sysfs_init(void)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2015-01-21 Thread Pawel Moll

On Wed, 2015-01-21 at 19:48 +, Pawel Moll wrote:
> On Wed, 2015-01-21 at 15:52 +0000, Pawel Moll wrote:
> > On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote:
> > > On Thu, Nov 06, 2014 at 04:51:56PM +, Pawel Moll wrote:
> > > >  Documentation/kernel-parameters.txt |  9 +
> > > >  kernel/events/core.c| 37 
> > > > +
> > > >  2 files changed, 46 insertions(+)
> > > 
> > > > diff --git a/Documentation/kernel-parameters.txt 
> > > > b/Documentation/kernel-parameters.txt
> > > > index 4c81a86..8ead8d8 100644
> > > > --- a/Documentation/kernel-parameters.txt
> > > > +++ b/Documentation/kernel-parameters.txt
> > > 
> > > > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can 
> > > > also be entirely omitted.
> > > > allocator.  This parameter is primarily for 
> > > > debugging
> > > > and performance comparison.
> > > >  
> > > > +   perf_use_local_clock
> > > > +   [PERF]
> > > > +   Use local_clock() as a source for perf 
> > > > timestamps
> > > > +   generation. This was be the default behaviour 
> > > > and
> > > > +   this parameter can be used to maintain backward
> > > > +   compatibility or on older hardware with 
> > > > expensive
> > > > +   monotonic clock source.
> > > > +
> > > > pf. [PARIDE]
> > > > See Documentation/blockdev/paride.txt.
> > > 
> > > So I'm always terminally confused on the naming of kernel parameters,
> > > sometimes things are modules (even when they're not actually =m capable)
> > > and get a module::foo naming or so and sometimes they're not.
> > 
> > I guess you mean module.foo?
> > 
> > > So we want to use the module naming thing or not?
> > 
> > Honestly, I don't mind either way. For one thing ftrace doesn't bother
> > and just uses __setup() as well.
> 
> There's one more thing to this - as far as I remember, the module name
> is actually derived from the compilation unit name (at some level of
> Kbuild). I may be wrong (will have to double check), but a module
> parameter defined in kernel/events/core.c may be called something like
> "core.parameter" ;-).

Ok, so it's possible to enforce "perf." prefix:

/* You can override this manually, but generally this should match the
   module name. */
#ifdef MODULE
#define MODULE_PARAM_PREFIX /* empty */
#else
#define MODULE_PARAM_PREFIX KBUILD_MODNAME "."
#endif

So, perf_use_local_clock or perf.use_local_clock? :-)

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2015-01-21 Thread Pawel Moll

On Wed, 2015-01-21 at 15:52 +, Pawel Moll wrote:
> On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote:
> > On Thu, Nov 06, 2014 at 04:51:56PM +0000, Pawel Moll wrote:
> > >  Documentation/kernel-parameters.txt |  9 +
> > >  kernel/events/core.c| 37 
> > > +
> > >  2 files changed, 46 insertions(+)
> > 
> > > diff --git a/Documentation/kernel-parameters.txt 
> > > b/Documentation/kernel-parameters.txt
> > > index 4c81a86..8ead8d8 100644
> > > --- a/Documentation/kernel-parameters.txt
> > > +++ b/Documentation/kernel-parameters.txt
> > 
> > > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also 
> > > be entirely omitted.
> > >   allocator.  This parameter is primarily for debugging
> > >   and performance comparison.
> > >  
> > > + perf_use_local_clock
> > > + [PERF]
> > > + Use local_clock() as a source for perf timestamps
> > > + generation. This was be the default behaviour and
> > > + this parameter can be used to maintain backward
> > > + compatibility or on older hardware with expensive
> > > + monotonic clock source.
> > > +
> > >   pf. [PARIDE]
> > >   See Documentation/blockdev/paride.txt.
> > 
> > So I'm always terminally confused on the naming of kernel parameters,
> > sometimes things are modules (even when they're not actually =m capable)
> > and get a module::foo naming or so and sometimes they're not.
> 
> I guess you mean module.foo?
> 
> > So we want to use the module naming thing or not?
> 
> Honestly, I don't mind either way. For one thing ftrace doesn't bother
> and just uses __setup() as well.

There's one more thing to this - as far as I remember, the module name
is actually derived from the compilation unit name (at some level of
Kbuild). I may be wrong (will have to double check), but a module
parameter defined in kernel/events/core.c may be called something like
"core.parameter" ;-).

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 3/3] perf: Sample additional clock value

2015-01-21 Thread Pawel Moll

On Wed, 2015-01-21 at 17:44 +, John Stultz wrote:
> That said, there is the dynamic posix clockids. I'm not sure if it
> would make sense, but even if we don't bump MAX_CLOCKS, might there
> be some case where someone wants to use a dynamic posix clock for the
> perf reference?

If I remember correctly, last time I tried to use dynamic posix clocks
in the perf context, one needed to open a ptp character device in order
to get a file descriptor, than translated into a clockid_t value -
correct me if I'm wrong. But here you get the fd from the
sys_perf_open() and clock_*() doesn't know anything about such
descriptor.

I was looking into a way of associating a random clock with a random fd,
so that perf could "attach" itself to the clock API at will, but it
turned out not to be trivial (I'd have to dig through old threads to
remember all the nasty details).

The good thing is that it looks like the immediate need for this was no
more, with perf using monotonic clock as the clock source. It will come
back when we get into hardware trace correlation, but one step at a
time...

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 3/3] perf: Sample additional clock value

2015-01-21 Thread Pawel Moll

On Mon, 2015-01-05 at 13:45 +, Peter Zijlstra wrote:
> On Thu, Nov 06, 2014 at 04:51:58PM +0000, Pawel Moll wrote:
> > Currently three clocks are implemented: CLOCK_REALITME = 0,
> > CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is
> > 5 bits wide to allow for future extension to custom, non-POSIX clock
> > sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like
> > ARM CoreSight (hardware trace) timestamp generator.
> 
> > @@ -304,7 +305,16 @@ struct perf_event_attr {
> > mmap2  :  1, /* include mmap with inode 
> > data */
> > comm_exec  :  1, /* flag comm events that 
> > are due to an exec */
> > uevents:  1, /* allow uevents into the 
> > buffer */
> > -   __reserved_1   : 38;
> > +
> > +   /*
> > +* clock: one of the POSIX clock IDs:
> > +*
> > +* 0 - CLOCK_REALTIME
> > +* 1 - CLOCK_MONOTONIC
> > +* 4 - CLOCK_MONOTONIC_RAW
> > +*/
> > +   clock  :  5, /* clock type */
> > +   __reserved_1   : 33;
> >  
> > union {
> > __u32   wakeup_events;/* wakeup every n events */
> 
> This would put a constraint on actually changing MAX_CLOCKS, are the
> time people OK with that? Thomas, John?

I admit I have some doubts about it myself. But I don't think we can
afford full int for the clock id, can we?

> I'm also not quite sure of using the >MAX_CLOCKS space for 'special'
> clocks, preferably those would register themselves with the POSIX clock
> interface.

That may be a hard sell - John never was particularly keen on extending
the hard-coded clocks with random sources. And the hardware trace clock
I had on mind would be probably one of them - its meaning depends a lot
on the . Of course I'm looking forward to being surprised :-)

> > @@ -4631,6 +4637,24 @@ static void __perf_event_header__init_id(struct 
> > perf_event_header *header,
> > data->cpu_entry.cpu  = raw_smp_processor_id();
> > data->cpu_entry.reserved = 0;
> > }
> > +
> > +   if (sample_type & PERF_SAMPLE_CLOCK) {
> > +   switch (event->attr.clock) {
> > +   case CLOCK_REALTIME:
> > +   data->clock = ktime_get_real_ns();
> > +   break;
> > +   case CLOCK_MONOTONIC:
> > +   data->clock = ktime_get_mono_fast_ns();
> > +   break;
> > +   case CLOCK_MONOTONIC_RAW:
> > +   data->clock = ktime_get_raw_ns();
> > +   break;
> > +   default:
> > +   data->clock = 0;
> > +   break;
> > +   }
> > +   }
> > +
> >  }
> 
> This is broken. There is nothing stopping this from being called from
> NMI context and only the MONO one is NMI safe.

Uh, right. 

> Also, one would expect something like:
> 
>   default: {
>   struct k_clock *kc = clockid_to_kclock(event->attr.clock);
>   struct timespec ts;
>   if (kc) {
>   kc->clock_get(event->attr.clock, &ts);
>   data->clock = ktime_to_ns(timespec_to_ktime(ts));
>   } else {
>   data->clock = 0;
>   }
>   }
> 
> Albeit preferably slightly less horrible -- of course, one would first
> need to deal with the NMI issue.

Ok, generally this patch clearly needs more work. I'll have to revisit
what Thomas did with the monotonic clock to understand the potential
solutions.

Maybe, in this case, giving such a wide choice of clocks to be sampled
is not the right thing after all? Maybe, when we face the issue of a
trace clock for real, simple "PERF_SAMPLE_TRACE_CLOCK" will suffice?
With the monotonic clock as a normal time base merged (hopefully
soon :-) the correlation between kernel and user space (which was the
original reason for having this change) won't be an issue any more.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 2/3] perf: Userspace event

2015-01-21 Thread Pawel Moll

On Mon, 2015-01-05 at 13:12 +, Peter Zijlstra wrote:
> On Thu, Nov 06, 2014 at 04:51:57PM +0000, Pawel Moll wrote:
> > This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by
> > any process to inject custom data into perf data stream as a new
> > PERF_RECORD_UEVENT record, if such process is being observed or if it
> > is running on a CPU being observed by the perf framework.
> > 
> > The prctl call takes the following arguments:
> > 
> > prctl(PR_TASK_PERF_UEVENT, type, size, data, flags);
> > 
> > - type: a number meaning to describe content of the following data.
> >   Kernel does not pay attention to it and merely passes it further in
> >   the perf data, therefore its use must be agreed between the events
> >   producer (the process being observed) and the consumer (performance
> >   analysis tool). The perf userspace tool will contain a repository of
> >   "well known" types and reference implementation of their decoders.
> > - size: Length in bytes of the data.
> > - data: Pointer to the data.
> > - flags: Reserved for future use. Always pass zero.
> > 
> > Perf context that are supposed to receive events generated with the
> > prctl above must be opened with perf_event_attr.uevent set to 1. The
> > PERF_RECORD_UEVENT records consist of a standard perf event header,
> > 32-bit type value, 32-bit data size and the data itself, followed by
> > padding to align the overall record size to 8 bytes and optional,
> > standard sample_id field.
> > 
> > Example use cases:
> > 
> > - "perf_printf" like mechanism to add logging messages to perf data;
> >   in the simplest case it can be just
> > 
> > prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0);
> > 
> > - synchronisation of performance data generated in user space with the
> >   perf stream coming from the kernel. For example, the marker can be
> >   inserted by a JIT engine after it generated portion of the code, but
> >   before the code is executed for the first time, allowing the
> >   post-processor to pick the correct debugging information.
> 
> The think I remember being raised was a unified means of these msgs
> across perf/ftrace/lttng. I am not seeing that mentioned.

Right. I was considering the "well known types repository" an attempt in
this direction. Having said that - ftrace also takes a random blob as
the trace marker, so the unification has to happen in userspace anyway.
I'll have a look what LTTng has to say in this respect.

> Also, I would like a stronger rationale for the @type argument, if it
> has no actual meaning why is it separate from the binary msg data?

Valid point. Without type 0 defined as a string, it doesn't bring
anything into the equation. I just have a gut feeling that sooner than
later we will want to split the messages somehow. Maybe we should make
it a "reserved for future use, use 0 now" field?

* struct {
*  struct perf_event_headerheader;
*  u32 __reserved; /* always 0 */
*  u32 size;
*  chardata[size];
*  char__padding[-size & 7];
*  struct sample_idsample_id;
* };

or, probably even better, make it a version value at a known offset
(currently always 1, with just size and random sized data following).

* struct {
*  struct perf_event_headerheader;
*  u32 version; /* use 1 */
*  u32 size;
*  chardata[size];
*  char__padding[-size & 7];
*  struct sample_idsample_id;
* };

So that we can mutate the user events format without too much of the
pain - the parsers will simply complain about unknown format if such
occurs and with the size of the record in the header, it is possible to
skip it.

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2015-01-21 Thread Pawel Moll

On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote:
> On Thu, Nov 06, 2014 at 04:51:56PM +0000, Pawel Moll wrote:
> >  Documentation/kernel-parameters.txt |  9 +
> >  kernel/events/core.c| 37 
> > +
> >  2 files changed, 46 insertions(+)
> 
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index 4c81a86..8ead8d8 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> 
> > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > allocator.  This parameter is primarily for debugging
> > and performance comparison.
> >  
> > +   perf_use_local_clock
> > +   [PERF]
> > +   Use local_clock() as a source for perf timestamps
> > +   generation. This was be the default behaviour and
> > +   this parameter can be used to maintain backward
> > +   compatibility or on older hardware with expensive
> > +   monotonic clock source.
> > +
> > pf. [PARIDE]
> > See Documentation/blockdev/paride.txt.
> 
> So I'm always terminally confused on the naming of kernel parameters,
> sometimes things are modules (even when they're not actually =m capable)
> and get a module::foo naming or so and sometimes they're not.

I guess you mean module.foo?

> So we want to use the module naming thing or not?

Honestly, I don't mind either way. For one thing ftrace doesn't bother
and just uses __setup() as well.

> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 2b02c9f..5d0aa03 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> 
> > @@ -322,8 +323,41 @@ extern __weak const char *perf_pmu_name(void)
> > return "pmu";
> >  }
> >  
> > +static bool perf_use_local_clock;
> > +static int __init perf_use_local_clock_setup(char *__unused)
> > +{
> > +   perf_use_local_clock = true;
> > +   return 1;
> > +}
> > +__setup("perf_use_local_clock", perf_use_local_clock_setup);
> 
> >  static inline u64 perf_clock(void)
> >  {
> > +   if (likely(!perf_use_local_clock))
> > +   return ktime_get_mono_fast_ns();
> > +
> > return local_clock();
> >  }
> 
> Since this all is boot time, should we not use things like static_key
> and avoid the 'pointless' conditional at runtime?

Right. it's good to learn new stuff - I didn't know there was
architecture-agnostic support for jump labels. Definitely worth the
effort, will give it a try and spin the patch.

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2015-01-21 Thread Pawel Moll

On Mon, 2015-01-05 at 13:01 +, Peter Zijlstra wrote:
> On Thu, Dec 11, 2014 at 01:39:13PM +0000, Pawel Moll wrote:
> > On Thu, 2014-11-27 at 15:05 +0000, Pawel Moll wrote:
> > > It's been 3 weeks without any negative feedback (no feedback at all, but
> > > I take the optimistic view :-)...
> > > 
> > > How about queuing at least this patch alone for the incoming merge
> > > window? Or at least getting it into -next, with the view at 3.20?
> > 
> > It's been another two weeks... How about getting it queued for v3.20
> > then?
> 
> Yes sorry about that, I had me a kid and took a wee bit of time away
> from computers, then xmas and new years happened.

"a kid" as in: you have forked? :-) It will happen to me in 1.5 month
time (or earlier). All the best luck, we'll both need it ;-)

> In any case, best wishes for the new year to all.

Indeed.

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] mfd: vexpress: Remove non-DT code

2015-01-20 Thread Pawel Moll

Now, as all VE platforms have to be booted with DT,
the code handling non-DT case can be removed.

Signed-off-by: Pawel Moll 
---
 drivers/mfd/vexpress-sysreg.c | 71 +--
 1 file changed, 15 insertions(+), 56 deletions(-)

diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c
index 8f43ab8..3e628df 100644
--- a/drivers/mfd/vexpress-sysreg.c
+++ b/drivers/mfd/vexpress-sysreg.c
@@ -47,71 +47,26 @@
 #define SYS_HBI_MASK   0xfff
 #define SYS_PROCIDx_HBI_SHIFT  0
 
-#define SYS_MCI_CARDIN (1 << 0)
-#define SYS_MCI_WPROT  (1 << 1)
-
 #define SYS_MISC_MASTERSITE(1 << 14)
 
-
-static void __iomem *__vexpress_sysreg_base;
-
-static void __iomem *vexpress_sysreg_base(void)
+void vexpress_flags_set(u32 data)
 {
-   if (!__vexpress_sysreg_base) {
+   static void __iomem *base;
+
+   if (!base) {
struct device_node *node = of_find_compatible_node(NULL, NULL,
"arm,vexpress-sysreg");
 
-   __vexpress_sysreg_base = of_iomap(node, 0);
+   base = of_iomap(node, 0);
}
 
-   WARN_ON(!__vexpress_sysreg_base);
-
-   return __vexpress_sysreg_base;
-}
-
-
-static int vexpress_sysreg_get_master(void)
-{
-   if (readl(vexpress_sysreg_base() + SYS_MISC) & SYS_MISC_MASTERSITE)
-   return VEXPRESS_SITE_DB2;
-
-   return VEXPRESS_SITE_DB1;
-}
-
-void vexpress_flags_set(u32 data)
-{
-   writel(~0, vexpress_sysreg_base() + SYS_FLAGSCLR);
-   writel(data, vexpress_sysreg_base() + SYS_FLAGSSET);
-}
-
-unsigned int vexpress_get_mci_cardin(struct device *dev)
-{
-   return readl(vexpress_sysreg_base() + SYS_MCI) & SYS_MCI_CARDIN;
-}
-
-u32 vexpress_get_procid(int site)
-{
-   if (site == VEXPRESS_SITE_MASTER)
-   site = vexpress_sysreg_get_master();
+   if (WARN_ON(!base))
+   return;
 
-   return readl(vexpress_sysreg_base() + (site == VEXPRESS_SITE_DB1 ?
-   SYS_PROCID0 : SYS_PROCID1));
+   writel(~0, base + SYS_FLAGSCLR);
+   writel(data, base + SYS_FLAGSSET);
 }
 
-void __iomem *vexpress_get_24mhz_clock_base(void)
-{
-   return vexpress_sysreg_base() + SYS_24MHZ;
-}
-
-
-void __init vexpress_sysreg_early_init(void __iomem *base)
-{
-   __vexpress_sysreg_base = base;
-
-   vexpress_config_set_master(vexpress_sysreg_get_master());
-}
-
-
 /* The sysreg block is just a random collection of various functions... */
 
 static struct syscon_platform_data vexpress_sysreg_sys_id_pdata = {
@@ -210,6 +165,7 @@ static int vexpress_sysreg_probe(struct platform_device 
*pdev)
struct resource *mem;
void __iomem *base;
struct bgpio_chip *mmc_gpio_chip;
+   int master;
u32 dt_hbi;
 
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -220,11 +176,14 @@ static int vexpress_sysreg_probe(struct platform_device 
*pdev)
if (!base)
return -ENOMEM;
 
-   vexpress_config_set_master(vexpress_sysreg_get_master());
+   master = readl(base + SYS_MISC) & SYS_MISC_MASTERSITE ?
+   VEXPRESS_SITE_DB2 : VEXPRESS_SITE_DB1;
+   vexpress_config_set_master(master);
 
/* Confirm board type against DT property, if available */
if (of_property_read_u32(of_root, "arm,hbi", &dt_hbi) == 0) {
-   u32 id = vexpress_get_procid(VEXPRESS_SITE_MASTER);
+   u32 id = readl(base + (master == VEXPRESS_SITE_DB1 ?
+SYS_PROCID0 : SYS_PROCID1));
u32 hbi = (id >> SYS_PROCIDx_HBI_SHIFT) & SYS_HBI_MASK;
 
if (WARN_ON(dt_hbi != hbi))
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] power/reset: vexpress: Remove non-DT code

2015-01-20 Thread Pawel Moll

Now, as all VE platforms have to be booted with DT,
the code handling non-DT case can be removed.

Signed-off-by: Pawel Moll 
---
 drivers/power/reset/vexpress-poweroff.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/power/reset/vexpress-poweroff.c 
b/drivers/power/reset/vexpress-poweroff.c
index 9dfc9ce..be12d9b 100644
--- a/drivers/power/reset/vexpress-poweroff.c
+++ b/drivers/power/reset/vexpress-poweroff.c
@@ -111,23 +111,20 @@ static int _vexpress_register_restart_handler(struct 
device *dev)
 
 static int vexpress_reset_probe(struct platform_device *pdev)
 {
-   enum vexpress_reset_func func;
const struct of_device_id *match =
of_match_device(vexpress_reset_of_match, &pdev->dev);
struct regmap *regmap;
int ret = 0;
 
-   if (match)
-   func = (enum vexpress_reset_func)match->data;
-   else
-   func = pdev->id_entry->driver_data;
+   if (!match)
+   return -EINVAL;
 
regmap = devm_regmap_init_vexpress_config(&pdev->dev);
if (IS_ERR(regmap))
return PTR_ERR(regmap);
dev_set_drvdata(&pdev->dev, regmap);
 
-   switch (func) {
+   switch ((enum vexpress_reset_func)match->data) {
case FUNC_SHUTDOWN:
vexpress_power_off_device = &pdev->dev;
pm_power_off = vexpress_power_off;
@@ -144,20 +141,12 @@ static int vexpress_reset_probe(struct platform_device 
*pdev)
return ret;
 }
 
-static const struct platform_device_id vexpress_reset_id_table[] = {
-   { .name = "vexpress-reset", .driver_data = FUNC_RESET, },
-   { .name = "vexpress-shutdown", .driver_data = FUNC_SHUTDOWN, },
-   { .name = "vexpress-reboot", .driver_data = FUNC_REBOOT, },
-   {}
-};
-
 static struct platform_driver vexpress_reset_driver = {
.probe = vexpress_reset_probe,
.driver = {
.name = "vexpress-reset",
.of_match_table = vexpress_reset_of_match,
},
-   .id_table = vexpress_reset_id_table,
 };
 
 static int __init vexpress_reset_init(void)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2014-12-11 Thread Pawel Moll

On Thu, 2014-11-27 at 15:05 +, Pawel Moll wrote:
> On Thu, 2014-11-06 at 16:51 +0000, Pawel Moll wrote:
> > Until now, perf framework never defined the meaning of the timestamps
> > captured as PERF_SAMPLE_TIME sample type. The values were obtaining
> > from local (sched) clock, which is unavailable in userspace. This made
> > it impossible to correlate perf data with any other events. Other
> > tracing solutions have the source configurable (ftrace) or just share
> > a common time domain between kernel and userspace (LTTng).
> > 
> > Follow the trend by using monotonic clock, which is readily available
> > as POSIX CLOCK_MONOTONIC.
> > 
> > Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
> > by the user to obtain the clk_id to be used with POSIX clock API (eg.
> > clock_gettime()) to obtain a time value comparable with perf samples.
> > 
> > Old behaviour can be restored by using "perf_use_local_clock" kernel
> > parameter.
> > 
> > Signed-off-by: Pawel Moll 
> 
> It's been 3 weeks without any negative feedback (no feedback at all, but
> I take the optimistic view :-)...
> 
> How about queuing at least this patch alone for the incoming merge
> window? Or at least getting it into -next, with the view at 3.20?

It's been another two weeks... How about getting it queued for v3.20
then?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] MAINTAINERS: ARM Versatile Express platform

2014-12-01 Thread Pawel Moll

On Fri, 2014-11-28 at 17:56 +, Joe Perches wrote:
> On Wed, 2014-11-26 at 12:13 +0000, Pawel Moll wrote:
> > This patch adds a separate section for the ARM
> > Versatile Express platform maintainers, listing
> > all different bits and bobs used by it.
> > 
> > Acked-by: Liviu Dudau 
> > Acked-by: Sudeep Holla 
> > Acked-by: Lorenzo Pieralisi 
> > Signed-off-by: Pawel Moll 
> > Acked-by: Michael Turquette 
> > ---
> >  MAINTAINERS | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0ff630d..924454f 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1503,6 +1503,18 @@ S:   Maintained
> >  F: drivers/clk/ux500/
> >  F: include/linux/platform_data/clk-ux500.h
> >  
> > +ARM/VERSATILE EXPRESS PLATFORM
> > +M: Liviu Dudau 
> > +M: Sudeep Holla 
> > +M: Lorenzo Pieralisi 
> > +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
> > +S: Maintained
> > +F: arch/arm/boot/dts/vexpress*
> > +F: arch/arm/mach-vexpress/
> > +F: */*/vexpress*
> > +F: drivers/clk/versatile/clk-vexpress-osc.c
> > +F: drivers/clocksource/versatile.c
> 
> Your original submission had:
> 
> F: drivers/power/reset/vexpress-poweroff.c
> 
> This file is not matched by:
> 
> F:*/*/vexpress*

Oh boy, of course. I'll send fix in a second.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code

2014-11-28 Thread Pawel Moll

On Fri, 2014-11-28 at 16:31 +, Arnd Bergmann wrote:
> I would suggest applying your patch from
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/223426/focus=224128

Yes, please :-)

Do you want me to resend?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code

2014-11-28 Thread Pawel Moll

On Fri, 2014-11-28 at 15:30 +, Russell King - ARM Linux wrote:
> mmci-pl18x 10005000.mmci: no support for card's volts
> mmc0: error -22 whilst initialising SD card

Just tried it again:

--8<
ARM Versatile Express Boot Monitor
Version:V5.2.1
Build Date: Apr  4 2013
Daughterboard Site 1: V2P-CA9 Cortex A9
Daughterboard Site 2: Not Used
Running boot script from flash - BOOTSCRIPT
Loaded FDT - ca9
Loaded initrd - busybox
Copied 0x40 bytes from 0x4800 to 0x60008000
Booting kernel @ 0x60008000
Command line 'console=ttyAMA0,38400 earlyprintk debug '
initrd @ 0x7000 (4127012 bytes)
FDT @ 0x6900 (14048 bytes)
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Initializing cgroup subsys cpuset
Linux version 3.18.0-rc6+ (pawmol01@hornet) (gcc version 4.9.2 20140904 
(prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09) 
) #1442 SMP Mon Nov 24 16:43:30 GMT 2014
CPU: ARMv7 Processor [410fc091] revision 1 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: V2P-CA9
bootconsole [earlycon0] enabled
[...]
mmci-pl18x 10005000.mmci: Got CD GPIO
mmci-pl18x 10005000.mmci: Got WP GPIO
mmci-pl18x 10005000.mmci: Looking up vmmc-supply from device tree
mmci-pl18x 10005000.mmci: Looking up vqmmc-supply from device tree
mmci-pl18x 10005000.mmci: Looking up vqmmc-supply property in node 
/smb/motherboard/iofpga@7,/mmci@05000 failed
mmci-pl18x 10005000.mmci: No vqmmc regulator found
mmci-pl18x 10005000.mmci: mmc0: PL180 manf 41 rev0 at 0x10005000 irq 41,42 (pio)
[...]
mmc0: new SD card at address 88f7
mmcblk0: mmc0:88f7 SD02G 1.84 GiB 
 mmcblk0: p1 p2
[...]
/ # mount /dev/mmcblk0p2 /mnt
EXT2-fs (mmcblk0p2): warning: mounting unchecked fs, running e2fsck is 
recommended
--8<

vmmc-supply is a defined as:

vmmc-supply = <&v2m_fixed_3v3>;

which is:

v2m_fixed_3v3: fixedregulator@0 {
compatible = "regulator-fixed";

and arch/arm/mach-vexpress/Kconfig says:

menuconfig ARCH_VEXPRESS
[...]
select REGULATOR_FIXED_VOLTAGE if REGULATOR

but it doesn't select REGULATOR in result of this discussion:

http://thread.gmane.org/gmane.linux.ports.arm.kernel/223426/focus=223427

To summarize, I believe that setting CONFIG_REGULATOR=y (as in
vexpress_defconfig) should solve your issue.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2014-11-27 Thread Pawel Moll

On Thu, 2014-11-06 at 16:51 +, Pawel Moll wrote:
> Until now, perf framework never defined the meaning of the timestamps
> captured as PERF_SAMPLE_TIME sample type. The values were obtaining
> from local (sched) clock, which is unavailable in userspace. This made
> it impossible to correlate perf data with any other events. Other
> tracing solutions have the source configurable (ftrace) or just share
> a common time domain between kernel and userspace (LTTng).
> 
> Follow the trend by using monotonic clock, which is readily available
> as POSIX CLOCK_MONOTONIC.
> 
> Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
> by the user to obtain the clk_id to be used with POSIX clock API (eg.
> clock_gettime()) to obtain a time value comparable with perf samples.
> 
> Old behaviour can be restored by using "perf_use_local_clock" kernel
> parameter.
> 
> Signed-off-by: Pawel Moll 

It's been 3 weeks without any negative feedback (no feedback at all, but
I take the optimistic view :-)...

How about queuing at least this patch alone for the incoming merge
window? Or at least getting it into -next, with the view at 3.20?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] MAINTAINERS: ARM Versatile Express platform

2014-11-26 Thread Pawel Moll

On Wed, 2014-11-26 at 12:17 +, Arnd Bergmann wrote:
> On Wednesday 26 November 2014 12:04:09 Pawel Moll wrote:
> > 
> > Right, of course. Whole directory.
> > 
> > > F:*/*/vexpress*
> > 
> > Cool, didn't think about multiple wildcards. 
> 
> Actually, I think you could even do this as
> 
> N:vexpress
> 
> which would be even shorter and match most of the other entries too.

If you think it's safe, fine with me. It would reduce this section to:

N:  vexpress
F:  drivers/clocksource/versatile.c

I'll spin the patch again.

Pawel




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] MAINTAINERS: ARM Versatile Express platform

2014-11-26 Thread Pawel Moll

This patch adds a separate section for the ARM
Versatile Express platform maintainers, listing
all different bits and bobs used by it.

Acked-by: Liviu Dudau 
Acked-by: Sudeep Holla 
Acked-by: Lorenzo Pieralisi 
Signed-off-by: Pawel Moll 
Acked-by: Michael Turquette 
---
 MAINTAINERS | 12 
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ff630d..924454f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1503,6 +1503,18 @@ S:   Maintained
 F: drivers/clk/ux500/
 F: include/linux/platform_data/clk-ux500.h
 
+ARM/VERSATILE EXPRESS PLATFORM
+M: Liviu Dudau 
+M: Sudeep Holla 
+M: Lorenzo Pieralisi 
+L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+F: arch/arm/boot/dts/vexpress*
+F: arch/arm/mach-vexpress/
+F: */*/vexpress*
+F: drivers/clk/versatile/clk-vexpress-osc.c
+F: drivers/clocksource/versatile.c
+
 ARM/VFP SUPPORT
 M: Russell King 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] MAINTAINERS: ARM Versatile Express platform

2014-11-26 Thread Pawel Moll

On Wed, 2014-11-26 at 01:59 +, Joe Perches wrote:
> On Tue, 2014-11-25 at 17:54 -0800, Mike Turquette wrote:
> > Quoting Pawel Moll (2014-11-25 10:17:35)
> > > This patch adds a separate section for the ARM
> > > Versatile Express platform maintainers, listing
> > > all different bits and bobs used by it.
> []
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> []
> > > +ARM/VERSATILE EXPRESS PLATFORM
> []
> > > +F: arch/arm/boot/dts/vexpress-*
> > > +F: arch/arm/mach-vexpress/*
> > > +F: drivers/bus/vexpress-config.c
> > > +F: drivers/clk/versatile/clk-vexpress-osc.c
> > > +F: drivers/clocksource/versatile.c
> > > +F: drivers/cpufreq/vexpress-spc-cpufreq.c
> > > +F: drivers/hwmon/vexpress.c
> > > +F: drivers/mfd/vexpress-sysreg.c
> > > +F: drivers/misc/vexpress-vexpress-syscfg.c
> > > +F: drivers/power/reset/vexpress-poweroff.c
> > > +F: drivers/regulator/vexpress.c
> > > +F: include/linux/vexpress.h
> > >
> 
> maybe
> F:arch/arm/boot/dts/vexpress*

Ok.

> F:arch/arm/mach-vexpress/

Right, of course. Whole directory.

> F:*/*/vexpress*

Cool, didn't think about multiple wildcards. 

> F:*/*/*/*vexpress*

I'd keep the clk-vexpress-osc.c explicit - it's an odd one, and don't
feel comfortable with the wildcard before vexpress.

> F:drivers/clocksource/versatile.c

Thanks for the suggestion, sill spin the patch in a second.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] MAINTAINERS: ARM Versatile Express platform

2014-11-25 Thread Pawel Moll

This patch adds a separate section for the ARM
Versatile Express platform maintainers, listing
all different bits and bobs used by it.

Acked-by: Liviu Dudau 
Acked-by: Sudeep Holla 
Acked-by: Lorenzo Pieralisi 
Signed-off-by: Pawel Moll 
---
 MAINTAINERS | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ff630d..e0cb002 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1503,6 +1503,25 @@ S:   Maintained
 F: drivers/clk/ux500/
 F: include/linux/platform_data/clk-ux500.h
 
+ARM/VERSATILE EXPRESS PLATFORM
+M: Liviu Dudau 
+M: Sudeep Holla 
+M: Lorenzo Pieralisi 
+L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+F: arch/arm/boot/dts/vexpress-*
+F: arch/arm/mach-vexpress/*
+F: drivers/bus/vexpress-config.c
+F: drivers/clk/versatile/clk-vexpress-osc.c
+F: drivers/clocksource/versatile.c
+F: drivers/cpufreq/vexpress-spc-cpufreq.c
+F: drivers/hwmon/vexpress.c
+F: drivers/mfd/vexpress-sysreg.c
+F: drivers/misc/vexpress-vexpress-syscfg.c
+F: drivers/power/reset/vexpress-poweroff.c
+F: drivers/regulator/vexpress.c
+F: include/linux/vexpress.h
+
 ARM/VFP SUPPORT
 M: Russell King 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code

2014-11-25 Thread Pawel Moll

Now, with the CLCD DT support available, there is no
more reason to keep the non-DT support for V2P-CA9.

Removed, together with "some" supporting code. It was
necessary to make PLAT_VERSATILE_SCHED_CLOCK optional
and selected by the machines still interested in it.

Acked-by: Mike Turquette 
Signed-off-by: Pawel Moll 
---
Changes since v1:

- Removed MFD and power/reset changes from the patch;
  will be posted separately after this is merged.

 arch/arm/Kconfig  |   2 +
 arch/arm/mach-vexpress/Kconfig|   3 -
 arch/arm/mach-vexpress/Makefile   |   3 +-
 arch/arm/mach-vexpress/core.h |   7 -
 arch/arm/mach-vexpress/ct-ca9x4.c | 212 
 arch/arm/mach-vexpress/include/mach/ct-ca9x4.h|  47 ---
 arch/arm/mach-vexpress/include/mach/hardware.h|   1 -
 arch/arm/mach-vexpress/include/mach/irqs.h|   6 -
 arch/arm/mach-vexpress/include/mach/motherboard.h |  88 -
 arch/arm/mach-vexpress/platsmp.c  |  42 ---
 arch/arm/mach-vexpress/v2m.c  | 374 --
 arch/arm/plat-versatile/Kconfig   |   2 +-
 drivers/clk/versatile/Makefile|   1 -
 drivers/clk/versatile/clk-vexpress-osc.c  |   7 -
 drivers/clk/versatile/clk-vexpress.c  |  86 -
 drivers/misc/vexpress-syscfg.c|  60 +---
 include/linux/vexpress.h  |  19 --
 17 files changed, 17 insertions(+), 943 deletions(-)
 delete mode 100644 arch/arm/mach-vexpress/ct-ca9x4.c
 delete mode 100644 arch/arm/mach-vexpress/include/mach/ct-ca9x4.h
 delete mode 100644 arch/arm/mach-vexpress/include/mach/hardware.h
 delete mode 100644 arch/arm/mach-vexpress/include/mach/irqs.h
 delete mode 100644 arch/arm/mach-vexpress/include/mach/motherboard.h
 delete mode 100644 drivers/clk/versatile/clk-vexpress.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 89c4b5c..6c6fdb8 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -350,6 +350,7 @@ config ARCH_REALVIEW
select ICST
select NEED_MACH_MEMORY_H
select PLAT_VERSATILE
+   select PLAT_VERSATILE_SCHED_CLOCK
help
  This enables support for ARM Ltd RealView boards.
 
@@ -365,6 +366,7 @@ config ARCH_VERSATILE
select ICST
select PLAT_VERSATILE
select PLAT_VERSATILE_CLOCK
+   select PLAT_VERSATILE_SCHED_CLOCK
select VERSATILE_FPGA_IRQ
help
  This enables support for ARM Ltd Versatile board.
diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig
index b2cfba16c..9a96bab 100644
--- a/arch/arm/mach-vexpress/Kconfig
+++ b/arch/arm/mach-vexpress/Kconfig
@@ -49,9 +49,6 @@ config ARCH_VEXPRESS_CORTEX_A5_A9_ERRATA
  build a working kernel, you must also enable relevant core
  tile support or Flattened Device Tree based support options.
 
-config ARCH_VEXPRESS_CA9X4
-   bool "Versatile Express Cortex-A9x4 tile"
-
 config ARCH_VEXPRESS_DCSCB
bool "Dual Cluster System Control Block (DCSCB) support"
depends on MCPM
diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile
index fc649bc..f5c1006 100644
--- a/arch/arm/mach-vexpress/Makefile
+++ b/arch/arm/mach-vexpress/Makefile
@@ -1,11 +1,10 @@
 #
 # Makefile for the linux kernel.
 #
-ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \
+ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := \
-I$(srctree)/arch/arm/plat-versatile/include
 
 obj-y  := v2m.o
-obj-$(CONFIG_ARCH_VEXPRESS_CA9X4)  += ct-ca9x4.o
 obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)  += dcscb.o  dcscb_setup.o
 CFLAGS_dcscb.o += -march=armv7-a
 CFLAGS_REMOVE_dcscb.o  = -pg
diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
index 152fad9..2a11d3a 100644
--- a/arch/arm/mach-vexpress/core.h
+++ b/arch/arm/mach-vexpress/core.h
@@ -1,12 +1,5 @@
-/* 2MB large area for motherboard's peripherals static mapping */
-#define V2M_PERIPH 0xf800
-
-/* Tile's peripherals static mappings should start here */
-#define V2T_PERIPH 0xf820
-
 bool vexpress_smp_init_ops(void);
 
-extern struct smp_operations   vexpress_smp_ops;
 extern struct smp_operations   vexpress_smp_dt_ops;
 
 extern void vexpress_cpu_die(unsigned int cpu);
diff --git a/arch/arm/mach-vexpress/ct-ca9x4.c 
b/arch/arm/mach-vexpress/ct-ca9x4.c
deleted file mode 100644
index 27bea04..000
--- a/arch/arm/mach-vexpress/ct-ca9x4.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/*
- * Versatile Express Core Tile Cortex A9x4 Support
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-#include 
-
-#include 
-#include 
-
-#include "core.h"
-
-#include 
-#i

Re: [PATCH] video: ARM CLCD: Remove duplicated include in amba-clcd.c

2014-11-17 Thread Pawel Moll

On Mon, 2014-11-17 at 09:25 +, Qiang Chen wrote:
> This patch fixes duplicated include dma-mapping.h in
> drivers/video/fbdev/amba-clcd.c
> 
> Signed-off-by: Qiang Chen 
> ---
>  drivers/video/fbdev/amba-clcd.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/amba-clcd.c b/drivers/video/fbdev/amba-clcd.c
> index 6ad23bd..32c0b6b 100644
> --- a/drivers/video/fbdev/amba-clcd.c
> +++ b/drivers/video/fbdev/amba-clcd.c
> @@ -27,7 +27,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

Acked-by: Pawel Moll 

Thanks!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-14 Thread Pawel Moll

On Thu, 2014-11-13 at 09:39 +, Shannon Zhao wrote:
> When we use only virtio-mmio or vhost-net without irqfd, the device uses 
> qemu_set_irq(within qemu)
> to inject interrupt and at the same time qemu update 
> "VIRTIO_MMIO_INTERRUPT_STATUS" to tell guest
> driver whom this interrupt to. All these things are done by qemu. Though 
> qemu_set_irq will call
> kvm_set_irq to do the real interrupt inject, but the trigger is qemu and it 
> can update the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register before injecting the interrupt.
> 
> But if we use vhost-net with irqfd, the device uses ioeventfd mechanism to 
> inject interrupt.
> When an interrupt happened, it doesn't transfer to qemu, while the irqfd 
> finally call kvm_set_irq
> to inject the interrupt directly. All these things are done in kvm and it 
> can't update the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register.
> So if the guest driver still uses the old interrupt handler, it has to read 
> the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register to get the interrupt reason and run 
> different handlers
> for different reasons. But the register has nothing and none of handlers will 
> be called.
> 
> I make myself clear? :-)

I see... (well, at least I believe I see ;-)

Of course this means that with irqfd, the device is simply non-compliant
with the spec. And that extending the status register doesn't help you
at all, so we can drop this idea.

Paradoxically, it's a good news (of a sort :-) because I don't think we
should be doing such a fundamental change in the spec at this date. I'll
discuss it with others in the TC and I'm open to be convinced otherwise,
but I believe the spec should stay as it is.

Having said that, I see no problem with experimenting with the device
model so the next version of the standard is extended. Two suggestions I
have would be:

1. Drop the virtio-pci like case with two interrupts (one for config
changes, one for all queues), as it doesn't bring anything new. Just
make it all interrupts individual.

2. Change the way the interrupts are acknowledge - instead of a bitmask,
have a register which takes a queue number to ack the queue's interrupt
and ~0 to ack the config interrupt.

3. Change the version of the device to (intially) 0x8003 - I've just
made an executive decision :-) that non-standard compliant devices
should have the MSB of the version number set (I'll propose to reserve
this range of versions in future version of the spec). We'll consider it
a "prototype of the version 3". Then make the driver behave in the new
way when (and only when) such device version is observed.

Also, I remembered I haven't addressed one of your previous comments:

On Wed, 2014-11-12 at 08:32 +, Shannon Zhao wrote:
> > One point I'd like to make is that the device was intentionally designed
> > with simplicity in mind first, performance later (something about
> > "embedded" etc" :-). Changing this assumption is of course possible, but
> Ah, I think ARM is not only about embedded things. Maybe it could has
> a wider application > such as micro server. Just my personal opinion.

By all means, I couldn't agree more. But there's one thing you have to
keep in mind - it doesn't matter whether the real hardware has got PCI
or not, but what is emulated by qemu/KVM. Therefore, if only you can get
the PCI host controller working in the qemu virt machine (which should
be much simpler now, as we have generic support for PCI
controllers/bridges in the kernel now), you'll be able to forget the
issue of virtio-mmio and multiple interrupts and still enjoy your
performance gains :-)

Does it all make sense?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-12 Thread Pawel Moll

On Wed, 2014-11-12 at 08:32 +, Shannon Zhao wrote:
> On 2014/11/11 23:11, Pawel Moll wrote:
> > On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote:
> >> As the current virtio-mmio only support single irq,
> >> so some advanced features such as vhost-net with irqfd
> >> are not supported. And the net performance is not
> >> the best without vhost-net and irqfd supporting.
> > 
> > Could you, please, help understanding me where does the main issue is?
> > Is it about:
> > 
> > 1. The fact that the existing implementation blindly kicks all queues,
> > instead only of the updated ones?
> > 
> > or:
> > 
> > 2. Literally having a dedicated interrupt line (remember, we're talking
> > "real" interrupts here, not message signalled ones) per queue, so they
> > can be handled by different processors at the same time?
> > 
> The main issue is that current virtio-mmio only support one interrupt which 
> is shared by
> config and queues. 

Yes.

Additional question: is your device changing the configuration space
often? Other devices (for example block devices) change configuration
very rarely (eg. change of the media size, usually meaning
hot-un-plugging), so unless you tell me that the config space is
frequently changes, I'll consider the config event irrelevant from the
performance point of view.

> Therefore the virtio-mmio driver should read the
> "VIRTIO_MMIO_INTERRUPT_STATUS" to get the interrupt reason and check whom 
> this interrupt is to.

Correct.

> If we use vhost-net which uses irqfd to inject interrupt, the
> vhost-net doesn't update "VIRTIO_MMIO_INTERRUPT_STATUS", then the
> guest driver can't read the interrupt reason and doesn't call a
> handler to process.

Ok, so this is the bit I don't understand. Explain, please how does this
whole thing work. And keep in mind that I've just looked up "irqfd" for
the first time in my life, so know nothing about its implementation. The
same applies to "vhost-net". I'm simply coming from a completely
different background, so bear with me on this.

Now, the virtio-mmio device *must* behave in a certain way, as specified
in both the old and the new version of the spec. If a Used Ring of *any*
of the queues has been updated the device *must* set bit 0 of the
interrupt status register, and - in effect - generate the interrupt.

But you are telling me that in some circumstances the interrupt is *not*
generated when a queue requires handling. Sounds like a bug to me, as it
is not following the requirements of the device specification.

It is quite likely that I simply don't see some facts which are obvious
for you, so please - help me understand.

> So we can assign a dedicated interrupt line per queue for virtio-mmio
> and it can work with irqfd.
> 
If my reasoning above is correct, I'd say that you are just working
around a functional bug, not improving performance. And this is a wrong
way of solving the problem.

> Theoretically the number of interrupts has no limit, but as the limit
> of ARM interrupt line, the number should  be less than ARM interrupt
> lines. 

Let me just emphasize the fact that virtio-mmio is not related to ARM in
any way, it's just a memory mapped device with a generic interrupt
output. Any limitations of a particular hardware platform (I guess
you're referring to the maximum number of peripheral interrupts a GIC
can take) are irrelevant - if we go with multiple interrupts, it must
cater for as many interrupt lines as one wishes... :-)

> In the real situation, I think, the number
> is generally less than 17 (8 pairs of vring interrupts and one config
> interrupt).

... but of course we could optimize for the real scenarios. And I'm glad
to hear that the number you quoted is less then 30 :-)

> >> we use GSI for multiple irq. 
> > 
> > I'm not sure what GSI stands for, but looking at the code I assume it's
> > just a "normal" peripheral interrupt.
> > 
> >> In this patch we use "vm_try_to_find_vqs"
> >> to check whether multiple irqs are supported like
> >> virtio-pci.
> > 
> > Yeah, I can see that you have followed virtio-pci quite literally. I'm
> > particularly not convinced to the one interrupt for config, one for all
> > queues option. Doesn't make any sense to me here.
> > 
> About one interrupt for all queues, it's not a typical case. But just offer
> one more choice for users. Users should configure the number of interrupts
> according to their situation.

> >> Is this the right direction? is there other ways to
> >> make virtio-mmio support multiple irq? Hope for feed

Re: [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-11 Thread Pawel Moll

On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote:
> As the current virtio-mmio only support single irq,
> so some advanced features such as vhost-net with irqfd
> are not supported. And the net performance is not
> the best without vhost-net and irqfd supporting.

Could you, please, help understanding me where does the main issue is?
Is it about:

1. The fact that the existing implementation blindly kicks all queues,
instead only of the updated ones?

or:

2. Literally having a dedicated interrupt line (remember, we're talking
"real" interrupts here, not message signalled ones) per queue, so they
can be handled by different processors at the same time?

Now, if it's only about 1, the simplest solution would be to extend the
VIRTIO_MMIO_INTERRUPT_STATUS register to signal up to 30 queues
"readiness" in bits 2-31, still keeping bit 0 as a "combined"
VIRTIO_MMIO_INT_VRING. In case when VIRTIO_MMIO_INT_VRING is set and
none of the "individual" bits is (a device which doesn't support this
feature or one that has more than 30 queues and of of those is ready) we
would fall back to the original "kick all queues" approach. This could
be a useful (and pretty simple) extension. In the worst case scenario it
could be a post-1.0 standard addition, as it would provide backward
compatibility.

However, if it's about 2, we're talking larger changes here. From the
device perspective, we can define it as having per-queue (plus one for
config) interrupt output *and* a "combined" output, being simple logical
"or" of all the others. Then, the Device Tree bindings would be used to
express the implementation choices (I'd keep the kernel parameter
approach supporting the single interrupt case only). This is a very
popular and well understood approach for memory mapped peripherals (for
example, see the . It allows the system integrator to make a decision
when it's coming to latency vs number interrupt lines trade off. The
main issue is that we can't really impose a limit on a number of queues,
therefore on a number of interrupts. This would require adding a new
"interrupt acknowledge" register, which would take a number of the queue
(or a symbolic value for the config one) instead of a bit mask. And I
must say that I'm not enjoying the idea of such substantial change to
the specification that late in the process... (in other words: you'll
have to put extra effort into convincing me :-)

> This patch support virtio-mmio to request multiple
> irqs like virtio-pci. With this patch and qemu assigning
> multiple irqs for virtio-mmio device, it's ok to use
> vhost-net with irqfd on arm/arm64.

Could you please tell me how many queues (interrupts) are we talking
about in this case? 5? A dozen? Hundreds?

Disclaimer: I have no personal experience with virtio and network (due
to the fact how our Fast Models are implemented, I mostly us block
devices and 9p protocol over virtio and I get enough performance from
them :-).

> As arm doesn't support msi-x now, 

To be precise: "ARM" does "support" MSI-X :-) (google for GICv2m)

The correct statement would be: "normal memory mapped devices have no
interface for message signalled interrupts (like MSI-X)"

> we use GSI for multiple irq. 

I'm not sure what GSI stands for, but looking at the code I assume it's
just a "normal" peripheral interrupt.

> In this patch we use "vm_try_to_find_vqs"
> to check whether multiple irqs are supported like
> virtio-pci.

Yeah, I can see that you have followed virtio-pci quite literally. I'm
particularly not convinced to the one interrupt for config, one for all
queues option. Doesn't make any sense to me here.

> Is this the right direction? is there other ways to
> make virtio-mmio support multiple irq? Hope for feedback.

One point I'd like to make is that the device was intentionally designed
with simplicity in mind first, performance later (something about
"embedded" etc" :-). Changing this assumption is of course possible, but
- I must say - makes me slightly uncomfortable... The extensions we're
discussing here seem doable, but I've noticed your other patches doing
with a shared memory region and I didn't like them at all, sorry.

I see the subject has been already touched in the discussions, but let
me bring PCI to the surface again. We're getting more server-class SOCs
in the market, which obviously bring PCI with them to both arm and arm64
world, something unheard of in the "mobile past". I believe the PCI
patches for the arm64 have been already merged in the kernel.

Therefore: I'm not your boss so, obviously, I can't tell you what to do,
but could you consider redirecting your efforts into getting the "ARM
PCI" up and running in qemu so you can simply use the existing
infrastructure? This would save us a lot of work and pain in doing late
functional changes to the standard and will be probably more
future-proof from your perspective (PCI will happen, sooner or later -
you can make it sooner ;-)

Regards

Pawel

--
To unsubscribe from this list:

[PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps

2014-11-06 Thread Pawel Moll

Until now, perf framework never defined the meaning of the timestamps
captured as PERF_SAMPLE_TIME sample type. The values were obtaining
from local (sched) clock, which is unavailable in userspace. This made
it impossible to correlate perf data with any other events. Other
tracing solutions have the source configurable (ftrace) or just share
a common time domain between kernel and userspace (LTTng).

Follow the trend by using monotonic clock, which is readily available
as POSIX CLOCK_MONOTONIC.

Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
by the user to obtain the clk_id to be used with POSIX clock API (eg.
clock_gettime()) to obtain a time value comparable with perf samples.

Old behaviour can be restored by using "perf_use_local_clock" kernel
parameter.

Signed-off-by: Pawel Moll 
---
Ingo, I remember your comments about this approach in the past, but
during discussions at LPC Thomas was convinced that it's the right
thing to do - see cover letter for the series...

Changes since v3:

- Added "perf_use_lock_clock" parameter...

- ... and creating the sysctl value only when it's not defined (turned
  out that negative clk_ids are not invalid - they are used to describe
  dynamic clocks)

- Had to keep sysctl_perf_sample_time_clk_id non-const, because struct
  ctl_table.data is non-const

 Documentation/kernel-parameters.txt |  9 +
 kernel/events/core.c| 37 +
 2 files changed, 46 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 4c81a86..8ead8d8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -91,6 +91,7 @@ parameter is applicable:
NUMANUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
+   PERFPerformance events and counters support is enabled.
PV_OPS  A paravirtualized kernel is enabled.
PARIDE  The ParIDE (parallel port IDE) subsystem is enabled.
PARISC  The PA-RISC architecture is enabled.
@@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
allocator.  This parameter is primarily for debugging
and performance comparison.
 
+   perf_use_local_clock
+   [PERF]
+   Use local_clock() as a source for perf timestamps
+   generation. This was be the default behaviour and
+   this parameter can be used to maintain backward
+   compatibility or on older hardware with expensive
+   monotonic clock source.
+
pf. [PARIDE]
See Documentation/blockdev/paride.txt.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2b02c9f..5d0aa03 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -322,8 +323,41 @@ extern __weak const char *perf_pmu_name(void)
return "pmu";
 }
 
+static bool perf_use_local_clock;
+static int __init perf_use_local_clock_setup(char *__unused)
+{
+   perf_use_local_clock = true;
+   return 1;
+}
+__setup("perf_use_local_clock", perf_use_local_clock_setup);
+
+static int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
+
+static struct ctl_table perf_sample_time_kern_table[] = {
+   {
+   .procname   = "perf_sample_time_clk_id",
+   .data   = &sysctl_perf_sample_time_clk_id,
+   .maxlen = sizeof(int),
+   .mode   = 0444,
+   .proc_handler   = proc_dointvec,
+   },
+   {}
+};
+
+static struct ctl_table perf_sample_time_root_table[] = {
+   {
+   .procname   = "kernel",
+   .mode   = 0555,
+   .child  = perf_sample_time_kern_table,
+   },
+   {}
+};
+
 static inline u64 perf_clock(void)
 {
+   if (likely(!perf_use_local_clock))
+   return ktime_get_mono_fast_ns();
+
return local_clock();
 }
 
@@ -8230,6 +8264,9 @@ void __init perf_event_init(void)
 */
BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head))
 != 1024);
+
+   if (!perf_use_local_clock)
+   register_sysctl_table(perf_sample_time_root_table);
 }
 
 static int __init perf_event_sysfs_init(void)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 2/3] perf: Userspace event

2014-11-06 Thread Pawel Moll

This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by
any process to inject custom data into perf data stream as a new
PERF_RECORD_UEVENT record, if such process is being observed or if it
is running on a CPU being observed by the perf framework.

The prctl call takes the following arguments:

prctl(PR_TASK_PERF_UEVENT, type, size, data, flags);

- type: a number meaning to describe content of the following data.
  Kernel does not pay attention to it and merely passes it further in
  the perf data, therefore its use must be agreed between the events
  producer (the process being observed) and the consumer (performance
  analysis tool). The perf userspace tool will contain a repository of
  "well known" types and reference implementation of their decoders.
- size: Length in bytes of the data.
- data: Pointer to the data.
- flags: Reserved for future use. Always pass zero.

Perf context that are supposed to receive events generated with the
prctl above must be opened with perf_event_attr.uevent set to 1. The
PERF_RECORD_UEVENT records consist of a standard perf event header,
32-bit type value, 32-bit data size and the data itself, followed by
padding to align the overall record size to 8 bytes and optional,
standard sample_id field.

Example use cases:

- "perf_printf" like mechanism to add logging messages to perf data;
  in the simplest case it can be just

prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0);

- synchronisation of performance data generated in user space with the
  perf stream coming from the kernel. For example, the marker can be
  inserted by a JIT engine after it generated portion of the code, but
  before the code is executed for the first time, allowing the
  post-processor to pick the correct debugging information.

Signed-off-by: Pawel Moll 
---

Changes since v3:

- none

 include/linux/perf_event.h  |  4 +++
 include/uapi/linux/perf_event.h | 23 -
 include/uapi/linux/prctl.h  | 10 ++
 kernel/events/core.c| 71 +
 kernel/sys.c|  5 +++
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 893a0d0..680767d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -721,6 +721,8 @@ extern int perf_unregister_guest_info_callbacks(struct 
perf_guest_info_callbacks
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
 extern void perf_event_fork(struct task_struct *tsk);
+extern int perf_uevent(struct task_struct *tsk, u32 type, u32 size,
+  const char __user *data);
 
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -829,6 +831,8 @@ static inline void perf_event_mmap(struct vm_area_struct 
*vma)  { }
 static inline void perf_event_exec(void)   { }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
 static inline void perf_event_fork(struct task_struct *tsk){ }
+static inline int perf_uevent(struct task_struct *tsk, u32 type, u32 size,
+ const char __user *data)  { 
return -1; };
 static inline void perf_event_init(void)   { }
 static inline int  perf_swevent_get_recursion_context(void){ 
return -1; }
 static inline void perf_swevent_put_recursion_context(int rctx)
{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9d84540..9a64eb1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -303,7 +303,8 @@ struct perf_event_attr {
exclude_callchain_user   : 1, /* exclude user 
callchains */
mmap2  :  1, /* include mmap with inode 
data */
comm_exec  :  1, /* flag comm events that 
are due to an exec */
-   __reserved_1   : 39;
+   uevents:  1, /* allow uevents into the 
buffer */
+   __reserved_1   : 38;
 
union {
__u32   wakeup_events;/* wakeup every n events */
@@ -712,6 +713,26 @@ enum perf_event_type {
 */
PERF_RECORD_MMAP2   = 10,
 
+   /*
+* Data in userspace event record is transparent for the kernel
+*
+* Userspace perf tool code maintains a list of known types with
+* reference implementations of parsers for the data field.
+*
+* Overall size of the record (including type and size fields)
+* is always aligned to 8 bytes by adding padding after the data.
+*
+* struct {
+*  struct perf_event_headerheader;
+*  u32

[PATCH v4 3/3] perf: Sample additional clock value

2014-11-06 Thread Pawel Moll

This patch adds an option to sample value of an additional clock with
any perf event, with the the aim of allowing time correlation between
data coming from perf and other sources like hardware trace which is
timestamped with an external clock.

The idea is to generate periodic perf record containing timestamps from
two different sources, sampled as close to each other as possible. This
allows performing simple linear approximation:

  other event
   -O--+-O--> t_other
:  | :
:  V :
   -OO--> t_perf
perf event

User can request such samples for any standard perf event, with the most
obvious examples of cpu-clock (hrtimer) at selected frequency or other
periodic events like sched:sched_switch.

In order to do this, PERF_SAMPLE_CLOCK has to be set in struct
perf_event_attr.sample_type and a type of the clock to be sampled
selected by setting perf_event_attr.clock to a value corresponding to
a POSIX clock clk_id (see "man 2 clock_gettime")

Currently three clocks are implemented: CLOCK_REALITME = 0,
CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is
5 bits wide to allow for future extension to custom, non-POSIX clock
sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like
ARM CoreSight (hardware trace) timestamp generator.

Signed-off-by: Pawel Moll 
---

Changes since v3:

- none

 include/linux/perf_event.h  |  2 ++
 include/uapi/linux/perf_event.h | 16 ++--
 kernel/events/core.c| 30 ++
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 680767d..b690a0d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -607,6 +607,8 @@ struct perf_sample_data {
 * Transaction flags for abort events:
 */
u64 txn;
+   /* Clock value (additional timestamp for time correlation) */
+   u64 clock;
 };
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9a64eb1..53a7a72 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -137,8 +137,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_DATA_SRC= 1U << 15,
PERF_SAMPLE_IDENTIFIER  = 1U << 16,
PERF_SAMPLE_TRANSACTION = 1U << 17,
+   PERF_SAMPLE_CLOCK   = 1U << 18,
 
-   PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */
 };
 
 /*
@@ -304,7 +305,16 @@ struct perf_event_attr {
mmap2  :  1, /* include mmap with inode 
data */
comm_exec  :  1, /* flag comm events that 
are due to an exec */
uevents:  1, /* allow uevents into the 
buffer */
-   __reserved_1   : 38;
+
+   /*
+* clock: one of the POSIX clock IDs:
+*
+* 0 - CLOCK_REALTIME
+* 1 - CLOCK_MONOTONIC
+* 4 - CLOCK_MONOTONIC_RAW
+*/
+   clock  :  5, /* clock type */
+   __reserved_1   : 33;
 
union {
__u32   wakeup_events;/* wakeup every n events */
@@ -544,6 +554,7 @@ enum perf_event_type {
 *  { u64   id;   } && PERF_SAMPLE_ID
 *  { u64   stream_id;} && PERF_SAMPLE_STREAM_ID
 *  { u32   cpu, res; } && PERF_SAMPLE_CPU
+*  { u64   clock;} && PERF_SAMPLE_CLOCK
 *  { u64   id;   } && PERF_SAMPLE_IDENTIFIER
 * } && perf_event_attr::sample_id_all
 *
@@ -687,6 +698,7 @@ enum perf_event_type {
 *  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *  { u64   data_src; } && PERF_SAMPLE_DATA_SRC
 *  { u64   transaction; } && 
PERF_SAMPLE_TRANSACTION
+*  { u64   clock;} && PERF_SAMPLE_CLOCK
 * };
 */
PERF_RECORD_SAMPLE  = 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9a2d082..816baa6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1264,6 +1264,9 @@ static void perf_event__header_size(s

[PATCH v4 0/3] perf: User/kernel time correlation and event generation

2014-11-06 Thread Pawel Moll

This is a second version of the post-LPC series. The changes are limited
to the first patch, adding a backward-compatibility kernel parameter.

The v3 cover letter:

I've organised a session on the subject during the tracing minisummit
at LPC 2014 in Dusseldorf. Notes taken from the discussion taken by
Steven Rostedt (thanks Steve!)

http://www.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_Tracing.txt

The following three patches address three main topics. They are pretty
much orthogonal and (subject to small and obvious modifications) could
be applied independently from each other.

An executive summary of both discussion and the patches:

1. User/kernel perf timestamps correlation

Thomas suggested that, instead of jumping through loops of correlation,
perf should simply generate monotonic clock timestamps, instead of
using sched clock. Peter and I pointed out that Ingo didn't like this
idea as monotonic can be slow, but apparently the cases when it is are
irrelevant. Thomas offered to fly to Budapest to physically convince
Ingo - I hope it won't be necessary and he will be able to achieve
this here, on mailing lists :-)

Setting aside potential performance problems, it would be a really
great solution, unifying all trace systems (perf, ftrace and LTTng)
in this respect. I'm more than happy to work on potential improvements
in the are of monotonic clock if it was to help the cause.

If it is a definite no-go, we still have the third patch, allowing post-
capture correlation based on synchronisation events.

2. User event generation

Everyone present agreed that it would be a very-nice-to-have feature.
There was some discussion about implementation details, so I welcome
feedback and comments regarding my take on the matter.

3. Correlation with external timestamps

This is a new issue, which surfaced recently while I was working on
hardware trace infrastructure. It can timestamp trace packets, but is
using yet another, completely different time source to do this.

Thomas suggested solution which gets down to my original proposal for
sched/monotonic clock correlation - an additional sample type so events
can be "double stamped" using different clock sources providing
synchronisation points for later time approximation. I've just extended
the implementation with configuration value to select the clock source.
If the first patch (making perf timestamps monotonic) gets accepted,
there will be no immediate need for this one, but I'd like to gain some
feedback anyway.


That's all for this series. Previous versions:

- RFC: http://www.spinics.net/lists/kernel/msg1824419.html
- v1: http://thread.gmane.org/gmane.linux.kernel/1790231
- v2: http://thread.gmane.org/gmane.linux.kernel/1793272
- v3: http://thread.gmane.org/gmane.linux.kernel.api/5681


Pawel Moll (3):
  perf: Use monotonic clock as a source for timestamps
  perf: Userspace event
  perf: Sample additional clock value

 Documentation/kernel-parameters.txt |   9 +++
 include/linux/perf_event.h  |   6 ++
 include/uapi/linux/perf_event.h |  37 +-
 include/uapi/linux/prctl.h  |  10 +++
 kernel/events/core.c| 138 
 kernel/sys.c|   5 ++
 6 files changed, 203 insertions(+), 2 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 2/3] perf: Userspace event

2014-11-04 Thread Pawel Moll

On Tue, 2014-11-04 at 06:33 +, Namhyung Kim wrote:
> Hi Pawel,
> 
> On Tue,  4 Nov 2014 00:28:37 +0000, Pawel Moll wrote:
> > +   /*
> > +* Data in userspace event record is transparent for the kernel
> > +*
> > +* Userspace perf tool code maintains a list of known types with
> > +* reference implementations of parsers for the data field.
> > +*
> > +* Overall size of the record (including type and size fields)
> > +* is always aligned to 8 bytes by adding padding after the data.
> > +*
> > +* struct {
> > +*  struct perf_event_headerheader;
> > +*  u32 type;
> > +*  u32 size;
> 
> The struct perf_event_header also has 'size' field and it has the entire
> length of the record so it's redundant. 

Well, is it? Correct me if I'm wrong, but as far as I remember the
record size must be always aligned to 8 bytes. Thus you can't reliably
derive the data size from it - if I my data is 3 bytes long, I have to
add 5 bytes of padding thus making the header.size = 24 (I'm ignoring
sample_id here), right? So now, decoding the record, all I can do is:
header.size - sizeof(header) - sizeof(type) - sizeof(size) = 24 - 8 - 8
= 8. So, basing on the header.size the data is 8 bytes long. But only 3
first bytes are valid...

To summarize, there are three options:

1. I'm wrong and the record doesn't have to be padded to make it 8 bytes
aligned. Then I can drop the additional size field.

2. I could impose a limitation on the prctl API that the data size must
be 8 bytes aligned. Bad idea in my opinion, I'd rather not.

3. The additional size (for the data part) field stays. Notice that
PERF_SAMPLE_RAW has it as well :-)

>  Also there's 'misc' field in the
> perf_event_header and I guess it can be used as 'type' info as it's
> mostly for cpumode and we are in user mode by definition.

Hm. First of all, I don't really like the idea of "overloading" the misc
meaning. It's a set of flags and I'd rather see it staying like this.

Secondly, I'm not sure that it can be reused - we are in user mode,
true, but it can be either PERF_RECORD_MISC_USER or
PERF_RECORD_MISC_GUEST_USER.

Thirdly, misc is "only" 16 bits wide, and someone even asked for the
type to be 64 bit long! (I suspect he wanted to use it in some special,
hacky way though :-) 32 bit length seems like a reasonable choice,
though.

Do you feel that the "unnecessary" type field is a big problem?

Thanks for your time!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation

2014-11-04 Thread Pawel Moll

On Tue, 2014-11-04 at 09:24 +, Masami Hiramatsu wrote:
> What I'd like to do is the binary version of ftrace-marker, the text
> version is already supported by qemu (see below).
> https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg00505.html
> 
> But since that is just a string data (not structured data), it is hard to
> analyze via perf-script or some other useful filters/triggers in ftrace.
> 
> In my idea, the new event will be defined via a special file in debugfs like
> kprobe-events, like below.
> 
>   # cd $debugfs/tracing
>   # echo "newgrp/newevent signarg:s32 flag:u64" >> marker_events
>   # cat events/newgrp/newevent/format
>   name: newevent
>   ID: 2048
>   format:
> field:unsigned short common_type;   offset:0;   size:2; 
> signed:0;
> field:unsigned char common_flags;   offset:2;   size:1; 
> signed:0;
> field:unsigned char common_preempt_count;   offset:3;   
> size:1;signed:0;
> field:int common_pid;   offset:4;   size:4; signed:1;
> 
> field:s32 signarg;  offset:8;  size:4; signed:1;
> field:u64 flag; offset:12;  size:8; signed:0;
> 
>   print fmt: "signarg=%d flag=0x%Lx", REC->signarg, REC->flag
> 
> Then, users will write the data (excluded common fields) when the event 
> happens
> via trace_marker which start with '\0'ID(in u32). Kernel just checks the ID 
> and
> its data size, but doesn't parse, filter/trigger it and log it into the 
> kernel buffer.

Very neat, I like it! Certainly useful with scripting. Any gut feeling
regarding the kernel version it will be ready for? 3.19 or later than
that?

> Of course, this has a downside that the user must have a privilege to access 
> to debugfs.
> Thus maybe we need both of prctl() IF for perf and this IF for ftrace.

I don't have any particularly strong feelings about the solution as long
as I'm able to create this "synchronisation point" of mine in the perf
data. In one of this patch's previous incarnations I was also doing a
write() to the perf fd to achieve pretty much the same result.

In my personal use case root access to debugfs isn't a problem (I need
it for other ftrace operations anyway). However Ingo and some other guys
seemed interested in prctl() approach because: 1. it's much simpler to
use even comparing with simple trace_marker's open(path)/write()/close()
and 2. because any process can do it at any time and the results are
quietly discarded if no one is listening. I also remember that when I
proposed sort of "unification" between trace_marker and the uevents,
Ingo straight away "suggested" keeping it separate.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 1/3] perf: Use monotonic clock as a source for timestamps

2014-11-04 Thread Pawel Moll

On Tue, 2014-11-04 at 07:23 +, Peter Zijlstra wrote:
> On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote:
> 
> > +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
> 
> const ?

Sure (unless we have to change it as mentioned below)

> >  /*
> >   * perf samples are done in some very critical code paths (NMIs).
> >   * If they take too much CPU time, the system can lock up and not
> > @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void)
> >  
> >  static inline u64 perf_clock(void)
> >  {
> > -   return local_clock();
> > +   return ktime_get_mono_fast_ns();
> >  }
> 
> Do we maybe want to make it boot-time switchable back to local_clock for
> people with bad systems and or backwards compat issues?

Very good idea, should have came up with it myself :-)

Does __setup("perf_use_local_clock") sound reasonable? Then we have to
decide whether to hide the sysctl "perf_sample_time_clk_id" (my
preferred option, will see how difficult it is) or just provide an
invalid clock_id (eg. -1) in it.

Cheers!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation

2014-11-04 Thread Pawel Moll

On Tue, 2014-11-04 at 01:25 +, Andy Lutomirski wrote:
> >> If you're going to add double-stamped packets, can you also add a
> >> syscall to read multiple clocks at once, atomically?  Or can you
> >> otherwise add a non-perf mechanism to get at this data?
> >
> > I've got some thoughts on what a possible interface that wouldn't be
> > awful could look like, but I'm still hesitant because I don't really
> > know if exposing this sort of data is actually a good idea long term.
> 
> My only real thought here is that, if perf is going to try to do this,
> then presumably it should be reasonably integrated w/ the core timing
> code.  I.e. if perf does this, then presumably the core code should
> know about it and there should be a core interface to it.

I think I understand where you're coming from. Arnd's idea for the API
seems reasonable, although I can't promise implementing a proposal
(don't make me stop you from doing it :-).

As to the perf-specific correlation, I'm assuming limited accuracy.
Others already mentioned that in the absence of hardware support, the
time values are never really "atomic". The best what can be done is to
access them as near to each other in the code as possible and make sure
it happens in a non-preemptible section. In my tests I've achieved, on
average, sub-microsecond accuracy, which was good enough from my
perspective, but it's far from ideal 42ns resolution for my (just an
example) time source clocked at 24MHz.

Paweł

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 3/3] perf: Sample additional clock value

2014-11-03 Thread Pawel Moll

From: Pawel Moll 

This patch adds an option to sample value of an additional clock with
any perf event, with the the aim of allowing time correlation between
data coming from perf and other sources like hardware trace which is
timestamped with an external clock.

The idea is to generate periodic perf record containing timestamps from
two different sources, sampled as close to each other as possible. This
allows performing simple linear approximation:

perf eventother event
   -O--+-O--> t_other
:  | :
:  V :
   -OO--> t_perf

User can request such samples for any standard perf event, with the most
obvious examples of cpu-clock (hrtimer) at selected frequency or other
periodic events like sched:sched_switch.

In order to do this, PERF_SAMPLE_CLOCK has to be set in struct
perf_event_attr.sample_type and a type of the clock to be sampled
selected by setting perf_event_attr.clock to a value corresponding to
a POSIX clock clk_id (see "man 2 clock_gettime")

Currently three clocks are implemented: CLOCK_REALITME = 0,
CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is
5 bits wide to allow for future extension to custom, non-POSIX clock
sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like
ARM CoreSight (hardware trace) timestamp generator.

Signed-off-by: Pawel Moll 
---
 include/linux/perf_event.h  |  2 ++
 include/uapi/linux/perf_event.h | 16 ++--
 kernel/events/core.c| 30 ++
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 867415d..395d6ed 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -607,6 +607,8 @@ struct perf_sample_data {
 * Transaction flags for abort events:
 */
u64 txn;
+   /* Clock value (additional timestamp for time correlation) */
+   u64 clock;
 };
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9a64eb1..53a7a72 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -137,8 +137,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_DATA_SRC= 1U << 15,
PERF_SAMPLE_IDENTIFIER  = 1U << 16,
PERF_SAMPLE_TRANSACTION = 1U << 17,
+   PERF_SAMPLE_CLOCK   = 1U << 18,
 
-   PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */
 };
 
 /*
@@ -304,7 +305,16 @@ struct perf_event_attr {
mmap2  :  1, /* include mmap with inode 
data */
comm_exec  :  1, /* flag comm events that 
are due to an exec */
uevents:  1, /* allow uevents into the 
buffer */
-   __reserved_1   : 38;
+
+   /*
+* clock: one of the POSIX clock IDs:
+*
+* 0 - CLOCK_REALTIME
+* 1 - CLOCK_MONOTONIC
+* 4 - CLOCK_MONOTONIC_RAW
+*/
+   clock  :  5, /* clock type */
+   __reserved_1   : 33;
 
union {
__u32   wakeup_events;/* wakeup every n events */
@@ -544,6 +554,7 @@ enum perf_event_type {
 *  { u64   id;   } && PERF_SAMPLE_ID
 *  { u64   stream_id;} && PERF_SAMPLE_STREAM_ID
 *  { u32   cpu, res; } && PERF_SAMPLE_CPU
+*  { u64   clock;} && PERF_SAMPLE_CLOCK
 *  { u64   id;   } && PERF_SAMPLE_IDENTIFIER
 * } && perf_event_attr::sample_id_all
 *
@@ -687,6 +698,7 @@ enum perf_event_type {
 *  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *  { u64   data_src; } && PERF_SAMPLE_DATA_SRC
 *  { u64   transaction; } && 
PERF_SAMPLE_TRANSACTION
+*  { u64   clock;} && PERF_SAMPLE_CLOCK
 * };
 */
PERF_RECORD_SAMPLE  = 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3738e9c..611e2f7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1232,6 +1232,9 @@ static void perf_event__header_size(struct perf_event 
*event)

[PATCH v3 2/3] perf: Userspace event

2014-11-03 Thread Pawel Moll

From: Pawel Moll 

This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by
any process to inject custom data into perf data stream as a new
PERF_RECORD_UEVENT record, if such process is being observed or if it
is running on a CPU being observed by the perf framework.

The prctl call takes the following arguments:

prctl(PR_TASK_PERF_UEVENT, type, size, data, flags);

- type: a number meaning to describe content of the following data.
  Kernel does not pay attention to it and merely passes it further in
  the perf data, therefore its use must be agreed between the events
  producer (the process being observed) and the consumer (performance
  analysis tool). The perf userspace tool will contain a repository of
  "well known" types and reference implementation of their decoders.
- size: Length in bytes of the data.
- data: Pointer to the data.
- flags: Reserved for future use. Always pass zero.

Perf context that are supposed to receive events generated with the
prctl above must be opened with perf_event_attr.uevent set to 1. The
PERF_RECORD_UEVENT records consist of a standard perf event header,
32-bit type value, 32-bit data size and the data itself, followed by
padding to align the overall record size to 8 bytes and optional,
standard sample_id field.

Example use cases:

- "perf_printf" like mechanism to add logging messages to perf data;
  in the simplest case it can be just

prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0);

- synchronisation of performance data generated in user space with the
  perf stream coming from the kernel. For example, the marker can be
  inserted by a JIT engine after it generated portion of the code, but
  before the code is executed for the first time, allowing the
  post-processor to pick the correct debugging information.

Signed-off-by: Pawel Moll 
---
 include/linux/perf_event.h  |  4 +++
 include/uapi/linux/perf_event.h | 23 -
 include/uapi/linux/prctl.h  | 10 ++
 kernel/events/core.c| 71 +
 kernel/sys.c|  5 +++
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ba490d5..867415d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -721,6 +721,8 @@ extern int perf_unregister_guest_info_callbacks(struct 
perf_guest_info_callbacks
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
 extern void perf_event_fork(struct task_struct *tsk);
+extern int perf_uevent(struct task_struct *tsk, u32 type, u32 size,
+  const char __user *data);
 
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -830,6 +832,8 @@ static inline void perf_event_mmap(struct vm_area_struct 
*vma)  { }
 static inline void perf_event_exec(void)   { }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
 static inline void perf_event_fork(struct task_struct *tsk){ }
+static inline int perf_uevent(struct task_struct *tsk, u32 type, u32 size,
+ const char __user *data)  { 
return -1; };
 static inline void perf_event_init(void)   { }
 static inline int  perf_swevent_get_recursion_context(void){ 
return -1; }
 static inline void perf_swevent_put_recursion_context(int rctx)
{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9d84540..9a64eb1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -303,7 +303,8 @@ struct perf_event_attr {
exclude_callchain_user   : 1, /* exclude user 
callchains */
mmap2  :  1, /* include mmap with inode 
data */
comm_exec  :  1, /* flag comm events that 
are due to an exec */
-   __reserved_1   : 39;
+   uevents:  1, /* allow uevents into the 
buffer */
+   __reserved_1   : 38;
 
union {
__u32   wakeup_events;/* wakeup every n events */
@@ -712,6 +713,26 @@ enum perf_event_type {
 */
PERF_RECORD_MMAP2   = 10,
 
+   /*
+* Data in userspace event record is transparent for the kernel
+*
+* Userspace perf tool code maintains a list of known types with
+* reference implementations of parsers for the data field.
+*
+* Overall size of the record (including type and size fields)
+* is always aligned to 8 bytes by adding padding after the data.
+*
+* struct {
+*  struct perf_event_headerheader;
+*  u32

[PATCH v3 0/3] perf: User/kernel time correlation and event generation

2014-11-03 Thread Pawel Moll

From: Pawel Moll 

Hello again,

Back to the subject, this time with a slightly different angle...

I've organised a session on the subject during the tracing
minisummit at LPC 2014 in Dusseldorf. Notes taken from the discussion
taken by Steven Rostedt (thanks Steve!)

http://www.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_Tracing.txt

The following three patches address three main topics. They are pretty
much orthogonal and (subject to small and obvious modifications) could
be applied independently from each other.

An executive summary of both discussion and the patches:

1. User/kernel perf timestamps correlation

Thomas suggested that, instead of jumping through loops of correlation,
perf should simply generate monotonic clock timestamps, instead of
using sched clock. Peter and I pointed out that Ingo didn't like this
idea as monotonic can be slow, but apparently the cases when it is are
irrelevant. Thomas offered to fly to Budapest to physically convince
Ingo - I hope it won't be necessary and he will be able to achieve
this here, on mailing lists :-)

Setting aside potential performance problems, it would be a really
great solution, unifying all trace systems (perf, ftrace and LTTng)
in this respect. I'm more than happy to work on potential improvements
in the are of monotonic clock if it was to help the cause.

If it is a definite no-go, we still have the third patch, allowing post-
capture correlation based on synchronisation events.

2. User event generation

Everyone present agreed that it would be a very-nice-to-have feature.
There was some discussion about implementation details, so I welcome
feedback and comments regarding my take on the matter.

3. Correlation with external timestamps

This is a new issue, which surfaced recently while I was working on
hardware trace infrastructure. It can timestamp trace packets, but is
using yet another, completely different time source to do this.

Thomas suggested solution which gets down to my original proposal for
sched/monotonic clock correlation - an additional sample type so events
can be "double stamped" using different clock sources providing
synchronisation points for later time approximation. I've just extended
the implementation with configuration value to select the clock source.
If the first patch (making perf timestamps monotonic) gets accepted,
there will be no immediate need for this one, but I'd like to gain some
feedback anyway.


That's all for this series. Previous versions:

- RFC: http://www.spinics.net/lists/kernel/msg1824419.html
- v1: http://thread.gmane.org/gmane.linux.kernel/1790231
- v2: http://thread.gmane.org/gmane.linux.kernel/1793272

Pawel Moll (3):
  perf: Use monotonic clock as a source for timestamps
  perf: Userspace event
  perf: Sample additional clock value

 include/linux/perf_event.h  |   7 +++
 include/uapi/linux/perf_event.h |  37 +-
 include/uapi/linux/prctl.h  |  10 
 kernel/events/core.c| 105 +++-
 kernel/sys.c|   5 ++
 kernel/sysctl.c |   7 +++
 6 files changed, 168 insertions(+), 3 deletions(-)

-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 1/3] perf: Use monotonic clock as a source for timestamps

2014-11-03 Thread Pawel Moll

Until now, perf framework never defined the meaning of the timestampt
captured as PERF_SAMPLE_TIME sample type. The values were obtaining
from local (sched) clock, which is unavailable in userspace. This made
it impossible to correlate perf data with any other events. Other
tracing solutions have the source configurable (ftrace) or just share
a common time domain between kernel and userspace (LTTng).

Follow the trend by using monotonic clock, which is readily available
as POSIX CLOCK_MONOTONIC.

Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
by the user to obtain the clk_id to be used with POSIX clock API (eg.
clock_gettime()) to obtain a time value comparable with perf samples.

Signed-off-by: Pawel Moll 
---

Ingo, I remember your comments about this approach in the past, but
during discussions at LPC Thomas was convinced that it's the right
thing to do - see cover letter for the series...

 include/linux/perf_event.h | 1 +
 kernel/events/core.c   | 4 +++-
 kernel/sysctl.c| 7 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 893a0d0..ba490d5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -738,6 +738,7 @@ extern int sysctl_perf_event_paranoid;
 extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 extern int sysctl_perf_cpu_time_max_percent;
+extern int sysctl_perf_sample_time_clk_id;
 
 extern void perf_sample_event_took(u64 sample_len_ns);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2b02c9f..ea3d6d3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -234,6 +234,8 @@ int perf_cpu_time_max_percent_handler(struct ctl_table 
*table, int write,
return 0;
 }
 
+int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
+
 /*
  * perf samples are done in some very critical code paths (NMIs).
  * If they take too much CPU time, the system can lock up and not
@@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void)
 
 static inline u64 perf_clock(void)
 {
-   return local_clock();
+   return ktime_get_mono_fast_ns();
 }
 
 static inline struct perf_cpu_context *
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 15f2511..cb75f5b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1094,6 +1094,13 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one_hundred,
},
+   {
+   .procname   = "perf_sample_time_clk_id",
+   .data   = &sysctl_perf_sample_time_clk_id,
+   .maxlen = sizeof(unsigned int),
+   .mode   = 0444,
+   .proc_handler   = proc_dointvec,
+   },
 #endif
 #ifdef CONFIG_KMEMCHECK
{
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-11-03 Thread Pawel Moll

On Mon, 2014-11-03 at 14:48 +, Tomeu Vizoso wrote:
> On 29 September 2014 17:53, Pawel Moll  wrote:
> > On Mon, 2014-09-29 at 16:32 +0100, Peter Zijlstra wrote:
> >> Also none of the many words above describe
> >> PERF_SAMPLE_USERSPACE_EVENT(), wth is that about?
> >
> > Hopefully description of the v2 makes better job in this:
> >
> > http://thread.gmane.org/gmane.linux.kernel/1793272/focus=4813
> >
> > where it's already called "UEVENT" and was generated by write().
> >
> > Before you get into this, though, the most important outcomes of both v1
> > and v2 discussions:
> >
> > * Ingo suggested prctl(PR_TRACE_UEVENT, type, size, data, 0) as the way
> > of generating such events (so the tracee doesn't have to know the fd to
> > do ioctl); Frederic seems to have the same on his mind.
> >
> > * Namhyung proposed sticking the userspace-originating events into the
> > buffer as PERF_RECORD_UEVENT rather then PERF_SAMPLE_UEVENT.
> >
> > Working on making both happen now.
> 
> are you still working on this? Would be happy to lend a hand if that
> can speed things up.

By all means! In fact I'm typing commit messages right now and will post
the patches later today. Stay tuned and I'm looking forward to all
suggestions, reviews etc.

Cheers!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-09-29 Thread Pawel Moll

On Mon, 2014-09-29 at 16:32 +0100, Peter Zijlstra wrote:
> Also none of the many words above describe
> PERF_SAMPLE_USERSPACE_EVENT(), wth is that about?

Hopefully description of the v2 makes better job in this:

http://thread.gmane.org/gmane.linux.kernel/1793272/focus=4813

where it's already called "UEVENT" and was generated by write().

Before you get into this, though, the most important outcomes of both v1
and v2 discussions:

* Ingo suggested prctl(PR_TRACE_UEVENT, type, size, data, 0) as the way
of generating such events (so the tracee doesn't have to know the fd to
do ioctl); Frederic seems to have the same on his mind.

* Namhyung proposed sticking the userspace-originating events into the
buffer as PERF_RECORD_UEVENT rather then PERF_SAMPLE_UEVENT.

Working on making both happen now.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] perf: Add sampling of the raw monotonic clock

2014-09-29 Thread Pawel Moll

On Mon, 2014-09-29 at 16:28 +0100, Peter Zijlstra wrote:
> On Thu, Sep 18, 2014 at 03:34:32PM +0100, Pawel Moll wrote:
> > @@ -4456,6 +4459,13 @@ static void __perf_event_header__init_id(struct 
> > perf_event_header *header,
> > data->cpu_entry.cpu  = raw_smp_processor_id();
> > data->cpu_entry.reserved = 0;
> > }
> > +
> > +   if (sample_type & PERF_SAMPLE_CLOCK_RAW_MONOTONIC) {
> > +   struct timespec now;
> > +
> > +   getrawmonotonic(&now);
> > +   data->clock_raw_monotonic = timespec_to_ns(&now);
> > +   }
> >  }
> >  
> 
> This cannot work, getrawmonotonic() isn't NMI-safe and there's
> nothing stopping this being used from NMI context.
> 
> Also getrawmonotonic() + timespec_to_ns() will make tglx sad, he's just
> done a tree-wide eradication of silly conversions and now you're adding
> a ns -> timespec -> ns dance right back.

Last thing I want is to make Thomas sad... For obvious reasons ;-)

> I _think_ you want ktime_get_mono_fast_ns(), 

With pleasure, it's exactly what I need.

> but this does bring us
> right back to the question/discussion on which timebase you'd want to
> sync again. MONO does make sense for most cases, but I think we've had
> fairly sane stories for people wanting to sync against other clocks.

Yes. I've asked the same question somewhere in the thread.

ftrace has got a switch and a selection of trace_clocks in
kernel/trace/trace.c - do we want something similar (in integer form
probably, though) in perf_events.h with an additional "flag" in struct
perf_event_attr? It could be used to pick a time source for
PERF_SAMPLE_CLOCK (PERF_SAMPLE_TRACE_CLOCK?) sample.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-09-29 Thread Pawel Moll

On Sat, 2014-09-27 at 18:14 +0100, Frederic Weisbecker wrote:
> 2014-09-25 20:33 GMT+02:00 Ingo Molnar :
> >
> > * Pawel Moll  wrote:
> >
> >> On Wed, 2014-09-24 at 08:49 +0100, Ingo Molnar wrote:
> >> > * Pawel Moll  wrote:
> >> >
> >> > > On Thu, 2014-09-18 at 15:34 +0100, Pawel Moll wrote:
> >> > > > This patch adds a PERF_COUNT_SW_USERSPACE_EVENT type,
> >> > > > which can be generated by user with PERF_EVENT_IOC_ENTRY
> >> > > > ioctl command, which injects an event of said type into
> >> > > > the perf buffer.
> >> > >
> >> > > It occurred to me last night that currently perf doesn't handle "write"
> >> > > syscall at all, while this seems like the most natural way of
> >> > > "injecting" userspace events into perf buffer.
> >> > >
> >> > > An ioctl would still be needed to set a type of the following events,
> >> > > something like:
> >> > >
> >> > >   ioctl(SET_TYPE, 0x42);
> >> > >   write(perf_fd, binaryblob, size);
> >> > >   ioctl(SET_TYPE, 0);
> >> > >   dprintf(perf_fd, "String");
> >> > >
> >> > > which is fine for use cases when the type doesn't change often,
> >> > > but would double the amount of syscalls when every single event
> >> > > is of a different type. Perhaps there still should be a
> >> > > "generating ioctl" taking both type and data/size in one go?
> >> >
> >> > Absolutely, there should be a single syscall.
> >>
> >> Yeah, it's my gut feeling as well. I just wonder if we still want to
> >> keep write() handler for operations on perf fds? This seems natural -
> >> takes data buffer and its size. The only issue is the type.
> >>
> >> > I'd even argue it should be a new prctl(): that way we could both
> >> > generate user events for specific perf fds, but also into any
> >> > currently active context (that allows just generation/injection
> >> > of user events). In the latter case we might have no fd to work
> >> > off from.
> >>
> >> When Arnaldo suggested that the "user events" could be used by perf
> >> trace, it was exactly my first thought. I just didn't have answer how to
> >> present it to the user (an extra syscall didn't seem like a good idea),
> >> but prctl seems interesting, something like this?
> >>
> >>   prctl(PR_TRACE_UEVENT, type, size, data, 0);
> >
> > Exactly!
> >
> >> How would we select tasks that can write to a given buffer? Maybe an
> >> ioctl() on a perf fd? Something like this?
> >>
> >>   ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid);
> >>   ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid);
> >
> > No, I think there's a simpler way: this should be a regular
> > perf_attr flag, which defaults to '0' (tasks cannot do this), but
> > which can be set to 1 if the profiler explicitly allows such
> > event injection.
> 
> Maybe we just don't even need any permission at all. Which harm can
> that do if this only ever generate events to those interested in the
> relevant perf context? It could be a simple tracepoint BTW.

Yeah, Ingo already pointed it out (that non-root task can't trace root
tasks anyway).

> Oh and I really like the fact we don't use a syscall that requires an
> fd. The tracee really shouldn't be aware of the tracer.

Agreed, I'll look at solution with prctl() this week.

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock

2014-09-29 Thread Pawel Moll

On Fri, 2014-09-26 at 20:25 +0100, David Ahern wrote:
> On 9/26/14, 9:05 AM, Pawel Moll wrote:
> > To do the correlation you need both timestamps to be "taken"
> > simultaneously:
> >
> >  perf event user event
> > -O--+-O--> t_mono
> >  :  | :
> >  :  V :
> > -OO--> t_perf
> >
> > Of course it's not possible get both values literally at the same time,
> > but placing them in a atomic context a couple of instructions from each
> > other still gives pretty good results. The larger this distance is, the
> 
> An early patchset on this topic added the realtime clock as an event and 
> an ioctl was used to push a sample into the event stream. 

Yeah, I remember. If I remember correctly correctly the pushback was on
a custom event type, right?

Generally speaking I don't mind any solution that we'll get us to the
place both you and I want to be (just being able to time stamp some
performance data in userspace, how difficult can this be! ;-) but I like
the flexibility of an extra sample - one can pick and mix events and
samples at one's leisure.

> In that case 
> you have wall clock and perf-clock samples taken in the same kernel 
> context and about as close together as you can get.

Yep, that's what I was saying - we can't quite get two timestamps at the
*same*, but getting them within a single atomic block of instructions
gives reasonable accuracy.

Thanks!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock

2014-09-26 Thread Pawel Moll

On Fri, 2014-09-26 at 15:38 +0100, Namhyung Kim wrote:
> > Then I have loads of normal normal samples, timestamped with sched clock
> > only, and every now and then one with both timestamps which then I can
> > use for time correlation. The whole point is that the frequency of such
> > "synchronisation" event can be much (much!) lower than of the normal
> > samples, but it still allows pretty good approximation (I was getting
> > accuracy of ~1 microsecond and better with sched_switch trace event
> > marked with additional raw monotonic timestamp).
> 
> Okay.  But in that case wouldn't it be enough to use just a single
> timestamp for each event - sched_clock for cpu-cycles and monotonic raw
> for sched_switch?

To do the correlation you need both timestamps to be "taken"
simultaneously:

perf event user event
   -O--+-O--> t_mono
:  | :
:  V :
   -OO--> t_perf

Of course it's not possible get both values literally at the same time,
but placing them in a atomic context a couple of instructions from each
other still gives pretty good results. The larger this distance is, the
lower the accuracy will be. I must admit I haven't done such
experiments, but let me remind that I in my test I was getting results
in the range of 1000ns, with a single cycle of a 2GHz taking 0.5ns, so
moving the t_mono/t_perf value sampling further aside will reduce it
significantly...

Pawel

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-09-26 Thread Pawel Moll

On Fri, 2014-09-26 at 12:23 +0100, Ingo Molnar wrote:
> > As in: allows *all* tasks to inject the data? Are you sure we 
> > don't want more fine-grained control, in particular per task?
> 
> Yeah. If the profiler allows it, then any task that is being 
> traced can inject data.

The "that is being traced" fragment was the key here. I missed the fact
that perf trace already takes a list of pids, so we're not talking about
all tasks in the system. That should work.

Paweł

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/2] perf: Userspace event

2014-09-26 Thread Pawel Moll

On Fri, 2014-09-26 at 07:21 +0100, Namhyung Kim wrote:
> It looks like what trace-marker in ftrace does..  We might connect
> output of the trace marker into a perf event somehow.

I can probably trace_marker's write handler do the same as the new
prctl() would do. But this means that we really want the pre-defined
"zero terminated string" type (0). Otherwise, what type would be
assigned to a record originating from it?

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock

2014-09-26 Thread Pawel Moll

On Fri, 2014-09-26 at 07:16 +0100, Namhyung Kim wrote:
> > It would be doable, I guess, but what
> > if someone *wants* to have sched clock as the timestamps source (because
> > it's cheap) but still be able to correlate them with userspace? In this
> > case two separate timestamps are required to do the approximation.
> 
> But by collecting two timestamps, you'll loose the win of the first
> timestamp, no?

But I can ask for both timestamps only being collected on "low
bandwidth" events, in particular context switches and/or periodic (eg.
10ms hrtimer) software events.

Then I have loads of normal normal samples, timestamped with sched clock
only, and every now and then one with both timestamps which then I can
use for time correlation. The whole point is that the frequency of such
"synchronisation" event can be much (much!) lower than of the normal
samples, but it still allows pretty good approximation (I was getting
accuracy of ~1 microsecond and better with sched_switch trace event
marked with additional raw monotonic timestamp).

Pawel

PS. Have you sent a couple of the messages via some kind of gmane's
proxy? All the mail addresses got rather messed up...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-09-26 Thread Pawel Moll

On Thu, 2014-09-25 at 19:33 +0100, Ingo Molnar wrote:
> > How would we select tasks that can write to a given buffer? Maybe an
> > ioctl() on a perf fd? Something like this?
> > 
> > ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid);
> > ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid);
> 
> No, I think there's a simpler way: this should be a regular 
> perf_attr flag, which defaults to '0' (tasks cannot do this), but 
> which can be set to 1 if the profiler explicitly allows such 
> event injection.

As in: allows *all* tasks to inject the data? Are you sure we don't want
more fine-grained control, in particular per task?

If we have two buffers, both created with the "injecting allowed" flag,
do we inject a given uevent into both of them?

> I.e. whether user-events are allowed is controlled by the 
> profiling/tracing context, via the regular perf syscall. It would 
> propagate into the perf context, so it would be easy to check at 
> event generation time.

It would definitely be the profiling/tracing tools that would decide if
the injection is allowed, no question about that. I just feel that it
should be able to select the tasks that can do that, not just flip a big
switch saying "everyone is welcome". Other question is: should a
non-root context be able to receive events from root processes? Wouldn't
it be a security hole (for example, it could be used as a kind of covert
channel)? Maybe we should do what ptrace does? As in: if a task can
ptrace another task, it can also receive uevents from it.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf: Userspace software event and ioctl

2014-09-25 Thread Pawel Moll

On Wed, 2014-09-24 at 08:49 +0100, Ingo Molnar wrote:
> * Pawel Moll  wrote:
> 
> > On Thu, 2014-09-18 at 15:34 +0100, Pawel Moll wrote:
> > > This patch adds a PERF_COUNT_SW_USERSPACE_EVENT type,
> > > which can be generated by user with PERF_EVENT_IOC_ENTRY
> > > ioctl command, which injects an event of said type into
> > > the perf buffer.
> > 
> > It occurred to me last night that currently perf doesn't handle "write"
> > syscall at all, while this seems like the most natural way of
> > "injecting" userspace events into perf buffer.
> > 
> > An ioctl would still be needed to set a type of the following events,
> > something like:
> > 
> > ioctl(SET_TYPE, 0x42);
> > write(perf_fd, binaryblob, size);
> > ioctl(SET_TYPE, 0);
> > dprintf(perf_fd, "String");
> > 
> > which is fine for use cases when the type doesn't change often, 
> > but would double the amount of syscalls when every single event 
> > is of a different type. Perhaps there still should be a 
> > "generating ioctl" taking both type and data/size in one go?
> 
> Absolutely, there should be a single syscall.

Yeah, it's my gut feeling as well. I just wonder if we still want to
keep write() handler for operations on perf fds? This seems natural -
takes data buffer and its size. The only issue is the type.

> I'd even argue it should be a new prctl(): that way we could both 
> generate user events for specific perf fds, but also into any 
> currently active context (that allows just generation/injection 
> of user events). In the latter case we might have no fd to work 
> off from.

When Arnaldo suggested that the "user events" could be used by perf
trace, it was exactly my first thought. I just didn't have answer how to
present it to the user (an extra syscall didn't seem like a good idea),
but prctl seems interesting, something like this?

prctl(PR_TRACE_UEVENT, type, size, data, 0);

How would we select tasks that can write to a given buffer? Maybe an
ioctl() on a perf fd? Something like this?

ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid);
ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid);

It could set/clear a flag in pid's task_struct (but probably not in the
"normal" flags, as they are only supposed to be set by owner and in
ptrace/fork case) and a pointer to the task in perf_event(_context).

Or maybe some variation on ptrace would be more in place? This would
also solve issue of permission checking (if the profiling tool can
ptrace the process, it can also enable/disable its uevent generation
capability).

Paweł

Or maybe it should go through ptrace?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/2] perf: Userspace event

2014-09-25 Thread Pawel Moll

On Wed, 2014-09-24 at 07:07 +0100, Namhyung Kim wrote:
> On Tue, 23 Sep 2014 18:03:07 +0100, Pawel Moll wrote:
> > This patch adds a new PERF_COUNT_SW_UEVENT software event
> > and a related PERF_SAMPLE_UEVENT sample. User can now
> > write to the the perf file descriptor, injecting such
> > event in the perf buffer.
> 
> It seems the PERF_SAMPLE_UEVENT sample can be injected to any event.  So
> why the PERF_COUNT_SW_UEVENT is needed?  At least one can use the
> SW_DUMMY event for that purpose.

You're right. I needed a different SW type in one of my early
prototypes, but it's not the case any more. Consider it gone.

> Also I think it'd be better to be a record type (PERF_RECORD_XXX)
> instead of a sample flag (PERF_SAMPLE_XXX).  In perf tools, we already
> use perf_user_event_type for synthesized userspace events.  This way it
> can avoid unnecessary sample processing for userspace events.

Fine with me. If no one objects, I'm more than happy to use
PERF_RECORD_UEVENT = 11 for it.

> For contents, I prefer to give complete control to users - kernel
> doesn't need to care about it other than its size.  If one just wants to
> use strings only, she can write them directly.  If others want to mix
> different types of data, they might need to define a data format for
> their use.

Are you saying to drop even the "type 0 means zero-terminated string"
definition, even if everything else is up to the user? I quite like that
idea, especially combined with write()ing to the perf_fd (it is very
much like trace_marker then, which is beautiful in its simplicity), but
the feelings are not that strong to fight a war over it.

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 >

1 - 100 of 374 matches

Mail list logo