Re: [PATCHv2 1/2] bus: arm-ccn: Enable stats for CCN-512 interconnect
On Wed, 2019-10-16 at 10:06 +0100, Marek Bykowski wrote: > Add compatible string for the ARM CCN-502 interconnect > > Signed-off-by: Marek Bykowski > Signed-off-by: Boleslaw Malecki > --- > Changelog v1->v2: > - Change the subject to reflect where the driver got moved to (drivers/perf) > from > (drivers/bus). > - Add credit to my work mate that helped me validate the counts from > the interconnect. > --- > drivers/perf/arm-ccn.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c > index 7dd850e02f19..b6e00f35a448 100644 > --- a/drivers/perf/arm-ccn.c > +++ b/drivers/perf/arm-ccn.c > @@ -1545,6 +1545,7 @@ static int arm_ccn_remove(struct platform_device *pdev) > static const struct of_device_id arm_ccn_match[] = { > { .compatible = "arm,ccn-502", }, > { .compatible = "arm,ccn-504", }, > + { .compatible = "arm,ccn-512", }, > {}, > }; > MODULE_DEVICE_TABLE(of, arm_ccn_match); Acked-by: Pawel Moll Thanks! Pawel
Re: [PATCH] bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.
On Wed, 2017-10-11 at 22:33 +0100, Kim Phillips wrote: > Unregistering the driver before calling cpuhp_remove_multi_state() > removes > any remaining hotplug cpu instances so > __cpuhp_remove_state_cpuslocked() > doesn't emit this warning: Cool, thanks! I'm getting together a series of fixes now, so I'll get this merged soon. Pawel
Re: [PATCH v2 1/2] bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
On Wed, 2017-10-04 at 13:32 +0100, Marc Zyngier wrote: > > > Acked-by: Pawel Moll > > > > I assume you'll get this merged yourself? Or do you want me to > relay > > the CCN one (I've got a couple of other small changes to the driver > in > > the queue). > > I'd rather you take care of it if you already have a queue of > patches. > Otherwise, I'll bundle the two patches and ask the armsoc folks to do > something with them. Ok. I guess you can include CCI as well. Not today, not tomorrow, but Friday or early next week. Cheers! Paweł IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [PATCH v2 1/2] bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
On Tue, 2017-10-03 at 18:14 +0100, Marc Zyngier wrote: > Booting a DEBUG_PREEMPT enabled kernel on a CCN-based system > results in the following splat: > > [...] > arm-ccn e800.ccn: No access to interrupts, using timer. > BUG: using smp_processor_id() in preemptible [] code: > swapper/0/1 > caller is debug_smp_processor_id+0x1c/0x28 > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.13.0 #6111 > Hardware name: AMD Seattle/Seattle, BIOS 17:08:23 Jun 26 2017 > Call trace: > [] dump_backtrace+0x0/0x278 > [] show_stack+0x24/0x30 > [] dump_stack+0x8c/0xb0 > [] check_preemption_disabled+0xfc/0x100 > [] debug_smp_processor_id+0x1c/0x28 > [] arm_ccn_probe+0x358/0x4f0 > [...] > > as we use smp_processor_id() in the wrong context. > > Turn this into a get_cpu()/put_cpu() that extends over the CPU > hotplug > registration, making sure that we don't race against a CPU down > operation. > > Signed-off-by: Marc Zyngier Acked-by: Pawel Moll I assume you'll get this merged yourself? Or do you want me to relay the CCN one (I've got a couple of other small changes to the driver in the queue). Paweł IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [PATCH] bus: arm-ccn: constify attribute_group structures.
On Mon, 2017-07-03 at 13:01 +0530, Arvind Yadav wrote: > attribute_groups are not supposed to change at runtime. All functions > working with attribute_groups provided by work with const > attribute_group. So mark the non-const structs as const. > > File size before: > text data bss dec hex filename > 9074 5592 416 15082 3aea drivers/bus/arm-ccn.o > > File size After adding 'const': > text data bss dec hex filename > 9327 5336 416 15079 3ae7 drivers/bus/arm-ccn.o > > Signed-off-by: Arvind Yadav Looks fine to me. I'll queue it for the next time I push out CCN driver fixes (no dates promised, though...) Thanks! Paweł
Re: [PATCH RESEND 0/2] Add support for ARM CCN-502 interconnect
On Fri, 2017-06-16 at 13:39 -0700, Scott Branden wrote: > Arnd, > > Should this patchset go through the ARM maintainers? Unless something changed in the last months, these patches should go through arm-soc. Arnd is probably just waiting for me to send a pull request. And indeed shame on me that I still haven't got around doing this. Promise to do so tomorrow. Regards and apologies Pawel
Re: [PATCH] bus: arm-ccn: Use devm_kcalloc() in arm_ccn_probe()
On Wed, 2017-04-19 at 09:29 +0200, SF Markus Elfring wrote: > From: Markus Elfring > Date: Wed, 19 Apr 2017 09:12:43 +0200 > > Multiplications for the size determination of memory allocations > indicated that array data structures should be processed. > Thus use the corresponding function "devm_kcalloc". > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring Fine with me. I'll queue it with a couple other CCN driver changes. Will probably get into 4.13. Regards! Pawel
Re: [PATCH 1/2] dt-bindings: arm-ccn: Add bindings info for CCN-502 compatible string
On Tue, 2017-02-14 at 17:48 +, Mark Rutland wrote: > On Fri, Feb 10, 2017 at 12:42:47PM -0800, Velibor Markovski wrote: > > > > Add CCN-502 to the list of supported devices by ARM CCN PMU driver. > > > > Signed-off-by: Velibor Markovski > Acked-by: Mark Rutland > > I assume Pawel will take this along with the driver patch. Will do, thanks! Pawel
Re: [PATCH] bus: arm-ccn: Fix module autoload
On Fri, 2017-02-03 at 18:31 -0300, Javier Martinez Canillas wrote: > On 01/02/2017 10:02 AM, Javier Martinez Canillas wrote: > > > > If the driver is built as a module, autoload won't work because the module > > alias information is not filled. So user-space can't match the registered > > device with the corresponding module. > > > > Export the module alias information using the MODULE_DEVICE_TABLE() > > macro. > > > > Before this patch: > > > > $ modinfo drivers/bus/arm-ccn.ko | grep alias > > $ > > > > After this patch: > > > > $ modinfo drivers/bus/arm-ccn.ko | grep alias > > alias: of:N*T*Carm,ccn-504C* > > alias: of:N*T*Carm,ccn-504 > > > > Signed-off-by: Javier Martinez Canillas > > --- > Any comments about this patch? Sorry, I clearly missed that in my mailbox at the beginning of the year. It all makes sense to me. I'll try to test it out soon and will make sure to include it next time I send out CCN update. Thanks! Paweł
Re: [PATCH] bus: arm-ccn: Fix module autoload
On Fri, 2017-02-03 at 18:31 -0300, Javier Martinez Canillas wrote: > On 01/02/2017 10:02 AM, Javier Martinez Canillas wrote: > > > > If the driver is built as a module, autoload won't work because the module > > alias information is not filled. So user-space can't match the registered > > device with the corresponding module. > > > > Export the module alias information using the MODULE_DEVICE_TABLE() > > macro. > > > > Before this patch: > > > > $ modinfo drivers/bus/arm-ccn.ko | grep alias > > $ > > > > After this patch: > > > > $ modinfo drivers/bus/arm-ccn.ko | grep alias > > alias: of:N*T*Carm,ccn-504C* > > alias: of:N*T*Carm,ccn-504 > > > > Signed-off-by: Javier Martinez Canillas > > --- > Any comments about this patch? Sorry, I clearly missed that in my mailbox at the beginning of the year. It all makes sense to me. I'll try to test it out soon and will make sure to include it next time I send out CCN update. Thanks! Paweł IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [patch V2 23/67] bus/arm-ccn: Convert to hotplug statemachine
Dnia 2016-07-13, Wed o godzinie 17:16 +, Anna-Maria Gleixner pisze: > From: Sebastian Andrzej Siewior > > Install the callbacks via the state machine and let the core invoke > the callbacks on the already online CPUs. > > Signed-off-by: Sebastian Andrzej Siewior > Cc: Arnd Bergmann > Cc: Dan Carpenter > Cc: Linus Torvalds > Cc: Pawel Moll > Cc: Peter Zijlstra > Cc: Suzuki K Poulose > Cc: Thomas Gleixner > Signed-off-by: Anna-Maria Gleixner > @@ -1316,7 +1309,6 @@ static void arm_ccn_pmu_cleanup(struct a > ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id); > } > > - > static int arm_ccn_for_each_valid_region(struct arm_ccn *ccn, > int (*callback)(struct arm_ccn *ccn, int region, > void __iomem *base, u32 type, u32 id)) This empty line doesn't impact hotplug, does it? ;-) With this chunk dropped Acked-by: Pawel Moll Cheers! Pawel
Re: [patch 22/66] bus: arm-ccn: convert to hotplug statemachine
On Mon, 2016-07-11 at 12:28 +, Anna-Maria Gleixner wrote: > @@ -1270,9 +1262,10 @@ static int arm_ccn_pmu_init(struct arm_c >* ... and change the selection when it goes offline. > Priority is >* picked to have a chance to migrate events before perf is > notified. >*/ > - ccn->dt.cpu_nb.notifier_call = arm_ccn_pmu_cpu_notifier; > - ccn->dt.cpu_nb.priority = CPU_PRI_PERF + 1, > - err = register_cpu_notifier(&ccn->dt.cpu_nb); > + cpuhp_armccn_dt = &ccn->dt; > + err = cpuhp_setup_state(CPUHP_AP_PERF_ARM_CCN_ONLINE, > + "AP_PERF_ARM_CCN_ONLINE", NULL, > + arm_ccn_pmu_offline_cpu); > if (err) > goto error_cpu_notifier; Also, unless I'm missing something obvious, it seems that the callback will be executed for CPUs going online? I'm definitely interested in my "current" CPU going down, in order to migrate my handlers somewhere else. Help? Pawel
Re: [patch 22/66] bus: arm-ccn: convert to hotplug statemachine
On Mon, 2016-07-11 at 12:28 +, Anna-Maria Gleixner wrote: > From: Sebastian Andrzej Siewior > > Install the callbacks via the state machine and let the core invoke > the callbacks on the already online CPUs. > > Signed-off-by: Sebastian Andrzej Siewior > Cc: Pawel Moll > Signed-off-by: Anna-Maria Gleixner Although I won't shed a tear over the notifiers going, there's a problem with this patch... > --- > drivers/bus/arm-ccn.c | 47 -- > --- > include/linux/cpuhotplug.h |2 + > 2 files changed, 23 insertions(+), 26 deletions(-) > > --- a/drivers/bus/arm-ccn.c > +++ b/drivers/bus/arm-ccn.c > @@ -167,7 +167,6 @@ struct arm_ccn_dt { > struct hrtimer hrtimer; > > cpumask_t cpu; > - struct notifier_block cpu_nb; > > struct pmu pmu; > }; Notice that here each instance of CCN (unlikely as it is today, the code was written with the assumption that there's more than one interconnect ring in the system) get its own notifier block... > @@ -1171,30 +1170,23 @@ static enum hrtimer_restart arm_ccn_pmu_ > } > > > -static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb, > - unsigned long action, void *hcpu) > +static struct arm_ccn_dt *cpuhp_armccn_dt; > +static int arm_ccn_pmu_offline_cpu(unsigned int cpu) > { > - struct arm_ccn_dt *dt = container_of(nb, struct arm_ccn_dt, > cpu_nb); > + struct arm_ccn_dt *dt = cpuhp_armccn_dt; > struct arm_ccn *ccn = container_of(dt, struct arm_ccn, dt); > - unsigned int cpu = (long)hcpu; /* for (long) see ... but here (and in all the rest of this change) it's replaced by a static pointer to a single instance. > @@ -1270,9 +1262,10 @@ static int arm_ccn_pmu_init(struct arm_c >* ... and change the selection when it goes offline. > Priority is >* picked to have a chance to migrate events before perf is > notified. >*/ > - ccn->dt.cpu_nb.notifier_call = arm_ccn_pmu_cpu_notifier; > - ccn->dt.cpu_nb.priority = CPU_PRI_PERF + 1, > - err = register_cpu_notifier(&ccn->dt.cpu_nb); > + cpuhp_armccn_dt = &ccn->dt; Even without checking if the pointer has been already set. > + err = cpuhp_setup_state(CPUHP_AP_PERF_ARM_CCN_ONLINE, > + "AP_PERF_ARM_CCN_ONLINE", NULL, > + arm_ccn_pmu_offline_cpu); > if (err) > goto error_cpu_notifier; > > @@ -1293,7 +1286,8 @@ static int arm_ccn_pmu_init(struct arm_c > > error_pmu_register: > error_set_affinity: > - unregister_cpu_notifier(&ccn->dt.cpu_nb); > + cpuhp_remove_state_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE); > + cpuhp_armccn_dt = NULL; error_cpu_notifier: > ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id); > for (i = 0; i < ccn->num_xps; i++) > @@ -1308,7 +1302,8 @@ static void arm_ccn_pmu_cleanup(struct a > > if (ccn->irq) > irq_set_affinity_hint(ccn->irq, NULL); > - unregister_cpu_notifier(&ccn->dt.cpu_nb); > + cpuhp_remove_state_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE); > + cpuhp_armccn_dt = NULL; Same (only the other way round) here... > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -30,6 +30,7 @@ enum cpuhp_state { > CPUHP_AP_PERF_X86_AMD_IBS_STARTING, > CPUHP_AP_PERF_X86_CQM_STARTING, > CPUHP_AP_PERF_X86_CSTATE_STARTING, > + CPUHP_AP_PERF_XTENSA_STARTING, > CPUHP_AP_NOTIFY_STARTING, > CPUHP_AP_ONLINE, > CPUHP_TEARDOWN_CPU, That chunk does not belong here, does it? Regards Pawel
Re: [PATCH 4/4] drivers/bus: make arm-ccn.c driver explicitly non-modular
On Tue, 2016-03-29 at 13:30 +0100, Will Deacon wrote: > On Tue, Mar 29, 2016 at 12:53:06PM +0100, Pawel Moll wrote: > > Dnia 2016-03-29, Tue o godzinie 12:45 +0100, Will Deacon pisze: > > > I'd much rather fix the driver to build as a module, if at all > > > possible. > > > Suzuki (CC'd) is taking a look at that, so please drop this patch > > > for > > > now. > > > > There's no problem with building arm-ccn.c as a module - all it's > > really doing today is providing a PMU driver. I don't even know why > > have I made it bool-only in the first place... > > Probably because it doesn't compile due to the irq_set_affinity call. The original arm-ccn.c did not call it at all. My guess is that I copied ARM_CCI stanza without thinking. But I'm glad Suzuki will straighten it out :-) Paweł
Re: [PATCH 4/4] drivers/bus: make arm-ccn.c driver explicitly non-modular
Dnia 2016-03-29, Tue o godzinie 12:45 +0100, Will Deacon pisze: > I'd much rather fix the driver to build as a module, if at all > possible. > Suzuki (CC'd) is taking a look at that, so please drop this patch for > now. There's no problem with building arm-ccn.c as a module - all it's really doing today is providing a PMU driver. I don't even know why have I made it bool-only in the first place... Pawel
Re: [PATCH v5 2/2] arm: perf: Add event descriptions
On Thu, 2015-10-15 at 16:42 +0100, Mark Rutland wrote: > On Thu, Oct 15, 2015 at 04:20:37PM +0100, Pawel Moll wrote: > > On Thu, 2015-10-15 at 08:15 -0700, Drew Richardson wrote: > > > On Thu, Oct 15, 2015 at 02:21:12PM +0100, Will Deacon wrote: > > > > On Tue, Oct 13, 2015 at 08:36:45AM -0700, Drew Richardson wrote: > > > > > Add additional information about the ARM architected hardware events > > > > > to make counters self describing. This makes the hardware PMUs easier > > > > > to use as perf list contains possible events instead of users having > > > > > to refer to documentation like the ARM TRMs. > > > > > > > > > > Signed-off-by: Drew Richardson > > > > > --- > > > > > arch/arm/kernel/perf_event_v7.c | 121 > > > > > > > > > > drivers/perf/arm_pmu.c | 1 + > > > > > 2 files changed, 122 insertions(+) > > > > > > > > [...] > > > > > > > > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c > > > > > index 2365a32a595e..e933d2dd71c0 100644 > > > > > --- a/drivers/perf/arm_pmu.c > > > > > +++ b/drivers/perf/arm_pmu.c > > > > > @@ -548,6 +548,7 @@ static void armpmu_init(struct arm_pmu *armpmu) > > > > > .stop = armpmu_stop, > > > > > .read = armpmu_read, > > > > > .filter_match = armpmu_filter_match, > > > > > + .attr_groups= armpmu->pmu.attr_groups, > > > > > > > > I don't understand this hunk. What's it doing? > > > > > > I'm not 100% clear either on what it's doing. But without this line > > > the attr_groups don't get passed on and I don't see them on my TC2. I > > > debugged the issue down to this but it may not be the proper way to > > > solve the problem. > > > > The perf core creates attributes passed as pmu.attr_groups > > in /sys/bus/events/devices/. > > Sure, but the confusion here had to do with why we were re-assigning the > field to itself rather than what the purpose of said field was. Ah, right. Sorry. I must admit I haven't looked to the right from the = sign... Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 2/2] arm: perf: Add event descriptions
On Thu, 2015-10-15 at 08:15 -0700, Drew Richardson wrote: > On Thu, Oct 15, 2015 at 02:21:12PM +0100, Will Deacon wrote: > > On Tue, Oct 13, 2015 at 08:36:45AM -0700, Drew Richardson wrote: > > > Add additional information about the ARM architected hardware events > > > to make counters self describing. This makes the hardware PMUs easier > > > to use as perf list contains possible events instead of users having > > > to refer to documentation like the ARM TRMs. > > > > > > Signed-off-by: Drew Richardson > > > --- > > > arch/arm/kernel/perf_event_v7.c | 121 > > > > > > drivers/perf/arm_pmu.c | 1 + > > > 2 files changed, 122 insertions(+) > > > > [...] > > > > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c > > > index 2365a32a595e..e933d2dd71c0 100644 > > > --- a/drivers/perf/arm_pmu.c > > > +++ b/drivers/perf/arm_pmu.c > > > @@ -548,6 +548,7 @@ static void armpmu_init(struct arm_pmu *armpmu) > > > .stop = armpmu_stop, > > > .read = armpmu_read, > > > .filter_match = armpmu_filter_match, > > > + .attr_groups= armpmu->pmu.attr_groups, > > > > I don't understand this hunk. What's it doing? > > I'm not 100% clear either on what it's doing. But without this line > the attr_groups don't get passed on and I don't see them on my TC2. I > debugged the issue down to this but it may not be the proper way to > solve the problem. The perf core creates attributes passed as pmu.attr_groups in /sys/bus/events/devices/. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/38] bus: arm-ccn: fix handling cpumask_any_but return value
Dnia 2015-09-21, Mon o godzinie 15:33 +0200, Andrzej Hajda pisze: > cpumask_any_but returns value >= nr_cpu_ids if there are no more CPUs. > > The problem has been detected using proposed semantic patch > scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1]. > > [1]: http://permalink.gmane.org/gmane.linux.kernel/2038576 > > Signed-off-by: Andrzej Hajda This has been already reported and discussed: http://thread.gmane.org/gmane.linux.kernel.janitors/34058 but apparently slipped through cracks :-( therefore: Acked-by: Pawel Moll Thanks for bringing it up again! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents
On Wed, 2015-08-05 at 18:56 +0100, Stephen Boyd wrote: > On 08/05/2015 03:29 AM, Pawel Moll wrote: > > > > Right, that's probably my fault. The sp810 nodes appeared in the VE > > trees very early (in terms of the "DT era"), when we weren't so strict > > about documentation. So far, it would be fairly simple, something like > > the text below - feel free to take it, mend it, extend it with your > > changes and include in the series: > > > > Thanks! > > Great! Can you please provide your signed-off-by? Sure. 8< >From cddb00e4c8b6b57e82008fa23a7359560e5c997c Mon Sep 17 00:00:00 2001 From: Pawel Moll Date: Thu, 6 Aug 2015 16:04:10 +0100 Subject: [PATCH] clk: versatile: Add SP810 device tree bindings document Signed-off-by: Pawel Moll --- Documentation/devicetree/bindings/arm/sp810.txt | 35 + 1 file changed, 35 insertions(+) create mode 100644 Documentation/devicetree/bindings/arm/sp810.txt diff --git a/Documentation/devicetree/bindings/arm/sp810.txt b/Documentation/devicetree/bindings/arm/sp810.txt new file mode 100644 index 000..440ee08 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/sp810.txt @@ -0,0 +1,35 @@ +SP810 System Controller +--- + +Required properties: + +- compatible: standard compatible string for a Primecell peripheral, + see Documentation/devicetree/bindings/arm/primecell.txt + for more details + should be: "arm,sp810", "arm,primecell" + +- reg: standard registers property, physical address and size + of the control registers + +- clock-names: from the common clock bindings, for more details see + Documentation/devicetree/bindings/clock/clock-bindings.txt; + should be: "refclk", "timclk", "apb_pclk" + +- clocks: from the common clock bindings, phandle and clock + specifier pairs for the entries of clock-names property + +- #clock-cells: from the common clock bindings; + should be: <1> + +- clock-output-names: from the common clock bindings; + should be: "timerclken0", "timerclken1", "timerclken2", "timerclken3" + +Example: + sysctl@02 { + compatible = "arm,sp810", "arm,primecell"; + reg = <0x02 0x1000>; + clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>; + clock-names = "refclk", "timclk", "apb_pclk"; + #clock-cells = <1>; + clock-output-names = "timerclken0", "timerclken1", "timerclken2", "timerclken3"; + }; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents
On Mon, 2015-08-03 at 18:55 +0100, Stephen Boyd wrote: > On 08/03/2015 07:01 AM, Pawel Moll wrote: > > On Sat, 2015-08-01 at 00:44 +0100, Stephen Boyd wrote: > >> @@ -179,6 +124,15 @@ static void __init clk_sp810_of_setup(struct > >> device_node *node) > >>sp810->timerclken[i].channel = i; > >>sp810->timerclken[i].hw.init = &init; > >> > >> + /* > >> + * If DT isn't setting the parent, force it to be > >> + * the 1 MHz clock without going through the framework. > >> + * We do this before clk_register() so that it can determine > >> + * the parent and setup the tree properly. > >> + */ > >> + if (deprecated) > >> + init.ops->set_parent(&sp810->timerclken[i].hw, 1); > >> + > >>sp810->timerclken[i].clk = clk_register(NULL, > >>&sp810->timerclken[i].hw); > >>WARN_ON(IS_ERR(sp810->timerclken[i].clk)); > > So that's one thing I've got a (small) problem here... > > > > The above change assumes that SP810 always have 32kHz clock on input 0 > > and 1MHz clock on input 1. Yes, this is how it made on VExpress, but it > > doesn't have to be the case. The magic picking up the faster clock was > > added to handle all possible cases. > > > > The bottom line is: if all we care is VExpress than it works, but it's > > still a hack. Personally I don't like it, however I won't nak the patch > > because of this. > > All current dts files in the kernel tree have 1MHz on input 1, so we > make this change here to set the parent to input 1 if there isn't an > assigned-clock-parents property. That's what I said - all VE platforms known to me will work, because they all have the clocks wired up the same way. And I still don't like code in a "generic SP810 driver" assuming this. Call me what you want ;-) > Presumably new dts files should have > the new property so that things work properly. Sure, no argument here. > I tried to find the > binding document, but it doesn't look to exist, so I didn't have > anything to update. Right, that's probably my fault. The sp810 nodes appeared in the VE trees very early (in terms of the "DT era"), when we weren't so strict about documentation. So far, it would be fairly simple, something like the text below - feel free to take it, mend it, extend it with your changes and include in the series: Thanks! Pawel 8<- SP810 System Controller --- Required properties: - compatible: standard compatible string for a Primecell peripheral, see Documentation/devicetree/bindings/arm/primecell.txt for more details should be: "arm,sp810", "arm,primecell" - reg: standard registers property, physical address and size of the control registers - clock-names: from the common clock bindings, for more details see Documentation/devicetree/bindings/clock/clock-bindings.txt; should be: "refclk", "timclk", "apb_pclk" - clocks: from the common clock bindings, phandle and clock specifier pairs for the entries of clock-names property - #clock-cells: from the common clock bindings; should be: <1> - clock-output-names: from the common clock bindings; should be: "timerclken0", "timerclken1", "timerclken2", "timerclken3" Example: sysctl@02 { compatible = "arm,sp810", "arm,primecell"; reg = <0x02 0x1000>; clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>; clock-names = "refclk", "timclk", "apb_pclk"; #clock-cells = <1>; clock-output-names = "timerclken0", "timerclken1", "timerclken2", "timerclken3"; }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] clk: versatile: Switch to assigned clock parents
On Sat, 2015-08-01 at 00:44 +0100, Stephen Boyd wrote: > @@ -179,6 +124,15 @@ static void __init clk_sp810_of_setup(struct device_node > *node) > sp810->timerclken[i].channel = i; > sp810->timerclken[i].hw.init = &init; > > + /* > + * If DT isn't setting the parent, force it to be > + * the 1 MHz clock without going through the framework. > + * We do this before clk_register() so that it can determine > + * the parent and setup the tree properly. > + */ > + if (deprecated) > + init.ops->set_parent(&sp810->timerclken[i].hw, 1); > + > sp810->timerclken[i].clk = clk_register(NULL, > &sp810->timerclken[i].hw); > WARN_ON(IS_ERR(sp810->timerclken[i].clk)); So that's one thing I've got a (small) problem here... The above change assumes that SP810 always have 32kHz clock on input 0 and 1MHz clock on input 1. Yes, this is how it made on VExpress, but it doesn't have to be the case. The magic picking up the faster clock was added to handle all possible cases. The bottom line is: if all we care is VExpress than it works, but it's still a hack. Personally I don't like it, however I won't nak the patch because of this. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 24/26] clk: versatile: Convert to clk_hw based provider APIs
On Fri, 2015-07-31 at 18:04 +0100, Stephen Boyd wrote: > We're removing struct clk from the clk provider API, so switch > this code to using the clk_hw based provider APIs. > > Cc: Pawel Moll > Cc: Linus Walleij > Signed-off-by: Stephen Boyd > --- > drivers/clk/versatile/clk-sp810.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/clk/versatile/clk-sp810.c > b/drivers/clk/versatile/clk-sp810.c > index 7fbe4d4bf35e..af653bfd4901 100644 > --- a/drivers/clk/versatile/clk-sp810.c > +++ b/drivers/clk/versatile/clk-sp810.c > @@ -80,7 +80,7 @@ static int clk_sp810_timerclken_prepare(struct clk_hw *hw) > { > struct clk_sp810_timerclken *timerclken = to_clk_sp810_timerclken(hw); > struct clk_sp810 *sp810 = timerclken->sp810; > - struct clk *old_parent = __clk_get_parent(hw->clk); > + struct clk_hw *old_parent = clk_hw_get_parent(hw); > struct clk *new_parent; > > if (!sp810->refclk) Acked-by: Pawel Moll (disclaimer: not tested ;-) Thanks! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:perf/core] perf tools: Fix test build error when bindir contains double slash
Commit-ID: 0927beeca5f9d1a7978f8da9c9d28647859816d3 Gitweb: http://git.kernel.org/tip/0927beeca5f9d1a7978f8da9c9d28647859816d3 Author: Pawel Moll AuthorDate: Tue, 28 Jul 2015 15:10:13 +0100 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 28 Jul 2015 13:03:49 -0300 perf tools: Fix test build error when bindir contains double slash When building with a prefix ending with a slash, for example: $ make prefix=/usr/local/ one of the perf tests fail to compile due to BUILD_STR macro mishandling bindir_SQ string containing with two slashes: -DBINDIR="BUILD_STR(/usr/local//bin)" with the following error: CC tests/attr.o tests/attr.c: In function ‘test__attr’: tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR); ^ tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token } ^ tests/attr.c:176:1: error: control reaches end of non-void function [-Werror=return-type] } ^ cc1: all warnings being treated as errors This patch works around the problem by "cleaning" the bindir string using make's abspath function. Signed-off-by: Pawel Moll Acked-by: Jiri Olsa Cc: Adrian Hunter Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1438092613-21014-1-git-send-email-pawel.m...@arm.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/config/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 094ddae..d31fac1 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -638,7 +638,7 @@ ifndef DESTDIR prefix ?= $(HOME) endif bindir_relative = bin -bindir = $(prefix)/$(bindir_relative) +bindir = $(abspath $(prefix)/$(bindir_relative)) mandir = share/man infodir = share/info perfexecdir = libexec/perf-core -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] tools: perf: Fix test build error when bindir contains double slash
When building with a prefix ending with a slash, for example: $ make prefix=/usr/local/ one of the perf tests fail to compile due to BUILD_STR macro mishandling bindir_SQ string containing with two slashes: -DBINDIR="BUILD_STR(/usr/local//bin)" with the following error: CC tests/attr.o tests/attr.c: In function ‘test__attr’: tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR); ^ tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token } ^ tests/attr.c:176:1: error: control reaches end of non-void function [-Werror=return-type] } ^ cc1: all warnings being treated as errors This patch works around the problem by "cleaning" the bindir string using make's abspath function. Signed-off-by: Pawel Moll --- tools/perf/config/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index b4a9c29..e8d987a 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -638,7 +638,7 @@ ifndef DESTDIR prefix ?= $(HOME) endif bindir_relative = bin -bindir = $(prefix)/$(bindir_relative) +bindir = $(abspath $(prefix)/$(bindir_relative)) mandir = share/man infodir = share/info perfexecdir = libexec/perf-core -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf tests: Adding build test for having ending slash in double slash
On Tue, 2015-07-28 at 13:40 +0100, Pawel Moll wrote: > Interesting. Building from a perf-tar-src-pkg tarball works: > > $ rm -rf /tmp/krava/ && make install prefix=/tmp/krava > [...] > LINK perf > LINK libperf-gtk.so > INSTALL GTK UI > INSTALL binaries > INSTALL tests > INSTALL libexec > INSTALL perf-archive > INSTALL perf-with-kcore > INSTALL perl-scripts > INSTALL python-scripts > INSTALL perf_completion-script > > but it breaks indeed the way you pointed out when building in-tree... Ignore what I said above. The real issue is that $(realpath PATH) returns nothing if PATH does not actually exist. Didn't know that... Will send v2 soon. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tools: perf: Avoid error message when building with no python installed
On Mon, 2015-07-27 at 19:36 +0100, Jiri Olsa wrote: > > @@ -150,7 +150,7 @@ ifndef NO_LIBPYTHON > >PYTHON2 := $(if $(call get-executable,python2),python2,python) > >override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) > >PYTHON2_CONFIG := \ > > -$(if $(call > > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) > > +$(if $(PYTHON),$(if $(call > > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)) > >override PYTHON_CONFIG := \ > > $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) > > haven't tried, but seems like we want to add also the else part > to get default into PYTHON_CONFIG? > -$(if $(PYTHON),$(if $(call > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)) > +$(if $(PYTHON),$(if $(call > get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config),python-config) It's not strictly necessary, because the issue here is just executing "command -v -c" (followed by "onfig" :-), but of course it will still work. > it fails anyway later either way, but seems like the 'python-config' > was the default case that was meant to fail later Well, yes, I had the same impression. But at the same time get-executable-or-default is constructed in such a way that either of the variants must be a real executable: --8<--- # get-supplied-or-default-executable # # Usage: absolute-executable-path-or-empty = $(call get-executable-or-default,variable,default) # define get-executable-or-default $(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2))) endef _ge_attempt = $(if $(get-executable),$(get-executable),$(call _gea_err,$(2))) _gea_err = $(if $(1),$(error Please set '$(1)' appropriately)) --8<--- See? Both alternatives in the $(if $($(1)) monster do _ge_attempt... So one could say that the change may rather look like this (untested): --8<--- diff --git a/tools/perf/config/utilities.mak b/tools/perf/config/utilities.mak index 0ebef09..78135dc 100644 --- a/tools/perf/config/utilities.mak +++ b/tools/perf/config/utilities.mak @@ -173,7 +173,7 @@ _ge-abspath = $(if $(is-executable),$(1)) # Usage: absolute-executable-path-or-empty = $(call get-executable-or-default,variable,default) # define get-executable-or-default -$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2))) +$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(2)) endef _ge_attempt = $(if $(get-executable),$(get-executable),$(call _gea_err, $(2))) _gea_err = $(if $(1),$(error Please set '$(1)' appropriately)) --8<--- Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf tests: Adding build test for having ending slash in double slash
On Mon, 2015-07-27 at 19:24 +0100, Jiri Olsa wrote: > I'm getting install error with your patch: > > [jolsa@krava perf]$ rm -rf /tmp/krava/ && make install prefix=/tmp/krava > > SNIP > > LINK perf > INSTALL binaries > INSTALL tests > install: cannot create directory ‘’: No such file or directory > install: target ‘’ is not a directory: No such file or directory > ln: failed to access ‘/perf’: No such file or directory > Makefile.perf:492: recipe for target 'install-tools' failed > make[1]: *** [install-tools] Error 1 > make[1]: *** Waiting for unfinished jobs > Makefile:87: recipe for target 'install' failed > make: *** [install] Error 2 > > somethings wrong with install targets dealing with bindir_SQ Interesting. Building from a perf-tar-src-pkg tarball works: $ rm -rf /tmp/krava/ && make install prefix=/tmp/krava [...] LINK perf LINK libperf-gtk.so INSTALL GTK UI INSTALL binaries INSTALL tests INSTALL libexec INSTALL perf-archive INSTALL perf-with-kcore INSTALL perl-scripts INSTALL python-scripts INSTALL perf_completion-script but it breaks indeed the way you pointed out when building in-tree... Having a look now. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] tools: perf: Fix test build error when bindir contains double slash
When building with a prefix ending with a slash, for example: $ make prefix=/usr/local/ one of the perf tests fail to compile due to BUILD_STR macro mishandling bindir_SQ string containing with two slashes: -DBINDIR="BUILD_STR(/usr/local//bin)" with the following error: CC tests/attr.o tests/attr.c: In function ‘test__attr’: tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR); ^ tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token } ^ tests/attr.c:176:1: error: control reaches end of non-void function [-Werror=return-type] } ^ cc1: all warnings being treated as errors This patch works around the problem by "cleaning" the bindir string using make's realpath function. Signed-off-by: Pawel Moll --- tools/perf/config/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index b4a9c29..59e8376 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -638,7 +638,7 @@ ifndef DESTDIR prefix ?= $(HOME) endif bindir_relative = bin -bindir = $(prefix)/$(bindir_relative) +bindir = $(realpath $(prefix)/$(bindir_relative)) mandir = share/man infodir = share/info perfexecdir = libexec/perf-core -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] tools: perf: Avoid error message when building with no python installed
When run on a system without a python interpreter installed, the makefile probing python will try to execute /bin/sh -c 'command -v -config [...]' instead of /bin/sh -c 'command -v python-config [...]' resulting in the following error message: sh: 1: command: Illegal option -c Although harmless, the message can be quite confusing and hard to track. Signed-off-by: Pawel Moll --- tools/perf/config/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 094ddae..b4a9c29 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -150,7 +150,7 @@ ifndef NO_LIBPYTHON PYTHON2 := $(if $(call get-executable,python2),python2,python) override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) PYTHON2_CONFIG := \ -$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) +$(if $(PYTHON),$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)) override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches
On Fri, 2015-06-12 at 14:28 +0100, Pawel Moll wrote: > On Fri, 2015-06-12 at 14:15 +0100, Adrian Hunter wrote: > > >>>> all 3 are already part of sample_id. > > >>> > > >>> You have to decide whether you expect to be able to use an event without > > >>> sample_id. MMAP and MMAP2 both have pid, tid which are in sample_id, > > >>> LOST > > >>> has id, EXIT and FORK have time, all of the THROTTLE/UNTHROTTLE members > > >>> are > > >>> in sample_id etc. So it currently looks like we expect to be able to > > >>> use an > > >>> event without requiring sample_id. > > > > > > The fact that there is this duplication is because sample_id_all came > > > after those events, but this new one being proposed doesn't have to do > > > it :-) > > > > Thanks, that's clear then. There will just need to be a flag to indicate > > whether it is scheduling in or out. > > Just a thought: wouldn't it be good to know what CPU have we been > scheduled from/to? This kind of information would be especially valuable > in heterogeneous systems. Of course, cpu is a part of sample_id as well. I'm sorted out then :-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches
On Fri, 2015-06-12 at 14:15 +0100, Adrian Hunter wrote: > all 3 are already part of sample_id. > >>> > >>> You have to decide whether you expect to be able to use an event without > >>> sample_id. MMAP and MMAP2 both have pid, tid which are in sample_id, LOST > >>> has id, EXIT and FORK have time, all of the THROTTLE/UNTHROTTLE members > >>> are > >>> in sample_id etc. So it currently looks like we expect to be able to use > >>> an > >>> event without requiring sample_id. > > > > The fact that there is this duplication is because sample_id_all came > > after those events, but this new one being proposed doesn't have to do > > it :-) > > Thanks, that's clear then. There will just need to be a flag to indicate > whether it is scheduling in or out. Just a thought: wouldn't it be good to know what CPU have we been scheduled from/to? This kind of information would be especially valuable in heterogeneous systems. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 14/19] clk: versatile: Silence sparse warnings
On Wed, 2015-05-06 at 08:39 +0100, Stephen Boyd wrote: > drivers/clk/versatile/clk-sp810.c:159:29: error: incompatible types for > operation (<=) > drivers/clk/versatile/clk-sp810.c:159:29:left side has type char const > * > drivers/clk/versatile/clk-sp810.c:159:29:right side has type int > drivers/clk/versatile/clk-sp810.c:159:53: error: incompatible types for > operation (<=) > drivers/clk/versatile/clk-sp810.c:159:53:left side has type char const > * > drivers/clk/versatile/clk-sp810.c:159:53:right side has type int > rivers/clk/versatile/clk-sp810.c:138:13: warning: symbol 'clk_sp810_of_setup' > was not declared. Should it be static? > > Cc: Pawel Moll > Signed-off-by: Stephen Boyd Acked-by: Pawel Moll > @@ -156,7 +156,7 @@ void __init clk_sp810_of_setup(struct device_node *node) > "timclk"); > parent_names[1] = of_clk_get_parent_name(node, sp810->timclk_index); > > - if (parent_names[0] <= 0 || parent_names[1] <= 0) { > + if (!parent_names[0] || !parent_names[1]) { > pr_warn("Failed to obtain parent clocks for SP810!\n"); > return; > } I stared at it (and at git blame output) and was thinking what was I smoking typing the code... Fortunately mail history suggest that I had "!parent..." in my original patch, and it was modified by Mike ;-) No harm done :-) Thanks for fixing this! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tracing: Export key trace event symbols
On Thu, 2015-04-23 at 16:28 +0100, Pawel Moll wrote: > On Wed, 2015-04-22 at 16:36 +0100, David Ahern wrote: > > On 4/22/15 8:47 AM, Arnaldo Carvalho de Melo wrote: > > > Em Wed, Apr 22, 2015 at 08:53:14AM -0400, Steven Rostedt escreveu: > > >> >On Tue, 21 Apr 2015 21:24:51 -0500 > > >> >Ron Rechenmacher wrote: > > >>> > >I've looked at the above reference briefly and it appears that > > >>> > >user-space > > >>> > >would be mmapping the buffer read-only. Is that correct? > > >> > > > >> >Correct, but I'm sure we could still add something (if it doesn't > > >> >already exist) to have userspace write into the buffer. Ftrace has that > > >> >with the trace_marker file. > > > There is something in the works, I guess Pawell Moll (sp) was working on > > > it, and > > > David Ahern (CCed) should know, David? > > > > > > > I played around with generating perf events in userspace with the > > intention of having the userspace events get merged with kernel events > > during the processing stage, but I did not take it to the point of > > integrating into perf. This was around October 2013. I got distracted > > with other topics and have not come back to it. > > > > Pawel has a patch that allows userspace to inject events into the stream > > via ioctl calls. > > In the last version it was even a prctl - no need for a perf file > descriptor any more :-) > > But the patch requires more care if it's to go in, so I'm open to people > screaming "yes, we need it!" ;-) Forgot to quote the link: http://article.gmane.org/gmane.linux.kernel.api/5851 Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tracing: Export key trace event symbols
On Wed, 2015-04-22 at 16:36 +0100, David Ahern wrote: > On 4/22/15 8:47 AM, Arnaldo Carvalho de Melo wrote: > > Em Wed, Apr 22, 2015 at 08:53:14AM -0400, Steven Rostedt escreveu: > >> >On Tue, 21 Apr 2015 21:24:51 -0500 > >> >Ron Rechenmacher wrote: > >>> > >I've looked at the above reference briefly and it appears that > >>> > >user-space > >>> > >would be mmapping the buffer read-only. Is that correct? > >> > > >> >Correct, but I'm sure we could still add something (if it doesn't > >> >already exist) to have userspace write into the buffer. Ftrace has that > >> >with the trace_marker file. > > There is something in the works, I guess Pawell Moll (sp) was working on > > it, and > > David Ahern (CCed) should know, David? > > > > I played around with generating perf events in userspace with the > intention of having the userspace events get merged with kernel events > during the processing stage, but I did not take it to the point of > integrating into perf. This was around October 2013. I got distracted > with other topics and have not come back to it. > > Pawel has a patch that allows userspace to inject events into the stream > via ioctl calls. In the last version it was even a prctl - no need for a perf file descriptor any more :-) But the patch requires more care if it's to go in, so I'm open to people screaming "yes, we need it!" ;-) > Stephane also injects events for JIT. But that, as far as I understand, happens in userspace (as in: nothing goes down to kernel)? > One of the key requirements is a common time basis (e.g., > CLOCK_MONOTONIC or PERF_CLOCK) to be able to merge the events properly. > I have a kernel module that exports perf_clock to userspace via > clock_gettime; the 4.1 kernel should have the code that allows the clock > id to be specified providing a solution to this problem. It's in! :-) After all these years... http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34f439278cef7b1177f8ce24f9fc81dfc6221d3b Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/6] virtio_mmio: support non-legacy balloon devices
On Mon, 2015-03-30 at 18:37 +0100, Michael S. Tsirkin wrote: > virtio_device_is_legacy_only is always false now, > drop the test from virtio mmio. > > Signed-off-by: Michael S. Tsirkin Slightly ironic ack ;-) after all the battle you fought for this: Acked-by: Pawel Moll Thanks! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 19/35 linux-next] virtio_mmio: constify of_device_id array
On Mon, 2015-03-16 at 19:20 +, Fabian Frederick wrote: > of_device_id is always used as const. > (See driver.of_match_table and open firmware functions) > > Signed-off-by: Fabian Frederick > --- > drivers/virtio/virtio_mmio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > index 9c877d2..0ce8cda 100644 > --- a/drivers/virtio/virtio_mmio.c > +++ b/drivers/virtio/virtio_mmio.c > @@ -671,7 +671,7 @@ static void vm_unregister_cmdline_devices(void) > > /* Platform driver */ > > -static struct of_device_id virtio_mmio_match[] = { > +static const struct of_device_id virtio_mmio_match[] = { > { .compatible = "virtio,mmio", }, > {}, > }; Acked-by: Pawel Moll Thanks! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtio_mmio: generation support
On Thu, 2015-03-12 at 02:07 +, Rusty Russell wrote: > "Michael S. Tsirkin" writes: > > virtio_mmio currently lacks generation support which > > makes multi-byte field access racy. > > Fix by getting the value at offset 0xfc for version 2 > > devices. Nothing we can do for version 1, so return > > generation id 0. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > > > Pawel, you mentioned you have a working virtio 1.0 > > hypervisor, can you pls confirm this works correctly? Again, belated Acked-by: Pawel Moll although not yet tested-by... Paweł -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtio_mmio: fix endian-ness for mmio
On Fri, 2015-03-13 at 01:22 +, Rusty Russell wrote: > I'm sure Pawel is on a beach somewhere sipping cocktails, Ehm... There's still a some more prohibition for me (granted, by choice, just to share the "pain" ;-), so it wasn't exactly > so I'll apply > this immediately (with your updated commit message) to my > pending-rebases and give him the weekend to Ack. Not that it matters any more, but a belated Acked-by: Pawel Moll Thanks, Michael, for fixing this! (although, I must admit, I haven't tested it yet - all in right time :-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register
On Wed, 2015-03-04 at 00:37 +, Arun Ramamurthy wrote: > > That way you're precisely describing the way the hardware is wired up. > > And the driver simply tries to get clcdclk first, if it's defined - > > cool, set clksel to 1, if not - try hclk and set clksel to 0. If neither > > of them is present - bail out. > > > > Does this make any sense? > > > This makes sense to me, thank you for the suggestions. I will fix it all > up in V2 Cool. Just a word of comment to my own words ;-) The "bail out" case was a bad idea - the non-DT use cases more likely than not will have no clock name defined. So, to maintain backward compatibility, the driver will still have to work when no named clock is available. I think the simplest way to do that is to check if "hclk" is available, and use it if it is (setting the clksel accordingly). Otherwise - proceed as it was the case previously. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register
On Tue, 2015-03-03 at 10:02 +, Pawel Moll wrote: > On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote: > > > The existing bindings intentionally avoided quoting internal registers - > > > they are supposed to describe how the hardware is wired up... > > > > > > So how about something like "arm,pl11x,tft-invert-clac"? Then the driver > > > sets the bit or not, depending on the property existance? > > > > > Sure, I can change it to two properties called arm,pl11x,tft-invert-clac > > and arm,pl11x,tft-clksel. Would that be acceptable? > > That would be fine by me :-) Or (after having a look at the TRM) I should rather say: the invert-clac is fine by me :-) but the tft-clksel doesn't work, I afraid. If I'm not mistaken, there are two problems with it. Number one: it's not TFT-specific, is it? So it certainly should not have the "tft-" bit. Number two: setting this bit says "do not use CLCDCLK for the logic; use HCLK instead", correct? If so, have a look at the clock properties. They say: - clock-names: should contain "clcdclk" and "apb_pclk" - clocks: contains phandle and clock specifier pairs for the entries in the clock-names property. See So if your hardware has the reference clock wired to HCLK, and you defining the clocks as "clcdclk", you are (no offence meant ;-) lying :-) So how about solving the problem by extending the clock-names definition like this (feel free to use own wording): - clock-names: should contain two clocks, either "clcdclk" or "hclk" (depending on which input is to be used as a reference clock by the controller logic) and "apb_pclk" That way you're precisely describing the way the hardware is wired up. And the driver simply tries to get clcdclk first, if it's defined - cool, set clksel to 1, if not - try hclk and set clksel to 0. If neither of them is present - bail out. Does this make any sense? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register
On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote: > > The existing bindings intentionally avoided quoting internal registers - > > they are supposed to describe how the hardware is wired up... > > > > So how about something like "arm,pl11x,tft-invert-clac"? Then the driver > > sets the bit or not, depending on the property existance? > > > Sure, I can change it to two properties called arm,pl11x,tft-invert-clac > and arm,pl11x,tft-clksel. Would that be acceptable? That would be fine by me :-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added support for FBIO_WAITFORVSYNC
On Mon, 2015-03-02 at 19:09 +, Arun Ramamurthy wrote: > On 15-03-02 08:00 AM, Pawel Moll wrote: > > On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote: > >> Added ioctl and interrupt handler functions to support FBIO_WAITFORVSYNC > >> Also corrected documentation to make interrupts and interrupt-names > >> optional as they are not required properties. > > > > You may not be aware of this fact, but its the "documentation" what > > defines what properties are required... > > > Pawel, I was not aware of that. Since the driver code did not require > the interrupts or interrupt-names bindings to load properly, I moved it > out of the required properties. I can remove that change. Cool. Drivers don't have to use all available properties :-) The interrupt names are required because CLCD can be wired up with a single, combined interrupt or with 4 separate interrupt lines (see "A.4. On-chip signals", in the TRM) and the binding has to provide ways of describing this. Yes, one could say "1 number == combined, 4 numbers == split", but I personally prefer to be explicit than implicit. Just request the interrupt by name and you'll be fine. > Any other comments on this change? Thanks I have no experience with the vsync ioctl, so can't really comment on it. One minor thing I did spot is your use of curly brackets in one of the switch cases: On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote: @@ -466,6 +468,73 @@ static int clcdfb_pan_display(struct fb_var_screeninfo *var, > return 0; > } > > +static int clcdfb_ioctl(struct fb_info *info, > + unsigned int cmd, unsigned long args) > +{ > + struct clcd_fb *fb = to_clcd(info); > + int retval = 0; > + u32 val, ienb_val; > + > + switch (cmd) { > + case FBIO_WAITFORVSYNC:{ In the line above... > + if (fb->lcd_irq <= 0) { > + retval = -EINVAL; > + break; > + } > + /* disable Vcomp interrupts */ > + ienb_val = readl(fb->regs + fb->off_ienb); > + ienb_val &= ~CLCD_PL111_IENB_VCOMP; > + writel(ienb_val, fb->regs + fb->off_ienb); > + > + /* clear Vcomp interrupt */ > + writel(CLCD_PL111_IENB_VCOMP, fb->regs + CLCD_PL111_ICR); > + > + /* Generate Interrupt at the start of Vsync */ > + reinit_completion(&fb->wait); > + val = readl(fb->regs + fb->off_cntl); > + val &= ~(CNTL_LCDVCOMP(3)); > + writel(val, fb->regs + fb->off_cntl); > + > + /* enable Vcomp interrupt */ > + ienb_val = readl(fb->regs + fb->off_ienb); > + ienb_val |= CLCD_PL111_IENB_VCOMP; > + writel(ienb_val, fb->regs + fb->off_ienb); > + if (!wait_for_completion_interruptible_timeout > + (&fb->wait, HZ/10)) > + retval = -ETIMEDOUT; > + break; > + } ... and here. Not sure it's needed? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added dt support to set tim2 register
On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote: > Added code based on linaro tree: > http://git.linaro.org/kernel/linux-linaro-stable.git > with commit id:6846e7822c4cab5a84672baace3b768c2d0db142 > at drivers/video/amba-clcd.c. This lets the driver set > certain tim2 register bits after reading them from > device tree. > > Reviewed-by: Ray Jui > Reviewed-by: Scott Branden > Signed-off-by: Arun Ramamurthy > --- > .../devicetree/bindings/video/arm,pl11x.txt| 17 - > drivers/video/fbdev/amba-clcd.c| 41 > ++ > 2 files changed, 57 insertions(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt > b/Documentation/devicetree/bindings/video/arm,pl11x.txt > index 3e3039a..14d6f87 100644 > --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt > +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt > @@ -35,6 +35,21 @@ Optional properties: > cell's memory interface can handle; if not present, the memory > interface is fast enough to handle all possible video modes > > +- tim2: Used to set certain bits in LCDTiming2 register. > + It can be TIM2_CLKSEL or TIM2_IOE or both > + > + TIM2_CLKSEL: This bit drives the CLCDCLKSEL signal. It is the select > + signal for the external LCD clock multiplexor. > + > + TIM2_IOE: Invert output enable: > + 0 = CLAC output pin is active HIGH in TFT mode > + 1 = CLAC output pin is active LOW in TFT mode. > + This bit selects the active polarity of the output enable signal in > + TFT mode. In this mode, the CLAC pin is an enable that indicates to > + the LCD panel when valid display data is available. In active > + display mode, data is driven onto the LCD data lines at the > + programmed edge of CLCP when CLAC is in its active state. > + The existing bindings intentionally avoided quoting internal registers - they are supposed to describe how the hardware is wired up... So how about something like "arm,pl11x,tft-invert-clac"? Then the driver sets the bit or not, depending on the property existance? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added support for FBIOPAN_DISPLAY and virtual y resolution
On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote: > Added code to support FBIOPAN_DISPLAY. Also added yres_virtual > parameter to device tree to set the virtual y resolution > > Reviewed-by: Ray Jui > Reviewed-by: Scott Branden > Signed-off-by: Arun Ramamurthy > --- > .../devicetree/bindings/video/arm,pl11x.txt| 4 +++ > drivers/video/fbdev/amba-clcd.c| 31 > +++--- > 2 files changed, 31 insertions(+), 4 deletions(-) > > diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt > b/Documentation/devicetree/bindings/video/arm,pl11x.txt > index 14d6f87..2262cdb 100644 > --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt > +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt > @@ -50,6 +50,10 @@ Optional properties: > display mode, data is driven onto the LCD data lines at the > programmed edge of CLCP when CLAC is in its active state. > > +- yres_virtual: Virtual Y resolution, > + It can be used to configure a virtual y resolution. It > + must be a value larger than the actual y resolution. > + I'm not sure about this... The word "virtual" never works well with device tree nodes defined as "hardware description". I understand what you're doing, but adding this property to the display controller's node doesn't sound right. How does this describe hardware? If anywhere, it's more like a job for the panel node? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] video: ARM CLCD: Added support for FBIO_WAITFORVSYNC
On Wed, 2015-02-25 at 21:01 +, Arun Ramamurthy wrote: > Added ioctl and interrupt handler functions to support FBIO_WAITFORVSYNC > Also corrected documentation to make interrupts and interrupt-names > optional as they are not required properties. You may not be aware of this fact, but its the "documentation" what defines what properties are required... > Reviewed-by: Ray Jui > Reviewed-by: Scott Branden > Signed-off-by: Arun Ramamurthy 0 > --- > .../devicetree/bindings/video/arm,pl11x.txt| 11 +-- > drivers/video/fbdev/amba-clcd.c| 82 > ++ > include/linux/amba/clcd.h | 4 ++ > 3 files changed, 89 insertions(+), 8 deletions(-) > > diff --git a/Documentation/devicetree/bindings/video/arm,pl11x.txt > b/Documentation/devicetree/bindings/video/arm,pl11x.txt > index 2262cdb..7d19024 100644 > --- a/Documentation/devicetree/bindings/video/arm,pl11x.txt > +++ b/Documentation/devicetree/bindings/video/arm,pl11x.txt > @@ -10,14 +10,6 @@ Required properties: > > - reg: base address and size of the control registers block > > -- interrupt-names: either the single entry "combined" representing a > - combined interrupt output (CLCDINTR), or the four entries > - "mbe", "vcomp", "lnbu", "fuf" representing the individual > - CLCDMBEINTR, CLCDVCOMPINTR, CLCDLNBUINTR, CLCDFUFINTR interrupts > - > -- interrupts: contains an interrupt specifier for each entry in > - interrupt-names > - > - clock-names: should contain "clcdclk" and "apb_pclk" > > - clocks: contains phandle and clock specifier pairs for the entries So no, you can't do that. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 2/2] perf: Add per event clockid support
On Fri, 2015-02-20 at 14:29 +, Peter Zijlstra wrote: > The below patch makes the distinction between these two cases by > adding perf_event_clock() which is used for the second case. It > further makes this configurable on a per-event basis, but adds a few > sanity checks such that we cannot combine events with different clocks > in confusing ways. The idea works for me (obviously :-) > And since we then have per-event configurability we might as well > retain the 'legacy' behaviour as a default. Don't mind that at all. > @@ -334,8 +335,7 @@ struct perf_event_attr { >*/ > __u32 sample_stack_user; > > - /* Align to u64. */ > - __u32 __reserved_2; > + __u32 clockid; I thought about it, but was sort-of-afraid to propose it :-) Now, one thing I'm not 100% sure about it is it being unsigned, as clockid_t is signed for a reason (negative values have meaning - eg. dynamic clocks, which could be useful in some circumstances). Of course casting could be an answer, but is there any reason not to make it __s32? > + default: > + /* XXX add: clock_id_valid() && clock_gettime_ns() ? */ > + err = -EINVAL; > + goto err_alloc; > + } If you asked me, I'd say -EINVAL, no default. Cheers! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] perf/x86: Add ability to sample TSC
On Thu, 2015-02-19 at 17:58 +, Pawel Moll wrote: > and what Adrian did, explicitly > defining possible timestamps at perf_event_attr level instead of > relating them to to posix clock ids is the way to go. No strong opinion > here. One note here: I'd rather make it "processor trace clock", rather than "architecturally defined" one. That way you relate it directly to the time domain and allow userspace decoder to as for it in a architecturally independent way (yes, I'm assuming that there will be only one "type of processor trace in a system :-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] perf/x86: Add ability to sample TSC
On Thu, 2015-02-19 at 17:50 +, John Stultz wrote: > On Thu, Feb 19, 2015 at 9:40 AM, Adrian Hunter > wrote: > > On 19/02/2015 7:24 p.m., John Stultz wrote: > >> > >> On Thu, Feb 19, 2015 at 4:11 AM, Adrian Hunter > >> wrote: > >>> > >>> Hi > >>> > >>> With the advent of switching perf_clock to CLOCK_MONOTONIC, > >>> it will not be possible to convert perf_clock directly to/from > >>> TSC. So add the ability to sample TSC instead. > >>> > >>> > >>> Adrian Hunter (2): > >>>perf: Sample additional clock value > >>>perf/x86: Provide TSC for PERF_SAMPLE_CLOCK_ARCH > >> > >> > >> This doesn't seem very portable. The CLOCK_MONOTONIC_RAW clockid was > >> added to provide a arch-neutral abstraction of a free-running hardware > >> counter that isn't affected by adjtimex slewing (though like any > >> counter, it will be affected by non-constant drift). > >> > >> You might consider looking at that if the short term slew adjustments > >> (which result in more accurate timings in the long term) are > >> problematic for you. > > > > > > This is for Intel Processor Trace - which Peter has already > > rightly chastised me for not making plain. > > > > Intel Processor Trace (Intel PT) is a trace that is hardware generated. The > > hardware does not know about linux or its clocks, so the timestamps > > are TSC. I think ARM might have the same issue with ETM or such. i.e. the > > need to synchronize a hardware timestamp with a perf event. > > > > There is a description of Intel PT in the Intel Architecture manuals. > > > > http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html > > Cc'ing Mathieu since he might be familiar w/ any similar issues w/ Coresight. We haven't got to this yet, but it was discussed (briefly) at Plumbers in Dusseldorf. In our case the CS processor trace can be timestamped from yet another clock source, probably hidden behind memory mapped register. Thus the additional time sample of some sort will be required in some form. Peter pointed out that there may be problem with obtaining the "other" timestamps in NMI context, so I was thinking about injecting the synchronisation points as separate records: http://article.gmane.org/gmane.linux.kernel.api/7726/match=perf+sample+additional+clock+value but maybe I'm making it too complicated and what Adrian did, explicitly defining possible timestamps at perf_event_attr level instead of relating them to to posix clock ids is the way to go. No strong opinion here. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
On Fri, 2015-02-13 at 02:52 +, Rusty Russell wrote: > "Michael S. Tsirkin" writes: > > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote: > >> Jason Wang writes: > >> > This patch enables the interrupt coalescing setting through ethtool. > >> > >> The problem is that there's nothing network specific about interrupt > >> coalescing. I can see other devices wanting exactly the same thing, > >> which means we'd deprecate this in the next virtio standard. > >> > >> I think the right answer is to extend like we did with > >> vring_used_event(), eg: > >> > >> 1) Add a new feature VIRTIO_F_RING_COALESCE. > >> 2) Add another a 32-bit field after vring_used_event(), eg: > >> #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num > >> + 2])) > >> > >> This loses the ability to coalesce by number of frames, but we can still > >> do number of sg entries, as we do now with used_event, and we could > >> change virtqueue_enable_cb_delayed() to take a precise number if we > >> wanted. > > > > But do we expect delay to be update dynamically? > > If not, why not stick it in config space? > > Hmm, we could update it dynamically (and will, in the case of ethtool). > But it won't be common, so we could append a field to > virtio_pci_common_cfg for PCI. > > I think MMIO and CCW would be easy to extend too, but CC'd to check. As far as I understand the virtio_pci_common_cfg principle (just had a look, for the first time ;-), it's now an equivalent of the MMIO control registers block. I see no major problem with adding another one. Or were you thinking about introducing some standard for the "real" config space? (fine with me as well - the transport will have nothing to do :-) Paweł -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] perf: Use monotonic clock as a source for timestamps
On Tue, 2015-02-03 at 08:30 +, ajh mls wrote: > There is still > > http://marc.info/?l=linux-kernel&m=142141223902303 Uh. I have no idea why, but I haven't got this mail at all :-( Thanks for pointing it out! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] perf: Use monotonic clock as a source for timestamps
Afternoon, Peter, On Wed, 2015-01-21 at 20:27 +, Pawel Moll wrote: > Until now, perf framework never defined the meaning of the timestamps > captured as PERF_SAMPLE_TIME sample type. The values were obtaining > from local (sched) clock, which is unavailable in userspace. This made > it impossible to correlate perf data with any other events. Other > tracing solutions have the source configurable (ftrace) or just share > a common time domain between kernel and userspace (LTTng). > > Follow the trend by using monotonic clock, which is readily available > as POSIX CLOCK_MONOTONIC. > > Also add a sysctl "perf_sample_time_clk_id" attribute (usually available > as "/proc/sys/kernel/perf_sample_time_clk_id") which can be used by the > user to obtain the clk_id to be used with POSIX clock API (eg. > clock_gettime()) to obtain a time value comparable with perf samples. > > Old behaviour can be restored by using "perf_use_local_clock" kernel > parameter. > > Signed-off-by: Pawel Moll I know that you're busy with other stuff, but it's already rc7 time again... We can leave the other two patches from the series for later, but how about getting this one merged for 3.20 and ending the 2 or 3 years long struggle? I'm not saying that everyone is happy about it, but no one seems to be unhappy enough to speak :-) Cheers! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 3/3] perf: Sample additional clock value
On Mon, 2015-01-05 at 13:45 +, Peter Zijlstra wrote: > Also, one would expect something like: > > default: { > struct k_clock *kc = clockid_to_kclock(event->attr.clock); > struct timespec ts; > if (kc) { > kc->clock_get(event->attr.clock, &ts); > data->clock = ktime_to_ns(timespec_to_ktime(ts)); > } else { > data->clock = 0; > } > } > > Albeit preferably slightly less horrible -- of course, one would first > need to deal with the NMI issue. I was thinking about it... Maybe the solution is approaching the problem in a completely different way. As far as I understand (John?) POSIX timers can be used on any clockid? So it would be possible to obtain a dynamic clock id, for example for my exotic trace hardware (by any means necessary, like opening a char device) and create a timer firing every 1 ms (in the trace time domain). Than this event would be somehow associated with a perf session (for example, by passing the timerid via perf's ioctl) and then, every when timer fires, a perf record (something like PERF_RECORD_TIMER?) containing the timer/clock's value *and* the normal perf timestamp, would be injected into the circular buffer. No issue with NMI, no issue with passing clockid through perf_event_attr... Does it make any sense to anyone else but me? ;-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v5] perf: Use monotonic clock as a source for timestamps
Until now, perf framework never defined the meaning of the timestamps captured as PERF_SAMPLE_TIME sample type. The values were obtaining from local (sched) clock, which is unavailable in userspace. This made it impossible to correlate perf data with any other events. Other tracing solutions have the source configurable (ftrace) or just share a common time domain between kernel and userspace (LTTng). Follow the trend by using monotonic clock, which is readily available as POSIX CLOCK_MONOTONIC. Also add a sysctl "perf_sample_time_clk_id" attribute (usually available as "/proc/sys/kernel/perf_sample_time_clk_id") which can be used by the user to obtain the clk_id to be used with POSIX clock API (eg. clock_gettime()) to obtain a time value comparable with perf samples. Old behaviour can be restored by using "perf_use_local_clock" kernel parameter. Signed-off-by: Pawel Moll --- Changes since v4: - using jump labels to reduce runtime overhead of the local/mono check Documentation/kernel-parameters.txt | 9 kernel/events/core.c| 43 - 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4df73da..3cccd0e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -91,6 +91,7 @@ parameter is applicable: NUMANUMA support is enabled. NFS Appropriate NFS support is enabled. OSS OSS sound support is enabled. + PERFPerformance events and counters support is enabled. PV_OPS A paravirtualized kernel is enabled. PARIDE The ParIDE (parallel port IDE) subsystem is enabled. PARISC The PA-RISC architecture is enabled. @@ -2795,6 +2796,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. allocator. This parameter is primarily for debugging and performance comparison. + perf_use_local_clock + [PERF] + Use local_clock() as a source for perf timestamps + generation. This was be the default behaviour and + this parameter can be used to maintain backward + compatibility or on older hardware with expensive + monotonic clock source. + pf. [PARIDE] See Documentation/blockdev/paride.txt. diff --git a/kernel/events/core.c b/kernel/events/core.c index 882f835..f45434d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include "internal.h" @@ -322,9 +324,43 @@ extern __weak const char *perf_pmu_name(void) return "pmu"; } +static struct static_key perf_use_local_clock_key = STATIC_KEY_INIT_FALSE; +static bool perf_use_local_clock_param __initdata; +static int __init perf_use_local_clock_setup(char *__unused) +{ + perf_use_local_clock_param = true; + return 1; +} +__setup("perf_use_local_clock", perf_use_local_clock_setup); + +static int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; + +static struct ctl_table perf_sample_time_kern_table[] = { + { + .procname = "perf_sample_time_clk_id", + .data = &sysctl_perf_sample_time_clk_id, + .maxlen = sizeof(int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + {} +}; + +static struct ctl_table perf_sample_time_root_table[] = { + { + .procname = "kernel", + .mode = 0555, + .child = perf_sample_time_kern_table, + }, + {} +}; + static inline u64 perf_clock(void) { - return local_clock(); + if (static_key_false(&perf_use_local_clock_key)) + return local_clock(); + else + return ktime_get_mono_fast_ns(); } static inline struct perf_cpu_context * @@ -8271,6 +8307,11 @@ void __init perf_event_init(void) */ BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head)) != 1024); + + if (perf_use_local_clock_param) + static_key_slow_inc(&perf_use_local_clock_key); + else + register_sysctl_table(perf_sample_time_root_table); } static int __init perf_event_sysfs_init(void) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Wed, 2015-01-21 at 19:48 +, Pawel Moll wrote: > On Wed, 2015-01-21 at 15:52 +0000, Pawel Moll wrote: > > On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote: > > > On Thu, Nov 06, 2014 at 04:51:56PM +, Pawel Moll wrote: > > > > Documentation/kernel-parameters.txt | 9 + > > > > kernel/events/core.c| 37 > > > > + > > > > 2 files changed, 46 insertions(+) > > > > > > > diff --git a/Documentation/kernel-parameters.txt > > > > b/Documentation/kernel-parameters.txt > > > > index 4c81a86..8ead8d8 100644 > > > > --- a/Documentation/kernel-parameters.txt > > > > +++ b/Documentation/kernel-parameters.txt > > > > > > > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can > > > > also be entirely omitted. > > > > allocator. This parameter is primarily for > > > > debugging > > > > and performance comparison. > > > > > > > > + perf_use_local_clock > > > > + [PERF] > > > > + Use local_clock() as a source for perf > > > > timestamps > > > > + generation. This was be the default behaviour > > > > and > > > > + this parameter can be used to maintain backward > > > > + compatibility or on older hardware with > > > > expensive > > > > + monotonic clock source. > > > > + > > > > pf. [PARIDE] > > > > See Documentation/blockdev/paride.txt. > > > > > > So I'm always terminally confused on the naming of kernel parameters, > > > sometimes things are modules (even when they're not actually =m capable) > > > and get a module::foo naming or so and sometimes they're not. > > > > I guess you mean module.foo? > > > > > So we want to use the module naming thing or not? > > > > Honestly, I don't mind either way. For one thing ftrace doesn't bother > > and just uses __setup() as well. > > There's one more thing to this - as far as I remember, the module name > is actually derived from the compilation unit name (at some level of > Kbuild). I may be wrong (will have to double check), but a module > parameter defined in kernel/events/core.c may be called something like > "core.parameter" ;-). Ok, so it's possible to enforce "perf." prefix: /* You can override this manually, but generally this should match the module name. */ #ifdef MODULE #define MODULE_PARAM_PREFIX /* empty */ #else #define MODULE_PARAM_PREFIX KBUILD_MODNAME "." #endif So, perf_use_local_clock or perf.use_local_clock? :-) Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Wed, 2015-01-21 at 15:52 +, Pawel Moll wrote: > On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote: > > On Thu, Nov 06, 2014 at 04:51:56PM +0000, Pawel Moll wrote: > > > Documentation/kernel-parameters.txt | 9 + > > > kernel/events/core.c| 37 > > > + > > > 2 files changed, 46 insertions(+) > > > > > diff --git a/Documentation/kernel-parameters.txt > > > b/Documentation/kernel-parameters.txt > > > index 4c81a86..8ead8d8 100644 > > > --- a/Documentation/kernel-parameters.txt > > > +++ b/Documentation/kernel-parameters.txt > > > > > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also > > > be entirely omitted. > > > allocator. This parameter is primarily for debugging > > > and performance comparison. > > > > > > + perf_use_local_clock > > > + [PERF] > > > + Use local_clock() as a source for perf timestamps > > > + generation. This was be the default behaviour and > > > + this parameter can be used to maintain backward > > > + compatibility or on older hardware with expensive > > > + monotonic clock source. > > > + > > > pf. [PARIDE] > > > See Documentation/blockdev/paride.txt. > > > > So I'm always terminally confused on the naming of kernel parameters, > > sometimes things are modules (even when they're not actually =m capable) > > and get a module::foo naming or so and sometimes they're not. > > I guess you mean module.foo? > > > So we want to use the module naming thing or not? > > Honestly, I don't mind either way. For one thing ftrace doesn't bother > and just uses __setup() as well. There's one more thing to this - as far as I remember, the module name is actually derived from the compilation unit name (at some level of Kbuild). I may be wrong (will have to double check), but a module parameter defined in kernel/events/core.c may be called something like "core.parameter" ;-). Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 3/3] perf: Sample additional clock value
On Wed, 2015-01-21 at 17:44 +, John Stultz wrote: > That said, there is the dynamic posix clockids. I'm not sure if it > would make sense, but even if we don't bump MAX_CLOCKS, might there > be some case where someone wants to use a dynamic posix clock for the > perf reference? If I remember correctly, last time I tried to use dynamic posix clocks in the perf context, one needed to open a ptp character device in order to get a file descriptor, than translated into a clockid_t value - correct me if I'm wrong. But here you get the fd from the sys_perf_open() and clock_*() doesn't know anything about such descriptor. I was looking into a way of associating a random clock with a random fd, so that perf could "attach" itself to the clock API at will, but it turned out not to be trivial (I'd have to dig through old threads to remember all the nasty details). The good thing is that it looks like the immediate need for this was no more, with perf using monotonic clock as the clock source. It will come back when we get into hardware trace correlation, but one step at a time... Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 3/3] perf: Sample additional clock value
On Mon, 2015-01-05 at 13:45 +, Peter Zijlstra wrote: > On Thu, Nov 06, 2014 at 04:51:58PM +0000, Pawel Moll wrote: > > Currently three clocks are implemented: CLOCK_REALITME = 0, > > CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is > > 5 bits wide to allow for future extension to custom, non-POSIX clock > > sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like > > ARM CoreSight (hardware trace) timestamp generator. > > > @@ -304,7 +305,16 @@ struct perf_event_attr { > > mmap2 : 1, /* include mmap with inode > > data */ > > comm_exec : 1, /* flag comm events that > > are due to an exec */ > > uevents: 1, /* allow uevents into the > > buffer */ > > - __reserved_1 : 38; > > + > > + /* > > +* clock: one of the POSIX clock IDs: > > +* > > +* 0 - CLOCK_REALTIME > > +* 1 - CLOCK_MONOTONIC > > +* 4 - CLOCK_MONOTONIC_RAW > > +*/ > > + clock : 5, /* clock type */ > > + __reserved_1 : 33; > > > > union { > > __u32 wakeup_events;/* wakeup every n events */ > > This would put a constraint on actually changing MAX_CLOCKS, are the > time people OK with that? Thomas, John? I admit I have some doubts about it myself. But I don't think we can afford full int for the clock id, can we? > I'm also not quite sure of using the >MAX_CLOCKS space for 'special' > clocks, preferably those would register themselves with the POSIX clock > interface. That may be a hard sell - John never was particularly keen on extending the hard-coded clocks with random sources. And the hardware trace clock I had on mind would be probably one of them - its meaning depends a lot on the . Of course I'm looking forward to being surprised :-) > > @@ -4631,6 +4637,24 @@ static void __perf_event_header__init_id(struct > > perf_event_header *header, > > data->cpu_entry.cpu = raw_smp_processor_id(); > > data->cpu_entry.reserved = 0; > > } > > + > > + if (sample_type & PERF_SAMPLE_CLOCK) { > > + switch (event->attr.clock) { > > + case CLOCK_REALTIME: > > + data->clock = ktime_get_real_ns(); > > + break; > > + case CLOCK_MONOTONIC: > > + data->clock = ktime_get_mono_fast_ns(); > > + break; > > + case CLOCK_MONOTONIC_RAW: > > + data->clock = ktime_get_raw_ns(); > > + break; > > + default: > > + data->clock = 0; > > + break; > > + } > > + } > > + > > } > > This is broken. There is nothing stopping this from being called from > NMI context and only the MONO one is NMI safe. Uh, right. > Also, one would expect something like: > > default: { > struct k_clock *kc = clockid_to_kclock(event->attr.clock); > struct timespec ts; > if (kc) { > kc->clock_get(event->attr.clock, &ts); > data->clock = ktime_to_ns(timespec_to_ktime(ts)); > } else { > data->clock = 0; > } > } > > Albeit preferably slightly less horrible -- of course, one would first > need to deal with the NMI issue. Ok, generally this patch clearly needs more work. I'll have to revisit what Thomas did with the monotonic clock to understand the potential solutions. Maybe, in this case, giving such a wide choice of clocks to be sampled is not the right thing after all? Maybe, when we face the issue of a trace clock for real, simple "PERF_SAMPLE_TRACE_CLOCK" will suffice? With the monotonic clock as a normal time base merged (hopefully soon :-) the correlation between kernel and user space (which was the original reason for having this change) won't be an issue any more. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/3] perf: Userspace event
On Mon, 2015-01-05 at 13:12 +, Peter Zijlstra wrote: > On Thu, Nov 06, 2014 at 04:51:57PM +0000, Pawel Moll wrote: > > This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by > > any process to inject custom data into perf data stream as a new > > PERF_RECORD_UEVENT record, if such process is being observed or if it > > is running on a CPU being observed by the perf framework. > > > > The prctl call takes the following arguments: > > > > prctl(PR_TASK_PERF_UEVENT, type, size, data, flags); > > > > - type: a number meaning to describe content of the following data. > > Kernel does not pay attention to it and merely passes it further in > > the perf data, therefore its use must be agreed between the events > > producer (the process being observed) and the consumer (performance > > analysis tool). The perf userspace tool will contain a repository of > > "well known" types and reference implementation of their decoders. > > - size: Length in bytes of the data. > > - data: Pointer to the data. > > - flags: Reserved for future use. Always pass zero. > > > > Perf context that are supposed to receive events generated with the > > prctl above must be opened with perf_event_attr.uevent set to 1. The > > PERF_RECORD_UEVENT records consist of a standard perf event header, > > 32-bit type value, 32-bit data size and the data itself, followed by > > padding to align the overall record size to 8 bytes and optional, > > standard sample_id field. > > > > Example use cases: > > > > - "perf_printf" like mechanism to add logging messages to perf data; > > in the simplest case it can be just > > > > prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0); > > > > - synchronisation of performance data generated in user space with the > > perf stream coming from the kernel. For example, the marker can be > > inserted by a JIT engine after it generated portion of the code, but > > before the code is executed for the first time, allowing the > > post-processor to pick the correct debugging information. > > The think I remember being raised was a unified means of these msgs > across perf/ftrace/lttng. I am not seeing that mentioned. Right. I was considering the "well known types repository" an attempt in this direction. Having said that - ftrace also takes a random blob as the trace marker, so the unification has to happen in userspace anyway. I'll have a look what LTTng has to say in this respect. > Also, I would like a stronger rationale for the @type argument, if it > has no actual meaning why is it separate from the binary msg data? Valid point. Without type 0 defined as a string, it doesn't bring anything into the equation. I just have a gut feeling that sooner than later we will want to split the messages somehow. Maybe we should make it a "reserved for future use, use 0 now" field? * struct { * struct perf_event_headerheader; * u32 __reserved; /* always 0 */ * u32 size; * chardata[size]; * char__padding[-size & 7]; * struct sample_idsample_id; * }; or, probably even better, make it a version value at a known offset (currently always 1, with just size and random sized data following). * struct { * struct perf_event_headerheader; * u32 version; /* use 1 */ * u32 size; * chardata[size]; * char__padding[-size & 7]; * struct sample_idsample_id; * }; So that we can mutate the user events format without too much of the pain - the parsers will simply complain about unknown format if such occurs and with the size of the record in the header, it is possible to skip it. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Mon, 2015-01-05 at 13:00 +, Peter Zijlstra wrote: > On Thu, Nov 06, 2014 at 04:51:56PM +0000, Pawel Moll wrote: > > Documentation/kernel-parameters.txt | 9 + > > kernel/events/core.c| 37 > > + > > 2 files changed, 46 insertions(+) > > > diff --git a/Documentation/kernel-parameters.txt > > b/Documentation/kernel-parameters.txt > > index 4c81a86..8ead8d8 100644 > > --- a/Documentation/kernel-parameters.txt > > +++ b/Documentation/kernel-parameters.txt > > > @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also be > > entirely omitted. > > allocator. This parameter is primarily for debugging > > and performance comparison. > > > > + perf_use_local_clock > > + [PERF] > > + Use local_clock() as a source for perf timestamps > > + generation. This was be the default behaviour and > > + this parameter can be used to maintain backward > > + compatibility or on older hardware with expensive > > + monotonic clock source. > > + > > pf. [PARIDE] > > See Documentation/blockdev/paride.txt. > > So I'm always terminally confused on the naming of kernel parameters, > sometimes things are modules (even when they're not actually =m capable) > and get a module::foo naming or so and sometimes they're not. I guess you mean module.foo? > So we want to use the module naming thing or not? Honestly, I don't mind either way. For one thing ftrace doesn't bother and just uses __setup() as well. > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index 2b02c9f..5d0aa03 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > > @@ -322,8 +323,41 @@ extern __weak const char *perf_pmu_name(void) > > return "pmu"; > > } > > > > +static bool perf_use_local_clock; > > +static int __init perf_use_local_clock_setup(char *__unused) > > +{ > > + perf_use_local_clock = true; > > + return 1; > > +} > > +__setup("perf_use_local_clock", perf_use_local_clock_setup); > > > static inline u64 perf_clock(void) > > { > > + if (likely(!perf_use_local_clock)) > > + return ktime_get_mono_fast_ns(); > > + > > return local_clock(); > > } > > Since this all is boot time, should we not use things like static_key > and avoid the 'pointless' conditional at runtime? Right. it's good to learn new stuff - I didn't know there was architecture-agnostic support for jump labels. Definitely worth the effort, will give it a try and spin the patch. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Mon, 2015-01-05 at 13:01 +, Peter Zijlstra wrote: > On Thu, Dec 11, 2014 at 01:39:13PM +0000, Pawel Moll wrote: > > On Thu, 2014-11-27 at 15:05 +0000, Pawel Moll wrote: > > > It's been 3 weeks without any negative feedback (no feedback at all, but > > > I take the optimistic view :-)... > > > > > > How about queuing at least this patch alone for the incoming merge > > > window? Or at least getting it into -next, with the view at 3.20? > > > > It's been another two weeks... How about getting it queued for v3.20 > > then? > > Yes sorry about that, I had me a kid and took a wee bit of time away > from computers, then xmas and new years happened. "a kid" as in: you have forked? :-) It will happen to me in 1.5 month time (or earlier). All the best luck, we'll both need it ;-) > In any case, best wishes for the new year to all. Indeed. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mfd: vexpress: Remove non-DT code
Now, as all VE platforms have to be booted with DT, the code handling non-DT case can be removed. Signed-off-by: Pawel Moll --- drivers/mfd/vexpress-sysreg.c | 71 +-- 1 file changed, 15 insertions(+), 56 deletions(-) diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c index 8f43ab8..3e628df 100644 --- a/drivers/mfd/vexpress-sysreg.c +++ b/drivers/mfd/vexpress-sysreg.c @@ -47,71 +47,26 @@ #define SYS_HBI_MASK 0xfff #define SYS_PROCIDx_HBI_SHIFT 0 -#define SYS_MCI_CARDIN (1 << 0) -#define SYS_MCI_WPROT (1 << 1) - #define SYS_MISC_MASTERSITE(1 << 14) - -static void __iomem *__vexpress_sysreg_base; - -static void __iomem *vexpress_sysreg_base(void) +void vexpress_flags_set(u32 data) { - if (!__vexpress_sysreg_base) { + static void __iomem *base; + + if (!base) { struct device_node *node = of_find_compatible_node(NULL, NULL, "arm,vexpress-sysreg"); - __vexpress_sysreg_base = of_iomap(node, 0); + base = of_iomap(node, 0); } - WARN_ON(!__vexpress_sysreg_base); - - return __vexpress_sysreg_base; -} - - -static int vexpress_sysreg_get_master(void) -{ - if (readl(vexpress_sysreg_base() + SYS_MISC) & SYS_MISC_MASTERSITE) - return VEXPRESS_SITE_DB2; - - return VEXPRESS_SITE_DB1; -} - -void vexpress_flags_set(u32 data) -{ - writel(~0, vexpress_sysreg_base() + SYS_FLAGSCLR); - writel(data, vexpress_sysreg_base() + SYS_FLAGSSET); -} - -unsigned int vexpress_get_mci_cardin(struct device *dev) -{ - return readl(vexpress_sysreg_base() + SYS_MCI) & SYS_MCI_CARDIN; -} - -u32 vexpress_get_procid(int site) -{ - if (site == VEXPRESS_SITE_MASTER) - site = vexpress_sysreg_get_master(); + if (WARN_ON(!base)) + return; - return readl(vexpress_sysreg_base() + (site == VEXPRESS_SITE_DB1 ? - SYS_PROCID0 : SYS_PROCID1)); + writel(~0, base + SYS_FLAGSCLR); + writel(data, base + SYS_FLAGSSET); } -void __iomem *vexpress_get_24mhz_clock_base(void) -{ - return vexpress_sysreg_base() + SYS_24MHZ; -} - - -void __init vexpress_sysreg_early_init(void __iomem *base) -{ - __vexpress_sysreg_base = base; - - vexpress_config_set_master(vexpress_sysreg_get_master()); -} - - /* The sysreg block is just a random collection of various functions... */ static struct syscon_platform_data vexpress_sysreg_sys_id_pdata = { @@ -210,6 +165,7 @@ static int vexpress_sysreg_probe(struct platform_device *pdev) struct resource *mem; void __iomem *base; struct bgpio_chip *mmc_gpio_chip; + int master; u32 dt_hbi; mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -220,11 +176,14 @@ static int vexpress_sysreg_probe(struct platform_device *pdev) if (!base) return -ENOMEM; - vexpress_config_set_master(vexpress_sysreg_get_master()); + master = readl(base + SYS_MISC) & SYS_MISC_MASTERSITE ? + VEXPRESS_SITE_DB2 : VEXPRESS_SITE_DB1; + vexpress_config_set_master(master); /* Confirm board type against DT property, if available */ if (of_property_read_u32(of_root, "arm,hbi", &dt_hbi) == 0) { - u32 id = vexpress_get_procid(VEXPRESS_SITE_MASTER); + u32 id = readl(base + (master == VEXPRESS_SITE_DB1 ? +SYS_PROCID0 : SYS_PROCID1)); u32 hbi = (id >> SYS_PROCIDx_HBI_SHIFT) & SYS_HBI_MASK; if (WARN_ON(dt_hbi != hbi)) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] power/reset: vexpress: Remove non-DT code
Now, as all VE platforms have to be booted with DT, the code handling non-DT case can be removed. Signed-off-by: Pawel Moll --- drivers/power/reset/vexpress-poweroff.c | 17 +++-- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/drivers/power/reset/vexpress-poweroff.c b/drivers/power/reset/vexpress-poweroff.c index 9dfc9ce..be12d9b 100644 --- a/drivers/power/reset/vexpress-poweroff.c +++ b/drivers/power/reset/vexpress-poweroff.c @@ -111,23 +111,20 @@ static int _vexpress_register_restart_handler(struct device *dev) static int vexpress_reset_probe(struct platform_device *pdev) { - enum vexpress_reset_func func; const struct of_device_id *match = of_match_device(vexpress_reset_of_match, &pdev->dev); struct regmap *regmap; int ret = 0; - if (match) - func = (enum vexpress_reset_func)match->data; - else - func = pdev->id_entry->driver_data; + if (!match) + return -EINVAL; regmap = devm_regmap_init_vexpress_config(&pdev->dev); if (IS_ERR(regmap)) return PTR_ERR(regmap); dev_set_drvdata(&pdev->dev, regmap); - switch (func) { + switch ((enum vexpress_reset_func)match->data) { case FUNC_SHUTDOWN: vexpress_power_off_device = &pdev->dev; pm_power_off = vexpress_power_off; @@ -144,20 +141,12 @@ static int vexpress_reset_probe(struct platform_device *pdev) return ret; } -static const struct platform_device_id vexpress_reset_id_table[] = { - { .name = "vexpress-reset", .driver_data = FUNC_RESET, }, - { .name = "vexpress-shutdown", .driver_data = FUNC_SHUTDOWN, }, - { .name = "vexpress-reboot", .driver_data = FUNC_REBOOT, }, - {} -}; - static struct platform_driver vexpress_reset_driver = { .probe = vexpress_reset_probe, .driver = { .name = "vexpress-reset", .of_match_table = vexpress_reset_of_match, }, - .id_table = vexpress_reset_id_table, }; static int __init vexpress_reset_init(void) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Thu, 2014-11-27 at 15:05 +, Pawel Moll wrote: > On Thu, 2014-11-06 at 16:51 +0000, Pawel Moll wrote: > > Until now, perf framework never defined the meaning of the timestamps > > captured as PERF_SAMPLE_TIME sample type. The values were obtaining > > from local (sched) clock, which is unavailable in userspace. This made > > it impossible to correlate perf data with any other events. Other > > tracing solutions have the source configurable (ftrace) or just share > > a common time domain between kernel and userspace (LTTng). > > > > Follow the trend by using monotonic clock, which is readily available > > as POSIX CLOCK_MONOTONIC. > > > > Also add a sysctl "perf_sample_time_clk_id" attribute which can be used > > by the user to obtain the clk_id to be used with POSIX clock API (eg. > > clock_gettime()) to obtain a time value comparable with perf samples. > > > > Old behaviour can be restored by using "perf_use_local_clock" kernel > > parameter. > > > > Signed-off-by: Pawel Moll > > It's been 3 weeks without any negative feedback (no feedback at all, but > I take the optimistic view :-)... > > How about queuing at least this patch alone for the incoming merge > window? Or at least getting it into -next, with the view at 3.20? It's been another two weeks... How about getting it queued for v3.20 then? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] MAINTAINERS: ARM Versatile Express platform
On Fri, 2014-11-28 at 17:56 +, Joe Perches wrote: > On Wed, 2014-11-26 at 12:13 +0000, Pawel Moll wrote: > > This patch adds a separate section for the ARM > > Versatile Express platform maintainers, listing > > all different bits and bobs used by it. > > > > Acked-by: Liviu Dudau > > Acked-by: Sudeep Holla > > Acked-by: Lorenzo Pieralisi > > Signed-off-by: Pawel Moll > > Acked-by: Michael Turquette > > --- > > MAINTAINERS | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index 0ff630d..924454f 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -1503,6 +1503,18 @@ S: Maintained > > F: drivers/clk/ux500/ > > F: include/linux/platform_data/clk-ux500.h > > > > +ARM/VERSATILE EXPRESS PLATFORM > > +M: Liviu Dudau > > +M: Sudeep Holla > > +M: Lorenzo Pieralisi > > +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) > > +S: Maintained > > +F: arch/arm/boot/dts/vexpress* > > +F: arch/arm/mach-vexpress/ > > +F: */*/vexpress* > > +F: drivers/clk/versatile/clk-vexpress-osc.c > > +F: drivers/clocksource/versatile.c > > Your original submission had: > > F: drivers/power/reset/vexpress-poweroff.c > > This file is not matched by: > > F:*/*/vexpress* Oh boy, of course. I'll send fix in a second. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code
On Fri, 2014-11-28 at 16:31 +, Arnd Bergmann wrote: > I would suggest applying your patch from > http://thread.gmane.org/gmane.linux.ports.arm.kernel/223426/focus=224128 Yes, please :-) Do you want me to resend? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code
On Fri, 2014-11-28 at 15:30 +, Russell King - ARM Linux wrote: > mmci-pl18x 10005000.mmci: no support for card's volts > mmc0: error -22 whilst initialising SD card Just tried it again: --8< ARM Versatile Express Boot Monitor Version:V5.2.1 Build Date: Apr 4 2013 Daughterboard Site 1: V2P-CA9 Cortex A9 Daughterboard Site 2: Not Used Running boot script from flash - BOOTSCRIPT Loaded FDT - ca9 Loaded initrd - busybox Copied 0x40 bytes from 0x4800 to 0x60008000 Booting kernel @ 0x60008000 Command line 'console=ttyAMA0,38400 earlyprintk debug ' initrd @ 0x7000 (4127012 bytes) FDT @ 0x6900 (14048 bytes) Uncompressing Linux... done, booting the kernel. Booting Linux on physical CPU 0x0 Initializing cgroup subsys cpuset Linux version 3.18.0-rc6+ (pawmol01@hornet) (gcc version 4.9.2 20140904 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09) ) #1442 SMP Mon Nov 24 16:43:30 GMT 2014 CPU: ARMv7 Processor [410fc091] revision 1 (ARMv7), cr=10c5387d CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache Machine model: V2P-CA9 bootconsole [earlycon0] enabled [...] mmci-pl18x 10005000.mmci: Got CD GPIO mmci-pl18x 10005000.mmci: Got WP GPIO mmci-pl18x 10005000.mmci: Looking up vmmc-supply from device tree mmci-pl18x 10005000.mmci: Looking up vqmmc-supply from device tree mmci-pl18x 10005000.mmci: Looking up vqmmc-supply property in node /smb/motherboard/iofpga@7,/mmci@05000 failed mmci-pl18x 10005000.mmci: No vqmmc regulator found mmci-pl18x 10005000.mmci: mmc0: PL180 manf 41 rev0 at 0x10005000 irq 41,42 (pio) [...] mmc0: new SD card at address 88f7 mmcblk0: mmc0:88f7 SD02G 1.84 GiB mmcblk0: p1 p2 [...] / # mount /dev/mmcblk0p2 /mnt EXT2-fs (mmcblk0p2): warning: mounting unchecked fs, running e2fsck is recommended --8< vmmc-supply is a defined as: vmmc-supply = <&v2m_fixed_3v3>; which is: v2m_fixed_3v3: fixedregulator@0 { compatible = "regulator-fixed"; and arch/arm/mach-vexpress/Kconfig says: menuconfig ARCH_VEXPRESS [...] select REGULATOR_FIXED_VOLTAGE if REGULATOR but it doesn't select REGULATOR in result of this discussion: http://thread.gmane.org/gmane.linux.ports.arm.kernel/223426/focus=223427 To summarize, I believe that setting CONFIG_REGULATOR=y (as in vexpress_defconfig) should solve your issue. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
On Thu, 2014-11-06 at 16:51 +, Pawel Moll wrote: > Until now, perf framework never defined the meaning of the timestamps > captured as PERF_SAMPLE_TIME sample type. The values were obtaining > from local (sched) clock, which is unavailable in userspace. This made > it impossible to correlate perf data with any other events. Other > tracing solutions have the source configurable (ftrace) or just share > a common time domain between kernel and userspace (LTTng). > > Follow the trend by using monotonic clock, which is readily available > as POSIX CLOCK_MONOTONIC. > > Also add a sysctl "perf_sample_time_clk_id" attribute which can be used > by the user to obtain the clk_id to be used with POSIX clock API (eg. > clock_gettime()) to obtain a time value comparable with perf samples. > > Old behaviour can be restored by using "perf_use_local_clock" kernel > parameter. > > Signed-off-by: Pawel Moll It's been 3 weeks without any negative feedback (no feedback at all, but I take the optimistic view :-)... How about queuing at least this patch alone for the incoming merge window? Or at least getting it into -next, with the view at 3.20? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] MAINTAINERS: ARM Versatile Express platform
On Wed, 2014-11-26 at 12:17 +, Arnd Bergmann wrote: > On Wednesday 26 November 2014 12:04:09 Pawel Moll wrote: > > > > Right, of course. Whole directory. > > > > > F:*/*/vexpress* > > > > Cool, didn't think about multiple wildcards. > > Actually, I think you could even do this as > > N:vexpress > > which would be even shorter and match most of the other entries too. If you think it's safe, fine with me. It would reduce this section to: N: vexpress F: drivers/clocksource/versatile.c I'll spin the patch again. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] MAINTAINERS: ARM Versatile Express platform
This patch adds a separate section for the ARM Versatile Express platform maintainers, listing all different bits and bobs used by it. Acked-by: Liviu Dudau Acked-by: Sudeep Holla Acked-by: Lorenzo Pieralisi Signed-off-by: Pawel Moll Acked-by: Michael Turquette --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 0ff630d..924454f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1503,6 +1503,18 @@ S: Maintained F: drivers/clk/ux500/ F: include/linux/platform_data/clk-ux500.h +ARM/VERSATILE EXPRESS PLATFORM +M: Liviu Dudau +M: Sudeep Holla +M: Lorenzo Pieralisi +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) +S: Maintained +F: arch/arm/boot/dts/vexpress* +F: arch/arm/mach-vexpress/ +F: */*/vexpress* +F: drivers/clk/versatile/clk-vexpress-osc.c +F: drivers/clocksource/versatile.c + ARM/VFP SUPPORT M: Russell King L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] MAINTAINERS: ARM Versatile Express platform
On Wed, 2014-11-26 at 01:59 +, Joe Perches wrote: > On Tue, 2014-11-25 at 17:54 -0800, Mike Turquette wrote: > > Quoting Pawel Moll (2014-11-25 10:17:35) > > > This patch adds a separate section for the ARM > > > Versatile Express platform maintainers, listing > > > all different bits and bobs used by it. > [] > > > diff --git a/MAINTAINERS b/MAINTAINERS > [] > > > +ARM/VERSATILE EXPRESS PLATFORM > [] > > > +F: arch/arm/boot/dts/vexpress-* > > > +F: arch/arm/mach-vexpress/* > > > +F: drivers/bus/vexpress-config.c > > > +F: drivers/clk/versatile/clk-vexpress-osc.c > > > +F: drivers/clocksource/versatile.c > > > +F: drivers/cpufreq/vexpress-spc-cpufreq.c > > > +F: drivers/hwmon/vexpress.c > > > +F: drivers/mfd/vexpress-sysreg.c > > > +F: drivers/misc/vexpress-vexpress-syscfg.c > > > +F: drivers/power/reset/vexpress-poweroff.c > > > +F: drivers/regulator/vexpress.c > > > +F: include/linux/vexpress.h > > > > > maybe > F:arch/arm/boot/dts/vexpress* Ok. > F:arch/arm/mach-vexpress/ Right, of course. Whole directory. > F:*/*/vexpress* Cool, didn't think about multiple wildcards. > F:*/*/*/*vexpress* I'd keep the clk-vexpress-osc.c explicit - it's an odd one, and don't feel comfortable with the wildcard before vexpress. > F:drivers/clocksource/versatile.c Thanks for the suggestion, sill spin the patch in a second. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] MAINTAINERS: ARM Versatile Express platform
This patch adds a separate section for the ARM Versatile Express platform maintainers, listing all different bits and bobs used by it. Acked-by: Liviu Dudau Acked-by: Sudeep Holla Acked-by: Lorenzo Pieralisi Signed-off-by: Pawel Moll --- MAINTAINERS | 19 +++ 1 file changed, 19 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 0ff630d..e0cb002 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1503,6 +1503,25 @@ S: Maintained F: drivers/clk/ux500/ F: include/linux/platform_data/clk-ux500.h +ARM/VERSATILE EXPRESS PLATFORM +M: Liviu Dudau +M: Sudeep Holla +M: Lorenzo Pieralisi +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) +S: Maintained +F: arch/arm/boot/dts/vexpress-* +F: arch/arm/mach-vexpress/* +F: drivers/bus/vexpress-config.c +F: drivers/clk/versatile/clk-vexpress-osc.c +F: drivers/clocksource/versatile.c +F: drivers/cpufreq/vexpress-spc-cpufreq.c +F: drivers/hwmon/vexpress.c +F: drivers/mfd/vexpress-sysreg.c +F: drivers/misc/vexpress-vexpress-syscfg.c +F: drivers/power/reset/vexpress-poweroff.c +F: drivers/regulator/vexpress.c +F: include/linux/vexpress.h + ARM/VFP SUPPORT M: Russell King L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 RESEND 1/2] ARM: vexpress: Remove non-DT code
Now, with the CLCD DT support available, there is no more reason to keep the non-DT support for V2P-CA9. Removed, together with "some" supporting code. It was necessary to make PLAT_VERSATILE_SCHED_CLOCK optional and selected by the machines still interested in it. Acked-by: Mike Turquette Signed-off-by: Pawel Moll --- Changes since v1: - Removed MFD and power/reset changes from the patch; will be posted separately after this is merged. arch/arm/Kconfig | 2 + arch/arm/mach-vexpress/Kconfig| 3 - arch/arm/mach-vexpress/Makefile | 3 +- arch/arm/mach-vexpress/core.h | 7 - arch/arm/mach-vexpress/ct-ca9x4.c | 212 arch/arm/mach-vexpress/include/mach/ct-ca9x4.h| 47 --- arch/arm/mach-vexpress/include/mach/hardware.h| 1 - arch/arm/mach-vexpress/include/mach/irqs.h| 6 - arch/arm/mach-vexpress/include/mach/motherboard.h | 88 - arch/arm/mach-vexpress/platsmp.c | 42 --- arch/arm/mach-vexpress/v2m.c | 374 -- arch/arm/plat-versatile/Kconfig | 2 +- drivers/clk/versatile/Makefile| 1 - drivers/clk/versatile/clk-vexpress-osc.c | 7 - drivers/clk/versatile/clk-vexpress.c | 86 - drivers/misc/vexpress-syscfg.c| 60 +--- include/linux/vexpress.h | 19 -- 17 files changed, 17 insertions(+), 943 deletions(-) delete mode 100644 arch/arm/mach-vexpress/ct-ca9x4.c delete mode 100644 arch/arm/mach-vexpress/include/mach/ct-ca9x4.h delete mode 100644 arch/arm/mach-vexpress/include/mach/hardware.h delete mode 100644 arch/arm/mach-vexpress/include/mach/irqs.h delete mode 100644 arch/arm/mach-vexpress/include/mach/motherboard.h delete mode 100644 drivers/clk/versatile/clk-vexpress.c diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 89c4b5c..6c6fdb8 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -350,6 +350,7 @@ config ARCH_REALVIEW select ICST select NEED_MACH_MEMORY_H select PLAT_VERSATILE + select PLAT_VERSATILE_SCHED_CLOCK help This enables support for ARM Ltd RealView boards. @@ -365,6 +366,7 @@ config ARCH_VERSATILE select ICST select PLAT_VERSATILE select PLAT_VERSATILE_CLOCK + select PLAT_VERSATILE_SCHED_CLOCK select VERSATILE_FPGA_IRQ help This enables support for ARM Ltd Versatile board. diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig index b2cfba16c..9a96bab 100644 --- a/arch/arm/mach-vexpress/Kconfig +++ b/arch/arm/mach-vexpress/Kconfig @@ -49,9 +49,6 @@ config ARCH_VEXPRESS_CORTEX_A5_A9_ERRATA build a working kernel, you must also enable relevant core tile support or Flattened Device Tree based support options. -config ARCH_VEXPRESS_CA9X4 - bool "Versatile Express Cortex-A9x4 tile" - config ARCH_VEXPRESS_DCSCB bool "Dual Cluster System Control Block (DCSCB) support" depends on MCPM diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile index fc649bc..f5c1006 100644 --- a/arch/arm/mach-vexpress/Makefile +++ b/arch/arm/mach-vexpress/Makefile @@ -1,11 +1,10 @@ # # Makefile for the linux kernel. # -ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \ +ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := \ -I$(srctree)/arch/arm/plat-versatile/include obj-y := v2m.o -obj-$(CONFIG_ARCH_VEXPRESS_CA9X4) += ct-ca9x4.o obj-$(CONFIG_ARCH_VEXPRESS_DCSCB) += dcscb.o dcscb_setup.o CFLAGS_dcscb.o += -march=armv7-a CFLAGS_REMOVE_dcscb.o = -pg diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h index 152fad9..2a11d3a 100644 --- a/arch/arm/mach-vexpress/core.h +++ b/arch/arm/mach-vexpress/core.h @@ -1,12 +1,5 @@ -/* 2MB large area for motherboard's peripherals static mapping */ -#define V2M_PERIPH 0xf800 - -/* Tile's peripherals static mappings should start here */ -#define V2T_PERIPH 0xf820 - bool vexpress_smp_init_ops(void); -extern struct smp_operations vexpress_smp_ops; extern struct smp_operations vexpress_smp_dt_ops; extern void vexpress_cpu_die(unsigned int cpu); diff --git a/arch/arm/mach-vexpress/ct-ca9x4.c b/arch/arm/mach-vexpress/ct-ca9x4.c deleted file mode 100644 index 27bea04..000 --- a/arch/arm/mach-vexpress/ct-ca9x4.c +++ /dev/null @@ -1,212 +0,0 @@ -/* - * Versatile Express Core Tile Cortex A9x4 Support - */ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -#include - -#include - -#include -#include - -#include "core.h" - -#include -#i
Re: [PATCH] video: ARM CLCD: Remove duplicated include in amba-clcd.c
On Mon, 2014-11-17 at 09:25 +, Qiang Chen wrote: > This patch fixes duplicated include dma-mapping.h in > drivers/video/fbdev/amba-clcd.c > > Signed-off-by: Qiang Chen > --- > drivers/video/fbdev/amba-clcd.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/video/fbdev/amba-clcd.c b/drivers/video/fbdev/amba-clcd.c > index 6ad23bd..32c0b6b 100644 > --- a/drivers/video/fbdev/amba-clcd.c > +++ b/drivers/video/fbdev/amba-clcd.c > @@ -27,7 +27,6 @@ > #include > #include > #include > -#include > #include > #include > #include Acked-by: Pawel Moll Thanks! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] virtio-mmio: support for multiple irqs
On Thu, 2014-11-13 at 09:39 +, Shannon Zhao wrote: > When we use only virtio-mmio or vhost-net without irqfd, the device uses > qemu_set_irq(within qemu) > to inject interrupt and at the same time qemu update > "VIRTIO_MMIO_INTERRUPT_STATUS" to tell guest > driver whom this interrupt to. All these things are done by qemu. Though > qemu_set_irq will call > kvm_set_irq to do the real interrupt inject, but the trigger is qemu and it > can update the > "VIRTIO_MMIO_INTERRUPT_STATUS" register before injecting the interrupt. > > But if we use vhost-net with irqfd, the device uses ioeventfd mechanism to > inject interrupt. > When an interrupt happened, it doesn't transfer to qemu, while the irqfd > finally call kvm_set_irq > to inject the interrupt directly. All these things are done in kvm and it > can't update the > "VIRTIO_MMIO_INTERRUPT_STATUS" register. > So if the guest driver still uses the old interrupt handler, it has to read > the > "VIRTIO_MMIO_INTERRUPT_STATUS" register to get the interrupt reason and run > different handlers > for different reasons. But the register has nothing and none of handlers will > be called. > > I make myself clear? :-) I see... (well, at least I believe I see ;-) Of course this means that with irqfd, the device is simply non-compliant with the spec. And that extending the status register doesn't help you at all, so we can drop this idea. Paradoxically, it's a good news (of a sort :-) because I don't think we should be doing such a fundamental change in the spec at this date. I'll discuss it with others in the TC and I'm open to be convinced otherwise, but I believe the spec should stay as it is. Having said that, I see no problem with experimenting with the device model so the next version of the standard is extended. Two suggestions I have would be: 1. Drop the virtio-pci like case with two interrupts (one for config changes, one for all queues), as it doesn't bring anything new. Just make it all interrupts individual. 2. Change the way the interrupts are acknowledge - instead of a bitmask, have a register which takes a queue number to ack the queue's interrupt and ~0 to ack the config interrupt. 3. Change the version of the device to (intially) 0x8003 - I've just made an executive decision :-) that non-standard compliant devices should have the MSB of the version number set (I'll propose to reserve this range of versions in future version of the spec). We'll consider it a "prototype of the version 3". Then make the driver behave in the new way when (and only when) such device version is observed. Also, I remembered I haven't addressed one of your previous comments: On Wed, 2014-11-12 at 08:32 +, Shannon Zhao wrote: > > One point I'd like to make is that the device was intentionally designed > > with simplicity in mind first, performance later (something about > > "embedded" etc" :-). Changing this assumption is of course possible, but > Ah, I think ARM is not only about embedded things. Maybe it could has > a wider application > such as micro server. Just my personal opinion. By all means, I couldn't agree more. But there's one thing you have to keep in mind - it doesn't matter whether the real hardware has got PCI or not, but what is emulated by qemu/KVM. Therefore, if only you can get the PCI host controller working in the qemu virt machine (which should be much simpler now, as we have generic support for PCI controllers/bridges in the kernel now), you'll be able to forget the issue of virtio-mmio and multiple interrupts and still enjoy your performance gains :-) Does it all make sense? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] virtio-mmio: support for multiple irqs
On Wed, 2014-11-12 at 08:32 +, Shannon Zhao wrote: > On 2014/11/11 23:11, Pawel Moll wrote: > > On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote: > >> As the current virtio-mmio only support single irq, > >> so some advanced features such as vhost-net with irqfd > >> are not supported. And the net performance is not > >> the best without vhost-net and irqfd supporting. > > > > Could you, please, help understanding me where does the main issue is? > > Is it about: > > > > 1. The fact that the existing implementation blindly kicks all queues, > > instead only of the updated ones? > > > > or: > > > > 2. Literally having a dedicated interrupt line (remember, we're talking > > "real" interrupts here, not message signalled ones) per queue, so they > > can be handled by different processors at the same time? > > > The main issue is that current virtio-mmio only support one interrupt which > is shared by > config and queues. Yes. Additional question: is your device changing the configuration space often? Other devices (for example block devices) change configuration very rarely (eg. change of the media size, usually meaning hot-un-plugging), so unless you tell me that the config space is frequently changes, I'll consider the config event irrelevant from the performance point of view. > Therefore the virtio-mmio driver should read the > "VIRTIO_MMIO_INTERRUPT_STATUS" to get the interrupt reason and check whom > this interrupt is to. Correct. > If we use vhost-net which uses irqfd to inject interrupt, the > vhost-net doesn't update "VIRTIO_MMIO_INTERRUPT_STATUS", then the > guest driver can't read the interrupt reason and doesn't call a > handler to process. Ok, so this is the bit I don't understand. Explain, please how does this whole thing work. And keep in mind that I've just looked up "irqfd" for the first time in my life, so know nothing about its implementation. The same applies to "vhost-net". I'm simply coming from a completely different background, so bear with me on this. Now, the virtio-mmio device *must* behave in a certain way, as specified in both the old and the new version of the spec. If a Used Ring of *any* of the queues has been updated the device *must* set bit 0 of the interrupt status register, and - in effect - generate the interrupt. But you are telling me that in some circumstances the interrupt is *not* generated when a queue requires handling. Sounds like a bug to me, as it is not following the requirements of the device specification. It is quite likely that I simply don't see some facts which are obvious for you, so please - help me understand. > So we can assign a dedicated interrupt line per queue for virtio-mmio > and it can work with irqfd. > If my reasoning above is correct, I'd say that you are just working around a functional bug, not improving performance. And this is a wrong way of solving the problem. > Theoretically the number of interrupts has no limit, but as the limit > of ARM interrupt line, the number should be less than ARM interrupt > lines. Let me just emphasize the fact that virtio-mmio is not related to ARM in any way, it's just a memory mapped device with a generic interrupt output. Any limitations of a particular hardware platform (I guess you're referring to the maximum number of peripheral interrupts a GIC can take) are irrelevant - if we go with multiple interrupts, it must cater for as many interrupt lines as one wishes... :-) > In the real situation, I think, the number > is generally less than 17 (8 pairs of vring interrupts and one config > interrupt). ... but of course we could optimize for the real scenarios. And I'm glad to hear that the number you quoted is less then 30 :-) > >> we use GSI for multiple irq. > > > > I'm not sure what GSI stands for, but looking at the code I assume it's > > just a "normal" peripheral interrupt. > > > >> In this patch we use "vm_try_to_find_vqs" > >> to check whether multiple irqs are supported like > >> virtio-pci. > > > > Yeah, I can see that you have followed virtio-pci quite literally. I'm > > particularly not convinced to the one interrupt for config, one for all > > queues option. Doesn't make any sense to me here. > > > About one interrupt for all queues, it's not a typical case. But just offer > one more choice for users. Users should configure the number of interrupts > according to their situation. > >> Is this the right direction? is there other ways to > >> make virtio-mmio support multiple irq? Hope for feed
Re: [RFC PATCH] virtio-mmio: support for multiple irqs
On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote: > As the current virtio-mmio only support single irq, > so some advanced features such as vhost-net with irqfd > are not supported. And the net performance is not > the best without vhost-net and irqfd supporting. Could you, please, help understanding me where does the main issue is? Is it about: 1. The fact that the existing implementation blindly kicks all queues, instead only of the updated ones? or: 2. Literally having a dedicated interrupt line (remember, we're talking "real" interrupts here, not message signalled ones) per queue, so they can be handled by different processors at the same time? Now, if it's only about 1, the simplest solution would be to extend the VIRTIO_MMIO_INTERRUPT_STATUS register to signal up to 30 queues "readiness" in bits 2-31, still keeping bit 0 as a "combined" VIRTIO_MMIO_INT_VRING. In case when VIRTIO_MMIO_INT_VRING is set and none of the "individual" bits is (a device which doesn't support this feature or one that has more than 30 queues and of of those is ready) we would fall back to the original "kick all queues" approach. This could be a useful (and pretty simple) extension. In the worst case scenario it could be a post-1.0 standard addition, as it would provide backward compatibility. However, if it's about 2, we're talking larger changes here. From the device perspective, we can define it as having per-queue (plus one for config) interrupt output *and* a "combined" output, being simple logical "or" of all the others. Then, the Device Tree bindings would be used to express the implementation choices (I'd keep the kernel parameter approach supporting the single interrupt case only). This is a very popular and well understood approach for memory mapped peripherals (for example, see the . It allows the system integrator to make a decision when it's coming to latency vs number interrupt lines trade off. The main issue is that we can't really impose a limit on a number of queues, therefore on a number of interrupts. This would require adding a new "interrupt acknowledge" register, which would take a number of the queue (or a symbolic value for the config one) instead of a bit mask. And I must say that I'm not enjoying the idea of such substantial change to the specification that late in the process... (in other words: you'll have to put extra effort into convincing me :-) > This patch support virtio-mmio to request multiple > irqs like virtio-pci. With this patch and qemu assigning > multiple irqs for virtio-mmio device, it's ok to use > vhost-net with irqfd on arm/arm64. Could you please tell me how many queues (interrupts) are we talking about in this case? 5? A dozen? Hundreds? Disclaimer: I have no personal experience with virtio and network (due to the fact how our Fast Models are implemented, I mostly us block devices and 9p protocol over virtio and I get enough performance from them :-). > As arm doesn't support msi-x now, To be precise: "ARM" does "support" MSI-X :-) (google for GICv2m) The correct statement would be: "normal memory mapped devices have no interface for message signalled interrupts (like MSI-X)" > we use GSI for multiple irq. I'm not sure what GSI stands for, but looking at the code I assume it's just a "normal" peripheral interrupt. > In this patch we use "vm_try_to_find_vqs" > to check whether multiple irqs are supported like > virtio-pci. Yeah, I can see that you have followed virtio-pci quite literally. I'm particularly not convinced to the one interrupt for config, one for all queues option. Doesn't make any sense to me here. > Is this the right direction? is there other ways to > make virtio-mmio support multiple irq? Hope for feedback. One point I'd like to make is that the device was intentionally designed with simplicity in mind first, performance later (something about "embedded" etc" :-). Changing this assumption is of course possible, but - I must say - makes me slightly uncomfortable... The extensions we're discussing here seem doable, but I've noticed your other patches doing with a shared memory region and I didn't like them at all, sorry. I see the subject has been already touched in the discussions, but let me bring PCI to the surface again. We're getting more server-class SOCs in the market, which obviously bring PCI with them to both arm and arm64 world, something unheard of in the "mobile past". I believe the PCI patches for the arm64 have been already merged in the kernel. Therefore: I'm not your boss so, obviously, I can't tell you what to do, but could you consider redirecting your efforts into getting the "ARM PCI" up and running in qemu so you can simply use the existing infrastructure? This would save us a lot of work and pain in doing late functional changes to the standard and will be probably more future-proof from your perspective (PCI will happen, sooner or later - you can make it sooner ;-) Regards Pawel -- To unsubscribe from this list:
[PATCH v4 1/3] perf: Use monotonic clock as a source for timestamps
Until now, perf framework never defined the meaning of the timestamps captured as PERF_SAMPLE_TIME sample type. The values were obtaining from local (sched) clock, which is unavailable in userspace. This made it impossible to correlate perf data with any other events. Other tracing solutions have the source configurable (ftrace) or just share a common time domain between kernel and userspace (LTTng). Follow the trend by using monotonic clock, which is readily available as POSIX CLOCK_MONOTONIC. Also add a sysctl "perf_sample_time_clk_id" attribute which can be used by the user to obtain the clk_id to be used with POSIX clock API (eg. clock_gettime()) to obtain a time value comparable with perf samples. Old behaviour can be restored by using "perf_use_local_clock" kernel parameter. Signed-off-by: Pawel Moll --- Ingo, I remember your comments about this approach in the past, but during discussions at LPC Thomas was convinced that it's the right thing to do - see cover letter for the series... Changes since v3: - Added "perf_use_lock_clock" parameter... - ... and creating the sysctl value only when it's not defined (turned out that negative clk_ids are not invalid - they are used to describe dynamic clocks) - Had to keep sysctl_perf_sample_time_clk_id non-const, because struct ctl_table.data is non-const Documentation/kernel-parameters.txt | 9 + kernel/events/core.c| 37 + 2 files changed, 46 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4c81a86..8ead8d8 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -91,6 +91,7 @@ parameter is applicable: NUMANUMA support is enabled. NFS Appropriate NFS support is enabled. OSS OSS sound support is enabled. + PERFPerformance events and counters support is enabled. PV_OPS A paravirtualized kernel is enabled. PARIDE The ParIDE (parallel port IDE) subsystem is enabled. PARISC The PA-RISC architecture is enabled. @@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. allocator. This parameter is primarily for debugging and performance comparison. + perf_use_local_clock + [PERF] + Use local_clock() as a source for perf timestamps + generation. This was be the default behaviour and + this parameter can be used to maintain backward + compatibility or on older hardware with expensive + monotonic clock source. + pf. [PARIDE] See Documentation/blockdev/paride.txt. diff --git a/kernel/events/core.c b/kernel/events/core.c index 2b02c9f..5d0aa03 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -42,6 +42,7 @@ #include #include #include +#include #include "internal.h" @@ -322,8 +323,41 @@ extern __weak const char *perf_pmu_name(void) return "pmu"; } +static bool perf_use_local_clock; +static int __init perf_use_local_clock_setup(char *__unused) +{ + perf_use_local_clock = true; + return 1; +} +__setup("perf_use_local_clock", perf_use_local_clock_setup); + +static int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; + +static struct ctl_table perf_sample_time_kern_table[] = { + { + .procname = "perf_sample_time_clk_id", + .data = &sysctl_perf_sample_time_clk_id, + .maxlen = sizeof(int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + {} +}; + +static struct ctl_table perf_sample_time_root_table[] = { + { + .procname = "kernel", + .mode = 0555, + .child = perf_sample_time_kern_table, + }, + {} +}; + static inline u64 perf_clock(void) { + if (likely(!perf_use_local_clock)) + return ktime_get_mono_fast_ns(); + return local_clock(); } @@ -8230,6 +8264,9 @@ void __init perf_event_init(void) */ BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head)) != 1024); + + if (!perf_use_local_clock) + register_sysctl_table(perf_sample_time_root_table); } static int __init perf_event_sysfs_init(void) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 2/3] perf: Userspace event
This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by any process to inject custom data into perf data stream as a new PERF_RECORD_UEVENT record, if such process is being observed or if it is running on a CPU being observed by the perf framework. The prctl call takes the following arguments: prctl(PR_TASK_PERF_UEVENT, type, size, data, flags); - type: a number meaning to describe content of the following data. Kernel does not pay attention to it and merely passes it further in the perf data, therefore its use must be agreed between the events producer (the process being observed) and the consumer (performance analysis tool). The perf userspace tool will contain a repository of "well known" types and reference implementation of their decoders. - size: Length in bytes of the data. - data: Pointer to the data. - flags: Reserved for future use. Always pass zero. Perf context that are supposed to receive events generated with the prctl above must be opened with perf_event_attr.uevent set to 1. The PERF_RECORD_UEVENT records consist of a standard perf event header, 32-bit type value, 32-bit data size and the data itself, followed by padding to align the overall record size to 8 bytes and optional, standard sample_id field. Example use cases: - "perf_printf" like mechanism to add logging messages to perf data; in the simplest case it can be just prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0); - synchronisation of performance data generated in user space with the perf stream coming from the kernel. For example, the marker can be inserted by a JIT engine after it generated portion of the code, but before the code is executed for the first time, allowing the post-processor to pick the correct debugging information. Signed-off-by: Pawel Moll --- Changes since v3: - none include/linux/perf_event.h | 4 +++ include/uapi/linux/perf_event.h | 23 - include/uapi/linux/prctl.h | 10 ++ kernel/events/core.c| 71 + kernel/sys.c| 5 +++ 5 files changed, 112 insertions(+), 1 deletion(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 893a0d0..680767d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -721,6 +721,8 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_fork(struct task_struct *tsk); +extern int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data); /* Callchains */ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry); @@ -829,6 +831,8 @@ static inline void perf_event_mmap(struct vm_area_struct *vma) { } static inline void perf_event_exec(void) { } static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_fork(struct task_struct *tsk){ } +static inline int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data) { return -1; }; static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void){ return -1; } static inline void perf_swevent_put_recursion_context(int rctx) { } diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9d84540..9a64eb1 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -303,7 +303,8 @@ struct perf_event_attr { exclude_callchain_user : 1, /* exclude user callchains */ mmap2 : 1, /* include mmap with inode data */ comm_exec : 1, /* flag comm events that are due to an exec */ - __reserved_1 : 39; + uevents: 1, /* allow uevents into the buffer */ + __reserved_1 : 38; union { __u32 wakeup_events;/* wakeup every n events */ @@ -712,6 +713,26 @@ enum perf_event_type { */ PERF_RECORD_MMAP2 = 10, + /* +* Data in userspace event record is transparent for the kernel +* +* Userspace perf tool code maintains a list of known types with +* reference implementations of parsers for the data field. +* +* Overall size of the record (including type and size fields) +* is always aligned to 8 bytes by adding padding after the data. +* +* struct { +* struct perf_event_headerheader; +* u32
[PATCH v4 3/3] perf: Sample additional clock value
This patch adds an option to sample value of an additional clock with any perf event, with the the aim of allowing time correlation between data coming from perf and other sources like hardware trace which is timestamped with an external clock. The idea is to generate periodic perf record containing timestamps from two different sources, sampled as close to each other as possible. This allows performing simple linear approximation: other event -O--+-O--> t_other : | : : V : -OO--> t_perf perf event User can request such samples for any standard perf event, with the most obvious examples of cpu-clock (hrtimer) at selected frequency or other periodic events like sched:sched_switch. In order to do this, PERF_SAMPLE_CLOCK has to be set in struct perf_event_attr.sample_type and a type of the clock to be sampled selected by setting perf_event_attr.clock to a value corresponding to a POSIX clock clk_id (see "man 2 clock_gettime") Currently three clocks are implemented: CLOCK_REALITME = 0, CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is 5 bits wide to allow for future extension to custom, non-POSIX clock sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like ARM CoreSight (hardware trace) timestamp generator. Signed-off-by: Pawel Moll --- Changes since v3: - none include/linux/perf_event.h | 2 ++ include/uapi/linux/perf_event.h | 16 ++-- kernel/events/core.c| 30 ++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 680767d..b690a0d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -607,6 +607,8 @@ struct perf_sample_data { * Transaction flags for abort events: */ u64 txn; + /* Clock value (additional timestamp for time correlation) */ + u64 clock; }; /* default value for data source */ diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9a64eb1..53a7a72 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -137,8 +137,9 @@ enum perf_event_sample_format { PERF_SAMPLE_DATA_SRC= 1U << 15, PERF_SAMPLE_IDENTIFIER = 1U << 16, PERF_SAMPLE_TRANSACTION = 1U << 17, + PERF_SAMPLE_CLOCK = 1U << 18, - PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */ }; /* @@ -304,7 +305,16 @@ struct perf_event_attr { mmap2 : 1, /* include mmap with inode data */ comm_exec : 1, /* flag comm events that are due to an exec */ uevents: 1, /* allow uevents into the buffer */ - __reserved_1 : 38; + + /* +* clock: one of the POSIX clock IDs: +* +* 0 - CLOCK_REALTIME +* 1 - CLOCK_MONOTONIC +* 4 - CLOCK_MONOTONIC_RAW +*/ + clock : 5, /* clock type */ + __reserved_1 : 33; union { __u32 wakeup_events;/* wakeup every n events */ @@ -544,6 +554,7 @@ enum perf_event_type { * { u64 id; } && PERF_SAMPLE_ID * { u64 stream_id;} && PERF_SAMPLE_STREAM_ID * { u32 cpu, res; } && PERF_SAMPLE_CPU +* { u64 clock;} && PERF_SAMPLE_CLOCK * { u64 id; } && PERF_SAMPLE_IDENTIFIER * } && perf_event_attr::sample_id_all * @@ -687,6 +698,7 @@ enum perf_event_type { * { u64 weight; } && PERF_SAMPLE_WEIGHT * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION +* { u64 clock;} && PERF_SAMPLE_CLOCK * }; */ PERF_RECORD_SAMPLE = 9, diff --git a/kernel/events/core.c b/kernel/events/core.c index 9a2d082..816baa6 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1264,6 +1264,9 @@ static void perf_event__header_size(s
[PATCH v4 0/3] perf: User/kernel time correlation and event generation
This is a second version of the post-LPC series. The changes are limited to the first patch, adding a backward-compatibility kernel parameter. The v3 cover letter: I've organised a session on the subject during the tracing minisummit at LPC 2014 in Dusseldorf. Notes taken from the discussion taken by Steven Rostedt (thanks Steve!) http://www.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_Tracing.txt The following three patches address three main topics. They are pretty much orthogonal and (subject to small and obvious modifications) could be applied independently from each other. An executive summary of both discussion and the patches: 1. User/kernel perf timestamps correlation Thomas suggested that, instead of jumping through loops of correlation, perf should simply generate monotonic clock timestamps, instead of using sched clock. Peter and I pointed out that Ingo didn't like this idea as monotonic can be slow, but apparently the cases when it is are irrelevant. Thomas offered to fly to Budapest to physically convince Ingo - I hope it won't be necessary and he will be able to achieve this here, on mailing lists :-) Setting aside potential performance problems, it would be a really great solution, unifying all trace systems (perf, ftrace and LTTng) in this respect. I'm more than happy to work on potential improvements in the are of monotonic clock if it was to help the cause. If it is a definite no-go, we still have the third patch, allowing post- capture correlation based on synchronisation events. 2. User event generation Everyone present agreed that it would be a very-nice-to-have feature. There was some discussion about implementation details, so I welcome feedback and comments regarding my take on the matter. 3. Correlation with external timestamps This is a new issue, which surfaced recently while I was working on hardware trace infrastructure. It can timestamp trace packets, but is using yet another, completely different time source to do this. Thomas suggested solution which gets down to my original proposal for sched/monotonic clock correlation - an additional sample type so events can be "double stamped" using different clock sources providing synchronisation points for later time approximation. I've just extended the implementation with configuration value to select the clock source. If the first patch (making perf timestamps monotonic) gets accepted, there will be no immediate need for this one, but I'd like to gain some feedback anyway. That's all for this series. Previous versions: - RFC: http://www.spinics.net/lists/kernel/msg1824419.html - v1: http://thread.gmane.org/gmane.linux.kernel/1790231 - v2: http://thread.gmane.org/gmane.linux.kernel/1793272 - v3: http://thread.gmane.org/gmane.linux.kernel.api/5681 Pawel Moll (3): perf: Use monotonic clock as a source for timestamps perf: Userspace event perf: Sample additional clock value Documentation/kernel-parameters.txt | 9 +++ include/linux/perf_event.h | 6 ++ include/uapi/linux/perf_event.h | 37 +- include/uapi/linux/prctl.h | 10 +++ kernel/events/core.c| 138 kernel/sys.c| 5 ++ 6 files changed, 203 insertions(+), 2 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/3] perf: Userspace event
On Tue, 2014-11-04 at 06:33 +, Namhyung Kim wrote: > Hi Pawel, > > On Tue, 4 Nov 2014 00:28:37 +0000, Pawel Moll wrote: > > + /* > > +* Data in userspace event record is transparent for the kernel > > +* > > +* Userspace perf tool code maintains a list of known types with > > +* reference implementations of parsers for the data field. > > +* > > +* Overall size of the record (including type and size fields) > > +* is always aligned to 8 bytes by adding padding after the data. > > +* > > +* struct { > > +* struct perf_event_headerheader; > > +* u32 type; > > +* u32 size; > > The struct perf_event_header also has 'size' field and it has the entire > length of the record so it's redundant. Well, is it? Correct me if I'm wrong, but as far as I remember the record size must be always aligned to 8 bytes. Thus you can't reliably derive the data size from it - if I my data is 3 bytes long, I have to add 5 bytes of padding thus making the header.size = 24 (I'm ignoring sample_id here), right? So now, decoding the record, all I can do is: header.size - sizeof(header) - sizeof(type) - sizeof(size) = 24 - 8 - 8 = 8. So, basing on the header.size the data is 8 bytes long. But only 3 first bytes are valid... To summarize, there are three options: 1. I'm wrong and the record doesn't have to be padded to make it 8 bytes aligned. Then I can drop the additional size field. 2. I could impose a limitation on the prctl API that the data size must be 8 bytes aligned. Bad idea in my opinion, I'd rather not. 3. The additional size (for the data part) field stays. Notice that PERF_SAMPLE_RAW has it as well :-) > Also there's 'misc' field in the > perf_event_header and I guess it can be used as 'type' info as it's > mostly for cpumode and we are in user mode by definition. Hm. First of all, I don't really like the idea of "overloading" the misc meaning. It's a set of flags and I'd rather see it staying like this. Secondly, I'm not sure that it can be reused - we are in user mode, true, but it can be either PERF_RECORD_MISC_USER or PERF_RECORD_MISC_GUEST_USER. Thirdly, misc is "only" 16 bits wide, and someone even asked for the type to be 64 bit long! (I suspect he wanted to use it in some special, hacky way though :-) 32 bit length seems like a reasonable choice, though. Do you feel that the "unnecessary" type field is a big problem? Thanks for your time! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, 2014-11-04 at 09:24 +, Masami Hiramatsu wrote: > What I'd like to do is the binary version of ftrace-marker, the text > version is already supported by qemu (see below). > https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg00505.html > > But since that is just a string data (not structured data), it is hard to > analyze via perf-script or some other useful filters/triggers in ftrace. > > In my idea, the new event will be defined via a special file in debugfs like > kprobe-events, like below. > > # cd $debugfs/tracing > # echo "newgrp/newevent signarg:s32 flag:u64" >> marker_events > # cat events/newgrp/newevent/format > name: newevent > ID: 2048 > format: > field:unsigned short common_type; offset:0; size:2; > signed:0; > field:unsigned char common_flags; offset:2; size:1; > signed:0; > field:unsigned char common_preempt_count; offset:3; > size:1;signed:0; > field:int common_pid; offset:4; size:4; signed:1; > > field:s32 signarg; offset:8; size:4; signed:1; > field:u64 flag; offset:12; size:8; signed:0; > > print fmt: "signarg=%d flag=0x%Lx", REC->signarg, REC->flag > > Then, users will write the data (excluded common fields) when the event > happens > via trace_marker which start with '\0'ID(in u32). Kernel just checks the ID > and > its data size, but doesn't parse, filter/trigger it and log it into the > kernel buffer. Very neat, I like it! Certainly useful with scripting. Any gut feeling regarding the kernel version it will be ready for? 3.19 or later than that? > Of course, this has a downside that the user must have a privilege to access > to debugfs. > Thus maybe we need both of prctl() IF for perf and this IF for ftrace. I don't have any particularly strong feelings about the solution as long as I'm able to create this "synchronisation point" of mine in the perf data. In one of this patch's previous incarnations I was also doing a write() to the perf fd to achieve pretty much the same result. In my personal use case root access to debugfs isn't a problem (I need it for other ftrace operations anyway). However Ingo and some other guys seemed interested in prctl() approach because: 1. it's much simpler to use even comparing with simple trace_marker's open(path)/write()/close() and 2. because any process can do it at any time and the results are quietly discarded if no one is listening. I also remember that when I proposed sort of "unification" between trace_marker and the uevents, Ingo straight away "suggested" keeping it separate. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/3] perf: Use monotonic clock as a source for timestamps
On Tue, 2014-11-04 at 07:23 +, Peter Zijlstra wrote: > On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote: > > > +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; > > const ? Sure (unless we have to change it as mentioned below) > > /* > > * perf samples are done in some very critical code paths (NMIs). > > * If they take too much CPU time, the system can lock up and not > > @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void) > > > > static inline u64 perf_clock(void) > > { > > - return local_clock(); > > + return ktime_get_mono_fast_ns(); > > } > > Do we maybe want to make it boot-time switchable back to local_clock for > people with bad systems and or backwards compat issues? Very good idea, should have came up with it myself :-) Does __setup("perf_use_local_clock") sound reasonable? Then we have to decide whether to hide the sysctl "perf_sample_time_clk_id" (my preferred option, will see how difficult it is) or just provide an invalid clock_id (eg. -1) in it. Cheers! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, 2014-11-04 at 01:25 +, Andy Lutomirski wrote: > >> If you're going to add double-stamped packets, can you also add a > >> syscall to read multiple clocks at once, atomically? Or can you > >> otherwise add a non-perf mechanism to get at this data? > > > > I've got some thoughts on what a possible interface that wouldn't be > > awful could look like, but I'm still hesitant because I don't really > > know if exposing this sort of data is actually a good idea long term. > > My only real thought here is that, if perf is going to try to do this, > then presumably it should be reasonably integrated w/ the core timing > code. I.e. if perf does this, then presumably the core code should > know about it and there should be a core interface to it. I think I understand where you're coming from. Arnd's idea for the API seems reasonable, although I can't promise implementing a proposal (don't make me stop you from doing it :-). As to the perf-specific correlation, I'm assuming limited accuracy. Others already mentioned that in the absence of hardware support, the time values are never really "atomic". The best what can be done is to access them as near to each other in the code as possible and make sure it happens in a non-preemptible section. In my tests I've achieved, on average, sub-microsecond accuracy, which was good enough from my perspective, but it's far from ideal 42ns resolution for my (just an example) time source clocked at 24MHz. Paweł -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3 3/3] perf: Sample additional clock value
From: Pawel Moll This patch adds an option to sample value of an additional clock with any perf event, with the the aim of allowing time correlation between data coming from perf and other sources like hardware trace which is timestamped with an external clock. The idea is to generate periodic perf record containing timestamps from two different sources, sampled as close to each other as possible. This allows performing simple linear approximation: perf eventother event -O--+-O--> t_other : | : : V : -OO--> t_perf User can request such samples for any standard perf event, with the most obvious examples of cpu-clock (hrtimer) at selected frequency or other periodic events like sched:sched_switch. In order to do this, PERF_SAMPLE_CLOCK has to be set in struct perf_event_attr.sample_type and a type of the clock to be sampled selected by setting perf_event_attr.clock to a value corresponding to a POSIX clock clk_id (see "man 2 clock_gettime") Currently three clocks are implemented: CLOCK_REALITME = 0, CLOCK_MONOTONIC = 1 and CLOCK_MONOTONIC_RAW = 2. The clock field is 5 bits wide to allow for future extension to custom, non-POSIX clock sources(MAX_CLOCK for those is 16, see include/uapi/linux/time.h) like ARM CoreSight (hardware trace) timestamp generator. Signed-off-by: Pawel Moll --- include/linux/perf_event.h | 2 ++ include/uapi/linux/perf_event.h | 16 ++-- kernel/events/core.c| 30 ++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 867415d..395d6ed 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -607,6 +607,8 @@ struct perf_sample_data { * Transaction flags for abort events: */ u64 txn; + /* Clock value (additional timestamp for time correlation) */ + u64 clock; }; /* default value for data source */ diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9a64eb1..53a7a72 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -137,8 +137,9 @@ enum perf_event_sample_format { PERF_SAMPLE_DATA_SRC= 1U << 15, PERF_SAMPLE_IDENTIFIER = 1U << 16, PERF_SAMPLE_TRANSACTION = 1U << 17, + PERF_SAMPLE_CLOCK = 1U << 18, - PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */ }; /* @@ -304,7 +305,16 @@ struct perf_event_attr { mmap2 : 1, /* include mmap with inode data */ comm_exec : 1, /* flag comm events that are due to an exec */ uevents: 1, /* allow uevents into the buffer */ - __reserved_1 : 38; + + /* +* clock: one of the POSIX clock IDs: +* +* 0 - CLOCK_REALTIME +* 1 - CLOCK_MONOTONIC +* 4 - CLOCK_MONOTONIC_RAW +*/ + clock : 5, /* clock type */ + __reserved_1 : 33; union { __u32 wakeup_events;/* wakeup every n events */ @@ -544,6 +554,7 @@ enum perf_event_type { * { u64 id; } && PERF_SAMPLE_ID * { u64 stream_id;} && PERF_SAMPLE_STREAM_ID * { u32 cpu, res; } && PERF_SAMPLE_CPU +* { u64 clock;} && PERF_SAMPLE_CLOCK * { u64 id; } && PERF_SAMPLE_IDENTIFIER * } && perf_event_attr::sample_id_all * @@ -687,6 +698,7 @@ enum perf_event_type { * { u64 weight; } && PERF_SAMPLE_WEIGHT * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION +* { u64 clock;} && PERF_SAMPLE_CLOCK * }; */ PERF_RECORD_SAMPLE = 9, diff --git a/kernel/events/core.c b/kernel/events/core.c index 3738e9c..611e2f7 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1232,6 +1232,9 @@ static void perf_event__header_size(struct perf_event *event)
[PATCH v3 2/3] perf: Userspace event
From: Pawel Moll This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by any process to inject custom data into perf data stream as a new PERF_RECORD_UEVENT record, if such process is being observed or if it is running on a CPU being observed by the perf framework. The prctl call takes the following arguments: prctl(PR_TASK_PERF_UEVENT, type, size, data, flags); - type: a number meaning to describe content of the following data. Kernel does not pay attention to it and merely passes it further in the perf data, therefore its use must be agreed between the events producer (the process being observed) and the consumer (performance analysis tool). The perf userspace tool will contain a repository of "well known" types and reference implementation of their decoders. - size: Length in bytes of the data. - data: Pointer to the data. - flags: Reserved for future use. Always pass zero. Perf context that are supposed to receive events generated with the prctl above must be opened with perf_event_attr.uevent set to 1. The PERF_RECORD_UEVENT records consist of a standard perf event header, 32-bit type value, 32-bit data size and the data itself, followed by padding to align the overall record size to 8 bytes and optional, standard sample_id field. Example use cases: - "perf_printf" like mechanism to add logging messages to perf data; in the simplest case it can be just prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0); - synchronisation of performance data generated in user space with the perf stream coming from the kernel. For example, the marker can be inserted by a JIT engine after it generated portion of the code, but before the code is executed for the first time, allowing the post-processor to pick the correct debugging information. Signed-off-by: Pawel Moll --- include/linux/perf_event.h | 4 +++ include/uapi/linux/perf_event.h | 23 - include/uapi/linux/prctl.h | 10 ++ kernel/events/core.c| 71 + kernel/sys.c| 5 +++ 5 files changed, 112 insertions(+), 1 deletion(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ba490d5..867415d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -721,6 +721,8 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_fork(struct task_struct *tsk); +extern int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data); /* Callchains */ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry); @@ -830,6 +832,8 @@ static inline void perf_event_mmap(struct vm_area_struct *vma) { } static inline void perf_event_exec(void) { } static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_fork(struct task_struct *tsk){ } +static inline int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data) { return -1; }; static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void){ return -1; } static inline void perf_swevent_put_recursion_context(int rctx) { } diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9d84540..9a64eb1 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -303,7 +303,8 @@ struct perf_event_attr { exclude_callchain_user : 1, /* exclude user callchains */ mmap2 : 1, /* include mmap with inode data */ comm_exec : 1, /* flag comm events that are due to an exec */ - __reserved_1 : 39; + uevents: 1, /* allow uevents into the buffer */ + __reserved_1 : 38; union { __u32 wakeup_events;/* wakeup every n events */ @@ -712,6 +713,26 @@ enum perf_event_type { */ PERF_RECORD_MMAP2 = 10, + /* +* Data in userspace event record is transparent for the kernel +* +* Userspace perf tool code maintains a list of known types with +* reference implementations of parsers for the data field. +* +* Overall size of the record (including type and size fields) +* is always aligned to 8 bytes by adding padding after the data. +* +* struct { +* struct perf_event_headerheader; +* u32
[PATCH v3 0/3] perf: User/kernel time correlation and event generation
From: Pawel Moll Hello again, Back to the subject, this time with a slightly different angle... I've organised a session on the subject during the tracing minisummit at LPC 2014 in Dusseldorf. Notes taken from the discussion taken by Steven Rostedt (thanks Steve!) http://www.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_Tracing.txt The following three patches address three main topics. They are pretty much orthogonal and (subject to small and obvious modifications) could be applied independently from each other. An executive summary of both discussion and the patches: 1. User/kernel perf timestamps correlation Thomas suggested that, instead of jumping through loops of correlation, perf should simply generate monotonic clock timestamps, instead of using sched clock. Peter and I pointed out that Ingo didn't like this idea as monotonic can be slow, but apparently the cases when it is are irrelevant. Thomas offered to fly to Budapest to physically convince Ingo - I hope it won't be necessary and he will be able to achieve this here, on mailing lists :-) Setting aside potential performance problems, it would be a really great solution, unifying all trace systems (perf, ftrace and LTTng) in this respect. I'm more than happy to work on potential improvements in the are of monotonic clock if it was to help the cause. If it is a definite no-go, we still have the third patch, allowing post- capture correlation based on synchronisation events. 2. User event generation Everyone present agreed that it would be a very-nice-to-have feature. There was some discussion about implementation details, so I welcome feedback and comments regarding my take on the matter. 3. Correlation with external timestamps This is a new issue, which surfaced recently while I was working on hardware trace infrastructure. It can timestamp trace packets, but is using yet another, completely different time source to do this. Thomas suggested solution which gets down to my original proposal for sched/monotonic clock correlation - an additional sample type so events can be "double stamped" using different clock sources providing synchronisation points for later time approximation. I've just extended the implementation with configuration value to select the clock source. If the first patch (making perf timestamps monotonic) gets accepted, there will be no immediate need for this one, but I'd like to gain some feedback anyway. That's all for this series. Previous versions: - RFC: http://www.spinics.net/lists/kernel/msg1824419.html - v1: http://thread.gmane.org/gmane.linux.kernel/1790231 - v2: http://thread.gmane.org/gmane.linux.kernel/1793272 Pawel Moll (3): perf: Use monotonic clock as a source for timestamps perf: Userspace event perf: Sample additional clock value include/linux/perf_event.h | 7 +++ include/uapi/linux/perf_event.h | 37 +- include/uapi/linux/prctl.h | 10 kernel/events/core.c| 105 +++- kernel/sys.c| 5 ++ kernel/sysctl.c | 7 +++ 6 files changed, 168 insertions(+), 3 deletions(-) -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3 1/3] perf: Use monotonic clock as a source for timestamps
Until now, perf framework never defined the meaning of the timestampt captured as PERF_SAMPLE_TIME sample type. The values were obtaining from local (sched) clock, which is unavailable in userspace. This made it impossible to correlate perf data with any other events. Other tracing solutions have the source configurable (ftrace) or just share a common time domain between kernel and userspace (LTTng). Follow the trend by using monotonic clock, which is readily available as POSIX CLOCK_MONOTONIC. Also add a sysctl "perf_sample_time_clk_id" attribute which can be used by the user to obtain the clk_id to be used with POSIX clock API (eg. clock_gettime()) to obtain a time value comparable with perf samples. Signed-off-by: Pawel Moll --- Ingo, I remember your comments about this approach in the past, but during discussions at LPC Thomas was convinced that it's the right thing to do - see cover letter for the series... include/linux/perf_event.h | 1 + kernel/events/core.c | 4 +++- kernel/sysctl.c| 7 +++ 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 893a0d0..ba490d5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -738,6 +738,7 @@ extern int sysctl_perf_event_paranoid; extern int sysctl_perf_event_mlock; extern int sysctl_perf_event_sample_rate; extern int sysctl_perf_cpu_time_max_percent; +extern int sysctl_perf_sample_time_clk_id; extern void perf_sample_event_took(u64 sample_len_ns); diff --git a/kernel/events/core.c b/kernel/events/core.c index 2b02c9f..ea3d6d3 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -234,6 +234,8 @@ int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write, return 0; } +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; + /* * perf samples are done in some very critical code paths (NMIs). * If they take too much CPU time, the system can lock up and not @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void) static inline u64 perf_clock(void) { - return local_clock(); + return ktime_get_mono_fast_ns(); } static inline struct perf_cpu_context * diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 15f2511..cb75f5b 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1094,6 +1094,13 @@ static struct ctl_table kern_table[] = { .extra1 = &zero, .extra2 = &one_hundred, }, + { + .procname = "perf_sample_time_clk_id", + .data = &sysctl_perf_sample_time_clk_id, + .maxlen = sizeof(unsigned int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, #endif #ifdef CONFIG_KMEMCHECK { -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Mon, 2014-11-03 at 14:48 +, Tomeu Vizoso wrote: > On 29 September 2014 17:53, Pawel Moll wrote: > > On Mon, 2014-09-29 at 16:32 +0100, Peter Zijlstra wrote: > >> Also none of the many words above describe > >> PERF_SAMPLE_USERSPACE_EVENT(), wth is that about? > > > > Hopefully description of the v2 makes better job in this: > > > > http://thread.gmane.org/gmane.linux.kernel/1793272/focus=4813 > > > > where it's already called "UEVENT" and was generated by write(). > > > > Before you get into this, though, the most important outcomes of both v1 > > and v2 discussions: > > > > * Ingo suggested prctl(PR_TRACE_UEVENT, type, size, data, 0) as the way > > of generating such events (so the tracee doesn't have to know the fd to > > do ioctl); Frederic seems to have the same on his mind. > > > > * Namhyung proposed sticking the userspace-originating events into the > > buffer as PERF_RECORD_UEVENT rather then PERF_SAMPLE_UEVENT. > > > > Working on making both happen now. > > are you still working on this? Would be happy to lend a hand if that > can speed things up. By all means! In fact I'm typing commit messages right now and will post the patches later today. Stay tuned and I'm looking forward to all suggestions, reviews etc. Cheers! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Mon, 2014-09-29 at 16:32 +0100, Peter Zijlstra wrote: > Also none of the many words above describe > PERF_SAMPLE_USERSPACE_EVENT(), wth is that about? Hopefully description of the v2 makes better job in this: http://thread.gmane.org/gmane.linux.kernel/1793272/focus=4813 where it's already called "UEVENT" and was generated by write(). Before you get into this, though, the most important outcomes of both v1 and v2 discussions: * Ingo suggested prctl(PR_TRACE_UEVENT, type, size, data, 0) as the way of generating such events (so the tracee doesn't have to know the fd to do ioctl); Frederic seems to have the same on his mind. * Namhyung proposed sticking the userspace-originating events into the buffer as PERF_RECORD_UEVENT rather then PERF_SAMPLE_UEVENT. Working on making both happen now. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] perf: Add sampling of the raw monotonic clock
On Mon, 2014-09-29 at 16:28 +0100, Peter Zijlstra wrote: > On Thu, Sep 18, 2014 at 03:34:32PM +0100, Pawel Moll wrote: > > @@ -4456,6 +4459,13 @@ static void __perf_event_header__init_id(struct > > perf_event_header *header, > > data->cpu_entry.cpu = raw_smp_processor_id(); > > data->cpu_entry.reserved = 0; > > } > > + > > + if (sample_type & PERF_SAMPLE_CLOCK_RAW_MONOTONIC) { > > + struct timespec now; > > + > > + getrawmonotonic(&now); > > + data->clock_raw_monotonic = timespec_to_ns(&now); > > + } > > } > > > > This cannot work, getrawmonotonic() isn't NMI-safe and there's > nothing stopping this being used from NMI context. > > Also getrawmonotonic() + timespec_to_ns() will make tglx sad, he's just > done a tree-wide eradication of silly conversions and now you're adding > a ns -> timespec -> ns dance right back. Last thing I want is to make Thomas sad... For obvious reasons ;-) > I _think_ you want ktime_get_mono_fast_ns(), With pleasure, it's exactly what I need. > but this does bring us > right back to the question/discussion on which timebase you'd want to > sync again. MONO does make sense for most cases, but I think we've had > fairly sane stories for people wanting to sync against other clocks. Yes. I've asked the same question somewhere in the thread. ftrace has got a switch and a selection of trace_clocks in kernel/trace/trace.c - do we want something similar (in integer form probably, though) in perf_events.h with an additional "flag" in struct perf_event_attr? It could be used to pick a time source for PERF_SAMPLE_CLOCK (PERF_SAMPLE_TRACE_CLOCK?) sample. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Sat, 2014-09-27 at 18:14 +0100, Frederic Weisbecker wrote: > 2014-09-25 20:33 GMT+02:00 Ingo Molnar : > > > > * Pawel Moll wrote: > > > >> On Wed, 2014-09-24 at 08:49 +0100, Ingo Molnar wrote: > >> > * Pawel Moll wrote: > >> > > >> > > On Thu, 2014-09-18 at 15:34 +0100, Pawel Moll wrote: > >> > > > This patch adds a PERF_COUNT_SW_USERSPACE_EVENT type, > >> > > > which can be generated by user with PERF_EVENT_IOC_ENTRY > >> > > > ioctl command, which injects an event of said type into > >> > > > the perf buffer. > >> > > > >> > > It occurred to me last night that currently perf doesn't handle "write" > >> > > syscall at all, while this seems like the most natural way of > >> > > "injecting" userspace events into perf buffer. > >> > > > >> > > An ioctl would still be needed to set a type of the following events, > >> > > something like: > >> > > > >> > > ioctl(SET_TYPE, 0x42); > >> > > write(perf_fd, binaryblob, size); > >> > > ioctl(SET_TYPE, 0); > >> > > dprintf(perf_fd, "String"); > >> > > > >> > > which is fine for use cases when the type doesn't change often, > >> > > but would double the amount of syscalls when every single event > >> > > is of a different type. Perhaps there still should be a > >> > > "generating ioctl" taking both type and data/size in one go? > >> > > >> > Absolutely, there should be a single syscall. > >> > >> Yeah, it's my gut feeling as well. I just wonder if we still want to > >> keep write() handler for operations on perf fds? This seems natural - > >> takes data buffer and its size. The only issue is the type. > >> > >> > I'd even argue it should be a new prctl(): that way we could both > >> > generate user events for specific perf fds, but also into any > >> > currently active context (that allows just generation/injection > >> > of user events). In the latter case we might have no fd to work > >> > off from. > >> > >> When Arnaldo suggested that the "user events" could be used by perf > >> trace, it was exactly my first thought. I just didn't have answer how to > >> present it to the user (an extra syscall didn't seem like a good idea), > >> but prctl seems interesting, something like this? > >> > >> prctl(PR_TRACE_UEVENT, type, size, data, 0); > > > > Exactly! > > > >> How would we select tasks that can write to a given buffer? Maybe an > >> ioctl() on a perf fd? Something like this? > >> > >> ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid); > >> ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid); > > > > No, I think there's a simpler way: this should be a regular > > perf_attr flag, which defaults to '0' (tasks cannot do this), but > > which can be set to 1 if the profiler explicitly allows such > > event injection. > > Maybe we just don't even need any permission at all. Which harm can > that do if this only ever generate events to those interested in the > relevant perf context? It could be a simple tracepoint BTW. Yeah, Ingo already pointed it out (that non-root task can't trace root tasks anyway). > Oh and I really like the fact we don't use a syscall that requires an > fd. The tracee really shouldn't be aware of the tracer. Agreed, I'll look at solution with prctl() this week. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock
On Fri, 2014-09-26 at 20:25 +0100, David Ahern wrote: > On 9/26/14, 9:05 AM, Pawel Moll wrote: > > To do the correlation you need both timestamps to be "taken" > > simultaneously: > > > > perf event user event > > -O--+-O--> t_mono > > : | : > > : V : > > -OO--> t_perf > > > > Of course it's not possible get both values literally at the same time, > > but placing them in a atomic context a couple of instructions from each > > other still gives pretty good results. The larger this distance is, the > > An early patchset on this topic added the realtime clock as an event and > an ioctl was used to push a sample into the event stream. Yeah, I remember. If I remember correctly correctly the pushback was on a custom event type, right? Generally speaking I don't mind any solution that we'll get us to the place both you and I want to be (just being able to time stamp some performance data in userspace, how difficult can this be! ;-) but I like the flexibility of an extra sample - one can pick and mix events and samples at one's leisure. > In that case > you have wall clock and perf-clock samples taken in the same kernel > context and about as close together as you can get. Yep, that's what I was saying - we can't quite get two timestamps at the *same*, but getting them within a single atomic block of instructions gives reasonable accuracy. Thanks! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock
On Fri, 2014-09-26 at 15:38 +0100, Namhyung Kim wrote: > > Then I have loads of normal normal samples, timestamped with sched clock > > only, and every now and then one with both timestamps which then I can > > use for time correlation. The whole point is that the frequency of such > > "synchronisation" event can be much (much!) lower than of the normal > > samples, but it still allows pretty good approximation (I was getting > > accuracy of ~1 microsecond and better with sched_switch trace event > > marked with additional raw monotonic timestamp). > > Okay. But in that case wouldn't it be enough to use just a single > timestamp for each event - sched_clock for cpu-cycles and monotonic raw > for sched_switch? To do the correlation you need both timestamps to be "taken" simultaneously: perf event user event -O--+-O--> t_mono : | : : V : -OO--> t_perf Of course it's not possible get both values literally at the same time, but placing them in a atomic context a couple of instructions from each other still gives pretty good results. The larger this distance is, the lower the accuracy will be. I must admit I haven't done such experiments, but let me remind that I in my test I was getting results in the range of 1000ns, with a single cycle of a 2GHz taking 0.5ns, so moving the t_mono/t_perf value sampling further aside will reduce it significantly... Pawel Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Fri, 2014-09-26 at 12:23 +0100, Ingo Molnar wrote: > > As in: allows *all* tasks to inject the data? Are you sure we > > don't want more fine-grained control, in particular per task? > > Yeah. If the profiler allows it, then any task that is being > traced can inject data. The "that is being traced" fragment was the key here. I missed the fact that perf trace already takes a list of pids, so we're not talking about all tasks in the system. That should work. Paweł -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] perf: Userspace event
On Fri, 2014-09-26 at 07:21 +0100, Namhyung Kim wrote: > It looks like what trace-marker in ftrace does.. We might connect > output of the trace marker into a perf event somehow. I can probably trace_marker's write handler do the same as the new prctl() would do. But this means that we really want the pre-defined "zero terminated string" type (0). Otherwise, what type would be assigned to a record originating from it? Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock
On Fri, 2014-09-26 at 07:16 +0100, Namhyung Kim wrote: > > It would be doable, I guess, but what > > if someone *wants* to have sched clock as the timestamps source (because > > it's cheap) but still be able to correlate them with userspace? In this > > case two separate timestamps are required to do the approximation. > > But by collecting two timestamps, you'll loose the win of the first > timestamp, no? But I can ask for both timestamps only being collected on "low bandwidth" events, in particular context switches and/or periodic (eg. 10ms hrtimer) software events. Then I have loads of normal normal samples, timestamped with sched clock only, and every now and then one with both timestamps which then I can use for time correlation. The whole point is that the frequency of such "synchronisation" event can be much (much!) lower than of the normal samples, but it still allows pretty good approximation (I was getting accuracy of ~1 microsecond and better with sched_switch trace event marked with additional raw monotonic timestamp). Pawel PS. Have you sent a couple of the messages via some kind of gmane's proxy? All the mail addresses got rather messed up... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Thu, 2014-09-25 at 19:33 +0100, Ingo Molnar wrote: > > How would we select tasks that can write to a given buffer? Maybe an > > ioctl() on a perf fd? Something like this? > > > > ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid); > > ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid); > > No, I think there's a simpler way: this should be a regular > perf_attr flag, which defaults to '0' (tasks cannot do this), but > which can be set to 1 if the profiler explicitly allows such > event injection. As in: allows *all* tasks to inject the data? Are you sure we don't want more fine-grained control, in particular per task? If we have two buffers, both created with the "injecting allowed" flag, do we inject a given uevent into both of them? > I.e. whether user-events are allowed is controlled by the > profiling/tracing context, via the regular perf syscall. It would > propagate into the perf context, so it would be easy to check at > event generation time. It would definitely be the profiling/tracing tools that would decide if the injection is allowed, no question about that. I just feel that it should be able to select the tasks that can do that, not just flip a big switch saying "everyone is welcome". Other question is: should a non-root context be able to receive events from root processes? Wouldn't it be a security hole (for example, it could be used as a kind of covert channel)? Maybe we should do what ptrace does? As in: if a task can ptrace another task, it can also receive uevents from it. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf: Userspace software event and ioctl
On Wed, 2014-09-24 at 08:49 +0100, Ingo Molnar wrote: > * Pawel Moll wrote: > > > On Thu, 2014-09-18 at 15:34 +0100, Pawel Moll wrote: > > > This patch adds a PERF_COUNT_SW_USERSPACE_EVENT type, > > > which can be generated by user with PERF_EVENT_IOC_ENTRY > > > ioctl command, which injects an event of said type into > > > the perf buffer. > > > > It occurred to me last night that currently perf doesn't handle "write" > > syscall at all, while this seems like the most natural way of > > "injecting" userspace events into perf buffer. > > > > An ioctl would still be needed to set a type of the following events, > > something like: > > > > ioctl(SET_TYPE, 0x42); > > write(perf_fd, binaryblob, size); > > ioctl(SET_TYPE, 0); > > dprintf(perf_fd, "String"); > > > > which is fine for use cases when the type doesn't change often, > > but would double the amount of syscalls when every single event > > is of a different type. Perhaps there still should be a > > "generating ioctl" taking both type and data/size in one go? > > Absolutely, there should be a single syscall. Yeah, it's my gut feeling as well. I just wonder if we still want to keep write() handler for operations on perf fds? This seems natural - takes data buffer and its size. The only issue is the type. > I'd even argue it should be a new prctl(): that way we could both > generate user events for specific perf fds, but also into any > currently active context (that allows just generation/injection > of user events). In the latter case we might have no fd to work > off from. When Arnaldo suggested that the "user events" could be used by perf trace, it was exactly my first thought. I just didn't have answer how to present it to the user (an extra syscall didn't seem like a good idea), but prctl seems interesting, something like this? prctl(PR_TRACE_UEVENT, type, size, data, 0); How would we select tasks that can write to a given buffer? Maybe an ioctl() on a perf fd? Something like this? ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid); ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid); It could set/clear a flag in pid's task_struct (but probably not in the "normal" flags, as they are only supposed to be set by owner and in ptrace/fork case) and a pointer to the task in perf_event(_context). Or maybe some variation on ptrace would be more in place? This would also solve issue of permission checking (if the profiling tool can ptrace the process, it can also enable/disable its uevent generation capability). Paweł Or maybe it should go through ptrace? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] perf: Userspace event
On Wed, 2014-09-24 at 07:07 +0100, Namhyung Kim wrote: > On Tue, 23 Sep 2014 18:03:07 +0100, Pawel Moll wrote: > > This patch adds a new PERF_COUNT_SW_UEVENT software event > > and a related PERF_SAMPLE_UEVENT sample. User can now > > write to the the perf file descriptor, injecting such > > event in the perf buffer. > > It seems the PERF_SAMPLE_UEVENT sample can be injected to any event. So > why the PERF_COUNT_SW_UEVENT is needed? At least one can use the > SW_DUMMY event for that purpose. You're right. I needed a different SW type in one of my early prototypes, but it's not the case any more. Consider it gone. > Also I think it'd be better to be a record type (PERF_RECORD_XXX) > instead of a sample flag (PERF_SAMPLE_XXX). In perf tools, we already > use perf_user_event_type for synthesized userspace events. This way it > can avoid unnecessary sample processing for userspace events. Fine with me. If no one objects, I'm more than happy to use PERF_RECORD_UEVENT = 11 for it. > For contents, I prefer to give complete control to users - kernel > doesn't need to care about it other than its size. If one just wants to > use strings only, she can write them directly. If others want to mix > different types of data, they might need to define a data format for > their use. Are you saying to drop even the "type 0 means zero-terminated string" definition, even if everything else is up to the user? I quite like that idea, especially combined with write()ing to the perf_fd (it is very much like trace_marker then, which is beautiful in its simplicity), but the feelings are not that strong to fight a war over it. Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/