[PATCH SLOF v2 0/5] GPT fixes/cleanup and LVM support with FAT
Following patchset implements some improvements and cleanup for the GPT booting code:

patch 1: Simplify the GPT detection code with fewer nested scopes and add comments.
patch 2: Introduce 8-byte LE helpers: x@-le and x!-le
patch 3: Rename block / read-sector to indicate that it is an allocated buffer
patch 4: As we need to detect FAT partitions, implement a helper that can be used both from GPT code and fat-bootblock?
patch 5: Implement GPT FAT for LVM support and make the GPT detection code robust

Nikunj A Dadhania (5):
  disk-label: simplify gpt-prep-partition? routine
  introduce 8-byte LE helpers
  disk-label: rename confusing block word
  disk-label: introduce helper to check fat filesystem
  disk-label: add support for booting from GPT FAT partition

 slof/fs/little-endian.fs       |   6 ++
 slof/fs/packages/disk-label.fs | 209 +
 2 files changed, 136 insertions(+), 79 deletions(-)

-- 
2.4.3
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
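Patch 2 itself is not part of this excerpt. As a rough illustration of what an 8-byte little-endian fetch and store do, here is a portable C sketch. `le64_load` and `le64_store` are hypothetical names that only mirror the behaviour described for x@-le and x!-le (a 64-bit value stored least-significant byte first); they are not SLOF's implementation.

```c
#include <stdint.h>

/* Hypothetical C analogue of x@-le: fetch a 64-bit value stored
 * least-significant byte first, independent of host endianness. */
static uint64_t le64_load(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical C analogue of x!-le: store v least-significant byte first. */
static void le64_store(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}
```

Working byte by byte keeps the sketch correct on both big- and little-endian hosts, which matters for firmware that runs on big-endian POWER but parses little-endian on-disk structures such as GPT headers.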
[PATCH SLOF v2 1/5] disk-label: simplify gpt-prep-partition? routine
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@redhat.com
---
 slof/fs/packages/disk-label.fs | 42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index fe1c25e..bb64d57 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -352,42 +352,32 @@ CONSTANT /gpt-part-entry
    drop 0
 ;
 
-\ Check for GPT PReP partition GUID
-9E1A2D38            CONSTANT GPT-PREP-PARTITION-1
-C612                CONSTANT GPT-PREP-PARTITION-2
-4316                CONSTANT GPT-PREP-PARTITION-3
-AA26                CONSTANT GPT-PREP-PARTITION-4
-8B49521E5A8B        CONSTANT GPT-PREP-PARTITION-5
+\ Check for GPT PReP partition GUID. Only first 3 blocks are
+\ byte-swapped treating last two blocks as contigous for simplifying
+\ comparison
+9E1A2D38            CONSTANT GPT-PREP-PARTITION-1
+C612                CONSTANT GPT-PREP-PARTITION-2
+4316                CONSTANT GPT-PREP-PARTITION-3
+AA268B49521E5A8B    CONSTANT GPT-PREP-PARTITION-4
 
 : gpt-prep-partition? ( -- true|false )
-   block gpt-part-entry>part-type-guid l@-le GPT-PREP-PARTITION-1 = IF
-      block gpt-part-entry>part-type-guid 4 + w@-le
-      GPT-PREP-PARTITION-2 = IF
-         block gpt-part-entry>part-type-guid 6 + w@-le
-         GPT-PREP-PARTITION-3 = IF
-            block gpt-part-entry>part-type-guid 8 + w@
-            GPT-PREP-PARTITION-4 = IF
-               block gpt-part-entry>part-type-guid a + w@
-               block gpt-part-entry>part-type-guid c + l@ swap lxjoin
-               GPT-PREP-PARTITION-5 = IF
-                   TRUE EXIT
-               THEN
-            THEN
-         THEN
-      THEN
-   THEN
-   FALSE
+   block gpt-part-entry>part-type-guid
+   dup l@-le GPT-PREP-PARTITION-1 <> IF drop false EXIT THEN
+   dup 4 + w@-le GPT-PREP-PARTITION-2 <> IF drop false EXIT THEN
+   dup 6 + w@-le GPT-PREP-PARTITION-3 <> IF drop false EXIT THEN
+   8 + x@ GPT-PREP-PARTITION-4 <> IF false EXIT THEN
+   true
 ;
 
 : load-from-gpt-prep-partition ( addr -- size )
-   no-gpt? IF drop FALSE EXIT THEN
+   no-gpt? IF drop false EXIT THEN
    debug-disk-label? IF
       cr ." GPT partition found " cr
    THEN
    1 read-sector
    block gpt>part-entry-lba l@-le block-size * to seek-pos
    block gpt>part-entry-size l@-le to gpt-part-size
-   block gpt>num-part-entry l@-le dup 0= IF FALSE EXIT THEN
+   block gpt>num-part-entry l@-le dup 0= IF false EXIT THEN
    1+ 1 ?DO
       seek-pos 0 seek drop
       block gpt-part-size read drop gpt-prep-partition? IF
@@ -405,7 +395,7 @@ AA26                CONSTANT GPT-PREP-PARTITION-4
       THEN
       seek-pos gpt-part-size i * + to seek-pos
    LOOP
-   FALSE
+   false
 ;
 
 \ Extract the boot loader path from a bootinfo.txt file
-- 
2.4.3
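The new constants rely on the mixed-endian GUID encoding used in GPT partition entries: the first three GUID fields (9E1A2D38, C612, 4316) are stored little-endian, while the last 8 bytes are stored in plain byte order, so they can be compared as a single big-endian 64-bit value (AA268B49521E5A8B). A minimal C sketch of the same comparison follows; the helper names `rd_le32`, `rd_le16`, `rd_be64` and `gpt_prep_partition` are hypothetical, and `guid` is assumed to point at the 16-byte partition type GUID of a GPT entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the Forth constants in the patch. */
#define GPT_PREP_PARTITION_1 0x9E1A2D38u
#define GPT_PREP_PARTITION_2 0xC612u
#define GPT_PREP_PARTITION_3 0x4316u
#define GPT_PREP_PARTITION_4 0xAA268B49521E5A8BULL

static uint32_t rd_le32(const uint8_t *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* The trailing 8 GUID bytes are not swapped, so read them big-endian. */
static uint64_t rd_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Early-exit comparison, mirroring the flattened Forth word. */
static bool gpt_prep_partition(const uint8_t *guid)
{
    return rd_le32(guid) == GPT_PREP_PARTITION_1 &&
           rd_le16(guid + 4) == GPT_PREP_PARTITION_2 &&
           rd_le16(guid + 6) == GPT_PREP_PARTITION_3 &&
           rd_be64(guid + 8) == GPT_PREP_PARTITION_4;
}
```

The on-disk bytes for the PReP GUID are therefore 38 2D 1A 9E 12 C6 16 43 followed by AA 26 8B 49 52 1E 5A 8B.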
[PATCH SLOF v2 4/5] disk-label: introduce helper to check fat filesystem
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
 slof/fs/packages/disk-label.fs | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 0995808..7ed5526 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -321,6 +321,14 @@ CONSTANT /gpt-part-entry
 
 \ Load from first active DOS boot partition.
 
+: fat-bootblock? ( addr -- flag )
+   \ byte 0-2 of the bootblock is a jump instruction in
+   \ all FAT filesystems.
+   \ e9 and eb are jump instructions in x86 assembler.
+   dup c@ e9 = IF drop true EXIT THEN
+   dup c@ eb = swap 2+ c@ 90 = and
+;
+
 \ NOTE: block-size is always 512 bytes for DOS partition tables.
 
 : load-from-dos-boot-partition ( addr -- size )
@@ -549,14 +557,7 @@ AA268B49521E5A8B    CONSTANT GPT-PREP-PARTITION-4
 
 : try-dos-files ( -- found? )
    no-mbr? IF false EXIT THEN
-   \ disk-buf 0 byte 0-2 is a jump instruction in all FAT
-   \ filesystems.
-   \ e9 and eb are jump instructions in x86 assembler.
-   disk-buf c@ e9 <> IF
-      disk-buf c@ eb <>
-      disk-buf 2+ c@ 90 <> or
-      IF false EXIT THEN
-   THEN
+   disk-buf fat-bootblock? 0= IF false EXIT THEN
    s" fat-files" (interpose-filesystem)
    true
 ;
-- 
2.4.3
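For readers less familiar with Forth, the test performed by fat-bootblock? can be written in C as below. This is a sketch with a hypothetical function name; like the Forth word, it returns a positive flag ("looks like a FAT boot block"), which is why the new try-dos-files code inverts the result with `0=`.

```c
#include <stdbool.h>
#include <stdint.h>

/* A FAT boot sector starts with an x86 jump: either a near jump (0xE9),
 * or a short jump (0xEB) whose third byte is a NOP (0x90). */
static bool fat_bootblock(const uint8_t *blk)
{
    if (blk[0] == 0xE9)
        return true;
    return blk[0] == 0xEB && blk[2] == 0x90;
}
```

Moving the check into a helper that takes an address (rather than hard-coding disk-buf) is what lets the GPT code in patch 5 reuse it on a partition's first block.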
Re: [GIT PULL 00/13] perf/core improvements and fixes
* Arnaldo Carvalho de Melo a...@kernel.org wrote:

> Hi Ingo,
>
> 	Please consider pulling,
>
> - Arnaldo
>
> The following changes since commit a9a3cd900fbbcbf837d65653105e7bfc583ced09:
>
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2015-06-20 01:11:11 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
>
> for you to fetch changes up to 83b2ea257eb1d43e52f76d756722aeb899a2852c:
>
>   perf tools: Allow auxtrace data alignment (2015-06-23 18:28:37 -0300)
>
> perf/core improvements and fixes:
>
> User visible:
>
> - Move toggling event logic from 'perf top' and into hists browser, allowing freeze/unfreeze with event lists with more than one entry (Namhyung Kim)
>
> - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and showing the Aggregated stats in 'perf report -D' (Adrian Hunter)
>
> Infrastructure:
>
> - Allow auxtrace data alignment (Adrian Hunter)
> - Allow events with dot (Andi Kleen)
> - Fix failure to 'perf probe' events on arm (He Kuang)
> - Add testing for Makefile.perf (Jiri Olsa)
> - Add test for make install with prefix (Jiri Olsa)
> - Fix single target build dependency check (Jiri Olsa)
> - Access thread_map entries via accessors, prep patch to hold more info per entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa)
> - Use __weak definition from compiler.h (Sukadev Bhattiprolu)
> - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)
>
> Signed-off-by: Arnaldo Carvalho de Melo a...@redhat.com
>
> Adrian Hunter (3):
>       perf session: Print a newline when dumping PERF_RECORD_FINISHED_ROUND
>       perf tools: Print a newline before dumping Aggregated stats
>       perf tools: Allow auxtrace data alignment
>
> Andi Kleen (1):
>       perf tools: Allow events with dot
>
> He Kuang (1):
>       perf probe: Fix failure to probe events on arm
>
> Jiri Olsa (5):
>       perf tests: Add testing for Makefile.perf
>       perf tests: Add test for make install with prefix
>       perf build: Fix single target build dependency check
>       perf thread_map: Don't access the array entries directly
>       perf thread_map: Change map entries into a struct
>
> Namhyung Kim (1):
>       perf top: Move toggling event logic into hists browser
>
> Sukadev Bhattiprolu (2):
>       perf pmu: Use __weak definition from linux/compiler.h
>       perf pmu: Split perf_pmu__new_alias()
>
>  tools/perf/Makefile                         |  4 +--
>  tools/perf/builtin-top.c                    | 24 ++-
>  tools/perf/builtin-trace.c                  |  4 +--
>  tools/perf/tests/make                       | 31 ++--
>  tools/perf/tests/openat-syscall-tp-fields.c |  2 +-
>  tools/perf/ui/browsers/hists.c              | 19 ++--
>  tools/perf/util/auxtrace.c                  | 11 +--
>  tools/perf/util/auxtrace.h                  |  1 +
>  tools/perf/util/event.c                     |  6 ++--
>  tools/perf/util/evlist.c                    |  4 +--
>  tools/perf/util/evsel.c                     |  2 +-
>  tools/perf/util/parse-events.l              |  5 ++--
>  tools/perf/util/pmu.c                       | 45 +++--
>  tools/perf/util/probe-event.c               |  6 +++-
>  tools/perf/util/session.c                   |  4 ++-
>  tools/perf/util/thread_map.c                | 24 ---
>  tools/perf/util/thread_map.h                | 16 +-
>  17 files changed, 136 insertions(+), 72 deletions(-)

Pulled, thanks a lot Arnaldo!

Btw., one small thing I noticed about the status line in perf top: if I ever use 'f' to freeze/unfreeze events, the following message:

   Press 'f' to disable the events or 'h' to see other hotkeys

sticks around forever, even after I look into annotation and exit it, etc.

So I don't mind some default, helpful message there (such as "Press 'h' to see hotkeys"), but it appears this particular message is context and usage sensitive, which wasn't really the goal, right?

Thanks,

	Ingo
[PATCH SLOF v2 3/5] disk-label: rename confusing block word
The "block" word is not a block number; it is actually an allocated host address. Rename it to disk-buf, along with an associated size (disk-buf-size=4096) to use during allocation/free.

Also rename the helper routine read-sector to read-disk-buf. This routine assumes the address to be disk-buf and only takes the sector number as an argument.

Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
 slof/fs/packages/disk-label.fs | 78 ++
 1 file changed, 41 insertions(+), 37 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 8c93cfb..0995808 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -33,7 +33,8 @@ s" disk-label" device-name
 0 INSTANCE VALUE dos-logical-partitions
 
 0 INSTANCE VALUE block-size
-0 INSTANCE VALUE block
+0 INSTANCE VALUE disk-buf
+d# 4096    VALUE disk-buf-size
 
 0 INSTANCE VALUE args
 0 INSTANCE VALUE args-len
@@ -126,11 +127,11 @@ CONSTANT /gpt-part-entry
 ;
 
 
-\ read sector to array block
-: read-sector ( sector-number -- )
+\ read sector to array disk-buf
+: read-disk-buf ( sector-number -- )
    \ block-size is 0x200 on disks, 0x800 on cdrom drives
    block-size * 0 seek drop      \ seek to sector
-   block block-size read drop    \ read sector
+   disk-buf block-size read drop \ read sector
 ;
 
 : (.part-entry) ( part-entry )
@@ -149,35 +150,35 @@ CONSTANT /gpt-part-entry
 
 : (.name) r@ begin cell - dup @ <colon> = UNTIL xt>name cr type space ;
 
-: init-block ( -- )
+: init-disk-buf ( -- )
    s" block-size" ['] $call-parent CATCH IF ABORT" parent has no block-size." THEN
    to block-size
-   d# 4096 alloc-mem
-   dup d# 4096 erase
-   to block
+   disk-buf-size alloc-mem
+   dup disk-buf-size erase
+   to disk-buf
    debug-disk-label? IF
-      ." init-block: block-size=" block-size .d ." block=0x" block u. cr
+      ." init-disk-buf: block-size=" block-size .d ." disk-buf=0x" disk-buf u. cr
    THEN
 ;
 
 : partition>part-entry ( partition -- part-entry )
-   1- /partition-entry * block mbr>partition-table +
+   1- /partition-entry * disk-buf mbr>partition-table +
 ;
 
 : partition>start-sector ( partition -- sector-offset )
    partition>part-entry part-entry>sector-offset l@-le
 ;
 
-\ This word returns true if the currently loaded block has _NO_ MBR magic
+\ This word returns true if the currently loaded disk-buf has _NO_ MBR magic
 : no-mbr? ( -- true|false )
-   0 read-sector
+   0 read-disk-buf
    1 partition>part-entry part-entry>id c@ ee = IF TRUE EXIT THEN \ GPT partition found
-   block mbr>magic w@-le aa55 <>
+   disk-buf mbr>magic w@-le aa55 <>
 ;
 
-\ This word returns true if the currently loaded block has _NO_ GPT partition id
+\ This word returns true if the currently loaded disk-buf has _NO_ GPT partition id
 : no-gpt? ( -- true|false )
-   0 read-sector
+   0 read-disk-buf
    1 partition>part-entry part-entry>id c@ ee <>
 ;
 
@@ -197,7 +198,7 @@ CONSTANT /gpt-part-entry
       part-entry>sector-offset l@-le    ( current sector )
       dup to part-start to lpart-start  ( current )
       BEGIN
-         part-start read-sector          \ read EBR
+         part-start read-disk-buf        \ read EBR
          1 partition>start-sector IF
             \ ." Logical Partition found at " part-start .d cr
             1+
@@ -240,7 +241,7 @@ CONSTANT /gpt-part-entry
       part-entry>sector-offset l@-le    ( log-part current sector )
       dup to part-start to lpart-start  ( log-part current )
       BEGIN
-         part-start read-sector          \ read EBR
+         part-start read-disk-buf        \ read EBR
          1 partition>start-sector IF     \ first partition entry
             1+ 2dup = IF                 ( log-part current )
                2drop
@@ -306,13 +307,13 @@ CONSTANT /gpt-part-entry
 : has-iso9660-filesystem  ( -- TRUE|FALSE )
    \ Seek to the beginning of logical 2048-byte sector 16
    \ refer to Chapter C.11.1 in PAPR 2.0 Spec
-   \ was: 10 read-sector, but this might cause trouble if you
+   \ was: 10 read-disk-buf, but this might cause trouble if you
    \ try booting an ISO image from a device with 512b sectors.
    10 800 * 0 seek drop      \ seek to sector
-   block 800 read drop       \ read sector
+   disk-buf 800 read drop    \ read sector
    \ Check for CD-ROM volume magic:
-   block c@ 1 =
-   block 1+ 5 s" CD001"  str=
+   disk-buf c@ 1 =
+   disk-buf 1+ 5 s" CD001"  str=
    and
    dup IF 800 to block-size THEN
 ;
@@ -361,7 +362,7 @@ C612                CONSTANT GPT-PREP-PARTITION-2
 AA268B49521E5A8B    CONSTANT GPT-PREP-PARTITION-4
 
 : gpt-prep-partition? ( -- true|false )
-   block gpt-part-entry>part-type-guid
+   disk-buf gpt-part-entry>part-type-guid
    dup l@-le GPT-PREP-PARTITION-1 <> IF drop false EXIT THEN
    dup 4 + w@-le GPT-PREP-PARTITION-2 <> IF drop false EXIT THEN
    dup 6 + w@-le GPT-PREP-PARTITION-3 <> IF drop false EXIT
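The renamed words boil down to two operations: read one block-size sector into the 4096-byte disk-buf, and inspect the MBR in sector 0. Below is a C sketch under stated assumptions: the disk is modelled as a stdio `FILE *`, the helper names (`read_disk_buf`, `no_mbr`, `no_gpt`) are hypothetical stand-ins for the Forth words, and the offsets are the standard MBR layout (partition table at 0x1BE, type byte 4 bytes into each 16-byte entry, 0x55 0xAA signature at 0x1FE) rather than values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DISK_BUF_SIZE  4096  /* same size as disk-buf-size in the patch */
#define MBR_PART_TABLE 0x1BE /* first partition entry (standard MBR) */
#define PART_ENTRY_ID  4     /* partition type byte within an entry */
#define MBR_MAGIC_OFF  0x1FE /* bytes 0x55 0xAA, i.e. aa55 read little-endian */

/* Sketch of read-disk-buf: seek to sector * block_size and read one
 * sector into buf, which must hold at least DISK_BUF_SIZE bytes. */
static bool read_disk_buf(FILE *disk, uint8_t *buf, long sector, size_t block_size)
{
    if (block_size > DISK_BUF_SIZE)
        return false;
    if (fseek(disk, sector * (long)block_size, SEEK_SET) != 0)
        return false;
    return fread(buf, 1, block_size, disk) == block_size;
}

/* Partition 1 of type 0xEE marks a GPT protective MBR. */
static bool is_gpt_protective(const uint8_t *buf)
{
    return buf[MBR_PART_TABLE + PART_ENTRY_ID] == 0xEE;
}

/* Like no-mbr?: true for GPT disks, or when the MBR magic is missing. */
static bool no_mbr(const uint8_t *buf)
{
    if (is_gpt_protective(buf))
        return true;
    uint16_t magic = (uint16_t)(buf[MBR_MAGIC_OFF] | buf[MBR_MAGIC_OFF + 1] << 8);
    return magic != 0xAA55;
}

/* Like no-gpt?: true unless partition 1 has the GPT id. */
static bool no_gpt(const uint8_t *buf)
{
    return !is_gpt_protective(buf);
}
```

Keeping the buffer size as a named value, as the patch does with disk-buf-size, means allocation, erase and any bounds check all agree on one number.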
Re: [PATCH] powerpc/powernv: Fix vma page prot flags in opal-prd driver
* Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com [2015-06-25 11:45:46]:

> [snip]
>
> Hi Ben,
>
> remap_pfn_range() is the correct method to map the firmware pages because we will not have struct page associated with this RAM area.
>
> We do a memblock_reserve() in early boot and take out this memory from kernel and avoid struct page allocation/init for these.

Kindly ignore this comment. memblock_reserve() does not prevent/avoid struct page allocation. We do have a valid struct page which can be used for mapping.

--Vaidy
Re: [PATCH v4 3/3] leds/powernv: Add driver for PowerNV platform
On 06/25/2015 06:39 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2015-04-28 at 15:40 +0530, Vasant Hegde wrote:
>> +Device Tree binding for LEDs on IBM Power Systems
>> +-
>> +
>> +The 'led' node under '/ibm,opal' lists service indicators available in the
>> +system and their capabilities.
>> +
>> +led {
>> +	compatible = "ibm,opal-v3-led";
>> +	phandle = <0x106b>;
>> +	linux,phandle = <0x106b>;
>> +	led-mode = "lightpath";
>> +
>> +	U78C9.001.RST0027-P1-C1 {
>> +		led-types = "identify", "fault";
>> +		led-loc = "descendent";
>> +		phandle = <0x106f>;
>> +		linux,phandle = <0x106f>;
>> +	};
>> +	...
>> +	...
>> +};
>
> My only issue is that /led should probably have been /leds but afaik this is already committed in the FW tree. Vasant, have we done a release with that code yet or can we still change this?

I think we can change the OPAL side, as we haven't released the kernel code, so there is no consumer yet. Will send a patch to the skiboot mailing list. Let's see.

> Also what does led-mode = "lightpath" mean? Can you describe it? Are there alternative values? I don't see the point myself...

Yes. Our system can work in two modes:
  - light path: both identify and fault LEDs are supported; typically all low-end servers are in this mode.
  - guiding light: only identify LEDs are supported, with no fault indicator support for individual FRUs; typically high-end servers are in this mode.

These modes are static in nature, meaning we cannot change them at run time. AFAIK all the PowerNV boxes shipped today are in Light Path mode. I have added this so that if they decide to ship systems in guiding light mode in the future, we don't need to make any changes.

> Don't leave the linux,phandle in the description of the binding (nor the phandle actually). They are implicit for all nodes, no need to clutter the documentation with them.

Sure. Will fix in the next version.

>> +Each node under 'led' node describes location code of FRU/Enclosure.
>> +
>> +The properties under each node:
>> +
>> +  led-types : Supported LED types (attention/identify/fault).
>> +
>> +  led-loc   : enclosure/descendent(FRU) location code.
>
> I don't understand what that means. Please provide a more detailed explanation.

This describes the LED location (FRU or enclosure level). It was added to identify the component (FRU LEDs are overloaded, while at enclosure level we have separate LEDs for each component).

> Is the name of the node the loc code of the FRU? In that case, how do you deal with multiple LEDs on the same FRU without a unit address?

We use location code + LED type for the node name, so that we can identify multiple LEDs.

Looking back again, we can probably take out the above property, as we are not using it today.

-Vasant
Re: [PATCH V9 05/13] powerpc, perf: Change name & type of 'pred' in power_pmu_bhrb_read
On 06/25/2015 10:41 AM, Daniel Axtens wrote:
>>  		cpuhw->bhrb_entries[u_index].to = addr;
>> -		cpuhw->bhrb_entries[u_index].mispred = pred;
>> -		cpuhw->bhrb_entries[u_index].predicted = ~pred;
>> +		cpuhw->bhrb_entries[u_index].mispred = mispred;
>> +		cpuhw->bhrb_entries[u_index].predicted =
>> +						~mispred;
> This is much better! However, these are still bitwise rather than logical inversions. They will work, but would it be easier to understand if you used !mispred?

Sure, will change it as well.
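A small C illustration of Daniel's point, using a hypothetical struct with 1-bit fields (the real driver's types are not reproduced here). The bitwise form only gives the right answer because storing into a 1-bit unsigned field keeps the low bit; if the input were ever anything other than 0 or 1, the two forms would diverge.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a branch record with 1-bit flags. */
struct branch_flags {
    unsigned int mispred:1;
    unsigned int predicted:1;
};

/* Bitwise NOT: ~1 is ...1110 and ~0 is ...1111. Only the low bit
 * survives the store into a 1-bit field, so 0/1 inputs happen to work. */
static struct branch_flags set_bitwise(int mispred)
{
    struct branch_flags f = {0};
    f.mispred = mispred;
    f.predicted = ~mispred;
    return f;
}

/* Logical NOT on a bool states the intent directly and is correct for
 * any truthy input. */
static struct branch_flags set_logical(bool mispred)
{
    struct branch_flags f = {0};
    f.mispred = mispred;
    f.predicted = !mispred;
    return f;
}
```

With an input of 2, `set_bitwise` reports the branch as predicted and not mispredicted, while `set_logical` reports it as mispredicted, which is the "easier to understand" argument in concrete form.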
Re: [PATCH V9 02/13] powerpc, perf: Change type of the bhrb_users variable
On 06/25/2015 11:12 AM, Daniel Axtens wrote:
>> -	int bhrb_users;
>> +	unsigned int bhrb_users;
> OK, so this is a good start. A quick git grep for bhrb_users reveals this:
>
> perf/core-book3s.c:		WARN_ON_ONCE(cpuhw->bhrb_users < 0);
>
> That occurs in power_pmu_bhrb_disable, immediately following a decrement of bhrb_users. Now that the test can never be true, this patch should change the function to check if bhrb_users is 0 before decrementing.

Sure. I will replace it with 'WARN_ON_ONCE(!cpuhw->bhrb_users)' before decrementing bhrb_users in the function.
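A minimal sketch of the check-before-decrement pattern for an unsigned counter. `struct cpu_state` and `bhrb_disable()` are hypothetical; where the kernel would use WARN_ON_ONCE(), the sketch simply returns false and leaves the counter unchanged, which is a design choice for the example and not necessarily what the driver does.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; only the counter matters here. */
struct cpu_state {
    unsigned int bhrb_users;
};

/* With an unsigned counter, "users < 0" after the decrement can never
 * be true, because 0 - 1 wraps to UINT_MAX. The check therefore has to
 * happen before the decrement. */
static bool bhrb_disable(struct cpu_state *c)
{
    if (c->bhrb_users == 0)
        return false; /* unbalanced disable: report it, don't wrap */
    c->bhrb_users--;
    return true;
}
```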
Re: [PATCH V9 00/13] powerpc, perf: Enable SW branch filters
On 06/25/2015 11:48 AM, Daniel Axtens wrote:
> Hi Anshuman,
>
> Thanks for your continued work on this.
>
> Given that the series is now at version 9 and is 13 patches long, I wonder if it might be worth splitting it up.

Splitting it up completely, or just keeping all the generic fixes and cleanups at the beginning of the series, would be sufficient. Anyway, I am willing to send them out separately if that helps.

> I'd suggest:
>
> - Patch 1 could be sent individually as it's a bug fix.

Not with the generic cleanup group as proposed below?

> - Separating out a series of simple cleanups would make the actual changes in your patch set easier to understand. Patches 2, 3 and 5 are obvious candidates.

Agreed. It's just that adding the first patch here will prevent a three-way split of the entire series.

> - It looks like the changes in patch 6 aren't used by any of the following patches. It might be worth separating that out or just dropping it entirely.

I guess you are talking about patch 7, "powerpc, perf: Re organize PMU branch filter processing on POWER8". Patch 6 is used later on.

> That would give you a series with just:
> 4 powerpc, perf: Restore privilege level filter support for BHRB
> 7 powerpc, perf: Re organize PMU branch filter processing on POWER8
> 8 powerpc, perf: Change the name of HW PMU branch filter tracking variable
> 9 powerpc, lib: Add new branch analysis support functions
> 10 powerpc, perf: Enable SW filtering in branch stack sampling framework
> 11 powerpc, perf: Change POWER8 PMU configuration to work with SW filters
> 12 powerpc, perf: Enable privilege mode SW branch filters
> 13 selftests, powerpc: Add test for BHRB branch filters (HW & SW)
>
> That might make it easier for you to start getting the ground work in, and make it easier for others to understand what you're trying to do.

Sure, agreed. Here are the two sets of patches after the proposed split. The patches are in reverse order, though. Hope this helps.

Generic cleanups and fixes
---
powerpc/perf: Re organize PMU branch filter processing on POWER8
powerpc/perf: Change name & type of 'pred' in power_pmu_bhrb_read
powerpc/perf: Replace last usage of get_cpu_var with this_cpu_ptr
powerpc/perf: Change type of the bhrb_users variable
powerpc/perf: Drop the branch sample when 'from' cannot be fetched

BHRB SW branch filter
--
selftests/powerpc: Add test for BHRB branch filters (HW & SW)
powerpc/perf: Enable privilege mode SW branch filters
powerpc/perf: Change POWER8 PMU configuration to work with SW filters
powerpc/perf: Enable SW filtering in branch stack sampling framework
powerpc/lib: Add new branch analysis support functions
powerpc/perf: Change the name of HW PMU branch filter tracking variable
powerpc/perf: Re organize BHRB processing
powerpc/perf: Restore privilege level filter support for BHRB

Regards
Anshuman
Re: [PATCH V9 06/13] powerpc, perf: Re organize BHRB processing
On 06/25/2015 11:22 AM, Daniel Axtens wrote:
>> +static void insert_branch(struct cpu_hw_events *cpuhw,
>> +			int index, u64 from, u64 to, int mispred)
> Given that your previous patch made mispred a bool, this could take a bool too. It could probably be an inline function as well.

Sure, will change it.
Re: [RFC] powerpc, tm: Drop tm_orig_msr from thread_struct
On 04/24/2015 10:31 AM, Anshuman Khandual wrote:
> On 04/20/2015 01:45 PM, Anshuman Khandual wrote:
>> Currently tm_orig_msr is used during process context switch only. Then there is ckpt_regs, which saves the checkpointed userspace context. The MSR slot contained in the ckpt_regs structure can be used during process context switch instead of tm_orig_msr, thus allowing us to drop it from the thread_struct structure. This patch does that change.
>>
>> Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
>> ---
>> This issue came up in the discussion regarding the ptrace interface for TM specific registers https://lkml.org/lkml/2015/4/20/100, so I just wanted to give this a try. The basic TM tests still pass after this change.
>
> Hey Michael/Mikey,
>
> What are your thoughts on this? Can we drop tm_orig_msr? Just wanted some inputs/suggestions/thoughts on this idea.

I did not hear from anyone on this. Will it create any problem anywhere if we drop this variable?
Re: [GIT PULL 00/13] perf/core improvements and fixes
On Thu, Jun 25, 2015 at 09:31:41AM +0200, Ingo Molnar wrote:
> Pulled, thanks a lot Arnaldo!
>
> Btw., one small thing I noticed about the status line in perf top: if I ever use 'f' to freeze/unfreeze events, the following message:
>
>    Press 'f' to disable the events or 'h' to see other hotkeys
>
> sticks around forever, even after I look into annotation and exit it, etc.
>
> So I don't mind some default, helpful message there (such as "Press 'h' to see hotkeys"), but it appears this particular message is context and usage sensitive, which wasn't really the goal, right?

Agreed, some more work is needed to change that message in more places; I will do it eventually.

- Arnaldo
Re: [PATCH V9 04/13] powerpc, perf: Restore privilege level filter support for BHRB
On 06/25/2015 10:32 AM, Daniel Axtens wrote:
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 7a03cce..892340e 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -930,7 +930,7 @@ static int power_check_constraints(struct cpu_hw_events *cpuhw,
>>   * added events.
>>   */
>>  static int check_excludes(struct perf_event **ctrs, unsigned int cflags[],
>> -			  int n_prev, int n_new)
>> +			  int n_prev, int n_new, int bhrb_users)
> Shouldn't this be an unsigned int too?

Yeah, it should be; will change it.

>> -	if (ppmu->flags & PPMU_ARCH_207S)
>> +	if ((ppmu->flags & PPMU_ARCH_207S) && !bhrb_users)
> This is now different to the others. Now that bhrb_users is unsigned, I'm happy if you want to revert all of them to be like this, I was just concerned that if bhrb_users is an int, both 1 and -1 evaluate to true and I wasn't sure that was the desired behaviour.

Sure, will change it.
[PATCH v3 0/7]powerpc/powernv: Nest Instrumentation support
This patchset enables Nest Instrumentation support on powerpc. POWER8 has per-chip Nest Instrumentation, which provides various per-chip metrics like memory, powerbus, Xlink and Alink bandwidth.

Nest Instrumentation provides an interface (via the PORE Engine) to configure and move the nest counter data to memory. From the kernel side, the OPAL call interface is used to activate/deactivate the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes the nest units and other related information, such as the memory region and supported events, to the kernel via the device tree.

Kernel code then parses the device tree for nest pmu support and registers nest pmus with the available events. The PORE Engine collects and accumulates nest counter data in a per-chip reserved memory region, hence the device tree also exports the per-chip nest accumulation memory region. Individual event offsets are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list
  iTLB-load-misses                                   [Hardware cache event]
  Nest_Alink_BW/Alink0/                              [Kernel PMU event]
  Nest_Alink_BW/Alink1/                              [Kernel PMU event]
  Nest_Alink_BW/Alink2/                              [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_00/                           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_01/                           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_02/                           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_03/                           [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_00/                          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_01/                          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_02/                          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_03/                          [Kernel PMU event]
  Nest_PowerBus_BW/External/                         [Kernel PMU event]
  Nest_PowerBus_BW/Internal/                         [Kernel PMU event]
  Nest_Xlink_BW/Xlink0/                              [Kernel PMU event]
  Nest_Xlink_BW/Xlink1/                              [Kernel PMU event]
  Nest_Xlink_BW/Xlink2/                              [Kernel PMU event]
  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
  .

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

 Performance counter stats for 'system wide':

CPU0                15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32               11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64               11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96               14,065.27 MiB Nest_Xlink_BW/Xlink1/

       1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

 Performance counter stats for 'system wide':

CPU0                     0.00 MiB Nest_Alink_BW/Alink0/                    (100.00%)
CPU32                    0.00 MiB Nest_Alink_BW/Alink0/                    (100.00%)
CPU64                    0.00 MiB Nest_Alink_BW/Alink0/                    (100.00%)
CPU96                    0.00 MiB Nest_Alink_BW/Alink0/                    (100.00%)
CPU0                 1,430.43 MiB Nest_Alink_BW/Alink1/                    (100.00%)
CPU32                  320.99 MiB Nest_Alink_BW/Alink1/                    (100.00%)
CPU64                3,443.83 MiB Nest_Alink_BW/Alink1/                    (100.00%)
CPU96                1,904.41 MiB Nest_Alink_BW/Alink1/                    (100.00%)
CPU0                 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32                    7.50 MiB Nest_Alink_BW/Alink2/
CPU64                4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96                  288.49 MiB Nest_Alink_BW/Alink2/
.

OPAL side patches are posted in the skiboot mailing list.

Changelog from v2:
1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved format attribute related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use the first online cpu of the chip, thereby making the code a lot simpler.

Changelog from v1:
1) No logic changes, re-ordered patches to make each patch compile without errors
2) Added comments based on the review feedback.
3) Removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) Squashed the makefile changes, instead of the separate patch
7) Squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog
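To make "event offsets are used as event configuration values" concrete, here is a hedged C sketch of how a counter could be located: the event config is masked to bits 0-20 (the "config:0-20" format attribute added in patch 5/7) and used as a byte offset into the chip's accumulation region. The struct and function names are hypothetical, and the sketch assumes native-endian 64-bit counters, which the cover letter does not state.

```c
#include <stdint.h>
#include <string.h>

/* Bits 0..20 of the event config, matching the "config:0-20" format. */
#define NEST_EVENT_OFFSET_MASK ((1ULL << 21) - 1)

/* Hypothetical view of one chip's accumulation region, loosely modelled
 * on the vbase/size fields of struct perchip_nest_info (patch 1/7). */
struct chip_region {
    const uint8_t *vbase; /* mapped start of the per-chip region */
    uint32_t size;        /* region size in bytes */
};

/* Read the 64-bit counter selected by an event config. Assumes
 * native-endian counters; returns 0 if the offset is out of range. */
static uint64_t nest_read_counter(const struct chip_region *r, uint64_t config)
{
    uint64_t off = config & NEST_EVENT_OFFSET_MASK;
    uint64_t val = 0;

    if (off + sizeof(val) > r->size)
        return 0;
    memcpy(&val, r->vbase + off, sizeof(val)); /* avoids unaligned loads */
    return val;
}
```

Because the counters are per chip, one reading per chip is enough, which is consistent with the perf stat output above reporting only one CPU per chip (CPU0, CPU32, CPU64, CPU96).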
[PATCH v3 2/7]powerpc/powernv: Add OPAL support for Nest PMU
Nest Counters can be configured via the PORE Engine, and OPAL provides an interface to start/stop it. OPAL side patches are posted in the skiboot mailing list.

Cc: Stewart Smith stew...@linux.vnet.ibm.com
Cc: Jeremy Kerr j...@ozlabs.org
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Michael Ellerman m...@ellerman.id.au
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/opal-api.h            | 3 ++-
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..8011a75 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,8 @@
 #define OPAL_FLASH_READ				110
 #define OPAL_FLASH_WRITE			111
 #define OPAL_FLASH_ERASE			112
-#define OPAL_LAST				112
+#define OPAL_NEST_IMA_CONTROL			116
+#define OPAL_LAST				116
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..f86e5e9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -201,6 +201,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
 
+int64_t opal_nest_ima_control(uint32_t value);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..ce36a68 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request,			OPAL_I2C_REQUEST);
 OPAL_CALL(opal_flash_read,			OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,			OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,			OPAL_FLASH_ERASE);
+OPAL_CALL(opal_nest_ima_control,		OPAL_NEST_IMA_CONTROL);
-- 
1.9.1
[PATCH v3 5/7]powerpc/powernv: add event attribute and group to nest pmu
Add code to create event/format attributes and attribute groups for each nest pmu.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/nest-pmu.c | 57
 1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 6116ff3..20ed9f8 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@ static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
 
+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+	.name = "format",
+	.attrs = p8_nest_format_attrs,
+};
+
 static int nest_event_info(struct property *pp, char *start,
 			struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
 	return 0;
 }
 
+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+	attr->event_str = str;
+	attr->attr.attr.name = name;
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = perf_event_sysfs_show;
+
+	return &attr->attr.attr;
+}
+
+int update_events_in_group(
+	struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+	struct attribute_group *attr_group;
+	struct attribute **attrs;
+	int i;
+
+	/* Allocate memory for event attribute group */
+	attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+				sizeof(*attr_group)), GFP_KERNEL);
+	if (!attr_group)
+		return -ENOMEM;
+
+	attrs = (struct attribute **)(attr_group + 1);
+	attr_group->name = "events";
+	attr_group->attrs = attrs;
+
+	for (i = 0; i < idx; i++, p8_events++)
+		attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+					(char *)p8_events->ev_value);
+
+	pmu->attr_groups[0] = attr_group;
+	return 0;
+}
+
 static int nest_pmu_create(struct device_node *dev, int pmu_index)
 {
 	struct nest_ima_events **p8_events_arr, *p8_events;
@@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
 			/* Save the name to register it later */
 			sprintf(buf, "Nest_%s", (char *)pp->value);
 			pmu_ptr->pmu.name = (char *)buf;
+			pmu_ptr->attr_groups[1] = &p8_nest_format_group;
 			continue;
 		}
 
@@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
 		idx++;
 	}
 
+	update_events_in_group(
+		(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
 	return 0;
 }
 
-- 
1.9.1
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
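update_events_in_group() carves the attribute group and its NULL-terminated pointer array out of a single kzalloc() call. The userspace sketch below shows the same layout trick under stated assumptions: struct attr and struct attr_group are hypothetical stand-ins for the kernel's struct attribute and struct attribute_group, and calloc() plays the role of kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct attribute / struct attribute_group;
 * only the fields needed for the layout are modeled. */
struct attr {
	const char *name;
};

struct attr_group {
	const char *name;
	struct attr **attrs;	/* NULL-terminated array */
};

/*
 * Allocate the group and its pointer array as one zeroed block, as
 * update_events_in_group() does: the array sits directly behind the
 * struct, and the extra (n + 1)th slot stays NULL as the terminator.
 */
static struct attr_group *alloc_group(const char *name,
				      struct attr *items, size_t n)
{
	struct attr_group *grp;
	size_t i;

	grp = calloc(1, sizeof(*grp) + (n + 1) * sizeof(struct attr *));
	if (!grp)
		return NULL;

	grp->name = name;
	grp->attrs = (struct attr **)(grp + 1);
	for (i = 0; i < n; i++)
		grp->attrs[i] = &items[i];
	/* grp->attrs[n] is already NULL because calloc() zeroes memory */
	return grp;
}

/* Walk the group the way sysfs-style code does: until the NULL entry. */
static size_t group_len(const struct attr_group *grp)
{
	size_t n = 0;

	assert(grp != NULL);
	while (grp->attrs[n])
		n++;
	return n;
}
```

Because the array lives inside the same allocation, a single free() (kfree() in the kernel) releases both, and the zeroed last slot provides the terminator that a NULL-terminated attrs array requires.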
[PATCH v3 1/7]powerpc/powernv: Data structure and macros definition
Create new header file nest-pmu.h to add the data structures and macros needed for the nest pmu support.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/nest-pmu.h | 53
 1 file changed, 53 insertions(+)
 create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 000..ecb5d26
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS		32
+#define P8_NEST_MAX_PMUS		32
+#define P8_NEST_MAX_PMU_NAME_LEN	256
+#define P8_NEST_MAX_EVENTS_SUPPORTED	256
+#define P8_NEST_ENGINE_START		1
+#define P8_NEST_ENGINE_STOP		0
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+	uint32_t chip_id;
+	uint64_t pbase;
+	uint64_t vbase;
+	uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+	const char *ev_name;
+	const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+	struct pmu pmu;
+	const struct attribute_group *attr_groups[4];
+};
-- 
1.9.1
[PATCH v3 4/7]powerpc/powernv: detect supported nest pmus and its events
Parse the device tree to detect supported nest pmu units. Traverse each nest pmu unit folder to find the supported events and the corresponding unit/scale files (if any). The nest unit event file from the DT contains the offset in the reserved memory region at which the counter data for a given event can be read. Kernel code uses this offset as the event configuration value. The device tree parser code also looks for scale/unit in the file name and passes the file on as an event attr for the perf tool to use in post-processing.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/nest-pmu.c | 124 ++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed..6116ff3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,119 @@ #include "nest-pmu.h"
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *start,
+			struct nest_ima_events *p8_events, int flg, u32 val)
+{
+	char *buf;
+
+	/* memory for event name */
+	buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	strncpy(buf, start, strlen(start));
+	p8_events->ev_name = buf;
+
+	/* memory for content */
+	buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (flg) {
+		/* string content*/
+		if (!pp->value ||
+		    (strnlen(pp->value, pp->length) == pp->length))
+			return -EINVAL;
+
+		strncpy(buf, (const char *)pp->value, pp->length);
+	} else
+		sprintf(buf, "event=0x%x", val);
+
+	p8_events->ev_value = buf;
+	return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+	struct nest_ima_events **p8_events_arr, *p8_events;
+	struct nest_pmu *pmu_ptr;
+	struct property *pp;
+	char *buf, *start;
+	const __be32 *lval;
+	u32 val;
+	int idx = 0, ret;
+
+	if (!dev)
+		return -EINVAL;
+
+	/* memory for nest pmus */
+	pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+	if (!pmu_ptr)
+		return -ENOMEM;
+
+	/* Needed for hotplug/migration */
+	per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+	/* memory for nest pmu events */
+	p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+								GFP_KERNEL);
+	if (!p8_events_arr)
+		return -ENOMEM;
+	p8_events = (struct nest_ima_events *)p8_events_arr;
+
+	/*
+	 * Loop through each property
+	 */
+	for_each_property_of_node(dev, pp) {
+		start = pp->name;
+
+		if (!strcmp(pp->name, "name")) {
+			if (!pp->value ||
+			    (strnlen(pp->value, pp->length) == pp->length))
+				return -EINVAL;
+
+			buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+			if (!buf)
+				return -ENOMEM;
+
+			/* Save the name to register it later */
+			sprintf(buf, "Nest_%s", (char *)pp->value);
+			pmu_ptr->pmu.name = (char *)buf;
+			continue;
+		}
+
+		/* Skip these, we dont need it */
+		if (!strcmp(pp->name, "phandle") ||
+		    !strcmp(pp->name, "device_type") ||
+		    !strcmp(pp->name, "linux,phandle"))
+			continue;
+
+		if (strncmp(pp->name, "unit.", 5) == 0) {
+			/* Skip first few chars in the name */
+			start += 5;
+			ret = nest_event_info(pp, start, p8_events++, 1, 0);
+		} else if (strncmp(pp->name, "scale.", 6) == 0) {
+			/* Skip first few chars in the name */
+			start += 6;
+			ret = nest_event_info(pp, start, p8_events++, 1, 0);
+		} else {
+			lval = of_get_property(dev, pp->name, NULL);
+			val = (uint32_t)be32_to_cpup(lval);
+
+			ret = nest_event_info(pp, start, p8_events++, 0, val);
+		}
+
+		if (ret)
+			return ret;
+
+		/* book keeping
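The property-name handling in nest_pmu_create() and nest_event_info() can be modeled in plain userspace C. This is a sketch, not the patch's code: classify_prop(), is_dt_string() and format_event() are hypothetical helpers, and format_event() uses snprintf() with a length check where the patch uses sprintf().

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of nest-unit property names. */
enum prop_kind {
	PROP_SKIP,	/* bookkeeping properties the parser ignores */
	PROP_PMU_NAME,	/* "name": becomes the PMU name */
	PROP_UNIT,	/* "unit.<event>": string attribute */
	PROP_SCALE,	/* "scale.<event>": string attribute */
	PROP_EVENT,	/* anything else: u32 counter offset */
};

static enum prop_kind classify_prop(const char *prop, const char **ev_name)
{
	*ev_name = prop;

	if (!strcmp(prop, "name"))
		return PROP_PMU_NAME;
	if (!strcmp(prop, "phandle") || !strcmp(prop, "device_type") ||
	    !strcmp(prop, "linux,phandle"))
		return PROP_SKIP;
	if (!strncmp(prop, "unit.", 5)) {
		*ev_name = prop + 5;	/* drop the "unit." prefix */
		return PROP_UNIT;
	}
	if (!strncmp(prop, "scale.", 6)) {
		*ev_name = prop + 6;	/* drop the "scale." prefix */
		return PROP_SCALE;
	}
	return PROP_EVENT;
}

/* A DT string is usable only if it is NUL-terminated within its length;
 * this is the condition the strnlen() == length test rejects. */
static int is_dt_string(const char *val, size_t len)
{
	return val && strnlen(val, len) < len;
}

/* Format a counter offset as the sysfs event string; -1 on truncation. */
static int format_event(char *buf, size_t size, uint32_t offset)
{
	int n = snprintf(buf, size, "event=0x%x", (unsigned int)offset);

	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the DT layout described for this series, a property named unit.Alink0.unit yields the attribute name Alink0.unit, which appears intended to line up with the <event>.unit and <event>.scale files that the perf tool reads from sysfs.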
[PATCH v3 3/7]powerpc/powernv: Nest PMU detection and device tree parser
Create a file nest-pmu.c to contain nest pmu related functions. Add code to detect nest pmu support and a parser to collect the per-chip reserved memory region information from the device tree (DT). The detection mechanism is to look for the specific property "ibm,ima-chip" in the DT.

For the Nest pmu, the device tree will have two sets of information:
1) Per-chip reserved memory region for the nest pmu counter collection area.
2) Supported Nest PMUs and events.

Device tree layout for the Nest PMU is as follows:

 /  -- DT root folder
 |
 -nest-ima  -- Nest PMU folder
    |
    -ima-chip@<chip-id>  -- Per-chip folder for reserved region information
    |   |
    |   -ibm,chip-id  -- Chip id
    |   -ibm,ima-chip
    |   -reg  -- HOMER PORE Nest Counter collection Address (RA)
    |   -size  -- size to map in kernel space
    |
    -Alink_BW  -- Nest PMU folder
        |
        -Alink0  -- Nest PMU Alink Event file
        -scale.Alink0.scale  -- Event scale file
        -unit.Alink0.unit  -- Event unit file
        -device_type  -- "nest-ima-unit" marker

A subsequent patch will parse the next part of the DT to find the various Nest PMUs and their events.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile   |  2 +-
 arch/powerpc/perf/nest-pmu.c | 85
 2 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS)	+= callchain.o
 obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= power4-pmu.o ppc970-pmu.o power5-pmu.o \
 				   power5+-pmu.o power6-pmu.o power7-pmu.o \
-				   power8-pmu.o
+				   power8-pmu.o nest-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 000..e7d45ed
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+	const __be32 *gcid;
+	const __be64 *chip_ima_reg;
+	const __be64 *chip_ima_size;
+	struct device_node *dev;
+	struct perchip_nest_info *p8ni;
+	int idx;
+
+	/*
+	 * nest-ima folder contains two things,
+	 * a) per-chip reserved memory region for Nest PMU Counter data
+	 * b) Support Nest PMU units and their event files
+	 */
+	for_each_node_with_property(dev, "ibm,ima-chip") {
+		gcid = of_get_property(dev, "ibm,chip-id", NULL);
+		chip_ima_reg = of_get_property(dev, "reg", NULL);
+		chip_ima_size = of_get_property(dev, "size", NULL);
+
+		if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+			pr_err("Nest_PMU: device %s missing property\n",
+							dev->full_name);
+			return -ENODEV;
+		}
+
+		/* chip id to save reserve memory region */
+		idx = (uint32_t)be32_to_cpup(gcid);
+
+		/*
+		 * Using a local variable to make it compact and
+		 * easier to read
+		 */
+		p8ni = &p8_nest_perchip_info[idx];
+		p8ni->pbase = be64_to_cpup(chip_ima_reg);
+		p8ni->size = be64_to_cpup(chip_ima_size);
+		p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+	}
+
+	return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+	int ret = -ENODEV;
+
+	/*
+	 * Lets do this only if we are hypervisor
+	 */
+	if (!cur_cpu_spec->oprofile_cpu_type ||
+	    !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+	    !cpu_has_feature(CPU_FTR_HVMODE))
+		return ret;
+
+	/*
+	 * Nest PMU information is grouped under nest-ima node
+	 * of the top-level device-tree directory. Detect Nest PMU
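nest_ima_dt_parser() decodes the big-endian ibm,chip-id, reg and size properties and uses the chip id directly as an index into p8_nest_perchip_info[]. The sketch below models that step in userspace under stated assumptions: record_chip(), struct chip_info and MAX_CHIPS (standing in for P8_NEST_MAX_CHIPS) are hypothetical, and the sketch adds an idx >= MAX_CHIPS check that the posted parser does not perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_CHIPS 32	/* hypothetical mirror of P8_NEST_MAX_CHIPS */

struct chip_info {
	uint32_t chip_id;
	uint64_t pbase;	/* physical base of the counter region */
	uint64_t size;	/* size of the region to map */
};

/* Decode big-endian DT cells, as be32_to_cpup()/be64_to_cpup() would. */
static uint32_t be32_load(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint64_t be64_load(const uint8_t *p)
{
	return (uint64_t)be32_load(p) << 32 | be32_load(p + 4);
}

/* Record one ima-chip node; each pointer is a raw property value. */
static int record_chip(struct chip_info *table, const uint8_t *chip_id,
		       const uint8_t *reg, const uint8_t *size)
{
	uint32_t idx;

	if (!chip_id || !reg || !size)
		return -ENODEV;	/* a required property is missing */

	idx = be32_load(chip_id);
	if (idx >= MAX_CHIPS)
		return -EINVAL;	/* would overrun the per-chip table */

	table[idx].chip_id = idx;
	table[idx].pbase = be64_load(reg);
	table[idx].size = be64_load(size);
	return 0;
}
```

The bounds check is a design choice of the sketch: since the chip id comes from firmware, validating it before indexing keeps a malformed DT from writing past the end of the table.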
[PATCH v3 6/7]powerpc/powernv: generic nest pmu event functions
Add a set of generic nest pmu event functions to be used by each nest pmu. Add code to register the nest pmus.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/nest-pmu.c | 104
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 20ed9f8..c2ada13 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,100 @@ struct attribute_group p8_nest_format_group = {
 	.attrs = p8_nest_format_attrs,
 };
 
+static int p8_nest_event_init(struct perf_event *event)
+{
+	int chip_id;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported yet */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	chip_id = topology_physical_package_id(event->cpu);
+	event->hw.event_base = event->attr.config +
+					p8_nest_perchip_info[chip_id].vbase;
+
+	return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+	uint64_t *addr;
+	u64 data = 0;
+
+	addr = (u64 *)event->hw.event_base;
+	data = __be64_to_cpu(*addr);
+	local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+	u64 counter_prev, counter_new, final_count;
+	uint64_t *addr;
+
+	addr = (uint64_t *)event->hw.event_base;
+	counter_prev = local64_read(&event->hw.prev_count);
+	counter_new = __be64_to_cpu(*addr);
+	final_count = counter_new - counter_prev;
+
+	local64_set(&event->hw.prev_count, counter_new);
+	local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+	event->hw.state = 0;
+	p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_UPDATE)
+		p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		p8_nest_event_start(event, flags);
+
+	return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+	if (!pmu)
+		return -EINVAL;
+
+	pmu->pmu.task_ctx_nr = perf_invalid_context;
+	pmu->pmu.event_init = p8_nest_event_init;
+	pmu->pmu.add = p8_nest_event_add;
+	pmu->pmu.del = p8_nest_event_stop;
+	pmu->pmu.start = p8_nest_event_start;
+	pmu->pmu.stop = p8_nest_event_stop;
+	pmu->pmu.read = p8_nest_perf_event_update;
+	pmu->pmu.attr_groups = pmu->attr_groups;
+
+	return 0;
+}
+
 static int nest_event_info(struct property *pp, char *start,
 			struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -179,6 +273,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
 	update_events_in_group(
 		(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
 
+	update_pmu_ops(pmu_ptr);
+	/* Register the pmu */
+	ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+	if (ret) {
+		pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+		return ret;
+	}
+
+	pr_info("%s performance monitor hardware support registered\n",
+							pmu_ptr->pmu.name);
 	return 0;
 }
 
-- 
1.9.1
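p8_nest_perf_event_update() follows the usual free-running counter pattern: snapshot the raw big-endian value when the event starts, then on each read add the difference to the running count and move the snapshot forward. The userspace model below illustrates that arithmetic; struct counter_state, counter_start() and counter_update() are hypothetical names standing in for hw.prev_count, event->count and the local64 helpers.

```c
#include <assert.h>
#include <stdint.h>

struct counter_state {
	uint64_t prev;	/* last raw value seen (cf. hw.prev_count) */
	uint64_t count;	/* accumulated total (cf. event->count) */
};

/* Decode a big-endian 64-bit counter, as __be64_to_cpu() would. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* start: snapshot the current raw value (cf. p8_nest_read_counter()) */
static void counter_start(struct counter_state *s, const uint8_t *raw)
{
	s->prev = load_be64(raw);
}

/* update: add the delta since the last snapshot, then move it forward */
static void counter_update(struct counter_state *s, const uint8_t *raw)
{
	uint64_t now = load_be64(raw);

	s->count += now - s->prev;	/* unsigned math wraps modulo 2^64 */
	s->prev = now;
}
```

Because the subtraction is done in unsigned 64-bit arithmetic, a counter that wraps between two reads still yields the correct delta, provided it wraps at most once between reads.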
[PATCH v3 7/7]powerpc/powernv: nest pmu cpumask and cpu hotplug support
Add a cpumask attribute to be used by each nest pmu, since nest units are per-chip. Only one cpu (the first online cpu) from each node/chip is designated to read the counters. On cpu hotplug, the dying cpu is checked to see whether it is one of the designated cpus; if so, the next online cpu from the same node/chip is designated as the new cpu to read the counters.

Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khand...@linux.vnet.ibm.com>
Cc: Stephane Eranian <eran...@google.com>
Cc: Preeti U Murthy <pre...@linux.vnet.ibm.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/nest-pmu.c | 146
 1 file changed, 146 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c2ada13..31943c5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,147 @@ struct attribute_group p8_nest_format_group = {
 	.attrs = p8_nest_format_attrs,
 };
 
+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+	.attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+	opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+	int i;
+
+	for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+							old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+	int nid, target = -1;
+	struct cpumask *l_cpumask;
+
+	/*
+	 * Check in the designated list for this cpu. Dont bother
+	 * if not one of them.
+	 */
+	if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+		return;
+
+	/*
+	 * Now that this cpu is one of the designated,
+	 * find a next cpu a) which is online and b) in same chip.
+	 */
+	nid = cpu_to_node(cpu);
+	l_cpumask = cpumask_of_node(nid);
+	target = cpumask_next(cpu, l_cpumask);
+
+	/*
+	 * Update the cpumask with the target cpu and
+	 * migrate the context if needed
+	 */
+	if (target >= 0 && target <= nr_cpu_ids) {
+		cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+		nest_change_cpu_context(cpu, target);
+	}
+}
+
+static void nest_init_cpu(int cpu)
+{
+	int nid, fcpu, ncpu;
+	struct cpumask *l_cpumask, tmp_mask;
+
+	nid = cpu_to_node(cpu);
+	l_cpumask = cpumask_of_node(nid);
+
+	/*
+	 * if empty cpumask, just add incoming cpu and move on.
+	 */
+	if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+		cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+		return;
+	}
+
+	/*
+	 * Alway have the first online cpu of a chip as designated one.
+	 */
+	fcpu = cpumask_first(l_cpumask);
+	ncpu = cpumask_next(cpu, l_cpumask);
+	if (cpu == fcpu) {
+		if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+			cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+			nest_change_cpu_context(ncpu, cpu);
+		}
+	}
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+				unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+		nest_init_cpu(cpu);
+		break;
+	case CPU_DOWN_PREPARE:
+		nest_exit_cpu(cpu);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+	.notifier_call	= nest_pmu_cpu_notifier,
+	.priority	= CPU_PRI_PERF + 1,
+};
+
+void nest_pmu_cpumask_init(void)
+{
+	const struct cpumask *l_cpumask;
+	int cpu, nid;
+
+	cpu_notifier_register_begin();
+
+	/*
+	 * Nest PMUs are per-chip counters. So designate a cpu
+	 * from
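The designation policy in the commit message (one reader CPU per chip, preferably the first online one, handed over when it goes offline) can be modeled with plain bitmasks. This is a hypothetical sketch, not the kernel cpumask API: cpuset is a 64-bit mask, the function names are invented for illustration, and on offline it picks the lowest remaining online CPU of the chip, whereas the posted nest_exit_cpu() uses cpumask_next() on the node mask. In the kernel, every change of reader would also need a perf_pmu_migrate_context() call, which the model leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-CPU model: bit n set means CPU n is in the set. */
typedef uint64_t cpuset;

/* Lowest CPU in the set, or -1 if the set is empty. */
static int first_cpu(cpuset set)
{
	int i;

	for (i = 0; i < 64; i++)
		if (set & ((cpuset)1 << i))
			return i;
	return -1;
}

/* CPU coming online (already marked in 'online'): make the lowest
 * online CPU of its chip the chip's only designated reader. */
static int designated_cpu_online(cpuset *designated, cpuset chip,
				 cpuset online, int cpu)
{
	int want;

	assert(online & ((cpuset)1 << cpu));
	assert(chip & ((cpuset)1 << cpu));
	want = first_cpu(chip & online);
	/* drop any old reader on this chip, then set the new one */
	*designated = (*designated & ~chip) | ((cpuset)1 << want);
	return want;
}

/* CPU going offline: if it was the reader, hand over to the lowest
 * other online CPU on the same chip. Returns the new reader or -1. */
static int designated_cpu_offline(cpuset *designated, cpuset chip,
				  cpuset online, int cpu)
{
	cpuset bit = (cpuset)1 << cpu;
	int target;

	if (!(*designated & bit))
		return -1;	/* not a reader: nothing to migrate */
	*designated &= ~bit;

	target = first_cpu(chip & online & ~bit);
	if (target >= 0)
		*designated |= (cpuset)1 << target;
	return target;
}
```

Searching the whole chip rather than only CPUs numbered above the departing one means a chip keeps a reader as long as any of its CPUs is online.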
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Tue, 23 Jun 2015, Vlastimil Babka wrote:

On 06/15/2015 04:43 PM, Eric B Munson wrote:

Note that the semantic of MAP_LOCKED can be subtly surprising: mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially.

( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )

So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here?

We could talk about retiring the MAP_LOCKED flag but I suspect that would get significantly more pushback than adding a new mmap flag.

Oh no, we can't retire as in remove the flag, ever. Just not continue the way of mmap() flags related to mlock().

Likely that the overhead does not matter in most cases, but presumably there are cases where it does (as we have a MAP_LOCKED flag today). Even with the proposed new system calls I think we should have the MAP_LOCKONFAULT for parity with MAP_LOCKED.

I'm not convinced, but it's not a major issue.

- mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.

- munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.

- mlockall() and munlockall() ditto.

IOW, LOCKED and LOCKEDONFAULT are treated identically and independently.

Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.

*should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.

If the new LOCKONFAULT functionality is indeed desired (I haven't still decided myself) then I agree that would be the cleanest way.

Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?

I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK.

They can be, the two that come to mind are medical images and high resolution sensor data.

What do others think?
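A minimal model of the interface proposed above, where LOCKED and LOCKONFAULT are independent flags that an mlock2()-style call sets and a munlock2()-style call clears. Everything here is hypothetical (LK_LOCKED, LK_ONFAULT, model_mlock2(), model_munlock2() and the region array); it models only the flag bookkeeping per region, not page population or the real system calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bits standing in for MLOCK_LOCKED / MLOCK_LOCKONFAULT. */
enum {
	LK_LOCKED  = 1u << 0,
	LK_ONFAULT = 1u << 1,
};

/* One flags word per region; a region may carry either bit, both, or none. */
static void model_mlock2(unsigned *regions, size_t first, size_t n,
			 unsigned flags)
{
	size_t i;

	for (i = first; i < first + n; i++)
		regions[i] |= flags;	/* set only the requested bits */
}

static void model_munlock2(unsigned *regions, size_t first, size_t n,
			   unsigned flags)
{
	size_t i;

	/* Clearing a bit that is not set is not an error, matching how
	 * munlock() tolerates unlocked holes inside the range. */
	for (i = first; i < first + n; i++)
		regions[i] &= ~flags;
}
```

Under this model a single munlock2(LK_ONFAULT) over a range with interleaved LOCKED and LOCKONFAULT regions clears only the LOCKONFAULT bits and leaves the LOCKED regions untouched.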
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Thu, Jun 25, 2015 at 7:16 AM, Eric B Munson <emun...@akamai.com> wrote:

On Tue, 23 Jun 2015, Vlastimil Babka wrote:

On 06/15/2015 04:43 PM, Eric B Munson wrote:

If the new LOCKONFAULT functionality is indeed desired (I haven't still decided myself) then I agree that would be the cleanest way.

Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?

I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK.

They can be, the two that come to mind are medical images and high resolution sensor data.

I think we've been handling sensitive memory pages wrong forever. We shouldn't lock them into memory; we should flag them as sensitive and encrypt them if they're ever written out to disk.

--Andy
Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault
On Wed, 24 Jun 2015, Michal Hocko wrote:

On Mon 22-06-15 10:18:06, Eric B Munson wrote:

On Mon, 22 Jun 2015, Michal Hocko wrote:

On Fri 19-06-15 12:43:33, Eric B Munson wrote:

[...]

Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the new MAP_LOCKONFAULT flag (or both)?

I thought the MAP_FAULTPOPULATE (or any other better name) would directly translate into VM_FAULTPOPULATE and wouldn't be tied to the locked semantic. We already have VM_LOCKED for that. The direct effect of the flag would be to prevent population other than the direct page fault - including any speculative actions like fault around or read-ahead.

I like the ability to control other speculative population, but I am not sure about overloading it with the VM_LOCKONFAULT case. Here is my concern. If we are using VM_FAULTPOPULATE | VM_LOCKED to denote LOCKONFAULT, how can we tell the difference between someone that wants to avoid read-ahead and wants to use mlock()?

Not sure I understand. Something like?

	addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma
	[...]
	mlock(addr, len) # Now I want the full mlock semantic

So this leaves us without the LOCKONFAULT semantics? That is not at all what I am looking for. What I want is a way to express 3 possible states of a VMA WRT locking: locked (populated and all pages on the unevictable LRU), lock on fault (populated by page fault, pages that are present are on the unevictable LRU, newly faulted pages are added to same), and not locked.

and the latter to have the full mlock semantic and populate the given area regardless of VM_FAULTPOPULATE being set on the vma? This would be an interesting question because the mlock man page clearly states the semantic, and that is to _always_ populate or fail. So I originally thought that it would obey VM_FAULTPOPULATE but this needs more thinking.

This might lead to some interesting states with mlock() and munlock() that take flags. For instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by munlock(MLOCK_LOCKED) leaves the VMAs in the same state with VM_LOCKONFAULT set.

This is really confusing. Let me try to rephrase that. So you have

	mlock(addr, len, MLOCK_ONFAULT)
	munlock(addr, len, MLOCK_LOCKED)

IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't that behavior strange and unexpected? First of all, munlock has traditionally dropped the lock on the address range (e.g. what should happen if you did plain old munlock(addr, len)). But even without that. You are trying to unlock something that hasn't been locked the same way. So I would expect -EINVAL at least, if the two modes should be really represented by different flags.

I would expect it to remain MLOCK_LOCKONFAULT because the user requested munlock(addr, len, MLOCK_LOCKED). It is not currently an error to unlock memory that is not locked. We do this because we do not require the user track what areas are locked. It is acceptable to have a mostly locked area with holes unlocked with a single call to munlock that spans the entire area. The same semantics should hold for munlock with flags. If I have an area with MLOCK_LOCKED and MLOCK_ONFAULT interleaved, it should be acceptable to clear the MLOCK_ONFAULT flag from those areas with a single munlock call that spans the area.

On top of continuing with munlock semantics, the implementation would need the ability to roll back an munlock call if it failed after altering VMAs. If we have the same interleaved area as before and we go to return -EINVAL the first time we hit an area that was MLOCK_LOCKED, how do we restore the state of the VMAs we have already processed, and possibly merged/split?

Or did you mean the both types of lock like:

	mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT)
	mlock(addr, len, MLOCK_LOCKED)
	munlock(addr, len, MLOCK_LOCKED)

and that should keep MLOCK_ONFAULT? This sounds even more weird to me because that means that the vma in question would be locked by two different mechanisms. MLOCK_LOCKED with the always populate semantic would rule out MLOCK_ONFAULT so what would be the meaning of the other flag then? Also what should regular munlock(addr, len) without flags unlock? Both?

This is indeed confusing and not what I was trying to illustrate, but since you bring it up: mlockall() currently clears all flags and then sets the new flags with each subsequent call. mlock2 would use that same behavior: if LOCKED was specified for an ONFAULT region, that region would become LOCKED and vice versa.

I have the new system call set ready; I am waiting for rc1 to post it so I can run the benchmarks again on a base more stable than the middle of a merge window. We should wait to hash out implementations until the code is up rather than talk past each other here.

If we use VM_FAULTPOPULATE, the same pair of calls would clear VM_LOCKED, but leave
Re: powerpc,numa: Memory hotplug to memory-less nodes ?
On 24.06.2015 [07:13:36 -0500], Nathan Fontenot wrote:

On 06/23/2015 11:01 PM, Bharata B Rao wrote:

So will it be correct to say that memory hotplug to a memory-less node isn't supported by the PowerPC kernel? Should I enforce the same in QEMU for PowerKVM?

I'm not sure if that is correct. It appears that we initialize all online nodes, even those without spanned_pages, at boot time. This occurs in setup_node_data() called from initmem_init(). Looking at this I would think that we could add memory to any online node even if it does not have any spanned_pages. I think an interesting test would be to check for the node being online instead of checking to see if it has any memory.

I see no *technical* reason we shouldn't be able to hotplug to an initially memoryless node. I'm not sure it happens in practice under PowerVM (where we have far less control of the topology anyways). One aspect of this that I have on my todo list is seeing what SLUB does when a node goes from memoryless to populated -- as during boot memoryless nodes get a 'useless' per node structure (early_kmem_cache_node_alloc).

I can look at this a bit under KVM maybe later this week myself to see what happens in a guest.

-Nish
Re: [8/8] powerpc/perf: cleanup in perf_event_print_debug()
On Thu, 2015-11-06 at 08:43:37 UTC, Madhavan Srinivasan wrote:

From: Janani <janan...@linux.vnet.ibm.com>

Code cleanup/fix in perf_event_print_debug(). Performance Monitoring Unit (PMU) registers in the server side are 64bit long.

No they're not, see the ISA, figure 17.

cheers
Re: [v2,9/9] fsl/fman: Add FMan MAC driver
(Evolution 3.16 is basically unbearable for replying to patches. Anyone else running into this?)

On Wed, 2015-06-24 at 22:37 +0300, igal.liber...@freescale.com wrote:

--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/mac/mac-api.c

+int set_mac_active_pause(struct mac_device *mac_dev, bool rx, bool tx)
+{
+	[...]
+}
+EXPORT_SYMBOL(set_mac_active_pause);

Which module is using this function?

+void get_pause_cfg(struct mac_device *mac_dev, bool *rx_pause, bool *tx_pause)
+{
+	[...]
+}
+EXPORT_SYMBOL(get_pause_cfg);

This exports a function that is only used in this file. Why?

Thanks,

Paul Bolle
Re: [v2,8/9] fsl/fman: Add FMan Port Support
On Wed, 2015-06-24 at 22:37 +0300, igal.liber...@freescale.com wrote:

--- a/drivers/net/ethernet/freescale/fman/fm_drv.c
+++ b/drivers/net/ethernet/freescale/fman/fm_drv.c

+struct fm_port_t *fm_port_drv_handle(const struct fm_port_drv_t *port)
+{
+	return port->fm_port;
+}
+EXPORT_SYMBOL(fm_port_drv_handle);

I couldn't find any users of this function.

+void fm_port_get_buff_layout_ext_params(struct fm_port_drv_t *port,
+					struct fm_port_params *params)

(Evolution 3.16 is a piece of ...).

+{
+	params->data_align = 0;
+}
+EXPORT_SYMBOL(fm_port_get_buff_layout_ext_params);

Ditto.

+int fm_get_tx_port_channel(struct fm_port_drv_t *port)
+{
+	return port->tx_ch;
+}
+EXPORT_SYMBOL(fm_get_tx_port_channel);

Ditto.

--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fm_port_drv.c

+void fm_set_rx_port_params(struct fm_port_drv_t *port,
+			   struct fm_port_params *params)
+{
+	[...]
+}
+EXPORT_SYMBOL(fm_set_rx_port_params);

Ditto.

(If you hear about my arrest for randomly attacking innocent people: blame evolution 3.16!)

+void fm_set_tx_port_params(struct fm_port_drv_t *port,
+			   struct fm_port_params *params)
+{
+	[...]
+}
+EXPORT_SYMBOL(fm_set_tx_port_params);

Ditto.

--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/port/fm_port.c

+u64 *fm_port_get_buffer_time_stamp(const struct fm_port_t *p_fm_port,
+				   char *p_data)
+{
+	[...]
+}
+EXPORT_SYMBOL(fm_port_get_buffer_time_stamp);

Ditto.

+int fm_port_disable(struct fm_port_t *p_fm_port)
+{
+	[...]
+}
+EXPORT_SYMBOL(fm_port_disable);

This exports a function that I think is only used inside this file.

+int fm_port_enable(struct fm_port_t *p_fm_port)
+{
+	[...]
+}
+EXPORT_SYMBOL(fm_port_enable);

And here I could again find no users of this function.

Thanks,

Paul Bolle
Re: [v2,9/9] fsl/fman: Add FMan MAC driver
On Thu, 2015-06-25 at 19:59 -0500, Scott Wood wrote: On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote: (Evolution 3.16 is basically unbearable for replying to patches. Anyone else running into this?) If you mean the crazy lag when selecting moderate-to-large amounts of text (for snipping), yes. I recommend the external editor plugin with vim. cheers
Re: [v2,9/9] fsl/fman: Add FMan MAC driver
On Fri, 2015-06-26 at 12:21 +1000, Michael Ellerman wrote: On Thu, 2015-06-25 at 19:59 -0500, Scott Wood wrote: On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote: (Evolution 3.16 is basically unbearable for replying to patches. Anyone else running into this?) If you mean the crazy lag when selecting moderate-to-large amounts of text (for snipping), yes. I recommend the external editor plugin with vim. I tried the external editor plugin (not with vim) and it failed to bring the externally made edits back into the evolution compose window. It then started spastically respawning the external editor without my doing anything. -Scott
Re: [v2,5/9] fsl/fman: Add Frame Manager support
On Wed, 2015-06-24 at 22:35 +0300, igal.liber...@freescale.com wrote: From: Igal Liberman igal.liber...@freescale.com Add Frame Manger Driver support. This patch adds The FMan configuration, initialization and runtime control routines. Signed-off-by: Igal Liberman igal.liber...@freescale.com --- drivers/net/ethernet/freescale/fman/Kconfig| 35 + drivers/net/ethernet/freescale/fman/Makefile |2 +- drivers/net/ethernet/freescale/fman/fm.c | 1406 drivers/net/ethernet/freescale/fman/fm.h | 394 ++ drivers/net/ethernet/freescale/fman/fm_common.h| 142 ++ drivers/net/ethernet/freescale/fman/fm_drv.c | 701 ++ drivers/net/ethernet/freescale/fman/fm_drv.h | 116 ++ drivers/net/ethernet/freescale/fman/inc/enet_ext.h | 199 +++ drivers/net/ethernet/freescale/fman/inc/fm_ext.h | 488 +++ .../net/ethernet/freescale/fman/inc/fsl_fman_drv.h | 99 ++ drivers/net/ethernet/freescale/fman/inc/service.h | 55 + 11 files changed, 3636 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/fm.c create mode 100644 drivers/net/ethernet/freescale/fman/fm.h create mode 100644 drivers/net/ethernet/freescale/fman/fm_common.h create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.c create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.h create mode 100644 drivers/net/ethernet/freescale/fman/inc/enet_ext.h create mode 100644 drivers/net/ethernet/freescale/fman/inc/fm_ext.h create mode 100644 drivers/net/ethernet/freescale/fman/inc/fsl_fman_drv.h create mode 100644 drivers/net/ethernet/freescale/fman/inc/service.h Again, please start with something pared down, without extraneous features, but *with* enough functionality to actually pass packets around. Getting this thing into decent shape is going to be hard enough without carrying around the excess baggage. 
diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig index 825a0d5..12c75bfd 100644 --- a/drivers/net/ethernet/freescale/fman/Kconfig +++ b/drivers/net/ethernet/freescale/fman/Kconfig @@ -7,3 +7,38 @@ config FSL_FMAN Freescale Data-Path Acceleration Architecture Frame Manager (FMan) support +if FSL_FMAN + +config FSL_FM_MAX_FRAME_SIZE + int "Maximum L2 frame size" + range 64 9600 + default 1522 + help + Configure this in relation to the maximum possible MTU of your + network configuration. In particular, one would need to + increase this value in order to use jumbo frames. + FSL_FM_MAX_FRAME_SIZE must accommodate the Ethernet FCS + (4 bytes) and one ETH+VLAN header (18 bytes), to a total of + 22 bytes in excess of the desired L3 MTU. + + Note that having too large a FSL_FM_MAX_FRAME_SIZE (much larger + than the actual MTU) may lead to buffer exhaustion, especially + in the case of badly fragmented datagrams on the Rx path. + Conversely, having a FSL_FM_MAX_FRAME_SIZE smaller than the + actual MTU will lead to frames being dropped. Scatter gather can't be used for jumbo frames? Why is this a compile-time option? + +config FSL_FM_RX_EXTRA_HEADROOM + int "Add extra headroom at beginning of data buffers" + range 16 384 + default 64 + help + Configure this to tell the Frame Manager to reserve some extra + space at the beginning of a data buffer on the receive path, + before Internal Context fields are copied. This is in addition + to the private data area already reserved for driver internal + use. The provided value must be a multiple of 16. + + This option does not affect in any way the layout of + transmitted buffers. There's nothing here to indicate when a user would want to do this. Why is this a compile-time option?
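The 22-byte rule in the FSL_FM_MAX_FRAME_SIZE help text is easy to check: the default of 1522 is exactly a 1500-byte MTU plus the 18-byte Ethernet+VLAN header and the 4-byte FCS. The sketch below restates that arithmetic under the stated assumptions; the helper names fm_frame_size_for_mtu() and fm_max_mtu() are hypothetical and do not exist in the patch, and the constants are taken from the help text and the "range 64 9600" line.

```c
#include <assert.h>

/* Overheads quoted in the FSL_FM_MAX_FRAME_SIZE help text. */
#define FM_ETH_VLAN_HDR_LEN 18u /* 14-byte Ethernet header + 4-byte VLAN tag */
#define FM_ETH_FCS_LEN       4u /* Ethernet frame check sequence */
#define FM_FRAME_OVERHEAD   (FM_ETH_VLAN_HDR_LEN + FM_ETH_FCS_LEN) /* 22 */

/* Bounds from the Kconfig "range 64 9600" line. */
#define FM_FRAME_SIZE_MIN 64u
#define FM_FRAME_SIZE_MAX 9600u

/* Hypothetical helper: smallest frame size that carries a given L3 MTU. */
static unsigned int fm_frame_size_for_mtu(unsigned int mtu)
{
	return mtu + FM_FRAME_OVERHEAD;
}

/* Hypothetical helper: largest L3 MTU a configured frame size can carry.
 * The assertion mirrors the range Kconfig enforces on the option. */
static unsigned int fm_max_mtu(unsigned int frame_size)
{
	assert(frame_size >= FM_FRAME_SIZE_MIN && frame_size <= FM_FRAME_SIZE_MAX);
	return frame_size - FM_FRAME_OVERHEAD;
}
```

For a 9000-byte jumbo MTU this gives 9022, inside the 9600 ceiling. Because the value is fixed when the kernel is built, carrying an MTU larger than fm_max_mtu() of the configured size would require a rebuild.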
+ /* FManV3H */ + else if (minor == 0 || minor == 2 || minor == 3) { + intg->fm_muram_size = 384 * 1024; + intg->fm_iram_size = 64 * 1024; + intg->fm_num_of_ctrl = 4; + + intg->bmi_max_num_of_tasks = 128; + intg->bmi_max_num_of_dmas = 84; + + intg->num_of_rx_ports = 8; + } else { + pr_err("Unsupported FManv3 version\n"); + kfree(intg); + return NULL; + } + + break; + default: + pr_err("Unsupported FMan version\n"); + kfree(intg); + return NULL; + } Don't duplicate error paths. Use goto like the rest of the kernel. + + intg->bmi_max_fifo_size =
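The goto idiom the review asks for keeps a single cleanup site at the end of the function instead of repeating the error print, kfree() and return NULL in every failing branch. The sketch below is a userspace approximation, assuming calloc()/free() in place of kzalloc()/kfree() and fprintf() in place of pr_err(); struct fm_intg, fill_intg_params() and the FMAN_V3_MAJOR value are hypothetical stand-ins, not the driver's definitions. Only the FManV3H numbers quoted above are reused.

```c
#include <stdio.h>
#include <stdlib.h>

#define FMAN_V3_MAJOR 3u /* hypothetical revision encoding */

/* Hypothetical, pared-down integration parameters. */
struct fm_intg {
	unsigned int fm_muram_size;
	unsigned int fm_iram_size;
	unsigned int fm_num_of_ctrl;
	unsigned int bmi_max_num_of_tasks;
	unsigned int bmi_max_num_of_dmas;
	unsigned int num_of_rx_ports;
};

static struct fm_intg *fill_intg_params(unsigned int major, unsigned int minor)
{
	struct fm_intg *intg = calloc(1, sizeof(*intg));

	if (!intg)
		return NULL;

	switch (major) {
	case FMAN_V3_MAJOR:
		/* FManV3H revisions, as in the quoted hunk */
		if (minor != 0 && minor != 2 && minor != 3) {
			fprintf(stderr, "Unsupported FManv3 version\n");
			goto err_free;
		}
		intg->fm_muram_size = 384 * 1024;
		intg->fm_iram_size = 64 * 1024;
		intg->fm_num_of_ctrl = 4;
		intg->bmi_max_num_of_tasks = 128;
		intg->bmi_max_num_of_dmas = 84;
		intg->num_of_rx_ports = 8;
		break;
	default:
		fprintf(stderr, "Unsupported FMan version\n");
		goto err_free;
	}

	return intg;

err_free:
	/* Single unwind point: every failure path above ends here. */
	free(intg);
	return NULL;
}
```

With more than one resource, the usual pattern is one label per resource, ordered so that a later failure falls through the earlier cleanups in reverse order of acquisition.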
Re: [PATCH v4] powerpc/rcpm: add RCPM driver
On Tue, 2015-06-23 at 16:07 +0800, yuantian.t...@freescale.com wrote: From: Tang Yuantian yuantian.t...@freescale.com There is a RCPM (Run Control/Power Management) in Freescale QorIQ series processors. The device performs tasks associated with device run control and power management. The driver implements some features: mask/unmask irq, enter/exit low power states, freeze time base, etc. Signed-off-by: Chenhui Zhao chenhui.z...@freescale.com Signed-off-by: Tang Yuantian yuantian.t...@freescale.com --- v4: - refine bindings document v3: - added static and __init modifier to fsl_rcpm_init v2: - fix code style issues - refine compatible string match part Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 42 +++ arch/powerpc/include/asm/fsl_guts.h| 105 +++ arch/powerpc/include/asm/fsl_pm.h | 48 +++ arch/powerpc/platforms/85xx/Kconfig| 1 + arch/powerpc/sysdev/Kconfig| 5 + arch/powerpc/sysdev/Makefile | 1 + arch/powerpc/sysdev/fsl_rcpm.c | 338 + 7 files changed, 540 insertions(+) create mode 100644 Documentation/devicetree/bindings/soc/fsl/rcpm.txt create mode 100644 arch/powerpc/include/asm/fsl_pm.h create mode 100644 arch/powerpc/sysdev/fsl_rcpm.c diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt new file mode 100644 index 000..1f58018 --- /dev/null +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt @@ -0,0 +1,42 @@ +* Run Control and Power Management + +The RCPM performs all device-level tasks associated with device run control +and power management. + +Required properites: + - reg : Offset and length of the register set of RCPM block. + - compatible : Sould contain a chip-specific RCPM block compatible string + and (if applicable) may contain a chassis-version RCPM compatible string. 
+ Chip-specific strings are of the form fsl,<chip>-rcpm, such as: + * fsl,p2041-rcpm + * fsl,p3041-rcpm + * fsl,p4080-rcpm + * fsl,p5020-rcpm + * fsl,p5040-rcpm + * fsl,t4240-rcpm + * fsl,b4420-rcpm + * fsl,b4860-rcpm + + Chassis-version RCPM strings include: + * fsl,qoriq-rcpm-1.0: for chassis 1.0 rcpm + * fsl,qoriq-rcpm-2.0: for chassis 2.0 rcpm + +All references to 1.0 and 2.0 refer to the QorIQ chassis version to +which the chip complies. +Chassis Version Example Chips +--- --- +1.0 p4080, p5020, p5040, p2041, p3041 +2.0 t4240, b4860, t1040, b4420 I don't think it's accurate to call t1040 chassis 2.0. -Scott
Re: [v2,3/9] fsl/fman: Add the FMan MAC FLIB
On Wed, 2015-06-24 at 22:34 +0300, igal.liber...@freescale.com wrote: From: Igal Liberman igal.liber...@freescale.com The FMan MAC FLib provides basic API used by the drivers to configure and control the FMan MAC hardware. Signed-off-by: Igal Liberman igal.liber...@freescale.com ... +int fman_dtsec_mii_write_reg(struct dtsec_mii_reg __iomem *regs, uint8_t addr, + uint8_t reg, uint16_t data, uint16_t dtsec_freq) +{ + uint32_t tmp; + + /* Setup the MII Mgmt clock speed */ + iowrite32be((uint32_t)dtsec_mii_get_div(dtsec_freq), &regs->miimcfg); + /* Memory barrier */ + wmb(); + + /* Stop the MII management read cycle */ + iowrite32be(0, &regs->miimcom); + /* Dummy read to make sure MIIMCOM is written */ + tmp = ioread32be(&regs->miimcom); + /* Memory barrier */ + wmb(); + + /* Setting up MII Management Address Register */ + tmp = (uint32_t)((addr << MIIMADD_PHY_ADDR_SHIFT) | reg); + iowrite32be(tmp, &regs->miimadd); + /* Memory barrier */ + wmb(); + + /* Setting up MII Management Control Register with data */ + iowrite32be((uint32_t)data, &regs->miimcon); + /* Dummy read to make sure MIIMCON is written */ + tmp = ioread32be(&regs->miimcon); + /* Memory barrier */ + wmb(); iowrite32be() should already contain a memory barrier. + + /* Wait until MII management write is complete */ + /* todo: a timeout could be useful here */ + while ((ioread32be(&regs->miimind)) & MIIMIND_BUSY) + ; /* busy wait */ + + return 0; +} Please add the timeout. + /* Read MII management status */ + *data = (uint16_t)ioread32be(&regs->miimstat); Unnecessary cast (please check for these throughout the patchset). There are also casts in this patchset that are only needed because a variable was unnecessarily defined with a smaller-than-32-bit data type.
+void fman_memac_reset(struct memac_regs __iomem *regs) +{ + uint32_t tmp; + + tmp = ioread32be(&regs->command_config); + + tmp |= CMD_CFG_SW_RESET; + + iowrite32be(tmp, &regs->command_config); + + while (ioread32be(&regs->command_config) & CMD_CFG_SW_RESET) + ; +} Timeout, here and in all such loops. -Scott
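Both busy-wait loops above spin forever if the hardware never clears the bit. A bounded version gives up and returns an error after a fixed number of polls. The sketch below models this in userspace, assuming a fake register in place of ioread32be() on an __iomem pointer; struct fake_reg, fake_read(), wait_bit_clear() and the CMD_CFG_SW_RESET value are hypothetical. In a driver one would also delay between polls, or use the readx_poll_timeout() family from <linux/iopoll.h> where available, which bounds the wait by time rather than by iteration count.

```c
#include <errno.h>
#include <stdint.h>

#define CMD_CFG_SW_RESET 0x00001000u /* hypothetical bit value */

/* Hypothetical stand-in for a device register: the "hardware" clears the
 * bits in busy_mask once reads_until_clear further reads have happened. */
struct fake_reg {
	uint32_t value;
	uint32_t busy_mask;
	unsigned int reads_until_clear;
};

static uint32_t fake_read(struct fake_reg *reg)
{
	if (reg->reads_until_clear == 0)
		reg->value &= ~reg->busy_mask;
	else
		reg->reads_until_clear--;
	return reg->value;
}

/* Poll until every bit in mask is clear, giving up after max_polls reads.
 * Returns 0 on success or -ETIMEDOUT instead of spinning forever. */
static int wait_bit_clear(struct fake_reg *reg, uint32_t mask,
			  unsigned int max_polls)
{
	while (max_polls--) {
		if (!(fake_read(reg) & mask))
			return 0;
		/* A driver would delay or cpu_relax() here between polls. */
	}
	return -ETIMEDOUT;
}
```

The caller then propagates the error (for example, failing the MII write) instead of hanging the CPU on a wedged MAC.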
Re: [v2,5/9] fsl/fman: Add Frame Manager support
On Fri, 2015-06-26 at 01:53 +0200, Paul Bolle wrote: So I decided to pick one subject: exports. I think I had something to comment on all eight of them. s/eight/twelve/ Paul Bolle
Re: [v2,5/9] fsl/fman: Add Frame Manager support
On Wed, 2015-06-24 at 22:35 +0300, igal.liber...@freescale.com wrote: --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fm_drv.c +u16 fm_get_max_frm(void) +{ + return fsl_fm_max_frm; +} +EXPORT_SYMBOL(fm_get_max_frm); Which module is using this export? (And what does this function actually do?) +int fm_get_rx_extra_headroom(void) +{ + return ALIGN(fsl_fm_rx_extra_headroom, 16); +} +EXPORT_SYMBOL(fm_get_rx_extra_headroom); This exports an unused function. I don't know how to, well, review a series that adds almost 20K lines. So I decided to pick one subject: exports. I think I had something to comment on all eight of them. I'm not sure if I'll try another scan with a different subject. Thanks, Paul Bolle
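fm_get_rx_extra_headroom() passes the configured value through ALIGN(..., 16), so a setting that is not a multiple of 16 (which the FSL_FM_RX_EXTRA_HEADROOM help text forbids, but which a Kconfig range check cannot catch) is rounded up silently rather than rejected. The rounding ALIGN() performs for a power-of-two alignment can be restated in userspace C as below; align_up() is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a, where a is a power of two.
 * Same arithmetic as the kernel's ALIGN(x, a) for such an a. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0); /* power-of-two alignments only */
	return (x + a - 1) & ~(a - 1);
}
```

With the Kconfig default of 64 the value is unchanged; a value such as 65 would become 80.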
Re: [v2,9/9] fsl/fman: Add FMan MAC driver
On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote: (Evolution 3.16 is basically unbearable for replying to patches. Anyone else running into this?) If you mean the crazy lag when selecting moderate-to-large amounts of text (for snipping), yes. -Scott
Re: [v2,1/9] fsl/fman: Add the FMan FLIB
On Wed, 2015-06-24 at 22:33 +0300, igal.liber...@freescale.com wrote: From: Igal Liberman igal.liber...@freescale.com The FMan FLib provides the basic API used by the FMan drivers to configure and control the FMan hardware. Signed-off-by: Igal Liberman igal.liber...@freescale.com Again, what is an FLib? What determines whether content should go in the flib directory? The patch title says Add the FMan FLIB, but there's more code added outside the flib directory than inside. FMan drivers? There's more than one? What is drivers/net/ethernet/freescale/fman/fman.c if not the FMan driver? What is the FMan driver if not the code to configure and control the FMan hardware? If this is a public API, where's the documentation? --- drivers/net/ethernet/freescale/Kconfig |1 + drivers/net/ethernet/freescale/Makefile|2 + drivers/net/ethernet/freescale/fman/Kconfig|7 + drivers/net/ethernet/freescale/fman/Makefile |5 + .../net/ethernet/freescale/fman/flib/fsl_fman.h| 608 drivers/net/ethernet/freescale/fman/fman.c | 975 6 files changed, 1598 insertions(+) create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig create mode 100644 drivers/net/ethernet/freescale/fman/Makefile create mode 100644 drivers/net/ethernet/freescale/fman/flib/fsl_fman.h create mode 100644 drivers/net/ethernet/freescale/fman/fman.c diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig index 25e3425..24e938d 100644 --- a/drivers/net/ethernet/freescale/Kconfig +++ b/drivers/net/ethernet/freescale/Kconfig @@ -55,6 +55,7 @@ config FEC_MPC52xx_MDIO If compiled as module, it will be called fec_mpc52xx_phy. 
source "drivers/net/ethernet/freescale/fs_enet/Kconfig" +source "drivers/net/ethernet/freescale/fman/Kconfig" config FSL_PQ_MDIO tristate "Freescale PQ MDIO" diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile index 71debd1..4097c58 100644 --- a/drivers/net/ethernet/freescale/Makefile +++ b/drivers/net/ethernet/freescale/Makefile @@ -17,3 +17,5 @@ gianfar_driver-objs := gianfar.o \ gianfar_ethtool.o obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o + +obj-$(CONFIG_FSL_FMAN) += fman/ diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig new file mode 100644 index 000..8aeae29 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/Kconfig @@ -0,0 +1,7 @@ +config FSL_FMAN + bool "FMan support" + depends on FSL_SOC || COMPILE_TEST + default n + help + Freescale Data-Path Acceleration Architecture Frame Manager + (FMan) support default n is a no-op. What does enabling this option actually do, in terms of user-visible features? +typedef struct fm_prs_result_t fm_prs_result; +typedef enum e_enet_mode enet_mode_t; This use of typedef is contrary to kernel coding style. -Scott
Re: [v2,4/9] fsl/fman: Add FMan MURAM support
On Wed, 2015-06-24 at 22:34 +0300, igal.liber...@freescale.com wrote: + struct muram_info *p_muram; No Hungarian notation. +void fm_muram_free(struct muram_info *p_muram) +{ + /* Destroy pool */ + gen_pool_destroy(p_muram->pool); + /* Unmap memory */ + iounmap(p_muram->vbase); + /* Free pointer */ + kfree(p_muram); +} This type of commenting is not useful. + memset_io((void __iomem *)vaddr, 0, (int)size); Unnecessary cast of size. -Scott
Re: [PATCH V9 00/13] powerpc, perf: Enable SW branch filters
Hi Anshuman, Thanks for your continued work on this. Given that the series is now at version 9 and is 13 patches long, I wonder if it might be worth splitting it up. I'd suggest: - Patch 1 could be sent individually as it's a bug fix. - Separating out a series of simple cleanups would make the actual changes in your patch set easier to understand. Patches 2, 3 and 5 are obvious candidates. - It looks like the changes in patch 6 aren't used by any of the following patches. It might be worth separating that out or just dropping it entirely. That would give you a series with just: 4 powerpc, perf: Restore privilege level filter support for BHRB 7 powerpc, perf: Re organize PMU branch filter processing on POWER8 8 powerpc, perf: Change the name of HW PMU branch filter tracking variable 9 powerpc, lib: Add new branch analysis support functions 10 powerpc, perf: Enable SW filtering in branch stack sampling framework 11 powerpc, perf: Change POWER8 PMU configuration to work with SW filters 12 powerpc, perf: Enable privilege mode SW branch filters 13 selftests, powerpc: Add test for BHRB branch filters (HW & SW) That might make it easier for you to start getting the ground work in, and make it easier for others to understand what you're trying to do. Regards, Daniel On Mon, 2015-06-15 at 17:40 +0530, Anshuman Khandual wrote: This is the continuation (rebased and reworked) of the series posted at https://lkml.org/lkml/2014/5/5/153 (which is V6). I remember to have incremented the version count for the re-send of the first four patches of the series to Peter Z for generic review which got pulled in last year. These patches here are the remaining powerpc part of the original series.
Changes in V9 = (1) Changed some of the commit messages and fixed some typos (2) Variable 'bhrb_users' type changed from int to unsigned int (3) Replaced the last usage of 'get_cpu_var' with 'this_cpu_ptr' (4) Conditional checks for 'cpuhw->bhrb_users' changed to test against zero (5) Updated in-code documentation inside 'check_excludes' function (6) Changed the name type of 'pred' variable in 'power_pmu_bhrb_read' (7) Changed the name of 'tmp' to 'to_addr' inside 'power_pmu_bhrb_read' (8) Changed return values for branch instruction analysis functions (9) Changed the name of 'flag' variable to 'select_branch' inside 'keep_branch' (10) Fixed one nested conditional statement inside 'keep_branch' function (11) Changed function name from 'update_branch_entry' to 'insert_branch' (12) Fixed copyright and license statements for new selftest related files (13) Improved helper assembly functions with some macro definitions (14) Improved the core test program at various places (15) Added .gitignore file for the new test case Changes in V8 (http://patchwork.ozlabs.org/patch/481848/) = (1) Fixed BHRB privilege mode branch filter request processing (2) Dropped branch records where 'from' cannot be fetched (3) Added in-code documenation at various places in the patch series (4) Added one comprehensive seltest case to verify all the filters Changes in V7 = (1) Incremented the version count while requesting pull for generic changes Changes in V6 (https://lkml.org/lkml/2014/5/5/153) = (1) Rebased the patchset against the master (2) Added Reviewed-by: Andi Kleen in the first four patches in the series which changes the generic or X86 perf code.
[https://lkml.org/lkml/2014/4/7/130] Changes in V5 (https://lkml.org/lkml/2014/3/7/101) = (1) Added a precursor patch to cleanup the indentation problem in power_pmu_bhrb_read (2) Added a precursor patch to re-arrange P8 PMU BHRB filter config which improved the clarity (3) Merged the previous 10th patch into the 8th patch (4) Moved SW based branch analysis code from core perf into code-patching library as suggested by Michael (5) Simplified the logic in branch analysis library (6) Fixed some ambiguities in documentation at various places (7) Added some more in-code documentation blocks at various places (8) Renamed some local variable and function names (9) Fixed some indentation and white space errors in the code (10) Implemented almost all the review comments and suggestions made by Michael Ellerman on V4 patchset (11) Enabled privilege mode SW branch filter (12) Simplified and generalized the SW implemented conditional branch filter (13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW implementation (14) Adjusted other patches to deal with the above changes Changes in V4 (https://lkml.org/lkml/2013/12/4/168) = (1) Changed the commit message for patch (01/10) (2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman (3) Rebased the patchset against latest Linus's tree Changes in V3 (https://lkml.org/lkml/2013/10/16/59) = (1) Split the SW branch filter
Re: [PATCH] powerpc/powernv: Fix vma page prot flags in opal-prd driver
* Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com [2015-06-21 23:56:16]: opal-prd driver will mmap() firmware code/data area as private mapping to prd user space daemon. Write to this page will trigger COW faults. The new COW pages are normal kernel RAM pages accounted by the kernel and are not special. vma->vm_page_prot value will be used at page fault time for the new COW pages, while pgprot_t value passed in remap_pfn_range() is used for the initial page table entry. Hence: * Do not add _PAGE_SPECIAL in vma, but only for remap_pfn_range() * Also remap_pfn_range() will add the _PAGE_SPECIAL flag using pte_mkspecial() call, hence no need to specify in the driver This fix resolves the page accounting warning shown below: BUG: Bad rss-counter state mm:c007d34ac600 idx:1 val:19 The above warning is triggered since _PAGE_SPECIAL was incorrectly being set for the normal kernel COW pages. Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/opal-prd.c |9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-prd.c b/arch/powerpc/platforms/powernv/opal-prd.c index 46cb3fe..4ece8e4 100644 --- a/arch/powerpc/platforms/powernv/opal-prd.c +++ b/arch/powerpc/platforms/powernv/opal-prd.c @@ -112,6 +112,7 @@ static int opal_prd_open(struct inode *inode, struct file *file) static int opal_prd_mmap(struct file *file, struct vm_area_struct *vma) { size_t addr, size; + pgprot_t page_prot; int rc; pr_devel("opal_prd_mmap(0x%016lx, 0x%016lx, 0x%lx, 0x%lx)\n", @@ -125,13 +126,11 @@ static int opal_prd_mmap(struct file *file, struct vm_area_struct *vma) if (!opal_prd_range_is_valid(addr, size)) return -EINVAL; - vma->vm_page_prot = __pgprot(pgprot_val(phys_mem_access_prot(file, - vma->vm_pgoff, - size, vma->vm_page_prot)) - | _PAGE_SPECIAL); + page_prot = phys_mem_access_prot(file, vma->vm_pgoff, + size, vma->vm_page_prot); rc = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size, -
vma->vm_page_prot); + page_prot); Hi Ben, remap_pfn_range() is the correct method to map the firmware pages because we will not have a struct page associated with this RAM area. We do a memblock_reserve() in early boot and take this memory out of the kernel, avoiding struct page allocation/init for these pages. vm_insert_page() is an alternative that would have worked if the kernel had allocated the memory, in which case we could bump up the page count and map the page to user space. This is already done by vm_insert_page() and we would not need to make the page special. However, this use case fits remap_pfn_range() and the special-page mechanism, since there is no struct page associated with these physical pages. --Vaidy
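The commit message's reasoning hinges on which protection value each PTE gets: remap_pfn_range() builds the initial PTEs from the pgprot_t it is passed and marks them special itself via pte_mkspecial(), while a later COW fault builds the new PTE from vma->vm_page_prot. The toy model below, in plain C with made-up flag values (PROT_PRESENT_BIT, PROT_SPECIAL_BIT and struct vma_model are hypothetical, not the powerpc _PAGE_* definitions), shows why storing the special bit in the VMA leaks it onto ordinary COW pages. phys_mem_access_prot() is modelled as the identity.

```c
#include <stdint.h>

/* Hypothetical flag values; the real _PAGE_* bits are arch-specific. */
#define PROT_PRESENT_BIT 0x1u
#define PROT_SPECIAL_BIT 0x4u

/* Hypothetical model of the one VMA field that matters here. */
struct vma_model {
	uint32_t vm_page_prot; /* reused by later COW faults */
};

/* Initial PTE: remap_pfn_range() applies pte_mkspecial() on its own. */
static uint32_t initial_pte_prot(uint32_t prot_passed_to_remap)
{
	return prot_passed_to_remap | PROT_SPECIAL_BIT;
}

/* COW fault: the new, ordinary RAM page takes the VMA's protection. */
static uint32_t cow_pte_prot(const struct vma_model *vma)
{
	return vma->vm_page_prot;
}

/* Old driver: ORs the special bit into vma->vm_page_prot itself. */
static void mmap_old(struct vma_model *vma)
{
	vma->vm_page_prot |= PROT_SPECIAL_BIT;
}

/* Fixed driver: derives a local page_prot for remap_pfn_range() and
 * leaves vma->vm_page_prot untouched. */
static uint32_t mmap_fixed(const struct vma_model *vma)
{
	return vma->vm_page_prot;
}
```

In the old model both the initial and the COW PTEs carry the special bit; in the fixed model only the initial, firmware-backed PTEs do.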
[PATCH SLOF v2 2/5] introduce 8-byte LE helpers
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com Reviewed-by: Thomas Huth th...@redhat.com --- slof/fs/little-endian.fs | 6 ++ slof/fs/packages/disk-label.fs | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/slof/fs/little-endian.fs b/slof/fs/little-endian.fs index f2e4e8d..6b4779e 100644 --- a/slof/fs/little-endian.fs +++ b/slof/fs/little-endian.fs @@ -17,6 +17,9 @@ here c@ ef = CONSTANT ?littleendian ?bigendian [IF] +: x!-le >r xbflip r> x! ; +: x@-le x@ xbflip ; + : l!-le >r lbflip r> l! ; : l@-le l@ lbflip ; @@ -47,6 +50,9 @@ here c@ ef = CONSTANT ?littleendian [ELSE] +: x!-le x! ; +: x@-le x@ ; + : l!-le l! ; : l@-le l@ ; diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs index bb64d57..8c93cfb 100644 --- a/slof/fs/packages/disk-label.fs +++ b/slof/fs/packages/disk-label.fs @@ -384,8 +384,8 @@ AA268B49521E5A8B CONSTANT GPT-PREP-PARTITION-4 debug-disk-label? IF ." GPT PReP partition found" cr THEN - block gpt-part-entry>first-lba x@ xbflip - block gpt-part-entry>last-lba x@ xbflip + block gpt-part-entry>first-lba x@-le + block gpt-part-entry>last-lba x@-le over - 1+ ( addr offset len ) swap ( addr len offset ) block-size * to part-offset -- 2.4.3
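For readers more at home in C than in Forth: on a big-endian host x@-le is x@ followed by a byte swap, and on a little-endian host it is a plain x@, so either way it returns the 64-bit value stored little-endian at the address. The sketch below restates that with byte-wise loads that do not depend on host endianness, then uses them the way the GPT code in this series does: the first three fields of a partition type GUID are compared little-endian, the last eight bytes as one big-endian 64-bit value (what x@ returns on a big-endian host), and the header signature "EFI PART" as 0x4546492050415254. All function names are hypothetical; the protective-MBR offsets (first partition type byte at 0x1C2, 0xAA55 magic at 0x1FE) are the standard MBR layout, not values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Endianness-independent loads (C analogues of w@-le, l@-le, x@-le). */
static uint16_t ld_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t ld_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t ld_le64(const uint8_t *p)
{
	return (uint64_t)ld_le32(p) | (uint64_t)ld_le32(p + 4) << 32;
}

/* What x@ returns on a big-endian host. */
static uint64_t ld_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = v << 8 | p[i];
	return v;
}

/* GPT type GUIDs: three little-endian fields, then 8 raw bytes compared
 * here as one big-endian value, like the ...-PARTITION-4 constants. */
static bool gpt_guid_matches(const uint8_t *g, uint32_t d1, uint16_t d2,
			     uint16_t d3, uint64_t d4)
{
	return ld_le32(g) == d1 && ld_le16(g + 4) == d2 &&
	       ld_le16(g + 6) == d3 && ld_be64(g + 8) == d4;
}

#define GPT_SIGNATURE 0x4546492050415254ULL /* "EFI PART" read big-endian */

/* Standard MBR layout: first partition's type byte and the 0xAA55 magic. */
static bool is_protective_mbr(const uint8_t *sector)
{
	return sector[0x1C2] == 0xEE && ld_le16(sector + 0x1FE) == 0xAA55;
}
```

As a usage example, the on-disk bytes 38 2D 1A 9E 12 C6 16 43 AA 26 8B 49 52 1E 5A 8B match the PReP constants 9E1A2D38, C612, 4316 and AA268B49521E5A8B, while the basic data GUID EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 is stored as A2 A0 D0 EB E5 B9 33 44 87 C0 68 B6 B7 26 99 C7.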
[PATCH SLOF v2 5/5] disk-label: add support for booting from GPT FAT partition
For a GPT+LVM combination disk, older bootloader that does not support LVM, cannot load kernel from LVM. The patch adds support to read from BASIC_DATA UUID partitions for the case that the OS installer has installed the CHRP-BOOT config on a FAT file system. Makes GPT detection robust * Check for Protective MBR Magic * Check for valid GPT Signature * Boundary check for allocated block size before reading into the buffer Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com --- slof/fs/packages/disk-label.fs | 96 +- 1 file changed, 76 insertions(+), 20 deletions(-) diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs index 7ed5526..e5759a3 100644 --- a/slof/fs/packages/disk-label.fs +++ b/slof/fs/packages/disk-label.fs @@ -179,7 +179,8 @@ CONSTANT /gpt-part-entry \ This word returns true if the currently loaded disk-buf has _NO_ GPT partition id : no-gpt? ( -- true|false ) 0 read-disk-buf - 1 partition>part-entry part-entry>id c@ ee <> + 1 partition>part-entry part-entry>id c@ ee <> IF true EXIT THEN + disk-buf mbr>magic w@-le aa55 <> ; : pc-extended-partition? ( part-entry-addr -- true|false ) @@ -267,7 +268,10 @@ CONSTANT /gpt-part-entry : try-dos-partition ( -- okay? ) \ Read partition table and check magic. - no-mbr? IF cr ." No DOS disk-label found." cr false EXIT THEN + no-mbr? IF + debug-disk-label? IF cr ." No DOS disk-label found." cr THEN + false EXIT + THEN count-dos-logical-partitions TO dos-logical-partitions @@ -378,29 +382,80 @@ AA268B49521E5A8B CONSTANT GPT-PREP-PARTITION-4 true ; -: load-from-gpt-prep-partition ( addr -- size ) - no-gpt? IF drop false EXIT THEN - debug-disk-label? IF - cr ." GPT partition found" cr - THEN - 1 read-disk-buf disk-buf gpt>part-entry-lba l@-le +\ Check for GPT MSFT BASIC DATA GUID - fat based +EBD0A0A2 CONSTANT GPT-BASIC-DATA-PARTITION-1 +B9E5 CONSTANT GPT-BASIC-DATA-PARTITION-2 +4433 CONSTANT GPT-BASIC-DATA-PARTITION-3 +87C068B6B72699C7 CONSTANT GPT-BASIC-DATA-PARTITION-4 + +: gpt-basic-data-partition?
( -- true|false ) + disk-buf gpt-part-entry>part-type-guid + dup l@-le GPT-BASIC-DATA-PARTITION-1 <> IF drop false EXIT THEN + dup 4 + w@-le GPT-BASIC-DATA-PARTITION-2 <> IF drop false EXIT THEN + dup 6 + w@-le GPT-BASIC-DATA-PARTITION-3 <> IF drop false EXIT THEN + 8 + x@ GPT-BASIC-DATA-PARTITION-4 <> IF false EXIT THEN + true +; + +\ +\ GPT Signature +\ (EFI PART, 45h 46h 49h 20h 50h 41h 52h 54h) +\ +4546492050415254 CONSTANT GPT-SIGNATURE + +: verify-gpt-partition ( -- true | false ) + no-gpt? IF false EXIT THEN + debug-disk-label? IF cr ." GPT partition found" cr THEN + 1 read-disk-buf + disk-buf gpt>part-entry-lba x@-le block-size * to seek-pos disk-buf gpt>part-entry-size l@-le to gpt-part-size - disk-buf gpt>num-part-entry l@-le dup 0= IF false EXIT THEN + gpt-part-size disk-buf-size > IF + cr ." GPT part size exceeds buffer allocated" cr + false exit + THEN + disk-buf gpt>signature x@ GPT-SIGNATURE = +; + +: load-from-gpt-prep-partition ( addr -- size ) + verify-gpt-partition 0= IF false EXIT THEN + disk-buf gpt>num-part-entry l@-le dup 0= IF false exit THEN 1+ 1 ?DO seek-pos 0 seek drop disk-buf gpt-part-size read drop gpt-prep-partition? IF - debug-disk-label? IF - ." GPT PReP partition found" cr - THEN - disk-buf gpt-part-entry>first-lba x@-le - disk-buf gpt-part-entry>last-lba x@-le - over - 1+ ( addr offset len ) - swap ( addr len offset ) - block-size * to part-offset - 0 0 seek drop ( addr len ) - block-size * read ( size ) + debug-disk-label? IF ." GPT PReP partition found" cr THEN
+ disk-buf gpt-part-entry>first-lba x@-le ( addr first-lba ) + disk-buf gpt-part-entry>last-lba x@-le ( addr first-lba last-lba) + over - 1+ ( addr first-lba blocks ) + swap ( addr blocks ) + block-size * to part-offset ( addr blocks ) + 0 0 seek drop ( addr blocks ) + block-size * read ( size ) UNLOOP EXIT + THEN + seek-pos gpt-part-size i * + to seek-pos + LOOP + false +; + +: try-gpt-dos-partition ( -- true | false ) + verify-gpt-partition 0= IF false EXIT THEN + disk-buf gpt>num-part-entry l@-le dup 0= IF false EXIT THEN + 1+ 1 ?DO + seek-pos 0 seek drop + disk-buf gpt-part-size read drop + gpt-basic-data-partition? IF + debug-disk-label? IF ." GPT LINUX DATA partition found" cr THEN + disk-buf gpt-part-entry>first-lba x@-le ( first-lba ) + dup to part-start ( first-lba ) + disk-buf gpt-part-entry>last-lba x@-le(