[PATCH SLOF v2 0/5] GPT fixes/cleanup and LVM support with FAT

2015-06-25 Thread Nikunj A Dadhania
The following patchset implements some improvements and cleanups for the
GPT booting code:

patch 1: Simplify the GPT detection code with fewer nested scopes and add
 comments.

patch 2: Introduce 8-byte LE helpers: x@-le and x!-le

patch 3: Rename block / read-sector to indicate that it is an allocated buffer

patch 4: As we need to detect a FAT partition, implement a helper,
 fat-bootblock?, that can be used from the GPT code as well as try-dos-files

patch 5: Implement GPT FAT for LVM support and make the GPT detection code
 more robust
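
For readers who do not speak Forth, here is a rough C sketch of what the
8-byte little-endian helpers from patch 2 (x@-le fetches, x!-le stores) are
meant to do. The names load_le64 and store_le64 are invented for this
illustration and are not part of SLOF; the real words are defined in
slof/fs/little-endian.fs.

```c
#include <stdint.h>

/* Hypothetical C analogue of x@-le: read 8 bytes as a little-endian value. */
uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];   /* the most significant byte is stored last */
    return v;
}

/* Hypothetical C analogue of x!-le: store a value as 8 little-endian bytes. */
void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);   /* least significant byte first */
        v >>= 8;
    }
}
```

Working byte by byte keeps the sketch independent of the host's own
endianness, which matters because SLOF runs big-endian while GPT fields are
little-endian.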

Nikunj A Dadhania (5):
  disk-label: simplify gpt-prep-partition? routine
  introduce 8-byte LE helpers
  disk-label: rename confusing block word
  disk-label: introduce helper to check fat filesystem
  disk-label: add support for booting from GPT FAT partition

 slof/fs/little-endian.fs   |   6 ++
 slof/fs/packages/disk-label.fs | 209 +
 2 files changed, 136 insertions(+), 79 deletions(-)

-- 
2.4.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH SLOF v2 1/5] disk-label: simplify gpt-prep-partition? routine

2015-06-25 Thread Nikunj A Dadhania
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@redhat.com
---
 slof/fs/packages/disk-label.fs | 42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index fe1c25e..bb64d57 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -352,42 +352,32 @@ CONSTANT /gpt-part-entry
drop 0
 ;
 
-\ Check for GPT PReP partition GUID
-9E1A2D38 CONSTANT GPT-PREP-PARTITION-1
-C612 CONSTANT GPT-PREP-PARTITION-2
-4316 CONSTANT GPT-PREP-PARTITION-3
-AA26 CONSTANT GPT-PREP-PARTITION-4
-8B49521E5A8B CONSTANT GPT-PREP-PARTITION-5
+\ Check for GPT PReP partition GUID. Only first 3 blocks are
+\ byte-swapped treating last two blocks as contigous for simplifying
+\ comparison
+9E1A2D38 CONSTANT GPT-PREP-PARTITION-1
+C612 CONSTANT GPT-PREP-PARTITION-2
+4316 CONSTANT GPT-PREP-PARTITION-3
+AA268B49521E5A8B CONSTANT GPT-PREP-PARTITION-4
 
 : gpt-prep-partition? ( -- true|false )
-   block gpt-part-entry>part-type-guid l@-le GPT-PREP-PARTITION-1 = IF
-      block gpt-part-entry>part-type-guid 4 + w@-le
-      GPT-PREP-PARTITION-2 = IF
-         block gpt-part-entry>part-type-guid 6 + w@-le
-         GPT-PREP-PARTITION-3 = IF
-            block gpt-part-entry>part-type-guid 8 + w@
-            GPT-PREP-PARTITION-4 = IF
-               block gpt-part-entry>part-type-guid a + w@
-               block gpt-part-entry>part-type-guid c + l@ swap lxjoin
-               GPT-PREP-PARTITION-5 = IF
-                  TRUE EXIT
-               THEN
-            THEN
-         THEN
-      THEN
-   THEN
-   FALSE
+   block gpt-part-entry>part-type-guid
+   dup l@-le GPT-PREP-PARTITION-1 <> IF drop false EXIT THEN
+   dup 4 + w@-le GPT-PREP-PARTITION-2 <> IF drop false EXIT THEN
+   dup 6 + w@-le GPT-PREP-PARTITION-3 <> IF drop false EXIT THEN
+   8 + x@ GPT-PREP-PARTITION-4 <> IF false EXIT THEN
+   true
 ;
 
 : load-from-gpt-prep-partition ( addr -- size )
-   no-gpt? IF drop FALSE EXIT THEN
+   no-gpt? IF drop false EXIT THEN
    debug-disk-label? IF
       cr ." GPT partition found " cr
    THEN
    1 read-sector block gpt>part-entry-lba l@-le
    block-size * to seek-pos
    block gpt>part-entry-size l@-le to gpt-part-size
-   block gpt>num-part-entry l@-le dup 0= IF FALSE EXIT THEN
+   block gpt>num-part-entry l@-le dup 0= IF false EXIT THEN
1+ 1 ?DO
   seek-pos 0 seek drop
   block gpt-part-size read drop gpt-prep-partition? IF
@@ -405,7 +395,7 @@ AA26 CONSTANT GPT-PREP-PARTITION-4
   THEN
   seek-pos gpt-part-size i * + to seek-pos
LOOP
-   FALSE
+   false
 ;
 
 \ Extract the boot loader path from a bootinfo.txt file
-- 
2.4.3
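
As an aside for readers more at home in C, the check performed by
gpt-prep-partition? can be sketched as follows. This is only an
illustration, not SLOF code: the helper name is_prep_partition is made up,
and the sketch compares all 16 bytes at once with memcmp() instead of
fetching field by field as the Forth word does. The byte table is derived
from the four constants in the patch, with the first three fields
byte-swapped and the last eight bytes taken as written.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * PReP partition type GUID 9E1A2D38-C612-4316-AA26-8B49521E5A8B as it is
 * laid out on disk: the first three fields are little-endian, the
 * remaining eight bytes are stored in the order written.
 */
static const uint8_t prep_guid_on_disk[16] = {
    0x38, 0x2d, 0x1a, 0x9e,                         /* 9E1A2D38, LE */
    0x12, 0xc6,                                     /* C612, LE */
    0x16, 0x43,                                     /* 4316, LE */
    0xaa, 0x26, 0x8b, 0x49, 0x52, 0x1e, 0x5a, 0x8b  /* stored as-is */
};

/* Hypothetical helper: true if the 16-byte type GUID is the PReP GUID. */
bool is_prep_partition(const uint8_t *type_guid)
{
    return memcmp(type_guid, prep_guid_on_disk,
                  sizeof(prep_guid_on_disk)) == 0;
}
```

A caller would pass a pointer to the partition type GUID at the start of a
GPT partition entry.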


[PATCH SLOF v2 4/5] disk-label: introduce helper to check fat filesystem

2015-06-25 Thread Nikunj A Dadhania
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
 slof/fs/packages/disk-label.fs | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 0995808..7ed5526 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -321,6 +321,14 @@ CONSTANT /gpt-part-entry
 
 \ Load from first active DOS boot partition.
 
+: fat-bootblock? ( addr -- flag )
+   \ byte 0-2 of the bootblock is a jump instruction in
+   \ all FAT filesystems.
+   \ e9 and eb are jump instructions in x86 assembler.
+   dup c@ e9 = IF drop true EXIT THEN
+   dup c@ eb = swap 2+ c@ 90 = and
+;
+
 \ NOTE: block-size is always 512 bytes for DOS partition tables.
 
 : load-from-dos-boot-partition ( addr -- size )
@@ -549,14 +557,7 @@ AA268B49521E5A8BCONSTANT GPT-PREP-PARTITION-4
 : try-dos-files ( -- found? )
no-mbr? IF false EXIT THEN
 
-   \ disk-buf 0 byte 0-2 is a jump instruction in all FAT
-   \ filesystems.
-   \ e9 and eb are jump instructions in x86 assembler.
-   disk-buf c@ e9 <> IF
-      disk-buf c@ eb <>
-      disk-buf 2+ c@ 90 <> or
-      IF false EXIT THEN
-   THEN
+   disk-buf fat-bootblock? 0= IF false EXIT THEN
    s" fat-files" (interpose-filesystem)
    true
 ;
-- 
2.4.3
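
For comparison, the test done by fat-bootblock? could look roughly like this
in C. It is a minimal sketch under the same rule as the Forth comment (the
boot block starts with an x86 jump, either E9 xx xx or EB xx 90); the
function name fat_bootblock is chosen for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C counterpart of fat-bootblock?: the first three bytes of a
 * FAT boot block are an x86 jump, either E9 xx xx or EB xx 90.
 */
bool fat_bootblock(const uint8_t *buf)
{
    if (buf[0] == 0xe9)
        return true;
    return buf[0] == 0xeb && buf[2] == 0x90;
}
```

As in the patch, this is only a heuristic on the first three bytes; it does
not validate the rest of the boot sector.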


Re: [GIT PULL 00/13] perf/core improvements and fixes

2015-06-25 Thread Ingo Molnar

* Arnaldo Carvalho de Melo a...@kernel.org wrote:

 Hi Ingo,
 
   Please consider pulling,
 
 - Arnaldo
 
 The following changes since commit a9a3cd900fbbcbf837d65653105e7bfc583ced09:
 
   Merge tag 'perf-core-for-mingo' of 
 git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core 
 (2015-06-20 01:11:11 +0200)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
 tags/perf-core-for-mingo
 
 for you to fetch changes up to 83b2ea257eb1d43e52f76d756722aeb899a2852c:
 
   perf tools: Allow auxtrace data alignment (2015-06-23 18:28:37 -0300)
 
 
 perf/core improvements and fixes:
 
 User visible:
 
 - Move toggling event logic from 'perf top' and into hists browser, allowing
   freeze/unfreeze with event lists with more than one entry (Namhyung Kim)
 
 - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and
   showing the Aggregated stats in 'perf report -D' (Adrian Hunter)
 
 Infrastructure:
 
 - Allow auxtrace data alignment (Adrian Hunter)
 
 - Allow events with dot (Andi Kleen)
 
 - Fix failure to 'perf probe' events on arm (He Kuang)
 
 - Add testing for Makefile.perf (Jiri Olsa)
 
 - Add test for make install with prefix (Jiri Olsa)
 
 - Fix single target build dependency check (Jiri Olsa)
 
 - Access thread_map entries via accessors, prep patch to hold more info per
   entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa)
 
 - Use __weak definition from compiler.h (Sukadev Bhattiprolu)
 
 - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)
 
 Signed-off-by: Arnaldo Carvalho de Melo a...@redhat.com
 
 
 Adrian Hunter (3):
   perf session: Print a newline when dumping PERF_RECORD_FINISHED_ROUND
   perf tools: Print a newline before dumping Aggregated stats
   perf tools: Allow auxtrace data alignment
 
 Andi Kleen (1):
   perf tools: Allow events with dot
 
 He Kuang (1):
   perf probe: Fix failure to probe events on arm
 
 Jiri Olsa (5):
   perf tests: Add testing for Makefile.perf
   perf tests: Add test for make install with prefix
   perf build: Fix single target build dependency check
   perf thread_map: Don't access the array entries directly
   perf thread_map: Change map entries into a struct
 
 Namhyung Kim (1):
   perf top: Move toggling event logic into hists browser
 
 Sukadev Bhattiprolu (2):
   perf pmu: Use __weak definition from linux/compiler.h
   perf pmu: Split perf_pmu__new_alias()
 
  tools/perf/Makefile |  4 +--
  tools/perf/builtin-top.c| 24 ++-
  tools/perf/builtin-trace.c  |  4 +--
  tools/perf/tests/make   | 31 ++--
  tools/perf/tests/openat-syscall-tp-fields.c |  2 +-
  tools/perf/ui/browsers/hists.c  | 19 ++--
  tools/perf/util/auxtrace.c  | 11 +--
  tools/perf/util/auxtrace.h  |  1 +
  tools/perf/util/event.c |  6 ++--
  tools/perf/util/evlist.c|  4 +--
  tools/perf/util/evsel.c |  2 +-
  tools/perf/util/parse-events.l  |  5 ++--
  tools/perf/util/pmu.c   | 45 
 +++--
  tools/perf/util/probe-event.c   |  6 +++-
  tools/perf/util/session.c   |  4 ++-
  tools/perf/util/thread_map.c| 24 ---
  tools/perf/util/thread_map.h| 16 +-
  17 files changed, 136 insertions(+), 72 deletions(-)

Pulled, thanks a lot Arnaldo!

Btw., one small thing I noticed about the status line in perf top: if I ever use
'f' to freeze/unfreeze events, the following message:

  Press 'f' to disable the events or 'h' to see other hotkeys

sticks around forever, even after I look into annotation and exit it, etc.

So I don't mind some default, helpful message there (such as 'Press 'h' to see
hotkeys'), but it appears this particular message is context and usage sensitive,
which wasn't really the goal, right?

Thanks,

Ingo

[PATCH SLOF v2 3/5] disk-label: rename confusing block word

2015-06-25 Thread Nikunj A Dadhania
The block word is not a block number; it is actually an allocated host
address.  Rename it to disk-buf and add an associated size
(disk-buf-size = 4096) for use during allocation/free.

Also rename the helper routine read-sector to read-disk-buf. This
routine assumes the address to be disk-buf and takes only the sector
number as an argument.

Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
 slof/fs/packages/disk-label.fs | 78 ++
 1 file changed, 41 insertions(+), 37 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 8c93cfb..0995808 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -33,7 +33,8 @@ s disk-label device-name
 0 INSTANCE VALUE dos-logical-partitions
 
 0 INSTANCE VALUE block-size
-0 INSTANCE VALUE block
+0 INSTANCE VALUE disk-buf
+d# 4096 VALUE disk-buf-size
 
 0 INSTANCE VALUE args
 0 INSTANCE VALUE args-len
@@ -126,11 +127,11 @@ CONSTANT /gpt-part-entry
 ;
 
 
-\ read sector to array block
-: read-sector ( sector-number -- )
+\ read sector to array disk-buf
+: read-disk-buf ( sector-number -- )
\ block-size is 0x200 on disks, 0x800 on cdrom drives
block-size * 0 seek drop  \ seek to sector
-   block block-size read drop\ read sector
+   disk-buf block-size read drop\ read sector
 ;
 
 : (.part-entry) ( part-entry )
@@ -149,35 +150,35 @@ CONSTANT /gpt-part-entry
 
 : (.name) r@ begin cell - dup @ colon = UNTIL xtname cr type space ;
 
-: init-block ( -- )
+: init-disk-buf ( -- )
s block-size ['] $call-parent CATCH IF ABORT parent has no block-size. 
THEN
to block-size
-   d# 4096 alloc-mem
-   dup d# 4096 erase
-   to block
+   disk-buf-size alloc-mem
+   dup disk-buf-size erase
+   to disk-buf
debug-disk-label? IF
-  . init-block: block-size= block-size .d . block=0x block u. cr
+  . init-disk-buf: block-size= block-size .d . disk-buf=0x disk-buf u. 
cr
THEN
 ;
 
 : partitionpart-entry ( partition -- part-entry )
-   1- /partition-entry * block mbrpartition-table +
+   1- /partition-entry * disk-buf mbrpartition-table +
 ;
 
 : partitionstart-sector ( partition -- sector-offset )
partitionpart-entry part-entrysector-offset l@-le
 ;
 
-\ This word returns true if the currently loaded block has _NO_ MBR magic
+\ This word returns true if the currently loaded disk-buf has _NO_ MBR magic
 : no-mbr? ( -- true|false )
-   0 read-sector
+   0 read-disk-buf
    1 partition>part-entry part-entry>id c@ ee = IF TRUE EXIT THEN \ GPT partition found
-   block mbr>magic w@-le aa55 <>
+   disk-buf mbr>magic w@-le aa55 <>
 ;
 
-\ This word returns true if the currently loaded block has _NO_ GPT partition id
+\ This word returns true if the currently loaded disk-buf has _NO_ GPT partition id
 : no-gpt? ( -- true|false )
-   0 read-sector
+   0 read-disk-buf
    1 partition>part-entry part-entry>id c@ ee <>
 ;
 
@@ -197,7 +198,7 @@ CONSTANT /gpt-part-entry
  part-entrysector-offset l@-le( current sector )
  dup to part-start to lpart-start  ( current )
  BEGIN
-part-start read-sector  \ read EBR
+part-start read-disk-buf  \ read EBR
 1 partitionstart-sector IF
\ . Logical Partition found at  part-start .d cr
1+
@@ -240,7 +241,7 @@ CONSTANT /gpt-part-entry
 part-entrysector-offset l@-le( log-part current sector )
 dup to part-start to lpart-start  ( log-part current )
 BEGIN
-   part-start read-sector \ read EBR
+   part-start read-disk-buf \ read EBR
1 partitionstart-sector IF\ first partition entry
   1+ 2dup = IF( log-part current )
  2drop
@@ -306,13 +307,13 @@ CONSTANT /gpt-part-entry
 : has-iso9660-filesystem  ( -- TRUE|FALSE )
\ Seek to the beginning of logical 2048-byte sector 16
\ refer to Chapter C.11.1 in PAPR 2.0 Spec
-   \ was: 10 read-sector, but this might cause trouble if you
+   \ was: 10 read-disk-buf, but this might cause trouble if you
\ try booting an ISO image from a device with 512b sectors.
10 800 * 0 seek drop  \ seek to sector
-   block 800 read drop   \ read sector
+   disk-buf 800 read drop   \ read sector
\ Check for CD-ROM volume magic:
-   block c@ 1 =
-   block 1+ 5 s CD001  str=
+   disk-buf c@ 1 =
+   disk-buf 1+ 5 s CD001  str=
and
dup IF 800 to block-size THEN
 ;
@@ -361,7 +362,7 @@ C612CONSTANT GPT-PREP-PARTITION-2
 AA268B49521E5A8BCONSTANT GPT-PREP-PARTITION-4
 
 : gpt-prep-partition? ( -- true|false )
-   block gpt-part-entrypart-type-guid
+   disk-buf gpt-part-entrypart-type-guid
dup l@-le GPT-PREP-PARTITION-1  IF drop false EXIT THEN
dup 4 + w@-le GPT-PREP-PARTITION-2  IF drop false EXIT THEN
dup 6 + w@-le GPT-PREP-PARTITION-3  IF drop false EXIT 

Re: [PATCH] powerpc/powernv: Fix vma page prot flags in opal-prd driver

2015-06-25 Thread Vaidyanathan Srinivasan
* Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com [2015-06-25 11:45:46]:

[snip]
 
 Hi Ben,
 
 remap_pfn_range() is the correct method to map the firmware pages
 because we will not have struct page associated with this RAM area.
 
 We do a memblock_reserve() in early boot and take out this memory from
 kernel and avoid struct page allocation/init for these.

Kindly ignore this comment.  memblock_reserve() does not
prevent/avoid struct page allocation.  We do have valid struct page
which can be used for mapping.

--Vaidy


Re: [PATCH v4 3/3] leds/powernv: Add driver for PowerNV platform

2015-06-25 Thread Vasant Hegde
On 06/25/2015 06:39 AM, Benjamin Herrenschmidt wrote:
 On Tue, 2015-04-28 at 15:40 +0530, Vasant Hegde wrote:
 
 +Device Tree binding for LEDs on IBM Power Systems
 +-
 +
 +The 'led' node under '/ibm,opal' lists service indicators available in the
 +system and their capabilities.
 +
 +led {
 +compatible = "ibm,opal-v3-led";
 +phandle = <0x106b>;
 +linux,phandle = <0x106b>;
 +led-mode = "lightpath";
 +
 +U78C9.001.RST0027-P1-C1 {
 +led-types = "identify", "fault";
 +led-loc = "descendent";
 +phandle = <0x106f>;
 +linux,phandle = <0x106f>;
 +};
 +...
 +...
 +};

Ben,

 
 My only issue is that /led should probably have been /leds but afaik
 this is already committed in the FW tree. Vasant, have we done a release
 with that code yet or can we still change this ?

I think we can change the OPAL side, as we haven't released the kernel code
yet, so there is no consumer.
I will send a patch to the skiboot mailing list. Let's see.

 
 Also what does led-mode = "lightpath" mean ? Can you describe it ?
 Are there alternative values ? I don't see the point myself ...

Yes. Our systems can work in two modes:
  - light path    -- Both identify and fault LEDs are supported.
                     Typically all low-end servers are in this mode.
  - guiding light -- Only identify LEDs are supported; there is no fault
                     indicator support for individual FRUs.
                     Typically high-end servers are in this mode.

These modes are static in nature, meaning we cannot change them at run time.
AFAIK all the PowerNV boxes shipped today are in light path mode. I have added
this property so that, if systems are shipped in guiding light mode in the
future, we don't need to make any changes.


 
 Don't leave the linux,phandle in the description of the binding (nor
 the phandle actually). They are implicit for all nodes, no need to
 clutter the documentation with them.

Sure. Will fix in next version.


 
 +Each node under 'led' node describes location code of FRU/Enclosure.
 +
 +The properties under each node:
 +
 +  led-types : Supported LED types (attention/identify/fault).
 +
 +  led-loc   : enclosure/descendent(FRU) location code.
 
 I don't understand what that means. Please provide a more detailed
 explanation.
 

This describes the LED location (FRU or enclosure level). It was added to
identify the component (as FRU LEDs are overloaded, while at enclosure level we
have separate LEDs for each component).


 Is the name of the node the loc code of the FRU ? In that case, how do
 you deal with multiple LEDs on the same FRU without a unit address ?

We use location code + LED type for node. So that we can identify multiple LEDs.

Looking back again, probably we can take out above property as we are not using
that today.

-Vasant


Re: [PATCH V9 05/13] powerpc, perf: Change name & type of 'pred' in power_pmu_bhrb_read

2015-06-25 Thread Anshuman Khandual
On 06/25/2015 10:41 AM, Daniel Axtens wrote:
  cpuhw->bhrb_entries[u_index].to = addr;
 -cpuhw->bhrb_entries[u_index].mispred = pred;
 -cpuhw->bhrb_entries[u_index].predicted = ~pred;
 +cpuhw->bhrb_entries[u_index].mispred = mispred;
 +cpuhw->bhrb_entries[u_index].predicted =
 +~mispred;
  
 
 This is much better! However, these are still bitwise rather than
 logical inversions. They will work, but would it be easier to understand
 if you used !mispred?

Sure, will change it as well.
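
To make the review point concrete, here is a small standalone C sketch of
why the bitwise form works here but is fragile. It assumes the destination
is a one-bit field, as the quoted code suggests; struct branch_flags and the
three functions are invented for this example and are not the kernel's
definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a branch entry with one-bit flags. */
struct branch_flags {
    unsigned int mispred   : 1;
    unsigned int predicted : 1;
};

/* Bitwise inversion: correct only because the field keeps just the low bit. */
unsigned int predicted_bitwise(bool mispred)
{
    struct branch_flags f = { 0 };
    f.predicted = ~(int)mispred;   /* ~1 is -2, ~0 is -1; low bit survives */
    return f.predicted;
}

/* Logical inversion: yields exactly 0 or 1 whatever the destination width. */
unsigned int predicted_logical(bool mispred)
{
    struct branch_flags f = { 0 };
    f.predicted = !mispred;
    return f.predicted;
}

/* Without the one-bit truncation, ~mispred is nonzero for either input. */
bool bitwise_as_bool(bool mispred)
{
    return (~(int)mispred) != 0;
}
```

Both variants store the same bit into the one-bit field, but only the
logical form would stay correct if the destination were ever widened or
turned into a bool.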


Re: [PATCH V9 02/13] powerpc, perf: Change type of the bhrb_users variable

2015-06-25 Thread Anshuman Khandual
On 06/25/2015 11:12 AM, Daniel Axtens wrote:
 -int bhrb_users;
 +unsigned int bhrb_users;
 
 OK, so this is a good start.
 
 A quick git grep for bhrb_users reveals this:
 perf/core-book3s.c: WARN_ON_ONCE(cpuhw->bhrb_users < 0);
 
 That occurs in power_pmu_bhrb_disable, immediately following a decrement
 of bhrb_users. Now that the test can never be true, this patch should
 change the function to check if bhrb_users is 0 before decrementing.

Sure. I will replace it with 'WARN_ON_ONCE(!cpuhw->bhrb_users)' before
decrementing bhrb_users in the function.
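
A minimal sketch of the agreed pattern, outside the kernel: with an unsigned
counter the "less than zero" test can never fire, so the zero check has to
come before the decrement. struct cpu_state and bhrb_put_user are
hypothetical names for this illustration; the real code would use
WARN_ON_ONCE() rather than a return value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state; only the counter matters. */
struct cpu_state {
    unsigned int bhrb_users;
};

/*
 * Drop one user. Returns false (where the kernel code would warn) when the
 * counter is already zero, instead of letting it wrap around to UINT_MAX.
 */
bool bhrb_put_user(struct cpu_state *s)
{
    if (s->bhrb_users == 0)
        return false;   /* a "< 0" test after the decrement cannot catch this */
    s->bhrb_users--;
    return true;
}
```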


Re: [PATCH V9 00/13] powerpc, perf: Enable SW branch filters

2015-06-25 Thread Anshuman Khandual
On 06/25/2015 11:48 AM, Daniel Axtens wrote:

 Hi Anshuman,
 
 Thanks for your continued work on this.
 
 Given that the series is now at version 9 and is 13 patches long, I
 wonder if it might be worth splitting it up.

Splitting it up completely or just keeping all the generic fixes
and cleanups at the beginning of the series would be sufficient.
Anyways I am willing to send them out separately if that helps.

 
 I'd suggest:
 
  - Patch 1 could be sent individually as it's a bug fix.

Not with the generic cleanup group as proposed below ?

 
  - Separating out a series of simple cleanups would make the actual
 changes in your patch set easier to understand. Patches 2, 3 and 5 are
 obvious candidates.

Agreed. Just that adding the first patch here will prevent a three way
split of the entire series.

 
  - It looks like the changes in patch 6 aren't used by any of the
 following patches. It might be worth separating that out or just
 dropping it entirely.

I guess you are talking about patch 7, "powerpc, perf: Re organize PMU
branch filter processing on POWER8". Patch 6 is used later on.

 
 
 That would give you a series with just:
 4   powerpc, perf: Restore privilege level filter support for BHRB
 7   powerpc, perf: Re organize PMU branch filter processing on POWER8
 8   powerpc, perf: Change the name of HW PMU branch filter tracking variable
 9   powerpc, lib: Add new branch analysis support functions
 10   powerpc, perf: Enable SW filtering in branch stack sampling framework
 11   powerpc, perf: Change POWER8 PMU configuration to work with SW filters
 12   powerpc, perf: Enable privilege mode SW branch filters
 13   selftests, powerpc: Add test for BHRB branch filters (HW & SW)
 
 That might make it easier for you to start getting the ground work in,
 and make it easier for others to understand what you're trying to do.

Sure, agreed. Here are the two sets of patches after the proposed split.
The patches are listed in reverse order, though. Hope this helps.

Generic cleanups and fixes
---

powerpc/perf: Re organize PMU branch filter processing on POWER8
powerpc/perf: Change name & type of 'pred' in power_pmu_bhrb_read
powerpc/perf: Replace last usage of get_cpu_var with this_cpu_ptr
powerpc/perf: Change type of the bhrb_users variable
powerpc/perf: Drop the branch sample when 'from' cannot be fetched

BHRB SW branch filter
--

selftests/powerpc: Add test for BHRB branch filters (HW & SW)
powerpc/perf: Enable privilege mode SW branch filters
powerpc/perf: Change POWER8 PMU configuration to work with SW filters
powerpc/perf: Enable SW filtering in branch stack sampling framework
powerpc/lib: Add new branch analysis support functions
powerpc/perf: Change the name of HW PMU branch filter tracking variable
powerpc/perf: Re organize BHRB processing
powerpc/perf: Restore privilege level filter support for BHRB

Regards
Anshuman


Re: [PATCH V9 06/13] powerpc, perf: Re organize BHRB processing

2015-06-25 Thread Anshuman Khandual
On 06/25/2015 11:22 AM, Daniel Axtens wrote:
 
 +static void insert_branch(struct cpu_hw_events *cpuhw,
 +int index, u64 from, u64 to, int mispred)
 Given that your previous patch made mispred a bool, this could take a
 bool too. It could probably be an inline function as well.

Sure. will change it.


Re: [RFC] powerpc, tm: Drop tm_orig_msr from thread_struct

2015-06-25 Thread Anshuman Khandual
On 04/24/2015 10:31 AM, Anshuman Khandual wrote:
 On 04/20/2015 01:45 PM, Anshuman Khandual wrote:
 Currently tm_orig_msr is getting used during process context switch only.
 Then there is ckpt_regs which saves the checkpointed userspace context.
 The MSR slot contained in ckpt_regs structure can be used during process
 context switch instead of tm_orig_msr, thus allowing us to drop it from
 thread_struct structure. This patch does that change.

 Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
 ---
 This issue came up in the discussion regarding ptrace interface for TM
 specific registers https://lkml.org/lkml/2015/4/20/100, so just wanted
 to give this a try. The basic TM tests still pass after this change.
 
 Hey Michael/Mikey,
 
 What are your thoughts on this ? Can we drop tm_orig_msr ?

Just wanted some input/suggestions/thoughts on this idea. I did not hear
from anyone on this. Will it create a problem anywhere if we drop
this variable?


Re: [GIT PULL 00/13] perf/core improvements and fixes

2015-06-25 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 25, 2015 at 09:31:41AM +0200, Ingo Molnar escreveu:
 Pulled, thanks a lot Arnaldo!
 
 Btw., one small thing I noticed about the status line in perf top: if I ever use
 'f' to freeze/unfreeze events, the following message:
 
   Press 'f' to disable the events or 'h' to see other hotkeys
 
 sticks around forever, even after I look into annotation and exit it, etc.
 
 So I don't mind some default, helpful message there (such as 'Press 'h' to see
 hotkeys'), but it appears this particular message is context and usage sensitive,
 which wasn't really the goal, right?

Agreed, some more work is needed to change that message in more places,
will do it eventually.

- Arnaldo

Re: [PATCH V9 04/13] powerpc, perf: Restore privilege level filter support for BHRB

2015-06-25 Thread Anshuman Khandual
On 06/25/2015 10:32 AM, Daniel Axtens wrote:
 
 diff --git a/arch/powerpc/perf/core-book3s.c 
 b/arch/powerpc/perf/core-book3s.c
 index 7a03cce..892340e 100644
 --- a/arch/powerpc/perf/core-book3s.c
 +++ b/arch/powerpc/perf/core-book3s.c
 @@ -930,7 +930,7 @@ static int power_check_constraints(struct cpu_hw_events 
 *cpuhw,
   * added events.
   */
  static int check_excludes(struct perf_event **ctrs, unsigned int cflags[],
 -  int n_prev, int n_new)
 +  int n_prev, int n_new, int bhrb_users)
 Shouldn't this be an unsigned int too?

Yeah it should be, will change it.

 
 -if (ppmu->flags & PPMU_ARCH_207S)
 +if ((ppmu->flags & PPMU_ARCH_207S) && !bhrb_users)
 This is now different to the others. Now that bhrb_users is unsigned,
 I'm happy if you want to revert all of them to be like this, I was just
 concerned that if bhrb_users is an int, both 1 and -1 evaluate to true
 and I wasn't sure that was the desired behaviour.

Sure, will change it.


[PATCH v3 0/7]powerpc/powernv: Nest Instrumentation support

2015-06-25 Thread Madhavan Srinivasan
This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Instrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via the PORE Engine)
to configure and move the nest counter data to memory. From the
kernel side, the OPAL call interface is used to activate/deactivate
the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes
the nest units and other related information, such as memory
regions and supported events, to the kernel via the device tree.

The kernel code then parses the device tree for nest PMU support
and registers the nest PMUs with the events available. The PORE Engine
collects and accumulates nest counter data in a per-chip reserved memory
region, hence the device tree also exports the per-chip nest accumulation
memory region. Individual event offsets are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list

  iTLB-load-misses   [Hardware cache event]

  Nest_Alink_BW/Alink0/  [Kernel PMU event]
  Nest_Alink_BW/Alink1/  [Kernel PMU event]
  Nest_Alink_BW/Alink2/  [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_00/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_01/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_02/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_03/   [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_00/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_01/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_02/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_03/  [Kernel PMU event]
  Nest_PowerBus_BW/External/ [Kernel PMU event]
  Nest_PowerBus_BW/Internal/ [Kernel PMU event]
  Nest_Xlink_BW/Xlink0/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink1/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink2/  [Kernel PMU event]

  rNNN                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier  [Raw hardware event descriptor]
.

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

 Performance counter stats for 'system wide':

CPU0  15,913.18 MiB  Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB  Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB  Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB  Nest_Xlink_BW/Xlink1/

   1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

 Performance counter stats for 'system wide':

CPU0       0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU32      0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU64      0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU96      0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU0   1,430.43 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU32    320.99 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU64  3,443.83 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU96  1,904.41 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU0   2,856.85 MiB  Nest_Alink_BW/Alink2/
CPU32      7.50 MiB  Nest_Alink_BW/Alink2/
CPU64  4,034.29 MiB  Nest_Alink_BW/Alink2/
CPU96    288.49 MiB  Nest_Alink_BW/Alink2/
.

OPAL side patches are posted in the skiboot mailing list.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved format attribute related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use the first online
   cpu of the chip, thereby making the code a lot simpler.

Changelog from v1:

1) No logic changes, re-ordered patches to make each patch compile
   without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog 

[PATCH v3 2/7]powerpc/powernv: Add OPAL support for Nest PMU

2015-06-25 Thread Madhavan Srinivasan
Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing list.

Cc: Stewart Smith stew...@linux.vnet.ibm.com
Cc: Jeremy Kerr j...@ozlabs.org
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Michael Ellerman m...@ellerman.id.au
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/opal-api.h| 3 ++-
 arch/powerpc/include/asm/opal.h| 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..8011a75 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,8 @@
 #define OPAL_FLASH_READ    110
 #define OPAL_FLASH_WRITE   111
 #define OPAL_FLASH_ERASE   112
-#define OPAL_LAST  112
+#define OPAL_NEST_IMA_CONTROL  116
+#define OPAL_LAST  116
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..f86e5e9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -201,6 +201,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, 
uint64_t buf,
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
 
+int64_t opal_nest_ima_control(uint32_t value);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..ce36a68 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
 OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
-- 
1.9.1


[PATCH v3 5/7]powerpc/powernv: add event attribute and group to nest pmu

2015-06-25 Thread Madhavan Srinivasan
Add code to create event/format attributes and attribute groups for
each nest pmu.
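
As a rough illustration of what the format attribute conveys: the `config:0-20` string tells the perf tool that the `event` field occupies bits 0 to 20 of `perf_event_attr.config`. The sketch below is plain userspace C and not part of the patch; `NEST_EVENT_BITS` and `nest_event_field()` are hypothetical names used only to show the masking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: width of the "event" field described by "config:0-20". */
#define NEST_EVENT_BITS 21

/* Extract bits 0..20 of a perf config value (illustrative helper only). */
static uint64_t nest_event_field(uint64_t config)
{
	const uint64_t mask = (UINT64_C(1) << NEST_EVENT_BITS) - 1;

	assert(NEST_EVENT_BITS < 64);
	return config & mask;
}
```

In the series itself the config value is later used as an offset into the per-chip reserved memory region, so anything above bit 20 would fall outside the advertised field.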

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/nest-pmu.c | 57 
 1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 6116ff3..20ed9f8 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
 
+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+   .name = "format",
+   .attrs = p8_nest_format_attrs,
+};
+
 static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
return 0;
 }
 
+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+   struct perf_pmu_events_attr *attr;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+   attr->event_str = str;
+   attr->attr.attr.name = name;
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = perf_event_sysfs_show;
+
+   return &attr->attr.attr;
+}
+
+int update_events_in_group(
+   struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+   struct attribute_group *attr_group;
+   struct attribute **attrs;
+   int i;
+
+   /* Allocate memory for event attribute group */
+   attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+   sizeof(*attr_group)), GFP_KERNEL);
+   if (!attr_group)
+   return -ENOMEM;
+
+   attrs = (struct attribute **)(attr_group + 1);
+   attr_group->name = "events";
+   attr_group->attrs = attrs;
+
+   for (i = 0; i < idx; i++, p8_events++)
+   attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+   (char *)p8_events->ev_value);
+
+   pmu->attr_groups[0] = attr_group;
+   return 0;
+}
+
 static int nest_pmu_create(struct device_node *dev, int pmu_index)
 {
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int 
pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+   pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}
 
@@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int 
pmu_index)
idx++;
}
 
+   update_events_in_group(
+   (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
 }
 
-- 
1.9.1


[PATCH v3 1/7]powerpc/powernv: Data structure and macros definition

2015-06-25 Thread Madhavan Srinivasan
Create new header file nest-pmu.h to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/nest-pmu.h | 53 
 1 file changed, 53 insertions(+)
 create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 000..ecb5d26
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS              32
+#define P8_NEST_MAX_PMUS               32
+#define P8_NEST_MAX_PMU_NAME_LEN       256
+#define P8_NEST_MAX_EVENTS_SUPPORTED   256
+#define P8_NEST_ENGINE_START           1
+#define P8_NEST_ENGINE_STOP            0
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   uint32_t chip_id;
+   uint64_t pbase;
+   uint64_t vbase;
+   uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+   const char *ev_name;
+   const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+   struct pmu pmu;
+   const struct attribute_group *attr_groups[4];
+};
-- 
1.9.1


[PATCH v3 4/7]powerpc/powernv: detect supported nest pmus and its events

2015-06-25 Thread Madhavan Srinivasan
Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from the DT contains the offset in the reserved memory
region at which the counter data for a given event can be read. Kernel code uses
this offset as the event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.
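
The property handling described above can be sketched in plain userspace C. This is an illustration only, not the kernel code from the patch: `enum nest_prop_kind` and `nest_classify_prop()` are hypothetical names, and the prefix rules (`unit.`, `scale.`, and the skipped bookkeeping properties) are taken from the diff below.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of a nest PMU device tree property name. */
enum nest_prop_kind {
	NEST_PROP_NAME,   /* "name": used to build the PMU name */
	NEST_PROP_SKIP,   /* phandle, device_type, linux,phandle */
	NEST_PROP_UNIT,   /* "unit.<event>.unit" */
	NEST_PROP_SCALE,  /* "scale.<event>.scale" */
	NEST_PROP_EVENT,  /* anything else: holds an event offset */
};

/*
 * Classify a property name and return, via attr_name, the name with any
 * "unit." or "scale." prefix stripped (illustrative helper only).
 */
static enum nest_prop_kind nest_classify_prop(const char *name,
					      const char **attr_name)
{
	assert(name != NULL && attr_name != NULL);
	*attr_name = name;

	if (!strcmp(name, "name"))
		return NEST_PROP_NAME;
	if (!strcmp(name, "phandle") || !strcmp(name, "device_type") ||
	    !strcmp(name, "linux,phandle"))
		return NEST_PROP_SKIP;
	if (strncmp(name, "unit.", 5) == 0) {
		*attr_name = name + 5;
		return NEST_PROP_UNIT;
	}
	if (strncmp(name, "scale.", 6) == 0) {
		*attr_name = name + 6;
		return NEST_PROP_SCALE;
	}
	return NEST_PROP_EVENT;
}
```

With this split, a property such as `unit.Alink0.unit` is exposed under the attribute name `Alink0.unit`, which is the naming the perf tool expects for per-event unit files.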

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/nest-pmu.c | 124 ++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed..6116ff3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,119 @@
 #include "nest-pmu.h"
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *start,
+   struct nest_ima_events *p8_events, int flg, u32 val)
+{
+   char *buf;
+
+   /* memory for event name */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   strncpy(buf, start, strlen(start));
+   p8_events->ev_name = buf;
+
+   /* memory for content */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   if (flg) {
+   /* string content*/
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length))
+   return -EINVAL;
+
+   strncpy(buf, (const char *)pp->value, pp->length);
+   } else
+   sprintf(buf, "event=0x%x", val);
+
+   p8_events->ev_value = buf;
+   return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+   struct nest_ima_events **p8_events_arr, *p8_events;
+   struct nest_pmu *pmu_ptr;
+   struct property *pp;
+   char *buf, *start;
+   const __be32 *lval;
+   u32 val;
+   int idx = 0, ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   /* memory for nest pmus */
+   pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+   if (!pmu_ptr)
+   return -ENOMEM;
+
+   /* Needed for hotplug/migration */
+   per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+   /* memory for nest pmu events */
+   p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+   GFP_KERNEL);
+   if (!p8_events_arr)
+   return -ENOMEM;
+   p8_events = (struct nest_ima_events *)p8_events_arr;
+
+   /*
+* Loop through each property
+*/
+   for_each_property_of_node(dev, pp) {
+   start = pp->name;
+
+   if (!strcmp(pp->name, "name")) {
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length))
+   return -EINVAL;
+
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   /* Save the name to register it later */
+   sprintf(buf, "Nest_%s", (char *)pp->value);
+   pmu_ptr->pmu.name = (char *)buf;
+   continue;
+   }
+
+   /* Skip these, we dont need it */
+   if (!strcmp(pp->name, "phandle") ||
+   !strcmp(pp->name, "device_type") ||
+   !strcmp(pp->name, "linux,phandle"))
+   continue;
+
+   if (strncmp(pp->name, "unit.", 5) == 0) {
+   /* Skip first few chars in the name */
+   start += 5;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else if (strncmp(pp->name, "scale.", 6) == 0) {
+   /* Skip first few chars in the name */
+   start += 6;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else {
+   lval = of_get_property(dev, pp->name, NULL);
+   val = (uint32_t)be32_to_cpup(lval);
+
+   ret = nest_event_info(pp, start, p8_events++, 0, val);
+   }
+
+   if (ret)
+   return ret;
+
+   /* book keeping 

[PATCH v3 3/7]powerpc/powernv: Nest PMU detection and device tree parser

2015-06-25 Thread Madhavan Srinivasan
Create a file nest-pmu.c to contain nest pmu related functions. Add code
to detect nest pmu support and a parser to collect per-chip reserved memory
region information from the device tree (DT).

The detection mechanism is to look for the specific property ibm,ima-chip in the DT.
For the Nest pmu, the device tree will have two sets of information:
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

The device tree layout for the Nest PMU is as follows.

  / -- DT root folder
  |
  -nest-ima -- Nest PMU folder
   |
   -ima-chip@chip-id    -- Per-chip folder for reserved region information
    |
    -ibm,chip-id        -- Chip id
    -ibm,ima-chip
    -reg                -- HOMER PORE Nest Counter collection Address (RA)
    -size               -- size to map in kernel space

   -Alink_BW            -- Nest PMU folder
    |
    -Alink0             -- Nest PMU Alink Event file
    -scale.Alink0.scale -- Event scale file
    -unit.Alink0.unit   -- Event unit file
    -device_type        -- nest-ima-unit marker

Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/Makefile   |  2 +-
 arch/powerpc/perf/nest-pmu.c | 85 
 2 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS)   += callchain.o
 obj-$(CONFIG_PPC_PERF_CTRS)    += core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
-  power8-pmu.o
+  power8-pmu.o nest-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 000..e7d45ed
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+   const __be32 *gcid;
+   const __be64 *chip_ima_reg;
+   const __be64 *chip_ima_size;
+   struct device_node *dev;
+   struct perchip_nest_info *p8ni;
+   int idx;
+
+   /*
+* nest-ima folder contains two things,
+* a) per-chip reserved memory region for Nest PMU Counter data
+* b) Support Nest PMU units and their event files
+*/
+   for_each_node_with_property(dev, "ibm,ima-chip") {
+   gcid = of_get_property(dev, "ibm,chip-id", NULL);
+   chip_ima_reg = of_get_property(dev, "reg", NULL);
+   chip_ima_size = of_get_property(dev, "size", NULL);
+
+   if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+   pr_err("Nest_PMU: device %s missing property\n",
+   dev->full_name);
+   return -ENODEV;
+   }
+
+   /* chip id to save reserve memory region */
+   idx = (uint32_t)be32_to_cpup(gcid);
+
+   /*
+* Using a local variable to make it compact and
+* easier to read
+*/
+   p8ni = &p8_nest_perchip_info[idx];
+   p8ni->pbase = be64_to_cpup(chip_ima_reg);
+   p8ni->size = be64_to_cpup(chip_ima_size);
+   p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+   }
+
+   return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+   int ret = -ENODEV;
+
+   /*
+* Lets do this only if we are hypervisor
+*/
+   if (!cur_cpu_spec->oprofile_cpu_type ||
+   !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+   !cpu_has_feature(CPU_FTR_HVMODE))
+   return ret;
+
+   /*
+* Nest PMU information is grouped under nest-ima node
+* of the top-level device-tree directory. Detect Nest PMU
+ 

[PATCH v3 6/7]powerpc/powernv: generic nest pmu event functions

2015-06-25 Thread Madhavan Srinivasan
Add set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/nest-pmu.c | 104 +++
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 20ed9f8..c2ada13 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,100 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static int p8_nest_event_init(struct perf_event *event)
+{
+   int chip_id;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported yet */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   event->hw.event_base = event->attr.config +
+   p8_nest_perchip_info[chip_id].vbase;
+
+   return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+   uint64_t *addr;
+   u64 data = 0;
+
+   addr = (u64 *)event->hw.event_base;
+   data = __be64_to_cpu(*addr);
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count;
+   uint64_t *addr;
+
+   addr = (uint64_t *)event->hw.event_base;
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(*addr);
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+   event->hw.state = 0;
+   p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_UPDATE)
+   p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_START)
+   p8_nest_event_start(event, flags);
+
+   return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+   if (!pmu)
+   return -EINVAL;
+
+   pmu->pmu.task_ctx_nr = perf_invalid_context;
+   pmu->pmu.event_init = p8_nest_event_init;
+   pmu->pmu.add = p8_nest_event_add;
+   pmu->pmu.del = p8_nest_event_stop;
+   pmu->pmu.start = p8_nest_event_start;
+   pmu->pmu.stop = p8_nest_event_stop;
+   pmu->pmu.read = p8_nest_perf_event_update;
+   pmu->pmu.attr_groups = pmu->attr_groups;
+
+   return 0;
+}
+
 static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -179,6 +273,16 @@ static int nest_pmu_create(struct device_node *dev, int 
pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
 
+   update_pmu_ops(pmu_ptr);
+   /* Register the pmu */
+   ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+   if (ret) {
+   pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+   return ret;
+   }
+
+   pr_info("%s performance monitor hardware support registered\n",
+   pmu_ptr->pmu.name);
return 0;
 }
 
-- 
1.9.1


[PATCH v3 7/7]powerpc/powernv: nest pmu cpumask and cpu hotplug support

2015-06-25 Thread Madhavan Srinivasan
Add a cpumask attribute to be used by each nest pmu, since nest
units are per-chip. Only one cpu (the first online cpu) from each node/chip
is designated to read counters.

On cpu hotplug, the dying cpu is checked to see whether it is one of the
designated cpus; if so, the next online cpu from the same node/chip is designated
as the new cpu to read counters.

Cc: Michael Ellerman m...@ellerman.id.au
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Anton Blanchard an...@samba.org
Cc: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Cc: Anshuman Khandual khand...@linux.vnet.ibm.com
Cc: Stephane Eranian eran...@google.com
Cc: Preeti U Murthy pre...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Peter Zijlstra pet...@infradead.org
Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/perf/nest-pmu.c | 146 +++
 1 file changed, 146 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c2ada13..31943c5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,147 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+   .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+   opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+   int nid, target = -1;
+   struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Dont bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+   return;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);
+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target <= nr_cpu_ids) {
+   cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(cpu, target);
+   }
+}
+
+static void nest_init_cpu(int cpu)
+{
+   int nid, fcpu, ncpu;
+   struct cpumask *l_cpumask, tmp_mask;
+
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* if empty cpumask, just add incoming cpu and move on.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   return;
+   }
+
+   /*
+* Alway have the first online cpu of a chip as designated one.
+*/
+   fcpu = cpumask_first(l_cpumask);
+   ncpu = cpumask_next(cpu, l_cpumask);
+   if (cpu == fcpu) {
+   if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(ncpu, cpu);
+   }
+   }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+   unsigned long action, void *hcpu)
+{
+   long cpu = (long)hcpu;
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   nest_init_cpu(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+  nest_exit_cpu(cpu);
+  break;
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+   .notifier_call  = nest_pmu_cpu_notifier,
+   .priority   = CPU_PRI_PERF + 1,
+};
+
+void nest_pmu_cpumask_init(void)
+{
+   const struct cpumask *l_cpumask;
+   int cpu, nid;
+
+   cpu_notifier_register_begin();
+
+   /*
+* Nest PMUs are per-chip counters. So designate a cpu
+* from 

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-25 Thread Eric B Munson
On Tue, 23 Jun 2015, Vlastimil Babka wrote:

 On 06/15/2015 04:43 PM, Eric B Munson wrote:
 Note that the semantic of MAP_LOCKED can be subtly surprising:
 
 mlock(2) fails if the memory range cannot get populated to guarantee
 that no future major faults will happen on the range.
 mmap(MAP_LOCKED) on the other hand silently succeeds even if the
 range was populated only
 partially.
 
 ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
 
 So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While
 MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's
 sufficient reason not to extend mmap by new mlock() flags that can
 be instead applied to the VMA after mmapping, using the proposed
 mlock2() with flags. So I think instead we could deprecate
 MAP_LOCKED more prominently. I doubt the overhead of calling the
 extra syscall matters here?
 
 We could talk about retiring the MAP_LOCKED flag but I suspect that
 would get significantly more pushback than adding a new mmap flag.
 
 Oh no we can't retire as in remove the flag, ever. Just not
 continue the way of mmap() flags related to mlock().
 
 Likely that the overhead does not matter in most cases, but presumably
 there are cases where it does (as we have a MAP_LOCKED flag today).
 Even with the proposed new system calls I think we should have the
 MAP_LOCKONFAULT for parity with MAP_LOCKED.
 
 I'm not convinced, but it's not a major issue.
 
 
 - mlock() takes a `flags' argument.  Presently that's
MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKEDONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.
 
 Do you disagree with the use cases I have listed or do you think there
 is a better way of addressing those cases?
 
 I'm somewhat sceptical about the security one. Are security
 sensitive buffers that large to matter? The performance one is more
 convincing and I don't see a better way, so OK.

They can be, the two that come to mind are medical images and high
resolution sensor data.

 
 
 
 What do others think?
 



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-25 Thread Andy Lutomirski
On Thu, Jun 25, 2015 at 7:16 AM, Eric B Munson emun...@akamai.com wrote:
 On Tue, 23 Jun 2015, Vlastimil Babka wrote:

 On 06/15/2015 04:43 PM, Eric B Munson wrote:
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.
 
 Do you disagree with the use cases I have listed or do you think there
 is a better way of addressing those cases?

 I'm somewhat sceptical about the security one. Are security
 sensitive buffers that large to matter? The performance one is more
 convincing and I don't see a better way, so OK.

 They can be, the two that come to mind are medical images and high
 resolution sensor data.

I think we've been handling sensitive memory pages wrong forever.  We
shouldn't lock them into memory; we should flag them as sensitive and
encrypt them if they're ever written out to disk.

--Andy

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-25 Thread Eric B Munson
On Wed, 24 Jun 2015, Michal Hocko wrote:

 On Mon 22-06-15 10:18:06, Eric B Munson wrote:
  On Mon, 22 Jun 2015, Michal Hocko wrote:
  
   On Fri 19-06-15 12:43:33, Eric B Munson wrote:
 [...]
Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)? 
   
   I thought the MAP_FAULTPOPULATE (or any other better name) would
   directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
   locked semantic. We already have VM_LOCKED for that. The direct effect
   of the flag would be to prevent from population other than the direct
   page fault - including any speculative actions like fault around or
   read-ahead.
  
  I like the ability to control other speculative population, but I am not
  sure about overloading it with the VM_LOCKONFAULT case.  Here is my
  concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
  LOCKONFAULT, how can we tell the difference between someone that wants
  to avoid read-ahead and wants to use mlock()?
 
 Not sure I understand. Something like?
 addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma
 [...]
 mlock(addr, len) # Now I want the full mlock semantic

So this leaves us without the LOCKONFAULT semantics?  That is not at all
what I am looking for.  What I want is a way to express 3 possible
states of a VMA WRT locking, locked (populated and all pages on the
unevictable LRU), lock on fault (populated by page fault, pages that are
present are on the unevictable LRU, newly faulted pages are added to
same), and not locked.
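
The three states described above can be written down as a small model. This is only a sketch of the semantics as stated in this message, not kernel code; `enum vma_lock_state` and both helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three locking states of a VMA. */
enum vma_lock_state {
	VMA_NOT_LOCKED,     /* ordinary, evictable mapping */
	VMA_LOCK_ON_FAULT,  /* pages become unevictable as they are faulted in */
	VMA_LOCKED,         /* populated up front, all pages unevictable */
};

static bool vma_state_valid(enum vma_lock_state s)
{
	return s == VMA_NOT_LOCKED || s == VMA_LOCK_ON_FAULT ||
	       s == VMA_LOCKED;
}

/* Is the whole range populated at the time the lock is requested? */
static bool populated_at_lock_time(enum vma_lock_state s)
{
	assert(vma_state_valid(s));
	return s == VMA_LOCKED;
}

/* Does a newly faulted page go onto the unevictable LRU? */
static bool new_fault_is_unevictable(enum vma_lock_state s)
{
	assert(vma_state_valid(s));
	return s != VMA_NOT_LOCKED;
}
```

The only difference between the two locked states in this model is whether population happens up front, which is the distinction the VM_FAULTPOPULATE | VM_LOCKED encoding would have to preserve.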

 
 and the later to have the full mlock semantic and populate the given
 area regardless of VM_FAULTPOPULATE being set on the vma? This would
 be an interesting question because mlock man page clearly states the
 semantic and that is to _always_ populate or fail. So I originally
 thought that it would obey VM_FAULTPOPULATE but this needs more
 thinking.
 
  This might lead to some
  interesting states with mlock() and munlock() that take flags.  For
  instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
  munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
  VM_LOCKONFAULT set. 
 
 This is really confusing. Let me try to rephrase that. So you have
 mlock(addr, len, MLOCK_ONFAULT)
 munlock(addr, len, MLOCK_LOCKED)
 
 IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't
 that behavior strange and unexpected? First of all, munlock has
 traditionally dropped the lock on the address range (e.g. what should
 happen if you did plain old munlock(addr, len)). But even without
 that. You are trying to unlock something that hasn't been locked the
 same way. So I would expect -EINVAL at least, if the two modes should be
 really represented by different flags.

I would expect it to remain MLOCK_LOCKONFAULT because the user requested
munlock(addr, len, MLOCK_LOCKED).  It is not currently an error to
unlock memory that is not locked.  We do this because we do not require
the user track what areas are locked.  It is acceptable to have a mostly
locked area with holes unlocked with a single call to munlock that spans
the entire area.  The same semantics should hold for munlock with flags.
If I have an area with MLOCK_LOCKED and MLOCK_ONFAULT interleaved, it
should be acceptable to clear the MLOCK_ONFAULT flag from those areas
with a single munlock call that spans the area.
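
To make the proposed munlock-with-flags semantics concrete, here is a toy model over an array of regions. It assumes each region carries exactly one state and that unlocking with a flag simply skips regions that do not carry it; `enum region_lock` and `model_munlock()` are hypothetical names, not the interface from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-region lock state for the model. */
enum region_lock { REGION_UNLOCKED, REGION_ONFAULT, REGION_LOCKED };

/*
 * Clear the lock type `which` from every region in [start, end).
 * Regions in any other state are left untouched and are not an error,
 * mirroring how munlock() tolerates ranges that are not locked.
 * Returns the number of regions whose state changed.
 */
static size_t model_munlock(enum region_lock *regions, size_t start,
			    size_t end, enum region_lock which)
{
	size_t changed = 0;

	assert(regions != NULL && start <= end);
	if (which == REGION_UNLOCKED)
		return 0;	/* nothing to clear */

	for (size_t i = start; i < end; i++) {
		if (regions[i] == which) {
			regions[i] = REGION_UNLOCKED;
			changed++;
		}
	}
	return changed;
}
```

Because mismatched regions are skipped rather than rejected, the call never has to undo partial work, which is the rollback concern raised in the next paragraph.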

On top of continuing with munlock semantics, the implementation would
need the ability to rollback an munlock call if it failed after altering
VMAs.  If we have the same interleaved area as before and we go to
return -EINVAL the first time we hit an area that was MLOCK_LOCKED, how
do we restore the state of the VMAs we have already processed, and
possibly merged/split?
 
 Or did you mean the both types of lock like:
 mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT)
 mlock(addr, len, MLOCK_LOCKED)
 munlock(addr, len, MLOCK_LOCKED)
 
 and that should keep MLOCK_ONFAULT?
 This sounds even more weird to me because that means that the vma in
 question would be locked by two different mechanisms. MLOCK_LOCKED with
 the always populate semantic would rule out MLOCK_ONFAULT so what
 would be the meaning of the other flag then? Also what should regular
 munlock(addr, len) without flags unlock? Both?

This is indeed confusing and not what I was trying to illustrate, but
since you bring it up: mlockall() currently clears all flags and then
sets the new flags with each subsequent call.  mlock2 would use that
same behavior: if LOCKED was specified for an ONFAULT region, that region
would become LOCKED and vice versa.

I have the new system call set ready, I am waiting to post for rc1 so I
can run the benchmarks again on a base more stable than the middle of a
merge window.  We should wait to hash out implementations until the code
is up rather than talk past each other here.

 
  If we use VM_FAULTPOPULATE, the same pair of calls
  would clear VM_LOCKED, but leave 

Re: powerpc,numa: Memory hotplug to memory-less nodes ?

2015-06-25 Thread Nishanth Aravamudan
On 24.06.2015 [07:13:36 -0500], Nathan Fontenot wrote:
 On 06/23/2015 11:01 PM, Bharata B Rao wrote:
  So will it be correct to say that memory hotplug to memory-less node
  isn't supported by PowerPC kernel ? Should I enforce the same in QEMU
  for PowerKVM ?
 
 
 I'm not sure if that is correct. It appears that we initialize all online
 nodes, even those without spanned_pages, at boot time. This occurs
 in setup_node_data() called from initmem_init().
 
 Looking at this I would think that we could add memory to any online node
 even if it does not have any spanned_pages. I think an interesting test
 would be to check for the node being online instead of checking to see if
 it has any memory.

I see no *technical* reason we shouldn't be able to hotplug to an
initially memoryless node. I'm not sure it happens in practice under
PowerVM (where we have far less control of the topology anyways). One
aspect of this that I have on my todo list is seeing what SLUB does when
a node goes from memoryless to populated -- as during boot memoryless
nodes get a 'useless' per node structure (early_kmem_cache_node_alloc).

I can look at this a bit under KVM maybe later this week myself to see
what happens in a guest.

-Nish


Re: [8/8] powerpc/perf: cleanup in perf_event_print_debug()

2015-06-25 Thread Michael Ellerman
On Thu, 2015-11-06 at 08:43:37 UTC, Madhavan Srinivasan wrote:
 From: Janani janan...@linux.vnet.ibm.com
 
 Code cleanup/fix in perf_event_print_debug(). Performance
 Monitoring Unit (PMU) registers in the server side
 are 64bit long.

No they're not, see the ISA, figure 17.

cheers

Re: [v2,9/9] fsl/fman: Add FMan MAC driver

2015-06-25 Thread Paul Bolle
(Evolution 3.16 is basically unbearable for replying to patches. Anyone
else running into this?) 

On Wed, 2015-06-24 at 22:37 +0300, igal.liber...@freescale.com wrote:
 
 --- /dev/null
 +++ b/drivers/net/ethernet/freescale/fman/mac/mac-api.c
 +int set_mac_active_pause(struct mac_device *mac_dev, bool rx, bool tx)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(set_mac_active_pause);

Which module is using this function?

 +void get_pause_cfg(struct mac_device *mac_dev, bool *rx_pause, bool 
 *tx_pause)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(get_pause_cfg);

This exports a function that is only used in this file. Why? 

Thanks,


Paul Bolle

Re: [v2,8/9] fsl/fman: Add FMan Port Support

2015-06-25 Thread Paul Bolle
On Wed, 2015-06-24 at 22:37 +0300, igal.liber...@freescale.com wrote:
 --- a/drivers/net/ethernet/freescale/fman/fm_drv.c
 +++ b/drivers/net/ethernet/freescale/fman/fm_drv.c

 +struct fm_port_t *fm_port_drv_handle(const struct fm_port_drv_t *port)
 +{
 + return port->fm_port;
 +}
 +EXPORT_SYMBOL(fm_port_drv_handle);

I couldn't find any users of this function.

 +void fm_port_get_buff_layout_ext_params(struct fm_port_drv_t *port,
 +struct fm_port_params *params)

(Evolution 3.16 is a piece of ...).

 +{
 + params->data_align = 0;
 +}
 +EXPORT_SYMBOL(fm_port_get_buff_layout_ext_params);

Ditto.

 +int fm_get_tx_port_channel(struct fm_port_drv_t *port)
 +{
 + return port->tx_ch;
 +}
 +EXPORT_SYMBOL(fm_get_tx_port_channel);

Ditto.

 --- /dev/null
 +++ b/drivers/net/ethernet/freescale/fman/fm_port_drv.c

 +void fm_set_rx_port_params(struct fm_port_drv_t *port,
 +struct fm_port_params *params)
 +{
+   [...]
 +}
 +EXPORT_SYMBOL(fm_set_rx_port_params);

Ditto.

(If you hear about my arrest for randomly attacking innocent people:
blame evolution 3.16!)

 +void fm_set_tx_port_params(struct fm_port_drv_t *port,
 +struct fm_port_params *params)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(fm_set_tx_port_params);

Ditto.

 --- /dev/null
 +++ b/drivers/net/ethernet/freescale/fman/port/fm_port.c

 +u64 *fm_port_get_buffer_time_stamp(const struct fm_port_t *p_fm_port,
 +char *p_data)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(fm_port_get_buffer_time_stamp);

Ditto.

 +int fm_port_disable(struct fm_port_t *p_fm_port)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(fm_port_disable);

This exports a function that I think is only used inside this file.

 +int fm_port_enable(struct fm_port_t *p_fm_port)
 +{
 + [...]
 +}
 +EXPORT_SYMBOL(fm_port_enable);

And here I could again find no users of this function.

Thanks,


Paul Bolle

Re: [v2,9/9] fsl/fman: Add FMan MAC driver

2015-06-25 Thread Michael Ellerman
On Thu, 2015-06-25 at 19:59 -0500, Scott Wood wrote:
 On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote:
  (Evolution 3.16 is basically unbearable for replying to patches. 
  Anyone
  else running into this?) 
 
 If you mean the crazy lag when selecting moderate-to-large amounts of 
 text (for snipping), yes.

I recommend the external editor plugin with vim.

cheers



Re: [v2,9/9] fsl/fman: Add FMan MAC driver

2015-06-25 Thread Scott Wood
On Fri, 2015-06-26 at 12:21 +1000, Michael Ellerman wrote:
 On Thu, 2015-06-25 at 19:59 -0500, Scott Wood wrote:
  On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote:
   (Evolution 3.16 is basically unbearable for replying to patches. 
   Anyone
   else running into this?) 
  
  If you mean the crazy lag when selecting moderate-to-large amounts of 
  text (for snipping), yes.
 
 I recommend the external editor plugin with vim.

I tried the external editor plugin (not with vim) and it failed to bring the 
externally made edits back into the evolution compose window.  It then 
started spastically respawning the external editor without my doing anything.

-Scott



Re: [v2,5/9] fsl/fman: Add Frame Manager support

2015-06-25 Thread Scott Wood
On Wed, 2015-06-24 at 22:35 +0300, igal.liber...@freescale.com wrote:
 From: Igal Liberman igal.liber...@freescale.com
 
 Add Frame Manager Driver support.
 This patch adds The FMan configuration, initialization and
 runtime control routines.
 
 Signed-off-by: Igal Liberman igal.liber...@freescale.com
 ---
  drivers/net/ethernet/freescale/fman/Kconfig|   35 +
  drivers/net/ethernet/freescale/fman/Makefile   |2 +-
  drivers/net/ethernet/freescale/fman/fm.c   | 1406 
 
  drivers/net/ethernet/freescale/fman/fm.h   |  394 ++
  drivers/net/ethernet/freescale/fman/fm_common.h|  142 ++
  drivers/net/ethernet/freescale/fman/fm_drv.c   |  701 ++
  drivers/net/ethernet/freescale/fman/fm_drv.h   |  116 ++
  drivers/net/ethernet/freescale/fman/inc/enet_ext.h |  199 +++
  drivers/net/ethernet/freescale/fman/inc/fm_ext.h   |  488 +++
  .../net/ethernet/freescale/fman/inc/fsl_fman_drv.h |   99 ++
  drivers/net/ethernet/freescale/fman/inc/service.h  |   55 +
  11 files changed, 3636 insertions(+), 1 deletion(-)
  create mode 100644 drivers/net/ethernet/freescale/fman/fm.c
  create mode 100644 drivers/net/ethernet/freescale/fman/fm.h
  create mode 100644 drivers/net/ethernet/freescale/fman/fm_common.h
  create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.c
  create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.h
  create mode 100644 drivers/net/ethernet/freescale/fman/inc/enet_ext.h
  create mode 100644 drivers/net/ethernet/freescale/fman/inc/fm_ext.h
  create mode 100644 drivers/net/ethernet/freescale/fman/inc/fsl_fman_drv.h
  create mode 100644 drivers/net/ethernet/freescale/fman/inc/service.h

Again, please start with something pared down, without extraneous features, 
but *with* enough functionality to actually pass packets around.  Getting 
this thing into decent shape is going to be hard enough without carrying 
around the excess baggage.

 diff --git a/drivers/net/ethernet/freescale/fman/Kconfig 
 b/drivers/net/ethernet/freescale/fman/Kconfig
 index 825a0d5..12c75bfd 100644
 --- a/drivers/net/ethernet/freescale/fman/Kconfig
 +++ b/drivers/net/ethernet/freescale/fman/Kconfig
 @@ -7,3 +7,38 @@ config FSL_FMAN
   Freescale Data-Path Acceleration Architecture Frame Manager
   (FMan) support
  
 +if FSL_FMAN
 +
 +config FSL_FM_MAX_FRAME_SIZE
 + int "Maximum L2 frame size"
 + range 64 9600
 + default 1522
 + help
 + Configure this in relation to the maximum possible MTU of your
 + network configuration. In particular, one would need to
 + increase this value in order to use jumbo frames.
 + FSL_FM_MAX_FRAME_SIZE must accommodate the Ethernet FCS
 + (4 bytes) and one ETH+VLAN header (18 bytes), to a total of
 + 22 bytes in excess of the desired L3 MTU.
 +
 + Note that having too large a FSL_FM_MAX_FRAME_SIZE (much larger
 + than the actual MTU) may lead to buffer exhaustion, especially
 + in the case of badly fragmented datagrams on the Rx path.
 + Conversely, having a FSL_FM_MAX_FRAME_SIZE smaller than the
 + actual MTU will lead to frames being dropped.

Scatter gather can't be used for jumbo frames?

Why is this a compile-time option?
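
For reference, the arithmetic in the quoted help text can be sketched as a small helper. This is a sketch under stated assumptions: the 18-byte and 4-byte constants come from the help text, while the macro names and `max_frame_size_for_mtu()` are hypothetical and do not appear in the patch.

```c
/* Sizes taken from the quoted Kconfig help text. */
#define SKETCH_ETH_VLAN_HDR_LEN	18	/* one ETH+VLAN header */
#define SKETCH_ETH_FCS_LEN	4	/* Ethernet FCS */

/* Hypothetical helper: L2 frame size needed for a given L3 MTU,
 * i.e. the MTU plus 22 bytes of header and FCS overhead. */
static inline unsigned int max_frame_size_for_mtu(unsigned int l3_mtu)
{
	return l3_mtu + SKETCH_ETH_VLAN_HDR_LEN + SKETCH_ETH_FCS_LEN;
}
```

With a standard 1500-byte MTU this gives the default of 1522, and the upper bound of 9600 in the quoted range corresponds to an MTU of 9578.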

 +
 +config FSL_FM_RX_EXTRA_HEADROOM
 + int "Add extra headroom at beginning of data buffers"
 + range 16 384
 + default 64
 + help
 + Configure this to tell the Frame Manager to reserve some extra
 + space at the beginning of a data buffer on the receive path,
 + before Internal Context fields are copied. This is in addition
 + to the private data area already reserved for driver internal
 + use. The provided value must be a multiple of 16.
 +
 + This option does not affect in any way the layout of
 + transmitted buffers.

There's nothing here to indicate when a user would want to do this.

Why is this a compile-time option?

 + /* FManV3H */
 + else if (minor == 0 || minor == 2 || minor == 3) {
 + intg->fm_muram_size = 384 * 1024;
 + intg->fm_iram_size  = 64 * 1024;
 + intg->fm_num_of_ctrl= 4;
 +
 + intg->bmi_max_num_of_tasks  = 128;
 + intg->bmi_max_num_of_dmas   = 84;
 +
 + intg->num_of_rx_ports   = 8;
 + } else {
 + pr_err("Unsupported FManv3 version\n");
 + kfree(intg);
 + return NULL;
 + }
 +
 + break;
 + default:
 + pr_err("Unsupported FMan version\n");
 + kfree(intg);
 + return NULL;
 + }

Don't duplicate error paths.  Use goto like the rest of the kernel.
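
A minimal userspace sketch of the single-exit pattern being asked for, assuming a heavily simplified stand-in for the quoted function. The struct, the function name `fill_intg_params()` and the placeholder major version value are hypothetical; only the minor-version check and the two field values come from the quoted fragment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder value for illustration only; not taken from the patch. */
#define SKETCH_FMAN_V3_MAJOR	6

struct intg_params {
	unsigned int muram_size;
	unsigned int num_of_rx_ports;
};

struct intg_params *fill_intg_params(int major, int minor)
{
	struct intg_params *intg = calloc(1, sizeof(*intg));

	if (!intg)
		return NULL;

	switch (major) {
	case SKETCH_FMAN_V3_MAJOR:
		if (minor == 0 || minor == 2 || minor == 3) {
			intg->muram_size = 384 * 1024;
			intg->num_of_rx_ports = 8;
		} else {
			goto err_unsupported;
		}
		break;
	default:
		goto err_unsupported;
	}

	return intg;

err_unsupported:
	/* One cleanup path shared by every unsupported-version case. */
	fprintf(stderr, "Unsupported FMan version %d.%d\n", major, minor);
	free(intg);
	return NULL;
}
```

Each failing branch jumps to the same label, so the message and the free live in one place and cannot drift apart when new versions are added.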

 +
 + intg-bmi_max_fifo_size = 

Re: [PATCH v4] powerpc/rcpm: add RCPM driver

2015-06-25 Thread Scott Wood
On Tue, 2015-06-23 at 16:07 +0800, yuantian.t...@freescale.com wrote:
 From: Tang Yuantian yuantian.t...@freescale.com
 
 There is a RCPM (Run Control/Power Management) in Freescale QorIQ
 series processors. The device performs tasks associated with device
 run control and power management.
 
 The driver implements some features: mask/unmask irq, enter/exit low
 power states, freeze time base, etc.
 
 Signed-off-by: Chenhui Zhao chenhui.z...@freescale.com
 Signed-off-by: Tang Yuantian yuantian.t...@freescale.com
 ---
 v4:
   - refine bindings document
 v3:
   - added static and __init modifier to fsl_rcpm_init
 v2:
   - fix code style issues
   - refine compatible string match part
 
  Documentation/devicetree/bindings/soc/fsl/rcpm.txt |  42 +++
  arch/powerpc/include/asm/fsl_guts.h| 105 +++
  arch/powerpc/include/asm/fsl_pm.h  |  48 +++
  arch/powerpc/platforms/85xx/Kconfig|   1 +
  arch/powerpc/sysdev/Kconfig|   5 +
  arch/powerpc/sysdev/Makefile   |   1 +
  arch/powerpc/sysdev/fsl_rcpm.c | 338 
 +
  7 files changed, 540 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/soc/fsl/rcpm.txt
  create mode 100644 arch/powerpc/include/asm/fsl_pm.h
  create mode 100644 arch/powerpc/sysdev/fsl_rcpm.c
 
 diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt 
 b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
 new file mode 100644
 index 000..1f58018
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
 @@ -0,0 +1,42 @@
 +* Run Control and Power Management
 +
 +The RCPM performs all device-level tasks associated with device run control
 +and power management.
 +
 +Required properties:
 +  - reg : Offset and length of the register set of RCPM block.
 +  - compatible : Should contain a chip-specific RCPM block compatible string
 + and (if applicable) may contain a chassis-version RCPM compatible 
 string.
 + Chip-specific strings are of the form "fsl,<chip>-rcpm", such as:
 + * fsl,p2041-rcpm
 + * fsl,p3041-rcpm
 + * fsl,p4080-rcpm
 + * fsl,p5020-rcpm
 + * fsl,p5040-rcpm
 + * fsl,t4240-rcpm
 + * fsl,b4420-rcpm
 + * fsl,b4860-rcpm
 +
 + Chassis-version RCPM strings include:
 + * fsl,qoriq-rcpm-1.0: for chassis 1.0 rcpm
 + * fsl,qoriq-rcpm-2.0: for chassis 2.0 rcpm
 +
 +All references to 1.0 and 2.0 refer to the QorIQ chassis version to
 +which the chip complies.
 +Chassis Version  Example Chips
 +---  ---
 +1.0  p4080, p5020, p5040, p2041, p3041
 +2.0  t4240, b4860, t1040, b4420

I don't think it's accurate to call t1040 chassis 2.0.

-Scott


Re: [v2,3/9] fsl/fman: Add the FMan MAC FLIB

2015-06-25 Thread Scott Wood
On Wed, 2015-06-24 at 22:34 +0300, igal.liber...@freescale.com wrote:
 From: Igal Liberman igal.liber...@freescale.com
 
 The FMan MAC FLib provides basic API used by the drivers to
 configure and control the FMan MAC hardware.
 
 Signed-off-by: Igal Liberman igal.liber...@freescale.com
...
  +int fman_dtsec_mii_write_reg(struct dtsec_mii_reg __iomem *regs, uint8_t 
 addr,
 +  uint8_t reg, uint16_t data, uint16_t dtsec_freq)
 +{
 + uint32_t tmp;
 +
 + /* Setup the MII Mgmt clock speed */
 + iowrite32be((uint32_t)dtsec_mii_get_div(dtsec_freq), &regs->miimcfg);
 + /* Memory barrier */
 + wmb();
 +
 + /* Stop the MII management read cycle */
 + iowrite32be(0, &regs->miimcom);
 + /* Dummy read to make sure MIIMCOM is written */
 + tmp = ioread32be(&regs->miimcom);
 + /* Memory barrier */
 + wmb();
 +
 + /* Setting up MII Management Address Register */
 + tmp = (uint32_t)((addr << MIIMADD_PHY_ADDR_SHIFT) | reg);
 + iowrite32be(tmp, &regs->miimadd);
 + /* Memory barrier */
 + wmb();
 +
 + /* Setting up MII Management Control Register with data */
 + iowrite32be((uint32_t)data, &regs->miimcon);
 + /* Dummy read to make sure MIIMCON is written */
 + tmp = ioread32be(&regs->miimcon);
 + /* Memory barrier */
 + wmb();

iowrite32be() should already contain a memory barrier.

 +
 + /* Wait until MII management write is complete */
 + /* todo: a timeout could be useful here */
 + while ((ioread32be(&regs->miimind)) & MIIMIND_BUSY)
 + ; /* busy wait */
 +
 + return 0;
 +}

Please add the timeout.

 + /* Read MII management status  */
 + *data = (uint16_t)ioread32be(&regs->miimstat);

Unnecessary cast (please check for these throughout the patchset).

There are also casts in this patchset that are only needed because a variable 
was unnecessarily defined with a smaller-than-32-bit data type.


 +void fman_memac_reset(struct memac_regs __iomem *regs)
 +{
 + uint32_t tmp;
 +
 + tmp = ioread32be(&regs->command_config);
 +
 + tmp |= CMD_CFG_SW_RESET;
 +
 + iowrite32be(tmp, &regs->command_config);
 +
 + while (ioread32be(&regs->command_config) & CMD_CFG_SW_RESET)
 + ;
 +}

Timeout, here and in all such loops.
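
A bounded poll along these lines could look roughly like the following userspace sketch. Everything here is an assumption for illustration: `wait_for_bit_clear()`, the `read_reg_fn` callback (standing in for a register read such as ioread32be()) and the `fake_reg` type are hypothetical, and a real driver would normally also delay between reads.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for a register read; in a driver this would be an MMIO read. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Poll until the bits in mask are clear, giving up after max_polls
 * reads instead of spinning forever on broken hardware. */
int wait_for_bit_clear(read_reg_fn read_reg, void *ctx, uint32_t mask,
		       unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!(read_reg(ctx) & mask))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake register used only to exercise the loop: it reports the busy
 * bits for a fixed number of reads and then reads back as zero. */
struct fake_reg {
	unsigned int busy_reads_left;
	uint32_t busy_mask;
};

uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->busy_reads_left) {
		reg->busy_reads_left--;
		return reg->busy_mask;
	}
	return 0;
}
```

The caller then has an error code to propagate, which the quoted `fman_memac_reset()` cannot currently do because it returns void.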

-Scott


Re: [v2,5/9] fsl/fman: Add Frame Manager support

2015-06-25 Thread Paul Bolle
On Fri, 2015-06-26 at 01:53 +0200, Paul Bolle wrote:
 So I decided to pick one subject: exports. I think I had something to
 comment on all eight of them.

s/eight/twelve/


Paul Bolle

Re: [v2,5/9] fsl/fman: Add Frame Manager support

2015-06-25 Thread Paul Bolle
On Wed, 2015-06-24 at 22:35 +0300, igal.liber...@freescale.com wrote:
 --- /dev/null
 +++ b/drivers/net/ethernet/freescale/fman/fm_drv.c

 +u16 fm_get_max_frm(void)
 +{
 + return fsl_fm_max_frm;
 +}
 +EXPORT_SYMBOL(fm_get_max_frm);

Which module is using this export? (And what does this function
actually do?)

 +int fm_get_rx_extra_headroom(void)
 +{
 + return ALIGN(fsl_fm_rx_extra_headroom, 16);
 +}
 +EXPORT_SYMBOL(fm_get_rx_extra_headroom);

This exports an unused function.

I don't know how to, well, review a series that adds almost 20K lines.
So I decided to pick one subject: exports. I think I had something to
comment on all eight of them.

I'm not sure if I'll try another scan with a different subject.

Thanks,


Paul Bolle

Re: [v2,9/9] fsl/fman: Add FMan MAC driver

2015-06-25 Thread Scott Wood
On Fri, 2015-06-26 at 01:06 +0200, Paul Bolle wrote:
 (Evolution 3.16 is basically unbearable for replying to patches. 
 Anyone
 else running into this?) 

If you mean the crazy lag when selecting moderate-to-large amounts of 
text (for snipping), yes.

-Scott


Re: [v2,1/9] fsl/fman: Add the FMan FLIB

2015-06-25 Thread Scott Wood
On Wed, 2015-06-24 at 22:33 +0300, igal.liber...@freescale.com wrote:
 From: Igal Liberman igal.liber...@freescale.com
 
 The FMan FLib provides the basic API used by the FMan drivers to
  configure and control the FMan hardware.
 
 Signed-off-by: Igal Liberman igal.liber...@freescale.com

Again, what is an FLib?  What determines whether content should go in 
the flib directory?

The patch title says Add the FMan FLIB, but there's more code added 
outside the flib directory than inside.
 
FMan drivers?  There's more than one?  What is 
drivers/net/ethernet/freescale/fman/fman.c if not the FMan driver? 
 What is the FMan driver if not the code to configure and control 
the FMan hardware?  If this is a public API, where's the 
documentation?

---
  drivers/net/ethernet/freescale/Kconfig |1 +
  drivers/net/ethernet/freescale/Makefile|2 +
  drivers/net/ethernet/freescale/fman/Kconfig|7 +
  drivers/net/ethernet/freescale/fman/Makefile   |5 +
  .../net/ethernet/freescale/fman/flib/fsl_fman.h|  608 
 
  drivers/net/ethernet/freescale/fman/fman.c |  975 
 
  6 files changed, 1598 insertions(+)
  create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig
  create mode 100644 drivers/net/ethernet/freescale/fman/Makefile
  create mode 100644 
 drivers/net/ethernet/freescale/fman/flib/fsl_fman.h
  create mode 100644 drivers/net/ethernet/freescale/fman/fman.c
 
 diff --git a/drivers/net/ethernet/freescale/Kconfig 
 b/drivers/net/ethernet/freescale/Kconfig
 index 25e3425..24e938d 100644
 --- a/drivers/net/ethernet/freescale/Kconfig
 +++ b/drivers/net/ethernet/freescale/Kconfig
 @@ -55,6 +55,7 @@ config FEC_MPC52xx_MDIO
 If compiled as module, it will be called fec_mpc52xx_phy.
  
  source drivers/net/ethernet/freescale/fs_enet/Kconfig
 +source drivers/net/ethernet/freescale/fman/Kconfig
  
  config FSL_PQ_MDIO
   tristate Freescale PQ MDIO
 diff --git a/drivers/net/ethernet/freescale/Makefile 
 b/drivers/net/ethernet/freescale/Makefile
 index 71debd1..4097c58 100644
 --- a/drivers/net/ethernet/freescale/Makefile
 +++ b/drivers/net/ethernet/freescale/Makefile
 @@ -17,3 +17,5 @@ gianfar_driver-objs := gianfar.o \
   gianfar_ethtool.o
  obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
  ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
 +
 +obj-$(CONFIG_FSL_FMAN) += fman/
 diff --git a/drivers/net/ethernet/freescale/fman/Kconfig 
 b/drivers/net/ethernet/freescale/fman/Kconfig
 new file mode 100644
 index 000..8aeae29
 --- /dev/null
 +++ b/drivers/net/ethernet/freescale/fman/Kconfig
 @@ -0,0 +1,7 @@
 +config FSL_FMAN
 + bool FMan support
 + depends on FSL_SOC || COMPILE_TEST
 + default n
 + help
 + Freescale Data-Path Acceleration Architecture Frame Manager
 + (FMan) support

default n is a no-op.

What does enabling this option actually do, in terms of user-visible 
features?

 +typedef struct fm_prs_result_t fm_prs_result;
 +typedef enum e_enet_mode enet_mode_t;

This use of typedef is contrary to kernel coding style.

-Scott

Re: [v2,4/9] fsl/fman: Add FMan MURAM support

2015-06-25 Thread Scott Wood
On Wed, 2015-06-24 at 22:34 +0300, igal.liber...@freescale.com wrote:
 + struct muram_info *p_muram;

No Hungarian notation.

 +void fm_muram_free(struct muram_info *p_muram)
 +{
 + /* Destroy pool */
 + gen_pool_destroy(p_muram->pool);
 + /* Unmap memory */
 + iounmap(p_muram->vbase);
 + /* Free pointer */
 + kfree(p_muram);
 +}

This type of commenting is not useful.

 + memset_io((void __iomem *)vaddr, 0, (int)size);

Unnecessary cast of size.

-Scott


Re: [PATCH V9 00/13] powerpc, perf: Enable SW branch filters

2015-06-25 Thread Daniel Axtens
Hi Anshuman,

Thanks for your continued work on this.

Given that the series is now at version 9 and is 13 patches long, I
wonder if it might be worth splitting it up.

I'd suggest:

 - Patch 1 could be sent individually as it's a bug fix.

 - Separating out a series of simple cleanups would make the actual
changes in your patch set easier to understand. Patches 2, 3 and 5 are
obvious candidates.

 - It looks like the changes in patch 6 aren't used by any of the
following patches. It might be worth separating that out or just
dropping it entirely.


That would give you a series with just:
4   powerpc, perf: Restore privilege level filter support for BHRB
7   powerpc, perf: Re organize PMU branch filter processing on POWER8
8   powerpc, perf: Change the name of HW PMU branch filter tracking variable
9   powerpc, lib: Add new branch analysis support functions
10   powerpc, perf: Enable SW filtering in branch stack sampling framework
11   powerpc, perf: Change POWER8 PMU configuration to work with SW filters
12   powerpc, perf: Enable privilege mode SW branch filters
13   selftests, powerpc: Add test for BHRB branch filters (HW & SW)

That might make it easier for you to start getting the ground work in,
and make it easier for others to understand what you're trying to do.

Regards,
Daniel 

On Mon, 2015-06-15 at 17:40 +0530, Anshuman Khandual wrote:
   This is the continuation (rebased and reworked) of the series
 posted at https://lkml.org/lkml/2014/5/5/153 (which is V6). I remember
 to have incremented the version count for the re-send of the first four
 patches of the series to Peter Z for generic review which got pulled in
 last year. These patches here are the remaining powerpc part of the
 original series.
 
 Changes in V9
 =
 (1) Changed some of the commit messages and fixed some typos
 (2) Variable 'bhrb_users' type changed from int to unsigned int
 (3) Replaced the last usage of 'get_cpu_var' with 'this_cpu_ptr'
 (4) Conditional checks for 'cpuhw->bhrb_users' changed to test against zero
 (5) Updated in-code documentation inside 'check_excludes' function
 (6) Changed the name & type of 'pred' variable in 'power_pmu_bhrb_read'
 (7) Changed the name of 'tmp' to 'to_addr' inside 'power_pmu_bhrb_read'
 (8) Changed return values for branch instruction analysis functions
 (9) Changed the name of 'flag' variable to 'select_branch' inside 
 'keep_branch'
 (10) Fixed one nested conditional statement inside 'keep_branch' function
 (11) Changed function name from 'update_branch_entry' to 'insert_branch'
 (12) Fixed copyright and license statements for new selftest related files
 (13) Improved helper assembly functions with some macro definitions
 (14) Improved the core test program at various places
 (15) Added .gitignore file for the new test case
 
 Changes in V8 (http://patchwork.ozlabs.org/patch/481848/)
 =
 (1) Fixed BHRB privilege mode branch filter request processing
 (2) Dropped branch records where 'from' cannot be fetched
 (3) Added in-code documentation at various places in the patch series
 (4) Added one comprehensive selftest case to verify all the filters
 
 Changes in V7
 =
 (1) Incremented the version count while requesting pull for generic changes
 
 Changes in V6 (https://lkml.org/lkml/2014/5/5/153)
 =
 (1) Rebased the patchset against the master
 (2) Added Reviewed-by: Andi Kleen in the first four patches in the series 
 which changes the
 generic or X86 perf code. [https://lkml.org/lkml/2014/4/7/130]
 
 Changes in V5 (https://lkml.org/lkml/2014/3/7/101)
 =
 (1) Added a precursor patch to cleanup the indentation problem in 
 power_pmu_bhrb_read
 (2) Added a precursor patch to re-arrange P8 PMU BHRB filter config which 
 improved the clarity
 (3) Merged the previous 10th patch into the 8th patch
 (4) Moved SW based branch analysis code from core perf into code-patching 
 library as suggested by Michael
 (5) Simplified the logic in branch analysis library
 (6) Fixed some ambiguities in documentation at various places
 (7) Added some more in-code documentation blocks at various places
 (8) Renamed some local variable and function names
 (9) Fixed some indentation and white space errors in the code
 (10) Implemented almost all the review comments and suggestions made by 
 Michael Ellerman on V4 patchset
 (11) Enabled privilege mode SW branch filter
 (12) Simplified and generalized the SW implemented conditional branch filter
 (13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW 
 implementation
 (14) Adjusted other patches to deal with the above changes
 
 Changes in V4 (https://lkml.org/lkml/2013/12/4/168)
 =
 (1) Changed the commit message for patch (01/10)
 (2) Changed the patch (02/10) to accommodate review comments from Michael 
 Ellerman
 (3) Rebased the patchset against latest Linus's tree
 
 Changes in V3 (https://lkml.org/lkml/2013/10/16/59)
 =
 (1) Split the SW branch filter 

Re: [PATCH] powerpc/powernv: Fix vma page prot flags in opal-prd driver

2015-06-25 Thread Vaidyanathan Srinivasan
* Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com [2015-06-21 23:56:16]:

 opal-prd driver will mmap() firmware code/data area as private
 mapping to prd user space daemon.  Write to this page will
 trigger COW faults.  The new COW pages are normal kernel RAM
 pages accounted by the kernel and are not special.
 
 vma->vm_page_prot value will be used at page fault time
 for the new COW pages, while pgprot_t value passed in
 remap_pfn_range() is used for the initial page table entry.
 
 Hence:
 * Do not add _PAGE_SPECIAL in vma, but only for remap_pfn_range()
 * Also remap_pfn_range() will add the _PAGE_SPECIAL flag using
   pte_mkspecial() call, hence no need to specify in the driver
 
 This fix resolves the page accounting warning shown below:
 BUG: Bad rss-counter state mm:c007d34ac600 idx:1 val:19
 
 The above warning is triggered since _PAGE_SPECIAL was incorrectly
 being set for the normal kernel COW pages.
 
 Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
 ---
  arch/powerpc/platforms/powernv/opal-prd.c |9 -
  1 file changed, 4 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/opal-prd.c 
 b/arch/powerpc/platforms/powernv/opal-prd.c
 index 46cb3fe..4ece8e4 100644
 --- a/arch/powerpc/platforms/powernv/opal-prd.c
 +++ b/arch/powerpc/platforms/powernv/opal-prd.c
 @@ -112,6 +112,7 @@ static int opal_prd_open(struct inode *inode, struct file 
 *file)
  static int opal_prd_mmap(struct file *file, struct vm_area_struct *vma)
  {
   size_t addr, size;
 + pgprot_t page_prot;
   int rc;
  
   pr_devel("opal_prd_mmap(0x%016lx, 0x%016lx, 0x%lx, 0x%lx)\n",
 @@ -125,13 +126,11 @@ static int opal_prd_mmap(struct file *file, struct 
 vm_area_struct *vma)
   if (!opal_prd_range_is_valid(addr, size))
   return -EINVAL;
  
 - vma->vm_page_prot = __pgprot(pgprot_val(phys_mem_access_prot(file,
 - vma->vm_pgoff,
 -  size, vma->vm_page_prot))
 - | _PAGE_SPECIAL);
 + page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
 +  size, vma->vm_page_prot);
  
   rc = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size,
 - vma->vm_page_prot);
 + page_prot);

Hi Ben,

remap_pfn_range() is the correct method to map the firmware pages
because we will not have struct page associated with this RAM area.

We do a memblock_reserve() in early boot and take out this memory from
kernel and avoid struct page allocation/init for these.

vm_insert_page() is an alternative that would have worked if kernel
allocated the memory, in which case we can bump up the page count and
map the page to user space.  This is already done by vm_insert_page()
and we will not need to make the page special.

However, this use case fits remap_pfn_range() and the page special
mechanism, since there is no struct page associated with these physical
pages.

--Vaidy


[PATCH SLOF v2 2/5] introduce 8-byte LE helpers

2015-06-25 Thread Nikunj A Dadhania
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@redhat.com
---
 slof/fs/little-endian.fs   | 6 ++
 slof/fs/packages/disk-label.fs | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/slof/fs/little-endian.fs b/slof/fs/little-endian.fs
index f2e4e8d..6b4779e 100644
--- a/slof/fs/little-endian.fs
+++ b/slof/fs/little-endian.fs
@@ -17,6 +17,9 @@ here c@ ef = CONSTANT ?littleendian
 
 ?bigendian [IF]
 
+: x!-le >r xbflip r> x! ;
+: x@-le x@ xbflip ;
+
 : l!-le  >r lbflip r> l! ;
 : l@-le  l@ lbflip ;
 
@@ -47,6 +50,9 @@ here c@ ef = CONSTANT ?littleendian
 
 [ELSE]
 
+: x!-le x! ;
+: x@-le x@ ;
+
 : l!-le  l! ;
 : l@-le  l@ ;
 
diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index bb64d57..8c93cfb 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -384,8 +384,8 @@ AA268B49521E5A8B CONSTANT GPT-PREP-PARTITION-4
  debug-disk-label? IF
 ." GPT PReP partition found " cr
  THEN
- block gpt-part-entry>first-lba x@ xbflip
- block gpt-part-entry>last-lba x@ xbflip
+ block gpt-part-entry>first-lba x@-le
+ block gpt-part-entry>last-lba x@-le
  over - 1+ ( addr offset len )
  swap  ( addr len offset )
  block-size * to part-offset
-- 
2.4.3
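
For readers more familiar with C than Forth, the effect of the new words can be sketched as byte-wise 64-bit little-endian accessors. This is an analogue for illustration only, not part of the patch; `load_le64()` and `store_le64()` are hypothetical names. Working byte by byte gives the same result on big-endian and little-endian hosts, which is what the two [IF]/[ELSE] variants of x@-le and x!-le achieve in SLOF.

```c
#include <stdint.h>

/* Rough C analogue of x@-le: read 8 bytes stored least significant
 * byte first and return them as a host-order 64-bit value. */
uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Rough C analogue of x!-le: store a host-order 64-bit value as
 * 8 bytes, least significant byte first. */
void store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}
```

GPT stores fields such as the first and last LBA of a partition entry in this layout, which is why the disk-label code reads them through x@-le.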


[PATCH SLOF v2 5/5] disk-label: add support for booting from GPT FAT partition

2015-06-25 Thread Nikunj A Dadhania
For a GPT+LVM combination disk, an older bootloader that does not support
LVM cannot load the kernel from LVM.

The patch adds support to read from BASIC_DATA UUID partitions for the
case that the OS installer has installed the CHRP-BOOT config on a FAT
file system.

It also makes GPT detection more robust:
* Check for Protective MBR Magic
* Check for valid GPT Signature
* Boundary check for allocated block size before reading into the
  buffer

Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
 slof/fs/packages/disk-label.fs | 96 +-
 1 file changed, 76 insertions(+), 20 deletions(-)

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 7ed5526..e5759a3 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -179,7 +179,8 @@ CONSTANT /gpt-part-entry
 \ This word returns true if the currently loaded disk-buf has _NO_ GPT 
partition id
 : no-gpt? ( -- true|false )
0 read-disk-buf
-   1 partition>part-entry part-entry>id c@ ee <>
+   1 partition>part-entry part-entry>id c@ ee <> IF true EXIT THEN
+   disk-buf mbr>magic w@-le aa55 <>
 ;
 
 : pc-extended-partition? ( part-entry-addr -- true|false )
@@ -267,7 +268,10 @@ CONSTANT /gpt-part-entry
 
 : try-dos-partition ( -- okay? )
\ Read partition table and check magic.
-   no-mbr? IF cr ." No DOS disk-label found." cr false EXIT THEN
+   no-mbr? IF
+   debug-disk-label? IF cr ." No DOS disk-label found." cr THEN
+   false EXIT
+   THEN
 
count-dos-logical-partitions TO dos-logical-partitions
 
@@ -378,29 +382,80 @@ AA268B49521E5A8B CONSTANT GPT-PREP-PARTITION-4
true
 ;
 
-: load-from-gpt-prep-partition ( addr -- size )
-   no-gpt? IF drop false EXIT THEN
-   debug-disk-label? IF
-  cr ." GPT partition found " cr
-   THEN
-   1 read-disk-buf disk-buf gpt>part-entry-lba l@-le
+\ Check for GPT MSFT BASIC DATA GUID - fat based
+EBD0A0A2 CONSTANT GPT-BASIC-DATA-PARTITION-1
+B9E5 CONSTANT GPT-BASIC-DATA-PARTITION-2
+4433 CONSTANT GPT-BASIC-DATA-PARTITION-3
+87C068B6B72699C7 CONSTANT GPT-BASIC-DATA-PARTITION-4
+
+: gpt-basic-data-partition? ( -- true|false )
+   disk-buf gpt-part-entry>part-type-guid
+   dup l@-le GPT-BASIC-DATA-PARTITION-1 <> IF drop false EXIT THEN
+   dup 4 + w@-le GPT-BASIC-DATA-PARTITION-2 <> IF drop false EXIT THEN
+   dup 6 + w@-le GPT-BASIC-DATA-PARTITION-3 <> IF drop false EXIT THEN
+   8 + x@ GPT-BASIC-DATA-PARTITION-4 <> IF false EXIT THEN
+   true
+;
+
+\
+\ GPT Signature
+\ (EFI PART, 45h 46h 49h 20h 50h 41h 52h 54h)
+\
+4546492050415254 CONSTANT GPT-SIGNATURE
+
+: verify-gpt-partition ( -- true | false )
+   no-gpt? IF false EXIT THEN
+   debug-disk-label? IF cr ." GPT partition found " cr  THEN
+   1 read-disk-buf
+   disk-buf gpt>part-entry-lba x@-le
block-size * to seek-pos
disk-buf gpt>part-entry-size l@-le to gpt-part-size
-   disk-buf gpt>num-part-entry l@-le dup 0= IF false EXIT THEN
+   gpt-part-size disk-buf-size > IF
+  cr ." GPT part size exceeds buffer allocated " cr
+  false exit
+   THEN
+   disk-buf gpt>signature x@ GPT-SIGNATURE =
+;
+
+: load-from-gpt-prep-partition ( addr -- size )
+   verify-gpt-partition 0= IF false EXIT THEN
+   disk-buf gpt>num-part-entry l@-le dup 0= IF false exit THEN
1+ 1 ?DO
   seek-pos 0 seek drop
   disk-buf gpt-part-size read drop gpt-prep-partition? IF
- debug-disk-label? IF
-." GPT PReP partition found " cr
- THEN
- disk-buf gpt-part-entry>first-lba x@-le
- disk-buf gpt-part-entry>last-lba x@-le
- over - 1+ ( addr offset len )
- swap  ( addr len offset )
- block-size * to part-offset
- 0 0 seek drop ( addr len )
- block-size * read ( size )
+ debug-disk-label? IF ." GPT PReP partition found " cr THEN
+ disk-buf gpt-part-entry>first-lba x@-le ( addr first-lba )
+ disk-buf gpt-part-entry>last-lba x@-le  ( addr first-lba last-lba )
+ over - 1+ ( addr first-lba blocks )
+ swap ( addr blocks )
+ block-size * to part-offset  ( addr blocks )
+ 0 0 seek drop ( addr blocks )
+ block-size * read ( size )
  UNLOOP EXIT
+ THEN
+ seek-pos gpt-part-size i * + to seek-pos
+   LOOP
+   false
+;
+
+: try-gpt-dos-partition ( -- true | false )
+   verify-gpt-partition 0= IF false EXIT THEN
+   disk-buf gpt>num-part-entry l@-le dup 0= IF false EXIT THEN
+   1+ 1 ?DO
+  seek-pos 0 seek drop
+  disk-buf gpt-part-size read drop
+  gpt-basic-data-partition? IF
+ debug-disk-label? IF ." GPT LINUX DATA partition found " cr THEN
+ disk-buf gpt-part-entry>first-lba x@-le   ( first-lba )
+ dup to part-start  ( first-lba )
+ disk-buf gpt-part-entry>last-lba x@-le (