Re: [v5] powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform

2015-06-03 Thread Vipin K Parashar

Hi Michael,
  Thanks for review. Responses below

On 06/03/2015 10:43 AM, Michael Ellerman wrote:

On Mon, 2015-18-05 at 15:18:04 UTC, Vipin K Parashar wrote:

This patch adds support for FSP EPOW (Early Power Off Warning) and

Please spell out the acronyms the first time you use them, including FSP.


Will do.




DPO (Delayed Power Off) events for PowerNV platform. EPOW events are

 ^
the


the PowerNV platform.  Will edit.




generated by SPCN/FSP due to various critical system conditions that

SPCN?


Will remove SPCN. FSP should be sufficient.




need system shutdown. Few examples of these conditions are high

 ^
s/need/require/ ?   A few


Agreed.




ambient temperature or system running on UPS power with low UPS battery.
DPO event is generated in response to admin initiated system request.

Blank line between paragraphs please.


Sure




Upon receipt of EPOW and DPO events host kernel invokes

 ^
the host kernel


will edit


orderly_poweroff for performing graceful system shutdown. System admin

I like it if you spell functions with a trailing () to make it clear they are
functions, so this would be orderly_poweroff().


Agreed.
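
For illustration, the call being discussed has roughly this shape in an OPAL
message notifier; the handler name and the decoding step here are made up,
this is not the patch code:

static int opal_power_event(struct notifier_block *nb,
			    unsigned long msg_type, void *msg)
{
	/* ... decode the EPOW/DPO payload from the OPAL message ... */

	/* hand off to user space for a clean, graceful shutdown */
	orderly_poweroff(true);

	return 0;
}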




can also add systemd service shutdown scripts to perform any specific
actions like graceful guest shutdown upon system poweroff. libvirt-guests
is a systemd service available on recent distros for managing guests
at system start/shutdown time.

This last part about the scripts is not relevant to the kernel patch so just
leave it out please.


Agreed.




Signed-off-by: Vipin K Parashar vi...@linux.vnet.ibm.com
Reviewed-by: Joel Stanley j...@jms.id.au
Reviewed-by: Vaibhav Jain vaib...@linux.vnet.ibm.com
---
  arch/powerpc/include/asm/opal-api.h|  44 
  arch/powerpc/include/asm/opal.h|   3 +-
  arch/powerpc/platforms/powernv/opal-power.c| 147 ++---
  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
  4 files changed, 179 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 0321a90..90fa364 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -355,6 +355,10 @@ enum opal_msg_type {
OPAL_MSG_TYPE_MAX,
  };
  
+/* OPAL_MSG_SHUTDOWN parameter values */
+#define SOFT_OFF	0x00
+#define SOFT_REBOOT	0x01

I don't see this in the skiboot version of opal-api.h ?

They should be kept in sync.

If it's a Linux only define it should go in opal.h


Agreed. Won't add these definitions to opal-api.h as they are not present in
the skiboot version of opal-api.h.



  struct opal_msg {
__be32 msg_type;
__be32 reserved;
@@ -730,6 +734,46 @@ struct opal_i2c_request {
__be64 buffer_ra;   /* Buffer real address */
  };
  
+/*
+ * EPOW status sharing (OPAL and the host)
+ *
+ * The host will pass OPAL a buffer of length OPAL_SYSEPOW_MAX
+ * with individual elements being 16 bits wide to fetch the system
+ * wide EPOW status. Each element in the buffer will contain the
+ * EPOW status in its bit representation for a particular EPOW sub
+ * class as defined here. So multiple detailed EPOW status bits
+ * specific to any sub class can be represented in a single buffer
+ * element as its bit representation.
+ */
+
+/* System EPOW type */
+enum OpalSysEpow {
+   OPAL_SYSEPOW_POWER  = 0,/* Power EPOW */
+   OPAL_SYSEPOW_TEMP   = 1,/* Temperature EPOW */
+   OPAL_SYSEPOW_COOLING= 2,/* Cooling EPOW */
+   OPAL_SYSEPOW_MAX= 3,/* Max EPOW categories */
+};
+
+/* Power EPOW */
+enum OpalSysPower {
+   OPAL_SYSPOWER_UPS   = 0x0001, /* System on UPS power */
+   OPAL_SYSPOWER_CHNG  = 0x0002, /* System power config change */
+   OPAL_SYSPOWER_FAIL  = 0x0004, /* System impending power failure */
+   OPAL_SYSPOWER_INCL  = 0x0008, /* System incomplete power */
+};
+
+/* Temperature EPOW */
+enum OpalSysTemp {
+   OPAL_SYSTEMP_AMB= 0x0001, /* System over ambient temperature */
+   OPAL_SYSTEMP_INT= 0x0002, /* System over internal temperature */
+   OPAL_SYSTEMP_HMD= 0x0004, /* System over ambient humidity */
+};
+
+/* Cooling EPOW */
+enum OpalSysCooling {
+   OPAL_SYSCOOL_INSF   = 0x0001, /* System insufficient cooling */
+};

I don't see the last three of these enums used at all, so please drop them.


The OPAL_SYSPOWER_CHNG/FAIL/INCL, OPAL_SYSTEMP_HMD and OPAL_SYSCOOL_INSF
enums aren't used here, but they are part of the skiboot version of
opal-api.h and thus need to be retained. PKVM 2.1 uses these enums,
so they can't be removed from the skiboot opal-api.h.
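
For illustration, a minimal consumer of the status buffer described in the
comment above might look like this (hypothetical sketch, not part of the
patch; it only uses the enums quoted above and assumes the buffer elements
are big-endian 16-bit words as filled in by OPAL):

static bool epow_needs_shutdown(const __be16 *epow, int n_classes)
{
	/* Power class: running on UPS or impending power failure */
	if (n_classes > OPAL_SYSEPOW_POWER &&
	    (be16_to_cpu(epow[OPAL_SYSEPOW_POWER]) &
	     (OPAL_SYSPOWER_UPS | OPAL_SYSPOWER_FAIL)))
		return true;

	/* Temperature class: ambient over-temperature */
	if (n_classes > OPAL_SYSEPOW_TEMP &&
	    (be16_to_cpu(epow[OPAL_SYSEPOW_TEMP]) & OPAL_SYSTEMP_AMB))
		return true;

	return false;
}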





  #endif /* __ASSEMBLY__ */
  
  #endif /* __OPAL_API_H */

diff 

Re: [PATCH v13 11/14] perf, tools: Support long descriptions with perf list -v

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:11AM -0700, Sukadev Bhattiprolu wrote:
 From: Andi Kleen a...@linux.intel.com
 
 Previously we were dropping the useful longer descriptions that some
 events have in the event list completely. This patch makes them appear with
 perf list.
 
 Old perf list:
 
 baclears:
   baclears.all
[Counts the number of baclears]
 
 vs new:
 
 perf list -v:
 ...
 baclears:
   baclears.all
[The BACLEARS event counts the number of times the front end is 
 resteered, mainly when
 the Branch Prediction Unit cannot provide a correct prediction and 
 this is corrected
 by the Branch Address Calculator at the front end. The BACLEARS.ANY 
 event counts the
 number of baclears for any type of branch]
 
 Signed-off-by: Andi Kleen a...@linux.intel.com
 Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 ---
  tools/perf/builtin-list.c  |9 ++---
  tools/perf/pmu-events/jevents.c|   29 -
  tools/perf/pmu-events/jevents.h|2 +-
  tools/perf/pmu-events/pmu-events.h |1 +
  tools/perf/util/parse-events.c |4 ++--
  tools/perf/util/parse-events.h |2 +-
  tools/perf/util/pmu.c  |   17 -
  tools/perf/util/pmu.h  |4 +++-
  8 files changed, 46 insertions(+), 22 deletions(-)

I think this change should be split into:
  - jevents update of parsing out PublicDescription tag
  - alias support for long_desc
  - perf list update

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel v11 26/34] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

2015-06-03 Thread Alexey Kardashevskiy

On 06/02/2015 09:30 AM, David Gibson wrote:

On Fri, May 29, 2015 at 06:44:50PM +1000, Alexey Kardashevskiy wrote:

This is a part of moving DMA window programming to an iommu_ops
callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
a first parameter (not pnv_ioda_pe) as it is going to be used as
a callback for VFIO DDW code.

This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is


I'm assuming that's what's now called pnv_pci_ioda2_invalidate_entire()?



Yes, my bad... And the patch is not adding it at all...





a good thing to do. It does not have an immediate effect now, as the table
is never recreated after reboot, but it will in the following patches.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
Reviewed-by: Gavin Shan gws...@linux.vnet.ibm.com
---
Changes:
v11:
* replaced some 1<<it_page_shift with IOMMU_PAGE_SIZE() macro

v9:
* initialize pe->table_group.tables[0] at the very end when
tbl is fully initialized
* moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 47 +--
  1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3d29fe3..fda01c1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1968,6 +1968,43 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
*phb,
}
  }

+static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
+   int num, struct iommu_table *tbl)
+{
+   struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+   table_group);
+   struct pnv_phb *phb = pe->phb;
+   int64_t rc;
+   const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
+   const __u64 win_size = tbl->it_size << tbl->it_page_shift;
+
+   pe_info(pe, "Setting up window %llx..%llx pg=%x\n",
+   start_addr, start_addr + win_size - 1,
+   IOMMU_PAGE_SIZE(tbl));
+
+   /*
+* Map TCE table through TVT. The TVE index is the PE number
+* shifted by 1 bit for 32-bits DMA space.
+*/
+   rc = opal_pci_map_pe_dma_window(phb->opal_id,
+   pe->pe_number,
+   pe->pe_number << 1,
+   1,
+   __pa(tbl->it_base),
+   tbl->it_size << 3,
+   IOMMU_PAGE_SIZE(tbl));
+   if (rc) {
+   pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
+   return rc;
+   }
+
+   pnv_pci_link_table_and_group(phb->hose->node, num,
+   tbl, &pe->table_group);
+   pnv_pci_ioda2_tce_invalidate_entire(pe);
+
+   return 0;
+}
+
  static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
  {
uint16_t window_id = (pe->pe_number << 1) + 1;
@@ -2123,21 +2160,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
pe->table_group.ops = &pnv_pci_ioda2_ops;
  #endif

-   /*
-* Map TCE table through TVT. The TVE index is the PE number
-* shifted by 1 bit for 32-bits DMA space.
-*/
-   rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-   pe->pe_number << 1, 1, __pa(tbl->it_base),
-   tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+   rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
	if (rc) {
	pe_err(pe, "Failed to configure 32-bit TCE table,"
	" err %ld\n", rc);
	goto fail;
	}

-   pnv_pci_ioda2_tce_invalidate_entire(pe);
-
	/* OPAL variant of PHB3 invalidated TCEs */
	if (phb->ioda.tce_inval_reg)
	tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);





--
Alexey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 04/14] perf, tools: Allow events with dot

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:04AM -0700, Sukadev Bhattiprolu wrote:
 From: Andi Kleen a...@linux.intel.com
 
 The Intel events use a dot to separate event name and unit mask.
 Allow dot in names in the scanner, and remove special handling
 of dot as EOF. Also remove the hack in jevents to replace dot
 with underscore. This way dotted events can be specified
 directly by the user.
 
 I'm not fully sure this change to the scanner is correct
 (what was the dot special case good for?), but I haven't
 found anything that breaks with it so far at least.

can't see anything either

Robert,
does it ring a bell? seems like you introduced it ;-)

thanks,
jirka


 
 V2: Add the dot to name too, to handle events outside cpu//
 Acked-by: Namhyung Kim namhy...@kernel.org
 Signed-off-by: Andi Kleen a...@linux.intel.com
 Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 ---
  tools/perf/util/parse-events.l |5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
 index 09e738f..13cef3c 100644
 --- a/tools/perf/util/parse-events.l
 +++ b/tools/perf/util/parse-events.l
 @@ -119,8 +119,8 @@ event [^,{}/]+
  num_dec  [0-9]+
  num_hex  0x[a-fA-F0-9]+
  num_raw_hex  [a-fA-F0-9]+
 -name [a-zA-Z_*?][a-zA-Z0-9_*?]*
 -name_minus   [a-zA-Z_*?][a-zA-Z0-9\-_*?]*
 +name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
 +name_minus   [a-zA-Z_*?][a-zA-Z0-9\-_*?.]*
  /* If you add a modifier you need to update check_modifier() */
  modifier_event   [ukhpGHSDI]+
  modifier_bp  [rwx]{1,3}
 @@ -165,7 +165,6 @@ modifier_bp   [rwx]{1,3}
   return PE_EVENT_NAME;
   }
  
 -.		|
 <<EOF>>		{
   BEGIN(INITIAL);
   REWIND(0);
 -- 
 1.7.9.5
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 06/14] perf, tools: Support alias descriptions

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:06AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 @@ -1033,37 +1064,49 @@ void print_pmu_events(const char *event_glob, bool 
 name_only)
  event_glob
   continue;
  
 - if (is_cpu && !name_only)
 + if (is_cpu && !name_only && !alias->desc)
   name = format_alias_or(buf, sizeof(buf), pmu, 
 alias);
  
 - aliases[j] = strdup(name);
 - if (aliases[j] == NULL)
 - goto out_enomem;
 + aliases[j].name = name;
 + if (is_cpu && !name_only && !alias->desc)
 + aliases[j].name = format_alias_or(buf, 
 sizeof(buf),
 +   pmu, alias);
 + aliases[j].name = strdup(aliases[j].name);
 + /* failure harmless */

yea but we still try to care everywhere.. ;-)
we would print NULL for name in the code below right?

please keep the above pattern:

if (aliases[j].name == NULL)
goto out_enomem;



 + aliases[j].desc = alias->desc;
   j++;
   }
   if (pmu->selectable) {
   char *s;
   if (asprintf(&s, "%s//", pmu->name) < 0)
   goto out_enomem;
 - aliases[j] = s;
 + aliases[j].name = s;
   j++;
   }
   }
   len = j;
 - qsort(aliases, len, sizeof(char *), cmp_string);
 + qsort(aliases, len, sizeof(struct pair), cmp_pair);
   for (j = 0; j < len; j++) {
   if (name_only) {
 - printf("%s ", aliases[j]);
 + printf("%s ", aliases[j].name);
   continue;
   }
 - printf("  %-50s [Kernel PMU event]\n", aliases[j]);
 + if (aliases[j].desc) {
 + if (numdesc++ == 0)
 + printf("\n");
 + printf("  %-50s\n", aliases[j].name);
 + printf("%*s", 8, "[");
 + wordwrap(aliases[j].desc, 8, columns, 0);
 + printf("]\n");
 + } else
 + printf("  %-50s [Kernel PMU event]\n", aliases[j].name);
   printed++;

SNIP
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 03/14] perf, tools: Use pmu_events_map table to create event aliases

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:03AM -0700, Sukadev Bhattiprolu wrote:

SNIP

  
 +/*
 + * Return the CPU id as a raw string.
 + *
 + * Each architecture should provide a more precise id string that
 + * can be use to match the architecture's mapfile.
 + */
 +char *__attribute__((weak))get_cpuid_str(void)

we have the '__weak' define in the linux/compiler.h include

jirka
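
i.e. roughly (a sketch only, assuming the tools copy of linux/compiler.h is
included so __weak expands to the weak attribute):

char * __weak get_cpuid_str(void)
{
	return NULL;
}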
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 03/14] perf, tools: Use pmu_events_map table to create event aliases

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:03AM -0700, Sukadev Bhattiprolu wrote:
 At run time, (i.e when perf is starting up), locate the specific events
 table for the current CPU and create event aliases for each of the events.
 
 Use these aliases to parse the perf events specified by the user.
 
 Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 
 Changelog[v3]
   [Jiri Olsa] Fix a memory leak with cpuid.
 
 Changelog[v2]
   [Andi Kleen] Replace the pmu_events_map-vfm field with a simple
   generic cpuid string and use that string to find the
   matching mapfile entry.
 ---
  tools/perf/arch/powerpc/util/header.c |   11 
  tools/perf/util/header.h  |3 +-
  tools/perf/util/pmu.c |  104 
 -
  3 files changed, 104 insertions(+), 14 deletions(-)

I think this patch is doing too much, it should be split into 3 pieces:

  - introduce get_cpuid_str for powerpc
  - introducing __perf_pmu__new_alias/perf_pmu__new_alias functions split
  - adding pmu_add_cpu_aliases functionality

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 02/14] perf, tools, jevents: Program to convert JSON file to C style file

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:02AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 +
 +static int process_mapfile(FILE *outfp, char *fpath)
 +{
 + int n = 16384;
 + FILE *mapfp;
 + char *save;
 + char *line, *p;
 + int line_num;
 + char *tblname;
 +
 + pr_info("%s: Processing mapfile %s\n", prog, fpath);

SNIP

 +
 + cpuid = strtok_r(p, ",", &save);
 + version = strtok_r(NULL, ",", &save);
 + fname = strtok_r(NULL, ",", &save);
 + type = strtok_r(NULL, ",", &save);
 +
 + tblname = file_name_to_table_name(fname);
 + fprintf(outfp, "{\n");
 + fprintf(outfp, "\t.cpuid = \"%s\",\n", cpuid);
 + fprintf(outfp, "\t.version = \"%s\",\n", version);
 + fprintf(outfp, "\t.type = \"%s\",\n", type);

got build failure for make DEBUG=1:

  CC   pmu-events/jevents.o
pmu-events/jevents.c: In function ‘process_mapfile’:
pmu-events/jevents.c:498:10: error: ‘save’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
   fprintf(outfp, "\t.type = \"%s\",\n", type);
  ^
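
(A minimal way to silence this, shown only as an illustration and not
necessarily the fix that was applied, is to initialize the strtok_r()
context pointer before its first use:)

	char *save = NULL;	/* strtok_r() fills this on the first call */

	cpuid = strtok_r(p, ",", &save);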

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 07/24] x86/thinkpad_acpi: Use arch_nvram_ops methods instead of nvram_read_byte() and nvram_write_byte()

2015-06-03 Thread Henrique de Moraes Holschuh
On Wed, Jun 3, 2015, at 00:34, Darren Hart wrote:
 On Tue, Jun 02, 2015 at 07:09:28AM -0300, Henrique de Moraes Holschuh
 wrote:
  Test results were sent to me privately, and they are correct, so...
 
 Finn, unless there is some compelling reason not to - like they are MBs
 worth of
 data, please submit these to the list in the future so we have them for
 reference.

After I told him which exact bitmask to use on a T43 to test
hotkey_source_mask, his test results can be summarized as "I could see
no difference in behavior", which is *exactly* what I expected to
happen.

If anything went wrong with the thinkpad-acpi NVRAM code, you'd notice a
very large change in behavior (typical: hotkeys don't work, less
typical: random hotkey keypresses, hotkey press bursts, low responsivity
of hotkeys).

  Acked-by: Henrique de Moraes Holschuh h...@hmh.eng.br
 
 I'm fine with the changes, but they need to be submitted with the other
 changes
 as this one change cannot compile independently in my tree.
 
 Finn, please work with whomever is pulling the series to include this in
 their
 pull request.
 
 Reviewed-by: Darren Hart dvh...@linux.intel.com

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 02/14] perf, tools, jevents: Program to convert JSON file to C style file

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:02AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 +
 +static char *file_name_to_table_name(char *fname)
 +{
 + unsigned int i, j;
 + int c;
 + int n = 1024;   /* use max variable length? */

I think this should be at least PATH_MAX, or you might
actually use asprintf and have all below done within
one line or so

jirka

 + char *tblname;
 + char *p;
 +
 + tblname = malloc(n);
 + if (!tblname)
 + return NULL;
 +
 + p = basename(fname);
 +
 + memset(tblname, 0, n);
 +
 + /* Ensure table name starts with an alphabetic char */
 + strcpy(tblname, "pme_");
 +
 + n = strlen(fname) + strlen(tblname);
 + n = min(1024, n);
 +
 + for (i = 0, j = strlen(tblname); i < strlen(fname); i++, j++) {
 + c = p[i];
 + if (isalnum(c) || c == '_')
 + tblname[j] = c;
 + else if (c == '-')
 + tblname[j] = '_';
 + else if (c == '.') {
 + tblname[j] = '\0';
 + break;
 + } else {
 + pr_err("%s: Invalid character '%c' in file name %s\n",
 + prog, c, p);
 + free(tblname);
 + return NULL;
 + }
 + }
 +
 + return tblname;
 +}

SNIP
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 03/14] perf, tools: Use pmu_events_map table to create event aliases

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:03AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 +
 +/*
 + * From the pmu_events_map, find the table of PMU events that corresponds
 + * to the current running CPU. Then, add all PMU events from that table
 + * as aliases.
 + */
 +static int pmu_add_cpu_aliases(void *data)

any reason why the argument is not 'head' directly?

jirka
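
i.e. something along these lines (a sketch of the suggested signature, not
an actual follow-up patch):

static int pmu_add_cpu_aliases(struct list_head *head)
{
	/* same body as below, minus the void-pointer cast */
	return 0;
}

with the caller in pmu_lookup() doing pmu_add_cpu_aliases(&aliases) directly.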

 +{
 + struct list_head *head = (struct list_head *)data;
 + int i;
 + struct pmu_events_map *map;
 + struct pmu_event *pe;
 + char *cpuid;
 +
 + cpuid = get_cpuid_str();
 + if (!cpuid)
 + return 0;
 +
 + i = 0;
 + while (1) {
 + map = &pmu_events_map[i++];
 + if (!map->table) {
 + goto out;
 + }
 +
 + if (!strcmp(map->cpuid, cpuid))
 + break;
 + }
 +
 + /*
 +  * Found a matching PMU events table. Create aliases
 +  */
 + i = 0;
 + while (1) {
 + pe = &map->table[i++];
 + if (!pe->name)
 + break;
 +
 + /* need type casts to override 'const' */
 + __perf_pmu__new_alias(head, (char *)pe->name, NULL,
 + (char *)pe->desc, (char *)pe->event);
 + }
 +
 +out:
 + free(cpuid);
 + return 0;
 +}
 +
 +
  static struct perf_pmu *pmu_lookup(const char *name)
  {
   struct perf_pmu *pmu;
 @@ -464,6 +540,8 @@ static struct perf_pmu *pmu_lookup(const char *name)
   if (pmu_aliases(name, &aliases))
   return NULL;
  
 + if (!strcmp(name, "cpu"))
 + (void)pmu_add_cpu_aliases(&aliases);
   if (pmu_type(name, &type))
   return NULL;
  
 -- 
 1.7.9.5
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 07/24] x86/thinkpad_acpi: Use arch_nvram_ops methods instead of nvram_read_byte() and nvram_write_byte()

2015-06-03 Thread Finn Thain

On Tue, 2 Jun 2015, Darren Hart wrote:

 On Tue, Jun 02, 2015 at 07:09:28AM -0300, Henrique de Moraes Holschuh 
 wrote:
  Test results were sent to me privately, and they are correct, so...
  
 
 Finn, unless there is some compelling reason not to - like they are MBs 
 worth of data, please submit these to the list in the future so we have 
 them for reference.

Sure. Those results were just confirmation that this patch series doesn't 
affect input events read directly from 
/dev/input/by-path/platform-thinkpad_acpi-event
given the hotkey_source_mask settings discussed in this thread.

 
  Acked-by: Henrique de Moraes Holschuh h...@hmh.eng.br
 
 I'm fine with the changes, but they need to be submitted with the other 
 changes as this one change cannot compile independently in my tree.
 
 Finn, please work with whomever is pulling the series to include this in 
 their pull request.

Right.

 
 Reviewed-by: Darren Hart dvh...@linux.intel.com

Thanks for your review.

-- 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] of: clean-up unnecessary libfdt include paths

2015-06-03 Thread Ralf Baechle
On Wed, Jun 03, 2015 at 12:10:25AM -0500, Rob Herring wrote:
 Date:   Wed,  3 Jun 2015 00:10:25 -0500
 From: Rob Herring r...@kernel.org
 To: devicet...@vger.kernel.org, linux-ker...@vger.kernel.org
 Cc: Grant Likely grant.lik...@linaro.org, Rob Herring r...@kernel.org,
  Ralf Baechle r...@linux-mips.org, Benjamin Herrenschmidt
  b...@kernel.crashing.org, Paul Mackerras pau...@samba.org, Michael
  Ellerman m...@ellerman.id.au, linux-m...@linux-mips.org,
  linuxppc-dev@lists.ozlabs.org
 Subject: [PATCH] of: clean-up unnecessary libfdt include paths
 
 With the latest dtc import include fixups, it is no longer necessary to
 add explicit include paths to use libfdt. Remove these across the
 kernel.
 
 Signed-off-by: Rob Herring r...@kernel.org
 Cc: Ralf Baechle r...@linux-mips.org
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Paul Mackerras pau...@samba.org
 Cc: Michael Ellerman m...@ellerman.id.au
 Cc: Grant Likely grant.lik...@linaro.org
 Cc: linux-m...@linux-mips.org
 Cc: linuxppc-dev@lists.ozlabs.org

For the MIPS bits;

Acked-by: Ralf Baechle r...@linux-mips.org

  Ralf
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V5 12/13] selftests, powerpc: Add thread based stress test for DSCR sysfs interfaces

2015-06-03 Thread Anshuman Khandual
On 05/21/2015 12:13 PM, Anshuman Khandual wrote:
 This patch adds a test to update the system wide DSCR value repeatedly
 and then verifies that any thread on any given CPU on the system must
 be able to see the same DSCR value whether it is being read through
 the problem state based SPR or the privilege state based SPR.

This test can fail on a system if some kind of cpu hotplug activity is
happening when this test is being run at the same time. Then call to
sched_setaffinity() might fail as the test does not check for the CPU
availability/online every time before changing the affinity of the
thread. Here is one changed version of this test which achieves similar
test objective.

Michael,

Please let me know if the patch here would be okay or I need to re-spin
the patch series for this change. Thank you.

--
[PATCH] selftests, powerpc: Add thread based stress test for DSCR sysfs 
interfaces

This patch adds a test to update the system wide DSCR value repeatedly
and then verifies that any thread on any given CPU on the system must
be able to see the same DSCR value whether it is being read through
the problem state based SPR or the privilege state based SPR.

Acked-by: Shuah Khan shua...@osg.samsung.com
Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
---
 tools/testing/selftests/powerpc/dscr/Makefile  |  2 +-
 .../powerpc/dscr/dscr_sysfs_thread_test.c  | 81 ++
 2 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 
tools/testing/selftests/powerpc/dscr/dscr_sysfs_thread_test.c

diff --git a/tools/testing/selftests/powerpc/dscr/Makefile 
b/tools/testing/selftests/powerpc/dscr/Makefile
index fada526..834ef88 100644
--- a/tools/testing/selftests/powerpc/dscr/Makefile
+++ b/tools/testing/selftests/powerpc/dscr/Makefile
@@ -1,6 +1,6 @@
 PROGS := dscr_default_test dscr_explicit_test dscr_user_test   \
 dscr_inherit_test dscr_inherit_exec_test   \
-dscr_sysfs_test
+dscr_sysfs_test dscr_sysfs_thread_test
 
 CFLAGS := $(CFLAGS) -lpthread
 
diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_thread_test.c 
b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_thread_test.c
new file mode 100644
index 000..9671d52
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_thread_test.c
@@ -0,0 +1,81 @@
+/*
+ * POWER Data Stream Control Register (DSCR) sysfs thread test
+ *
+ * This test updates the system wide DSCR default value through
+ * sysfs interface which should then update all the CPU specific
+ * DSCR default values which must also be then visible to threads
+ * executing on individual CPUs on the system.
+ *
+ * Copyright (C) 2015 Anshuman Khandual khand...@linux.vnet.ibm.com, IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define _GNU_SOURCE
+#include "dscr.h"
+
+static int test_thread_dscr(unsigned long val)
+{
+   unsigned long cur_dscr, cur_dscr_usr;
+
+   cur_dscr = get_dscr();
+   cur_dscr_usr = get_dscr_usr();
+
+   if (val != cur_dscr) {
+   printf("[cpu %d] Kernel DSCR should be %ld but is %ld\n",
+   sched_getcpu(), val, cur_dscr);
+   return 1;
+   }
+
+   if (val != cur_dscr_usr) {
+   printf("[cpu %d] User DSCR should be %ld but is %ld\n",
+   sched_getcpu(), val, cur_dscr_usr);
+   return 1;
+   }
+   return 0;
+}
+
+static int check_cpu_dscr_thread(unsigned long val)
+{
+   cpu_set_t mask;
+   int cpu;
+
+   for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
+   CPU_ZERO(&mask);
+   CPU_SET(cpu, &mask);
+   if (sched_setaffinity(0, sizeof(mask), &mask))
+   continue;
+
+   if (test_thread_dscr(val))
+   return 1;
+   }
+   return 0;
+
+}
+
+int dscr_sysfs_thread(void)
+{
+   unsigned long orig_dscr_default;
+   int i, j;
+
+   orig_dscr_default = get_default_dscr();
+   for (i = 0; i < COUNT; i++) {
+   for (j = 0; j < DSCR_MAX; j++) {
+   set_default_dscr(j);
+   if (check_cpu_dscr_thread(j))
+   goto fail;
+   }
+   }
+   set_default_dscr(orig_dscr_default);
+   return 0;
+fail:
+   set_default_dscr(orig_dscr_default);
+   return 1;
+}
+
+int main(int argc, char *argv[])
+{
+   return test_harness(dscr_sysfs_thread, "dscr_sysfs_thread_test");
+}
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 03/14] perf, tools: Use pmu_events_map table to create event aliases

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:03AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 @@ -225,26 +221,47 @@ static int perf_pmu__new_alias(struct list_head *list, 
 char *dir, char *name, FI
   alias->unit[0] = '\0';
   alias->per_pkg = false;
  
 - ret = parse_events_terms(&alias->terms, buf);
 + ret = parse_events_terms(&alias->terms, val);
   if (ret) {
 + pr_err("Cannot parse alias %s: %d\n", val, ret);
   free(alias);
   return ret;
   }
  
   alias->name = strdup(name);
 + if (dir) {
 + /*
 +  * load unit name and scale if available
 +  */
 + perf_pmu__parse_unit(alias, dir, name);
 + perf_pmu__parse_scale(alias, dir, name);
 + perf_pmu__parse_per_pkg(alias, dir, name);
 + perf_pmu__parse_snapshot(alias, dir, name);
 + }
 +
   /*
 -  * load unit name and scale if available
 +  * TODO: pickup description from Andi's patchset
*/
 - perf_pmu__parse_unit(alias, dir, name);
 - perf_pmu__parse_scale(alias, dir, name);
 - perf_pmu__parse_per_pkg(alias, dir, name);
 - perf_pmu__parse_snapshot(alias, dir, name);
 + //alias->desc = desc ? strdpu(desc) : NULL;

please remove the TODO line and above commented code,
it is addressed later in this patchset

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 02/14] perf, tools, jevents: Program to convert JSON file to C style file

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:02AM -0700, Sukadev Bhattiprolu wrote:

SNIP

 + * If we fail to locate/process JSON and map files, create a NULL mapping
 + * table. This would at least allow perf to build even if we can't find/use
 + * the aliases.
 + */
 +static void create_empty_mapping(const char *output_file)
 +{
 + FILE *outfp;
 +
 + pr_info("%s: Creating empty pmu_events_map[] table\n", prog);
 +
 + /* Unlink file to clear any partial writes to it */
 + unlink(output_file);
 +
 + outfp = fopen(output_file, "a");

you could open with w+ and save the unlink call
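
i.e. something like (illustrative sketch of that suggestion; "w" already
truncates an existing file, so the explicit unlink() above goes away):

	outfp = fopen(output_file, "w");
	if (!outfp)
		return;		/* or report the error as the tool sees fit */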

SNIP

 +int main(int argc, char *argv[])
 +{
 + int rc;
 + int flags;
 + int maxfds;
 + char dirname[PATH_MAX];
 +
 + const char *arch;
 + const char *output_file;
 + const char *start_dirname;
 +
 + prog = basename(argv[0]);
 + if (argc < 4) {
 + pr_err("Usage: %s arch starting_dir output_file\n", prog);
 + return 1;
 + }
 +
 + arch = argv[1];
 + start_dirname = argv[2];
 + output_file = argv[3];
 +
 + if (argc > 4)
 + verbose = atoi(argv[4]);
 +
 + unlink(output_file);
 + eventsfp = fopen(output_file, "a");

ditto

SNIP
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 12/14] perf, tools: Add support for event list topics

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 12:16:41PM -0700, Sukadev Bhattiprolu wrote:

SNIP

[Speculative and retired macro-conditional branches]
   br_inst_exec.all_direct_jmp
[Speculative and retired macro-unconditional branches excluding calls 
 and indirects]
   br_inst_exec.all_direct_near_call
[Speculative and retired direct near calls]
   br_inst_exec.all_indirect_jump_non_call_ret
 
 Signed-off-by: Andi Kleen a...@linux.intel.com
 Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 
 Changelog[v2]
   Dropped an unnecessary patch before this and fixed resulting
   conflicts in tools/perf/util/pmu.c
 ---
  tools/perf/pmu-events/jevents.c| 16 +++-
  tools/perf/pmu-events/jevents.h|  3 ++-
  tools/perf/pmu-events/pmu-events.h |  1 +
  tools/perf/util/pmu.c  | 34 --
  tools/perf/util/pmu.h  |  1 +
  5 files changed, 39 insertions(+), 16 deletions(-)

please split at least the jevents Topic parsing from the rest,
ideally also the alias update and the display change

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 13/14] perf, tools: Handle header line in mapfile

2015-06-03 Thread Jiri Olsa
On Tue, Jun 02, 2015 at 10:12:13AM -0700, Sukadev Bhattiprolu wrote:
 From: Andi Kleen a...@linux.intel.com
 
 Support a header line in the mapfile.csv, to match the existing
 mapfiles

'Support' means 'skip' in here

jirka

 
 Signed-off-by: Andi Kleen a...@linux.intel.com
 Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 
 Changelog[v2]
   All architectures may not use the Family to identify. So,
   assume first line is header.
 ---
  tools/perf/pmu-events/jevents.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
 index 14707fb..8d365f2 100644
 --- a/tools/perf/pmu-events/jevents.c
 +++ b/tools/perf/pmu-events/jevents.c
 @@ -461,7 +461,12 @@ static int process_mapfile(FILE *outfp, char *fpath)
  
   print_mapping_table_prefix(outfp);
  
 - line_num = 0;
 + /* Skip first line (header) */
 + p = fgets(line, n, mapfp);
 + if (!p)
 + goto out;
 +
 + line_num = 1;
   while (1) {
   char *cpuid, *version, *type, *fname;
  
 @@ -505,8 +510,8 @@ static int process_mapfile(FILE *outfp, char *fpath)
   fprintf(outfp, },\n);
   }
  
 +out:
   print_mapping_table_suffix(outfp);
 -
   return 0;
  }
  
 -- 
 1.7.9.5
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel v11 09/34] vfio: powerpc/spapr: Move locked_vm accounting to helpers

2015-06-03 Thread Alexey Kardashevskiy

On 06/01/2015 02:28 PM, David Gibson wrote:

On Fri, May 29, 2015 at 06:44:33PM +1000, Alexey Kardashevskiy wrote:

There moves locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).

This reworks debug messages to show the current value and the limit.

This stores the locked pages number in the container so when unlocking
the iommu table pointer won't be needed. This does not have an effect
now but it will with the multiple tables per container as then we will
allow attaching/detaching groups on fly and we may end up having
a container with no group attached but with the counter incremented.

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might be required to be bigger than the guest RAM. This also prints
pid of the current process in pr_warn/pr_debug.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
[aw: for the vfio related changes]
Acked-by: Alex Williamson alex.william...@redhat.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
Reviewed-by: Gavin Shan gws...@linux.vnet.ibm.com
---
Changes:
v4:
* new helpers do nothing if @npages == 0
* tce_iommu_disable() now can decrement the counter if the group was
detached (not possible now but will be in the future)
---
  drivers/vfio/vfio_iommu_spapr_tce.c | 82 -
  1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 64300cc..40583f9 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -29,6 +29,51 @@
  static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group);

+static long try_increment_locked_vm(long npages)
+{
+   long ret = 0, locked, lock_limit;
+
+   if (!current || !current->mm)
+   return -ESRCH; /* process exited */
+
+   if (!npages)
+   return 0;
+
+   down_write(&current->mm->mmap_sem);
+   locked = current->mm->locked_vm + npages;


Is there a possibility of userspace triggering an integer overflow
here, if npages is really huge?



I do not see how. I just do not accept npages bigger than the host RAM size
in pages, and it is a long. For a (let's say) 128GB host, the number of 4KB
pages is (128<<30)/4096 = 33554432.
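
(For completeness, a defensive variant of the check could also catch a wrap
explicitly; purely illustrative, not something the patch needs per the
explanation above:)

	locked = current->mm->locked_vm + npages;
	if (locked < current->mm->locked_vm ||		/* wrapped around */
	    (locked > lock_limit && !capable(CAP_IPC_LOCK)))
		ret = -ENOMEM;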






+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+   ret = -ENOMEM;
+   else
+   current->mm->locked_vm += npages;
+
+   pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
+   npages << PAGE_SHIFT,
+   current->mm->locked_vm << PAGE_SHIFT,
+   rlimit(RLIMIT_MEMLOCK),
+   ret ? " - exceeded" : "");
+
+   up_write(&current->mm->mmap_sem);
+
+   return ret;
+}
+
+static void decrement_locked_vm(long npages)
+{
+   if (!current || !current->mm || !npages)
+   return; /* process exited */
+
+   down_write(&current->mm->mmap_sem);
+   if (npages > current->mm->locked_vm)
+   npages = current->mm->locked_vm;


Can this case ever occur (without there being a leak bug somewhere
else in the code)?



It should not. Safety measure. Having a warning here might make sense but I 
believe if this happens, there will be many, many warnings in other places :)





+   current->mm->locked_vm -= npages;
+   pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
+   npages << PAGE_SHIFT,
+   current->mm->locked_vm << PAGE_SHIFT,
+   rlimit(RLIMIT_MEMLOCK));
+   up_write(&current->mm->mmap_sem);
+}
+
  /*
   * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
   *
@@ -45,6 +90,7 @@ struct tce_container {
struct mutex lock;
struct iommu_table *tbl;
bool enabled;
+   unsigned long locked_pages;
  };

  static bool tce_page_is_contained(struct page *page, unsigned page_shift)
@@ -60,7 +106,7 @@ static bool tce_page_is_contained(struct page *page, 
unsigned page_shift)
  static int tce_iommu_enable(struct tce_container *container)
  {
int ret = 0;
-   unsigned long locked, lock_limit, npages;
+   unsigned long locked;
	struct iommu_table *tbl = container->tbl;

	if (!container->tbl)
@@ -89,21 +135,22 @@ static int tce_iommu_enable(struct tce_container 
*container)
 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
 * that would effectively kill the guest at random points, much better
 * enforcing the limit based on the max that the guest can map.
+*
+* Unfortunately at the moment it counts whole tables, no matter how
+* much memory the guest has. I.e. for 4GB guest and 4 IOMMU groups
+* each with 2GB DMA window, 8GB will be counted here. The reason for
+* this is that we cannot tell here the amount of RAM used by the guest
+* as this 

Re: [PATCH kernel v11 27/34] powerpc/powernv: Implement multilevel TCE tables

2015-06-03 Thread Alexey Kardashevskiy

On 06/02/2015 09:50 AM, David Gibson wrote:

On Fri, May 29, 2015 at 06:44:51PM +1000, Alexey Kardashevskiy wrote:

TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
on huge guests (hundreds of GB of RAM) so the kernel might be unable to
allocate contiguous chunk of physical memory to store the TCE table.

To address this, POWER8 CPU (actually, IODA2) supports multi-level
TCE tables, up to 5 levels which splits the table into a tree of
smaller subtables.

This adds multi-level TCE tables support to
pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages()
helpers.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v10:
* fixed multiple comments received for v9

v9:
* moved from ioda2 to common powernv pci code
* fixed cleanup if allocation fails in a middle
* removed check for the size - all boundary checks happen in the calling code
anyway
---
  arch/powerpc/include/asm/iommu.h  |  2 +
  arch/powerpc/platforms/powernv/pci-ioda.c | 98 ---
  arch/powerpc/platforms/powernv/pci.c  | 13 
  3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4636734..706cfc0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -96,6 +96,8 @@ struct iommu_pool {
  struct iommu_table {
unsigned long  it_busno; /* Bus number this table belongs to */
unsigned long  it_size;  /* Size of iommu table in entries */
+   unsigned long  it_indirect_levels;
+   unsigned long  it_level_size;
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index fda01c1..68ffc7a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -49,6 +49,9 @@
  /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
  #define TCE32_TABLE_SIZE  ((0x10000000 / 0x1000) * 8)

+#define POWERNV_IOMMU_DEFAULT_LEVELS   1
+#define POWERNV_IOMMU_MAX_LEVELS   5
+
  static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);

  static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
@@ -1975,6 +1978,8 @@ static long pnv_pci_ioda2_set_window(struct 
iommu_table_group *table_group,
table_group);
struct pnv_phb *phb = pe->phb;
int64_t rc;
+   const unsigned long size = tbl->it_indirect_levels ?
+   tbl->it_level_size : tbl->it_size;
const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
const __u64 win_size = tbl->it_size << tbl->it_page_shift;

@@ -1989,9 +1994,9 @@ static long pnv_pci_ioda2_set_window(struct 
iommu_table_group *table_group,
rc = opal_pci_map_pe_dma_window(phb->opal_id,
pe->pe_number,
pe->pe_number << 1,
-   1,
+   tbl->it_indirect_levels + 1,
__pa(tbl->it_base),
-   tbl->it_size << 3,
+   size << 3,
IOMMU_PAGE_SIZE(tbl));
if (rc) {
pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
@@ -2071,11 +2076,19 @@ static void pnv_pci_ioda_setup_opal_tce_kill(struct 
pnv_phb *phb)
phb->ioda.tce_inval_reg = ioremap(phb->ioda.tce_inval_reg_phys, 8);
  }

-static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift)
+static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
+   unsigned levels, unsigned long limit,
+   unsigned long *tce_table_allocated)
  {
struct page *tce_mem = NULL;
-   __be64 *addr;
+   __be64 *addr, *tmp;
unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
+   unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
+   unsigned entries = 1UL << (shift - 3);
+   long i;
+
+   if (*tce_table_allocated >= limit)
+   return NULL;


I'm not quite clear what case this limit logic is trying to catch.



The function is allocating some amount of entries which may be in one chunk 
of memory and spread between multiple chunks in multiple levels. limit is 
the amount of memory for actual TCEs (not intermediate levels). If I do not 
do this, and the user requests 5 levels, and I do not check this, more 
memory will be allocated that actually needed because size of the window is 
limited.






tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
if (!tce_mem) {
@@ -2083,31 +2096,69 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int 
nid, unsigned shift)
return NULL;
}
addr = page_address(tce_mem);
-   memset(addr, 0, 1UL << (order + 

Re: [PATCH kernel v11 33/34] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-06-03 Thread Alexey Kardashevskiy

On 06/02/2015 02:17 PM, David Gibson wrote:

On Fri, May 29, 2015 at 06:44:57PM +1000, Alexey Kardashevskiy wrote:

The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would require
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and system uses 4K or 64K pages.

Another issue is that actual pages pinning/unpinning happens on every
DMA map/unmap request. This does not affect the performance much now as
we spend way too much time now on switching context between
guest/userspace/host but this will start to matter when we add in-kernel
DMA map/unmap acceleration.

This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
New IOMMU splits physical pages pinning and TCE table update
into 2 different operations. It requires:
1) guest pages to be registered first
2) consequent map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required, otherwise it would be guest RAM + 2GB.

The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.

The accounting is done per the user process.

This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.

In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell from what
region the just cleared TCE was from.

This adds a userspace view of the TCE table into iommu_table struct.
It contains userspace address, one per TCE entry. The table is only
allocated when the ownership over an IOMMU group is taken which means
it is only used from outside of the powernv code (such as VFIO).

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
[aw: for the vfio related changes]
Acked-by: Alex Williamson alex.william...@redhat.com
---
Changes:
v11:
* mm_iommu_put() does not return a code so this does not check it
* moved v2 in tce_container to pack the struct

v10:
* moved it_userspace allocation to vfio_iommu_spapr_tce as it is a VFIO
specific thing
* squashed powerpc/iommu: Add userspace view of TCE table into this as
it is
a part of IOMMU v2
* s/tce_iommu_use_page_v2/tce_iommu_prereg_ua_to_hpa/
* fixed some function names to have tce_iommu_ in the beginning rather
just tce_
* as mm_iommu_mapped_inc() can now fail, check for the return code

v9:
* s/tce_get_hva_cached/tce_iommu_use_page_v2/

v7:
* now memory is registered per mm (i.e. process)
* moved memory registration code to powerpc/mmu
* merged vfio: powerpc/spapr: Define v2 IOMMU into this
* limited new ioctls to v2 IOMMU
* updated doc
* unsupported ioctls return -ENOTTY instead of -EPERM

v6:
* tce_get_hva_cached() returns hva via a pointer

v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
  Documentation/vfio.txt  |  31 ++-
  arch/powerpc/include/asm/iommu.h|   6 +
  drivers/vfio/vfio_iommu_spapr_tce.c | 512 ++--
  include/uapi/linux/vfio.h   |  27 ++
  4 files changed, 487 insertions(+), 89 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..7dcf2b5 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note

  This implementation has some specifics:

-1) Only one IOMMU group per container is supported as an IOMMU group
-represents the minimal entity which isolation can be guaranteed for and
-groups are allocated statically, one per a Partitionable Endpoint (PE)
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+container is supported as an IOMMU table is allocated at the boot time,
+one table per a IOMMU group which is a Partitionable Endpoint (PE)
  (PE is often a PCI domain but not always).
+Newer systems (POWER8 with IODA2) have improved hardware design which allows
+to remove this limitation and have multiple IOMMU groups per a VFIO container.

  2) The hardware supports so called DMA windows - the PCI address range
  within which DMA transfer is allowed, 

Re: [PATCH v13 12/14] perf, tools: Add support for event list topics

2015-06-03 Thread Jiri Olsa
On Wed, Jun 03, 2015 at 05:57:33AM -0700, Andi Kleen wrote:
  please split at least the jevents Topic parsing from the rest,
  ideally also the alias update and the display change
 
 What's the point of all these splits? It's already one logical unit,
 not too large, and is bisectable.

splitting the patch in logical pieces helps review and distro
backporting 

You changed the parsing tool and perf alias code that uses
the new output. IMO it's separate enough to be placed into
separate patches.

I believe the review would have been easier for me if those changes
were separate, also easing my job when backporting this change later
into the distro

jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 12/14] perf, tools: Add support for event list topics

2015-06-03 Thread Andi Kleen
 please split at least the jevents Topic parsing from the rest,
 ideally also the alias update and the display change

What's the point of all these splits? It's already one logical unit,
not too large, and is bisectable.

-andi

-- 
a...@linux.intel.com -- Speaking for myself only
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/4] ppc64 ftrace configuration

2015-06-03 Thread Torsten Duwe

Add Kconfig variables and Makefile magic for ftrace with -mprofile-kernel

Signed-off-by: Torsten Duwe d...@suse.de

diff --git a/Makefile b/Makefile
index 3d16bcc..bbd5e87 100644
--- a/Makefile
+++ b/Makefile
@@ -733,7 +733,10 @@ export CC_FLAGS_FTRACE
 ifdef CONFIG_HAVE_FENTRY
 CC_USING_FENTRY:= $(call cc-option, -mfentry -DCC_USING_FENTRY)
 endif
-KBUILD_CFLAGS  += $(CC_FLAGS_FTRACE) $(CC_USING_FENTRY)
+ifdef CONFIG_HAVE_MPROFILE_KERNEL
+CC_USING_MPROFILE_KERNEL   := $(call cc-option, -mprofile-kernel 
-DCC_USING_MPROFILE_KERNEL)
+endif
+KBUILD_CFLAGS  += $(CC_FLAGS_FTRACE) $(CC_USING_FENTRY) 
$(CC_USING_MPROFILE_KERNEL)
 KBUILD_AFLAGS  += $(CC_USING_FENTRY)
 ifdef CONFIG_DYNAMIC_FTRACE
ifdef CONFIG_HAVE_C_RECORDMCOUNT
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..566f204 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -94,8 +94,10 @@ config PPC
select OF_RESERVED_MEM
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
+   select HAVE_MPROFILE_KERNEL
select SYSCTL_EXCEPTION_TRACE
select ARCH_WANT_OPTIONAL_GPIOLIB
select VIRT_TO_BUS if !PPC64
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a5da09c..dd53f3d 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -52,6 +52,11 @@ config HAVE_FENTRY
help
  Arch supports the gcc options -pg with -mfentry
 
+config HAVE_MPROFILE_KERNEL
+   bool
+   help
+ Arch supports the gcc options -pg with -mprofile-kernel
+
 config HAVE_C_RECORDMCOUNT
bool
help
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/4] ppc64 ftrace: spare early boot and low level code

2015-06-03 Thread Torsten Duwe

Using -mprofile-kernel on early boot code not only confuses the checker
but is also useless, as the infrastructure is not yet in place. Proceed
like with -pg, equally with time.o and ftrace itself.

Signed-off-by: Torsten Duwe d...@suse.de

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 502cf69..fb33fc5 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -17,14 +17,14 @@ endif
 
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace early boot code
-CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_prom.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_prom.o = -pg -mno-sched-epilog -mprofile-kernel
 # do not trace tracer code
-CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog -mprofile-kernel
 # timers used by tracing
-CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog -mprofile-kernel
 endif
 
 obj-y  := cputable.o ptrace.o syscalls.o \
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] ppc64 ftrace recursion protection

2015-06-03 Thread Torsten Duwe
As suggested by You and Jikos, a flag in task_struct's trace_recursion
is used to block a tracer function to recurse into itself, especially
on a data access fault. This should catch all functions called by the
fault handlers which are not yet attributed notrace.

Signed-off-by: Torsten Duwe d...@suse.de

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ae10752 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -72,6 +72,7 @@ int main(void)
DEFINE(THREAD, offsetof(struct task_struct, thread));
DEFINE(MM, offsetof(struct task_struct, mm));
DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id));
+   DEFINE(TASK_TRACEREC, offsetof(struct task_struct, trace_recursion));
 #ifdef CONFIG_PPC64
DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context));
DEFINE(SIGSEGV, SIGSEGV);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index a4132ef..4768104 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1202,7 +1202,13 @@ _GLOBAL(ftrace_caller)
SAVE_8GPRS(16,r1)
SAVE_8GPRS(24,r1)
 
-
+   ld  r3, PACACURRENT(r13)
+   ld  r4, TASK_TRACEREC(r3)
+   andi.   r5, r4, 0x0010 // ( 1 << TRACE_FTRACE_BIT )
+   ori r4, r4, 0x0010
+   std r4, TASK_TRACEREC(r3)
+   bne 3f  // ftrace in progress - avoid recursion!
+   
LOAD_REG_IMMEDIATE(r3,function_trace_op)
ld  r5,0(r3)
 
@@ -1224,9 +1230,14 @@ ftrace_call:
bl  ftrace_stub
nop
 
+   ld  r3, PACACURRENT(r13)
+   ld  r4, TASK_TRACEREC(r3)
+   andi.   r4, r4, 0xffef // ~( 1 << TRACE_FTRACE_BIT )
+   std r4, TASK_TRACEREC(r3)
+
ld  r3, _NIP(r1)
mtlrr3
-
+3:
REST_8GPRS(0,r1)
REST_8GPRS(8,r1)
REST_8GPRS(16,r1)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v13 12/14] perf, tools: Add support for event list topics

2015-06-03 Thread Arnaldo Carvalho de Melo
Em Wed, Jun 03, 2015 at 05:57:33AM -0700, Andi Kleen escreveu:
  please split at least the jevents Topic parsing from the rest,
  ideally also the alias update and the display change
 
 What's the point of all these splits? It's already one logical unit,
 not too large, and is bisectable.

Eases review, improves bisectability, and its a reasonable request from
a reviewer/maintainer that has to look at an evergrowing number of patch
flows.

- Arnaldo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/4] ppc64 ftrace implementation

2015-06-03 Thread Torsten Duwe
On Tue, May 19, 2015 at 11:52:47AM +0200, Jiri Kosina wrote:
 On Tue, 19 May 2015, Michael Ellerman wrote:
 
   ftrace already handles recursion protection by itself (depending on the 
   per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
  
  OK, so I wonder why that's not working for us?
 
 The situation when traced function recurses to itself is different from 
 the situation when tracing core infrastrcuture would recurse to itself 
 while performing tracing.

I have used this inspiration to add a catch-all parachute for ftrace_caller,
see my last reply. It reappears here as patch 4/4. Expect a noticeable
performance impact compared to the selective notrace attribution discussed
here. That should still be done in a second step, especially for the hardware
assistance functions I mentioned.

Torsten


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/4] ppc64 ftrace implementation

2015-06-03 Thread Torsten Duwe
Implement ftrace on ppc64

Signed-off-by: Torsten Duwe d...@suse.de

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index e366187..691 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -46,6 +46,8 @@
 extern void _mcount(void);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+# define FTRACE_ADDR ((unsigned long)ftrace_caller+8)
+# define FTRACE_REGS_ADDR FTRACE_ADDR
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
	/* relocation of mcount call site is the same as the address */
@@ -58,6 +60,9 @@ struct dyn_arch_ftrace {
 #endif /*  CONFIG_DYNAMIC_FTRACE */
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#endif
 #endif
 
 #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_PPC64) && !defined(__ASSEMBLY__)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d180caf..a4132ef 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1152,32 +1152,107 @@ _GLOBAL(enter_prom)
 
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CONFIG_DYNAMIC_FTRACE
-_GLOBAL(mcount)
+
+#define TOCSAVE 24
+
 _GLOBAL(_mcount)
-   blr
+   nop // REQUIRED for ftrace, to calculate local/global entry diff
+.localentry _mcount,.-_mcount
+   mflr    r0
+   mtctr   r0
+
+   LOAD_REG_ADDR_PIC(r12,ftrace_trace_function)
+   ld  r12,0(r12)
+   LOAD_REG_ADDR_PIC(r0,ftrace_stub)
+   cmpd    r0,r12
+   ld  r0,LRSAVE(r1)
+   bne-    2f
+
+   mtlr    r0
+   bctr
+
+2: /* here we have (*ftrace_trace_function)() in r12,
+  selfpc in CTR
+  and frompc in r0 */
+
+   mtlr    r0
+   bctr
+
+_GLOBAL(ftrace_caller)
+   mr  r0,r2   // global (module) call: save module TOC
+   b   1f
+.localentry ftrace_caller,.-ftrace_caller
+   mr  r0,r2   // local call: callee's TOC == our TOC
+   b   2f
+
+1: addis   r2,r12,(.TOC.-0b)@ha
+   addir2,r2,(.TOC.-0b)@l
+
+2: // Here we have our proper TOC ptr in R2,
+   // and the one we need to restore on return in r0.
+
+   ld  r12, 16(r1) // get caller's address
+
+   stdu    r1,-SWITCH_FRAME_SIZE(r1)
+
+   std r12, _LINK(r1)
+   SAVE_8GPRS(0,r1)
+   std r0,TOCSAVE(r1)
+   SAVE_8GPRS(8,r1)
+   SAVE_8GPRS(16,r1)
+   SAVE_8GPRS(24,r1)
+
+
+   LOAD_REG_IMMEDIATE(r3,function_trace_op)
+   ld  r5,0(r3)
+
+   mflr    r3
+   std r3, _NIP(r1)
+   std r3, 16(r1)
+   subi    r3, r3, MCOUNT_INSN_SIZE
+   mfmsr   r4
+   std r4, _MSR(r1)
+   mfctr   r4
+   std r4, _CTR(r1)
+   mfxer   r4
+   std r4, _XER(r1)
+   mr  r4, r12
+   addi    r6, r1, STACK_FRAME_OVERHEAD
 
-_GLOBAL_TOC(ftrace_caller)
-   /* Taken from output of objdump from lib64/glibc */
-   mflr    r3
-   ld  r11, 0(r1)
-   stdu    r1, -112(r1)
-   std r3, 128(r1)
-   ld  r4, 16(r11)
-   subi    r3, r3, MCOUNT_INSN_SIZE
 .globl ftrace_call
 ftrace_call:
bl  ftrace_stub
nop
+
+   ld  r3, _NIP(r1)
+   mtlr    r3
+
+   REST_8GPRS(0,r1)
+   REST_8GPRS(8,r1)
+   REST_8GPRS(16,r1)
+   REST_8GPRS(24,r1)
+
+   addi r1, r1, SWITCH_FRAME_SIZE
+
+   ld  r12, 16(r1) // get caller's address
+   mr  r2,r0   // restore callee's TOC
+   mflr    r0  // move this LR to CTR
+   mtctr   r0
+   mr  r0,r12  // restore callee's lr at _mcount site
+   mtlr    r0
+   bctr    // jump after _mcount site
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 .globl ftrace_graph_call
 ftrace_graph_call:
b   ftrace_graph_stub
 _GLOBAL(ftrace_graph_stub)
 #endif
-   ld  r0, 128(r1)
-   mtlr    r0
-   addi    r1, r1, 112
+   
 _GLOBAL(ftrace_stub)
+   nop
+   nop
+.localentry ftrace_stub,.-ftrace_stub  
blr
 #else
 _GLOBAL_TOC(_mcount)
@@ -1211,12 +1286,12 @@ _GLOBAL(ftrace_stub)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 _GLOBAL(ftrace_graph_caller)
/* load r4 with local address */
-   ld  r4, 128(r1)
+   ld  r4, LRSAVE+SWITCH_FRAME_SIZE(r1)
	subi    r4, r4, MCOUNT_INSN_SIZE
 
/* Grab the LR out of the caller stack frame */
-   ld  r11, 112(r1)
-   ld  r3, 16(r11)
+   ld  r11, SWITCH_FRAME_SIZE(r1)
+   ld  r3, LRSAVE(r11)
 
bl  prepare_ftrace_return
nop
@@ -1228,10 +1303,7 @@ _GLOBAL(ftrace_graph_caller)
ld  r11, 112(r1)
std r3, 16(r11)
 
-   ld  r0, 128(r1)
-   mtlr    r0
-   addi    r1, r1, 112
-   blr
+   b ftrace_graph_stub
 
 _GLOBAL(return_to_handler)
/* need to save return values */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 44d4d8e..349d07c 100644
--- 

Re: [PATCH] cpuidle: powernv/pseries: Decrease the snooze residency

2015-06-03 Thread Vaidyanathan Srinivasan
* Benjamin Herrenschmidt b...@au1.ibm.com [2015-05-30 20:38:22]:

 On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
  In shared lpar case, spinning in guest context may potentially take
  away cycles from other lpars waiting to run on the same physical cpu.
  
  So the policy in shared lpar case is to let PowerVM hypervisor know
  immediately that the guest cpu is idle which will allow the hypervisor
  to use the cycles for other tasks/lpars.
 
 But that will have negative side effects under KVM no ?

Yes, you have a good point.  If one of the threads in the core goes to
cede, it can still come back quickly since the KVM guest context is
not switched yet.  But in a single-threaded guest, this can force
unnecessary exit/context switch overhead.

Now that we have fixed the snooze loop to be bounded and to exit
predictably, a KVM guest should actually use the snooze state to improve
latency.
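
For reference, a minimal sketch of a bounded snooze loop in the spirit of the
above (an illustration only, not the cpuidle driver source; the timeout is
assumed to be in timebase ticks):

static int snooze_loop_sketch(u64 snooze_timeout)
{
	u64 start = get_tb();

	HMT_low();			/* drop SMT priority while spinning */
	while (!need_resched()) {
		if (get_tb() - start > snooze_timeout)
			break;		/* bounded: promote to a deeper idle state */
	}
	HMT_medium();

	return 0;
}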

I will test this scenario and enable snooze state for KVM guest.
 
 Suresh mentioned something with his new directed interrupts code that we
 had many cases where the interrupts ended up arriving shortly after we
 exited to host for NAP'ing ...
 
 Snooze might fix it...

Right.  This scenario is worth experimenting with, and then we can
introduce the snooze loop for guests.

--Vaidy


Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs

2015-06-03 Thread Bjorn Helgaas
On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
 On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
 On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
 On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
  Current EEH recovery code works with the assumption: the PE has primary
  bus. Unfortunately, that's not true to VF PEs, which generally contains
  one or multiple VFs (for VF group case). The patch creates PEs for VFs
  at PCI final fixup time. Those PEs for VFs are identified with the newly
  introduced flag EEH_PE_VF so that we handle them differently during
  EEH recovery.
  
  [gwshan: changelog and code refactoring]
  Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
  Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
   arch/powerpc/include/asm/eeh.h   |1 +
   arch/powerpc/kernel/eeh_pe.c |   10 --
   arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +
   3 files changed, 26 insertions(+), 2 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/eeh.h 
  b/arch/powerpc/include/asm/eeh.h
  index 1b3614d..c1fde48 100644
  --- a/arch/powerpc/include/asm/eeh.h
  +++ b/arch/powerpc/include/asm/eeh.h
  @@ -70,6 +70,7 @@ struct pci_dn;
   #define EEH_PE_PHB		(1 << 1)	/* PHB PE        */
   #define EEH_PE_DEVICE	(1 << 2)	/* Device PE     */
   #define EEH_PE_BUS		(1 << 3)	/* Bus PE        */
  +#define EEH_PE_VF		(1 << 4)	/* VF PE         */
   
   #define EEH_PE_ISOLATED	(1 << 0)	/* Isolated PE   */
   #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE */
  diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
  index 35f0b62..260a701 100644
  --- a/arch/powerpc/kernel/eeh_pe.c
  +++ b/arch/powerpc/kernel/eeh_pe.c
  @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct 
  eeh_dev *edev)
 * EEH device already having associated PE, but
 * the direct parent EEH device doesn't have yet.
 */
  -	pdn = pdn ? pdn->parent : NULL;
  +	if (edev->physfn)
  +		pdn = pci_get_pdn(edev->physfn);
  +	else
  +		pdn = pdn ? pdn->parent : NULL;
while (pdn) {
/* We're poking out of PCI territory */
parent = pdn_to_eeh_dev(pdn);
  @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
}
   
/* Create a new EEH PE */
  -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
  +	if (edev->physfn)
  +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
  +	else
  +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
	if (!pe) {
		pr_err("%s: out of memory!\n", __func__);
return -ENOMEM;
  diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
  b/arch/powerpc/platforms/powernv/eeh-powernv.c
  index ce738ab..c505036 100644
  --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
  +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
  @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
.restore_config = pnv_eeh_restore_config
   };
   
  +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
  +{
  + struct pci_dn *pdn = pci_get_pdn(pdev);
  +
  +	if (!pdev->is_virtfn)
  + return;
  +
  + /*
  +  * The following operations will fail if VF's sysfs files
  +  * aren't created or its resources aren't finalized.
  +  */
 
 I don't understand this comment.  The following operations seems to refer
 to eeh_add_device_early() and eeh_add_device_late(), and
 VF's sysfs files being created seems to refer to eeh_sysfs_add_device().
 
 So the comment suggests that eeh_add_device_early() and
 eeh_add_device_late() will fail because they're called before
 eeh_sysfs_add_device().  So I think you must be talking about some other
 following operations, not eeh_add_device_early() and
 eeh_add_device_late().
 
 Sorry for this confusion.
 
 The comment here wants to say the eeh_sysfs_add_device() will fail if the 
 VF's
 sysfs is not created well. Or it will fail if the VF's resources are not set
 properly, since we would cache the VF's BAR in eeh_add_device_late().
 
 Gavin,
 
 If my understanding is not correct please let me know.
 
 
 It's correct. The following operations refer to eeh_add_device_late()
 and eeh_sysfs_add_device(). The former requires that the resources for
 one particular PCI device (the VF here) are finalized (assigned).
 eeh_sysfs_add_device() will fail if the sysfs entry for the PCI device
 isn't populated yet.

eeh_add_device_late() contains several things that read config space:
eeh_save_bars() caches the entire config header, and
eeh_addr_cache_insert_dev() looks at the device resources (which are
determined by BARs in config space).  I think this is an error-prone
approach.  I think it would be simpler and safer for you to capture what
you need in your PCI config accessors.

eeh_add_device_late() also contains code to deal with an EEH cache that
might not be removed correctly because of unbalanced kref to the device

Re: [v3,33/36] genirq: Use helper function to access irq_data-msi_desc

2015-06-03 Thread Thomas Gleixner
On Tue, 2 Jun 2015, Michael Ellerman wrote:
 Though I would point out we already have 16 irq_foo_get_bar() accessors?!

Say thanks to the people who mindlessly fiddled in the core-managed
fields of irqdesc and irqdata. The hard-to-track-down wreckage they
caused made me enforce accessors.

The new accessors are to help migrating members of irq_data to a new
data structure without breaking the world and some more.

Thanks,

tglx



Re: [PATCH kernel v11 09/34] vfio: powerpc/spapr: Move locked_vm accounting to helpers

2015-06-03 Thread David Gibson
On Wed, Jun 03, 2015 at 09:11:09PM +1000, Alexey Kardashevskiy wrote:
 On 06/01/2015 02:28 PM, David Gibson wrote:
 On Fri, May 29, 2015 at 06:44:33PM +1000, Alexey Kardashevskiy wrote:
 This moves locked pages accounting to helpers.
 Later they will be reused for Dynamic DMA windows (DDW).
 
 This reworks debug messages to show the current value and the limit.
 
 This stores the locked pages number in the container so when unlocking
 the iommu table pointer won't be needed. This does not have an effect
 now but it will with the multiple tables per container as then we will
 allow attaching/detaching groups on the fly and we may end up having
 a container with no group attached but with the counter incremented.
 
 While we are here, update the comment explaining why RLIMIT_MEMLOCK
 might be required to be bigger than the guest RAM. This also prints
 pid of the current process in pr_warn/pr_debug.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 Reviewed-by: David Gibson da...@gibson.dropbear.id.au
 Reviewed-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
 Changes:
 v4:
 * new helpers do nothing if @npages == 0
 * tce_iommu_disable() now can decrement the counter if the group was
 detached (not possible now but will be in the future)
 ---
   drivers/vfio/vfio_iommu_spapr_tce.c | 82 
  -
   1 file changed, 63 insertions(+), 19 deletions(-)
 
 diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
 b/drivers/vfio/vfio_iommu_spapr_tce.c
 index 64300cc..40583f9 100644
 --- a/drivers/vfio/vfio_iommu_spapr_tce.c
 +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
 @@ -29,6 +29,51 @@
   static void tce_iommu_detach_group(void *iommu_data,
 struct iommu_group *iommu_group);
 
 +static long try_increment_locked_vm(long npages)
 +{
 +   long ret = 0, locked, lock_limit;
 +
  +   if (!current || !current->mm)
  +   return -ESRCH; /* process exited */
  +
  +   if (!npages)
  +   return 0;
  +
  +   down_write(&current->mm->mmap_sem);
  +   locked = current->mm->locked_vm + npages;
 
 Is there a possibility of userspace triggering an integer overflow
 here, if npages is really huge?
 
 
 I do not see how. I just do not accept npages bigger than the host RAM size
 in pages. And it is a long. For (let's say) a 128GB host, the number of 4KB
 pages is (128 << 30) / 4096 = 33554432.

Ah, yes, npages has already been shifted right so it should be safe. Ok.

 
 
 
  +   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
  +   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
  +   ret = -ENOMEM;
  +   else
  +   current->mm->locked_vm += npages;
  +
  +   pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
  +   npages << PAGE_SHIFT,
  +   current->mm->locked_vm << PAGE_SHIFT,
  +   rlimit(RLIMIT_MEMLOCK),
  +   ret ? " - exceeded" : "");
  +
  +   up_write(&current->mm->mmap_sem);
  +
  +   return ret;
  +}
  +
  +static void decrement_locked_vm(long npages)
  +{
  +   if (!current || !current->mm || !npages)
  +   return; /* process exited */
  +
  +   down_write(&current->mm->mmap_sem);
  +   if (npages > current->mm->locked_vm)
  +   npages = current->mm->locked_vm;
 
 Can this case ever occur (without there being a leak bug somewhere
 else in the code)?
 
 
 It should not. Safety measure. Having a warning here might make sense but I
 believe if this happens, there will be many, many warnings in other places
 :)

Ok.  It would be nice to see a WARN_ON() as documentation that this
isn't a situation that should ever happen.  I wouldn't nack on that
basis alone though.
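
A minimal sketch of decrement_locked_vm() with the suggested WARN_ON() folded
in (an illustration of the suggestion, not part of the posted patch):

static void decrement_locked_vm(long npages)
{
	if (!current || !current->mm || !npages)
		return;	/* process exited */

	down_write(&current->mm->mmap_sem);
	if (WARN_ON(npages > current->mm->locked_vm))
		npages = current->mm->locked_vm;	/* clamp, but complain loudly */
	current->mm->locked_vm -= npages;
	up_write(&current->mm->mmap_sem);
}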

  +   current->mm->locked_vm -= npages;
  +   pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
  +   npages << PAGE_SHIFT,
  +   current->mm->locked_vm << PAGE_SHIFT,
  +   rlimit(RLIMIT_MEMLOCK));
  +   up_write(&current->mm->mmap_sem);
 +}
 +
   /*
* VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
*
 @@ -45,6 +90,7 @@ struct tce_container {
 struct mutex lock;
 struct iommu_table *tbl;
 bool enabled;
 +   unsigned long locked_pages;
   };
 
   static bool tce_page_is_contained(struct page *page, unsigned page_shift)
 @@ -60,7 +106,7 @@ static bool tce_page_is_contained(struct page *page, 
 unsigned page_shift)
   static int tce_iommu_enable(struct tce_container *container)
   {
 int ret = 0;
 -   unsigned long locked, lock_limit, npages;
 +   unsigned long locked;
 	struct iommu_table *tbl = container->tbl;
 
 	if (!container->tbl)
 @@ -89,21 +135,22 @@ static int tce_iommu_enable(struct tce_container 
 *container)
  * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
  * that would effectively kill the guest at random points, much better
  * enforcing the limit based on the max that the guest can map.
 +*
 +* Unfortunately at the moment it counts whole tables, no matter how
 +* much 

[RFT v2 26/48] powerpc, irq: Prepare for killing the first parameter 'irq' of irq_flow_handler_t

2015-06-03 Thread Jiang Liu
Change irq flow handler to prepare for killing the first parameter 'irq'
of irq_flow_handler_t.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 arch/powerpc/platforms/512x/mpc5121_ads_cpld.c  |4 +++-
 arch/powerpc/platforms/85xx/socrates_fpga_pic.c |2 +-
 arch/powerpc/platforms/cell/interrupt.c |3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c 
b/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
index ca3a062ed1b9..4411ed51803e 100644
--- a/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
+++ b/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
@@ -105,8 +105,10 @@ cpld_pic_get_irq(int offset, u8 ignore, u8 __iomem 
*statusp,
 }
 
 static void
-cpld_pic_cascade(unsigned int irq, struct irq_desc *desc)
+cpld_pic_cascade(unsigned int __irq, struct irq_desc *desc)
 {
+   unsigned int irq;
+
	irq = cpld_pic_get_irq(0, PCI_IGNORE, &cpld_regs->pci_status,
			       &cpld_regs->pci_mask);
if (irq != NO_IRQ) {
diff --git a/arch/powerpc/platforms/85xx/socrates_fpga_pic.c 
b/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
index 55a9682b9529..5153e58654f7 100644
--- a/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
+++ b/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
@@ -100,7 +100,7 @@ void socrates_fpga_pic_cascade(unsigned int irq, struct 
irq_desc *desc)
 * See if we actually have an interrupt, call generic handling code if
 * we do.
 */
-   cascade_irq = socrates_fpga_pic_get_irq(irq);
+   cascade_irq = socrates_fpga_pic_get_irq(irq_desc_get_irq(desc));
 
if (cascade_irq != NO_IRQ)
generic_handle_irq(cascade_irq);
diff --git a/arch/powerpc/platforms/cell/interrupt.c 
b/arch/powerpc/platforms/cell/interrupt.c
index 3af8324c122e..e2dd6c9d3a78 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -99,8 +99,9 @@ static void iic_ioexc_eoi(struct irq_data *d)
 {
 }
 
-static void iic_ioexc_cascade(unsigned int irq, struct irq_desc *desc)
+static void iic_ioexc_cascade(unsigned int __irq, struct irq_desc *desc)
 {
+   unsigned int irq = irq_desc_get_irq(desc);
struct irq_chip *chip = irq_desc_get_chip(desc);
struct cbe_iic_regs __iomem *node_iic =
(void __iomem *)irq_desc_get_handler_data(desc);
-- 
1.7.10.4


Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs

2015-06-03 Thread Wei Yang
On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
 On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
 On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
 On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
  Current EEH recovery code works with the assumption: the PE has primary
  bus. Unfortunately, that's not true to VF PEs, which generally contains
  one or multiple VFs (for VF group case). The patch creates PEs for VFs
  at PCI final fixup time. Those PEs for VFs are identified with the newly
  introduced flag EEH_PE_VF so that we handle them differently during
  EEH recovery.
  
  [gwshan: changelog and code refactoring]
  Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
  Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
   arch/powerpc/include/asm/eeh.h   |1 +
   arch/powerpc/kernel/eeh_pe.c |   10 --
   arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +
   3 files changed, 26 insertions(+), 2 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/eeh.h 
  b/arch/powerpc/include/asm/eeh.h
  index 1b3614d..c1fde48 100644
  --- a/arch/powerpc/include/asm/eeh.h
  +++ b/arch/powerpc/include/asm/eeh.h
  @@ -70,6 +70,7 @@ struct pci_dn;
   #define EEH_PE_PHB		(1 << 1)	/* PHB PE        */
   #define EEH_PE_DEVICE	(1 << 2)	/* Device PE     */
   #define EEH_PE_BUS		(1 << 3)	/* Bus PE        */
  +#define EEH_PE_VF		(1 << 4)	/* VF PE         */
   
   #define EEH_PE_ISOLATED	(1 << 0)	/* Isolated PE   */
   #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE */
  diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
  index 35f0b62..260a701 100644
  --- a/arch/powerpc/kernel/eeh_pe.c
  +++ b/arch/powerpc/kernel/eeh_pe.c
  @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct 
  eeh_dev *edev)
* EEH device already having associated PE, but
* the direct parent EEH device doesn't have yet.
*/
  -pdn = pdn ? pdn-parent : NULL;
  +if (edev-physfn)
  +pdn = pci_get_pdn(edev-physfn);
  +else
  +pdn = pdn ? pdn-parent : NULL;
   while (pdn) {
   /* We're poking out of PCI territory */
   parent = pdn_to_eeh_dev(pdn);
  @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
   }
   
   /* Create a new EEH PE */
  -pe = eeh_pe_alloc(edev-phb, EEH_PE_DEVICE);
  +if (edev-physfn)
  +pe = eeh_pe_alloc(edev-phb, EEH_PE_VF);
  +else
  +pe = eeh_pe_alloc(edev-phb, EEH_PE_DEVICE);
   if (!pe) {
   pr_err(%s: out of memory!\n, __func__);
   return -ENOMEM;
  diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
  b/arch/powerpc/platforms/powernv/eeh-powernv.c
  index ce738ab..c505036 100644
  --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
  +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
  @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
   .restore_config = pnv_eeh_restore_config
   };
   
  +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
  +{
  +struct pci_dn *pdn = pci_get_pdn(pdev);
  +
  +if (!pdev-is_virtfn)
  +return;
  +
  +/*
  + * The following operations will fail if VF's sysfs files
  + * aren't created or its resources aren't finalized.
  + */
 
 I don't understand this comment.  The following operations seems to refer
 to eeh_add_device_early() and eeh_add_device_late(), and
 VF's sysfs files being created seems to refer to eeh_sysfs_add_device().
 
 So the comment suggests that eeh_add_device_early() and
 eeh_add_device_late() will fail because they're called before
 eeh_sysfs_add_device().  So I think you must be talking about some other
 following operations, not eeh_add_device_early() and
 eeh_add_device_late().
 
 Sorry for this confusion.
 
 The comment here wants to say the eeh_sysfs_add_device() will fail if the 
 VF's
 sysfs is not created well. Or it will fail if the VF's resources are not set
 properly, since we would cache the VF's BAR in eeh_add_device_late().
 
 Gavin,
 
 If my understanding is not correct please let me know.
 
 
 It's correct. The following operations refers to eeh_add_device_late()
 and eeh_sysfs_add_device(). The former one requires the resources for
 one particular PCI device (VF here) are finalized (assigned). 
 eeh_sysfs_add_device()
 will fail if the sysfs entry for the PCI device isn't populated yet.

eeh_add_device_late() contains several things that read config space:
eeh_save_bars() caches the entire config header, and
eeh_addr_cache_insert_dev() looks at the device resources (which are
determined by BARs in config space).  I think this is an error-prone

Re: [PATCH kernel v11 27/34] powerpc/powernv: Implement multilevel TCE tables

2015-06-03 Thread David Gibson
On Wed, Jun 03, 2015 at 09:27:10PM +1000, Alexey Kardashevskiy wrote:
 On 06/02/2015 09:50 AM, David Gibson wrote:
 On Fri, May 29, 2015 at 06:44:51PM +1000, Alexey Kardashevskiy wrote:
 TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
 on huge guests (hundreds of GB of RAM) so the kernel might be unable to
 allocate contiguous chunk of physical memory to store the TCE table.
 
 To address this, POWER8 CPU (actually, IODA2) supports multi-level
 TCE tables, up to 5 levels which splits the table into a tree of
 smaller subtables.
 
 This adds multi-level TCE tables support to
 pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages()
 helpers.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v10:
 * fixed multiple comments received for v9
 
 v9:
 * moved from ioda2 to common powernv pci code
 * fixed cleanup if allocation fails in a middle
 * removed check for the size - all boundary checks happen in the calling 
 code
 anyway
 ---
   arch/powerpc/include/asm/iommu.h  |  2 +
   arch/powerpc/platforms/powernv/pci-ioda.c | 98 
  ---
   arch/powerpc/platforms/powernv/pci.c  | 13 
   3 files changed, 104 insertions(+), 9 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 4636734..706cfc0 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -96,6 +96,8 @@ struct iommu_pool {
   struct iommu_table {
 unsigned long  it_busno; /* Bus number this table belongs to */
 unsigned long  it_size;  /* Size of iommu table in entries */
 +   unsigned long  it_indirect_levels;
 +   unsigned long  it_level_size;
 unsigned long  it_offset;/* Offset into global table */
 unsigned long  it_base;  /* mapped address of tce table */
 unsigned long  it_index; /* which iommu table this is */
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index fda01c1..68ffc7a 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -49,6 +49,9 @@
   /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
   #define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
 
 +#define POWERNV_IOMMU_DEFAULT_LEVELS   1
 +#define POWERNV_IOMMU_MAX_LEVELS   5
 +
   static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
 
   static void pe_level_printk(const struct pnv_ioda_pe *pe, const char 
  *level,
 @@ -1975,6 +1978,8 @@ static long pnv_pci_ioda2_set_window(struct 
 iommu_table_group *table_group,
 table_group);
 struct pnv_phb *phb = pe-phb;
 int64_t rc;
  +   const unsigned long size = tbl->it_indirect_levels ?
  +   tbl->it_level_size : tbl->it_size;
 	const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
 	const __u64 win_size = tbl->it_size << tbl->it_page_shift;
 
 @@ -1989,9 +1994,9 @@ static long pnv_pci_ioda2_set_window(struct 
 iommu_table_group *table_group,
 	rc = opal_pci_map_pe_dma_window(phb->opal_id,
 			pe->pe_number,
 			pe->pe_number << 1,
  -			1,
  +			tbl->it_indirect_levels + 1,
 			__pa(tbl->it_base),
  -			tbl->it_size << 3,
  +			size << 3,
 			IOMMU_PAGE_SIZE(tbl));
 	if (rc) {
 		pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
 @@ -2071,11 +2076,19 @@ static void pnv_pci_ioda_setup_opal_tce_kill(struct 
 pnv_phb *phb)
 	phb->ioda.tce_inval_reg = ioremap(phb->ioda.tce_inval_reg_phys, 8);
   }
 
 -static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift)
 +static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
 +   unsigned levels, unsigned long limit,
 +   unsigned long *tce_table_allocated)
   {
 struct page *tce_mem = NULL;
 -   __be64 *addr;
 +   __be64 *addr, *tmp;
 unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
  +   unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
  +   unsigned entries = 1UL << (shift - 3);
  +   long i;
  +
  +   if (*tce_table_allocated >= limit)
  +   return NULL;
 
 I'm not quite clear what case this limit logic is trying to catch.
 
 
 The function is allocating some number of entries which may be in one chunk
 of memory or spread between multiple chunks in multiple levels. limit is
 the amount of memory for actual TCEs (not intermediate levels). If I do not
 check this and the user requests 5 levels, more memory will be allocated
 than actually needed because the size of the window is limited.
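
As a reading aid, a hedged sketch of that limit check (a paraphrase of the
explanation above, not the posted helper): only leaf chunks hold real TCEs,
so leaf allocation stops once the window is covered, however many levels
were requested.

static bool leaf_chunk_wanted(unsigned long chunk_bytes, unsigned long limit,
			      unsigned long *tce_table_allocated)
{
	if (*tce_table_allocated >= limit)
		return false;			/* window already fully covered */
	*tce_table_allocated += chunk_bytes;	/* account leaf memory only */
	return true;
}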

Ah, ok.  It's to handle the case where the requested window size
doesn't match a whole number of levels.

It seems a rather counter-intuitive way of handling it to me -
tracking the amount of memory allocated at the leaf level, rather than
tracking what window offset you're 

[RFT v2 38/48] genirq, powerpc: Kill the first parameter 'irq' of irq_flow_handler_t

2015-06-03 Thread Jiang Liu
Now most IRQ flow handlers make no use of the first parameter 'irq'.
And for those who do make use of 'irq', we could easily get the irq
number through irq_desc->irq_data->irq. So kill the first parameter
'irq' of irq_flow_handler_t.
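
For illustration, a hedged sketch of what a converted cascade handler looks
like after this change; example_cascade and example_demux are hypothetical
names, the accessors are the existing irq_desc helpers:

static void example_cascade(struct irq_desc *desc)
{
	unsigned int irq = irq_desc_get_irq(desc);	/* was the first parameter */
	struct irq_chip *chip = irq_desc_get_chip(desc);
	void *data = irq_desc_get_handler_data(desc);
	unsigned int child;

	chained_irq_enter(chip, desc);
	child = example_demux(data, irq);		/* hypothetical demultiplex helper */
	if (child != NO_IRQ)
		generic_handle_irq(child);
	chained_irq_exit(chip, desc);
}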

To ease review, I have split the changes into several parts, though
they should be merged as one to support bisecting.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 arch/powerpc/include/asm/qe_ic.h|   23 +--
 arch/powerpc/include/asm/tsi108_pci.h   |2 +-
 arch/powerpc/platforms/512x/mpc5121_ads_cpld.c  |2 +-
 arch/powerpc/platforms/52xx/media5200.c |2 +-
 arch/powerpc/platforms/52xx/mpc52xx_gpt.c   |2 +-
 arch/powerpc/platforms/82xx/pq2ads-pci-pic.c|2 +-
 arch/powerpc/platforms/85xx/common.c|2 +-
 arch/powerpc/platforms/85xx/mpc85xx_cds.c   |5 ++---
 arch/powerpc/platforms/85xx/mpc85xx_ds.c|2 +-
 arch/powerpc/platforms/85xx/socrates_fpga_pic.c |2 +-
 arch/powerpc/platforms/86xx/pic.c   |2 +-
 arch/powerpc/platforms/8xx/m8xx_setup.c |2 +-
 arch/powerpc/platforms/cell/axon_msi.c  |2 +-
 arch/powerpc/platforms/cell/interrupt.c |2 +-
 arch/powerpc/platforms/cell/spider-pic.c|2 +-
 arch/powerpc/platforms/chrp/setup.c |2 +-
 arch/powerpc/platforms/embedded6xx/hlwd-pic.c   |3 +--
 arch/powerpc/platforms/embedded6xx/mvme5100.c   |2 +-
 arch/powerpc/platforms/pseries/setup.c  |2 +-
 arch/powerpc/sysdev/ge/ge_pic.c |2 +-
 arch/powerpc/sysdev/ge/ge_pic.h |2 +-
 arch/powerpc/sysdev/mpic.c  |2 +-
 arch/powerpc/sysdev/qe_lib/qe_ic.c  |4 ++--
 arch/powerpc/sysdev/tsi108_pci.c|2 +-
 arch/powerpc/sysdev/uic.c   |2 +-
 arch/powerpc/sysdev/xilinx_intc.c   |2 +-
 26 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/qe_ic.h b/arch/powerpc/include/asm/qe_ic.h
index 25784cc959a0..1e155ca6d33c 100644
--- a/arch/powerpc/include/asm/qe_ic.h
+++ b/arch/powerpc/include/asm/qe_ic.h
@@ -59,14 +59,14 @@ enum qe_ic_grp_id {
 
 #ifdef CONFIG_QUICC_ENGINE
 void qe_ic_init(struct device_node *node, unsigned int flags,
-   void (*low_handler)(unsigned int irq, struct irq_desc *desc),
-   void (*high_handler)(unsigned int irq, struct irq_desc *desc));
+   void (*low_handler)(struct irq_desc *desc),
+   void (*high_handler)(struct irq_desc *desc));
 unsigned int qe_ic_get_low_irq(struct qe_ic *qe_ic);
 unsigned int qe_ic_get_high_irq(struct qe_ic *qe_ic);
 #else
 static inline void qe_ic_init(struct device_node *node, unsigned int flags,
-   void (*low_handler)(unsigned int irq, struct irq_desc *desc),
-   void (*high_handler)(unsigned int irq, struct irq_desc *desc))
+   void (*low_handler)(struct irq_desc *desc),
+   void (*high_handler)(struct irq_desc *desc))
 {}
 static inline unsigned int qe_ic_get_low_irq(struct qe_ic *qe_ic)
 { return 0; }
@@ -78,8 +78,7 @@ void qe_ic_set_highest_priority(unsigned int virq, int high);
 int qe_ic_set_priority(unsigned int virq, unsigned int priority);
 int qe_ic_set_high_priority(unsigned int virq, unsigned int priority, int 
high);
 
-static inline void qe_ic_cascade_low_ipic(unsigned int irq,
- struct irq_desc *desc)
+static inline void qe_ic_cascade_low_ipic(struct irq_desc *desc)
 {
struct qe_ic *qe_ic = irq_desc_get_handler_data(desc);
unsigned int cascade_irq = qe_ic_get_low_irq(qe_ic);
@@ -88,8 +87,7 @@ static inline void qe_ic_cascade_low_ipic(unsigned int irq,
generic_handle_irq(cascade_irq);
 }
 
-static inline void qe_ic_cascade_high_ipic(unsigned int irq,
-  struct irq_desc *desc)
+static inline void qe_ic_cascade_high_ipic(struct irq_desc *desc)
 {
struct qe_ic *qe_ic = irq_desc_get_handler_data(desc);
unsigned int cascade_irq = qe_ic_get_high_irq(qe_ic);
@@ -98,8 +96,7 @@ static inline void qe_ic_cascade_high_ipic(unsigned int irq,
generic_handle_irq(cascade_irq);
 }
 
-static inline void qe_ic_cascade_low_mpic(unsigned int irq,
- struct irq_desc *desc)
+static inline void qe_ic_cascade_low_mpic(struct irq_desc *desc)
 {
struct qe_ic *qe_ic = irq_desc_get_handler_data(desc);
unsigned int cascade_irq = qe_ic_get_low_irq(qe_ic);
@@ -111,8 +108,7 @@ static inline void qe_ic_cascade_low_mpic(unsigned int irq,
	chip->irq_eoi(&desc->irq_data);
 }
 
-static inline void qe_ic_cascade_high_mpic(unsigned int irq,
-  struct irq_desc *desc)
+static inline void qe_ic_cascade_high_mpic(struct irq_desc *desc)
 {
struct qe_ic *qe_ic = 

Re: [RFC 07/24] x86/thinkpad_acpi: Use arch_nvram_ops methods instead of nvram_read_byte() and nvram_write_byte()

2015-06-03 Thread Darren Hart
On Wed, Jun 03, 2015 at 07:37:13AM -0300, Henrique de Moraes Holschuh wrote:
 On Wed, Jun 3, 2015, at 00:34, Darren Hart wrote:
  On Tue, Jun 02, 2015 at 07:09:28AM -0300, Henrique de Moraes Holschuh
  wrote:
   Test results were sent to me privately, and they are correct, so...
  
  Finn, unless there is some compelling reason not to - like they are MBs
  worth of
  data, please submit these to the list in the future so we have them for
  reference.
 
 After I told him which exact bitmask to use on a T43 to test
 hotkey_source_mask, his test results can be summarized as "I could see
 no difference in behavior", which is *exactly* what I expected to
 happen.
 
 If anything went wrong with the thinkpad-acpi NVRAM code, you'd notice a
 very large change in behavior (typical: hotkeys don't work, less
 typical: random hotkey keypresses, hotkey press bursts, low responsivity
 of hotkeys).

Perfect, thanks for the update so we have it recorded here on the list.

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH kernel v11 33/34] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-06-03 Thread David Gibson
On Wed, Jun 03, 2015 at 09:40:49PM +1000, Alexey Kardashevskiy wrote:
 On 06/02/2015 02:17 PM, David Gibson wrote:
 On Fri, May 29, 2015 at 06:44:57PM +1000, Alexey Kardashevskiy wrote:
 The existing implementation accounts the whole DMA window in
 the locked_vm counter. This is going to be worse with multiple
 containers and huge DMA windows. Also, real-time accounting would require
 additional tracking of accounted pages due to the page size difference -
 IOMMU uses 4K pages and system uses 4K or 64K pages.
 
 Another issue is that actual pages pinning/unpinning happens on every
 DMA map/unmap request. This does not affect the performance much now as
 we spend way too much time now on switching context between
 guest/userspace/host but this will start to matter when we add in-kernel
 DMA map/unmap acceleration.
 
 This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
 New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
 2 new ioctls to register/unregister DMA memory -
 VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
 which receive user space address and size of a memory region which
 needs to be pinned/unpinned and counted in locked_vm.
 New IOMMU splits physical pages pinning and TCE table update
 into 2 different operations. It requires:
 1) guest pages to be registered first
 2) subsequent map/unmap requests to work only with pre-registered memory.
 For the default single window case this means that the entire guest
 (instead of 2GB) needs to be pinned before using VFIO.
 When a huge DMA window is added, no additional pinning will be
 required, otherwise it would be guest RAM + 2GB.
 
 The new memory registration ioctls are not supported by
 VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
 will require memory to be preregistered in order to work.
 
 The accounting is done per the user process.
 
 This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
 can do with v1 or v2 IOMMUs.
 
 In order to support memory pre-registration, we need a way to track
 the use of every registered memory region and only allow unregistration
 if a region is not in use anymore. So we need a way to tell which
 region the just-cleared TCE came from.
 
 This adds a userspace view of the TCE table into iommu_table struct.
 It contains userspace address, one per TCE entry. The table is only
 allocated when the ownership over an IOMMU group is taken which means
 it is only used from outside of the powernv code (such as VFIO).
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 ---
 Changes:
 v11:
 * mm_iommu_put() does not return a code so this does not check it
 * moved v2 in tce_container to pack the struct
 
 v10:
 * moved it_userspace allocation to vfio_iommu_spapr_tce as it VFIO
 specific thing
 * squashed powerpc/iommu: Add userspace view of TCE table into this as
 it is
 a part of IOMMU v2
 * s/tce_iommu_use_page_v2/tce_iommu_prereg_ua_to_hpa/
 * fixed some function names to have tce_iommu_ in the beginning rather
 just tce_
 * as mm_iommu_mapped_inc() can now fail, check for the return code
 
 v9:
 * s/tce_get_hva_cached/tce_iommu_use_page_v2/
 
 v7:
 * now memory is registered per mm (i.e. process)
 * moved memory registration code to powerpc/mmu
 * merged vfio: powerpc/spapr: Define v2 IOMMU into this
 * limited new ioctls to v2 IOMMU
 * updated doc
 * unsupported ioclts return -ENOTTY instead of -EPERM
 
 v6:
 * tce_get_hva_cached() returns hva via a pointer
 
 v4:
 * updated docs
 * s/kzmalloc/vzalloc/
 * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
 replaced offset with index
 * renamed vfio_iommu_type_register_memory to 
 vfio_iommu_spapr_register_memory
 and removed duplicating vfio_iommu_spapr_register_memory
 ---
   Documentation/vfio.txt  |  31 ++-
   arch/powerpc/include/asm/iommu.h|   6 +
   drivers/vfio/vfio_iommu_spapr_tce.c | 512 
  ++--
   include/uapi/linux/vfio.h   |  27 ++
   4 files changed, 487 insertions(+), 89 deletions(-)
 
 diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
 index 96978ec..7dcf2b5 100644
 --- a/Documentation/vfio.txt
 +++ b/Documentation/vfio.txt
 @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
 
   This implementation has some specifics:
 
 -1) Only one IOMMU group per container is supported as an IOMMU group
 -represents the minimal entity which isolation can be guaranteed for and
 -groups are allocated statically, one per a Partitionable Endpoint (PE)
 +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
 +container is supported as an IOMMU table is allocated at the boot time,
 +one table per a IOMMU group which is a Partitionable Endpoint (PE)
   (PE is often a PCI domain but not always).
 +Newer systems (POWER8 with IODA2) have improved hardware design which 
 allows
 +to 

Re: [PATCH V7 06/10] powerpc/eeh: Create PE for VFs

2015-06-03 Thread Gavin Shan
On Wed, Jun 03, 2015 at 10:46:38AM -0500, Bjorn Helgaas wrote:
On Wed, Jun 03, 2015 at 03:10:23PM +1000, Gavin Shan wrote:
 On Wed, Jun 03, 2015 at 11:31:42AM +0800, Wei Yang wrote:
 On Mon, Jun 01, 2015 at 06:46:45PM -0500, Bjorn Helgaas wrote:
 On Tue, May 19, 2015 at 06:50:08PM +0800, Wei Yang wrote:
  Current EEH recovery code works with the assumption: the PE has primary
  bus. Unfortunately, that's not true to VF PEs, which generally contains
  one or multiple VFs (for VF group case). The patch creates PEs for VFs
  at PCI final fixup time. Those PEs for VFs are identified with the newly
  introduced flag EEH_PE_VF so that we handle them differently during
  EEH recovery.
  
  [gwshan: changelog and code refactoring]
  Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
  Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
   arch/powerpc/include/asm/eeh.h   |1 +
   arch/powerpc/kernel/eeh_pe.c |   10 --
   arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +
   3 files changed, 26 insertions(+), 2 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/eeh.h 
  b/arch/powerpc/include/asm/eeh.h
  index 1b3614d..c1fde48 100644
  --- a/arch/powerpc/include/asm/eeh.h
  +++ b/arch/powerpc/include/asm/eeh.h
  @@ -70,6 +70,7 @@ struct pci_dn;
   #define EEH_PE_PHB		(1 << 1)	/* PHB PE        */
   #define EEH_PE_DEVICE	(1 << 2)	/* Device PE     */
   #define EEH_PE_BUS		(1 << 3)	/* Bus PE        */
  +#define EEH_PE_VF		(1 << 4)	/* VF PE         */
   
   #define EEH_PE_ISOLATED	(1 << 0)	/* Isolated PE   */
   #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE */
  diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
  index 35f0b62..260a701 100644
  --- a/arch/powerpc/kernel/eeh_pe.c
  +++ b/arch/powerpc/kernel/eeh_pe.c
  @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct 
  eeh_dev *edev)
* EEH device already having associated PE, but
* the direct parent EEH device doesn't have yet.
*/
  -pdn = pdn ? pdn-parent : NULL;
  +if (edev-physfn)
  +pdn = pci_get_pdn(edev-physfn);
  +else
  +pdn = pdn ? pdn-parent : NULL;
   while (pdn) {
   /* We're poking out of PCI territory */
   parent = pdn_to_eeh_dev(pdn);
  @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
   }
   
   /* Create a new EEH PE */
  -pe = eeh_pe_alloc(edev-phb, EEH_PE_DEVICE);
  +if (edev-physfn)
  +pe = eeh_pe_alloc(edev-phb, EEH_PE_VF);
  +else
  +pe = eeh_pe_alloc(edev-phb, EEH_PE_DEVICE);
   if (!pe) {
   pr_err(%s: out of memory!\n, __func__);
   return -ENOMEM;
  diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
  b/arch/powerpc/platforms/powernv/eeh-powernv.c
  index ce738ab..c505036 100644
  --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
  +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
  @@ -1520,6 +1520,23 @@ static struct eeh_ops pnv_eeh_ops = {
   .restore_config = pnv_eeh_restore_config
   };
   
  +static void pnv_eeh_vf_final_fixup(struct pci_dev *pdev)
  +{
  +struct pci_dn *pdn = pci_get_pdn(pdev);
  +
  +if (!pdev-is_virtfn)
  +return;
  +
  +/*
  + * The following operations will fail if VF's sysfs files
  + * aren't created or its resources aren't finalized.
  + */
 
 I don't understand this comment.  The following operations seems to refer
 to eeh_add_device_early() and eeh_add_device_late(), and
 VF's sysfs files being created seems to refer to eeh_sysfs_add_device().
 
 So the comment suggests that eeh_add_device_early() and
 eeh_add_device_late() will fail because they're called before
 eeh_sysfs_add_device().  So I think you must be talking about some other
 following operations, not eeh_add_device_early() and
 eeh_add_device_late().
 
 Sorry for this confusion.
 
 The comment here wants to say the eeh_sysfs_add_device() will fail if the 
 VF's
 sysfs is not created well. Or it will fail if the VF's resources are not set
 properly, since we would cache the VF's BAR in eeh_add_device_late().
 
 Gavin,
 
 If my understanding is not correct please let me know.
 
 
 It's correct. The following operations refers to eeh_add_device_late()
 and eeh_sysfs_add_device(). The former one requires the resources for
 one particular PCI device (VF here) are finalized (assigned). 
 eeh_sysfs_add_device()
 will fail if the sysfs entry for the PCI device isn't populated yet.

eeh_add_device_late() contains several things that read config space:
eeh_save_bars() caches the entire config header, and
eeh_addr_cache_insert_dev() looks at the device resources (which are
determined by BARs in config space).  I think this is an error-prone

RE: [PATCH 2/2] rheap: move rheap.c from arch/powerpc/lib/ to lib/

2015-06-03 Thread Zhao Qiang
On Thu, 2015-05-28 at 1:37AM +0800, Wood Scott wrote:


 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, May 28, 2015 1:37 AM
 To: Zhao Qiang-B45475
 Cc: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Xie Xiaobo-R63061
 Subject: Re: [PATCH 2/2] rheap: move rheap.c from arch/powerpc/lib/ to
 lib/
 
 On Wed, 2015-05-27 at 17:12 +0800, Zhao Qiang wrote:
  qe need to use the rheap, so move it to public directory.
 
 You've been previously asked to use lib/genalloc.c rather than introduce
 duplicate functionality into /lib.  NACK.

We can't use lib/genalloc.c instead of rheap.c.
The QE needs to allocate from QE MURAM, not from DIMM memory.
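
For reference, a minimal sketch of the lib/genalloc.c interface being pointed
to as the alternative (illustrative only; whether it actually fits the QE
MURAM case is exactly what is under debate here, and the MURAM parameters
below are made up):

#include <linux/genalloc.h>
#include <linux/log2.h>

static struct gen_pool *muram_pool;

static int example_muram_pool_init(phys_addr_t muram_phys,
				   void __iomem *muram_virt, size_t size)
{
	muram_pool = gen_pool_create(ilog2(4), -1);	/* 4-byte granularity, any node */
	if (!muram_pool)
		return -ENOMEM;

	return gen_pool_add_virt(muram_pool, (unsigned long)muram_virt,
				 muram_phys, size, -1);
}

static unsigned long example_muram_alloc(size_t size)
{
	return gen_pool_alloc(muram_pool, size);	/* returns 0 on failure */
}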

 
 Also, please don't use coreid-based e-mail addresses with no real names
 associated, which makes it hard to tell who has been CCed.
 
 -Scott
Best Regards
Zhao Qiang

[RFT v2 04/48] powerpc, irq: Use irq_desc_get_xxx() to avoid redundant lookup of irq_desc

2015-06-03 Thread Jiang Liu
Use irq_desc_get_xxx() to avoid redundant lookup of irq_desc while we
already have a pointer to corresponding irq_desc.

Note: this patch has been queued for 4.2 by Michael Ellerman 
m...@ellerman.id.au

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 arch/powerpc/platforms/52xx/mpc52xx_gpt.c |2 +-
 arch/powerpc/platforms/cell/axon_msi.c|2 +-
 arch/powerpc/platforms/embedded6xx/hlwd-pic.c |2 +-
 arch/powerpc/sysdev/uic.c |2 +-
 arch/powerpc/sysdev/xics/xics-common.c|2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c 
b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
index c949ca055712..63016621aff8 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
@@ -193,7 +193,7 @@ static struct irq_chip mpc52xx_gpt_irq_chip = {
 
 void mpc52xx_gpt_irq_cascade(unsigned int virq, struct irq_desc *desc)
 {
-   struct mpc52xx_gpt_priv *gpt = irq_get_handler_data(virq);
+   struct mpc52xx_gpt_priv *gpt = irq_desc_get_handler_data(desc);
int sub_virq;
u32 status;
 
diff --git a/arch/powerpc/platforms/cell/axon_msi.c 
b/arch/powerpc/platforms/cell/axon_msi.c
index 623bd961465a..817d0e6747ea 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -95,7 +95,7 @@ static void msic_dcr_write(struct axon_msic *msic, unsigned 
int dcr_n, u32 val)
 static void axon_msi_cascade(unsigned int irq, struct irq_desc *desc)
 {
struct irq_chip *chip = irq_desc_get_chip(desc);
-   struct axon_msic *msic = irq_get_handler_data(irq);
+   struct axon_msic *msic = irq_desc_get_handler_data(desc);
u32 write_offset, msi;
int idx;
int retry = 0;
diff --git a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c 
b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c
index c269caee58f9..9dd154d6f89a 100644
--- a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c
+++ b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c
@@ -124,7 +124,7 @@ static void hlwd_pic_irq_cascade(unsigned int cascade_virq,
  struct irq_desc *desc)
 {
struct irq_chip *chip = irq_desc_get_chip(desc);
-   struct irq_domain *irq_domain = irq_get_handler_data(cascade_virq);
+   struct irq_domain *irq_domain = irq_desc_get_handler_data(desc);
unsigned int virq;
 
	raw_spin_lock(&desc->lock);
diff --git a/arch/powerpc/sysdev/uic.c b/arch/powerpc/sysdev/uic.c
index 7c37157d4c24..e763fe215cf5 100644
--- a/arch/powerpc/sysdev/uic.c
+++ b/arch/powerpc/sysdev/uic.c
@@ -198,7 +198,7 @@ void uic_irq_cascade(unsigned int virq, struct irq_desc 
*desc)
 {
struct irq_chip *chip = irq_desc_get_chip(desc);
struct irq_data *idata = irq_desc_get_irq_data(desc);
-   struct uic *uic = irq_get_handler_data(virq);
+   struct uic *uic = irq_desc_get_handler_data(desc);
u32 msr;
int src;
int subvirq;
diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
b/arch/powerpc/sysdev/xics/xics-common.c
index 878a54036a25..76be7b00dd80 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -227,7 +227,7 @@ void xics_migrate_irqs_away(void)
 
/* Locate interrupt server */
server = -1;
-   ics = irq_get_chip_data(virq);
+   ics = irq_desc_get_chip_data(desc);
if (ics)
		server = ics->get_server(ics, irq);
		if (server < 0) {
-- 
1.7.10.4


Re: [PATCH] rtc/rtc-opal: Disable rtc-alarms when opal doesn't support tpo

2015-06-03 Thread Neelesh Gupta



On 06/03/2015 10:21 AM, Vaibhav Jain wrote:

The rtc-opal driver provides support for rtc alarms via timed-power-on
(tpo). However, some platforms like BML use a fake rtc clock and don't
support tpo. Such platforms are indicated by the missing 'has-tpo'
property in the device tree.

The current implementation, however, enables the rtc_class_ops.read/set
alarm callbacks irrespective of tpo support on the platform. This results
in a failed opal call when the kernel tries to read an existing alarm via
opal_get_tpo_time during rtc device registration.

This patch fixes this issue by setting the opal_rtc_ops.read/set_alarm
callback pointers only when tpo is supported.

Signed-off-by: Vaibhav Jain vaib...@linux.vnet.ibm.com


Acked-by: Neelesh Gupta neele...@linux.vnet.ibm.com

Thanks,
Neelesh.


---
  drivers/rtc/rtc-opal.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index 7061dca..1125641 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -190,11 +190,9 @@ exit:
return rc;
  }

-static const struct rtc_class_ops opal_rtc_ops = {
+static struct rtc_class_ops opal_rtc_ops = {
.read_time  = opal_get_rtc_time,
.set_time   = opal_set_rtc_time,
-   .read_alarm = opal_get_tpo_time,
-   .set_alarm  = opal_set_tpo_time,
  };

  static int opal_rtc_probe(struct platform_device *pdev)
@@ -202,8 +200,11 @@ static int opal_rtc_probe(struct platform_device *pdev)
struct rtc_device *rtc;

	if (pdev->dev.of_node && of_get_property(pdev->dev.of_node, "has-tpo",
-				 NULL))
+				 NULL)) {
		device_set_wakeup_capable(&pdev->dev, true);
+   opal_rtc_ops.read_alarm = opal_get_tpo_time;
+   opal_rtc_ops.set_alarm = opal_set_tpo_time;
+   }

	rtc = devm_rtc_device_register(&pdev->dev, DRVNAME, &opal_rtc_ops,
   THIS_MODULE);

