Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Sachin Sant

> I also checked all the other .ko files and they were properly aligned. So I 
> think this should hopefully work, and I like that its not a per-arch fix.
> 
> Sachin, sorry to bother you again, but I'm hoping you can try David's latest 
> patch to scripts/module-common.lds, just to test in your setup.

I tested the patch on 2 different systems where I ran into this problem. In 
both cases
the system boots without any warning. A quick module load/unload test also 
worked
correctly.

Tested-by: Sachin Sant 

Thanks
-Sachin



Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Michael Ellerman
Jason Baron  writes:
...
> I also checked all the other .ko files and they were properly aligned. 
> So I think this should hopefully work, and I like that its not a 
> per-arch fix.
>
> Sachin, sorry to bother you again, but I'm hoping you can try David's 
> latest patch to scripts/module-common.lds, just to test in your setup.

It does fix the problem.

I was reproducing with crc_t10dif:

[  695.890552] [ cut here ]
[  695.890709] WARNING: CPU: 15 PID: 3019 at ../kernel/jump_label.c:287 
static_key_set_entries+0x74/0xa0
[  695.890710] Modules linked in: crc_t10dif(+) crct10dif_generic 
crct10dif_common ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter 
ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc dm_thin_pool 
dm_persistent_data dm_bio_prison dm_bufio libcrc32c kvm virtio_balloon 
binfmt_misc autofs4 virtio_net virtio_pci virtio_ring virtio

Which had:

  [21] __jump_table  PROGBITS 0004e8 18 00  WA  
0   0  1


And now has:

  [18] __jump_table  PROGBITS 0004d0 18 00  WA  
0   0  8

And all other modules have an alignment of 8 on __jump_table, as expected.

I'm inclined to merge a version of the balign patch for powerpc anyway,
just to be on the safe side. I guess the old code was coping fine with
the unaligned keys, but it still makes me nervous.

cheers


Re: [PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails

2017-02-28 Thread Michael Ellerman
Stewart Smith  writes:

> Vipin K Parashar  writes:
>> Added check for OPAL_WRONG_STATE error code returned from OPAL.
>> Currently Linux flashes "unexpected error" over console for this
>> error. This will avoid throwing such message and return I/O error
>> for such OPAL failures.
>>
>> Signed-off-by: Vipin K Parashar 
>> ---
>> Changes in v2:
>>  - Added log message indicating sleeping/offline core
>>for OPAL_WRONG_STATE
>>
>>  arch/powerpc/platforms/powernv/opal.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/opal.c 
>> b/arch/powerpc/platforms/powernv/opal.c
>> index 86d9fde..8af230e 100644
>> --- a/arch/powerpc/platforms/powernv/opal.c
>> +++ b/arch/powerpc/platforms/powernv/opal.c
>> @@ -869,8 +869,11 @@ int opal_error_code(int rc)
>>  case OPAL_UNSUPPORTED:  return -EIO;
>>  case OPAL_HARDWARE: return -EIO;
>>  case OPAL_INTERNAL_ERROR:   return -EIO;
>> +case OPAL_WRONG_STATE:
>> +pr_notice("%s: Core sleeping/offline\n", __func__);
>> +return -EIO;
>
> Since this is part of opal_error_code() though, this will be printed for
> any OPAL call that returns that.
>
> Why not have the sensor code do this:
>
> rc = opal_sensor_read(foo)
> if (rc == OPAL_WRONG_STATE)
>return -EIO;
> else
>return oal_error_code(rc);
>
> ?

Yeah, that's exactly what I said to do, though perhaps I didn't say it
clearly enough?

   ... changing just opal_get_sensor_data() to handle
   OPAL_WRONG_STATE would be OK, with a comment explaining that we might be
   asked to read a sensor on an offline CPU and we aren't able to detect
   that.

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-4.11-2 tag

2017-02-28 Thread Michael Ellerman
Hi Linus,

Please pull the second set of powerpc updates for 4.11.

This includes an update of the disassembly code in xmon, from binutils. We've
received permission from all the authors of the relevant binutils changes to
relicense their changes to the relevant files from GPLv3 to GPLv2. That was the
reason I split the pull request in two, so I could make 100% sure we had all
those approvals.

The rest is fairly boring arch stuff, plus a few changes to the pnv-php PCI
hotplug driver, which Bjorn has said he's OK with us merging.

This does generate a fairly ugly conflict in asm-offsets.c, between changes in
your tree and a big patch I merged to switch that file to use the OFFSET macro.
In hindsight I should have done the OFFSET change after rc1.

The resolution is to drop TSK_STACK_CANARY which was removed in your tree, and
then for both 32 & 64-bit accounting.user_time was renamed to accounting.utime,
and accounting.system_time was renamed to accounting.stime.

Stephen has the correct resolution in linux-next:

http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/diff/arch/powerpc/kernel/asm-offsets.c?id=99b07077f0c2cae8d4ff7263bf45df49f79f4c0d=9f3768e02335ddd6ebe1d85d5cb3a68ee6264004


cheers


The following changes since commit 438e69b52be776c035aa2a851ccc1709033d729b:

  powerpc/mm/radix: Skip ptesync in pte update helpers (2017-02-15 20:02:41 
+1100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.11-2

for you to fetch changes up to 9f3768e02335ddd6ebe1d85d5cb3a68ee6264004:

  powerpc: Remove leftover cputime_to_nsecs call causing build error 
(2017-02-23 08:33:20 +1100)


powerpc updates for 4.11 part 2

Highlights include:

 - An update of the disassembly code used by xmon to the latest versions in
   binutils. We've received permission from all the authors of the relevant
   binutils changes to relicense their changes to the relevant files from GPLv3
   to GPLv2, for inclusion in Linux. Thanks to Peter Bergner for doing the leg
   work to get permission from everyone.

 - Addition of the "architected" Power9 CPU table entry, allowing us to boot
   in Power9 architected mode under a hypervisor.

 - Updates to the Power9 PMU code.

 - Implementation of clear_bit_unlock_is_negative_byte() to optimise
   unlock_page().

 - Freescale updates from Scott: "Highlights include 8xx breakpoints and perf,
   t1042rdb display support, and board updates."

Thanks to:
  Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas Miller,
  Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan, Michael Roth, Nathan
  Fontenot, Naveen N. Rao, Nicholas Piggin, Peter Bergner, Paul E. McKenney,
  Rashmica Gupta, Russell Currey, Sahil Mehta, Stewart Smith.


Al Viro (1):
  powerpc/spufs: Get rid of broken fasync stuff

Andrew Donnellan (1):
  cxl: fix nested locking hang during EEH hotplug

Aneesh Kumar K.V (1):
  powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

Balbir Singh (3):
  powerpc/xmon: Update ppc-dis/opc.c and ppc.h
  powerpc/xmon: Apply binutils changes to upgrade disassembly
  powerpc/xmon: Enable disassembly files (compilation changes)

Christophe Leroy (4):
  powerpc/32: Enable HW_BREAKPOINT on BOOK3S
  powerpc/8xx: Implement hw_breakpoint
  powerpc/32: Remove FIX_SRR1
  powerpc/8xx: Perf events on PPC 8xx

Douglas Miller (1):
  powerpc/xmon: Dump memory in CPU endian format

Frédéric Weisbecker (1):
  powerpc: Remove leftover cputime_to_nsecs call causing build error

Gavin Shan (9):
  drivers/pci/hotplug: Handle presence detection change properly
  drivers/pci/hotplug: Fix initial state for empty slot
  drivers/pci/hotplug: Mask PDC interrupt if required
  pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()
  pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts
  pci/hotplug/pnv-php: Disable MSI and PCI device properly
  powerpc/mm: Fix typo in set_pte_at()
  powerpc/kernel: Remove error message in pcibios_setup_phb_resources()
  powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable()

Jason Jin (1):
  powerpc/85xx: Enable display support for t1042rdb

Madhavan Srinivasan (8):
  powerpc/perf: Factor out event_alternative function
  powerpc/perf: Add PM_INST_DISP event to Power9 event list
  powerpc/perf: Add alternative event table and function for power9
  powerpc/perf: Use PM_INST_DISP for generic instructions sample
  powerpc/perf: Use Instruction Counter value
  powerpc/perf: Add restrictions to PMC5 in power9 DD1
  powerpc/perf: Avoid FAB_*_MATCH checks for power9
  powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags()

Michael Ellerman (6):
  powerpc/64e: Fix bogus usage of 

Re: [PATCH V3 2/2] powerpc: Update to new option-vector-5 format for CAS

2017-02-28 Thread Paul Mackerras
On Tue, Feb 28, 2017 at 05:03:48PM +1100, Suraj Jitindar Singh wrote:
> On POWER9 the ibm,client-architecture-support (CAS) negotiation process
> has been updated to change how the host to guest negotiation is done for
> the new hash/radix mmu as well as the nest mmu, process tables and guest
> translation shootdown (GTSE).
> 
> The host tells the guest which options it supports in
> ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
> to request in the CAS call and these are agreed to in the
> ibm,architecture-vec-5 property of the chosen node.
> 
> Thus we read ibm,arch-vec-5-platform-support and make our selection before
> calling CAS. We then parse the ibm,architecture-vec-5 property of the
> chosen node to check whether we should run as hash or radix.
> 
> ibm,arch-vec-5-platform-support format:
> 
> index value pairs:  ... 
> 
> index: Option vector 5 byte number
> val:   Some representation of supported values
> 
> Signed-off-by: Suraj Jitindar Singh 

It would be good if the description mentioned the PAPR architecture
change request "CAS option vector additions for P9".

Apart from that,

Acked-by: Paul Mackerras 


Re: [RFC 3/3] cxl: Reset freeze counters before adapter PERST for flashing new image

2017-02-28 Thread Russell Currey
[resending this since it didn't get delivered to the list]

On Tue, 2017-02-28 at 12:52 +0530, Vaibhav Jain wrote:
> The patch resets the freeze counter on eeh_pe struct for PHB
> associated with the cxl pci adapter. This would enable re-flashing of
> the cxl-adapter beyond the default limit of 5.
> 
> Signed-off-by: Vaibhav Jain 
> ---
>  drivers/misc/cxl/pci.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 679afc9..3b14688 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "cxl.h"
>  #include 
> @@ -1229,6 +1230,8 @@ static void cxl_pci_remove_afu(struct cxl_afu *afu)
>  int cxl_pci_reset(struct cxl *adapter)
>  {
>   struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
> + struct eeh_dev *eehdev = pci_dev_to_eeh_dev(dev);
> + struct eeh_pe *devpe = eeh_dev_to_pe(eehdev);

EEH code typically uses "edev" and "pe" for these variable names

>   int rc;
>  
>   if (adapter->perst_same_image) {
> @@ -1242,6 +1245,18 @@ int cxl_pci_reset(struct cxl *adapter)
>   /* the adapter is about to be reset, so ignore errors */
>   cxl_data_cache_flush(adapter);
>  
> + /* If loading a new image, reset freeze counters for the PHB
> +  * associated with the adapter.
> +  */
> + if (devpe && adapter->perst_loads_image) {
> + /* Find the pe associated with the device PHB */
> + while (devpe->parent != NULL && (devpe->type & EEH_PE_PHB) ==
> 0)
> + devpe = devpe->parent;
> +
> + dev_info(>dev, "Resetting freeze counters for the
> PHB\n");

Would be good to mention "EEH" here to help with grepping, alternatively a
similar message could be printed in eeh_pe_reset_freeze_counter() displaying the
PHB information.

> + eeh_pe_reset_freeze_counter(devpe);
> + }
> +
>   /* pcie_warm_reset requests a fundamental pci reset which includes a
>    * PERST assert/deassert.  PERST triggers a loading of the image
>    * if "user" or "factory" is selected in sysfs */



Re: [RFC 2/3] powerpc/eeh: Introduce function eeh_pe_reset_freeze_counter

2017-02-28 Thread Russell Currey
On Tue, 2017-02-28 at 12:32 +0530, Vaibhav Jain wrote:
> This patch introduces function eeh_pe_reset_freeze_counter which can
> be used to reset the PE's freeze count variable outside eeh code. This
> is useful for devices that can acquire a different personality after
> a PERST event (e.g FPGA Adapters). Presently an existing freeze
> count for an adapter with personality N will be taken into account
> when the adapter acquired personality N+1.
> 
> By calling eeh_pe_reset_freeze_counter drivers can reset the freeze
> counter for an adapter once it has acquired a new personality and
> ideally wont be plagued by the failures similar to the one before.

Same comment as before about adding () to the end of function names

> 
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/include/asm/eeh.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 68806be..19ac6d0 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -266,6 +266,13 @@ struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
>  int eeh_add_to_parent_pe(struct eeh_dev *edev);
>  int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
>  int eeh_pe_update_freeze_counter(struct eeh_pe *pe);
> +
> +/* Reset the PE freeze counter */

I would like to see a comment here noting that doing this is in general a bad
idea, and this shouldn't be called for regular PCI devices.

> +static inline void eeh_pe_reset_freeze_counter(struct eeh_pe *pe)
> +{
> + pe->freeze_count = 0;
> +}
> +
>  void *eeh_pe_traverse(struct eeh_pe *root,
>   eeh_traverse_func fn, void *flag);
>  void *eeh_pe_dev_traverse(struct eeh_pe *root,
> @@ -339,6 +346,8 @@ static inline int eeh_check_failure(const volatile void
> __iomem *token)
>   return 0;
>  }
>  
> +static inline void eeh_pe_reset_freeze_counter(struct eeh_pe *pe) { }
> +
>  #define eeh_dev_check_failure(x) (0)
>  
>  static inline void eeh_addr_cache_build(void) { }



Re: [RFC 1/3] powerpc/eeh: Refactor eeh_pe_update_time_stamp to update freeze_count

2017-02-28 Thread Russell Currey
On Tue, 2017-02-28 at 12:32 +0530, Vaibhav Jain wrote:
> This patch introduces a new function named
> eeh_pe_update_freeze_counter replacing existing function
> eeh_pe_update_time_stamp. The new function also manages the value of
> freeze_count along with tstamp to track the number of times the PE
> froze in last one hour and if the freeze_count > eeh_max_freezes then
> reports an error(-ENOTRECOVERABLE) to indicate that the PE should be
> permanently disabled.

You should add parens to the end of function names in commit messages, i.e.
eeh_pe_update_freeze_counter().  Same goes for the title

> 
> This patch should not introduce any behavioral change.
> 
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/include/asm/eeh.h   |  2 +-
>  arch/powerpc/kernel/eeh_driver.c | 20 +++-
>  arch/powerpc/kernel/eeh_pe.c | 50 ++-
> -
>  3 files changed, 37 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 8e37b71..68806be 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -265,7 +265,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
>  struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
>  int eeh_add_to_parent_pe(struct eeh_dev *edev);
>  int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
> -void eeh_pe_update_time_stamp(struct eeh_pe *pe);
> +int eeh_pe_update_freeze_counter(struct eeh_pe *pe);
>  void *eeh_pe_traverse(struct eeh_pe *root,
>   eeh_traverse_func fn, void *flag);
>  void *eeh_pe_dev_traverse(struct eeh_pe *root,
> diff --git a/arch/powerpc/kernel/eeh_driver.c
> b/arch/powerpc/kernel/eeh_driver.c
> index b948871..326b4e4 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -739,10 +739,9 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>   return;
>   }
>  
> - eeh_pe_update_time_stamp(pe);
> - pe->freeze_count++;
> - if (pe->freeze_count > eeh_max_freezes)
> - goto excess_failures;
> + /* Update freeze counters and see if we have tripped max-freeze limit
> */
> + if (eeh_pe_update_freeze_counter(pe) < 0)

I would prefer dropping the "< 0" check here

> + goto perm_error;
>   pr_warn("EEH: This PCI device has failed %d times in the last
> hour\n",
>   pe->freeze_count);
>  
> @@ -872,19 +871,6 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>  
>   return;
>  
> -excess_failures:
> - /*
> -  * About 90% of all real-life EEH failures in the field
> -  * are due to poorly seated PCI cards. Only 10% or so are
> -  * due to actual, failed cards.
> -  */
> - pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n"
> -    "last hour and has been permanently disabled.\n"
> -    "Please try reseating or replacing it.\n",
> - pe->phb->global_number, pe->addr,
> - pe->freeze_count);
> - goto perm_error;
> -
>  hard_fail:
>   pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n"
>      "Please try reseating or replacing it\n",
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index cc4b206..cf70a8b 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -504,30 +504,46 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
>  }
>  
>  /**
> - * eeh_pe_update_time_stamp - Update PE's frozen time stamp
> + * eeh_pe_update_freeze_counter - Update PE's frozen time stamp
> + * and freeze counter
>   * @pe: EEH PE
>   *
> - * We have time stamp for each PE to trace its time of getting
> - * frozen in last hour. The function should be called to update
> - * the time stamp on first error of the specific PE. On the other
> - * handle, we needn't account for errors happened in last hour.
> + * We have a freeze-counter and time stamp for each PE to trace

drop the hyphen between freeze and counter

> + * number of times the PE was frozen in last one hour. This function

this should probably "in the last hour", I know it was copied from the existing
comment but since you're in here, you may as well fix it

> + * updates the PE's freeze counter and if its > eeh_max_freezes then

this reads awkwardly.  perhaps "This function updates the PE's freeze counter
and returns an error if the number of freezes is greater than eeh_max_freezes." 

> + * returns an error. The function should be called to once every-time
> + * a specific PE freezes.

a hyphen between "every time" is unnecessary

>   */
> -void eeh_pe_update_time_stamp(struct eeh_pe *pe)
> +int eeh_pe_update_freeze_counter(struct eeh_pe *pe)
>  {
>   struct timeval tstamp;
>  
> - if (!pe) return;
> + if (!pe)
> + return -EINVAL;
>  
> - if (pe->freeze_count <= 0) {
> - pe->freeze_count = 0;
> - do_gettimeofday(>tstamp);
> - } 

Re: [PATCH v3] powerpc/powernv: add hdat attribute to sysfs

2017-02-28 Thread Andrew Donnellan

On 27/02/17 21:56, Michael Ellerman wrote:

+/* HDAT attribute for sysfs */
+static struct bin_attribute hdat_attr = {
+   .attr = {.name = "hdat", .mode = 0444},

 
ajd and oohal report to my office.


...ACK... :/

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] powerpc: booke: fix boot crash due to null hugepd

2017-02-28 Thread Scott Wood
On Tue, 2017-02-28 at 14:55 +, Laurentiu Tudor wrote:
> Hi,
> 
> Some more information on the crash, inline.
> 
> On 02/17/2017 02:18 PM, Aneesh Kumar K.V wrote:
> > 
> > laurentiu.tu...@nxp.com writes:
> > 
> > > 
> > > From: Laurentiu Tudor 
> > > 
> > > On 32-bit book-e machines, hugepd_ok() does not take
> > > into account null hugepd values, causing this crash at boot:
> > > 
> > > Unable to handle kernel paging request for data at address 0x8000
> > > Faulting instruction address: 0xc00182a8
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > SMP NR_CPUS=24
> > > CoreNet Generic
> > > Modules linked in:
> > > CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   4.10.0-rc8-
> > > 00016-g69b1f87 #11
> > > task: e505 task.stack: e5058000
> > > NIP: c00182a8 LR: c001829c CTR: 7ffe
> > > REGS: e5059c50 TRAP: 0300   Tainted: GW(4.10.0-rc8-
> > > 00016-g69b1f87)
> > > MSR: 00021002 
> > >    CR: 88428e82  XER: 
> > > DEAR: 8000 ESR: 
> > > GPR00: c0107510 e5059d00 e505 8000 bff1 e5059d0c e5059d08
> > > 2017
> > > GPR08:     28428e82  c00027d0
> > > 
> > > GPR16:   88a28e82 2000 48422e82  88a28e84
> > > dd004000
> > > GPR24: e5059e38   bff1 dd004000 0001 00029002
> > > bff1
> > > NIP [c00182a8] follow_huge_addr+0x38/0xf0
> > > LR [c001829c] follow_huge_addr+0x2c/0xf0
> > > Call Trace:
> > > [e5059d00] [e5059d00] 0xe5059d00 (unreliable)
> > > [e5059d20] [c0107510] follow_page_mask+0x40/0x3c0
> > > [e5059d80] [c0107958] __get_user_pages+0xc8/0x420
> > > [e5059de0] [c010817c] get_user_pages_remote+0x8c/0x230
> > > [e5059e30] [c013f170] copy_strings+0x110/0x3a0
> > > [e5059ea0] [c013f42c] copy_strings_kernel+0x2c/0x50
> > > [e5059ec0] [c0141324] do_execveat_common+0x474/0x620
> > > [e5059f10] [c01414fc] do_execve+0x2c/0x40
> > > [e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60
> > > [e5059f30] [c000289c] kernel_init+0xcc/0x120
> > > [e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64
> > > Instruction dump:
> > > bfc10018 7c9f2378 90010024 7fc000a6 7c000146 80630020 38a1000c 38c10008
> > > 4bfff869 2c03 41c20090 81210008 <8143> 81630004 3860ffea
> > > 2f89
> > > ---[ end trace 4bf94e15fd9fa824 ]---
> > 
> > Which code path is that. That null should be filtered by the if
> > (pmd_none(pmd)) check in find_linux_pte_or_hugepte right ?
> The crash happens when __find_linux_pte_or_hugepte() calls hugepd_ok(),
> on this line [1]. It's triggered when __find_linux_pte_or_hugepte() is
> first called, when the kernel tries to spawn the init process. The input
> effective address (ea arg) is bff1. This is the call stack:

What is the pmd value?  There's a pmd_none() check before that line.

That said, regardless of what's going wrong here, it would be simpler and more
robust if is_hugepd() returned false for empty ptes rather than assuming the
caller explicitly checked pmd_none().

-Scott



Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Jason Baron

On 02/28/2017 03:15 PM, David Daney wrote:

On 02/28/2017 11:34 AM, Jason Baron wrote:



On 02/28/2017 02:22 PM, David Daney wrote:

On 02/28/2017 11:05 AM, David Daney wrote:

On 02/28/2017 10:39 AM, Jason Baron wrote:



[...]

I suspect that the alignment of the __jump_table section in the .ko
files is not correct, and you are seeing some sort of problem due to
that.




Hi,

Yes, if you look at the trace that Sachin sent the module being loaded
that does the WARN_ON() is nfsd.ko.

That module from Sachin's trace has:

  [31] __jump_table  PROGBITS 03fd77
c0
18 WAM  0   0  1


The problem is then the section alignment (last column) for power.

On mips with no patches applied, we get:

  [17] __jump_table  PROGBITS 00d2c0 48
00  WA  0   0  8

Look, proper alignment!

The question I have is why do the power ".llong" and ".long" assembler
directives not force section alignment?  Is there an alternative that
could be used that would result in the proper alignment?  Would ".word"
work?

If not, then I would say patch only power with your balign thing.
8-byte
alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel



I think the proper fix is either:

A) Modify scripts/module-common.lds to force __jump_table alignment for
all architectures.

B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment
for powerpc only.

David.




Ok, I can try adding it to the linger script.

FWIW, here is my before and after with the .balign thing for the nfsd.ko
module on powperc (using a cross-compiler):

before:

  [31] __jump_table  PROGBITS 03ee3e f0
00  WA  0   0  1

after:

 [31] __jump_table  PROGBITS 03ee40 f0
00  WA  0   0  4



Try the (lightly tested) attached.

If it works and Steven likes it, perhaps someone can merge it.

David.





So before your module.lds script:

# powerpc64-linux-readelf -eW fs/nfsd/nfsd.o | grep jump 

  [31] __jump_table  PROGBITS 03edfe f0 
00  WA  0   0  1



# powerpc64-linux-readelf -eW fs/nfsd/nfsd.ko | grep jump 

  [32] __jump_table  PROGBITS 044046 f0 
00  WA  0   0  1


With your patch:




# powerpc64-linux-readelf -eW fs/nfsd/nfsd.o | grep jump 

  [31] __jump_table  PROGBITS 03edfe f0 
00  WA  0   0  1



# powerpc64-linux-readelf -eW fs/nfsd/nfsd.ko | grep jump 

  [18] __jump_table  PROGBITS 03e358 f0 
00  WA  0   0  8



I also checked all the other .ko files and they were properly aligned. 
So I think this should hopefully work, and I like that its not a 
per-arch fix.


Sachin, sorry to bother you again, but I'm hoping you can try David's 
latest patch to scripts/module-common.lds, just to test in your setup.


Thanks,

-Jason


Re: [PATCH V3 1/2] powerpc: Parse the command line before calling CAS

2017-02-28 Thread Balbir Singh


On 28/02/17 17:03, Suraj Jitindar Singh wrote:
> On POWER9 the hypervisor requires the guest to decide whether it would
> like to use a hash or radix mmu model at the time it calls
> ibm,client-architecture-support (CAS) based on what the hypervisor has
> said it's allowed to do. It is possible to disable radix by passing
> "disable_radix" on the command line. The next patch will add support for
> the new CAS format, thus we need to parse the command line before calling
> CAS so we can correctly select which mmu we would like to use.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Reviewed-by: Paul Mackerras 
> 
> ---
> 
> V1 -> V3:
>  - Reword commit message for clarity. No functional change
> ---

Acked-by: Balbir Singh 


Re: [PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails

2017-02-28 Thread Stewart Smith
Vipin K Parashar  writes:
> Added check for OPAL_WRONG_STATE error code returned from OPAL.
> Currently Linux flashes "unexpected error" over console for this
> error. This will avoid throwing such message and return I/O error
> for such OPAL failures.
>
> Signed-off-by: Vipin K Parashar 
> ---
> Changes in v2:
>  - Added log message indicating sleeping/offline core
>for OPAL_WRONG_STATE
>
>  arch/powerpc/platforms/powernv/opal.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal.c 
> b/arch/powerpc/platforms/powernv/opal.c
> index 86d9fde..8af230e 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -869,8 +869,11 @@ int opal_error_code(int rc)
>   case OPAL_UNSUPPORTED:  return -EIO;
>   case OPAL_HARDWARE: return -EIO;
>   case OPAL_INTERNAL_ERROR:   return -EIO;
> + case OPAL_WRONG_STATE:
> + pr_notice("%s: Core sleeping/offline\n", __func__);
> + return -EIO;

Since this is part of opal_error_code() though, this will be printed for
any OPAL call that returns that.

Why not have the sensor code do this:

rc = opal_sensor_read(foo)
if (rc == OPAL_WRONG_STATE)
   return -EIO;
else
   return oal_error_code(rc);

?

>   default:
> - pr_err("%s: unexpected OPAL error %d\n", __func__, rc);
> + pr_err("%s: Unexpected OPAL error %d\n", __func__, rc);

Do we need this?

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [Patch v4] powerpc/powernv: add hdat attribute to sysfs

2017-02-28 Thread Stewart Smith
Matt Brown  writes:
> The HDAT data area is consumed by skiboot and turned into a device-tree.
> In some cases we would like to look directly at the HDAT, so this patch
> adds a sysfs node to allow it to be viewed.  This is not possible through
> /dev/mem as it is reserved memory which is stopped by the /dev/mem filter.
>
> Signed-off-by: Matt Brown 
> ---
> Changes between v3 and v4:
>   - changed sysfs attribute permissions from 0444 to 0400
>   - fixed makefile to be on same line
>   - fixed authorship/copyright info
>   - re-ordered includes
>   - changed hdat_info struct to a static struct
>
> ---
>  arch/powerpc/include/asm/opal.h|  1 +
>  arch/powerpc/platforms/powernv/Makefile|  2 +-
>  arch/powerpc/platforms/powernv/opal-hdat.c | 60 
> ++
>  arch/powerpc/platforms/powernv/opal.c  |  2 +
>  4 files changed, 64 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-hdat.c
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 5c7db0f..b26944e 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -277,6 +277,7 @@ extern int opal_async_comp_init(void);
>  extern int opal_sensor_init(void);
>  extern int opal_hmi_handler_init(void);
>  extern int opal_event_init(void);
> +extern void opal_hdat_sysfs_init(void);
>
>  extern int opal_machine_check(struct pt_regs *regs);
>  extern bool opal_mce_check_early_recovery(struct pt_regs *regs);
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index b5d98cb..3826b70 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o 
> opal-async.o idle.o
>  obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o 
> opal-flash.o
>  obj-y+= rng.o opal-elog.o opal-dump.o 
> opal-sysparam.o opal-sensor.o
>  obj-y+= opal-msglog.o opal-hmi.o opal-power.o 
> opal-irqchip.o
> -obj-y+= opal-kmsg.o
> +obj-y+= opal-kmsg.o opal-hdat.o
>
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-hdat.c 
> b/arch/powerpc/platforms/powernv/opal-hdat.c
> new file mode 100644
> index 000..19647fd
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-hdat.c
> @@ -0,0 +1,60 @@
> +/*
> + * PowerNV OPAL HDAT interface
> + *
> + * Copyright 2017, Matt Brown, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct {
> + char *base;
> + u64 size;
> +} hdat_info;
> +
> +/* Read function for HDAT attribute in sysfs */
> +static ssize_t hdat_read(struct file *file, struct kobject *kobj,
> +  struct bin_attribute *bin_attr, char *to,
> +  loff_t pos, size_t count)
> +{
> + if (!hdat_info.base)
> + return -ENODEV;
> +
> + return memory_read_from_buffer(to, count, , hdat_info.base,
> + hdat_info.size);
> +}
> +
> +/* HDAT attribute for sysfs */
> +static struct bin_attribute hdat_attr = {
> + .attr = {.name = "hdat", .mode = 0400},
> + .read = hdat_read
> +};

Why not BIN_ATTR_RO (like opal_export_symmap() does) ?

(I have comments/thoughts on the OPAL side as well)

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread David Daney

On 02/28/2017 11:34 AM, Jason Baron wrote:



On 02/28/2017 02:22 PM, David Daney wrote:

On 02/28/2017 11:05 AM, David Daney wrote:

On 02/28/2017 10:39 AM, Jason Baron wrote:



[...]

I suspect that the alignment of the __jump_table section in the .ko
files is not correct, and you are seeing some sort of problem due to
that.




Hi,

Yes, if you look at the trace that Sachin sent the module being loaded
that does the WARN_ON() is nfsd.ko.

That module from Sachin's trace has:

  [31] __jump_table  PROGBITS 03fd77 c0
18 WAM  0   0  1


The problem is then the section alignment (last column) for power.

On mips with no patches applied, we get:

  [17] __jump_table  PROGBITS 00d2c0 48
00  WA  0   0  8

Look, proper alignment!

The question I have is why do the power ".llong" and ".long" assembler
directives not force section alignment?  Is there an alternative that
could be used that would result in the proper alignment?  Would ".word"
work?

If not, then I would say patch only power with your balign thing. 8-byte
alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel



I think the proper fix is either:

A) Modify scripts/module-common.lds to force __jump_table alignment for
all architectures.

B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment
for powerpc only.

David.




Ok, I can try adding it to the linger script.

FWIW, here is my before and after with the .balign thing for the nfsd.ko
module on powperc (using a cross-compiler):

before:

  [31] __jump_table  PROGBITS 03ee3e f0
00  WA  0   0  1

after:

 [31] __jump_table  PROGBITS 03ee40 f0
00  WA  0   0  4



Try the (lightly tested) attached.

If it works and Steven likes it, perhaps someone can merge it.

David.





>From 89d4deafbc920351b93afb1ac4b4124995e1f19d Mon Sep 17 00:00:00 2001
From: David Daney 
Date: Tue, 28 Feb 2017 12:10:02 -0800
Subject: [PATCH] module: set __jump_table alignment to 8

For powerpc the __jump_table section in modules is not aligned, this
causes an OOPS when loading a module containing a __jump_table.

Fix by forcing __jump_table to 8, which is the same alignment used for
this section in the kernel proper.

Signed-off-by: David Daney 
---
 scripts/module-common.lds | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/module-common.lds b/scripts/module-common.lds
index 73a2c7d..53234e8 100644
--- a/scripts/module-common.lds
+++ b/scripts/module-common.lds
@@ -19,4 +19,6 @@ SECTIONS {
 
 	. = ALIGN(8);
 	.init_array		0 : { *(SORT(.init_array.*)) *(.init_array) }
+
+	__jump_table		0 : ALIGN(8) { KEEP(*(__jump_table)) }
 }
-- 
2.9.3



Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Jason Baron


On 02/28/2017 02:22 PM, David Daney wrote:
> On 02/28/2017 11:05 AM, David Daney wrote:
>> On 02/28/2017 10:39 AM, Jason Baron wrote:
>>>
> [...]
 I suspect that the alignment of the __jump_table section in the .ko
 files is not correct, and you are seeing some sort of problem due to
 that.


>>>
>>> Hi,
>>>
>>> Yes, if you look at the trace that Sachin sent the module being loaded
>>> that does the WARN_ON() is nfsd.ko.
>>>
>>> That module from Sachin's trace has:
>>>
>>>   [31] __jump_table  PROGBITS 03fd77 c0
>>> 18 WAM  0   0  1
>>
>> The problem is then the section alignment (last column) for power.
>>
>> On mips with no patches applied, we get:
>>
>>   [17] __jump_table  PROGBITS 00d2c0 48
>> 00  WA  0   0  8
>>
>> Look, proper alignment!
>>
>> The question I have is why do the power ".llong" and ".long" assembler
>> directives not force section alignment?  Is there an alternative that
>> could be used that would result in the proper alignment?  Would ".word"
>> work?
>>
>> If not, then I would say patch only power with your balign thing. 8-byte
>> alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel
>>
> 
> I think the proper fix is either:
> 
> A) Modify scripts/module-common.lds to force __jump_table alignment for
> all architectures.
> 
> B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment
> for powerpc only.
> 
> David.
> 
>

Ok, I can try adding it to the linger script.

FWIW, here is my before and after with the .balign thing for the nfsd.ko
module on powperc (using a cross-compiler):

before:

  [31] __jump_table  PROGBITS 03ee3e f0
00  WA  0   0  1

after:

 [31] __jump_table  PROGBITS 03ee40 f0
00  WA  0   0  4

Thanks,

-Jason


> 
>>
>>>
>>> So its not the size but rather the start offset '03fd77', that is the
>>> problem here. That is what the WARN_ON triggers on, that the start of
>>> the table is not 4-byte aligned.
>>>
>>> Using a ppc cross-compiler and the ENTSIZE patch that line does not
>>> change, however if I use the initial patch posted in this thread, the
>>> start does align to 4-bytes and thus the warning goes away, as Sachin
>>> verified. In fact, without the patch I found several modules that don't
>>> start at the proper alignment, however with the patch that started this
>>> thread they were all properly aligned.
>>>
>>> In terms of the '.balign' causing holes, we originally added the
>>> '_ASM_ALIGN' to x86 for precisely this reason. See commit:
>>> ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion.
>>>
>>> In addition, we have a lot of runtime with the .balign in the tree and
>>> I'm not aware of any holes in the table. I think the code would blow up
>>> pretty badly if there were.
>>>
>>> A number of arches were already using the '.balign', and the patch I
>>> proposed simply added it to remaining ones, now that we added a
>>> WARN_ON() to catch this condition.
>>>
>>> Thanks,
>>>
>>> -Jason
>>>
>>>
>>>
>>>
>>
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread David Daney

On 02/28/2017 11:05 AM, David Daney wrote:

On 02/28/2017 10:39 AM, Jason Baron wrote:



[...]

I suspect that the alignment of the __jump_table section in the .ko
files is not correct, and you are seeing some sort of problem due to
that.




Hi,

Yes, if you look at the trace that Sachin sent the module being loaded
that does the WARN_ON() is nfsd.ko.

That module from Sachin's trace has:

  [31] __jump_table  PROGBITS 03fd77 c0
18 WAM  0   0  1


The problem is then the section alignment (last column) for power.

On mips with no patches applied, we get:

  [17] __jump_table  PROGBITS 00d2c0 48
00  WA  0   0  8

Look, proper alignment!

The question I have is why do the power ".llong" and ".long" assembler
directives not force section alignment?  Is there an alternative that
could be used that would result in the proper alignment?  Would ".word"
work?

If not, then I would say patch only power with your balign thing. 8-byte
alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel



I think the proper fix is either:

A) Modify scripts/module-common.lds to force __jump_table alignment for 
all architectures.


B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment 
for powerpc only.


David.







So its not the size but rather the start offset '03fd77', that is the
problem here. That is what the WARN_ON triggers on, that the start of
the table is not 4-byte aligned.

Using a ppc cross-compiler and the ENTSIZE patch that line does not
change, however if I use the initial patch posted in this thread, the
start does align to 4-bytes and thus the warning goes away, as Sachin
verified. In fact, without the patch I found several modules that don't
start at the proper alignment, however with the patch that started this
thread they were all properly aligned.

In terms of the '.balign' causing holes, we originally added the
'_ASM_ALIGN' to x86 for precisely this reason. See commit:
ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion.

In addition, we have a lot of runtime with the .balign in the tree and
I'm not aware of any holes in the table. I think the code would blow up
pretty badly if there were.

A number of arches were already using the '.balign', and the patch I
proposed simply added it to remaining ones, now that we added a
WARN_ON() to catch this condition.

Thanks,

-Jason







___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel




Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread David Daney

On 02/28/2017 10:39 AM, Jason Baron wrote:



On 02/28/2017 01:16 PM, David Daney wrote:

On 02/28/2017 08:21 AM, Steven Rostedt wrote:

On Tue, 28 Feb 2017 10:25:46 +0530
Sachin Sant  wrote:


File: ./net/ipv4/xfrm4_input.o
  [12] __jump_table  PROGBITS 000639
18 18 WAM  0   0  1
File: ./net/ipv4/udplite.o
File: ./net/ipv4/xfrm4_output.o
  [ 9] __jump_table  PROGBITS 000481
18 18 WAM  0   0  1


Looks like there's some issues right there.


Those look good to me 18/18 = 1 with no remainder.  The odd numbers are
the offset of the section in the ELF file.

If you look at the stack trace, it seems that it is during module
loading.

Are the primitives for generating the tables doing something different
for the module case?  I am not familiar enough with the powerpc ABIs to
know.

Try this:

$ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if
$#f > 5; print $_' <~/jump_table.log


There are no entries with size that is not a multiple of 0x18.

I think my patch to add the ENTSIZE is not doing anything here.

I suspect that the alignment of the __jump_table section in the .ko
files is not correct, and you are seeing some sort of problem due to
that.




Hi,

Yes, if you look at the trace that Sachin sent the module being loaded
that does the WARN_ON() is nfsd.ko.

That module from Sachin's trace has:

  [31] __jump_table  PROGBITS 03fd77 c0
18 WAM  0   0  1


The problem is then the section alignment (last column) for power.

On mips with no patches applied, we get:

  [17] __jump_table  PROGBITS 00d2c0 48 
00  WA  0   0  8


Look, proper alignment!

The question I have is why do the power ".llong" and ".long" assembler 
directives not force section alignment?  Is there an alternative that 
could be used that would result in the proper alignment?  Would ".word" 
work?


If not, then I would say patch only power with your balign thing. 
8-byte alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel





So its not the size but rather the start offset '03fd77', that is the
problem here. That is what the WARN_ON triggers on, that the start of
the table is not 4-byte aligned.

Using a ppc cross-compiler and the ENTSIZE patch that line does not
change, however if I use the initial patch posted in this thread, the
start does align to 4-bytes and thus the warning goes away, as Sachin
verified. In fact, without the patch I found several modules that don't
start at the proper alignment, however with the patch that started this
thread they were all properly aligned.

In terms of the '.balign' causing holes, we originally added the
'_ASM_ALIGN' to x86 for precisely this reason. See commit:
ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion.

In addition, we have a lot of runtime with the .balign in the tree and
I'm not aware of any holes in the table. I think the code would blow up
pretty badly if there were.

A number of arches were already using the '.balign', and the patch I
proposed simply added it to remaining ones, now that we added a
WARN_ON() to catch this condition.

Thanks,

-Jason








Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Jason Baron



On 02/28/2017 01:16 PM, David Daney wrote:

On 02/28/2017 08:21 AM, Steven Rostedt wrote:

On Tue, 28 Feb 2017 10:25:46 +0530
Sachin Sant  wrote:


File: ./net/ipv4/xfrm4_input.o
  [12] __jump_table  PROGBITS 000639
18 18 WAM  0   0  1
File: ./net/ipv4/udplite.o
File: ./net/ipv4/xfrm4_output.o
  [ 9] __jump_table  PROGBITS 000481
18 18 WAM  0   0  1


Looks like there's some issues right there.


Those look good to me 18/18 = 1 with no remainder.  The odd numbers are
the offset of the section in the ELF file.

If you look at the stack trace, it seems that it is during module loading.

Are the primitives for generating the tables doing something different
for the module case?  I am not familiar enough with the powerpc ABIs to
know.

Try this:

$ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if
$#f > 5; print $_' <~/jump_table.log


There are no entries with size that is not a multiple of 0x18.

I think my patch to add the ENTSIZE is not doing anything here.

I suspect that the alignment of the __jump_table section in the .ko
files is not correct, and you are seeing some sort of problem due to that.




Hi,

Yes, if you look at the trace that Sachin sent the module being loaded 
that does the WARN_ON() is nfsd.ko.


That module from Sachin's trace has:

  [31] __jump_table  PROGBITS 03fd77 c0 
18 WAM  0   0  1


So its not the size but rather the start offset '03fd77', that is the 
problem here. That is what the WARN_ON triggers on, that the start of 
the table is not 4-byte aligned.


Using a ppc cross-compiler and the ENTSIZE patch that line does not 
change, however if I use the initial patch posted in this thread, the 
start does align to 4-bytes and thus the warning goes away, as Sachin 
verified. In fact, without the patch I found several modules that don't 
start at the proper alignment, however with the patch that started this 
thread they were all properly aligned.


In terms of the '.balign' causing holes, we originally added the 
'_ASM_ALIGN' to x86 for precisely this reason. See commit:

ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion.

In addition, we have a lot of runtime with the .balign in the tree and 
I'm not aware of any holes in the table. I think the code would blow up 
pretty badly if there were.


A number of arches were already using the '.balign', and the patch I 
proposed simply added it to remaining ones, now that we added a 
WARN_ON() to catch this condition.


Thanks,

-Jason






Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread David Daney

On 02/28/2017 08:21 AM, Steven Rostedt wrote:

On Tue, 28 Feb 2017 10:25:46 +0530
Sachin Sant  wrote:


File: ./net/ipv4/xfrm4_input.o
  [12] __jump_table  PROGBITS 000639 18 18 WAM  
0   0  1
File: ./net/ipv4/udplite.o
File: ./net/ipv4/xfrm4_output.o
  [ 9] __jump_table  PROGBITS 000481 18 18 WAM  
0   0  1


Looks like there's some issues right there.


Those look good to me 18/18 = 1 with no remainder.  The odd numbers are 
the offset of the section in the ELF file.


If you look at the stack trace, it seems that it is during module loading.

Are the primitives for generating the tables doing something different 
for the module case?  I am not familiar enough with the powerpc ABIs to 
know.


Try this:

$ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if 
$#f > 5; print $_' <~/jump_table.log



There are no entries with size that is not a multiple of 0x18.

I think my patch to add the ENTSIZE is not doing anything here.

I suspect that the alignment of the __jump_table section in the .ko 
files is not correct, and you are seeing some sort of problem due to that.







-- Steve





Re: [PATCH 5/5] powerpc/64s: fix POWER9 machine check handler from stop state

2017-02-28 Thread Gautham R Shenoy
Hi Nick,

On Fri, Feb 17, 2017 at 12:09 AM, Nicholas Piggin  wrote:
> The ISA specifies power save wakeup can cause a machine check interrupt.
> The machine check handler currently has code to handle that for POWER8,
> but POWER9 crashes when trying to execute the P8 style sleep
> instructions.
>
> So queue up the machine check, then call into the idle code to wake up
> as the system reset interrupt does, rather than attempting to sleep
> again without going through the main idle path.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 71 
> ++--
>  1 file changed, 36 insertions(+), 35 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 5f775783f744..0388843c8d12 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -329,6 +329,35 @@ EXC_COMMON_BEGIN(machine_check_common)
> /* restore original r1. */  \
> ld  r1,GPR1(r1)
>
> +#ifdef CONFIG_PPC_P7_NAP
> +EXC_COMMON_BEGIN(machine_check_idle_common)
> +   bl  machine_check_queue_event
> +   /*
> +* Queue the machine check, then reload SRR1 and use it to set
> +* CR3 according to pnv_powersave_wakeup convention.
> +*/
> +   ld  r12,_MSR(r1)
> +   rlwinm  r11,r12,47-31,30,31
> +   cmpwi   cr3,r11,2
> +
> +   /*
> +* Now have to make SRR1 wake up reason look like a system reset
> +* interrupt. Put 0xf in there, which is reserved (and does not
> +* match HMI).

The only places where the wakeup reason is presently checked on the way out
of idle-exit are in KVM guest entry path in kvmppc_check_wake_reason()
 and on the CPU-Hotplug exit path pnv_smp_cpu_kill_self(). In both places
we do a positive check for EE, Doorbell, HVEE . The kvm case is also
interested in
HMI. We ignore all the other reasons at the moment.

So this should be fine.
> +*/
> +   li  r11,0xf
> +   insrdi  r12,r11,4,45


Shouldn't this be insrdi r12,r11,4,42? The exception bits are 42:45.
I always have trouble wrapping my head around these nifty
rotate-shift-mask-insert instructions. So I might as well be wrong!


> +   mtspr   r12,SPRN_SRR1
> +   std r12,_MSR(r1)
> +
> +   /*
> +* Decrement MCE nesting after finishing with the stack.
> +*/
> +   lhz r11,PACA_IN_MCE(r13)
> +   subir11,r11,1
> +   sth r11,PACA_IN_MCE(r13)
> +   b   pnv_powersave_wakeup
> +#endif
> /*
>  * Handle machine check early in real mode. We come here with
>  * ME=1, MMU (IR=0 and DR=0) off and using MC emergency stack.
> @@ -341,6 +370,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
> bl  machine_check_early
> std r3,RESULT(r1)   /* Save result */
> ld  r12,_MSR(r1)
> +
>  #ifdef CONFIG_PPC_P7_NAP
> /*
>  * Check if thread was in power saving mode. We come here when any
> @@ -351,43 +381,14 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
>  *
>  * Go back to nap/sleep/winkle mode again if (b) is true.
>  */
> -   rlwinm. r11,r12,47-31,30,31 /* Was it in power saving mode? */
> -   beq 4f  /* No, it wasn't */
> -   /* Thread was in power saving mode. Go back to nap again. */
> -   cmpwi   r11,2
> -   blt 3f
> -   /* Supervisor/Hypervisor state loss */
> -   li  r0,1
> -   stb r0,PACA_NAPSTATELOST(r13)
> -3: bl  machine_check_queue_event
> -   MACHINE_CHECK_HANDLER_WINDUP
> -   GET_PACA(r13)
> -   ld  r1,PACAR1(r13)
> -   /*
> -* Check what idle state this CPU was in and go back to same mode
> -* again.
> -*/
> -   lbz r3,PACA_THREAD_IDLE_STATE(r13)
> -   cmpwi   r3,PNV_THREAD_NAP
> -   bgt 10f
> -   IDLE_STATE_ENTER_SEQ(PPC_NAP)
> -   /* No return */
> -10:
> -   cmpwi   r3,PNV_THREAD_SLEEP
> -   bgt 2f
> -   IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
> -   /* No return */
> -
> -2:
> -   /*
> -* Go back to winkle. Please note that this thread was woken up in
> -* machine check from winkle and have not restored the per-subcore
> -* state.
> -*/
> -   IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
> -   /* No return */
> +   BEGIN_FTR_SECTION
> +   rlwinm. r11,r12,47-31,30,31
> +   beq-4f
> +   BRANCH_TO_COMMON(r10, machine_check_idle_common)
>  4:
> +   END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
>  #endif
> +
> /*
>  * Check if we are coming from hypervisor userspace. If yes then we
>  * continue in host kernel in V mode to deliver the MC event.
> --
> 2.11.0
>

Otherwise, the patch looks correct to me.
Reviewed-by: Gautham R. Shenoy 

-- 

Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-28 Thread Steven Rostedt
On Tue, 28 Feb 2017 10:25:46 +0530
Sachin Sant  wrote:

> File: ./net/ipv4/xfrm4_input.o
>   [12] __jump_table  PROGBITS 000639 18 18 
> WAM  0   0  1
> File: ./net/ipv4/udplite.o
> File: ./net/ipv4/xfrm4_output.o
>   [ 9] __jump_table  PROGBITS 000481 18 18 
> WAM  0   0  1

Looks like there's some issues right there.

-- Steve


Re: [PATCH 2/5] powerpc/64: stop using bit in HSPRG0 to test winkle

2017-02-28 Thread Nicholas Piggin
On Tue, 28 Feb 2017 20:38:19 +0530
Gautham R Shenoy  wrote:

> Hi Nick,
> 
> On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin  wrote:
> > The POWER8 idle code has a neat trick of programming the power on engine
> > to restore a low bit into HSPRG0, so idle wakeup code can test and see
> > if it has been programmed this way and therefore lost all state. Restore
> > time can be reduced if winkle has not been reached.
> >  
> 
> > However this messes with our r13 PACA pointer, and requires HSPRG0 to
> > be written to. It also optimizes the slowest and most uncommon case at
> > the expense of another SPR write in the common nap state wakeup.  
> 
> Actually, this optimization was needed to reduce the guest entry time
> for a KVM guest vcore.
> 
> While running KVM on POWER8, the host is in SMT=1 mode with the
> secondary threads
> being hotplugged out and supposed to have entered the winkle state.
> Since the primary thread is still running, the core would still be in Nap.
> 
> However, each time a guest vcore is scheduled on the core, the
> secondary threads are sent an IPI to wake them up from the idle state.
> 
> On waking up, they use the last bit of HSPRG0 and conclude that they
> don't need to restore resources before executing the guest entry code.
> 
> Thus, the HSPRG0 trick help the secondary threads avoid spending
> extra cycles restoring the SLBs and per-thread SPRs.

Ah, thanks. Until now I didn't really understand why that case was
optimimized. My mistake and apologies for assuming it was not a good
reason :)


> > Remove this complexity and assume winkle sleeps always require a state
> > restore. This speedup could be made entirely contained within the winkle
> > idle code by counting per-core winkles and setting a thread bitmap when
> > all have gone to winkle.  
> 
> This is a good idea! We are anyway taking the lock in pnv_wakeup_tb_loss()
> to check if we are the first thread in the core waking up from a deep-state.
> 
> We can check another bitmap if we are the first thread waking up from winkle
> and set cr4 to the result of that comparison while holding the lock.
> 
> That shouldn't cost us much, at least for the fake-wakeup-from-winkle
> for KVM case alluded to above.
> 
> Otherwise, the patch looks good to me.

Okay, I had a half-done patch for this. I'll finish it and send it
out for review.

Thanks,
Nick


Re: [PATCH 4/5] powerpc/64s: consolidate pnv_restore_hyp_resource into pnv_powersave_wakeup

2017-02-28 Thread Gautham R Shenoy
Hi Nick,

On Fri, Feb 17, 2017 at 12:09 AM, Nicholas Piggin  wrote:
> There is only one caller, so this reduces spaghetti of subsequent
> callees returning into the caller.
>
> Signed-off-by: Nicholas Piggin 

This patch is good!

Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.


Re: [PATCH 3/5] powerpc/64s: use alternative feature patching

2017-02-28 Thread Gautham R Shenoy
Hi Nick,

On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin  wrote:
> This reduces the number of nops for POWER8
>
> Signed-off-by: Nicholas Piggin 

This change looks ok to me.

Reviewed-by: Gautham R. Shenoy 

> ---
>  arch/powerpc/kernel/idle_book3s.S | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 1271344e5523..ab15dee371c9 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -417,13 +417,8 @@ BEGIN_FTR_SECTION
> rldicl  r5,r5,4,60
> cmpdcr4,r5,r4
> bge cr4,pnv_wakeup_tb_loss
> -   /*
> -* Waking up without hypervisor state loss. Return to
> -* reset vector
> -*/
> -   blr
>
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> +FTR_SECTION_ELSE
>
> /*
>  * POWER ISA 2.07 or less.
> @@ -440,9 +435,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  * indicates we are waking with hypervisor state loss from nap.
>  */
> bgt cr3,.
> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>
> -   blr /* Return back to System Reset vector from where
> -  pnv_restore_hyp_resource was invoked */
> +   /*
> +* Waking up without hypervisor state loss. Return to
> +* reset vector
> +*/
> +   blr
>
>  /*
>   * Called if waking up from idle state which can cause either partial or
> --
> 2.11.0
>



-- 
Thanks and Regards
gautham.


Re: [PATCH 2/5] powerpc/64: stop using bit in HSPRG0 to test winkle

2017-02-28 Thread Gautham R Shenoy
Hi Nick,

On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin  wrote:
> The POWER8 idle code has a neat trick of programming the power on engine
> to restore a low bit into HSPRG0, so idle wakeup code can test and see
> if it has been programmed this way and therefore lost all state. Restore
> time can be reduced if winkle has not been reached.
>

> However this messes with our r13 PACA pointer, and requires HSPRG0 to
> be written to. It also optimizes the slowest and most uncommon case at
> the expense of another SPR write in the common nap state wakeup.

Actually, this optimization was needed to reduce the guest entry time
for a KVM guest vcore.

While running KVM on POWER8, the host is in SMT=1 mode with the
secondary threads
being hotplugged out and supposed to have entered the winkle state.
Since the primary thread is still running, the core would still be in Nap.

However, each time a guest vcore is scheduled on the core, the
secondary threads are sent an IPI to wake them up from the idle state.

On waking up, they use the last bit of HSPRG0 and conclude that they
don't need to restore resources before executing the guest entry code.

Thus, the HSPRG0 trick help the secondary threads avoid spending
extra cycles restoring the SLBs and per-thread SPRs.



>
> Remove this complexity and assume winkle sleeps always require a state
> restore. This speedup could be made entirely contained within the winkle
> idle code by counting per-core winkles and setting a thread bitmap when
> all have gone to winkle.

This is a good idea! We are anyway taking the lock in pnv_wakeup_tb_loss()
to check if we are the first thread in the core waking up from a deep-state.

We can check another bitmap if we are the first thread waking up from winkle
and set cr4 to the result of that comparison while holding the lock.

That shouldn't cost us much, at least for the fake-wakeup-from-winkle
for KVM case alluded to above.

Otherwise, the patch looks good to me.


>
> Signed-off-by: Nicholas Piggin 

--
Thanks and Regards
gautham.


Re: [PATCH] powerpc: booke: fix boot crash due to null hugepd

2017-02-28 Thread Laurentiu Tudor
Hi,

Some more information on the crash, inline.

On 02/17/2017 02:18 PM, Aneesh Kumar K.V wrote:
> laurentiu.tu...@nxp.com writes:
>
>> From: Laurentiu Tudor 
>>
>> On 32-bit book-e machines, hugepd_ok() does not take
>> into account null hugepd values, causing this crash at boot:
>>
>> Unable to handle kernel paging request for data at address 0x8000
>> Faulting instruction address: 0xc00182a8
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> SMP NR_CPUS=24
>> CoreNet Generic
>> Modules linked in:
>> CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   
>> 4.10.0-rc8-00016-g69b1f87 #11
>> task: e505 task.stack: e5058000
>> NIP: c00182a8 LR: c001829c CTR: 7ffe
>> REGS: e5059c50 TRAP: 0300   Tainted: GW
>> (4.10.0-rc8-00016-g69b1f87)
>> MSR: 00021002 
>>CR: 88428e82  XER: 
>> DEAR: 8000 ESR: 
>> GPR00: c0107510 e5059d00 e505 8000 bff1 e5059d0c e5059d08 
>> 2017
>> GPR08:     28428e82  c00027d0 
>> 
>> GPR16:   88a28e82 2000 48422e82  88a28e84 
>> dd004000
>> GPR24: e5059e38   bff1 dd004000 0001 00029002 
>> bff1
>> NIP [c00182a8] follow_huge_addr+0x38/0xf0
>> LR [c001829c] follow_huge_addr+0x2c/0xf0
>> Call Trace:
>> [e5059d00] [e5059d00] 0xe5059d00 (unreliable)
>> [e5059d20] [c0107510] follow_page_mask+0x40/0x3c0
>> [e5059d80] [c0107958] __get_user_pages+0xc8/0x420
>> [e5059de0] [c010817c] get_user_pages_remote+0x8c/0x230
>> [e5059e30] [c013f170] copy_strings+0x110/0x3a0
>> [e5059ea0] [c013f42c] copy_strings_kernel+0x2c/0x50
>> [e5059ec0] [c0141324] do_execveat_common+0x474/0x620
>> [e5059f10] [c01414fc] do_execve+0x2c/0x40
>> [e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60
>> [e5059f30] [c000289c] kernel_init+0xcc/0x120
>> [e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64
>> Instruction dump:
>> bfc10018 7c9f2378 90010024 7fc000a6 7c000146 80630020 38a1000c 38c10008
>> 4bfff869 2c03 41c20090 81210008 <8143> 81630004 3860ffea 2f89
>> ---[ end trace 4bf94e15fd9fa824 ]---
>
>
> Which code path is that. That null should be filtered by the if
> (pmd_none(pmd)) check in find_linux_pte_or_hugepte right ?

The crash happens when __find_linux_pte_or_hugepte() calls hugepd_ok(),
on this line [1]. It's triggered when __find_linux_pte_or_hugepte() is
first called, when the kernel tries to spawn the init process. The input
effective address (ea arg) is bff1. This is the call stack:

[e5059cd0] [c0017b60] __find_linux_pte_or_hugepte+0x60/0x120 (unreliable)
[e5059d00] [c001832c] follow_huge_addr+0x2c/0xf0
[e5059d20] [c0107590] follow_page_mask+0x40/0x3c0
[e5059d80] [c01079d8] __get_user_pages+0xc8/0x420
[e5059de0] [c01081fc] get_user_pages_remote+0x8c/0x230
[e5059e30] [c013f210] copy_strings+0x110/0x3a0
[e5059ea0] [c013f4cc] copy_strings_kernel+0x2c/0x50
[e5059ec0] [c01413c4] do_execveat_common+0x474/0x620
[e5059f10] [c014159c] do_execve+0x2c/0x40
[e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60
[e5059f30] [c000289c] kernel_init+0xcc/0x120
[e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64

Thanks in advance for any pointers.

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/mm/hugetlbpage.c#n918

---
Best Regards, Laurentiu

Re: [PATCH v3 1/2] powerpc: split ftrace bits into a separate file

2017-02-28 Thread Steven Rostedt
On Tue, 28 Feb 2017 15:04:15 +1100
Michael Ellerman  wrote:

kernel/trace/ftrace.c more obvious.
> 
> I don't know if it's really worth keeping the names the same across
> arches, especially as we already have:
> 
>   arch/arm64/kernel/entry-ftrace.S
>   arch/arm/kernel/entry-ftrace.S
>   arch/blackfin/kernel/ftrace-entry.S
>   arch/metag/kernel/ftrace_stub.S
> 
> But we can rename it if you feel strongly about it.

Hmm, perhaps "entry-ftrace.S" would be the better name. I never liked
the "mcount.S" name.

-- Steve


Re: [bug report] v4.10.1 Oops on P1010 - huge tlb - there's already a fix ...

2017-02-28 Thread Laurentiu Tudor
Hi Wolfgang,

Thanks for reporting!

On 02/28/2017 01:47 PM, Wolfgang Ocker wrote:> On Tue, 2017-02-28 at 
10:57 +0100, Wolfgang Ocker wrote:
 >> With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see
 >> the
 >>   following oops on a P1010:
 >
 > Just saw that there is already a fix:
 >

The fix is not accepted but you can use it until we come up with a 
better one. Please note that 4.7 & 4.9 stable kernels are also impacted
so if you plan to use them you'll need the fix.

---
Best Regards, Laurentiu

Re: [bug report] v4.10.1 Oops on P1010 - huge tlb - there's already a fix ...

2017-02-28 Thread Wolfgang Ocker
On Tue, 2017-02-28 at 10:57 +0100, Wolfgang Ocker wrote:
> With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see
> the
>  following oops on a P1010:

Just saw that there is already a fix:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/154204.html

Thanks!



Re: [PATCH 1/5] powerpc/64s: move remaining system reset idle code into idle_book3s.S

2017-02-28 Thread Gautham R Shenoy
Hi Nick,

This patch is fine by me.

Reviewed-by: Gautham R. Shenoy 

On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin  wrote:
> Should be no functional change.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 26 +---
>  arch/powerpc/kernel/idle_book3s.S| 39 
> +++-
>  2 files changed, 35 insertions(+), 30 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 0e1350b30160..d0d89047befe 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -130,31 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x4200)
>
>  #ifdef CONFIG_PPC_P7_NAP
>  EXC_COMMON_BEGIN(system_reset_idle_common)
> -BEGIN_FTR_SECTION
> -   GET_PACA(r13) /* Restore HSPRG0 to get the winkle bit in r13 */
> -END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> -   bl  pnv_restore_hyp_resource
> -
> -   li  r0,PNV_THREAD_RUNNING
> -   stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
> -
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> -   li  r0,KVM_HWTHREAD_IN_KERNEL
> -   stb r0,HSTATE_HWTHREAD_STATE(r13)
> -   /* Order setting hwthread_state vs. testing hwthread_req */
> -   sync
> -   lbz r0,HSTATE_HWTHREAD_REQ(r13)
> -   cmpwi   r0,0
> -   beq 1f
> -   b   kvm_start_guest
> -1:
> -#endif
> -
> -   /* Return SRR1 from power7_nap() */
> -   mfspr   r3,SPRN_SRR1
> -   blt cr3,2f
> -   b   pnv_wakeup_loss
> -2: b   pnv_wakeup_noloss
> +   b   pnv_powersave_wakeup
>  #endif
>
>  EXC_COMMON_BEGIN(system_reset_common)
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 72dac0b58061..ea3562f83c57 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -115,7 +115,7 @@ core_idle_lock_held:
>   *
>   * Address to 'rfid' to in r5
>   */
> -_GLOBAL(pnv_powersave_common)
> +pnv_powersave_common:
> /* Use r3 to pass state nap/sleep/winkle */
> /* NAP is a state loss, we create a regs frame on the
>  * stack, fill it up with the state we care about and
> @@ -365,6 +365,34 @@ _GLOBAL(power9_idle_stop)
> LOAD_REG_ADDR(r5,power_enter_stop)
> b   pnv_powersave_common
> /* No return */
> +
> +.global pnv_powersave_wakeup
> +pnv_powersave_wakeup:
> +BEGIN_FTR_SECTION
> +   GET_PACA(r13) /* Restore HSPRG0 to get the winkle bit in r13 */
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> +   bl  pnv_restore_hyp_resource
> +
> +   li  r0,PNV_THREAD_RUNNING
> +   stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +   li  r0,KVM_HWTHREAD_IN_KERNEL
> +   stb r0,HSTATE_HWTHREAD_STATE(r13)
> +   /* Order setting hwthread_state vs. testing hwthread_req */
> +   sync
> +   lbz r0,HSTATE_HWTHREAD_REQ(r13)
> +   cmpwi   r0,0
> +   beq 1f
> +   b   kvm_start_guest
> +1:
> +#endif
> +
> +   /* Return SRR1 from power7_nap() */
> +   mfspr   r3,SPRN_SRR1
> +   blt cr3,pnv_wakeup_noloss
> +   b   pnv_wakeup_loss
> +
>  /*
>   * Called from reset vector. Check whether we have woken up with
>   * hypervisor state loss. If yes, restore hypervisor state and return
> @@ -373,7 +401,7 @@ _GLOBAL(power9_idle_stop)
>   * r13 - Contents of HSPRG0
>   * cr3 - set to gt if waking up with partial/complete hypervisor state loss
>   */
> -_GLOBAL(pnv_restore_hyp_resource)
> +pnv_restore_hyp_resource:
>  BEGIN_FTR_SECTION
> ld  r2,PACATOC(r13);
> /*
> @@ -436,7 +464,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>   * cr3 - gt if waking up with partial/complete hypervisor state loss
>   * cr4 - gt or eq if waking up from complete hypervisor state loss.
>   */
> -_GLOBAL(pnv_wakeup_tb_loss)
> +pnv_wakeup_tb_loss:
> ld  r1,PACAR1(r13)
> /*
>  * Before entering any idle state, the NVGPRs are saved in the stack
> @@ -640,7 +668,8 @@ fastsleep_workaround_at_exit:
>   * R3 here contains the value that will be returned to the caller
>   * of power7_nap.
>   */
> -_GLOBAL(pnv_wakeup_loss)
> +.global pnv_wakeup_loss
> +pnv_wakeup_loss:
> ld  r1,PACAR1(r13)
>  BEGIN_FTR_SECTION
> CHECK_HMI_INTERRUPT
> @@ -660,7 +689,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   * R3 here contains the value that will be returned to the caller
>   * of power7_nap.
>   */
> -_GLOBAL(pnv_wakeup_noloss)
> +pnv_wakeup_noloss:
> lbz r0,PACA_NAPSTATELOST(r13)
> cmpwi   r0,0
> bne pnv_wakeup_loss
> --
> 2.11.0
>



-- 
Thanks and Regards
gautham.


[bug report] v4.10.1 Oops on P1010 - huge tlb

2017-02-28 Thread Wolfgang Ocker
With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see the
 following oops on a P1010:


Freeing unused kernel memory: 428K
This architecture does not have kernel memory protection.
Unable to handle kernel paging request for data at address 0x8000
Faulting instruction address: 0xc001b3a0
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT
SMP NR_CPUS=8
DSP01
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
4.10.1-weo-00024-g4de55c3 #1
task: eb0e4000 task.stack: eb0e2000
NIP: c001b3a0 LR: c001b390 CTR: 7ffe
REGS: eb0e3c50 TRAP: 0300   Tainted: GW
(4.10.1-weo-00024-g4de55c3)
MSR: 00021000 
  CR: 88428022  XER: 
DEAR: 8000 ESR: 
GPR00: c001b390 eb0e3d00 eb0e4000 8000 bff1 eb0e3d0c eb0e3d08 
GPR08: c097   0080 28428022  c0002a5c 
GPR16: eb0e3e48 0001 88002022 2000 48422022  88002024 ef0a8000
GPR24: eb0e3e48   bff1 0001 bff1 00029000 ffea
NIP [c001b3a0] follow_huge_addr+0x54/0x114
LR [c001b390] follow_huge_addr+0x44/0x114
Call Trace:
[eb0e3d00] [c001b390] follow_huge_addr+0x44/0x114 (unreliable)
[eb0e3d30] [c014ffbc] follow_page_mask+0x4c/0x4a4
[eb0e3d90] [c01504e4] __get_user_pages+0xd0/0x3d8
[eb0e3df0] [c0150ca0] get_user_pages_remote+0x98/0x238
[eb0e3e40] [c018abf4] copy_strings+0x108/0x2bc
[eb0e3ea0] [c018ade0] copy_strings_kernel+0x38/0x50
[eb0e3ec0] [c018ce00] do_execveat_common+0x468/0x6c4
[eb0e3f10] [c018d094] do_execve+0x38/0x48
[eb0e3f20] [c00021cc] try_to_run_init_process+0x24/0x64
[eb0e3f30] [c0002b24] kernel_init+0xc8/0x11c
[eb0e3f40] [c00112d0] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at   (null)
LR =   (null)
Instruction dump:
7fc000a6 7c000146 480edde9 807f0020 7fa4eb78 38a1000c 38c10008 4bfff851
2c03 418200bc 81410008 3be0ffea <80c3> 80e30004 2f8a 409e002c
---[ end trace 79d29a00645dff1b ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b


Line numbers:

NIP [c001b3a0] follow_huge_addr+0x54/0x114
linux/include/linux/compiler.h:243

[eb0e3d00] [c001b390] follow_huge_addr+0x44/0x114 (unreliable)
linux/arch/powerpc/mm/hugetlbpage.c:635

[eb0e3d30] [c014ffbc] follow_page_mask+0x4c/0x4a4
linux/mm/gup.c:238


The problem does not occur with 4.9.x. It does not occure with 4.10.1
and CONFIG_HUGETLBFS disabled. The problem does not occur after
reverting the following patches:

"powerpc: get hugetlbpage handling more generic" (03bb2d65900c)
"powerpc/8xx: Implement support of hugepages" (4b9142869947)
"powerpc: Fix pgtable pmd cache init" (bf5ca68dd2ee)
"powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size" 
(ff8b85796dad)
"powerpc/mm: Fix little-endian 4K hugetlb" (20717e1ff526)


Best regards,
Wolfgang


Re: [PATCH] staging: fsl-mc: fix warning in DT ranges parser

2017-02-28 Thread Laurentiu Tudor
Hi Arnd,

On 02/27/2017 10:42 PM, Arnd Bergmann wrote:
> The fsl-mc-bus driver in staging contains a copy of the standard
> 'ranges' property parsing algorithm with a hack to treat a missing
> property the same way as an empty one. This code produces false-positive
> warnings for me in an allmodconfig build:
>
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c: In function 'fsl_mc_bus_probe':
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c:645:6: error: 'mc_size_cells' may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c:682:8: error: 'mc_addr_cells' may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c:644:6: note: 'mc_addr_cells' was 
> declared here
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c:684:8: error: 'paddr_cells' may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/staging/fsl-mc/bus/fsl-mc-bus.c:643:6: note: 'paddr_cells' was 
> declared here
>
> To avoid the warnings, I'm simplifying the argument handling to pass
> the number of valid ranges in the property as the function return code
> rather than passing it by reference. With this change, gcc can see that
> we don't evaluate the cell numbers for an missing ranges property.
>
> Signed-off-by: Arnd Bergmann 

Looks good to me, i've tested it and did not see any issues, so here's an:

Acked-by: Laurentiu Tudor 

---
Thanks & Best Regards, Laurentiu

Re: [PATCH] kbuild: avoid unrecognized option error for external DTC

2017-02-28 Thread Michael Ellerman
Ben Hutchings  writes:

> [ Unknown signature status ]
> On Mon, 2017-02-27 at 14:40 +0900, Masahiro Yamada wrote:
>> Since commit 6b22b3d1614a ("kbuild: Allow using host dtc instead of
>> kernel's copy"), it is possible to use an external dtc.  In this
>> case, we do not know which options are supported on it.
>> 
>> Commit bc553986a2f7 ("dtc: turn off dtc unit address warnings by
>> default") gives -Wno-unit_address_vs_reg, but this options is only
>> recognized by v1.4.2 or later.
>> 
>> If an older version is specified, the build fails:
>
> But the option to use an external dtc was intended to allow testing of
> newer versions.  If there's no reason to use this option to run an
> older version, why bother trying to support that?

+1

That's a lot of added complexity, when the answer could just be "use the
kernel dtc".

cheers


[PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails

2017-02-28 Thread Vipin K Parashar
Added check for OPAL_WRONG_STATE error code returned from OPAL.
Currently Linux flashes "unexpected error" over console for this
error. This will avoid throwing such message and return I/O error
for such OPAL failures.

Signed-off-by: Vipin K Parashar 
---
Changes in v2:
 - Added log message indicating sleeping/offline core
   for OPAL_WRONG_STATE

 arch/powerpc/platforms/powernv/opal.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 86d9fde..8af230e 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -869,8 +869,11 @@ int opal_error_code(int rc)
case OPAL_UNSUPPORTED:  return -EIO;
case OPAL_HARDWARE: return -EIO;
case OPAL_INTERNAL_ERROR:   return -EIO;
+   case OPAL_WRONG_STATE:
+   pr_notice("%s: Core sleeping/offline\n", __func__);
+   return -EIO;
default:
-   pr_err("%s: unexpected OPAL error %d\n", __func__, rc);
+   pr_err("%s: Unexpected OPAL error %d\n", __func__, rc);
return -EIO;
}
 }
-- 
2.7.4



Re: [PATCH] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails

2017-02-28 Thread Vipin K Parashar

Thanks!! for review.

Sending out v2 with  suggested changes.


On Thursday 23 February 2017 09:22 AM, Stewart Smith wrote:

Michael Ellerman  writes:


Stewart Smith  writes:


Vipin K Parashar  writes:

On Monday 13 February 2017 06:13 AM, Michael Ellerman wrote:

Vipin K Parashar  writes:


OPAL returns OPAL_WRONG_STATE for XSCOM operations

done to read any core FIR which is sleeping, offline.

OK.

Do we know why Linux is causing that to happen?

This issue is originally seen upon running STAF (Software Test
Automation Framework) stress tests and off-lining some cores
with stress tests running.

It can also be re-created after off-lining few cores and following
one of below methods.
1. Executing Linux "sensors" command
2. Reading contents of file /sys/class/hwmon/hwmon0/tempX_input,
 where X is offline CPU.

Its "opal_get_sensor_data" Linux API that that triggers
OPAL call "opal_sensor_read", performing XSCOM ops here.
If core is found sleeping/offline Linux throws up
"opal_error_code: Unexpected OPAL error" error onto console.

Currently Linux isn't aware about OPAL_WRONG_STATE return code
from OPAL. Thus it prints "Unexpected OPAL error" message, same
as it would log for any unknown OPAL return codes.

Seeing this error over console has been a concern for Test and
would puzzle real user as well. This patch makes Linux aware about
OPAL_WRONG_STATE return code from OPAL and stops printing
"Unexpected OPAL error" message onto console for OPAL fails
with OPAL_WRONG_STATE

Ahh... so this is a DTS sensor, which indeed is just XSCOMs and we
return the xscom_read return code in event of error.

I would argue that converting to EIO in that instance is probably
correct... or EAGAIN? EAGAIN may be more correct in the situation where
the core is just sleeping.

What kind of offlining are you doing?

Arguably, the correct behaviour would be to remove said sensors when the
core is offline.

Right, that would be ideal. There appear to be at least two other hwmon
drivers that are CPU hotplug aware (coretemp and via-cputemp).

But perhaps it's not possible to work out which sensors are attached to
which CPU etc., I haven't looked in detail.

Each core-temp@ sensor has a ibm,pir property, so linking back to what
core shouldn't be too hard. For mem-temp@ sensors, we have the chip-id.


In that case changing just opal_get_sensor_data() to handle
OPAL_WRONG_STATE would be OK, with a comment explaining that we might be
asked to read a sensor on an offline CPU and we aren't able to detect
that.

Agree.





Re: [PATCH 1/3] powerpc/64s: fix handling of non-synchronous machine checks

2017-02-28 Thread Nicholas Piggin
On Tue, 28 Feb 2017 11:27:29 +0530
Mahesh Jagannath Salgaonkar  wrote:

> On 02/28/2017 07:30 AM, Nicholas Piggin wrote:
> > A synchronous machine check is an exception raised by the attempt to
> > execute the current instruction. If the error can't be corrected, it
> > can make sense to SIGBUS the currently running process.
> > 
> > In other cases, the error condition is not related to the current
> > instruction, so killing the current process is not the right thing to
> > do.
> > 
> > Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no
> > practical change. It will be used to handle POWER9 asynchronous
> > machine checks.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  arch/powerpc/platforms/powernv/opal.c | 21 ++---
> >  1 file changed, 6 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/opal.c 
> > b/arch/powerpc/platforms/powernv/opal.c
> > index 86d9fde93c17..e0f856bfbfe8 100644
> > --- a/arch/powerpc/platforms/powernv/opal.c
> > +++ b/arch/powerpc/platforms/powernv/opal.c
> > @@ -395,7 +395,6 @@ static int opal_recover_mce(struct pt_regs *regs,
> > struct machine_check_event *evt)
> >  {
> > int recovered = 0;
> > -   uint64_t ea = get_mce_fault_addr(evt);
> > 
> > if (!(regs->msr & MSR_RI)) {
> > /* If MSR_RI isn't set, we cannot recover */
> > @@ -404,26 +403,18 @@ static int opal_recover_mce(struct pt_regs *regs,
> > } else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
> > /* Platform corrected itself */
> > recovered = 1;
> > -   } else if (ea && !is_kernel_addr(ea)) {
> > +   } else if (evt->severity == MCE_SEV_FATAL) {
> > +   /* Fatal machine check */
> > +   pr_err("Machine check interrupt is fatal\n");
> > +   recovered = 0;  
> 
> Setting recovered = 0 would trigger kernel panic. Should we panic the
> kernel for asynchronous errors ?

If it's not recoverable, I don't see what other option we have. SRR0 is
meaningless for async machine checks. So it's much the same thing we do
as if we don't have a process to kill or were running in kernel when a
synchronous MCE occurred.

Thanks,
Nick