Re: [PATCH 15/20] cpuidle/powernv: Add support for POWER ISA v3 idle states

2016-11-28 Thread Ram Pai
Sorry. please ignore this email.
RP

On Mon, Nov 28, 2016 at 11:08:59PM -0800, linux...@us.ibm.com wrote:
> From: Shreyas B. Prabhu 
> 
> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>  a) new instruction named stop is added.
>  b) new per thread SPR named PSSCR is added which controls the behavior
>   of stop instruction.
> 
> Supported idle states and value to be written to PSSCR register to enter
> any idle state is exposed via ibm,cpu-idle-state-names and
> ibm,cpu-idle-state-psscr respectively. To enter an idle state,
> platform provided power_stop() needs to be invoked with the appropriate
> PSSCR value.
> 
> This patch adds support for this new mechanism in cpuidle powernv driver.
> 
> Cc: Rafael J. Wysocki 
> Cc: Daniel Lezcano 
> Cc: Rob Herring 
> Cc: Lorenzo Pieralisi 
> Cc: linux...@vger.kernel.org
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Reviewed-by: Gautham R. Shenoy 
> Signed-off-by: Shreyas B. Prabhu 
> Signed-off-by: Michael Ellerman 
> Signed-off-by: Ram Pai 
> (cherry-picked from 3005c597ba46480b42e1fea3512c408f1830b816)
> ---
>  drivers/cpuidle/cpuidle-powernv.c |   61 
> +
>  1 files changed, 61 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> b/drivers/cpuidle/cpuidle-powernv.c
> index bdf8dae..cab1b4b 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -20,6 +20,8 @@
>  #include 
>  #include 
> 
> +#define POWERNV_THRESHOLD_LATENCY_NS 20
> +
>  struct cpuidle_driver powernv_idle_driver = {
>   .name = "powernv_idle",
>   .owner= THIS_MODULE,
> @@ -27,6 +29,9 @@ struct cpuidle_driver powernv_idle_driver = {
> 
>  static int max_idle_state;
>  static struct cpuidle_state *cpuidle_state_table;
> +
> +static u64 stop_psscr_table[CPUIDLE_STATE_MAX];
> +
>  static u64 snooze_timeout;
>  static bool snooze_timeout_en;
> 
> @@ -91,6 +96,17 @@ static int fastsleep_loop(struct cpuidle_device *dev,
>   return index;
>  }
>  #endif
> +
> +static int stop_loop(struct cpuidle_device *dev,
> +  struct cpuidle_driver *drv,
> +  int index)
> +{
> + ppc64_runlatch_off();
> + power9_idle_stop(stop_psscr_table[index]);
> + ppc64_runlatch_on();
> + return index;
> +}
> +
>  /*
>   * States for dedicated partition case.
>   */
> @@ -170,6 +186,8 @@ static int powernv_add_idle_states(void)
>   u32 latency_ns[CPUIDLE_STATE_MAX];
>   u32 residency_ns[CPUIDLE_STATE_MAX];
>   u32 flags[CPUIDLE_STATE_MAX];
> + u64 psscr_val[CPUIDLE_STATE_MAX];
> + const char *names[CPUIDLE_STATE_MAX];
>   int i, rc;
> 
>   /* Currently we have snooze statically defined */
> @@ -208,12 +226,35 @@ static int powernv_add_idle_states(void)
>   pr_warn("cpuidle-powernv: missing 
> ibm,cpu-idle-state-latencies-ns in DT\n");
>   goto out;
>   }
> + if (of_property_read_string_array(power_mgt,
> + "ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
> + pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in 
> DT\n");
> + goto out;
> + }
> +
> + /*
> +  * If the idle states use stop instruction, probe for psscr values
> +  * which are necessary to specify required stop level.
> +  */
> + if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP))
> + if (of_property_read_u64_array(power_mgt,
> + "ibm,cpu-idle-state-psscr", psscr_val, dt_idle_states)) {
> + pr_warn("cpuidle-powernv: missing 
> ibm,cpu-idle-states-psscr in DT\n");
> + goto out;
> + }
> 
>   rc = of_property_read_u32_array(power_mgt,
>   "ibm,cpu-idle-state-residency-ns", residency_ns, 
> dt_idle_states);
> 
> 
>   for (i = 0; i < dt_idle_states; i++) {
> + /*
> +  * If an idle state has exit latency beyond
> +  * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> +  * in cpu-idle.
> +  */
> + if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
> + continue;
> 
>   /*
>* Cpuidle accepts exit_latency and target_residency in us.
> @@ -226,6 +267,16 @@ static int powernv_add_idle_states(void)
>   powernv_states[nr_idle_states].flags = 
> CPUIDLE_FLAG_TIME_VALID;
>   powernv_states[nr_idle_states].target_residency = 100;
>   powernv_states[nr_idle_states].enter = nap_loop;
> + } else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&

Re: [PATCH v3 0/3] powernv:stop: Use psscr_val,mask provided by firmware

2016-11-28 Thread Balbir Singh


On 10/11/16 18:54, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> This is the third iteration of the patchset to use the psscr_val and
> psscr_mask provided by the firmware for each of the stop states.
> 
> The previous version can be found here:
> 
> [v2]: https://lkml.org/lkml/2016/10/27/143
> [v1]: https://lkml.org/lkml/2016/9/29/45
> 
> This version fixes a couple of bugs pertaining to strncpy and
> initialization of the target-residency values of nap and sleep
> which were pointed out by Paul and Oliver in the earlier version.
> 
> Synopsis
> ==
> In the current implementation, the code for ISA
> v3.0 stop implementation has a couple of shortcomings.
> 
> a) The code hand-codes the values for ESL,EC,TR,MTL bits of PSSCR and
>uses only the RL field from the firmware. While this is not
>incorrect, since the hand-coded values are legitimate, it is not a
>very flexible design since the firmware has the capability to
>communicate these values via the "ibm,cpu-idle-state-psscr" and
>"ibm,cpu-idle-state-psscr-mask" properties. In case where the
>firmware provides values for these fields that is different from
>the hand-coded values, the current code will not work as intended.
> 
> b) Due to issue a), the current code assumes that ESL=EC=1 for all the
>stop states and hence the wakeup from the stop instruction will
>happen at 0x100, the system-reset vector. However, the ISA v3.0
>allows the ESL=EC=0 behaviour where the corresponding stop-state
>loses no state and wakes up from the subsequent instruction. The
>current code doesn't handle this case.
>

Sounds reasonable

Balbir Singh.


Re: [PATCH v3 3/3] powernv: Pass PSSCR value and mask to power9_idle_stop

2016-11-28 Thread Gautham R Shenoy
Hi Michael,

On Wed, Nov 23, 2016 at 08:51:10PM +1100, Michael Ellerman wrote:
> "Gautham R. Shenoy"  writes:
> 
> > From: "Gautham R. Shenoy" 
> >
> > The power9_idle_stop method currently takes only the requested stop
> > level as a parameter and picks up the rest of the PSSCR bits from a
> > hand-coded macro. This is not a very flexible design, especially when
> > the firmware has the capability to communicate the psscr value and the
> > mask associated with a particular stop state via device tree.
> >
> > This patch modifies the power9_idle_stop API to take as parameters the
> > PSSCR value and the PSSCR mask corresponding to the stop state that
> > needs to be set. These PSSCR value and mask are respectively obtained
> > by parsing the "ibm,cpu-idle-state-psscr" and
> > "ibm,cpu-idle-state-psscr-mask" fields from the device tree.
> >
> > In addition to this, the patch adds support for handling stop states
> > for which ESL and EC bits in the PSSCR are zero. As per the
> > architecture, a wakeup from these stop states resumes execution from
> > the subsequent instruction as opposed to waking up at the System
> > Vector.
> >
> > The older firmware sets only the Requested Level (RL) field in the
> > psscr and psscr-mask exposed in the device tree. For older firmware
> > where psscr-mask=0xf, this patch will set the default sane values that
> > the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and
> > TR).
> 
> So we're using psscr-mas=0xf as a signal that we're running on old
> firmware.
> 
> That's OK I think, but please send a patch to document it in the device
> tree binding.
> 
> And call it out below in the code.

Sure will do this! Thanks for reviewing the code.

> 
> cheers
> 

--
Thanks and Regards
gautham.



Re: [PATCH v3 2/3] cpuidle:powernv: Add helper function to populate powernv idle states.

2016-11-28 Thread Gautham R Shenoy
Hi Michael,

On Wed, Nov 23, 2016 at 08:49:08PM +1100, Michael Ellerman wrote:
> "Gautham R. Shenoy"  writes:
> 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> > b/drivers/cpuidle/cpuidle-powernv.c
> > index 7fe442c..9240e08 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -243,28 +262,31 @@ static int powernv_add_idle_states(void)
> >  */
> > if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
> > continue;
> > +   /*
> > +* Firmware passes residency and latency values in ns.
> > +* cpuidle expects it in us.
> > +*/
> > +   exit_latency = ((unsigned int)latency_ns[i]) / 1000;
> > +   target_residency = (!rc) ? ((unsigned int)residency_ns[i]) : 0;
> > +   target_residency /= 1000;
> 
> Urk.
> 
> Can you just do it normally:
> 
>   if (rc == 0)
>   target_residency = (unsigned int)residency_ns[i] / 1000;

Wrote this in a non-standard manner since the normal way would exceed
80 chars.

> 
> I also don't see why you need the cast?

It is a remnant from the previous code. But I see your point, the cast
is redundant, and removing it will make it possible to implement this
in the normal manner without checkpatch warning about it.

> 
> cheers
> 

--
Thanks and Regards
gautham.



Re: [PATCH kernel v6 6/7] vfio/spapr: Reference mm in tce_container

2016-11-28 Thread David Gibson
On Thu, Nov 24, 2016 at 04:48:09PM +1100, Alexey Kardashevskiy wrote:
> In some situations the userspace memory context may live longer than
> the userspace process itself so if we need to do proper memory context
> cleanup, we better have tce_container take a reference to mm_struct and
> use it later when the process is gone (@current or @current->mm is NULL).
> 
> This references mm and stores the pointer in the container; this is done
> in a new helper - tce_iommu_mm_set() - when one of the following happens:
> - a container is enabled (IOMMU v1);
> - a first attempt to pre-register memory is made (IOMMU v2);
> - a DMA window is created (IOMMU v2).
> The @mm stays referenced till the container is destroyed.
> 
> This replaces current->mm with container->mm everywhere except debug
> prints.
> 
> This adds a check that current->mm is the same as the one stored in
> the container to prevent userspace from making changes to a memory
> context of other processes.
> 
> DMA map/unmap ioctls() do not check for @mm as they already check
> for @enabled which is set after tce_iommu_mm_set() is called.
> 
> This does not reference a task as multiple threads within the same mm
> are allowed to ioctl() to vfio and supposedly they will have same limits
> and capabilities and if they do not, we'll just fail with no harm made.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v6:
> * updated the commit log about not referencing task
> 
> v5:
> * postpone referencing of mm
> 
> v4:
> * added check for container->mm!=current->mm in tce_iommu_ioctl()
> for all ioctls and removed other redundand checks
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 159 
> ++--
>  1 file changed, 99 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 88622be..b2fb05ac 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -31,49 +31,49 @@
>  static void tce_iommu_detach_group(void *iommu_data,
>   struct iommu_group *iommu_group);
>  
> -static long try_increment_locked_vm(long npages)
> +static long try_increment_locked_vm(struct mm_struct *mm, long npages)
>  {
>   long ret = 0, locked, lock_limit;
>  
> - if (!current || !current->mm)
> - return -ESRCH; /* process exited */
> + if (!mm)
> + return -EPERM;

Can this happen in practice, or should it be a WARN_ON?

>  
>   if (!npages)
>   return 0;
>  
> - down_write(>mm->mmap_sem);
> - locked = current->mm->locked_vm + npages;
> + down_write(>mmap_sem);
> + locked = mm->locked_vm + npages;
>   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>   ret = -ENOMEM;
>   else
> - current->mm->locked_vm += npages;
> + mm->locked_vm += npages;
>  
>   pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
>   npages << PAGE_SHIFT,
> - current->mm->locked_vm << PAGE_SHIFT,
> + mm->locked_vm << PAGE_SHIFT,
>   rlimit(RLIMIT_MEMLOCK),
>   ret ? " - exceeded" : "");
>  
> - up_write(>mm->mmap_sem);
> + up_write(>mmap_sem);
>  
>   return ret;
>  }
>  
> -static void decrement_locked_vm(long npages)
> +static void decrement_locked_vm(struct mm_struct *mm, long npages)
>  {
> - if (!current || !current->mm || !npages)
> - return; /* process exited */
> + if (!mm && !npages)

Do you mean &&, or ||?

> + return;
>  
> - down_write(>mm->mmap_sem);
> - if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> - npages = current->mm->locked_vm;
> - current->mm->locked_vm -= npages;
> + down_write(>mmap_sem);
> + if (WARN_ON_ONCE(npages > mm->locked_vm))
> + npages = mm->locked_vm;
> + mm->locked_vm -= npages;
>   pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
>   npages << PAGE_SHIFT,
> - current->mm->locked_vm << PAGE_SHIFT,
> + mm->locked_vm << PAGE_SHIFT,
>   rlimit(RLIMIT_MEMLOCK));
> - up_write(>mm->mmap_sem);
> + up_write(>mmap_sem);
>  }
>  
>  /*
> @@ -99,26 +99,38 @@ struct tce_container {
>   bool v2;
>   bool def_window_pending;
>   unsigned long locked_pages;
> + struct mm_struct *mm;
>   struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>   struct list_head group_list;
>  };
>  
> +static long tce_iommu_mm_set(struct tce_container *container)
> +{
> + if (container->mm) {
> + if (container->mm == current->mm)
> + return 0;
> + return -EPERM;
> + }
> + BUG_ON(!current->mm);
> + container->mm = current->mm;
> + atomic_inc(>mm->mm_count);
> 

Re: [PATCH kernel v6 7/7] powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown

2016-11-28 Thread David Gibson
On Thu, Nov 24, 2016 at 04:48:10PM +1100, Alexey Kardashevskiy wrote:
> At the moment the userspace tool is expected to request pinning of
> the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present.
> When the userspace process finishes, all the pinned pages need to
> be put; this is done as a part of the userspace memory context (MM)
> destruction which happens on the very last mmdrop().
> 
> This approach has a problem that a MM of the userspace process
> may live longer than the userspace process itself as kernel threads
> use userspace process MMs which was runnning on a CPU where
> the kernel thread was scheduled to. If this happened, the MM remains
> referenced until this exact kernel thread wakes up again
> and releases the very last reference to the MM, on an idle system this
> can take even hours.

Incidentally, I think this change will also help if we ever had a use
case of a long-lived process that only occasionally (rather than
constantly) uses VFIO.  IIUC, before this change, when the process
shuts down VFIO the locked memory would still be accounted until the
process exits, even if it keeps running for a long time without VFIO.

> This moves preregistered regions tracking from MM to VFIO; insteads of
> using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is
> added so each container releases regions which it has pre-registered.
> 
> This changes the userspace interface to return EBUSY if a memory
> region is already registered in a container. However it should not
> have any practical effect as the only userspace tool available now
> does register memory region once per container anyway.
> 
> As tce_iommu_register_pages/tce_iommu_unregister_pages are called
> under container->lock, this does not need additional locking.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: Nicholas Piggin 
> ---
> Changes:
> v4:
> * changed tce_iommu_register_pages() to call mm_iommu_find() first and
> avoid calling mm_iommu_put() if memory is preregistered already
> 
> v3:
> * moved tce_iommu_prereg_free() call out of list_for_each_entry()
> 
> v2:
> * updated commit log
> ---
>  arch/powerpc/mm/mmu_context_book3s64.c |  4 ---
>  arch/powerpc/mm/mmu_context_iommu.c| 11 ---
>  drivers/vfio/vfio_iommu_spapr_tce.c| 58 
> +-
>  3 files changed, 57 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
> b/arch/powerpc/mm/mmu_context_book3s64.c
> index ad82735..1a07969 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct 
> mm_struct *mm)
>  
>  void destroy_context(struct mm_struct *mm)
>  {
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_cleanup(mm);
> -#endif
> -

Would it be worth having a WARN_ON() here (or somewhere in the
callchain) to verify that all iommu preregs have been removed by the
time the context is removed?

>  #ifdef CONFIG_PPC_ICSWX
>   drop_cop(mm->context.acop, mm);
>   kfree(mm->context.cop_lockp);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index 4c6db09..104bad0 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -365,14 +365,3 @@ void mm_iommu_init(struct mm_struct *mm)
>  {
>   INIT_LIST_HEAD_RCU(>context.iommu_group_mem_list);
>  }
> -
> -void mm_iommu_cleanup(struct mm_struct *mm)
> -{
> - struct mm_iommu_table_group_mem_t *mem, *tmp;
> -
> - list_for_each_entry_safe(mem, tmp, >context.iommu_group_mem_list,
> - next) {
> - list_del_rcu(>next);
> - mm_iommu_do_free(mem);
> - }
> -}
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index b2fb05ac..86c9348 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -89,6 +89,15 @@ struct tce_iommu_group {
>  };
>  
>  /*
> + * A container needs to remember which preregistered region  it has
> + * referenced to do proper cleanup at the userspace process exit.
> + */
> +struct tce_iommu_prereg {
> + struct list_head next;
> + struct mm_iommu_table_group_mem_t *mem;
> +};

So, as a future optimization, you might be able to (sort of) avoid
having both the per-container list and the per-mm list if you put a
private list pointer of some sort into mm_iommu_table_group_mem_t -
i.e. allowing both a list of all the preregs for a whole mm, and a
sub-list of all those for a single container on the same structure.

> +
> +/*
>   * The container descriptor supports only a single group per container.
>   * Required by the API as the container is not supplied with the IOMMU group
>   * at the moment of initialization.
> @@ -102,6 +111,7 @@ struct tce_container {
>   struct mm_struct *mm;
>   struct iommu_table 

Re: [PATCH v4 7/7] PCI: Add comments about ROM BAR updating

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:16:48PM -0600, Bjorn Helgaas wrote:
>pci_update_resource() updates a hardware BAR so its address matches the
>kernel's struct resource UNLESS it's a disabled ROM BAR.  We only update
>those when we enable the ROM.
>
>It's not obvious from the code why ROM BARs should be handled specially.
>Apparently there are Matrox devices with defective ROM BARs that read as
>zero when disabled.  That means that if pci_enable_rom() reads the disabled
>BAR, sets PCI_ROM_ADDRESS_ENABLE (without re-inserting the address), and
>writes it back, it would enable the ROM at address zero.
>
>Add comments and references to explain why we can't make the code look more
>rational.
>
>The code changes are from 755528c860b0 ("Ignore disabled ROM resources at
>setup") and 8085ce084c0f ("[PATCH] Fix PCI ROM mapping").
>
>Link: https://lkml.org/lkml/2005/8/30/138
>Signed-off-by: Bjorn Helgaas 

Reviewed-by: Gavin Shan 

>---
> drivers/pci/rom.c   |5 +
> drivers/pci/setup-res.c |6 ++
> 2 files changed, 11 insertions(+)
>
>diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
>index 06663d3..b6edb18 100644
>--- a/drivers/pci/rom.c
>+++ b/drivers/pci/rom.c
>@@ -35,6 +35,11 @@ int pci_enable_rom(struct pci_dev *pdev)
>   if (res->flags & IORESOURCE_ROM_SHADOW)
>   return 0;
>
>+  /*
>+   * Ideally pci_update_resource() would update the ROM BAR address,
>+   * and we would only set the enable bit here.  But apparently some
>+   * devices have buggy ROM BARs that read as zero when disabled.
>+   */
>   pcibios_resource_to_bus(pdev->bus, , res);
>   pci_read_config_dword(pdev, pdev->rom_base_reg, _addr);
>   rom_addr &= ~PCI_ROM_ADDRESS_MASK;
>diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>index 09bdff7..57d7041 100644
>--- a/drivers/pci/setup-res.c
>+++ b/drivers/pci/setup-res.c
>@@ -67,6 +67,12 @@ static void pci_std_update_resource(struct pci_dev *dev, 
>int resno)
>   if (resno < PCI_ROM_RESOURCE) {
>   reg = PCI_BASE_ADDRESS_0 + 4 * resno;
>   } else if (resno == PCI_ROM_RESOURCE) {
>+
>+  /*
>+   * Apparently some Matrox devices have ROM BARs that read
>+   * as zero when disabled, so don't update ROM BARs unless
>+   * they're enabled.  See https://lkml.org/lkml/2005/8/30/138.
>+   */
>   if (!(res->flags & IORESOURCE_ROM_ENABLE))
>   return;
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH v4 6/7] PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:16:07PM -0600, Bjorn Helgaas wrote:
>Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
>PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
>we're reading or writing a BAR register value, that's what we should use.
>IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.
>
>Signed-off-by: Bjorn Helgaas 

Reviewed-by: Gavin Shan 

>---
> drivers/pci/probe.c |3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>index ab00267..cf7670e 100644
>--- a/drivers/pci/probe.c
>+++ b/drivers/pci/probe.c
>@@ -227,7 +227,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
>type,
>   mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>   }
>   } else {
>-  res->flags |= (l & IORESOURCE_ROM_ENABLE);
>+  if (l & PCI_ROM_ADDRESS_ENABLE)
>+  res->flags |= IORESOURCE_ROM_ENABLE;
>   l64 = l & PCI_ROM_ADDRESS_MASK;
>   sz64 = sz & PCI_ROM_ADDRESS_MASK;
>   mask64 = (u32)PCI_ROM_ADDRESS_MASK;
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH v4 5/7] PCI: Remove pci_resource_bar() and pci_iov_resource_bar()

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:15:42PM -0600, Bjorn Helgaas wrote:
>pci_std_update_resource() only deals with standard BARs, so we don't have
>to worry about the complications of VF BARs in an SR-IOV capability.
>
>Compute the BAR address inline and remove pci_resource_bar().  That makes
>pci_iov_resource_bar() unused, so remove that as well.
>
>Signed-off-by: Bjorn Helgaas 

Reviewed-by: Gavin Shan 

>---
> drivers/pci/iov.c   |   18 --
> drivers/pci/pci.c   |   30 --
> drivers/pci/pci.h   |6 --
> drivers/pci/setup-res.c |   13 +++--
> 4 files changed, 7 insertions(+), 60 deletions(-)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index a800ba2..c9c84ba 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -554,24 +554,6 @@ void pci_iov_release(struct pci_dev *dev)
> }
>
> /**
>- * pci_iov_resource_bar - get position of the SR-IOV BAR
>- * @dev: the PCI device
>- * @resno: the resource number
>- *
>- * Returns position of the BAR encapsulated in the SR-IOV capability.
>- */
>-int pci_iov_resource_bar(struct pci_dev *dev, int resno)
>-{
>-  if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCE_END)
>-  return 0;
>-
>-  BUG_ON(!dev->is_physfn);
>-
>-  return dev->sriov->pos + PCI_SRIOV_BAR +
>-  4 * (resno - PCI_IOV_RESOURCES);
>-}
>-
>-/**
>  * pci_iov_update_resource - update a VF BAR
>  * @dev: the PCI device
>  * @resno: the resource number
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index 631eac2..ec3f16d 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -4827,36 +4827,6 @@ int pci_select_bars(struct pci_dev *dev, unsigned long 
>flags)
> }
> EXPORT_SYMBOL(pci_select_bars);
>
>-/**
>- * pci_resource_bar - get position of the BAR associated with a resource
>- * @dev: the PCI device
>- * @resno: the resource number
>- * @type: the BAR type to be filled in
>- *
>- * Returns BAR position in config space, or 0 if the BAR is invalid.
>- */
>-int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
>-{
>-  int reg;
>-
>-  if (resno < PCI_ROM_RESOURCE) {
>-  *type = pci_bar_unknown;
>-  return PCI_BASE_ADDRESS_0 + 4 * resno;
>-  } else if (resno == PCI_ROM_RESOURCE) {
>-  *type = pci_bar_mem32;
>-  return dev->rom_base_reg;
>-  } else if (resno < PCI_BRIDGE_RESOURCES) {
>-  /* device specific resource */
>-  *type = pci_bar_unknown;
>-  reg = pci_iov_resource_bar(dev, resno);
>-  if (reg)
>-  return reg;
>-  }
>-
>-  dev_err(>dev, "BAR %d: invalid resource\n", resno);
>-  return 0;
>-}
>-
> /* Some architectures require additional programming to enable VGA */
> static arch_set_vga_state_t arch_set_vga_state;
>
>diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>index 5bfcb92..a5d37f6 100644
>--- a/drivers/pci/pci.h
>+++ b/drivers/pci/pci.h
>@@ -245,7 +245,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int 
>devfn, u32 *pl,
> int pci_setup_device(struct pci_dev *dev);
> int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>   struct resource *res, unsigned int reg);
>-int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type);
> void pci_configure_ari(struct pci_dev *dev);
> void __pci_bus_size_bridges(struct pci_bus *bus,
>   struct list_head *realloc_head);
>@@ -289,7 +288,6 @@ static inline void pci_restore_ats_state(struct pci_dev 
>*dev)
> #ifdef CONFIG_PCI_IOV
> int pci_iov_init(struct pci_dev *dev);
> void pci_iov_release(struct pci_dev *dev);
>-int pci_iov_resource_bar(struct pci_dev *dev, int resno);
> void pci_iov_update_resource(struct pci_dev *dev, int resno);
> resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
> void pci_restore_iov_state(struct pci_dev *dev);
>@@ -304,10 +302,6 @@ static inline void pci_iov_release(struct pci_dev *dev)
>
> {
> }
>-static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno)
>-{
>-  return 0;
>-}
> static inline void pci_restore_iov_state(struct pci_dev *dev)
> {
> }
>diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>index ee0be34..09bdff7 100644
>--- a/drivers/pci/setup-res.c
>+++ b/drivers/pci/setup-res.c
>@@ -32,7 +32,6 @@ static void pci_std_update_resource(struct pci_dev *dev, int 
>resno)
>   u16 cmd;
>   u32 new, check, mask;
>   int reg;
>-  enum pci_bar_type type;
>   struct resource *res = dev->resource + resno;
>
>   /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
>@@ -65,14 +64,16 @@ static void pci_std_update_resource(struct pci_dev *dev, 
>int resno)
>   else
>   mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>
>-  reg = pci_resource_bar(dev, resno, );
>-  if (!reg)
>-  return;
>-  if (type != pci_bar_unknown) 

Re: [mm v2 0/3] Support memory cgroup hotplug

2016-11-28 Thread Balbir Singh


On 29/11/16 11:42, Tejun Heo wrote:
> Hello, Balbir.
> 
> On Tue, Nov 29, 2016 at 11:09:26AM +1100, Balbir Singh wrote:
>> On 29/11/16 08:10, Tejun Heo wrote:
>>> On Thu, Nov 24, 2016 at 12:05:12AM +1100, Balbir Singh wrote:
 On my desktop NODES_SHIFT is 6, many distro kernels have it a 9. I've known
 of solutions that use fake NUMA for partitioning and need as many nodes as
 possible.
>>>
>>> It was a crude kludge that people used before memcg.  If people still
>>> use it, that's fine but we don't want to optimize / make code
>>> complicated for it, so let's please put away this part of
>>> justification.
>>
>> Are you suggesting those use cases can be ignored now?
> 
> Don't do that.  When did I say that?  What I said is that it isn't a
> good idea to optimize and complicate the code base for it at this
> point.  It shouldn't a controversial argument given fake numa's
> inherent issues and general lack of popularity.
> 
> Besides, does node hotplug even apply to fake numa?  ISTR it being
> configured statically on the boot prompt.

Good point, not sure if it worked with that setup as well.

> 
>>> NUMA code already has possible detection.  Why not simply make memcg
>>> use those instead of MAX_NUMNODES like how we use nr_cpu_ids instead
>>> of NR_CPUS?
>>
>> nodes_possible_map is set to node_online_map at the moment for ppc64.
>> Which becomes a problem when hotplugging a node that was not already
>> online.
>>
>> I am not sure what you mean by possible detection. node_possible_map
>> is set based on CONFIG_NODE_SHIFT and then can be adjusted by the
>> architecture (if desired). Are you suggesting firmware populate it
>> in?
> 
> That's what we do with cpus.  The kernel is built with high maximum
> limit and the kernel queries the firmware during boot to determine how
> many are actually possible on the system, which in most cases isn't
> too far from what's already on the system.  I don't see why we would
> take a different approach with NUMA nodes.

Agreed for the short term. I think memory hotplug needs a review and we'll
probably get it fixed as we make progress along the way

Balbir


Re: [PATCH v4 4/7] PCI: Don't update VF BARs while VF memory space is enabled

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:15:21PM -0600, Bjorn Helgaas wrote:
>If we update a VF BAR while it's enabled, there are two potential problems:
>
>  1) Any driver that's using the VF has a cached BAR value that is stale
> after the update, and
>
>  2) We can't update 64-bit BARs atomically, so the intermediate state
> (new lower dword with old upper dword) may conflict with another
> device, and an access by a driver unrelated to the VF may cause a bus
> error.
>
>Warn about attempts to update VF BARs while they are enabled.  This is a
>programming error, so use dev_WARN() to get a backtrace.
>
>Signed-off-by: Bjorn Helgaas 

Reviewed-by: Gavin Shan 

>---
> drivers/pci/iov.c |8 
> 1 file changed, 8 insertions(+)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index d00ed5c..a800ba2 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -584,6 +584,7 @@ void pci_iov_update_resource(struct pci_dev *dev, int 
>resno)
>   struct resource *res = dev->resource + resno;
>   int vf_bar = resno - PCI_IOV_RESOURCES;
>   struct pci_bus_region region;
>+  u16 cmd;
>   u32 new;
>   int reg;
>
>@@ -595,6 +596,13 @@ void pci_iov_update_resource(struct pci_dev *dev, int 
>resno)
>   if (!iov)
>   return;
>
>+  pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, );
>+  if ((cmd & PCI_SRIOV_CTRL_VFE) && (cmd & PCI_SRIOV_CTRL_MSE)) {
>+  dev_WARN(>dev, "can't update enabled VF BAR%d %pR\n",
>+   vf_bar, res);
>+  return;
>+  }
>+
>   /*
>* Ignore unimplemented BARs, unused resource slots for 64-bit
>* BARs, and non-movable resources, e.g., those described via
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH v4 3/7] PCI: Separate VF BAR updates from standard BAR updates

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:15:06PM -0600, Bjorn Helgaas wrote:
>Previously pci_update_resource() used the same code path for updating
>standard BARs and VF BARs in SR-IOV capabilities.
>
>Split the VF BAR update into a new pci_iov_update_resource() internal
>interface, which makes it simpler to compute the BAR address (we can get
>rid of pci_resource_bar() and pci_iov_resource_bar()).
>
>This patch:
>
>  - Renames pci_update_resource() to pci_std_update_resource(),
>  - Adds pci_iov_update_resource(),
>  - Makes pci_update_resource() a wrapper that calls the appropriate one,
>
>No functional change intended.
>
>Signed-off-by: Bjorn Helgaas 

With below minor comments fixed:

Reviewed-by: Gavin Shan 

>---
> drivers/pci/iov.c   |   49 +++
> drivers/pci/pci.h   |1 +
> drivers/pci/setup-res.c |   13 +++-
> 3 files changed, 61 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index d41ec29..d00ed5c 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -571,6 +571,55 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
>   4 * (resno - PCI_IOV_RESOURCES);
> }
>
>+/**
>+ * pci_iov_update_resource - update a VF BAR
>+ * @dev: the PCI device
>+ * @resno: the resource number
>+ *
>+ * Update a VF BAR in the SR-IOV capability of a PF.
>+ */
>+void pci_iov_update_resource(struct pci_dev *dev, int resno)
>+{
>+  struct pci_sriov *iov = dev->is_physfn ? dev->sriov : NULL;
>+  struct resource *res = dev->resource + resno;
>+  int vf_bar = resno - PCI_IOV_RESOURCES;
>+  struct pci_bus_region region;
>+  u32 new;
>+  int reg;
>+
>+  /*
>+   * The generic pci_restore_bars() path calls this for all devices,
>+   * including VFs and non-SR-IOV devices.  If this is not a PF, we
>+   * have nothing to do.
>+   */
>+  if (!iov)
>+  return;
>+
>+  /*
>+   * Ignore unimplemented BARs, unused resource slots for 64-bit
>+   * BARs, and non-movable resources, e.g., those described via
>+   * Enhanced Allocation.
>+   */
>+  if (!res->flags)
>+  return;
>+
>+  if (res->flags & IORESOURCE_UNSET)
>+  return;
>+
>+  if (res->flags & IORESOURCE_PCI_FIXED)
>+  return;
>+
>+  pcibios_resource_to_bus(dev->bus, , res);
>+  new = region.start;
>+

The bits indicating the BAR's property (e.g. memory, IO etc) are missed in @new.

>+  reg = iov->pos + PCI_SRIOV_BAR + 4 * vf_bar;
>+  pci_write_config_dword(dev, reg, new);
>+  if (res->flags & IORESOURCE_MEM_64) {
>+  new = region.start >> 16 >> 16;

I think it was copied from pci_update_resource(). Why we can't just have "new = 
region.start >> 32"? 

>+  pci_write_config_dword(dev, reg + 4, new);
>+  }
>+}
>+
> resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
> int resno)
> {
>diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>index 4518562..5bfcb92 100644
>--- a/drivers/pci/pci.h
>+++ b/drivers/pci/pci.h
>@@ -290,6 +290,7 @@ static inline void pci_restore_ats_state(struct pci_dev 
>*dev)
> int pci_iov_init(struct pci_dev *dev);
> void pci_iov_release(struct pci_dev *dev);
> int pci_iov_resource_bar(struct pci_dev *dev, int resno);
>+void pci_iov_update_resource(struct pci_dev *dev, int resno);
> resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
> void pci_restore_iov_state(struct pci_dev *dev);
> int pci_iov_bus_range(struct pci_bus *bus);
>diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>index d2a32d8..ee0be34 100644
>--- a/drivers/pci/setup-res.c
>+++ b/drivers/pci/setup-res.c
>@@ -25,8 +25,7 @@
> #include 
> #include "pci.h"
>
>-
>-void pci_update_resource(struct pci_dev *dev, int resno)
>+static void pci_std_update_resource(struct pci_dev *dev, int resno)
> {
>   struct pci_bus_region region;
>   bool disable;
>@@ -109,6 +108,16 @@ void pci_update_resource(struct pci_dev *dev, int resno)
>   pci_write_config_word(dev, PCI_COMMAND, cmd);
> }
>
>+void pci_update_resource(struct pci_dev *dev, int resno)
>+{
>+  if (resno <= PCI_ROM_RESOURCE)
>+  pci_std_update_resource(dev, resno);
>+#ifdef CONFIG_PCI_IOV
>+  else if (resno >= PCI_IOV_RESOURCES && resno < PCI_IOV_RESOURCE_END)

The last BAR is missed:

else if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)

>+  pci_iov_update_resource(dev, resno);
>+#endif
>+}
>+
> int pci_claim_resource(struct pci_dev *dev, int resource)
> {
>   struct resource *res = >resource[resource];
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH v3 4/4] powerpc/perf: macros for PowerISA v3.0 format encoding

2016-11-28 Thread Michael Ellerman
Madhavan Srinivasan  writes:

> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 2a2040ea5f99..e747bbf06661 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -55,6 +55,81 @@ static inline bool event_is_fab_match(u64 event)
>   return (event == 0x30056 || event == 0x4f052);
>  }
>  
> +static bool is_event_valid(u64 event)
> +{
> + if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> + (cpu_has_feature(CPU_FTR_POWER9_DD1)) &&

You don't need ARCH_300 in these checks.

POWER9_DD1 implies ARCH_300.

And the way you've written it you have two arms that use
EVENT_VALID_MASK, which is confusing.

> + (event & ~EVENT_VALID_MASK))
> + return false;
> + else if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> + (event & ~ISA300_EVENT_VALID_MASK))
> + return false;
> + else if (event & ~EVENT_VALID_MASK)
> + return false;
> +
> + return true;
> +}

I think it would read better as:

u64 valid_mask = EVENT_VALID_MASK;

if (cpu_has_feature(CPU_FTR_ARCH_300) && 
!cpu_has_feature(CPU_FTR_POWER9_DD1))
valid_mask = ISA300_EVENT_VALID_MASK;

return !(event & ~valid_mask);

> +static u64 mmcra_sdar_mode(u64 event)
> +{
> + u64 sm;
> +
> + if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> +(cpu_has_feature(CPU_FTR_POWER9_DD1))) {
> + goto sm_tlb;
> + } else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + sm = (event >> ISA300_SDAR_MODE_SHIFT) & ISA300_SDAR_MODE_MASK;
> + if (sm)
> + return sm< + } else
> + goto sm_tlb;
> +
> +sm_tlb:
> + return MMCRA_SDAR_MODE_TLB;
> +}

You should not need a goto to implement that logic.

> +static u64 thresh_cmp_val(u64 value)
> +{
> + if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> +(cpu_has_feature(CPU_FTR_POWER9_DD1)))
> + goto thr_cmp;
> + else if (cpu_has_feature(CPU_FTR_ARCH_300))
> + return value< + else
> + goto thr_cmp;
> +thr_cmp:
> + return value< +}

And similarly for this one and all the others.

> diff --git a/arch/powerpc/perf/isa207-common.h 
> b/arch/powerpc/perf/isa207-common.h
> index 4d0a4e5017c2..0a240635cf48 100644
> --- a/arch/powerpc/perf/isa207-common.h
> +++ b/arch/powerpc/perf/isa207-common.h
> @@ -134,6 +134,24 @@
>PERF_SAMPLE_BRANCH_KERNEL  |\
>PERF_SAMPLE_BRANCH_HV)
>  
> +/* Contants to support PowerISA v3.0 encoding format */
> +#define ISA300_EVENT_COMBINE_SHIFT   10  /* Combine bit */
> +#define ISA300_EVENT_COMBINE_MASK0x3ull
> +#define ISA300_SDAR_MODE_SHIFT   50
> +#define ISA300_SDAR_MODE_MASK0x3ull

As I said in the previous patch, calling these P9 would be more accurate
I think. And shorter.


cheers


Re: [PATCH v4 2/7] PCI: Ignore BAR updates on virtual functions

2016-11-28 Thread Gavin Shan
On Mon, Nov 28, 2016 at 10:14:29PM -0600, Bjorn Helgaas wrote:
>VF BARs are read-only zero, so updating VF BARs will not have any effect.
>See the SR-IOV spec r1.1, sec 3.4.1.11.
>
>We already ignore these updates because of 70675e0b6a1a ("PCI: Don't try to
>restore VF BARs"); this merely restructures it slightly to make it easier
>to split updates for standard and SR-IOV BARs.
>
>Signed-off-by: Bjorn Helgaas 
>CC: Wei Yang 

Reviewed-by: Gavin Shan 

>---
> drivers/pci/pci.c   |4 
> drivers/pci/setup-res.c |5 ++---
> 2 files changed, 2 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index ba34907..631eac2 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -564,10 +564,6 @@ static void pci_restore_bars(struct pci_dev *dev)
> {
>   int i;
>
>-  /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
>-  if (dev->is_virtfn)
>-  return;
>-
>   for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
>   pci_update_resource(dev, i);
> }
>diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>index 66c4d8f..d2a32d8 100644
>--- a/drivers/pci/setup-res.c
>+++ b/drivers/pci/setup-res.c
>@@ -36,10 +36,9 @@ void pci_update_resource(struct pci_dev *dev, int resno)
>   enum pci_bar_type type;
>   struct resource *res = dev->resource + resno;
>
>-  if (dev->is_virtfn) {
>-  dev_warn(>dev, "can't update VF BAR%d\n", resno);
>+  /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
>+  if (dev->is_virtfn)
>   return;
>-  }
>
>   /*
>* Ignore resources for unimplemented BARs and unused resource slots
>



Re: [PATCH kernel v6 3/7] vfio/spapr: Postpone allocation of userspace version of TCE table

2016-11-28 Thread David Gibson
On Thu, Nov 24, 2016 at 04:48:06PM +1100, Alexey Kardashevskiy wrote:
> The iommu_table struct manages a hardware TCE table and a vmalloc'd
> table with corresponding userspace addresses. Both are allocated when
> the default DMA window is created and this happens when the very first
> group is attached to a container.
> 
> As we are going to allow the userspace to configure container in one
> memory context and pas container fd to another, we have to postpones
> such allocations till a container fd is passed to the destination
> user process so we would account locked memory limit against the actual
> container user constrainsts.
> 
> This postpones the it_userspace array allocation till it is used first
> time for mapping. The unmapping patch already checks if the array is
> allocated.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> Changes:
> v6:
> * moved missing hunk from the next patch: tce_iommu_create_table()
> would decrement locked_vm while new caller - tce_iommu_build_v2() -
> will not; this adds a new return code to the DMA mapping path but
> this seems to be a minor change.
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index d0c38b2..4efd2b2 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -515,6 +515,12 @@ static long tce_iommu_build_v2(struct tce_container 
> *container,
>   unsigned long hpa;
>   enum dma_data_direction dirtmp;
>  
> + if (!tbl->it_userspace) {
> + ret = tce_iommu_userspace_view_alloc(tbl);
> + if (ret)
> + return ret;
> + }
> +
>   for (i = 0; i < pages; ++i) {
>   struct mm_iommu_table_group_mem_t *mem = NULL;
>   unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
> @@ -588,15 +594,6 @@ static long tce_iommu_create_table(struct tce_container 
> *container,
>   WARN_ON(!ret && !(*ptbl)->it_ops->free);
>   WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size));
>  
> - if (!ret && container->v2) {
> - ret = tce_iommu_userspace_view_alloc(*ptbl);
> - if (ret)
> - (*ptbl)->it_ops->free(*ptbl);
> - }
> -
> - if (ret)
> - decrement_locked_vm(table_size >> PAGE_SHIFT);
> -
>   return ret;
>  }
>  
> @@ -1068,10 +1065,7 @@ static int tce_iommu_take_ownership(struct 
> tce_container *container,
>   if (!tbl || !tbl->it_map)
>   continue;
>  
> - rc = tce_iommu_userspace_view_alloc(tbl);
> - if (!rc)
> - rc = iommu_take_ownership(tbl);
> -
> + rc = iommu_take_ownership(tbl);
>   if (rc) {
>   for (j = 0; j < i; ++j)
>   iommu_release_ownership(

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH kernel v6 5/7] vfio/spapr: Postpone default window creation

2016-11-28 Thread David Gibson
On Thu, Nov 24, 2016 at 04:48:08PM +1100, Alexey Kardashevskiy wrote:
> We are going to allow the userspace to configure container in
> one memory context and pass container fd to another so
> we are postponing memory allocations accounted against
> the locked memory limit. One of previous patches took care of
> it_userspace.
> 
> At the moment we create the default DMA window when the first group is
> attached to a container; this is done for the userspace which is not
> DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory
> pre-registration - such client expects the default DMA window to exist.
> 
> This postpones the default DMA window allocation till one of
> the folliwing happens:
> 1. first map/unmap request arrives;
> 2. new window is requested;
> This adds noop for the case when the userspace requested removal
> of the default window which has not been created yet.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> Changes:
> v6:
> * new helper tce_iommu_create_default_window() moved to a separate patch;
> * creates a default window when new window is requested; it used to
> reset the def_window_pending flag instead;
> * def_window_pending handling (mostly) localized in
> tce_iommu_create_default_window() now, the only exception is removal
> of not yet created default window.
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 40 
> +++--
>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index a67bbfd..88622be 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -97,6 +97,7 @@ struct tce_container {
>   struct mutex lock;
>   bool enabled;
>   bool v2;
> + bool def_window_pending;
>   unsigned long locked_pages;
>   struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>   struct list_head group_list;
> @@ -717,6 +718,9 @@ static long tce_iommu_create_default_window(struct 
> tce_container *container)
>   struct tce_iommu_group *tcegrp;
>   struct iommu_table_group *table_group;
>  
> + if (!container->def_window_pending)
> + return 0;
> +
>   if (!tce_groups_attached(container))
>   return -ENODEV;
>  
> @@ -730,6 +734,9 @@ static long tce_iommu_create_default_window(struct 
> tce_container *container)
>   table_group->tce32_size, 1, _addr);
>   WARN_ON_ONCE(!ret && start_addr);
>  
> + if (!ret)
> + container->def_window_pending = false;
> +
>   return ret;
>  }
>  
> @@ -823,6 +830,10 @@ static long tce_iommu_ioctl(void *iommu_data,
>   VFIO_DMA_MAP_FLAG_WRITE))
>   return -EINVAL;
>  
> + ret = tce_iommu_create_default_window(container);
> + if (ret)
> + return ret;
> +
>   num = tce_iommu_find_table(container, param.iova, );
>   if (num < 0)
>   return -ENXIO;
> @@ -886,6 +897,10 @@ static long tce_iommu_ioctl(void *iommu_data,
>   if (param.flags)
>   return -EINVAL;
>  
> + ret = tce_iommu_create_default_window(container);
> + if (ret)
> + return ret;
> +
>   num = tce_iommu_find_table(container, param.iova, );
>   if (num < 0)
>   return -ENXIO;
> @@ -1012,6 +1027,10 @@ static long tce_iommu_ioctl(void *iommu_data,
>  
>   mutex_lock(>lock);
>  
> + ret = tce_iommu_create_default_window(container);
> + if (ret)
> + return ret;
> +
>   ret = tce_iommu_create_window(container, create.page_shift,
>   create.window_size, create.levels,
>   _addr);
> @@ -1044,6 +1063,11 @@ static long tce_iommu_ioctl(void *iommu_data,
>   if (remove.flags)
>   return -EINVAL;
>  
> + if (container->def_window_pending && !remove.start_addr) {
> + container->def_window_pending = false;
> + return 0;
> + }
> +
>   mutex_lock(>lock);
>  
>   ret = tce_iommu_remove_window(container, remove.start_addr);
> @@ -1141,7 +1165,6 @@ static int tce_iommu_attach_group(void *iommu_data,
>   struct tce_container *container = iommu_data;
>   struct iommu_table_group *table_group;
>   struct tce_iommu_group *tcegrp = NULL;
> - bool create_default_window = false;
>  
>   mutex_lock(>lock);
>  
> @@ -1189,25 +1212,12 @@ static int tce_iommu_attach_group(void *iommu_data,
>   } else {
>   ret = tce_iommu_take_ownership_ddw(container, table_group);
>   if (!tce_groups_attached(container) && !container->tables[0])
> -

Re: [PATCH kernel v6 4/7] vfio/spapr: Add a helper to create default DMA window

2016-11-28 Thread David Gibson
On Thu, Nov 24, 2016 at 04:48:07PM +1100, Alexey Kardashevskiy wrote:
> There is already a helper to create a DMA window which does allocate
> a table and programs it to the IOMMU group. However
> tce_iommu_take_ownership_ddw() did not use it and did these 2 calls
> itself to simplify error path.
> 
> Since we are going to delay the default window creation till
> the default window is accessed/removed or new window is added,
> we need a helper to create a default window from all these cases.
> 
> This adds tce_iommu_create_default_window(). Since it relies on
> a VFIO container to have at least one IOMMU group (for future use),
> this changes tce_iommu_attach_group() to add a group to the container
> first and then call the new helper.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> Changes:
> v6:
> * new to the patchset
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 87 
> ++---
>  1 file changed, 42 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 4efd2b2..a67bbfd 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -710,6 +710,29 @@ static long tce_iommu_remove_window(struct tce_container 
> *container,
>   return 0;
>  }
>  
> +static long tce_iommu_create_default_window(struct tce_container *container)
> +{
> + long ret;
> + __u64 start_addr = 0;
> + struct tce_iommu_group *tcegrp;
> + struct iommu_table_group *table_group;
> +
> + if (!tce_groups_attached(container))
> + return -ENODEV;
> +
> + tcegrp = list_first_entry(>group_list,
> + struct tce_iommu_group, next);
> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> + if (!table_group)
> + return -ENODEV;
> +
> + ret = tce_iommu_create_window(container, IOMMU_PAGE_SHIFT_4K,
> + table_group->tce32_size, 1, _addr);
> + WARN_ON_ONCE(!ret && start_addr);
> +
> + return ret;
> +}
> +
>  static long tce_iommu_ioctl(void *iommu_data,
>unsigned int cmd, unsigned long arg)
>  {
> @@ -1100,9 +1123,6 @@ static void tce_iommu_release_ownership_ddw(struct 
> tce_container *container,
>  static long tce_iommu_take_ownership_ddw(struct tce_container *container,
>   struct iommu_table_group *table_group)
>  {
> - long i, ret = 0;
> - struct iommu_table *tbl = NULL;
> -
>   if (!table_group->ops->create_table || !table_group->ops->set_window ||
>   !table_group->ops->release_ownership) {
>   WARN_ON_ONCE(1);
> @@ -,47 +1131,7 @@ static long tce_iommu_take_ownership_ddw(struct 
> tce_container *container,
>  
>   table_group->ops->take_ownership(table_group);
>  
> - /*
> -  * If it the first group attached, check if there is
> -  * a default DMA window and create one if none as
> -  * the userspace expects it to exist.
> -  */
> - if (!tce_groups_attached(container) && !container->tables[0]) {
> - ret = tce_iommu_create_table(container,
> - table_group,
> - 0, /* window number */
> - IOMMU_PAGE_SHIFT_4K,
> - table_group->tce32_size,
> - 1, /* default levels */
> - );
> - if (ret)
> - goto release_exit;
> - else
> - container->tables[0] = tbl;
> - }
> -
> - /* Set all windows to the new group */
> - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> - tbl = container->tables[i];
> -
> - if (!tbl)
> - continue;
> -
> - /* Set the default window to a new group */
> - ret = table_group->ops->set_window(table_group, i, tbl);
> - if (ret)
> - goto release_exit;
> - }
> -
>   return 0;
> -
> -release_exit:
> - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
> - table_group->ops->unset_window(table_group, i);
> -
> - table_group->ops->release_ownership(table_group);
> -
> - return ret;
>  }
>  
>  static int tce_iommu_attach_group(void *iommu_data,
> @@ -1161,6 +1141,7 @@ static int tce_iommu_attach_group(void *iommu_data,
>   struct tce_container *container = iommu_data;
>   struct iommu_table_group *table_group;
>   struct tce_iommu_group *tcegrp = NULL;
> + bool create_default_window = false;
>  
>   mutex_lock(>lock);
>  
> @@ -1203,14 +1184,30 @@ static int tce_iommu_attach_group(void *iommu_data,
>   }
>  
>   if (!table_group->ops || !table_group->ops->take_ownership ||
> - !table_group->ops->release_ownership)
> + 

[PATCH 4/4] powerpc: link warning for orphan sections

2016-11-28 Thread Nicholas Piggin
Add --orphan-handling=warn to final link flags. This ensures we can
handle all sections explicitly. This would have caught subtle breakage
such as 7de3b27bac47da9de08409df1d69664acbb72197 at build-time.

Also bring some wayward sections into the fold:
- .text.hot and .text.unlikely are compiler generated sections.
- .sdata2, .dynsbss, .plt are used by PPC32
- We previously did not specify DWARF_DEBUG or STABS_DEBUG
- DWARF_DEBUG did not include all DWARF sections that can be emitted
- A number of sections are unused.

Signed-off-by: Nicholas Piggin 
---

I don't know if I've exactly got everything right here, particularly
with ppc32, so would appreciate people casting their eye over it.

 arch/powerpc/Makefile |  1 +
 arch/powerpc/kernel/vmlinux.lds.S | 16 ++--
 include/asm-generic/vmlinux.lds.h | 11 +++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 9fbc1c8..6a9f319 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -102,6 +102,7 @@ endif
 LDFLAGS_vmlinux-y := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux:= $(LDFLAGS_vmlinux-y)
+LDFLAGS_vmlinux += $(call ld-option,--orphan-handling=warn)
 
 ifeq ($(CONFIG_PPC64),y)
 ifeq ($(call cc-option-yn,-mcmodel=medium),y)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index f710d15..6bd42d3 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -86,7 +86,7 @@ SECTIONS
ALIGN_FUNCTION();
 #endif
/* careful! __ftr_alt_* sections need to be close to .text */
-   *(.text .fixup __ftr_alt_* .ref.text)
+   *(.text.hot .text .text.fixup .text.unlikely .fixup __ftr_alt_* 
.ref.text);
SCHED_TEXT
CPUIDLE_TEXT
LOCK_TEXT
@@ -253,7 +253,9 @@ SECTIONS
.data : AT(ADDR(.data) - LOAD_OFFSET) {
DATA_DATA
*(.sdata)
+   *(.sdata2)
*(.got.plt) *(.got)
+   *(.plt)
}
 #else
.data : AT(ADDR(.data) - LOAD_OFFSET) {
@@ -314,6 +316,16 @@ SECTIONS
_end = . ;
PROVIDE32 (end = .);
 
-   /* Sections to be discarded. */
+   STABS_DEBUG
+
+   DWARF_DEBUG
+
DISCARDS
+   /DISCARD/ : {
+   *(*.EMB.apuinfo)
+   *(.glink .iplt .plt .rela* .comment)
+   *(.gnu.version*)
+   *(.gnu.attributes)
+   *(.eh_frame)
+   }
 }
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 31e1d63..6d5d35f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -595,6 +595,7 @@
 #define SBSS(sbss_align)   \
. = ALIGN(sbss_align);  \
.sbss : AT(ADDR(.sbss) - LOAD_OFFSET) { \
+   *(.dynsbss) \
*(.sbss)\
*(.scommon) \
}
@@ -641,11 +642,21 @@
.debug_str  0 : { *(.debug_str) }   \
.debug_loc  0 : { *(.debug_loc) }   \
.debug_macinfo  0 : { *(.debug_macinfo) }   \
+   .debug_pubtypes 0 : { *(.debug_pubtypes) }  \
+   /* DWARF 3 */   \
+   .debug_ranges   0 : { *(.debug_ranges) }\
/* SGI/MIPS DWARF 2 extensions */   \
.debug_weaknames 0 : { *(.debug_weaknames) }\
.debug_funcnames 0 : { *(.debug_funcnames) }\
.debug_typenames 0 : { *(.debug_typenames) }\
.debug_varnames  0 : { *(.debug_varnames) } \
+   /* GNU DWARF 2 extensions */\
+   .debug_gnu_pubnames 0 : { *(.debug_gnu_pubnames) }  \
+   .debug_gnu_pubtypes 0 : { *(.debug_gnu_pubtypes) }  \
+   /* DWARF 4 */   \
+   .debug_types0 : { *(.debug_types) } \
+   /* DWARF 5 */   \
+   .debug_addr 0 : { *(.debug_addr) }
 
/* Stabs debugging sections.  */
 #define STABS_DEBUG\
-- 
2.10.2



[PATCH 3/4] powerpc/64: do not create new section for save/restore functions

2016-11-28 Thread Nicholas Piggin
There is no need to create a new section for these. Consolidate
with 32-bit and just use .text.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/lib/crtsavres.S | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/crtsavres.S b/arch/powerpc/lib/crtsavres.S
index 18af0b3..7e5e1c2 100644
--- a/arch/powerpc/lib/crtsavres.S
+++ b/arch/powerpc/lib/crtsavres.S
@@ -44,10 +44,10 @@
 
 #ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
 
-#ifndef CONFIG_PPC64
-
.section ".text"
 
+#ifndef CONFIG_PPC64
+
 /* Routines for saving integer registers, called by the compiler.  */
 /* Called with r11 pointing to the stack header word of the caller of the */
 /* function, just beyond the end of the integer save area.  */
@@ -314,8 +314,6 @@ _GLOBAL(_restvr_31)
 
 #else /* CONFIG_PPC64 */
 
-   .section ".text.save.restore","ax",@progbits
-
 .globl _savegpr0_14
 _savegpr0_14:
std r14,-144(r1)
-- 
2.10.2



[PATCH 2/4] powerpc/64: do not link crtsavres routines into vmlinux

2016-11-28 Thread Nicholas Piggin
The 64-bit linker creates save/restore functions on demand. crtsavres.o
is still required to link against modules (which do not get their
save/restore created automatically, due to not being a final link).

Make crtsavres.o extra-y on 64-bit, rather than obj-y.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/lib/Makefile | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 309361e8..cb02027 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -9,8 +9,12 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
 
-obj-y += string.o alloc.o crtsavres.o code-patching.o \
-feature-fixups.o
+obj-y += string.o alloc.o code-patching.o feature-fixups.o
+
+# 64-bit linker creates .sfpr on demand for final link (vmlinux),
+# so it is only needed for modules.
+obj-$(CONFIG_PPC32) += crtsavres.o
+extra-$(CONFIG_PPC64) += crtsavres.o
 
 obj-$(CONFIG_PPC32)+= div64.o copy_32.o
 
-- 
2.10.2



[PATCH 1/4] powerpc/64: place sfpr section explicitly with the linker script

2016-11-28 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 861f4e1..f710d15 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -93,6 +93,14 @@ SECTIONS
KPROBES_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
+   /*
+* -Os builds call FP save/restore functions. The powerpc64
+* linker generates those on demand in the .sfpr section.
+* .sfpr gets placed at the beginning of a group of input
+* sections, which can break start-of-text offset if it is
+* included with the main text sections, so put it by itself.
+*/
+   *(.sfpr);
MEM_KEEP(init.text)
MEM_KEEP(exit.text)
 
-- 
2.10.2



[PATCH 0/4] Try orphan section warning again

2016-11-28 Thread Nicholas Piggin
Hi,

The orphan section warning patch had a minor issue with unhandled
.text.save.restore last time I sent it. I went back and tidied up
crtsavres a bit, so after these, I think we can enable orphan section
warnings.

There is a further step we can take which is to enable
--save-restore-funcs for module link, which means 64-bit does not
need crtsavres at all. That option is only in binutils 2.25 though,
so I'll send that later.

Thanks,
Nick

Nicholas Piggin (4):
  powerpc/64: place sfpr section explicitly with the linker script
  powerpc/64: do not link crtsavres routines into vmlinux
  powerpc/64: do not create new section for save/restore functions
  powerpc: link warning for orphan sections

 arch/powerpc/Makefile |  1 +
 arch/powerpc/kernel/vmlinux.lds.S | 24 ++--
 arch/powerpc/lib/Makefile |  8 ++--
 arch/powerpc/lib/crtsavres.S  |  6 ++
 include/asm-generic/vmlinux.lds.h | 11 +++
 5 files changed, 42 insertions(+), 8 deletions(-)

-- 
2.10.2



Re: [PATCH v3 3/4] powerpc/perf: PowerISA v3.0 raw event format encoding

2016-11-28 Thread Michael Ellerman
Madhavan Srinivasan  writes:

> Patch to update the PowerISA v3.0 raw event encoding format
> information and add support for the same in Power9-pmu.c.

I'm not sure if calling this the "PowerISA v3.0" encoding is right. It's
not actually specified in the ISA anywhere right?

Maybe it should just be the "P9 encoding" ?

Though admittedly that is slightly confusing because P9 DD1 doesn't
implement it.

cheers


[PATCH v4 7/7] PCI: Add comments about ROM BAR updating

2016-11-28 Thread Bjorn Helgaas
pci_update_resource() updates a hardware BAR so its address matches the
kernel's struct resource UNLESS it's a disabled ROM BAR.  We only update
those when we enable the ROM.

It's not obvious from the code why ROM BARs should be handled specially.
Apparently there are Matrox devices with defective ROM BARs that read as
zero when disabled.  That means that if pci_enable_rom() reads the disabled
BAR, sets PCI_ROM_ADDRESS_ENABLE (without re-inserting the address), and
writes it back, it would enable the ROM at address zero.

Add comments and references to explain why we can't make the code look more
rational.

The code changes are from 755528c860b0 ("Ignore disabled ROM resources at
setup") and 8085ce084c0f ("[PATCH] Fix PCI ROM mapping").

Link: https://lkml.org/lkml/2005/8/30/138
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/rom.c   |5 +
 drivers/pci/setup-res.c |6 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
index 06663d3..b6edb18 100644
--- a/drivers/pci/rom.c
+++ b/drivers/pci/rom.c
@@ -35,6 +35,11 @@ int pci_enable_rom(struct pci_dev *pdev)
if (res->flags & IORESOURCE_ROM_SHADOW)
return 0;
 
+   /*
+* Ideally pci_update_resource() would update the ROM BAR address,
+* and we would only set the enable bit here.  But apparently some
+* devices have buggy ROM BARs that read as zero when disabled.
+*/
pcibios_resource_to_bus(pdev->bus, , res);
pci_read_config_dword(pdev, pdev->rom_base_reg, _addr);
rom_addr &= ~PCI_ROM_ADDRESS_MASK;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 09bdff7..57d7041 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -67,6 +67,12 @@ static void pci_std_update_resource(struct pci_dev *dev, int 
resno)
if (resno < PCI_ROM_RESOURCE) {
reg = PCI_BASE_ADDRESS_0 + 4 * resno;
} else if (resno == PCI_ROM_RESOURCE) {
+
+   /*
+* Apparently some Matrox devices have ROM BARs that read
+* as zero when disabled, so don't update ROM BARs unless
+* they're enabled.  See https://lkml.org/lkml/2005/8/30/138.
+*/
if (!(res->flags & IORESOURCE_ROM_ENABLE))
return;
 



[PATCH v4 6/7] PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE

2016-11-28 Thread Bjorn Helgaas
Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
we're reading or writing a BAR register value, that's what we should use.
IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/probe.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..cf7670e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -227,7 +227,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK;
}
} else {
-   res->flags |= (l & IORESOURCE_ROM_ENABLE);
+   if (l & PCI_ROM_ADDRESS_ENABLE)
+   res->flags |= IORESOURCE_ROM_ENABLE;
l64 = l & PCI_ROM_ADDRESS_MASK;
sz64 = sz & PCI_ROM_ADDRESS_MASK;
mask64 = (u32)PCI_ROM_ADDRESS_MASK;



[PATCH v4 5/7] PCI: Remove pci_resource_bar() and pci_iov_resource_bar()

2016-11-28 Thread Bjorn Helgaas
pci_std_update_resource() only deals with standard BARs, so we don't have
to worry about the complications of VF BARs in an SR-IOV capability.

Compute the BAR address inline and remove pci_resource_bar().  That makes
pci_iov_resource_bar() unused, so remove that as well.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |   18 --
 drivers/pci/pci.c   |   30 --
 drivers/pci/pci.h   |6 --
 drivers/pci/setup-res.c |   13 +++--
 4 files changed, 7 insertions(+), 60 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a800ba2..c9c84ba 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -554,24 +554,6 @@ void pci_iov_release(struct pci_dev *dev)
 }
 
 /**
- * pci_iov_resource_bar - get position of the SR-IOV BAR
- * @dev: the PCI device
- * @resno: the resource number
- *
- * Returns position of the BAR encapsulated in the SR-IOV capability.
- */
-int pci_iov_resource_bar(struct pci_dev *dev, int resno)
-{
-   if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCE_END)
-   return 0;
-
-   BUG_ON(!dev->is_physfn);
-
-   return dev->sriov->pos + PCI_SRIOV_BAR +
-   4 * (resno - PCI_IOV_RESOURCES);
-}
-
-/**
  * pci_iov_update_resource - update a VF BAR
  * @dev: the PCI device
  * @resno: the resource number
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 631eac2..ec3f16d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4827,36 +4827,6 @@ int pci_select_bars(struct pci_dev *dev, unsigned long 
flags)
 }
 EXPORT_SYMBOL(pci_select_bars);
 
-/**
- * pci_resource_bar - get position of the BAR associated with a resource
- * @dev: the PCI device
- * @resno: the resource number
- * @type: the BAR type to be filled in
- *
- * Returns BAR position in config space, or 0 if the BAR is invalid.
- */
-int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
-{
-   int reg;
-
-   if (resno < PCI_ROM_RESOURCE) {
-   *type = pci_bar_unknown;
-   return PCI_BASE_ADDRESS_0 + 4 * resno;
-   } else if (resno == PCI_ROM_RESOURCE) {
-   *type = pci_bar_mem32;
-   return dev->rom_base_reg;
-   } else if (resno < PCI_BRIDGE_RESOURCES) {
-   /* device specific resource */
-   *type = pci_bar_unknown;
-   reg = pci_iov_resource_bar(dev, resno);
-   if (reg)
-   return reg;
-   }
-
-   dev_err(>dev, "BAR %d: invalid resource\n", resno);
-   return 0;
-}
-
 /* Some architectures require additional programming to enable VGA */
 static arch_set_vga_state_t arch_set_vga_state;
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5bfcb92..a5d37f6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -245,7 +245,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int 
devfn, u32 *pl,
 int pci_setup_device(struct pci_dev *dev);
 int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int reg);
-int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type);
 void pci_configure_ari(struct pci_dev *dev);
 void __pci_bus_size_bridges(struct pci_bus *bus,
struct list_head *realloc_head);
@@ -289,7 +288,6 @@ static inline void pci_restore_ats_state(struct pci_dev 
*dev)
 #ifdef CONFIG_PCI_IOV
 int pci_iov_init(struct pci_dev *dev);
 void pci_iov_release(struct pci_dev *dev);
-int pci_iov_resource_bar(struct pci_dev *dev, int resno);
 void pci_iov_update_resource(struct pci_dev *dev, int resno);
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
@@ -304,10 +302,6 @@ static inline void pci_iov_release(struct pci_dev *dev)
 
 {
 }
-static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno)
-{
-   return 0;
-}
 static inline void pci_restore_iov_state(struct pci_dev *dev)
 {
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index ee0be34..09bdff7 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -32,7 +32,6 @@ static void pci_std_update_resource(struct pci_dev *dev, int 
resno)
u16 cmd;
u32 new, check, mask;
int reg;
-   enum pci_bar_type type;
struct resource *res = dev->resource + resno;
 
/* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
@@ -65,14 +64,16 @@ static void pci_std_update_resource(struct pci_dev *dev, 
int resno)
else
mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
 
-   reg = pci_resource_bar(dev, resno, );
-   if (!reg)
-   return;
-   if (type != pci_bar_unknown) {
+   if (resno < PCI_ROM_RESOURCE) {
+   reg = PCI_BASE_ADDRESS_0 + 4 * resno;
+   } else if (resno == PCI_ROM_RESOURCE) {
if (!(res->flags & IORESOURCE_ROM_ENABLE))

[PATCH v4 4/7] PCI: Don't update VF BARs while VF memory space is enabled

2016-11-28 Thread Bjorn Helgaas
If we update a VF BAR while it's enabled, there are two potential problems:

  1) Any driver that's using the VF has a cached BAR value that is stale
 after the update, and

  2) We can't update 64-bit BARs atomically, so the intermediate state
 (new lower dword with old upper dword) may conflict with another
 device, and an access by a driver unrelated to the VF may cause a bus
 error.

Warn about attempts to update VF BARs while they are enabled.  This is a
programming error, so use dev_WARN() to get a backtrace.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index d00ed5c..a800ba2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -584,6 +584,7 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno)
struct resource *res = dev->resource + resno;
int vf_bar = resno - PCI_IOV_RESOURCES;
struct pci_bus_region region;
+   u16 cmd;
u32 new;
int reg;
 
@@ -595,6 +596,13 @@ void pci_iov_update_resource(struct pci_dev *dev, int 
resno)
if (!iov)
return;
 
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, );
+   if ((cmd & PCI_SRIOV_CTRL_VFE) && (cmd & PCI_SRIOV_CTRL_MSE)) {
+   dev_WARN(>dev, "can't update enabled VF BAR%d %pR\n",
+vf_bar, res);
+   return;
+   }
+
/*
 * Ignore unimplemented BARs, unused resource slots for 64-bit
 * BARs, and non-movable resources, e.g., those described via



[PATCH v4 3/7] PCI: Separate VF BAR updates from standard BAR updates

2016-11-28 Thread Bjorn Helgaas
Previously pci_update_resource() used the same code path for updating
standard BARs and VF BARs in SR-IOV capabilities.

Split the VF BAR update into a new pci_iov_update_resource() internal
interface, which makes it simpler to compute the BAR address (we can get
rid of pci_resource_bar() and pci_iov_resource_bar()).

This patch:

  - Renames pci_update_resource() to pci_std_update_resource(),
  - Adds pci_iov_update_resource(),
  - Makes pci_update_resource() a wrapper that calls the appropriate one,

No functional change intended.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |   49 +++
 drivers/pci/pci.h   |1 +
 drivers/pci/setup-res.c |   13 +++-
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index d41ec29..d00ed5c 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -571,6 +571,55 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+/**
+ * pci_iov_update_resource - update a VF BAR
+ * @dev: the PCI device
+ * @resno: the resource number
+ *
+ * Update a VF BAR in the SR-IOV capability of a PF.
+ */
+void pci_iov_update_resource(struct pci_dev *dev, int resno)
+{
+   struct pci_sriov *iov = dev->is_physfn ? dev->sriov : NULL;
+   struct resource *res = dev->resource + resno;
+   int vf_bar = resno - PCI_IOV_RESOURCES;
+   struct pci_bus_region region;
+   u32 new;
+   int reg;
+
+   /*
+* The generic pci_restore_bars() path calls this for all devices,
+* including VFs and non-SR-IOV devices.  If this is not a PF, we
+* have nothing to do.
+*/
+   if (!iov)
+   return;
+
+   /*
+* Ignore unimplemented BARs, unused resource slots for 64-bit
+* BARs, and non-movable resources, e.g., those described via
+* Enhanced Allocation.
+*/
+   if (!res->flags)
+   return;
+
+   if (res->flags & IORESOURCE_UNSET)
+   return;
+
+   if (res->flags & IORESOURCE_PCI_FIXED)
+   return;
+
+   pcibios_resource_to_bus(dev->bus, , res);
+   new = region.start;
+
+   reg = iov->pos + PCI_SRIOV_BAR + 4 * vf_bar;
+   pci_write_config_dword(dev, reg, new);
+   if (res->flags & IORESOURCE_MEM_64) {
+   new = region.start >> 16 >> 16;
+   pci_write_config_dword(dev, reg + 4, new);
+   }
+}
+
 resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
  int resno)
 {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4518562..5bfcb92 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -290,6 +290,7 @@ static inline void pci_restore_ats_state(struct pci_dev 
*dev)
 int pci_iov_init(struct pci_dev *dev);
 void pci_iov_release(struct pci_dev *dev);
 int pci_iov_resource_bar(struct pci_dev *dev, int resno);
+void pci_iov_update_resource(struct pci_dev *dev, int resno);
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d2a32d8..ee0be34 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -25,8 +25,7 @@
 #include 
 #include "pci.h"
 
-
-void pci_update_resource(struct pci_dev *dev, int resno)
+static void pci_std_update_resource(struct pci_dev *dev, int resno)
 {
struct pci_bus_region region;
bool disable;
@@ -109,6 +108,16 @@ void pci_update_resource(struct pci_dev *dev, int resno)
pci_write_config_word(dev, PCI_COMMAND, cmd);
 }
 
+void pci_update_resource(struct pci_dev *dev, int resno)
+{
+   if (resno <= PCI_ROM_RESOURCE)
+   pci_std_update_resource(dev, resno);
+#ifdef CONFIG_PCI_IOV
+   else if (resno >= PCI_IOV_RESOURCES && resno < PCI_IOV_RESOURCE_END)
+   pci_iov_update_resource(dev, resno);
+#endif
+}
+
 int pci_claim_resource(struct pci_dev *dev, int resource)
 {
struct resource *res = >resource[resource];



[PATCH v4 2/7] PCI: Ignore BAR updates on virtual functions

2016-11-28 Thread Bjorn Helgaas
VF BARs are read-only zero, so updating VF BARs will not have any effect.
See the SR-IOV spec r1.1, sec 3.4.1.11.

We already ignore these updates because of 70675e0b6a1a ("PCI: Don't try to
restore VF BARs"); this merely restructures it slightly to make it easier
to split updates for standard and SR-IOV BARs.

Signed-off-by: Bjorn Helgaas 
CC: Wei Yang 
---
 drivers/pci/pci.c   |4 
 drivers/pci/setup-res.c |5 ++---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ba34907..631eac2 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -564,10 +564,6 @@ static void pci_restore_bars(struct pci_dev *dev)
 {
int i;
 
-   /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
-   if (dev->is_virtfn)
-   return;
-
for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
pci_update_resource(dev, i);
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 66c4d8f..d2a32d8 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -36,10 +36,9 @@ void pci_update_resource(struct pci_dev *dev, int resno)
enum pci_bar_type type;
struct resource *res = dev->resource + resno;
 
-   if (dev->is_virtfn) {
-   dev_warn(>dev, "can't update VF BAR%d\n", resno);
+   /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
+   if (dev->is_virtfn)
return;
-   }
 
/*
 * Ignore resources for unimplemented BARs and unused resource slots



[PATCH v4 1/7] PCI: Do any VF BAR updates before enabling the BARs

2016-11-28 Thread Bjorn Helgaas
From: Gavin Shan 

Previously we enabled VFs and enable their memory space before calling
pcibios_sriov_enable().  But pcibios_sriov_enable() may update the VF BARs:
for example, on PPC PowerNV we may change them to manage the association of
VFs to PEs.

Because 64-bit BARs cannot be updated atomically, it's unsafe to update
them while they're enabled.  The half-updated state may conflict with other
devices in the system.

Call pcibios_sriov_enable() before enabling the VFs so any BAR updates
happen while the VF BARs are disabled.

[bhelgaas: changelog]
Tested-by: Carol Soto 
Signed-off-by: Gavin Shan 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index e30f05c..d41ec29 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -306,13 +306,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_iov_set_numvfs(dev, nr_virtfn);
-   iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-   pci_cfg_access_lock(dev);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   msleep(100);
-   pci_cfg_access_unlock(dev);
-
iov->initial_VFs = initial;
if (nr_virtfn < initial)
initial = nr_virtfn;
@@ -323,6 +316,13 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
goto err_pcibios;
}
 
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
+   pci_cfg_access_lock(dev);
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+   msleep(100);
+   pci_cfg_access_unlock(dev);
+
for (i = 0; i < initial; i++) {
rc = pci_iov_add_virtfn(dev, i, 0);
if (rc)



[PATCH v4 0/7] Disable VF's memory space on updating IOV BARs

2016-11-28 Thread Bjorn Helgaas
This is a v4 of Gavin's series for handling VF BAR updates.  The
important piece is the first patch ("PCI: Do any VF BAR updates before
enabling the BARs").  That makes sure that if we update VF BARs, we do
it before enabling the VFs, and that is unchanged from v3.

The second patch in Gavin's series ("PCI: Disable VF's memory space on
updating IOV BAR in pci_update_resource()") temporarily disabled VF
memory space during an update.  Since the first patch does the update
before enabling VFs, this case shouldn't happen in practice.

But even if we want to update a VF BAR while VF memory space is
enabled, I think temporarily disabling it is wrong.  So I replaced
the second patch with a few patches that make that such an update
fail.

Please comment.  These are on my pci/virtualization branch.

Changelog
=
v4:
  * Don't disable VF's memory space when IOV BARs are updated; fail the
update instead. 
  * Split IOV BAR updates from standard BAR updates so IOV updates can go
in pci/iov.c.
  * Remove pci_resource_bar() and pci_iov_resource_bar() (the relevant
code is simpler when inlined into the callers).
  * Cleanup IORESOURCE_ROM_ENABLE usage.
  * Add comments about why ROMs are updated differently.
v3:
  * Disable VF's memory space when IOV BARs are updated in
pcibios_sriov_enable().
v2:
  * Added one patch calling pcibios_sriov_enable() before the VF
and VF BARs are enabled.

---

Bjorn Helgaas (6):
  PCI: Ignore BAR updates on virtual functions
  PCI: Separate VF BAR updates from standard BAR updates
  PCI: Don't update VF BARs while VF memory space is enabled
  PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
  PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
  PCI: Add comments about ROM BAR updating

Gavin Shan (1):
  PCI: Do any VF BAR updates before enabling the BARs


 drivers/pci/iov.c   |   69 +--
 drivers/pci/pci.c   |   34 ---
 drivers/pci/pci.h   |7 +
 drivers/pci/probe.c |3 +-
 drivers/pci/rom.c   |5 +++
 drivers/pci/setup-res.c |   37 ++---
 6 files changed, 88 insertions(+), 67 deletions(-)


Re: [PATCH v3 0/2] Disable VF's memory space on updating IOV BARs

2016-11-28 Thread Bjorn Helgaas
On Wed, Nov 23, 2016 at 05:49:10PM -0600, Bjorn Helgaas wrote:
> On Wed, Oct 26, 2016 at 12:15:34PM +1100, Gavin Shan wrote:
> > This moves pcibios_sriov_enable() to the point before VF and VF BARs
> > are enabled on PowerNV platform. Also, pci_update_resource() is used
> > to update IOV BARs on PowerNV platform, the PF might have been functional
> > when the function is called. We shouldn't disable PF's memory decoding
> > at that point. Instead, the VF's memory space should be disabled.
> > 
> > Changelog
> > =
> > v3:
> >   * Disable VF's memory space when IOV BARs are updated in
> > pcibios_sriov_enable().
> > v2:
> >   * Added one patch calling pcibios_sriov_enable() before the VF
> > and VF BARs are enabled.
> > 
> > Gavin Shan (2):
> >   PCI: Call pcibios_sriov_enable() before IOV BARs are enabled
> >   PCI: Disable VF's memory space on updating IOV BAR in
> > pci_update_resource()
> > 
> >  drivers/pci/iov.c   | 14 +++---
> >  drivers/pci/setup-res.c | 28 
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> 
> I applied these to pci/virtualization for v4.10.  Thanks for your
> patience, Gavin.

This nagged at me over the holidays, and I think that if the VF has
memory decoding enabled, it's wrong to allow a VF BAR to be updated,
even if we temporarily disable memory decoding.

If the VF is enabled, a driver may have claimed it, and that means the
driver probably has ioremapped the BAR, and if we update the BAR, that
ioremap is now stale.  The driver has no idea that anything changed.

Of course, we may know that pci_update_resource() is never called
after a driver has claimed the VF, but I think it's much better to
restructure the callers so the VF BAR updates happen *before* enabling
the VFs, as you did in patch 1.

Once we have that, there's no point in temporarily disabling memory
decoding, so I worked up some patches to make the update fail.  I'll
post those in a minute and you can see what you think.

Bjorn


Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Michael Ellerman
Nicholas Piggin  writes:
> On Mon, 28 Nov 2016 22:07:39 +1100
> Michael Ellerman  wrote:
>> Nicholas Piggin  writes:
>> > diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
>> > index 404b3aa..cd941a8 100755
>> > --- a/arch/powerpc/boot/wrapper
>> > +++ b/arch/powerpc/boot/wrapper
>> > @@ -181,6 +181,13 @@ case "$elfformat" in
>> >  elf32-powerpc)format=elf32ppc ;;
>> >  esac
>> >  
>> > +# Do not include PT_INTERP segment when linking pie. Non-pie linking
>> > +# just ignores this option.
>> > +LD_VERSION=$(${CROSS}ld --version | $srctree/scripts/ld-version.sh)
>> > +LD_NO_DL_MIN_VERSION=$(echo 2.26 | $srctree/scripts/ld-version.sh)
>> > +if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
>> > +  nodl="--no-dynamic-linker"
>> > +fi  
>> 
>> Some distros (RHEL at least?), ship the wrapper as a standalone script.
>> So I don't think we can call things in $srctree. Or at least I don't
>> know how that's supposed to work when it's shipped standalone.
>> We're also basically reinventing ld-option, which is a PITA.
>
> Okay I didn't realize that. It's already using mkuboot.sh, but only
> for uboot targets... I don't know, I don't have any good ideas at the
> moment.

That looks like a bug, but I guess no one runs Fedora/RHEL on those
machines? Previously it just called mkimage using $PATH.

The Fedora spec file does:

make DESTDIR=$RPM_BUILD_ROOT bootwrapper_install 
WRAPPER_OBJDIR=%{_libdir}/kernel-wrapper 
WRAPPER_DTSDIR=%{_libdir}/kernel-wrapper/dts

bootwrapper_install installs a bunch of files, and also calls:

quiet_cmd_install_wrapper = INSTALL $(patsubst 
$(DESTDIR)$(WRAPPER_BINDIR)/%,%,$@)
  cmd_install_wrapper = $(INSTALL)  -m0755 $(patsubst 
$(DESTDIR)$(WRAPPER_BINDIR)/%,$(srctree)/$(obj)/%,$@) $@ ;\
sed -i $@ -e 
's%^object=.*%object=$(WRAPPER_OBJDIR)%' \
  -e 
's%^objbin=.*%objbin=$(WRAPPER_BINDIR)%' \

ie. it seds the script. So we could probably just install ld-version.sh
into $(DESTDIR)$(WRAPPER_OBJDIR) and then sed $srctree maybe?

But it's old code and I'm not that across how it gets used in the wild.
CC'ing Gustavo who is our Fedora/RHEL/Power guy.

cheers


Re: [PATCH v2 2/4] usb: dwc2: Add binding for AHB burst

2016-11-28 Thread John Youn
On 11/22/2016 12:51 PM, Christian Lamparter wrote:
> On Monday, November 21, 2016 7:32:30 PM CET John Youn wrote:
>> On 11/21/2016 1:10 PM, Christian Lamparter wrote:
>>> On Monday, November 21, 2016 12:16:31 PM CET John Youn wrote:
 On 11/18/2016 12:18 PM, Christian Lamparter wrote:
> On Friday, November 18, 2016 8:16:08 AM CET Rob Herring wrote:
>> Also, perhaps you should allow that the compatible string can define the 
>> default.
>>
> I hoped you would say that :).
>
> I've attached a patch (on top of John Youn changes) [...]
> ---
> Subject: [PATCH] usb: dwc2: add a default ahb-burst setting for 
> amcc,dwc-otg
> [...]
> @@ -1097,6 +1097,22 @@ static const char *const ahb_bursts[] = {
> +/* [...] */
> +static const struct of_device_id dwc2_compat_ahb_bursts[] = {
> + {
> + .compatible = "amcc,dwc-otg",
> + .data = (void *) GAHBCFG_HBSTLEN_INCR16,
> + },
> +};
>> [...]
>> @@ -1107,6 +1123,12 @@ static int dwc2_get_property_ahb_burst(struct 
>> dwc2_hsotg *hsotg)
>   ret = device_property_read_string(hsotg->dev, "snps,ahb-burst", );
>   if (ret < 0) {
> + const struct of_device_id *match;
> +
> + match = of_match_node(dwc2_compat_ahb_bursts, node);
> + if (match)
> + ret = (int)match->data;
> +
>> [...]
 I'd prefer if you use the binding which requires no extra code in
 dwc2.
>>> I'm fine with either option. However it think that this would require
>>> that either Mark or Rob would allow an exception to the "keep existing
>>> dts the way they are) and ack the following change to the canyonlands.dts. 
>>
>> I don't know about that. Under what circumstance can the dts change?
> As far as I know, the justification for not changing the DTS is that a
> compiled DTB might be stored in an read-only ROM on a board. So it would
> be impossible to update it. Hence, the driver have work with the existing
> (and sometimes buggy or incomplete) information to stay compatible.
> 
> (Note: Thankfully, the canyonlands dtb is stored in flash, it's possible
> to update it. But it is an extra step that's not done automatically
> with make install). 
> 
>> The canyonlands dts was binding to an external vendor driver. So it
>> wasn't documented nor expected to work with dwc2 until your recent
>> patch adding the compatible string.
> 
> Oh, no that's not what happend. Let me explain why there was no "external 
> vendor driver": AMCC/APM were planing to upstream their hole platform. And
> in fact, the devs tried very hard to include their driver back in 2011 [0].
> But this driver was denied inclusion back then due to:
> 
> "[...]
> I would also like to point out that the same Synopsys USB controller
> is used in a number of other SoCs (especially ARM chips), and
> supported by other drivers, some of these even in mainline.
> 
> See http://thread.gmane.org/gmane.linux.usb.general/61714/focus=62139
> for a related thread.
> 
> Instead of trying to add a completely new driver to mainline (and one
> which has been repeatedly been rejected), I vote for focusing on the
> existing driver code that is already in mainline, and testing and
> improving this so we can use a single implementation of this driver
> code for all SoCs that use the same IP block." [1]
> 
> Of course: The listed link goes the "USB Host driver for i.MX28" driver.
> And this is an ehci-hcd like driver... Which is as you are well aware not
> that similar to the dwc2 OTG. And as far as I can tell: AMCC abandoned
> the patch series right there. 
> 
> Note: AMCC did however succeed in pushing your employer's Synopsys
> DesignWare SATA and DMA drivers to the kernel back then. And I'm happy
> to report that both drivers are still around and working fine for the 460EX 
> (sata_dwc_460ex.c[2] and the DW AHB DMA [3]). (The drivers also work for
> different platforms than the original PPC. I know that because I helped
> Andy Shevchenko with testing and pushing some fixes to it when he was
> adding support for the Intel Quark SoC, which uses the DWC SATA and DMA).

Ok thanks for clearing that up. I understand.

For now we can just set the property to "INCR16" based on the
compatible string. Perhaps in the future do this from a glue-layer
driver which binds to all compatible strings other than "snps,dwc2".

I won't be able to do anything with this until next week though.

Regards,
John


[PATCH] EDAC: mpc85xx: Add T2080 l2-cache support

2016-11-28 Thread Chris Packham
The l2-cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.

Signed-off-by: Chris Packham 
---
 arch/powerpc/boot/dts/fsl/t2081si-post.dtsi | 1 +
 drivers/edac/mpc85xx_edac.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
index c744569a20e1..a97296c64eb2 100644
--- a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
@@ -678,5 +678,6 @@
compatible = "fsl,t2080-l2-cache-controller";
reg = <0xc2 0x4>;
next-level-cache = <>;
+   interrupts = <16 2 1 9>;
};
 };
diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index ff0567526ee3..aee6dcdae02a 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -613,6 +613,7 @@ static const struct of_device_id mpc85xx_l2_err_of_match[] 
= {
{ .compatible = "fsl,p1020-l2-cache-controller", },
{ .compatible = "fsl,p1021-l2-cache-controller", },
{ .compatible = "fsl,p2020-l2-cache-controller", },
+   { .compatible = "fsl,t2080-l2-cache-controller", },
{},
 };
 MODULE_DEVICE_TABLE(of, mpc85xx_l2_err_of_match);
-- 
2.10.2



Re: [PATCH] powerpc/mm: Fix lazy icache flush on pre-POWER5

2016-11-28 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt  writes:

> On 64-bit CPUs with no-execute support and non-snooping icache, such as
> 970 or POWER4, we have a software mechanism to ensure coherency of the
> cache (using exec faults when needed).
>
> This was broken due to a logic inversion when that code was rewritten
> from assembly to C.

The asm code for reference is

BEGIN_FTR_SECTION
   mr  r4,r30
   mr  r5,r7
   bl  hash_page_do_lazy_icache
END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)


Reviewed-by: Aneesh Kumar K.V 

>
> Signed-off-by: Benjamin Herrenschmidt 
> Fixes: 91f1da99792a1d133df94c4753510305353064a1
> Fixes: 89ff725051d177556b23d80f2a30f880a657a6c1
> Fixes: a43c0eb8364c022725df586e91dd753633374d66
> --
> diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
> index 42c702b..6fa450c 100644
> --- a/arch/powerpc/mm/hash64_4k.c
> +++ b/arch/powerpc/mm/hash64_4k.c
> @@ -55,7 +55,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
> unsigned long vsid,
>*/
>   rflags = htab_convert_pte_flags(new_pte);
>
> - if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
> + if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
>   !cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>   rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
>
> diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
> index 3bbbea0..1a68cb1 100644
> --- a/arch/powerpc/mm/hash64_64k.c
> +++ b/arch/powerpc/mm/hash64_64k.c
> @@ -87,7 +87,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
> unsigned long vsid,
>   subpg_pte = new_pte & ~subpg_prot;
>   rflags = htab_convert_pte_flags(subpg_pte);
>
> - if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
> + if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
>   !cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
>
>   /*
> @@ -258,7 +258,7 @@ int __hash_page_64K(unsigned long ea, unsigned long 
> access,
>
>   rflags = htab_convert_pte_flags(new_pte);
>
> - if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
> + if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
>   !cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>   rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);



[PATCH] powerpc/mm: Fix lazy icache flush on pre-POWER5

2016-11-28 Thread Benjamin Herrenschmidt
On 64-bit CPUs with no-execute support and non-snooping icache, such as
970 or POWER4, we have a software mechanism to ensure coherency of the
cache (using exec faults when needed).

This was broken due to a logic inversion when that code was rewritten
from assembly to C.

Signed-off-by: Benjamin Herrenschmidt 
Fixes: 91f1da99792a1d133df94c4753510305353064a1
Fixes: 89ff725051d177556b23d80f2a30f880a657a6c1
Fixes: a43c0eb8364c022725df586e91dd753633374d66
--
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 42c702b..6fa450c 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -55,7 +55,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 */
rflags = htab_convert_pte_flags(new_pte);
 
-   if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
+   if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
 
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 3bbbea0..1a68cb1 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -87,7 +87,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
subpg_pte = new_pte & ~subpg_prot;
rflags = htab_convert_pte_flags(subpg_pte);
 
-   if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
+   if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
 
/*
@@ -258,7 +258,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 
rflags = htab_convert_pte_flags(new_pte);
 
-   if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
+   if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
 


Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Nicholas Piggin
On Mon, 28 Nov 2016 14:25:31 +
Nick Clifton  wrote:

> Hi Nicholas,
> 
> >> ... this actually seems like a better fix to me.  If you do not want the 
> >> PT_INTERP segment, then telling this linker this is a good idea.  So 
> >> wouldn't
> >> a patch like this be a better solution to the problem ?  
> > 
> > Yes, I wasn't asking for the binutils change to be reverted.  
> 
> Oh right.  Actually it looks like at least part of the patch is going to have
> to be reverted, (the part that sorts the PT_LOAD segments into ascending 
> order), 
> as currently it breaks building Linux  kernels.  *sigh*

If the kernel has been doing the wrong thing, we can accept the breakage.
It's a matter for binutils policy in the end I suppose.

> 
> > I don't think the
> > boot wrapper is relying on this non-standard form. If we go with
> > --no-dynamic-linker then I'm assuming we get a standard ELF binary?
> > That seems desirable.  
> 
> Yes, you definitely should get a proper ELF binary.
> 
> > I was just checking whether this is the best think for the kernel to do.
> > Is it possible to get a similar behaviour using the linker script instead
> > (so it's compatible with older binutils)?  
> 
> Actually probably not.  :-(  Several backends, including the PPC, will now 
> attempt to automatically create and install the PT_INTERP segment unless you
> explicitly tell them not too.  Even if you are using a custom linker script.

Okay. It sounds like we should use --no-dynamic-linker whether or not your
patch is changed.

Thanks,
Nick


[PATCH v2 14/14] cxlflash: Migrate scsi command pointer to AFU command

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

Currently, when sending a SCSI command, the pointer is stored in a
reserved field of the AFU command descriptor for retrieval once the
SCSI command has completed. In order to support new descriptor formats
that make use of the reserved field, the pointer is migrated to outside
the descriptor where it can still be found during completion processing.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h  |  1 +
 drivers/scsi/cxlflash/main.c| 10 +-
 drivers/scsi/cxlflash/sislite.h |  2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 6b8d1d3..0e9de5d 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -129,6 +129,7 @@ struct afu_cmd {
struct sisl_ioarcb rcb; /* IOARCB (cache line aligned) */
struct sisl_ioasa sa;   /* IOASA must follow IOARCB */
struct afu *parent;
+   struct scsi_cmnd *scp;
struct completion cevent;
 
u8 cmd_tmf:1;
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index d2d2d83..b17ebf6 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -151,7 +151,7 @@ static void process_cmd_err(struct afu_cmd *cmd, struct 
scsi_cmnd *scp)
  *
  * Prepares and submits command that has either completed or timed out to
  * the SCSI stack. Checks AFU command back into command pool for non-internal
- * (rcb.scp populated) commands.
+ * (cmd->scp populated) commands.
  */
 static void cmd_complete(struct afu_cmd *cmd)
 {
@@ -161,8 +161,8 @@ static void cmd_complete(struct afu_cmd *cmd)
struct cxlflash_cfg *cfg = afu->parent;
bool cmd_is_tmf;
 
-   if (cmd->rcb.scp) {
-   scp = cmd->rcb.scp;
+   if (cmd->scp) {
+   scp = cmd->scp;
if (unlikely(cmd->sa.ioasc))
process_cmd_err(cmd, scp);
else
@@ -315,7 +315,7 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, 
u64 tmfcmd)
cfg->tmf_active = true;
spin_unlock_irqrestore(>tmf_slock, lock_flags);
 
-   cmd->rcb.scp = scp;
+   cmd->scp = scp;
cmd->parent = afu;
cmd->cmd_tmf = true;
 
@@ -445,7 +445,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scp)
cmd->rcb.data_ea = sg_dma_address(sg);
}
 
-   cmd->rcb.scp = scp;
+   cmd->scp = scp;
cmd->parent = afu;
 
cmd->rcb.ctx_id = afu->ctx_hndl;
diff --git a/drivers/scsi/cxlflash/sislite.h b/drivers/scsi/cxlflash/sislite.h
index 347fc16..1a2d09c 100644
--- a/drivers/scsi/cxlflash/sislite.h
+++ b/drivers/scsi/cxlflash/sislite.h
@@ -72,7 +72,7 @@ struct sisl_ioarcb {
u16 timeout;/* in units specified by req_flags */
u32 rsvd1;
u8 cdb[16]; /* must be in big endian */
-   struct scsi_cmnd *scp;
+   u64 reserved;   /* Reserved area */
 } __packed;
 
 struct sisl_rc {
-- 
2.1.0



[PATCH v2 13/14] cxlflash: Migrate IOARRIN specific routines to function pointers

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

As staging for supporting hardware with a different queuing mechanism,
move the send_cmd() and context_reset() routines to function pointers
that are configured when the AFU is initialized. In addition, rename
the existing routines to better reflect the queue model they support.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h |  3 +++
 drivers/scsi/cxlflash/main.c   | 21 +++--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index bed8e60..6b8d1d3 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -161,6 +161,9 @@ struct afu {
 * fields after this point
 */
 
+   int (*send_cmd)(struct afu *, struct afu_cmd *);
+   void (*context_reset)(struct afu_cmd *);
+
/* AFU HW */
struct cxl_ioctl_start_work work;
struct cxlflash_afu_map __iomem *afu_map;   /* entire MMIO map */
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 4e70c9a..d2d2d83 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -188,12 +188,10 @@ static void cmd_complete(struct afu_cmd *cmd)
 }
 
 /**
- * context_reset() - timeout handler for AFU commands
+ * context_reset_ioarrin() - reset command owner context via IOARRIN register
  * @cmd:   AFU command that timed out.
- *
- * Sends a reset to the AFU.
  */
-static void context_reset(struct afu_cmd *cmd)
+static void context_reset_ioarrin(struct afu_cmd *cmd)
 {
int nretry = 0;
u64 rrin = 0x1;
@@ -217,14 +215,14 @@ static void context_reset(struct afu_cmd *cmd)
 }
 
 /**
- * send_cmd() - sends an AFU command
+ * send_cmd_ioarrin() - sends an AFU command via IOARRIN register
  * @afu:   AFU associated with the host.
  * @cmd:   AFU command to send.
  *
  * Return:
  * 0 on success, SCSI_MLQUEUE_HOST_BUSY on failure
  */
-static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
+static int send_cmd_ioarrin(struct afu *afu, struct afu_cmd *cmd)
 {
struct cxlflash_cfg *cfg = afu->parent;
struct device *dev = >dev->dev;
@@ -273,7 +271,7 @@ static int wait_resp(struct afu *afu, struct afu_cmd *cmd)
 
timeout = wait_for_completion_timeout(>cevent, timeout);
if (!timeout) {
-   context_reset(cmd);
+   afu->context_reset(cmd);
rc = -1;
}
 
@@ -330,7 +328,7 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, 
u64 tmfcmd)
  SISL_REQ_FLAGS_TMF_CMD);
memcpy(cmd->rcb.cdb, , sizeof(tmfcmd));
 
-   rc = send_cmd(afu, cmd);
+   rc = afu->send_cmd(afu, cmd);
if (unlikely(rc)) {
spin_lock_irqsave(>tmf_slock, lock_flags);
cfg->tmf_active = false;
@@ -461,7 +459,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scp)
cmd->rcb.req_flags = req_flags;
memcpy(cmd->rcb.cdb, scp->cmnd, sizeof(cmd->rcb.cdb));
 
-   rc = send_cmd(afu, cmd);
+   rc = afu->send_cmd(afu, cmd);
if (unlikely(rc))
scsi_dma_unmap(scp);
 out:
@@ -1631,6 +1629,9 @@ static int init_afu(struct cxlflash_cfg *cfg)
goto err2;
}
 
+   afu->send_cmd = send_cmd_ioarrin;
+   afu->context_reset = context_reset_ioarrin;
+
pr_debug("%s: afu version %s, interface version 0x%llX\n", __func__,
 afu->version, afu->interface_version);
 
@@ -1723,7 +1724,7 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
*((__be16 *)>rcb.cdb[2]) = cpu_to_be16(ctx_hndl_u);
*((__be32 *)>rcb.cdb[4]) = cpu_to_be32(res_hndl_u);
 
-   rc = send_cmd(afu, cmd);
+   rc = afu->send_cmd(afu, cmd);
if (unlikely(rc))
goto out;
 
-- 
2.1.0



[PATCH v2 12/14] cxlflash: Cleanup queuecommand()

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

The queuecommand routine is disorganized where it populates the
private command and also contains some logic/statements that are
not needed given that cxlflash devices do not (and likely never
will) support scatter-gather.

Restructure the code to remove the unnecessary logic and create an
organized flow:

handle state -> DMA map -> populate command -> send command

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/main.c | 50 ++--
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index b763699..4e70c9a 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -388,11 +388,11 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scp)
struct afu *afu = cfg->afu;
struct device *dev = >dev->dev;
struct afu_cmd *cmd = sc_to_afucz(scp);
+   struct scatterlist *sg = scsi_sglist(scp);
u32 port_sel = scp->device->channel + 1;
-   int nseg, i, ncount;
-   struct scatterlist *sg;
+   u16 req_flags = SISL_REQ_FLAGS_SUP_UNDERRUN;
ulong lock_flags;
-   short lflag = 0;
+   int nseg = 0;
int rc = 0;
int kref_got = 0;
 
@@ -435,45 +435,35 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scp)
kref_get(>afu->mapcount);
kref_got = 1;
 
-   cmd->rcb.ctx_id = afu->ctx_hndl;
-   cmd->rcb.msi = SISL_MSI_RRQ_UPDATED;
-   cmd->rcb.port_sel = port_sel;
-   cmd->rcb.lun_id = lun_to_lunid(scp->device->lun);
-
-   if (scp->sc_data_direction == DMA_TO_DEVICE)
-   lflag = SISL_REQ_FLAGS_HOST_WRITE;
-   else
-   lflag = SISL_REQ_FLAGS_HOST_READ;
+   if (likely(sg)) {
+   nseg = scsi_dma_map(scp);
+   if (unlikely(nseg < 0)) {
+   dev_err(dev, "%s: Fail DMA map!\n", __func__);
+   rc = SCSI_MLQUEUE_HOST_BUSY;
+   goto out;
+   }
 
-   cmd->rcb.req_flags = (SISL_REQ_FLAGS_PORT_LUN_ID |
- SISL_REQ_FLAGS_SUP_UNDERRUN | lflag);
+   cmd->rcb.data_len = sg_dma_len(sg);
+   cmd->rcb.data_ea = sg_dma_address(sg);
+   }
 
-   /* Stash the scp in the reserved field, for reuse during interrupt */
cmd->rcb.scp = scp;
cmd->parent = afu;
 
-   nseg = scsi_dma_map(scp);
-   if (unlikely(nseg < 0)) {
-   dev_err(dev, "%s: Fail DMA map! nseg=%d\n",
-   __func__, nseg);
-   rc = SCSI_MLQUEUE_HOST_BUSY;
-   goto out;
-   }
+   cmd->rcb.ctx_id = afu->ctx_hndl;
+   cmd->rcb.msi = SISL_MSI_RRQ_UPDATED;
+   cmd->rcb.port_sel = port_sel;
+   cmd->rcb.lun_id = lun_to_lunid(scp->device->lun);
 
-   ncount = scsi_sg_count(scp);
-   scsi_for_each_sg(scp, sg, ncount, i) {
-   cmd->rcb.data_len = sg_dma_len(sg);
-   cmd->rcb.data_ea = sg_dma_address(sg);
-   }
+   if (scp->sc_data_direction == DMA_TO_DEVICE)
+   req_flags |= SISL_REQ_FLAGS_HOST_WRITE;
 
-   /* Copy the CDB from the scsi_cmnd passed in */
+   cmd->rcb.req_flags = req_flags;
memcpy(cmd->rcb.cdb, scp->cmnd, sizeof(cmd->rcb.cdb));
 
-   /* Send the command */
rc = send_cmd(afu, cmd);
if (unlikely(rc))
scsi_dma_unmap(scp);
-
 out:
if (kref_got)
kref_put(>mapcount, afu_unmap);
-- 
2.1.0



[PATCH v2 11/14] cxlflash: Cleanup send_tmf()

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

The send_tmf() routine includes some copy/paste cruft that can be
removed as well as the setting of an AFU command-specific while
holding the tmf_slock. While not a bug, it is out of place and
should be shifted down alongside the other command initialization
statements for clarity.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/main.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index db77030..b763699 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -299,12 +299,10 @@ static int wait_resp(struct afu *afu, struct afu_cmd *cmd)
  */
 static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, u64 tmfcmd)
 {
-   struct afu_cmd *cmd = sc_to_afucz(scp);
-
u32 port_sel = scp->device->channel + 1;
-   short lflag = 0;
struct Scsi_Host *host = scp->device->host;
struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)host->hostdata;
+   struct afu_cmd *cmd = sc_to_afucz(scp);
struct device *dev = >dev->dev;
ulong lock_flags;
int rc = 0;
@@ -317,27 +315,21 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd 
*scp, u64 tmfcmd)
  !cfg->tmf_active,
  cfg->tmf_slock);
cfg->tmf_active = true;
-   cmd->cmd_tmf = true;
spin_unlock_irqrestore(>tmf_slock, lock_flags);
 
+   cmd->rcb.scp = scp;
+   cmd->parent = afu;
+   cmd->cmd_tmf = true;
+
cmd->rcb.ctx_id = afu->ctx_hndl;
cmd->rcb.msi = SISL_MSI_RRQ_UPDATED;
cmd->rcb.port_sel = port_sel;
cmd->rcb.lun_id = lun_to_lunid(scp->device->lun);
-
-   lflag = SISL_REQ_FLAGS_TMF_CMD;
-
cmd->rcb.req_flags = (SISL_REQ_FLAGS_PORT_LUN_ID |
- SISL_REQ_FLAGS_SUP_UNDERRUN | lflag);
-
-   /* Stash the scp in the command, for reuse during interrupt */
-   cmd->rcb.scp = scp;
-   cmd->parent = afu;
-
-   /* Copy the CDB from the cmd passed in */
+ SISL_REQ_FLAGS_SUP_UNDERRUN |
+ SISL_REQ_FLAGS_TMF_CMD);
memcpy(cmd->rcb.cdb, , sizeof(tmfcmd));
 
-   /* Send the command */
rc = send_cmd(afu, cmd);
if (unlikely(rc)) {
spin_lock_irqsave(>tmf_slock, lock_flags);
-- 
2.1.0



[PATCH v2 10/14] cxlflash: Remove AFU command lock

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

The original design of the cxlflash driver required AFU commands
to convey state information across multiple threads. The IOASA
"host use" byte was used to track if a command was done, errored,
or timed out. A per-command spin lock was used to serialize access
to this byte. As this is no longer required with the introduction
of completions and various refactoring over time, the spin lock,
state tracking, and associated code can be removed. To support the
simplification, the wait_resp() routine is refactored to return a
success or failure. Additionally, as the simplification to the
AFU internal command routine, explicit assignments of AFU command
fields to zero are removed as the memory is zeroed upon allocation.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h |  8 +---
 drivers/scsi/cxlflash/main.c   | 46 ++
 2 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 6211677..bed8e60 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -63,11 +63,6 @@ static inline void check_sizes(void)
 /* AFU defines a fixed size of 4K for command buffers (borrow 4K page define) 
*/
 #define CMD_BUFSIZE SIZE_4K
 
-/* flags in IOA status area for host use */
-#define B_DONE   0x01
-#define B_ERROR  0x02  /* set with B_DONE */
-#define B_TIMEOUT0x04  /* set with B_DONE & B_ERROR */
-
 enum cxlflash_lr_state {
LINK_RESET_INVALID,
LINK_RESET_REQUIRED,
@@ -133,9 +128,8 @@ struct cxlflash_cfg {
 struct afu_cmd {
struct sisl_ioarcb rcb; /* IOARCB (cache line aligned) */
struct sisl_ioasa sa;   /* IOASA must follow IOARCB */
-   spinlock_t slock;
-   struct completion cevent;
struct afu *parent;
+   struct completion cevent;
 
u8 cmd_tmf:1;
 
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 839eca4..db77030 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -161,10 +161,6 @@ static void cmd_complete(struct afu_cmd *cmd)
struct cxlflash_cfg *cfg = afu->parent;
bool cmd_is_tmf;
 
-   spin_lock_irqsave(>slock, lock_flags);
-   cmd->sa.host_use_b[0] |= B_DONE;
-   spin_unlock_irqrestore(>slock, lock_flags);
-
if (cmd->rcb.scp) {
scp = cmd->rcb.scp;
if (unlikely(cmd->sa.ioasc))
@@ -204,21 +200,9 @@ static void context_reset(struct afu_cmd *cmd)
struct afu *afu = cmd->parent;
struct cxlflash_cfg *cfg = afu->parent;
struct device *dev = >dev->dev;
-   ulong lock_flags;
 
pr_debug("%s: cmd=%p\n", __func__, cmd);
 
-   spin_lock_irqsave(>slock, lock_flags);
-
-   /* Already completed? */
-   if (cmd->sa.host_use_b[0] & B_DONE) {
-   spin_unlock_irqrestore(>slock, lock_flags);
-   return;
-   }
-
-   cmd->sa.host_use_b[0] |= (B_DONE | B_ERROR | B_TIMEOUT);
-   spin_unlock_irqrestore(>slock, lock_flags);
-
writeq_be(rrin, >host_map->ioarrin);
do {
rrin = readq_be(>host_map->ioarrin);
@@ -278,20 +262,30 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
  * wait_resp() - polls for a response or timeout to a sent AFU command
  * @afu:   AFU associated with the host.
  * @cmd:   AFU command that was sent.
+ *
+ * Return:
+ * 0 on success, -1 on timeout/error
  */
-static void wait_resp(struct afu *afu, struct afu_cmd *cmd)
+static int wait_resp(struct afu *afu, struct afu_cmd *cmd)
 {
+   int rc = 0;
ulong timeout = msecs_to_jiffies(cmd->rcb.timeout * 2 * 1000);
 
timeout = wait_for_completion_timeout(>cevent, timeout);
-   if (!timeout)
+   if (!timeout) {
context_reset(cmd);
+   rc = -1;
+   }
 
-   if (unlikely(cmd->sa.ioasc != 0))
+   if (unlikely(cmd->sa.ioasc != 0)) {
pr_err("%s: CMD 0x%X failed, IOASC: flags 0x%X, afu_rc 0x%X, "
   "scsi_rc 0x%X, fc_rc 0x%X\n", __func__, cmd->rcb.cdb[0],
   cmd->sa.rc.flags, cmd->sa.rc.afu_rc, cmd->sa.rc.scsi_rc,
   cmd->sa.rc.fc_rc);
+   rc = -1;
+   }
+
+   return rc;
 }
 
 /**
@@ -339,7 +333,6 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, 
u64 tmfcmd)
/* Stash the scp in the command, for reuse during interrupt */
cmd->rcb.scp = scp;
cmd->parent = afu;
-   spin_lock_init(>slock);
 
/* Copy the CDB from the cmd passed in */
memcpy(cmd->rcb.cdb, , sizeof(tmfcmd));
@@ -466,7 +459,6 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scp)
/* Stash the scp in the reserved field, for reuse during interrupt */
cmd->rcb.scp = scp;
cmd->parent = 

[PATCH v2 09/14] cxlflash: Wait for active AFU commands to timeout upon tear down

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

With the removal of the static private command pool, the ability to
'complete' outstanding commands was lost. While not an issue for the
commands originating outside the driver, internal AFU commands are
synchronous and therefore have a timeout associated with them. To
avoid a stale memory access, the tear down sequence needs to ensure
that there are not any active commands before proceeding. As these
internal AFU commands are rare events, the simplest way to accomplish
this is detecting the activity and waiting for it to timeout.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h | 1 +
 drivers/scsi/cxlflash/main.c   | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 7e4ba31..6211677 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -180,6 +180,7 @@ struct afu {
u64 *hrrq_end;
u64 *hrrq_curr;
bool toggle;
+   atomic_t cmds_active;   /* Number of currently active AFU commands */
s64 room;
spinlock_t rrin_slock; /* Lock to rrin queuing and cmd_room updates */
u64 hb;
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 19156ad..839eca4 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -531,7 +531,7 @@ static void free_mem(struct cxlflash_cfg *cfg)
  *
  * Safe to call with AFU in a partially allocated/initialized state.
  *
- * Cleans up all state associated with the command queue, and unmaps
+ * Waits for any active internal AFU commands to timeout and then unmaps
  * the MMIO space.
  */
 static void stop_afu(struct cxlflash_cfg *cfg)
@@ -539,6 +539,8 @@ static void stop_afu(struct cxlflash_cfg *cfg)
struct afu *afu = cfg->afu;
 
if (likely(afu)) {
+   while (atomic_read(>cmds_active))
+   ssleep(1);
if (likely(afu->afu_map)) {
cxl_psa_unmap((void __iomem *)afu->afu_map);
afu->afu_map = NULL;
@@ -1721,6 +1723,7 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
}
 
mutex_lock(_active);
+   atomic_inc(>cmds_active);
buf = kzalloc(sizeof(*cmd) + __alignof__(*cmd) - 1, GFP_KERNEL);
if (unlikely(!buf)) {
dev_err(dev, "%s: no memory for command\n", __func__);
@@ -1762,6 +1765,7 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
 (cmd->sa.host_use_b[0] & B_ERROR)))
rc = -1;
 out:
+   atomic_dec(>cmds_active);
mutex_unlock(_active);
kfree(buf);
pr_debug("%s: returning rc=%d\n", __func__, rc);
-- 
2.1.0



[PATCH v2 08/14] cxlflash: Remove private command pool

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

Clean up and remove the remaining private command pool infrastructure
that is no longer required.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h |  7 -
 drivers/scsi/cxlflash/main.c   | 68 --
 2 files changed, 75 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 539908f..7e4ba31 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -136,8 +136,6 @@ struct afu_cmd {
spinlock_t slock;
struct completion cevent;
struct afu *parent;
-   int slot;
-   atomic_t free;
 
u8 cmd_tmf:1;
 
@@ -164,10 +162,6 @@ struct afu {
/* Stuff requiring alignment go first. */
 
u64 rrq_entry[NUM_RRQ_ENTRY];   /* 2K RRQ */
-   /*
-* Command & data for AFU commands.
-*/
-   struct afu_cmd cmd[CXLFLASH_NUM_CMDS];
 
/* Beware of alignment till here. Preferably introduce new
 * fields after this point
@@ -189,7 +183,6 @@ struct afu {
s64 room;
spinlock_t rrin_slock; /* Lock to rrin queuing and cmd_room updates */
u64 hb;
-   u32 cmd_couts;  /* Number of command checkouts */
u32 internal_lun;   /* User-desired LUN mode for this AFU */
 
char version[16];
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 43140ce..19156ad 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -35,33 +35,6 @@ MODULE_AUTHOR("Matthew R. Ochs ");
 MODULE_LICENSE("GPL");
 
 /**
- * cmd_checkin() - checks in an AFU command
- * @cmd:   AFU command to checkin.
- *
- * Safe to pass commands that have already been checked in. Several
- * internal tracking fields are reset as part of the checkin. Note
- * that these are intentionally reset prior to toggling the free bit
- * to avoid clobbering values in the event that the command is checked
- * out right away.
- */
-static void cmd_checkin(struct afu_cmd *cmd)
-{
-   cmd->rcb.scp = NULL;
-   cmd->rcb.timeout = 0;
-   cmd->sa.ioasc = 0;
-   cmd->cmd_tmf = false;
-   cmd->sa.host_use[0] = 0; /* clears both completion and retry bytes */
-
-   if (unlikely(atomic_inc_return(>free) != 1)) {
-   pr_err("%s: Freeing cmd (%d) that is not in use!\n",
-  __func__, cmd->slot);
-   return;
-   }
-
-   pr_devel("%s: released cmd %p index=%d\n", __func__, cmd, cmd->slot);
-}
-
-/**
  * process_cmd_err() - command error handler
  * @cmd:   AFU command that experienced the error.
  * @scp:   SCSI command associated with the AFU command in error.
@@ -560,28 +533,12 @@ static void free_mem(struct cxlflash_cfg *cfg)
  *
  * Cleans up all state associated with the command queue, and unmaps
  * the MMIO space.
- *
- *  - complete() will take care of commands we initiated (they'll be checked
- *  in as part of the cleanup that occurs after the completion)
- *
- *  - cmd_checkin() will take care of entries that we did not initiate and that
- *  have not (and will not) complete because they are sitting on a [now stale]
- *  hardware queue
  */
 static void stop_afu(struct cxlflash_cfg *cfg)
 {
-   int i;
struct afu *afu = cfg->afu;
-   struct afu_cmd *cmd;
 
if (likely(afu)) {
-   for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
-   cmd = >cmd[i];
-   complete(>cevent);
-   if (!atomic_read(>free))
-   cmd_checkin(cmd);
-   }
-
if (likely(afu->afu_map)) {
cxl_psa_unmap((void __iomem *)afu->afu_map);
afu->afu_map = NULL;
@@ -794,7 +751,6 @@ static void cxlflash_remove(struct pci_dev *pdev)
 static int alloc_mem(struct cxlflash_cfg *cfg)
 {
int rc = 0;
-   int i;
struct device *dev = >dev->dev;
 
/* AFU is ~12k, i.e. only one 64k page or up to four 4k pages */
@@ -808,12 +764,6 @@ static int alloc_mem(struct cxlflash_cfg *cfg)
}
cfg->afu->parent = cfg;
cfg->afu->afu_map = NULL;
-
-   for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
-   atomic_set(>afu->cmd[i].free, 1);
-   cfg->afu->cmd[i].slot = i;
-   }
-
 out:
return rc;
 }
@@ -1443,13 +1393,6 @@ static void init_pcr(struct cxlflash_cfg *cfg)
 
/* Program the Endian Control for the master context */
writeq_be(SISL_ENDIAN_CTRL, >host_map->endian_ctrl);
-
-   /* Initialize cmd fields that never change */
-   for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
-   afu->cmd[i].rcb.ctx_id = afu->ctx_hndl;
-   afu->cmd[i].rcb.msi = SISL_MSI_RRQ_UPDATED;
-   afu->cmd[i].rcb.rrq = 0x0;
-   }
 }
 
 /**
@@ -1538,19 +1481,8 @@ static int 

[PATCH v2 07/14] cxlflash: Use cmd_size for private commands

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

Instead of using a private pool of AFU commands, use cmd_size to prime
the private pool of SCSI commands such that they are allocated with a
size large enough to contain an aligned AFU command. Use scsi_cmd_priv()
to derive the aligned/zeroed private command on queuecommand and TMF
paths. Remove cmd_checkout() as it is no longer required. The remaining
AFU private command infrastructure will be removed in a cleanup commit.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h | 14 +
 drivers/scsi/cxlflash/main.c   | 65 +++---
 2 files changed, 25 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 4525514..539908f 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 extern const struct file_operations cxlflash_cxl_fops;
@@ -146,6 +147,19 @@ struct afu_cmd {
 */
 } __aligned(cache_line_size());
 
+static inline struct afu_cmd *sc_to_afuc(struct scsi_cmnd *sc)
+{
+   return PTR_ALIGN(scsi_cmd_priv(sc), __alignof__(struct afu_cmd));
+}
+
+static inline struct afu_cmd *sc_to_afucz(struct scsi_cmnd *sc)
+{
+   struct afu_cmd *afuc = sc_to_afuc(sc);
+
+   memset(afuc, 0, sizeof(*afuc));
+   return afuc;
+}
+
 struct afu {
/* Stuff requiring alignment go first. */
 
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 0f13b4d..43140ce 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -35,38 +35,6 @@ MODULE_AUTHOR("Matthew R. Ochs ");
 MODULE_LICENSE("GPL");
 
 /**
- * cmd_checkout() - checks out an AFU command
- * @afu:   AFU to checkout from.
- *
- * Commands are checked out in a round-robin fashion. Note that since
- * the command pool is larger than the hardware queue, the majority of
- * times we will only loop once or twice before getting a command. The
- * CDB within the command is initialized (zeroed) prior to returning.
- *
- * Return: The checked out command or NULL when command pool is empty.
- */
-static struct afu_cmd *cmd_checkout(struct afu *afu)
-{
-   int k, dec = CXLFLASH_NUM_CMDS;
-   struct afu_cmd *cmd;
-
-   while (dec--) {
-   k = (afu->cmd_couts++ & (CXLFLASH_NUM_CMDS - 1));
-
-   cmd = >cmd[k];
-
-   if (!atomic_dec_if_positive(>free)) {
-   pr_devel("%s: returning found index=%d cmd=%p\n",
-__func__, cmd->slot, cmd);
-   memset(cmd->rcb.cdb, 0, sizeof(cmd->rcb.cdb));
-   return cmd;
-   }
-   }
-
-   return NULL;
-}
-
-/**
  * cmd_checkin() - checks in an AFU command
  * @cmd:   AFU command to checkin.
  *
@@ -232,7 +200,6 @@ static void cmd_complete(struct afu_cmd *cmd)
scp->result = (DID_OK << 16);
 
cmd_is_tmf = cmd->cmd_tmf;
-   cmd_checkin(cmd); /* Don't use cmd after here */
 
pr_debug_ratelimited("%s: calling scsi_done scp=%p result=%X "
 "ioasc=%d\n", __func__, scp, scp->result,
@@ -365,7 +332,7 @@ static void wait_resp(struct afu *afu, struct afu_cmd *cmd)
  */
 static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, u64 tmfcmd)
 {
-   struct afu_cmd *cmd;
+   struct afu_cmd *cmd = sc_to_afucz(scp);
 
u32 port_sel = scp->device->channel + 1;
short lflag = 0;
@@ -376,13 +343,6 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd 
*scp, u64 tmfcmd)
int rc = 0;
ulong to;
 
-   cmd = cmd_checkout(afu);
-   if (unlikely(!cmd)) {
-   dev_err(dev, "%s: could not get a free command\n", __func__);
-   rc = SCSI_MLQUEUE_HOST_BUSY;
-   goto out;
-   }
-
/* When Task Management Function is active do not send another */
spin_lock_irqsave(>tmf_slock, lock_flags);
if (cfg->tmf_active)
@@ -394,6 +354,7 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, 
u64 tmfcmd)
spin_unlock_irqrestore(>tmf_slock, lock_flags);
 
cmd->rcb.ctx_id = afu->ctx_hndl;
+   cmd->rcb.msi = SISL_MSI_RRQ_UPDATED;
cmd->rcb.port_sel = port_sel;
cmd->rcb.lun_id = lun_to_lunid(scp->device->lun);
 
@@ -402,8 +363,10 @@ static int send_tmf(struct afu *afu, struct scsi_cmnd 
*scp, u64 tmfcmd)
cmd->rcb.req_flags = (SISL_REQ_FLAGS_PORT_LUN_ID |
  SISL_REQ_FLAGS_SUP_UNDERRUN | lflag);
 
-   /* Stash the scp in the reserved field, for reuse during interrupt */
+   /* Stash the scp in the command, for reuse during interrupt */
cmd->rcb.scp = scp;
+   cmd->parent = afu;
+   spin_lock_init(>slock);
 
/* Copy the CDB from 

[PATCH v2 06/14] cxlflash: Allocate memory instead of using command pool for AFU sync

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

As staging for the removal of the AFU command pool, remove the reliance
upon the pool for the internal AFU sync command. Instead of obtaining an
AFU command from the pool, dynamically allocate memory with the appropriate
alignment requirements. Since the AFU sync service is only executed from
the process environment, blocking is acceptable.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/main.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 9f2821d..0f13b4d 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1823,8 +1823,8 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
struct cxlflash_cfg *cfg = afu->parent;
struct device *dev = >dev->dev;
struct afu_cmd *cmd = NULL;
+   char *buf = NULL;
int rc = 0;
-   int retry_cnt = 0;
static DEFINE_MUTEX(sync_active);
 
if (cfg->state != STATE_NORMAL) {
@@ -1833,23 +1833,23 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
}
 
mutex_lock(_active);
-retry:
-   cmd = cmd_checkout(afu);
-   if (unlikely(!cmd)) {
-   retry_cnt++;
-   udelay(1000 * retry_cnt);
-   if (retry_cnt < MC_RETRY_CNT)
-   goto retry;
-   dev_err(dev, "%s: could not get a free command\n", __func__);
+   buf = kzalloc(sizeof(*cmd) + __alignof__(*cmd) - 1, GFP_KERNEL);
+   if (unlikely(!buf)) {
+   dev_err(dev, "%s: no memory for command\n", __func__);
rc = -1;
goto out;
}
 
-   pr_debug("%s: afu=%p cmd=%p %d\n", __func__, afu, cmd, ctx_hndl_u);
+   cmd = (struct afu_cmd *)PTR_ALIGN(buf, __alignof__(*cmd));
+   init_completion(>cevent);
+   spin_lock_init(>slock);
+   cmd->parent = afu;
 
-   memset(cmd->rcb.cdb, 0, sizeof(cmd->rcb.cdb));
+   pr_debug("%s: afu=%p cmd=%p %d\n", __func__, afu, cmd, ctx_hndl_u);
 
cmd->rcb.req_flags = SISL_REQ_FLAGS_AFU_CMD;
+   cmd->rcb.ctx_id = afu->ctx_hndl;
+   cmd->rcb.msi = SISL_MSI_RRQ_UPDATED;
cmd->rcb.port_sel = 0x0;/* NA */
cmd->rcb.lun_id = 0x0;  /* NA */
cmd->rcb.data_len = 0x0;
@@ -1875,8 +1875,7 @@ int cxlflash_afu_sync(struct afu *afu, ctx_hndl_t 
ctx_hndl_u,
rc = -1;
 out:
mutex_unlock(_active);
-   if (cmd)
-   cmd_checkin(cmd);
+   kfree(buf);
pr_debug("%s: returning rc=%d\n", __func__, rc);
return rc;
 }
-- 
2.1.0



[PATCH v2 05/14] cxlflash: Remove unused buffer from AFU command

2016-11-28 Thread Uma Krishnan
From: "Matthew R. Ochs" 

The cxlflash driver originally required a per-command 4K buffer that
hosted data passed to the AFU. When the routines that initiate AFU
and internal SCSI commands were refactored to use scsi_execute(), the
need for this buffer became obsolete. As it is no longer necessary,
the buffer is removed.

Signed-off-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/common.h |  1 -
 drivers/scsi/cxlflash/main.c   | 28 ++--
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index ef2943d..4525514 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -134,7 +134,6 @@ struct afu_cmd {
struct sisl_ioasa sa;   /* IOASA must follow IOARCB */
spinlock_t slock;
struct completion cevent;
-   char *buf;  /* per command buffer */
struct afu *parent;
int slot;
atomic_t free;
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 1292a74..9f2821d 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -41,8 +41,7 @@ MODULE_LICENSE("GPL");
  * Commands are checked out in a round-robin fashion. Note that since
  * the command pool is larger than the hardware queue, the majority of
  * times we will only loop once or twice before getting a command. The
- * buffer and CDB within the command are initialized (zeroed) prior to
- * returning.
+ * CDB within the command is initialized (zeroed) prior to returning.
  *
  * Return: The checked out command or NULL when command pool is empty.
  */
@@ -59,7 +58,6 @@ static struct afu_cmd *cmd_checkout(struct afu *afu)
if (!atomic_dec_if_positive(>free)) {
pr_devel("%s: returning found index=%d cmd=%p\n",
 __func__, cmd->slot, cmd);
-   memset(cmd->buf, 0, CMD_BUFSIZE);
memset(cmd->rcb.cdb, 0, sizeof(cmd->rcb.cdb));
return cmd;
}
@@ -590,17 +588,9 @@ static void cxlflash_wait_for_pci_err_recovery(struct 
cxlflash_cfg *cfg)
  */
 static void free_mem(struct cxlflash_cfg *cfg)
 {
-   int i;
-   char *buf = NULL;
struct afu *afu = cfg->afu;
 
if (cfg->afu) {
-   for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
-   buf = afu->cmd[i].buf;
-   if (!((u64)buf & (PAGE_SIZE - 1)))
-   free_page((ulong)buf);
-   }
-
free_pages((ulong)afu, get_order(sizeof(struct afu)));
cfg->afu = NULL;
}
@@ -849,7 +839,6 @@ static int alloc_mem(struct cxlflash_cfg *cfg)
 {
int rc = 0;
int i;
-   char *buf = NULL;
struct device *dev = >dev->dev;
 
/* AFU is ~12k, i.e. only one 64k page or up to four 4k pages */
@@ -864,20 +853,7 @@ static int alloc_mem(struct cxlflash_cfg *cfg)
cfg->afu->parent = cfg;
cfg->afu->afu_map = NULL;
 
-   for (i = 0; i < CXLFLASH_NUM_CMDS; buf += CMD_BUFSIZE, i++) {
-   if (!((u64)buf & (PAGE_SIZE - 1))) {
-   buf = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-   if (unlikely(!buf)) {
-   dev_err(dev,
-   "%s: Allocate command buffers fail!\n",
-  __func__);
-   rc = -ENOMEM;
-   free_mem(cfg);
-   goto out;
-   }
-   }
-
-   cfg->afu->cmd[i].buf = buf;
+   for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
atomic_set(>afu->cmd[i].free, 1);
cfg->afu->cmd[i].slot = i;
}
-- 
2.1.0



Re: [mm v2 0/3] Support memory cgroup hotplug

2016-11-28 Thread Tejun Heo
Hello, Balbir.

On Tue, Nov 29, 2016 at 11:09:26AM +1100, Balbir Singh wrote:
> On 29/11/16 08:10, Tejun Heo wrote:
> > On Thu, Nov 24, 2016 at 12:05:12AM +1100, Balbir Singh wrote:
> >> On my desktop NODES_SHIFT is 6, many distro kernels have it a 9. I've known
> >> of solutions that use fake NUMA for partitioning and need as many nodes as
> >> possible.
> > 
> > It was a crude kludge that people used before memcg.  If people still
> > use it, that's fine but we don't want to optimize / make code
> > complicated for it, so let's please put away this part of
> > justification.
> 
> Are you suggesting those use cases can be ignored now?

Don't do that.  When did I say that?  What I said is that it isn't a
good idea to optimize and complicate the code base for it at this
point.  It shouldn't a controversial argument given fake numa's
inherent issues and general lack of popularity.

Besides, does node hotplug even apply to fake numa?  ISTR it being
configured statically on the boot prompt.

> > NUMA code already has possible detection.  Why not simply make memcg
> > use those instead of MAX_NUMNODES like how we use nr_cpu_ids instead
> > of NR_CPUS?
> 
> nodes_possible_map is set to node_online_map at the moment for ppc64.
> Which becomes a problem when hotplugging a node that was not already
> online.
> 
> I am not sure what you mean by possible detection. node_possible_map
> is set based on CONFIG_NODE_SHIFT and then can be adjusted by the
> architecture (if desired). Are you suggesting firmware populate it
> in?

That's what we do with cpus.  The kernel is built with high maximum
limit and the kernel queries the firmware during boot to determine how
many are actually possible on the system, which in most cases isn't
too far from what's already on the system.  I don't see why we would
take a different approach with NUMA nodes.

Thanks.

-- 
tejun


[PATCH v2 04/14] cxlflash: Avoid command room violation

2016-11-28 Thread Uma Krishnan
During test, a command room violation interrupt is occasionally seen
for the master context when the CXL flash devices are stressed.

After studying the code, there could be gaps in the way command room
value is being cached in cxlflash. When the cached command room is zero
the thread attempting to send becomes burdened with updating the cached
value with the actual value from the AFU. Today, this is handled with an
atomic set operation of the raw value read. Following the atomic update,
the thread proceeds to send.

This behavior is incorrect on two counts:

   - The update fails to take into account the current thread and its
 consumption of one of the hardware commands.

   - The update does not take into account other threads also atomically
 updating. Per design, a worker thread updates the cached value when a
 send thread times out. By not protecting the update with a lock, the
 cached value can be incorrectly clobbered.

To correct these issues, the update of the cached command room has been
simplified and also protected using a spin lock which is held until the
MMIO is complete. This ensures the command room is properly consumed by
the same thread. Update of cached value also takes into account the
current thread consuming a hardware command.

Signed-off-by: Uma Krishnan 
---
 drivers/scsi/cxlflash/common.h |  4 +--
 drivers/scsi/cxlflash/main.c   | 66 --
 2 files changed, 20 insertions(+), 50 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 6e68155..ef2943d 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -173,8 +173,8 @@ struct afu {
u64 *hrrq_end;
u64 *hrrq_curr;
bool toggle;
-   bool read_room;
-   atomic64_t room;
+   s64 room;
+   spinlock_t rrin_slock; /* Lock to rrin queuing and cmd_room updates */
u64 hb;
u32 cmd_couts;  /* Number of command checkouts */
u32 internal_lun;   /* User-desired LUN mode for this AFU */
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 6d33d8c..1292a74 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -306,59 +306,34 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
 {
struct cxlflash_cfg *cfg = afu->parent;
struct device *dev = >dev->dev;
-   int nretry = 0;
int rc = 0;
-   u64 room;
-   long newval;
+   s64 room;
+   ulong lock_flags;
 
/*
-* This routine is used by critical users such an AFU sync and to
-* send a task management function (TMF). Thus we want to retry a
-* bit before returning an error. To avoid the performance penalty
-* of MMIO, we spread the update of 'room' over multiple commands.
+* To avoid the performance penalty of MMIO, spread the update of
+* 'room' over multiple commands.
 */
-retry:
-   newval = atomic64_dec_if_positive(>room);
-   if (!newval) {
-   do {
-   room = readq_be(>host_map->cmd_room);
-   atomic64_set(>room, room);
-   if (room)
-   goto write_ioarrin;
-   udelay(1 << nretry);
-   } while (nretry++ < MC_ROOM_RETRY_CNT);
-
-   dev_err(dev, "%s: no cmd_room to send 0x%X\n",
-  __func__, cmd->rcb.cdb[0]);
-
-   goto no_room;
-   } else if (unlikely(newval < 0)) {
-   /* This should be rare. i.e. Only if two threads race and
-* decrement before the MMIO read is done. In this case
-* just benefit from the other thread having updated
-* afu->room.
-*/
-   if (nretry++ < MC_ROOM_RETRY_CNT) {
-   udelay(1 << nretry);
-   goto retry;
+   spin_lock_irqsave(>rrin_slock, lock_flags);
+   if (--afu->room < 0) {
+   room = readq_be(>host_map->cmd_room);
+   if (room <= 0) {
+   dev_dbg_ratelimited(dev, "%s: no cmd_room to send "
+   "0x%02X, room=0x%016llX\n",
+   __func__, cmd->rcb.cdb[0], room);
+   afu->room = 0;
+   rc = SCSI_MLQUEUE_HOST_BUSY;
+   goto out;
}
-
-   goto no_room;
+   afu->room = room - 1;
}
 
-write_ioarrin:
writeq_be((u64)>rcb, >host_map->ioarrin);
 out:
+   spin_unlock_irqrestore(>rrin_slock, lock_flags);
pr_devel("%s: cmd=%p len=%d ea=%p rc=%d\n", __func__, cmd,
 cmd->rcb.data_len, (void *)cmd->rcb.data_ea, rc);
return rc;
-
-no_room:
-   afu->read_room = true;
-   kref_get(>afu->mapcount);
-   

[PATCH v2 03/14] cxlflash: Improve context_reset() logic

2016-11-28 Thread Uma Krishnan
Currently, the context reset routine waits for command room to
be available before sending the reset request. Per review of the
SISLite specification and clarifications from the CXL Flash AFU
designers, this wait is unnecessary. The reset request can be
sent anytime regardless of command room, so long as only a single
reset request is active at any one point in time.

This commit simplifies the reset routine by removing the wait for
command room. Additionally it adds a debug trace to help pinpoint
hardware errors when a context reset does not complete.

Signed-off-by: Uma Krishnan 
Acked-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/main.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 6004860..6d33d8c 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -263,8 +263,9 @@ static void context_reset(struct afu_cmd *cmd)
 {
int nretry = 0;
u64 rrin = 0x1;
-   u64 room = 0;
struct afu *afu = cmd->parent;
+   struct cxlflash_cfg *cfg = afu->parent;
+   struct device *dev = >dev->dev;
ulong lock_flags;
 
pr_debug("%s: cmd=%p\n", __func__, cmd);
@@ -280,23 +281,6 @@ static void context_reset(struct afu_cmd *cmd)
cmd->sa.host_use_b[0] |= (B_DONE | B_ERROR | B_TIMEOUT);
spin_unlock_irqrestore(>slock, lock_flags);
 
-   /*
-* We really want to send this reset at all costs, so spread
-* out wait time on successive retries for available room.
-*/
-   do {
-   room = readq_be(>host_map->cmd_room);
-   atomic64_set(>room, room);
-   if (room)
-   goto write_rrin;
-   udelay(1 << nretry);
-   } while (nretry++ < MC_ROOM_RETRY_CNT);
-
-   pr_err("%s: no cmd_room to send reset\n", __func__);
-   return;
-
-write_rrin:
-   nretry = 0;
writeq_be(rrin, >host_map->ioarrin);
do {
rrin = readq_be(>host_map->ioarrin);
@@ -305,6 +289,9 @@ static void context_reset(struct afu_cmd *cmd)
/* Double delay each time */
udelay(1 << nretry);
} while (nretry++ < MC_ROOM_RETRY_CNT);
+
+   dev_dbg(dev, "%s: returning rrin=0x%016llX nretry=%d\n",
+   __func__, rrin, nretry);
 }
 
 /**
-- 
2.1.0



[PATCH v2 02/14] cxlflash: Fix crash in cxlflash_restore_luntable()

2016-11-28 Thread Uma Krishnan
During test, the following crash was observed:

[34538.981505] Faulting instruction address: 0xd7c9c870
cpu 0x9: Vector: 300 (Data Access) at [c007f1e8f590]
pc: d7c9c870: cxlflash_restore_luntable+0x70/0x1d0 [cxlflash]
lr: d7c9c84c: cxlflash_restore_luntable+0x4c/0x1d0 [cxlflash]
sp: c007f1e8f810
   msr: 90019033
   dar: c0171d637438
 dsisr: 4000
  current = 0xc007f1e43f90
  paca= 0xc7b25100   softe: 0irq_happened: 0x01
pid   = 493, comm = eehd
enter ? for help
[c007f1e8f8a0] d7c940b0 init_afu+0xd60/0x1200 [cxlflash]
[c007f1e8f9a0] d7c945a8 cxlflash_pci_slot_reset+0x58/0xe0 [cxlflash]
[c007f1e8fa20] d715f790 cxl_pci_slot_reset+0x230/0x340 [cxl]
[c007f1e8fae0] c0040dd4 eeh_report_reset+0x144/0x180
[c007f1e8fb20] c003f708 eeh_pe_dev_traverse+0x98/0x170
[c007f1e8fbb0] c0041618 eeh_handle_normal_event+0x328/0x410
[c007f1e8fc30] c0041db8 eeh_handle_event+0x178/0x330
[c007f1e8fce0] c0042118 eeh_event_handler+0x1a8/0x1b0
[c007f1e8fd80] c011420c kthread+0xec/0x100
[c007f1e8fe30] c000a47c ret_from_kernel_thread+0x5c/0xe0

When superpipe mode is disabled for a LUN, the references for the
local lun are deleted but the LUN is still identified as being present
in the LUN table. This mismatched state can result in the above crash
when the LUN table is restored during an error recovery operation.

To fix this issue, the local LUN information structure is updated to
reflect the LUN is no longer in the LUN table once all references to
the LUN are gone.

Signed-off-by: Uma Krishnan 
Acked-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/lunmgt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/cxlflash/lunmgt.c b/drivers/scsi/cxlflash/lunmgt.c
index a0923ca..6c318db9 100644
--- a/drivers/scsi/cxlflash/lunmgt.c
+++ b/drivers/scsi/cxlflash/lunmgt.c
@@ -254,8 +254,14 @@ int cxlflash_manage_lun(struct scsi_device *sdev,
if (lli->parent->mode != MODE_NONE)
rc = -EBUSY;
else {
+   /*
+* Clean up local LUN for this port and reset table
+* tracking when no more references exist.
+*/
sdev->hostdata = NULL;
lli->port_sel &= ~CHAN2PORT(chan);
+   if (lli->port_sel == 0U)
+   lli->in_table = false;
}
}
 
-- 
2.1.0



[PATCH v2 01/14] cxlflash: Set sg_tablesize to 1 instead of SG_NONE

2016-11-28 Thread Uma Krishnan
The following Oops is encountered when blk_mq is enabled with the
cxlflash driver:

[ 2960.817172] Oops: Kernel access of bad area, sig: 11 [#5]
[ 2960.817309] NIP  __blk_mq_run_hw_queue+0x278/0x4c0
[ 2960.817313] LR __blk_mq_run_hw_queue+0x2bc/0x4c0
[ 2960.817314] Call Trace:
[ 2960.817320] __blk_mq_run_hw_queue+0x2bc/0x4c0 (unreliable)
[ 2960.817324] blk_mq_run_hw_queue+0xd8/0x100
[ 2960.817329] blk_mq_insert_requests+0x14c/0x1f0
[ 2960.817333] blk_mq_flush_plug_list+0x150/0x190
[ 2960.817338] blk_flush_plug_list+0x11c/0x2b0
[ 2960.817344] blk_finish_plug+0x58/0x80
[ 2960.817348] __do_page_cache_readahead+0x1c0/0x2e0
[ 2960.817352] force_page_cache_readahead+0x68/0xd0
[ 2960.817356] generic_file_read_iter+0x43c/0x6a0
[ 2960.817359] blkdev_read_iter+0x68/0xa0
[ 2960.817361] __vfs_read+0x11c/0x180
[ 2960.817364] vfs_read+0xa4/0x1c0
[ 2960.817366] SyS_read+0x6c/0x110
[ 2960.817369] system_call+0x38/0xb4

The SCSI blk_mq stack assumes that sg_tablesize is always a non-zero
value with scsi_mq_setup_tags() allocating tags using sg_tablesize.
The cxlflash driver currently uses SG_NONE (0) for the sg_tablesize
as the devices it supports are not capable of scatter gather. This
mismatch of values results in the Oops above.

To resolve this issue, sg_tablesize for cxlflash can simply be set
to 1, a value which satisfies the constraints in cxlflash and the
lack of support of SG_NONE in SCSI blk_mq.

Signed-off-by: Uma Krishnan 
Acked-by: Matthew R. Ochs 
---
 drivers/scsi/cxlflash/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index b301655..6004860 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -2377,7 +2377,7 @@ static struct scsi_host_template driver_template = {
.cmd_per_lun = CXLFLASH_MAX_CMDS_PER_LUN,
.can_queue = CXLFLASH_MAX_CMDS,
.this_id = -1,
-   .sg_tablesize = SG_NONE,/* No scatter gather support */
+   .sg_tablesize = 1,  /* No scatter gather support */
.max_sectors = CXLFLASH_MAX_SECTORS,
.use_clustering = ENABLE_CLUSTERING,
.shost_attrs = cxlflash_host_attrs,
-- 
2.1.0



[PATCH v2 00/14] cxlflash: Fixes, enhancements, cleanup and staging

2016-11-28 Thread Uma Krishnan
The first four patches in this patch series include fixes for command
room violation and lun table management.

The remaining patches remove the reliance upon an internally maintained
private command pool in favor of private commands being allocated
alongside the SCSI commands. Several cleanup opportunities were noticed
while removing the private command pool infrastructure and have been
included as well. Lastly, the final two patches provide staging for
supporting hardware with a different queuing model.

The series is based upon 4.9-rc5, intended for 4.10 and is bisectable.

v2 Changes:
- Incorporate comments from Matthew R. Ochs
- Updated command room violation patch to use a spin lock

Matthew R. Ochs (10):
  cxlflash: Remove unused buffer from AFU command
  cxlflash: Allocate memory instead of using command pool for AFU sync
  cxlflash: Use cmd_size for private commands
  cxlflash: Remove private command pool
  cxlflash: Wait for active AFU commands to timeout upon tear down
  cxlflash: Remove AFU command lock
  cxlflash: Cleanup send_tmf()
  cxlflash: Cleanup queuecommand()
  cxlflash: Migrate IOARRIN specific routines to function pointers
  cxlflash: Migrate scsi command pointer to AFU command

Uma Krishnan (4):
  cxlflash: Set sg_tablesize to 1 instead of SG_NONE
  cxlflash: Fix crash in cxlflash_restore_luntable()
  cxlflash: Improve context_reset() logic
  cxlflash: Avoid command room violation

 drivers/scsi/cxlflash/common.h  |  39 ++--
 drivers/scsi/cxlflash/lunmgt.c  |   6 +
 drivers/scsi/cxlflash/main.c| 410 ++--
 drivers/scsi/cxlflash/sislite.h |   2 +-
 4 files changed, 130 insertions(+), 327 deletions(-)

-- 
2.1.0



Re: [mm v2 0/3] Support memory cgroup hotplug

2016-11-28 Thread Balbir Singh


On 29/11/16 08:10, Tejun Heo wrote:
> On Thu, Nov 24, 2016 at 12:05:12AM +1100, Balbir Singh wrote:
>> On my desktop NODES_SHIFT is 6, many distro kernels have it a 9. I've known
>> of solutions that use fake NUMA for partitioning and need as many nodes as
>> possible.
> 
> It was a crude kludge that people used before memcg.  If people still
> use it, that's fine but we don't want to optimize / make code
> complicated for it, so let's please put away this part of
> justification.

Are you suggesting those use cases can be ignored now?

> 
> It's understandable that some kernels want to have large NODES_SHIFT
> to support wide range of configurations but if that makes wastage too
> high, the simpler solution is updating the users to use the rumtime
> detected possible number / mask instead of the compile time
> NODES_SHIFT.  Note that we do exactly the same thing for per-cpu
> things - we configure high max but do all operations on what's
> possible on the system.
> 
> NUMA code already has possible detection.  Why not simply make memcg
> use those instead of MAX_NUMNODES like how we use nr_cpu_ids instead
> of NR_CPUS?
> 

nodes_possible_map is set to node_online_map at the moment for ppc64.
Which becomes a problem when hotplugging a node that was not already
online.

I am not sure what you mean by possible detection. node_possible_map
is set based on CONFIG_NODE_SHIFT and then can be adjusted by the
architecture (if desired). Are you suggesting firmware populate it
in?

Thanks,
Balbir Singh


Re: [mm v2 0/3] Support memory cgroup hotplug

2016-11-28 Thread Tejun Heo
On Thu, Nov 24, 2016 at 12:05:12AM +1100, Balbir Singh wrote:
> On my desktop NODES_SHIFT is 6, many distro kernels have it a 9. I've known
> of solutions that use fake NUMA for partitioning and need as many nodes as
> possible.

It was a crude kludge that people used before memcg.  If people still
use it, that's fine but we don't want to optimize / make code
complicated for it, so let's please put away this part of
justification.

It's understandable that some kernels want to have large NODES_SHIFT
to support wide range of configurations but if that makes wastage too
high, the simpler solution is updating the users to use the rumtime
detected possible number / mask instead of the compile time
NODES_SHIFT.  Note that we do exactly the same thing for per-cpu
things - we configure high max but do all operations on what's
possible on the system.

NUMA code already has possible detection.  Why not simply make memcg
use those instead of MAX_NUMNODES like how we use nr_cpu_ids instead
of NR_CPUS?

Thanks.

-- 
tejun


[PATCH v2] adb: properly mark continued kernel messages

2016-11-28 Thread Andreas Schwab
Use pr_cont where appropriate, and switch to pr_foo throughout.
Additionally, lower messages in adb_probe_task to debug level.

Signed-off-by: Andreas Schwab 
---
 drivers/macintosh/adb.c| 24 +---
 drivers/macintosh/adbhid.c | 56 +++---
 2 files changed, 39 insertions(+), 41 deletions(-)

diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
index 226179b975..5abc265383 100644
--- a/drivers/macintosh/adb.c
+++ b/drivers/macintosh/adb.c
@@ -206,18 +206,17 @@ static int adb_scan_bus(void)
}
 
/* Now fill in the handler_id field of the adb_handler entries. */
-   printk(KERN_DEBUG "adb devices:");
+   pr_debug("adb devices:\n");
for (i = 1; i < 16; i++) {
if (adb_handler[i].original_address == 0)
continue;
adb_request(, NULL, ADBREQ_SYNC | ADBREQ_REPLY, 1,
(i << 4) | 0xf);
adb_handler[i].handler_id = req.reply[2];
-   printk(" [%d]: %d %x", i, adb_handler[i].original_address,
-  adb_handler[i].handler_id);
+   pr_debug(" [%d]: %d %x\n", i, adb_handler[i].original_address,
+adb_handler[i].handler_id);
devmask |= 1 << i;
}
-   printk("\n");
return devmask;
 }
 
@@ -228,9 +227,9 @@ static int adb_scan_bus(void)
 static int
 adb_probe_task(void *x)
 {
-   printk(KERN_INFO "adb: starting probe task...\n");
+   pr_debug("adb: starting probe task...\n");
do_adb_reset_bus();
-   printk(KERN_INFO "adb: finished probe task...\n");
+   pr_debug("adb: finished probe task...\n");
 
up(_probe_mutex);
 
@@ -340,7 +339,7 @@ static int __init adb_init(void)
adb_controller->init())
adb_controller = NULL;
if (adb_controller == NULL) {
-   printk(KERN_WARNING "Warning: no ADB interface detected\n");
+   pr_warn("Warning: no ADB interface detected\n");
} else {
 #ifdef CONFIG_PPC
if (of_machine_is_compatible("AAPL,PowerBook1998") ||
@@ -483,8 +482,7 @@ adb_register(int default_id, int handler_id, struct adb_ids 
*ids,
(!handler_id || (handler_id == adb_handler[i].handler_id) 
|| 
try_handler_change(i, handler_id))) {
if (adb_handler[i].handler != 0) {
-   printk(KERN_ERR
-  "Two handlers for ADB device %d\n",
+   pr_err("Two handlers for ADB device %d\n",
   default_id);
continue;
}
@@ -538,10 +536,10 @@ adb_input(unsigned char *buf, int nb, int autopoll)

id = buf[0] >> 4;
if (dump_adb_input) {
-   printk(KERN_INFO "adb packet: ");
+   pr_info("adb packet: ");
for (i = 0; i < nb; ++i)
-   printk(" %x", buf[i]);
-   printk(", id = %d\n", id);
+   pr_cont(" %x", buf[i]);
+   pr_cont(", id = %d\n", id);
}
write_lock_irqsave(_handler_lock, flags);
handler = adb_handler[id].handler;
@@ -891,7 +889,7 @@ static void __init
 adbdev_init(void)
 {
if (register_chrdev(ADB_MAJOR, "adb", _fops)) {
-   printk(KERN_ERR "adb: unable to get major %d\n", ADB_MAJOR);
+   pr_err("adb: unable to get major %d\n", ADB_MAJOR);
return;
}
 
diff --git a/drivers/macintosh/adbhid.c b/drivers/macintosh/adbhid.c
index 09d72bb00d..3c37e54b0b 100644
--- a/drivers/macintosh/adbhid.c
+++ b/drivers/macintosh/adbhid.c
@@ -267,7 +267,7 @@ adbhid_keyboard_input(unsigned char *data, int nb, int 
apoll)
int id = (data[0] >> 4) & 0x0f;
 
if (!adbhid[id]) {
-   printk(KERN_ERR "ADB HID on ID %d not yet registered, packet 
%#02x, %#02x, %#02x, %#02x\n",
+   pr_err("ADB HID on ID %d not yet registered, packet %#02x, 
%#02x, %#02x, %#02x\n",
   id, data[0], data[1], data[2], data[3]);
return;
}
@@ -319,8 +319,8 @@ adbhid_input_keycode(int id, int scancode, int repeat)
ahid->flags &= ~FLAG_CAPSLOCK_TRANSLATE;
}
} else {
-   printk(KERN_INFO "Spurious caps lock event "
-"(scancode 0xff).\n");
+   pr_info("Spurious caps lock event "
+   "(scancode 0xff).\n");
}
}
}
@@ -396,8 +396,8 @@ adbhid_input_keycode(int id, int scancode, int repeat)
input_report_key(adbhid[id]->input, key, !up_flag);

Re: [PATCH net 11/16] net: ethernet: marvell: mvneta: fix fixed-link phydev leaks

2016-11-28 Thread Thomas Petazzoni
Hello,

On Mon, 28 Nov 2016 19:25:04 +0100, Johan Hovold wrote:
> Make sure to deregister and free any fixed-link PHY registered using
> of_phy_register_fixed_link() on probe errors and on driver unbind.
> 
> Fixes: 83895bedeee6 ("net: mvneta: add support for fixed links")
> Signed-off-by: Johan Hovold 
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Thomas Petazzoni 

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


[PATCH net 12/16] net: ethernet: mediatek: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on initialisation errors and on uninit.

Fixes: 0c72c50f6f93 ("net-next: mediatek: add fixed-phy support")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 4a62ffd7729d..86a89cbd3ec9 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -318,6 +318,8 @@ static int mtk_phy_connect(struct net_device *dev)
return 0;
 
 err_phy:
+   if (of_phy_is_fixed_link(mac->of_node))
+   of_phy_deregister_fixed_link(mac->of_node);
of_node_put(np);
dev_err(eth->dev, "%s: invalid phy\n", __func__);
return -EINVAL;
@@ -1923,6 +1925,8 @@ static void mtk_uninit(struct net_device *dev)
struct mtk_eth *eth = mac->hw;
 
phy_disconnect(dev->phydev);
+   if (of_phy_is_fixed_link(mac->of_node))
+   of_phy_deregister_fixed_link(mac->of_node);
mtk_irq_disable(eth, MTK_QDMA_INT_MASK, ~0);
mtk_irq_disable(eth, MTK_PDMA_INT_MASK, ~0);
 }
-- 
2.7.3



[PATCH] mail: improve bash completion

2016-11-28 Thread Greg Kurz
This adds completion of the --to, --cc and --bcc options based on the
content of the [mail "alias"] section of GIT configuration files.

Signed-off-by: Greg Kurz 
---
 stgit/argparse.py  |1 +
 stgit/commands/mail.py |6 +++---
 stgit/completion.py|2 ++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/stgit/argparse.py b/stgit/argparse.py
index 43c6cf93..86f4a1095d10 100644
--- a/stgit/argparse.py
+++ b/stgit/argparse.py
@@ -286,6 +286,7 @@ known_files = Compgen(['$(_known_files)'])
 repo = Compgen(actions = ['directory'])
 dir = Compgen(actions = ['directory'])
 files = Compgen(actions = ['file'])
+mail_aliases = Compgen(['$(_mail_aliases)'])
 def strings(*ss): return Compgen(ss)
 class patch_range(CompgenBase):
 def __init__(self, *endpoints):
diff --git a/stgit/commands/mail.py b/stgit/commands/mail.py
index ff9f2e11ca36..2dc3dba6680d 100644
--- a/stgit/commands/mail.py
+++ b/stgit/commands/mail.py
@@ -102,11 +102,11 @@ args = [argparse.patch_range(argparse.applied_patches,
 options = [
 opt('-a', '--all', action = 'store_true',
 short = 'E-mail all the applied patches'),
-opt('--to', action = 'append',
+opt('--to', action = 'append', args = [argparse.mail_aliases],
 short = 'Add TO to the To: list'),
-opt('--cc', action = 'append',
+opt('--cc', action = 'append', args = [argparse.mail_aliases],
 short = 'Add CC to the Cc: list'),
-opt('--bcc', action = 'append',
+opt('--bcc', action = 'append', args = [argparse.mail_aliases],
 short = 'Add BCC to the Bcc: list'),
 opt('--auto', action = 'store_true',
 short = 'Automatically cc the patch signers'),
diff --git a/stgit/completion.py b/stgit/completion.py
index 38f06705b76a..010fc4fbee0c 100644
--- a/stgit/completion.py
+++ b/stgit/completion.py
@@ -58,6 +58,8 @@ def util():
  'esac'),
  fun('_stg_branches',
  'local g=$(_gitdir)', 'test "$g" && (cd $g/patches/ && echo *)'),
+ fun('_mail_aliases',
+ 'git config --name-only --get-regexp "^mail\.alias\." | cut -d. 
-f 3'),
  ref_list_fun('_all_branches', 'refs/heads'),
  ref_list_fun('_tags', 'refs/tags'),
  ref_list_fun('_remotes', 'refs/remotes')]



[PATCH net 15/16] net: ethernet: ti: davinci_emac: fix fixed-link phydev and of-node leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Also remember to put the of-node reference on probe errors.

Fixes: 1bb6aa56bb38 ("net: davinci_emac: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/ti/davinci_emac.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index 84fbe5714f8b..481c7bf0395b 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1767,6 +1767,7 @@ static int davinci_emac_try_get_mac(struct 
platform_device *pdev,
  */
 static int davinci_emac_probe(struct platform_device *pdev)
 {
+   struct device_node *np = pdev->dev.of_node;
int rc = 0;
struct resource *res, *res_ctrl;
struct net_device *ndev;
@@ -1805,7 +1806,7 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
if (!pdata) {
dev_err(>dev, "no platform data\n");
rc = -ENODEV;
-   goto no_pdata;
+   goto err_free_netdev;
}
 
/* MAC addr and PHY mask , RMII enable info from platform_data */
@@ -1941,6 +1942,10 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
cpdma_chan_destroy(priv->rxchan);
cpdma_ctlr_destroy(priv->dma);
 no_pdata:
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
+   of_node_put(priv->phy_node);
+err_free_netdev:
free_netdev(ndev);
return rc;
 }
@@ -1956,6 +1961,7 @@ static int davinci_emac_remove(struct platform_device 
*pdev)
 {
struct net_device *ndev = platform_get_drvdata(pdev);
struct emac_priv *priv = netdev_priv(ndev);
+   struct device_node *np = pdev->dev.of_node;
 
dev_notice(>dev, "DaVinci EMAC: davinci_emac_remove()\n");
 
@@ -1968,6 +1974,8 @@ static int davinci_emac_remove(struct platform_device 
*pdev)
unregister_netdev(ndev);
of_node_put(priv->phy_node);
pm_runtime_disable(>dev);
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
free_netdev(ndev);
 
return 0;
-- 
2.7.3



[PATCH net 16/16] net: dsa: slave: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on slave-setup errors and on slave destroy.

Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Johan Hovold 
---
 net/dsa/slave.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2a5c20a13fe4..30e2e21d7619 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1177,6 +1177,8 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
ret = dsa_slave_phy_connect(p, slave_dev, p->port);
if (ret) {
netdev_err(slave_dev, "failed to connect to port %d: 
%d\n", p->port, ret);
+   if (phy_is_fixed)
+   of_phy_deregister_fixed_link(port_dn);
return ret;
}
}
@@ -1292,10 +1294,18 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
device *parent,
 void dsa_slave_destroy(struct net_device *slave_dev)
 {
struct dsa_slave_priv *p = netdev_priv(slave_dev);
+   struct dsa_switch *ds = p->parent;
+   struct device_node *port_dn;
+
+   port_dn = ds->ports[p->port].dn;
 
netif_carrier_off(slave_dev);
-   if (p->phy)
+   if (p->phy) {
phy_disconnect(p->phy);
+
+   if (of_phy_is_fixed_link(port_dn))
+   of_phy_deregister_fixed_link(port_dn);
+   }
unregister_netdev(slave_dev);
free_netdev(slave_dev);
 }
-- 
2.7.3



[PATCH net 14/16] net: ethernet: dwc_eth_qos: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 077742dac2c7 ("dwc_eth_qos: Add support for Synopsys DWC Ethernet
QoS")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 4ba2421e625d..97d64bfed465 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -2881,7 +2881,7 @@ static int dwceqos_probe(struct platform_device *pdev)
ret = of_get_phy_mode(lp->pdev->dev.of_node);
if (ret < 0) {
dev_err(>pdev->dev, "error in getting phy i/f\n");
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
 
lp->phy_interface = ret;
@@ -2889,14 +2889,14 @@ static int dwceqos_probe(struct platform_device *pdev)
ret = dwceqos_mii_init(lp);
if (ret) {
dev_err(>pdev->dev, "error in dwceqos_mii_init\n");
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
 
ret = dwceqos_mii_probe(ndev);
if (ret != 0) {
netdev_err(ndev, "mii_probe fail.\n");
ret = -ENXIO;
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
 
dwceqos_set_umac_addr(lp, lp->ndev->dev_addr, 0);
@@ -2914,7 +2914,7 @@ static int dwceqos_probe(struct platform_device *pdev)
if (ret) {
dev_err(>pdev->dev, "Unable to retrieve DT, error %d\n",
ret);
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
dev_info(>pdev->dev, "pdev->id %d, baseaddr 0x%08lx, irq %d\n",
 pdev->id, ndev->base_addr, ndev->irq);
@@ -2924,7 +2924,7 @@ static int dwceqos_probe(struct platform_device *pdev)
if (ret) {
dev_err(>pdev->dev, "Unable to request IRQ %d, error %d\n",
ndev->irq, ret);
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
 
if (netif_msg_probe(lp))
@@ -2935,11 +2935,14 @@ static int dwceqos_probe(struct platform_device *pdev)
ret = register_netdev(ndev);
if (ret) {
dev_err(>dev, "Cannot register net device, aborting.\n");
-   goto err_out_clk_dis_phy;
+   goto err_out_deregister_fixed_link;
}
 
return 0;
 
+err_out_deregister_fixed_link:
+   if (of_phy_is_fixed_link(pdev->dev.of_node))
+   of_phy_deregister_fixed_link(pdev->dev.of_node);
 err_out_clk_dis_phy:
clk_disable_unprepare(lp->phy_ref_clk);
 err_out_clk_dis_aper:
@@ -2959,8 +2962,11 @@ static int dwceqos_remove(struct platform_device *pdev)
if (ndev) {
lp = netdev_priv(ndev);
 
-   if (ndev->phydev)
+   if (ndev->phydev) {
phy_disconnect(ndev->phydev);
+   if (of_phy_is_fixed_link(pdev->dev.of_node))
+   of_phy_deregister_fixed_link(pdev->dev.of_node);
+   }
mdiobus_unregister(lp->mii_bus);
mdiobus_free(lp->mii_bus);
 
-- 
2.7.3



[PATCH net 13/16] net: ethernet: renesas: ravb: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on initialisation errors and on device
close after having disconnected the PHY.

Fixes: b4bc88a868ed ("ravb: Add fixed-link support")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/renesas/ravb_main.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 630536bc72f9..f1f3be2cfe21 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1008,7 +1008,8 @@ static int ravb_phy_init(struct net_device *ndev)
of_node_put(pn);
if (!phydev) {
netdev_err(ndev, "failed to connect PHY\n");
-   return -ENOENT;
+   err = -ENOENT;
+   goto err_deregister_fixed_link;
}
 
/* This driver only support 10/100Mbit speeds on Gen3
@@ -1020,8 +1021,7 @@ static int ravb_phy_init(struct net_device *ndev)
err = phy_set_max_speed(phydev, SPEED_100);
if (err) {
netdev_err(ndev, "failed to limit PHY to 100Mbit/s\n");
-   phy_disconnect(phydev);
-   return err;
+   goto err_phy_disconnect;
}
 
netdev_info(ndev, "limited PHY to 100Mbit/s\n");
@@ -1033,6 +1033,14 @@ static int ravb_phy_init(struct net_device *ndev)
phy_attached_info(phydev);
 
return 0;
+
+err_phy_disconnect:
+   phy_disconnect(phydev);
+err_deregister_fixed_link:
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
+
+   return err;
 }
 
 /* PHY control start function */
@@ -1634,6 +1642,7 @@ static void ravb_set_rx_mode(struct net_device *ndev)
 /* Device close function for Ethernet AVB */
 static int ravb_close(struct net_device *ndev)
 {
+   struct device_node *np = ndev->dev.parent->of_node;
struct ravb_private *priv = netdev_priv(ndev);
struct ravb_tstamp_skb *ts_skb, *ts_skb2;
 
@@ -1663,6 +1672,8 @@ static int ravb_close(struct net_device *ndev)
if (ndev->phydev) {
phy_stop(ndev->phydev);
phy_disconnect(ndev->phydev);
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
}
 
if (priv->chip_id != RCAR_GEN2) {
-- 
2.7.3



[PATCH net 08/16] net: ethernet: fs_enet: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: bb74d9a4a87b ("fs_enet: use the new fixed PHY helpers")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c 
b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index dc120c148d97..4b86260584a0 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -980,7 +980,7 @@ static int fs_enet_probe(struct platform_device *ofdev)
err = clk_prepare_enable(clk);
if (err) {
ret = err;
-   goto out_free_fpi;
+   goto out_deregister_fixed_link;
}
fpi->clk_per = clk;
}
@@ -1061,6 +1061,9 @@ static int fs_enet_probe(struct platform_device *ofdev)
of_node_put(fpi->phy_node);
if (fpi->clk_per)
clk_disable_unprepare(fpi->clk_per);
+out_deregister_fixed_link:
+   if (of_phy_is_fixed_link(ofdev->dev.of_node))
+   of_phy_deregister_fixed_link(ofdev->dev.of_node);
 out_free_fpi:
kfree(fpi);
return ret;
@@ -1079,6 +1082,8 @@ static int fs_enet_remove(struct platform_device *ofdev)
of_node_put(fep->fpi->phy_node);
if (fep->fpi->clk_per)
clk_disable_unprepare(fep->fpi->clk_per);
+   if (of_phy_is_fixed_link(ofdev->dev.of_node))
+   of_phy_deregister_fixed_link(ofdev->dev.of_node);
free_netdev(ndev);
return 0;
 }
-- 
2.7.3



[PATCH net 10/16] net: ethernet: ucc_geth: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 87009814cdbb ("ucc_geth: use the new fixed PHY helpers")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
b/drivers/net/ethernet/freescale/ucc_geth.c
index 186ef8f16c80..f76d33279454 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3868,9 +3868,8 @@ static int ucc_geth_probe(struct platform_device* ofdev)
dev = alloc_etherdev(sizeof(*ugeth));
 
if (dev == NULL) {
-   of_node_put(ug_info->tbi_node);
-   of_node_put(ug_info->phy_node);
-   return -ENOMEM;
+   err = -ENOMEM;
+   goto err_deregister_fixed_link;
}
 
ugeth = netdev_priv(dev);
@@ -3907,10 +3906,7 @@ static int ucc_geth_probe(struct platform_device* ofdev)
if (netif_msg_probe(ugeth))
pr_err("%s: Cannot register net device, aborting\n",
   dev->name);
-   free_netdev(dev);
-   of_node_put(ug_info->tbi_node);
-   of_node_put(ug_info->phy_node);
-   return err;
+   goto err_free_netdev;
}
 
mac_addr = of_get_mac_address(np);
@@ -3923,16 +3919,29 @@ static int ucc_geth_probe(struct platform_device* ofdev)
ugeth->node = np;
 
return 0;
+
+err_free_netdev:
+   free_netdev(dev);
+err_deregister_fixed_link:
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
+   of_node_put(ug_info->tbi_node);
+   of_node_put(ug_info->phy_node);
+
+   return err;
 }
 
 static int ucc_geth_remove(struct platform_device* ofdev)
 {
struct net_device *dev = platform_get_drvdata(ofdev);
struct ucc_geth_private *ugeth = netdev_priv(dev);
+   struct device_node *np = ofdev->dev.of_node;
 
unregister_netdev(dev);
free_netdev(dev);
ucc_geth_memclean(ugeth);
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
of_node_put(ugeth->ug_info->tbi_node);
of_node_put(ugeth->ug_info->phy_node);
 
-- 
2.7.3



[PATCH net 07/16] net: ethernet: fec: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 407066f8f371 ("net: fec: Support phys probed from devicetree and
fixed-link")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/freescale/fec_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5aa9d4ded214..74dcdf097348 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3475,6 +3475,8 @@ fec_probe(struct platform_device *pdev)
 failed_clk_ipg:
fec_enet_clk_enable(ndev, false);
 failed_clk:
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
 failed_phy:
of_node_put(phy_node);
 failed_ioremap:
@@ -3488,6 +3490,7 @@ fec_drv_remove(struct platform_device *pdev)
 {
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
+   struct device_node *np = pdev->dev.of_node;
 
cancel_work_sync(>tx_timeout_work);
fec_ptp_stop(pdev);
@@ -3495,6 +3498,8 @@ fec_drv_remove(struct platform_device *pdev)
fec_enet_mii_remove(fep);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
of_node_put(fep->phy_node);
free_netdev(ndev);
 
-- 
2.7.3



[PATCH net 11/16] net: ethernet: marvell: mvneta: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 83895bedeee6 ("net: mvneta: add support for fixed links")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/marvell/mvneta.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 0c0a45af950f..707bc4680b9b 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4191,6 +4191,8 @@ static int mvneta_probe(struct platform_device *pdev)
clk_disable_unprepare(pp->clk);
 err_put_phy_node:
of_node_put(phy_node);
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
 err_free_irq:
irq_dispose_mapping(dev->irq);
 err_free_netdev:
@@ -4202,6 +4204,7 @@ static int mvneta_probe(struct platform_device *pdev)
 static int mvneta_remove(struct platform_device *pdev)
 {
struct net_device  *dev = platform_get_drvdata(pdev);
+   struct device_node *dn = pdev->dev.of_node;
struct mvneta_port *pp = netdev_priv(dev);
 
unregister_netdev(dev);
@@ -4209,6 +4212,8 @@ static int mvneta_remove(struct platform_device *pdev)
clk_disable_unprepare(pp->clk);
free_percpu(pp->ports);
free_percpu(pp->stats);
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
irq_dispose_mapping(dev->irq);
of_node_put(pp->phy_node);
free_netdev(dev);
-- 
2.7.3



[PATCH net 06/16] net: ethernet: bcmgenet: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Note that we're still leaking any fixed-link PHY registered in the
non-OF probe path.

Fixes: 9abf0c2b717a ("net: bcmgenet: use the new fixed PHY helpers")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/broadcom/genet/bcmmii.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 2e745bd51df4..e87607621e62 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -627,6 +627,7 @@ static int bcmgenet_mii_bus_init(struct bcmgenet_priv *priv)
 int bcmgenet_mii_init(struct net_device *dev)
 {
struct bcmgenet_priv *priv = netdev_priv(dev);
+   struct device_node *dn = priv->pdev->dev.of_node;
int ret;
 
ret = bcmgenet_mii_alloc(priv);
@@ -640,6 +641,8 @@ int bcmgenet_mii_init(struct net_device *dev)
return 0;
 
 out:
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
of_node_put(priv->phy_dn);
mdiobus_unregister(priv->mii_bus);
mdiobus_free(priv->mii_bus);
@@ -649,7 +652,10 @@ int bcmgenet_mii_init(struct net_device *dev)
 void bcmgenet_mii_exit(struct net_device *dev)
 {
struct bcmgenet_priv *priv = netdev_priv(dev);
+   struct device_node *dn = priv->pdev->dev.of_node;
 
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
of_node_put(priv->phy_dn);
mdiobus_unregister(priv->mii_bus);
mdiobus_free(priv->mii_bus);
-- 
2.7.3



[PATCH net 09/16] net: ethernet: gianfar: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: be40364544bd ("gianfar: use the new fixed PHY helpers")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/freescale/gianfar.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 4b4f5bc0e279..9061c2f82b9c 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1312,6 +1312,7 @@ static void gfar_init_addr_hash_table(struct gfar_private 
*priv)
  */
 static int gfar_probe(struct platform_device *ofdev)
 {
+   struct device_node *np = ofdev->dev.of_node;
struct net_device *dev = NULL;
struct gfar_private *priv = NULL;
int err = 0, i;
@@ -1462,6 +1463,8 @@ static int gfar_probe(struct platform_device *ofdev)
return 0;
 
 register_fail:
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
unmap_group_regs(priv);
gfar_free_rx_queues(priv);
gfar_free_tx_queues(priv);
@@ -1474,11 +1477,16 @@ static int gfar_probe(struct platform_device *ofdev)
 static int gfar_remove(struct platform_device *ofdev)
 {
struct gfar_private *priv = platform_get_drvdata(ofdev);
+   struct device_node *np = ofdev->dev.of_node;
 
of_node_put(priv->phy_node);
of_node_put(priv->tbi_node);
 
unregister_netdev(priv->ndev);
+
+   if (of_phy_is_fixed_link(np))
+   of_phy_deregister_fixed_link(np);
+
unmap_group_regs(priv);
gfar_free_rx_queues(priv);
gfar_free_tx_queues(priv);
-- 
2.7.3



[PATCH net 05/16] net: ethernet: bcmsysport: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 186534a3f832 ("net: systemport: use the new fixed PHY helpers")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index c3354b9941d1..25d1eb4933d0 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1755,13 +1755,13 @@ static int bcm_sysport_probe(struct platform_device 
*pdev)
if (priv->irq0 <= 0 || priv->irq1 <= 0) {
dev_err(>dev, "invalid interrupts\n");
ret = -EINVAL;
-   goto err;
+   goto err_free_netdev;
}
 
priv->base = devm_ioremap_resource(>dev, r);
if (IS_ERR(priv->base)) {
ret = PTR_ERR(priv->base);
-   goto err;
+   goto err_free_netdev;
}
 
priv->netdev = dev;
@@ -1779,7 +1779,7 @@ static int bcm_sysport_probe(struct platform_device *pdev)
ret = of_phy_register_fixed_link(dn);
if (ret) {
dev_err(>dev, "failed to register fixed PHY\n");
-   goto err;
+   goto err_free_netdev;
}
 
priv->phy_dn = dn;
@@ -1821,7 +1821,7 @@ static int bcm_sysport_probe(struct platform_device *pdev)
ret = register_netdev(dev);
if (ret) {
dev_err(>dev, "failed to register net_device\n");
-   goto err;
+   goto err_deregister_fixed_link;
}
 
priv->rev = topctrl_readl(priv, REV_CNTL) & REV_MASK;
@@ -1832,7 +1832,11 @@ static int bcm_sysport_probe(struct platform_device 
*pdev)
 priv->base, priv->irq0, priv->irq1, txq, rxq);
 
return 0;
-err:
+
+err_deregister_fixed_link:
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
+err_free_netdev:
free_netdev(dev);
return ret;
 }
@@ -1840,11 +1844,14 @@ static int bcm_sysport_probe(struct platform_device 
*pdev)
 static int bcm_sysport_remove(struct platform_device *pdev)
 {
struct net_device *dev = dev_get_drvdata(>dev);
+   struct device_node *dn = pdev->dev.of_node;
 
/* Not much to do, ndo_close has been called
 * and we use managed allocations
 */
unregister_netdev(dev);
+   if (of_phy_is_fixed_link(dn))
+   of_phy_deregister_fixed_link(dn);
free_netdev(dev);
dev_set_drvdata(>dev, NULL);
 
-- 
2.7.3



[PATCH net 04/16] net: ethernet: aurora: nb8800: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: c7dfe3abf40e ("net: ethernet: nb8800: support fixed-link DT
node")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/aurora/nb8800.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index 00c38bf151e6..e078d8da978c 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1466,12 +1466,12 @@ static int nb8800_probe(struct platform_device *pdev)
 
ret = nb8800_hw_init(dev);
if (ret)
-   goto err_free_bus;
+   goto err_deregister_fixed_link;
 
if (ops && ops->init) {
ret = ops->init(dev);
if (ret)
-   goto err_free_bus;
+   goto err_deregister_fixed_link;
}
 
dev->netdev_ops = _netdev_ops;
@@ -1504,6 +1504,9 @@ static int nb8800_probe(struct platform_device *pdev)
 
 err_free_dma:
nb8800_dma_free(dev);
+err_deregister_fixed_link:
+   if (of_phy_is_fixed_link(pdev->dev.of_node))
+   of_phy_deregister_fixed_link(pdev->dev.of_node);
 err_free_bus:
of_node_put(priv->phy_node);
mdiobus_unregister(bus);
@@ -1521,6 +1524,8 @@ static int nb8800_remove(struct platform_device *pdev)
struct nb8800_priv *priv = netdev_priv(ndev);
 
unregister_netdev(ndev);
+   if (of_phy_is_fixed_link(pdev->dev.of_node))
+   of_phy_deregister_fixed_link(pdev->dev.of_node);
of_node_put(priv->phy_node);
 
mdiobus_unregister(priv->mii_bus);
-- 
2.7.3



[PATCH net 01/16] net: dsa: slave: fix of-node leak and phy priority

2016-11-28 Thread Johan Hovold
Make sure to drop the reference taken by of_parse_phandle() before
returning from dsa_slave_phy_setup().

Note that this also modifies the PHY priority so that any fixed-link
node is only parsed when no phy-handle is given, which is in accordance
with the common scheme for this.

Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Johan Hovold 
---
 net/dsa/slave.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6b1282c006b1..2a5c20a13fe4 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1125,7 +1125,7 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
p->phy_interface = mode;
 
phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
-   if (of_phy_is_fixed_link(port_dn)) {
+   if (!phy_dn && of_phy_is_fixed_link(port_dn)) {
/* In the case of a fixed PHY, the DT node associated
 * to the fixed PHY is the Port DT node
 */
@@ -1135,7 +1135,7 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
return ret;
}
phy_is_fixed = true;
-   phy_dn = port_dn;
+   phy_dn = of_node_get(port_dn);
}
 
if (ds->ops->get_phy_flags)
@@ -1154,6 +1154,7 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
ret = dsa_slave_phy_connect(p, slave_dev, phy_id);
if (ret) {
netdev_err(slave_dev, "failed to connect to 
phy%d: %d\n", phy_id, ret);
+   of_node_put(phy_dn);
return ret;
}
} else {
@@ -1162,6 +1163,8 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
phy_flags,
p->phy_interface);
}
+
+   of_node_put(phy_dn);
}
 
if (p->phy && phy_is_fixed)
-- 
2.7.3



[PATCH net 02/16] of_mdio: add helper to deregister fixed-link PHYs

2016-11-28 Thread Johan Hovold
Add helper to deregister fixed-link PHYs registered using
of_phy_register_fixed_link().

Convert the two drivers that care to deregister their fixed-link PHYs to
use the new helper, but note that most drivers currently fail to do so.

Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/ti/cpsw.c | 16 ++--
 drivers/of/of_mdio.c   | 15 +++
 include/linux/of_mdio.h|  4 
 net/dsa/dsa.c  | 12 ++--
 4 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 58947aae31c7..9f0646512624 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2459,20 +2459,8 @@ static void cpsw_remove_dt(struct platform_device *pdev)
if (strcmp(slave_node->name, "slave"))
continue;
 
-   if (of_phy_is_fixed_link(slave_node)) {
-   struct phy_device *phydev;
-
-   phydev = of_phy_find_device(slave_node);
-   if (phydev) {
-   fixed_phy_unregister(phydev);
-   /* Put references taken by
-* of_phy_find_device() and
-* of_phy_register_fixed_link().
-*/
-   phy_device_free(phydev);
-   phy_device_free(phydev);
-   }
-   }
+   if (of_phy_is_fixed_link(slave_node))
+   of_phy_deregister_fixed_link(slave_node);
 
of_node_put(slave_data->phy_node);
 
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 5a3145a02547..262281bd68fa 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -490,3 +490,18 @@ int of_phy_register_fixed_link(struct device_node *np)
return -ENODEV;
 }
 EXPORT_SYMBOL(of_phy_register_fixed_link);
+
+void of_phy_deregister_fixed_link(struct device_node *np)
+{
+   struct phy_device *phydev;
+
+   phydev = of_phy_find_device(np);
+   if (!phydev)
+   return;
+
+   fixed_phy_unregister(phydev);
+
+   put_device(>mdio.dev);  /* of_phy_find_device() */
+   phy_device_free(phydev);/* fixed_phy_register() */
+}
+EXPORT_SYMBOL(of_phy_deregister_fixed_link);
diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h
index 2ab233661ae5..a58cca8bcb29 100644
--- a/include/linux/of_mdio.h
+++ b/include/linux/of_mdio.h
@@ -29,6 +29,7 @@ struct phy_device *of_phy_attach(struct net_device *dev,
 extern struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np);
 extern int of_mdio_parse_addr(struct device *dev, const struct device_node 
*np);
 extern int of_phy_register_fixed_link(struct device_node *np);
+extern void of_phy_deregister_fixed_link(struct device_node *np);
 extern bool of_phy_is_fixed_link(struct device_node *np);
 
 #else /* CONFIG_OF */
@@ -83,6 +84,9 @@ static inline int of_phy_register_fixed_link(struct 
device_node *np)
 {
return -ENOSYS;
 }
+static inline void of_phy_deregister_fixed_link(struct device_node *np)
+{
+}
 static inline bool of_phy_is_fixed_link(struct device_node *np)
 {
return false;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index cb0091b99592..7899919cd9f0 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -506,16 +506,8 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
 
 void dsa_cpu_dsa_destroy(struct device_node *port_dn)
 {
-   struct phy_device *phydev;
-
-   if (of_phy_is_fixed_link(port_dn)) {
-   phydev = of_phy_find_device(port_dn);
-   if (phydev) {
-   fixed_phy_unregister(phydev);
-   put_device(>mdio.dev);
-   phy_device_free(phydev);
-   }
-   }
+   if (of_phy_is_fixed_link(port_dn))
+   of_phy_deregister_fixed_link(port_dn);
 }
 
 static void dsa_switch_destroy(struct dsa_switch *ds)
-- 
2.7.3



[PATCH net 00/16] net: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
This series fixes failures to deregister and free fixed-link phydevs
that have been registered using the of_phy_register_fixed_link()
interface.

All but two drivers currently fail to do this and this series fixes most
of them with the exception of a staging driver and the stmmac drivers
which will be fixed by follow-on patches.

Included are also a couple of fixes for related of-node leaks.

Note that all patches except the of_mdio one have been compile-tested
only.

Also note that the series is against net due to dependencies not yet in
net-next.

Johan


Johan Hovold (16):
  net: dsa: slave: fix of-node leak and phy priority
  of_mdio: add helper to deregister fixed-link PHYs
  net: ethernet: altera: fix fixed-link phydev leaks
  net: ethernet: aurora: nb8800: fix fixed-link phydev leaks
  net: ethernet: bcmsysport: fix fixed-link phydev leaks
  net: ethernet: bcmgenet: fix fixed-link phydev leaks
  net: ethernet: fec: fix fixed-link phydev leaks
  net: ethernet: fs_enet: fix fixed-link phydev leaks
  net: ethernet: gianfar: fix fixed-link phydev leaks
  net: ethernet: ucc_geth: fix fixed-link phydev leaks
  net: ethernet: marvell: mvneta: fix fixed-link phydev leaks
  net: ethernet: mediatek: fix fixed-link phydev leaks
  net: ethernet: renesas: ravb: fix fixed-link phydev leaks
  net: ethernet: dwc_eth_qos: fix fixed-link phydev leaks
  net: ethernet: ti: davinci_emac: fix fixed-link phydev and of-node
leaks
  net: dsa: slave: fix fixed-link phydev leaks

 drivers/net/ethernet/altera/altera_tse_main.c  |  9 -
 drivers/net/ethernet/aurora/nb8800.c   |  9 +++--
 drivers/net/ethernet/broadcom/bcmsysport.c | 17 +++-
 drivers/net/ethernet/broadcom/genet/bcmmii.c   |  6 ++
 drivers/net/ethernet/freescale/fec_main.c  |  5 +
 .../net/ethernet/freescale/fs_enet/fs_enet-main.c  |  7 ++-
 drivers/net/ethernet/freescale/gianfar.c   |  8 
 drivers/net/ethernet/freescale/ucc_geth.c  | 23 +++---
 drivers/net/ethernet/marvell/mvneta.c  |  5 +
 drivers/net/ethernet/mediatek/mtk_eth_soc.c|  4 
 drivers/net/ethernet/renesas/ravb_main.c   | 17 +---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c| 20 ---
 drivers/net/ethernet/ti/cpsw.c | 16 ++-
 drivers/net/ethernet/ti/davinci_emac.c | 10 +-
 drivers/of/of_mdio.c   | 15 ++
 include/linux/of_mdio.h|  4 
 net/dsa/dsa.c  | 12 ++-
 net/dsa/slave.c| 19 +++---
 18 files changed, 152 insertions(+), 54 deletions(-)

-- 
2.7.3



[PATCH net 03/16] net: ethernet: altera: fix fixed-link phydev leaks

2016-11-28 Thread Johan Hovold
Make sure to deregister and free any fixed-link PHY registered using
of_phy_register_fixed_link() on probe errors and on driver unbind.

Fixes: 7cdbc6f74f8e ("altera tse: add support for fixed-links.")
Signed-off-by: Johan Hovold 
---
 drivers/net/ethernet/altera/altera_tse_main.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c 
b/drivers/net/ethernet/altera/altera_tse_main.c
index bda31f308cc2..6532829b70d2 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -819,6 +819,8 @@ static int init_phy(struct net_device *dev)
 
if (!phydev) {
netdev_err(dev, "Could not find the PHY\n");
+   if (fixed_link)
+   of_phy_deregister_fixed_link(priv->device->of_node);
return -ENODEV;
}
 
@@ -1545,10 +1547,15 @@ static int altera_tse_probe(struct platform_device 
*pdev)
 static int altera_tse_remove(struct platform_device *pdev)
 {
struct net_device *ndev = platform_get_drvdata(pdev);
+   struct altera_tse_private *priv = netdev_priv(ndev);
 
-   if (ndev->phydev)
+   if (ndev->phydev) {
phy_disconnect(ndev->phydev);
 
+   if (of_phy_is_fixed_link(priv->device->of_node))
+   of_phy_deregister_fixed_link(priv->device->of_node);
+   }
+
platform_set_drvdata(pdev, NULL);
altera_tse_mdio_destroy(ndev);
unregister_netdev(ndev);
-- 
2.7.3



Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Nick Clifton
Hi Nicholas,

>> ... this actually seems like a better fix to me.  If you do not want the 
>> PT_INTERP segment, then telling this linker this is a good idea.  So wouldn't
>> a patch like this be a better solution to the problem ?
> 
> Yes, I wasn't asking for the binutils change to be reverted.

Oh right.  Actually it looks like at least part of the patch is going to have
to be reverted, (the part that sorts the PT_LOAD segments into ascending 
order), 
as currently it breaks building Linux  kernels.  *sigh*

> I don't think the
> boot wrapper is relying on this non-standard form. If we go with
> --no-dynamic-linker then I'm assuming we get a standard ELF binary?
> That seems desirable.

Yes, you definitely should get a proper ELF binary.

> I was just checking whether this is the best think for the kernel to do.
> Is it possible to get a similar behaviour using the linker script instead
> (so it's compatible with older binutils)?

Actually probably not.  :-(  Several backends, including the PPC, will now 
attempt to automatically create and install the PT_INTERP segment unless you
explicitly tell them not too.  Even if you are using a custom linker script.

Cheers
  Nick



Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Nick Clifton
Hi Nicholas,

>>> The boot wrapper performs its own relocations and does not require
>>> PT_INTERP segment.
>>>
>>> Without this option, binutils 2.28 and newer tries to create a program
>>> header segment due to PT_INTERP, and the link fails because there is no
>>> space for it.  
>>
>> 2.28 is not released yet though is it?
>>
>> So can we just declare versions with that behaviour broken?
> 
> No it's not released yet, but I don't know if it's due entirely to binutils
> bug. Let's see what Nick thinks.

Well the patch that caused this problem was an attempt to fix the linker so 
that it would enforce the ELF standard.  Prior to the patch the linker would 
silently create binaries that violated the standard and which, at least for
the people reporting the problem in binutils PR 20815, failed to execute.

It now appears however that some programs, including the boot wrapper and the
Linux kernel, may actually rely upon non-standard ELF binaries being created.
Before I revert the patch however, I would like to ask...

>>> +# Do not include PT_INTERP segment when linking pie. Non-pie linking
>>> +# just ignores this option.
>>> +LD_VERSION=$(${CROSS}ld --version | $srctree/scripts/ld-version.sh)
>>> +LD_NO_DL_MIN_VERSION=$(echo 2.26 | $srctree/scripts/ld-version.sh)
>>> +if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
>>> +   nodl="--no-dynamic-linker"
>>> +fi  

... this actually seems like a better fix to me.  If you do not want the 
PT_INTERP segment, then telling this linker this is a good idea.  So wouldn't
a patch like this be a better solution to the problem ?

Cheers
  Nick


[PATCH] powerpc/pserie: Use lmb_is_removable() to check removability

2016-11-28 Thread Nathan Fontenot
We should be using lmb_is_removable() to validate that enough LMBs
are available to remove when doing a remove by count. This will check
that the LMB is owned by the system and it is considered removable.
This patch also adds a pr_info() notification to report the LMB count
to remove was not satisfied.

What we do now is just check that there are enough LMBs owned by the
system when validating there are enough LMBs to remove. This can
lead to situations where there are enough LMBs owned by the system
but not enough that are considered removable. This results in having
to bail out of the remove operation instead of just failing the request
that we should have known wouldn't succeed.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 76ec104..2617f9f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -472,12 +472,15 @@ static int dlpar_memory_remove_by_count(u32 
lmbs_to_remove,
 
/* Validate that there are enough LMBs to satisfy the request */
for (i = 0; i < num_lmbs; i++) {
-   if (lmbs[i].flags & DRCONF_MEM_ASSIGNED)
+   if (lmb_is_removable([i]))
lmbs_available++;
}
 
-   if (lmbs_available < lmbs_to_remove)
+   if (lmbs_available < lmbs_to_remove) {
+   pr_info("Not enough LMBs available (%d of %d) to satisfy 
request\n",
+   lmbs_available, lmbs_to_remove);
return -EINVAL;
+   }
 
for (i = 0; i < num_lmbs && lmbs_removed < lmbs_to_remove; i++) {
rc = dlpar_remove_lmb([i]);



Re: [RFC][PATCH] Update ppc disassembly in xmon

2016-11-28 Thread Peter Bergner

On 11/23/16 11:52 PM, Balbir Singh wrote:



On 24/11/16 16:14, Andrew Donnellan wrote:

On 24/11/16 13:05, Balbir Singh wrote:

9. The license for these files is now GPL v3 or later


As much as I love the GPLv3, isn't this an instant NAK?



Thats why I called it out, my bad though I should have done
a stronger check than just "git grep "version 3"" which matched
scripts/kconfig/zconf.tab.c_shipped -- derived from bison, which
can itself be licensed differently

Michael, please hold these patches and don't merge them yet.


You should double check, but if all of the changes are from IBM
(ie, Alan and I), then as the authors of the changes, we have the
right to relicense the changes, so we could backport them as GPLv2.

Peter





Re: [PATCH] cpuidle/powernv: staticise powernv_idle_driver

2016-11-28 Thread Rafael J. Wysocki
On Tue, Nov 22, 2016 at 6:08 AM, Andrew Donnellan
 wrote:
> powernv_idle_driver isn't exported, it can be made static. Found by sparse.
>
> Signed-off-by: Andrew Donnellan 

Applied.

Thanks,
Rafael


Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Nicholas Piggin
On Mon, 28 Nov 2016 12:39:38 +
Nick Clifton  wrote:

> Hi Nicholas,
> 
> >>> The boot wrapper performs its own relocations and does not require
> >>> PT_INTERP segment.
> >>>
> >>> Without this option, binutils 2.28 and newer tries to create a program
> >>> header segment due to PT_INTERP, and the link fails because there is no
> >>> space for it.
> >>
> >> 2.28 is not released yet though is it?
> >>
> >> So can we just declare versions with that behaviour broken?  
> > 
> > No it's not released yet, but I don't know if it's due entirely to binutils
> > bug. Let's see what Nick thinks.  
> 
> Well the patch that caused this problem was an attempt to fix the linker so 
> that it would enforce the ELF standard.  Prior to the patch the linker would 
> silently create binaries that violated the standard and which, at least for
> the people reporting the problem in binutils PR 20815, failed to execute.
> 
> It now appears however that some programs, including the boot wrapper and the
> Linux kernel, may actually rely upon non-standard ELF binaries being created.
> Before I revert the patch however, I would like to ask...
> 
> >>> +# Do not include PT_INTERP segment when linking pie. Non-pie linking
> >>> +# just ignores this option.
> >>> +LD_VERSION=$(${CROSS}ld --version | $srctree/scripts/ld-version.sh)
> >>> +LD_NO_DL_MIN_VERSION=$(echo 2.26 | $srctree/scripts/ld-version.sh)
> >>> +if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
> >>> + nodl="--no-dynamic-linker"
> >>> +fi
> 
> ... this actually seems like a better fix to me.  If you do not want the 
> PT_INTERP segment, then telling this linker this is a good idea.  So wouldn't
> a patch like this be a better solution to the problem ?

Yes, I wasn't asking for the binutils change to be reverted. We're
generally happy to adapt to toolchain improvements. I don't think the
boot wrapper is relying on this non-standard form. If we go with
--no-dynamic-linker then I'm assuming we get a standard ELF binary?
That seems desirable.

I was just checking whether this is the best think for the kernel to do.
Is it possible to get a similar behaviour using the linker script instead
(so it's compatible with older binutils)?

Thanks,
Nick


Re: powerpc/eeh/of: use builtin_platform_driver

2016-11-28 Thread Michael Ellerman
On Wed, 2016-11-23 at 14:58:56 UTC, Geliang Tang wrote:
> Use builtin_platform_driver() helper to simplify the code.
> 
> Signed-off-by: Geliang Tang 
> Acked-by: Russell Currey 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ebb242d56bbe14af6ea25cf5e45440

cheers


Re: powerpc: Fix __cmpxchg() to take a volatile ptr again

2016-11-28 Thread Michael Ellerman
On Fri, 2016-11-25 at 03:07:09 UTC, Michael Ellerman wrote:
> In commit d0563a1297e2 ("powerpc: Implement {cmp}xchg for u8 and u16")
> we removed the volatile from __cmpxchg().
> 
> This is leading to warnings such as:
> 
>   drivers/gpu/drm/drm_lock.c: In function ‘drm_lock_take’:
>   arch/powerpc/include/asm/cmpxchg.h:484:37: warning: passing argument 1
>   of ‘__cmpxchg’ discards ‘volatile’ qualifier from pointer target
>  (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,   \
> 
> There doesn't seem to be consensus across architectures whether the
> argument is volatile or not, so at least for now put the volatile back.
> 
> Fixes: d0563a1297e2 ("powerpc: Implement {cmp}xchg for u8 and u16")
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/da58b23cb976ab83a80d358102e139

cheers


Re: [powerpc,v6,1/3] Setup AMOR in HV mode

2016-11-28 Thread Michael Ellerman
On Tue, 2016-11-15 at 06:56:14 UTC, Balbir Singh wrote:
> AMOR should be setup in HV mode, we set it up once
> and let the generic kernel handle IAMR. This patch is
> used to enable storage keys in a following patch as
> defined in ISA 3. We don't setup AMOR in DD1, since we
> can't setup IAMR in DD1 (bits have to be 0). If we setup
> AMOR some other code could potentially try to set IAMR
> (guest kernel for example).
> 
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Balbir Singh 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ee97b6b99f42285d29d439f2e5376e

cheers


Re: powernv: Clear SPRN_PSSCR when a POWER9 CPU comes online

2016-11-28 Thread Michael Ellerman
On Tue, 2016-11-22 at 18:06:40 UTC, "Gautham R. Shenoy" wrote:
> From: "Gautham R. Shenoy" 
> 
> Ensure that PSSCR is set to a safe value corresponding to no
> state-loss each time a POWER9 CPU comes online.
> 
> Signed-off-by: Gautham R. Shenoy 
> Acked-By: Michael Neuling 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/378f96d3cd442d5cb8e2692d8767a4

cheers


Re: cxl: drop duplicate header sched.h

2016-11-28 Thread Michael Ellerman
On Wed, 2016-11-23 at 15:27:38 UTC, Geliang Tang wrote:
> Drop duplicate header sched.h from native.c.
> 
> Signed-off-by: Geliang Tang 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7184bc2ddb15d9539c701668d6a5be

cheers


Re: powerpc/xmon: Add 'dt' command to dump trace buffers

2016-11-28 Thread Michael Ellerman
On Fri, 2015-11-06 at 02:21:17 UTC, Michael Ellerman wrote:
> There is a nice interface for asking ftrace to dump all its tracing
> buffers. The only down side for use in xmon is that it uses printk.
> Depending on circumstances printk may not work when in xmon, but it also
> may, so add a 'dt' command which dumps the ftrace buffers, and add a
> note to the help to mentiont that it uses printk.
> 
> Calling this routine also disables tracing, which is problematic if you
> return from xmon and expect the system to keep operating normally. So
> after we do the dump turn tracing back on.
> 
> Both functions already have nop versions defined for when ftrace is not
> enabled, so we don't need any extra #ifdefs.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/56144ec7c93f6f18aa878560074633

cheers


Re: powerpc/mm: Fixup kernel read only mapping

2016-11-28 Thread Michael Ellerman
On Thu, 2016-11-24 at 09:39:54 UTC, "Aneesh Kumar K.V" wrote:
> With commit e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") we started
> using the ppp value 0b110 to map kernel readonly. But that facility
> was only added as part of ISA 2.04. For earlier ISA version only supported ppp
> bit value for readonly mapping is 0b011. (This implies both user and kernel
> get mapped using the same ppp bit value for readonly mapping.).
> 
> Update the code such that for earlier architecture version we use ppp value
> 0b011 for readonly mapping. We don't differentiate between power5+ and power5
> here and apply the new ppp bits only from power6 (ISA 2.05). This keep the
> changes minimal.
> 
> This fixes issue with PS3 spu usage reported at
> https://lkml.kernel.org/r/rep.1421449714.ge...@infradead.org
> 
> Fixes: e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO")
> Tested-by: Geoff Levand 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/984d7a1ec67ce3a46324fa4bcb4c74

cheers


Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Nicholas Piggin
On Mon, 28 Nov 2016 22:07:39 +1100
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > The boot wrapper performs its own relocations and does not require
> > PT_INTERP segment.
> >
> > Without this option, binutils 2.28 and newer tries to create a program
> > header segment due to PT_INTERP, and the link fails because there is no
> > space for it.  
> 
> 2.28 is not released yet though is it?
> 
> So can we just declare versions with that behaviour broken?

No it's not released yet, but I don't know if it's due entirely to binutils
bug. Let's see what Nick thinks.

> 
> > diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
> > index 404b3aa..cd941a8 100755
> > --- a/arch/powerpc/boot/wrapper
> > +++ b/arch/powerpc/boot/wrapper
> > @@ -181,6 +181,13 @@ case "$elfformat" in
> >  elf32-powerpc) format=elf32ppc ;;
> >  esac
> >  
> > +# Do not include PT_INTERP segment when linking pie. Non-pie linking
> > +# just ignores this option.
> > +LD_VERSION=$(${CROSS}ld --version | $srctree/scripts/ld-version.sh)
> > +LD_NO_DL_MIN_VERSION=$(echo 2.26 | $srctree/scripts/ld-version.sh)
> > +if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
> > +   nodl="--no-dynamic-linker"
> > +fi  
> 
> Some distros (RHEL at least?), ship the wrapper as a standalone script.
> So I don't think we can call things in $srctree. Or at least I don't
> know how that's supposed to work when it's shipped standalone.
> We're also basically reinventing ld-option, which is a PITA.

Okay I didn't realize that. It's already using mkuboot.sh, but only
for uboot targets... I don't know, I don't have any good ideas at the
moment.

Maybe go back to using the linker script if possible? (I hadn't been
able to quite get it right yesterday, and this option seemed cleaner,
but it might be possible)

Thanks,
Nick


Re: [PATCH v7 4/7] powerpc/mm: Add radix__tlb_flush_pte

2016-11-28 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 3493cf4e0452..7648952e4f08 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -428,3 +428,21 @@ void radix__flush_tlb_all(void)
>: : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(0) : 
> "memory");
>   asm volatile("eieio; tlbsync; ptesync": : :"memory");
>  }
> +
> +void radix__flush_tlb_pte(unsigned long old_pte, struct mm_struct *mm,
> +   unsigned long address)
> +{
> + /*
> +  * We track page size in pte only for DD1, So we can
> +  * call this only on DD1.
> +  */
> + if (!cpu_has_feature(CPU_FTR_POWER9_DD1)) {
> + VM_WARN_ON(1);
> + return;
> + }

That's a bit gross but I guess it's OK.

How about we give the function a name that makes it obvious as well?

Like radix__flush_tlb_pte_p9_dd1() - ugly but unlikely anyone would call
it by accident outside of a workaround.

cheers


Re: [PATCH] powerpc/boot: request no dynamic linker for boot wrapper

2016-11-28 Thread Michael Ellerman
Nicholas Piggin  writes:

> The boot wrapper performs its own relocations and does not require
> PT_INTERP segment.
>
> Without this option, binutils 2.28 and newer tries to create a program
> header segment due to PT_INTERP, and the link fails because there is no
> space for it.

2.28 is not released yet though is it?

So can we just declare versions with that behaviour broken?

> diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
> index 404b3aa..cd941a8 100755
> --- a/arch/powerpc/boot/wrapper
> +++ b/arch/powerpc/boot/wrapper
> @@ -181,6 +181,13 @@ case "$elfformat" in
>  elf32-powerpc)   format=elf32ppc ;;
>  esac
>  
> +# Do not include PT_INTERP segment when linking pie. Non-pie linking
> +# just ignores this option.
> +LD_VERSION=$(${CROSS}ld --version | $srctree/scripts/ld-version.sh)
> +LD_NO_DL_MIN_VERSION=$(echo 2.26 | $srctree/scripts/ld-version.sh)
> +if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
> + nodl="--no-dynamic-linker"
> +fi

Some distros (RHEL at least?), ship the wrapper as a standalone script.
So I don't think we can call things in $srctree. Or at least I don't
know how that's supposed to work when it's shipped standalone.

We're also basically reinventing ld-option, which is a PITA.

cheers


Re: [PATCH kernel v6 5/7] vfio/spapr: Postpone default window creation

2016-11-28 Thread Alexey Kardashevskiy
On 25/11/16 22:37, David Gibson wrote:
> On Fri, Nov 25, 2016 at 05:38:26PM +1100, Alexey Kardashevskiy wrote:
>> On 25/11/16 15:39, David Gibson wrote:
>>> On Thu, Nov 24, 2016 at 04:48:08PM +1100, Alexey Kardashevskiy wrote:
 We are going to allow the userspace to configure container in
 one memory context and pass container fd to another so
 we are postponing memory allocations accounted against
 the locked memory limit. One of previous patches took care of
 it_userspace.

 At the moment we create the default DMA window when the first group is
 attached to a container; this is done for the userspace which is not
 DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory
 pre-registration - such client expects the default DMA window to exist.

 This postpones the default DMA window allocation till one of
 the folliwing happens:
 1. first map/unmap request arrives;
 2. new window is requested;
 This adds noop for the case when the userspace requested removal
 of the default window which has not been created yet.

 Signed-off-by: Alexey Kardashevskiy 
>>>
>>> Hmm.. it just occurred to me: why do you even need to delay creation
>>> of the default window?
>>
>>
>> Because I want to account the memory it uses in locked_vm of the mm which
>> will later be used for map/unmap.
> 
> Ah, good point.  How is the locked vm accounted for the non ddw case.


When ioctl(ENABLE) is processed, the SPAPR TCE driver increments it. The
DDW case does not use ENABLE, the memory preregistration is used instead.



-- 
Alexey



signature.asc
Description: OpenPGP digital signature


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-4.9-6 tag

2016-11-28 Thread Michael Ellerman
Linus Torvalds  writes:

> On Sat, Nov 26, 2016 at 12:11 AM, Michael Ellerman  
> wrote:
>>
>> powerpc fixes for 4.9 #6
>>
>> Fixes marked for stable:
>>  - Set missing wakeup bit in LPCR on POWER9 (Benjamin Herrenschmidt)
>>  - Fix the early OPAL console wrappers (Oliver O'Halloran)
>>  - Fixup kernel read only mapping (Aneesh Kumar K.V)
>>
>> Fixes for code merged this cycle:
>>  - Fix missing CRCs, add more asm-prototypes.h declarations (Nicholas Piggin)
>
> Pulled, but I wanted to talk about your merge "summary".
>
> Your merge summaries seem to be entirely automatically generated,
> which makes them less than great. I can see all that stuff in the git
> tree already, just formatting it differently isn't all that useful.

OK. The starting point is obviously an automatically generated list of
commits, but I have been editing that a fair bit to drop boring commits
and combine multiple commits into a single line, and then sort it by
topic area etc.

But obviously I'm not editing it enough, so I'll try to summarise it
much more heavily.

> For something like this late-rc pull when there are only a couple of
> commits, the end result actually ends up looking almost like a summary
> and all I did was remove the names that don't add to the description
> (and are in the git commits).

I actually do like to include the names, just to give people a bit of
acknowledgment in the pull request, but I can drop them if you prefer.
Or maybe I'll just include a credits section at the bottom of the tag
with everyone's name once, and you can drop that from the commit?

> For some of the bigger pull requests, the summary is almost anything
> but, and the only real value-add is the grouping by subject area.
>
> I really prefer a _summary_. Something that is human-legible. So that
> when people read the merge commit log, they get an overview of what
> changed, not a list of the details.

Right.

cheers


Re: [PATCH] crypto: vmx - rebuild generated asm when target changes

2016-11-28 Thread Nicholas Piggin
On Mon, 28 Nov 2016 13:51:36 +0530
"Naveen N. Rao"  wrote:

> On 2016/11/26 03:24PM, Nicholas Piggin wrote:
> > Switching from big endian to little endian can fail to regenerate
> > the crypto assembly properly. Switch to using standard form of
> > kbuild dependency checking (i.e., use FORCE and if_changed).
> > 
> > Signed-off-by: Nicholas Piggin   
> 
> Hi Nick,
> A similar patch is already in:
> https://mid.mail-archive.com/linux-crypto@vger.kernel.org/msg21855.html
> 
> - Naveen
> 

Hi Naveen,

I didn't notice your patch, thanks for pointing it out.

Thanks,
Nick


Re: [PATCH] crypto: vmx - rebuild generated asm when target changes

2016-11-28 Thread Naveen N. Rao
On 2016/11/26 03:24PM, Nicholas Piggin wrote:
> Switching from big endian to little endian can fail to regenerate
> the crypto assembly properly. Switch to using standard form of
> kbuild dependency checking (i.e., use FORCE and if_changed).
> 
> Signed-off-by: Nicholas Piggin 

Hi Nick,
A similar patch is already in:
https://mid.mail-archive.com/linux-crypto@vger.kernel.org/msg21855.html

- Naveen