Re: [PATCH v02] powerpc/mobility: Fix node detach/rename problem

2018-08-09 Thread Michael Ellerman
Michael Bringmann  writes:
> On 08/08/2018 09:02 AM, Michael Ellerman wrote:
>> Michael Bringmann  writes:
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
>>> b/arch/powerpc/platforms/pseries/mobility.c
>>> index e245a88..efc9442 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -22,6 +22,9 @@
>>>  #include 
>>>  #include "pseries.h"
>>>  
>>> +extern int of_free_phandle_cache(void);
>>> +extern void of_populate_phandle_cache(void);
>> 
>> We don't do that, they should be in a header.
>> 
>> But that's a minor problem given that the patch doesn't compile, because
>> both those functions are static.
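
For illustration, the kind of shared declaration being asked for would look
something like this (a sketch only; the header location is an assumption, and
the point became moot once the functions were made static upstream):

	/* sketch: shared prototypes, e.g. in drivers/of/of_private.h */
	int of_free_phandle_cache(void);
	void of_populate_phandle_cache(void);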
>
> I am building against the latest 'linux-ppc' kernel.  It includes patch

OK, you must be using the master branch.

> From b9952b5218added5577e4a3443969bc20884cea9 Mon Sep 17 00:00:00 2001
> From: Frank Rowand 
> Date: Thu, 12 Jul 2018 14:00:07 -0700
> Subject: of: overlay: update phandle cache on overlay apply and remove

That only landed in v4.18-rc6, so it's not in my next branch which is
where patches like this targeted for the next release are applied.

> which makes the functions static.  I will rebuild and test with an
> earlier version if you will specify which one.

No, that's fine. It will just have to wait until next and master are
merged before it can go in.

cheers


Re: [PATCH] powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range

2018-08-09 Thread Nicholas Piggin
On Thu,  9 Aug 2018 19:06:59 +0530
"Aneesh Kumar K.V"  wrote:

> This patch makes sure we update the mmu_gather page size even if we are
> requesting a fullmm flush. This avoids triggering the VM_WARN_ON in code
> paths like __tlb_remove_page_size that explicitly check that the page size
> of the range being removed matches the mmu_gather page size.
> 
> Signed-off-by: Aneesh Kumar K.V 

Acked-by: Nicholas Piggin 

Thanks, sorry about that.

> ---
>  arch/powerpc/include/asm/tlb.h | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 97ecef697e1b..f0e571b2dc7c 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -49,13 +49,11 @@ static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
>  static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
>unsigned int page_size)
>  {
> - if (tlb->fullmm)
> - return;
> -
>   if (!tlb->page_size)
>   tlb->page_size = page_size;
>   else if (tlb->page_size != page_size) {
> - tlb_flush_mmu(tlb);
> + if (!tlb->fullmm)
> + tlb_flush_mmu(tlb);
>   /*
>* update the page size after flush for the new
>* mmu_gather.
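
For context, the VM_WARN_ON being avoided is of this shape in the generic
mmu_gather code (paraphrased from mm/memory.c of that era, shown only to
illustrate why tlb->page_size must be kept current even for fullmm flushes):

	bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
				    int page_size)
	{
		VM_BUG_ON(!tlb->end);
		/* fires when the caller's page size disagrees with the gather's */
		VM_WARN_ON(tlb->page_size != page_size);
		/* ... queue the page for a deferred flush ... */
		return false;
	}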



Re: [PATCH 2/2] powerpc/64s: reimplement book3s idle code in C

2018-08-09 Thread Nicholas Piggin
On Thu, 9 Aug 2018 20:15:04 +0530
Gautham R Shenoy  wrote:

> Hello Nicholas,
> On Fri, Aug 03, 2018 at 02:13:50PM +1000, Nicholas Piggin wrote:
> > Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
> > specific HV idle code to the powernv platform code. Generic book3s
> > assembly stubs are kept in common code and used only to save and
> > restore the stack frame and non-volatile GPRs just before going to
> > idle and just after waking up, which can return into C code.
> > 
> > Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> > way this stuff will cope with non-trivial new CPU implementation
> > details, firmware changes, etc., without becoming unmaintainable.
> > 
> > This is not a strict translation to C code, there are some
> > differences.
> > 
> > - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
> >   but saves and restores them all explicitly.
> > 
> > - The optimisation where EC=ESL=0 idle modes did not have to save
> >   GPRs or change MSR is restored, because it's now simple to do.
> >   State loss modes that did not actually lose GPRs can use this
> >   optimization too.
> > 
> > - KVM secondary entry code is now more of a call/return style
> >   rather than spaghetti. nap_state_lost is not required because
> >   KVM always returns via the NVGPR restoring path.
> > 
> > This seems pretty solid, so needs more review and testing now. The
> > KVM changes are pretty significant and complicated. POWER7 needs to
> > be tested too.
> > 
> > Open question:
> > - Why does P9 restore some of the PMU SPRs (but not others), and
> >   P8 only zeroes them?  
> 
> We are restoring MMCR0 (from the value saved in the stack) and MMCR1,
> MMCR2, and MMCRA in the stop_sprs in PACA. We saw that MMCRH and MMCRC
> are cleared on both POWER8 and POWER9. Hence we didn't restore
> them. MMCRS is being initialized by the KVM code.
> 
> Is there anything apart from these that needs to be restored?

No, I'm wondering why we restore those on POWER9. POWER8 does not
restore them, it only zeroes them. What is the difference with POWER9?

I will leave that in for now so we don't change too much with one patch,
but it would be nice to document a bit better the reasons for saving or
clearing SPRs.

> 
> > 
> > Since RFC v1:
> > - Now tested and working with POWER9 hash and radix.
> > - KVM support added. This took a bit of work to untangle and might
> >   still have some issues, but POWER9 seems to work including hash on
> >   radix with dependent threads mode.
> > - This snowballed a bit because of KVM and other details making it
> >   not feasible to leave POWER7/8 code alone. That's only half done
> >   at the moment.
> > - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
> >   support done it might be another hundred or so lines of C.
> > 
> > Since RFC v2:
> > - Fixed deep state SLB reloading
> > - Now tested and working with POWER8.
> > - Accounted for most feedback.
> > 
> > Since RFC v3:
> > - Rebased to powerpc merge + idle state bugfix
> > - Split SLB flush/restore code out and shared with MCE code (pseries
> >   MCE patches can also use).
> > - More testing on POWER8 including KVM with secondaries.
> > - Performance testing looks good. EC=ESL=0 is about 5% faster, other
> >   stop states look a little faster too.
> > - Adjusted SPR saving to handle POWER7, haven't tested it.  
> 
> 
> This patch looks good to me.
> 
> A couple of comments below. 
> 
> > ---  
> 
> [... snip ..]
> 
> > @@ -178,23 +177,30 @@ struct paca_struct {
> >  #endif
> > 
> >  #ifdef CONFIG_PPC_POWERNV
> > -   /* Per-core mask tracking idle threads and a lock bit-[L][TTTTTTTT] */
> > -   u32 *core_idle_state_ptr;
> > -   u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> > -   /* Mask to indicate thread id in core */
> > -   u8 thread_mask;
> > -   /* Mask to denote subcore sibling threads */
> > -   u8 subcore_sibling_mask;
> > -   /* Flag to request this thread not to stop */
> > -   atomic_t dont_stop;
> > -   /* The PSSCR value that the kernel requested before going to stop */
> > -   u64 requested_psscr;
> > +   /* PowerNV idle fields */
> > +   /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
> > +   unsigned long idle_state;
> > +   union {
> > +   /* P7/P8 specific fields */
> > +   struct {
> > +   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> > +   u8 thread_idle_state;
> > +   /* Mask to indicate thread id in core */
> > +   u8 thread_mask;  
> 
> This is no longer needed. We can get this from cpu_thread_in_core()
> from the C code.
> 
> The only place where we are currently using this is to DUMP the value
> of the thread_mask from xmon but not anywhere else in the idle entry
> code.

Good catch, removed it.
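
For reference, deriving the mask in C is a one-liner (a sketch using the
existing cpu_thread_in_core() helper, not necessarily the exact replacement
code):

	unsigned long thread_mask = 1UL << cpu_thread_in_core(cpu);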

> > +   /* Mask to denote subcore sibling threads */
> > +   u8 subcore_sibling_mask;
> > +   };
> > 
> > -   /*
> > -

Re: ssb: Remove SSB_WARN_ON, SSB_BUG_ON and SSB_DEBUG

2018-08-09 Thread Kalle Valo
Michael Büsch  writes:

> Use the standard WARN_ON instead.
> If a small kernel is desired, WARN_ON can be disabled globally.
>
> Also remove SSB_DEBUG. Besides WARN_ON it only adds a tiny debug check.
> Include this check unconditionally.
>
> Signed-off-by: Michael Buesch 

Applied manually:

209b43759d65 ssb: Remove SSB_WARN_ON, SSB_BUG_ON and SSB_DEBUG

-- 
Kalle Valo


Re: [PATCH v3 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-08-09 Thread Nathan Fontenot
On 08/08/2018 10:29 AM, John Allen wrote:
> While handling PRRN events, the time to handle the actual hotplug events
> dwarfs the time it takes to perform the device tree updates and queue the
> hotplug events. In the case that PRRN events are being queued continuously,
> hotplug events have been observed to be queued faster than the kernel can
> actually handle them. This patch avoids the problem by waiting for a
> hotplug request to complete before queueing more hotplug events.
> 
> Signed-off-by: John Allen 

In the V2 thread it was mentioned that we could just call the DLPAR operation
directly instead of going through the workqueue. I have written a patch to do
this that also cleans up some of the request handling.

Requests that come through the hotplug interrupt still use the workqueue. The
other requests, PRRN and sysfs, just call the dlpar handler directly. This
eliminates the need for a wait conditional and return code handling in the
workqueue handler and should solve the issue that John solves with his patch.

This still needs testing, but I wanted to get people's thoughts.

-Nathan

---
 arch/powerpc/platforms/pseries/dlpar.c|   37 +++--
 arch/powerpc/platforms/pseries/mobility.c |   18 +-
 arch/powerpc/platforms/pseries/pseries.h  |5 ++--
 arch/powerpc/platforms/pseries/ras.c  |2 +-
 4 files changed, 19 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index a0b20c03f078..052c4f2ba0a0 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -32,8 +32,6 @@ static struct workqueue_struct *pseries_hp_wq;
 struct pseries_hp_work {
struct work_struct work;
struct pseries_hp_errorlog *errlog;
-   struct completion *hp_completion;
-   int *rc;
 };
 
 struct cc_workarea {
@@ -329,7 +327,7 @@ int dlpar_release_drc(u32 drc_index)
return 0;
 }
 
-static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
+int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
 {
int rc;
 
@@ -371,20 +369,13 @@ static void pseries_hp_work_fn(struct work_struct *work)
struct pseries_hp_work *hp_work =
container_of(work, struct pseries_hp_work, work);
 
-   if (hp_work->rc)
-   *(hp_work->rc) = handle_dlpar_errorlog(hp_work->errlog);
-   else
-   handle_dlpar_errorlog(hp_work->errlog);
-
-   if (hp_work->hp_completion)
-   complete(hp_work->hp_completion);
+   handle_dlpar_errorlog(hp_work->errlog);
 
kfree(hp_work->errlog);
kfree((void *)work);
 }
 
-void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
-struct completion *hotplug_done, int *rc)
+void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog)
 {
struct pseries_hp_work *work;
struct pseries_hp_errorlog *hp_errlog_copy;
@@ -397,13 +388,9 @@ void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
if (work) {
INIT_WORK((struct work_struct *)work, pseries_hp_work_fn);
work->errlog = hp_errlog_copy;
-   work->hp_completion = hotplug_done;
-   work->rc = rc;
queue_work(pseries_hp_wq, (struct work_struct *)work);
} else {
-   *rc = -ENOMEM;
kfree(hp_errlog_copy);
-   complete(hotplug_done);
}
 }
 
@@ -521,18 +508,15 @@ static int dlpar_parse_id_type(char **cmd, struct pseries_hp_errorlog *hp_elog)
 static ssize_t dlpar_store(struct class *class, struct class_attribute *attr,
   const char *buf, size_t count)
 {
-   struct pseries_hp_errorlog *hp_elog;
-   struct completion hotplug_done;
+   struct pseries_hp_errorlog hp_elog;
char *argbuf;
char *args;
int rc;
 
args = argbuf = kstrdup(buf, GFP_KERNEL);
-   hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
-   if (!hp_elog || !argbuf) {
+   if (!argbuf) {
pr_info("Could not allocate resources for DLPAR operation\n");
kfree(argbuf);
-   kfree(hp_elog);
return -ENOMEM;
}
 
@@ -540,25 +524,22 @@ static ssize_t dlpar_store(struct class *class, struct class_attribute *attr,
 * Parse out the request from the user, this will be in the form:
 * <resource> <action> <id_type> <id>
 */
-   rc = dlpar_parse_resource(&args, hp_elog);
+   rc = dlpar_parse_resource(&args, &hp_elog);
if (rc)
goto dlpar_store_out;
 
-   rc = dlpar_parse_action(&args, hp_elog);
+   rc = dlpar_parse_action(&args, &hp_elog);
if (rc)
goto dlpar_store_out;
 
-   rc = dlpar_parse_id_type(&args, hp_elog);
+   rc = dlpar_parse_id_type(&args, &hp_elog);
if (rc)
goto dlpar_store_out;
 
-   init_completion(&hotplug_done);
-   

Re: [PATCH 2/2] powerpc/64s: reimplement book3s idle code in C

2018-08-09 Thread Gautham R Shenoy
Hello Nicholas,
On Fri, Aug 03, 2018 at 02:13:50PM +1000, Nicholas Piggin wrote:
> Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
> specific HV idle code to the powernv platform code. Generic book3s
> assembly stubs are kept in common code and used only to save and
> restore the stack frame and non-volatile GPRs just before going to
> idle and just after waking up, which can return into C code.
> 
> Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> way this stuff will cope with non-trivial new CPU implementation
> details, firmware changes, etc., without becoming unmaintainable.
> 
> This is not a strict translation to C code, there are some
> differences.
> 
> - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
>   but saves and restores them all explicitly.
> 
> - The optimisation where EC=ESL=0 idle modes did not have to save
>   GPRs or change MSR is restored, because it's now simple to do.
>   State loss modes that did not actually lose GPRs can use this
>   optimization too.
> 
> - KVM secondary entry code is now more of a call/return style
>   rather than spaghetti. nap_state_lost is not required because
>   KVM always returns via the NVGPR restoring path.
> 
> This seems pretty solid, so needs more review and testing now. The
> KVM changes are pretty significant and complicated. POWER7 needs to
> be tested too.
> 
> Open question:
> - Why does P9 restore some of the PMU SPRs (but not others), and
>   P8 only zeroes them?

We are restoring MMCR0 (from the value saved in the stack) and MMCR1,
MMCR2, and MMCRA in the stop_sprs in PACA. We saw that MMCRH and MMCRC
are cleared on both POWER8 and POWER9. Hence we didn't restore
them. MMCRS is being initialized by the KVM code.

Is there anything apart from these that needs to be restored?

> 
> Since RFC v1:
> - Now tested and working with POWER9 hash and radix.
> - KVM support added. This took a bit of work to untangle and might
>   still have some issues, but POWER9 seems to work including hash on
>   radix with dependent threads mode.
> - This snowballed a bit because of KVM and other details making it
>   not feasible to leave POWER7/8 code alone. That's only half done
>   at the moment.
> - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
>   support done it might be another hundred or so lines of C.
> 
> Since RFC v2:
> - Fixed deep state SLB reloading
> - Now tested and working with POWER8.
> - Accounted for most feedback.
> 
> Since RFC v3:
> - Rebased to powerpc merge + idle state bugfix
> - Split SLB flush/restore code out and shared with MCE code (pseries
>   MCE patches can also use).
> - More testing on POWER8 including KVM with secondaries.
> - Performance testing looks good. EC=ESL=0 is about 5% faster, other
>   stop states look a little faster too.
> - Adjusted SPR saving to handle POWER7, haven't tested it.


This patch looks good to me.

A couple of comments below. 

> ---

[... snip ..]

> @@ -178,23 +177,30 @@ struct paca_struct {
>  #endif
> 
>  #ifdef CONFIG_PPC_POWERNV
> - /* Per-core mask tracking idle threads and a lock bit-[L][TTTTTTTT] */
> - u32 *core_idle_state_ptr;
> - u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> - /* Mask to indicate thread id in core */
> - u8 thread_mask;
> - /* Mask to denote subcore sibling threads */
> - u8 subcore_sibling_mask;
> - /* Flag to request this thread not to stop */
> - atomic_t dont_stop;
> - /* The PSSCR value that the kernel requested before going to stop */
> - u64 requested_psscr;
> + /* PowerNV idle fields */
> + /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
> + unsigned long idle_state;
> + union {
> + /* P7/P8 specific fields */
> + struct {
> + /* PNV_THREAD_RUNNING/NAP/SLEEP */
> + u8 thread_idle_state;
> + /* Mask to indicate thread id in core */
> + u8 thread_mask;

This is no longer needed. We can get this from cpu_thread_in_core()
from the C code.

The only place where we are currently using this is to DUMP the value
of the thread_mask from xmon but not anywhere else in the idle entry
code.



> + /* Mask to denote subcore sibling threads */
> + u8 subcore_sibling_mask;
> + };
> 
> - /*
> -  * Save area for additional SPRs that need to be
> -  * saved/restored during cpuidle stop.
> -  */
> - struct stop_sprs stop_sprs;
> + /* P9 specific fields */
> + struct {
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* The PSSCR value that the kernel requested before going to stop */
> + u64 requested_psscr;
> + /* Flag to request this thread not to stop */
> + atomic_t dont_stop;
> +#endif
> + };
> + };
>  #endif
[..snip..]


Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

2018-08-09 Thread Alex Williamson
On Thu, 9 Aug 2018 14:21:29 +1000
Alexey Kardashevskiy  wrote:

> On 08/08/2018 18:39, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 02/08/2018 02:16, Alex Williamson wrote:  
> >> On Wed, 1 Aug 2018 18:37:35 +1000
> >> Alexey Kardashevskiy  wrote:
> >>  
> >>> On 01/08/2018 00:29, Alex Williamson wrote:  
>  On Tue, 31 Jul 2018 14:03:35 +1000
>  Alexey Kardashevskiy  wrote:
>  
> > On 31/07/2018 02:29, Alex Williamson wrote:
> >> On Mon, 30 Jul 2018 18:58:49 +1000
> >> Alexey Kardashevskiy  wrote:
> >>> After some local discussions, it was pointed out that force-disabling
> >>> nvlinks won't buy us much: for an nvlink to work, both sides need to
> >>> enable it, so malicious guests cannot penetrate good ones (or a host)
> >>> unless a good guest enabled the link, which won't happen with a
> >>> well-behaving guest. And if two guests become malicious, they can still
> >>> only harm each other, and they can do that via other means such as the
> >>> network anyway. This is different from PCIe: once a PCIe link is
> >>> unavoidably enabled, a well-behaving device cannot firewall itself from
> >>> peers, as it is now up to the upstream bridge(s) to decide the routing;
> >>> with nvlink2, a GPU still has means to protect itself, just like a
> >>> guest can run "firewalld" for the network.
> >>>
> >>> Although it would be a nice feature to have an extra barrier between
> >>> GPUs, is the inability to block the links in the hypervisor still a
> >>> blocker for V100 pass-through?
> >>
> >> How is the NVLink configured by the guest, is it 'on'/'off' or are
> >> specific routes configured?   
> >
> > The GPU-GPU links need not be blocked, but they do need to be enabled
> > (== trained) by a driver in the guest. There are no routes between GPUs
> > in the NVLink fabric; these are direct links. It is just a switch on each
> > side, and both switches need to be on for a link to work.
> 
>  Ok, but there is at least the possibility of multiple direct links per
>  GPU, the very first diagram I find of NVlink shows 8 interconnected
>  GPUs:
> 
>  https://www.nvidia.com/en-us/data-center/nvlink/
> >>>
> >>> Our design is like the left part of the picture, but that is just a detail.  
> >>
> >> Unless we can specifically identify a direct link vs a mesh link, we
> >> shouldn't be making assumptions about the degree of interconnect.
> >>
>  So if each switch enables one direct, point to point link, how does the
>  guest know which links to open for which peer device?
> >>>
> >>> It uses PCI config space on GPUs to discover the topology.  
> >>
> >> So do we need to virtualize this config space if we're going to
> >> virtualize the topology?
> >>  
>  And of course
>  since we can't see the spec, a security audit is at best hearsay :-\
> >>>
> >>> Yup, the exact discovery protocol is hidden.  
> >>
> >> It could be reverse engineered...
> >>  
> > The GPU-CPU links - the GPU bit is the same switch, the CPU NVlink state
> > is controlled via the emulated PCI bridges which I pass through together
> > with the GPU.
> 
>  So there's a special emulated switch, is that how the guest knows which
>  GPUs it can enable NVLinks to?
> >>>
> >>> Since it only has PCI config space (there is nothing relevant in the
> >>> device tree at all), I assume (double checking with the NVIDIA folks
> >>> now) the guest driver enables them all, tests which pair works and
> >>> disables the ones which do not. This gives a malicious guest a tiny
> >>> window of opportunity to break into a good guest. Hm :-/  
> >>
> >> Let's not minimize that window, that seems like a prime candidate for
> >> an exploit.
> >>  
> >> If the former, then isn't a non-malicious
> >> guest still susceptible to a malicious guest?  
> >
> > A non-malicious guest needs to turn its switch on for a link to a GPU
> > which belongs to a malicious guest.
> 
>  Actual security, or obfuscation, will we ever know...
> >>> If the latter, how is
> >> routing configured by the guest given that the guest view of the
> >> topology doesn't match physical hardware?  Are these routes
> >> deconfigured by device reset?  Are they part of the save/restore
> >> state?  Thanks,  
> 
>  Still curious what happens to these routes on reset.  Can a later user
>  of a GPU inherit a device where the links are already enabled?  Thanks,  
>    
> >>>
> >>> I am told that the GPU reset disables links. As a side effect, we get an
> >>> HMI (a hardware fault which resets the host machine) when trying to
> >>> access the GPU RAM, which indicates that the link is down, as the
> >>> memory is only accessible via the nvlink. We have special fencing code
> >>> in our host firmware (skiboot) to fence this memory on 

[PATCH] powerpc/powernv/idle: Fix build error

2018-08-09 Thread Aneesh Kumar K.V
Fix the build error below by using strlcpy instead of strncpy:

In function 'pnv_parse_cpuidle_dt',
    inlined from 'pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:840:7,
    inlined from '__machine_initcall_powernv_pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:870:1:
arch/powerpc/platforms/powernv/idle.c:820:3: error: 'strncpy' specified bound 16 equals destination size [-Werror=stringop-truncation]
   strncpy(pnv_idle_states[i].name, temp_string[i],
   ^~~~
           PNV_IDLE_NAME_LEN);

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/powernv/idle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index ecb002c5db83..35f699ebb662 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -817,7 +817,7 @@ static int pnv_parse_cpuidle_dt(void)
goto out;
}
for (i = 0; i < nr_idle_states; i++)
-   strncpy(pnv_idle_states[i].name, temp_string[i],
+   strlcpy(pnv_idle_states[i].name, temp_string[i],
PNV_IDLE_NAME_LEN);
nr_pnv_idle_states = nr_idle_states;
rc = 0;
-- 
2.17.1
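
For background, the semantic difference between the two calls in a minimal
sketch (illustrative C, not kernel source):

	char name[16];

	/* strncpy() does not NUL-terminate when src is 16 bytes or longer;
	 * GCC 8 warns because the bound equals the destination size. */
	strncpy(name, src, sizeof(name));

	/* strlcpy() copies at most 15 bytes and always NUL-terminates. */
	strlcpy(name, src, sizeof(name));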



[PATCH] powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range

2018-08-09 Thread Aneesh Kumar K.V
This patch makes sure we update the mmu_gather page size even if we are
requesting a fullmm flush. This avoids triggering the VM_WARN_ON in code
paths like __tlb_remove_page_size that explicitly check that the page size
of the range being removed matches the mmu_gather page size.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/tlb.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 97ecef697e1b..f0e571b2dc7c 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -49,13 +49,11 @@ static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
 static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 unsigned int page_size)
 {
-   if (tlb->fullmm)
-   return;
-
if (!tlb->page_size)
tlb->page_size = page_size;
else if (tlb->page_size != page_size) {
-   tlb_flush_mmu(tlb);
+   if (!tlb->fullmm)
+   tlb_flush_mmu(tlb);
/*
 * update the page size after flush for the new
 * mmu_gather.
-- 
2.17.1



[PATCH] powerpc/mm/hash: Remove unnecessary do { }while(0) loop

2018-08-09 Thread Aneesh Kumar K.V
Avoid Coverity false-positive warnings like:

*** CID 187347:  Control flow issues  (UNREACHABLE)
/arch/powerpc/mm/hash_native_64.c: 819 in native_flush_hash_range()
813		slot += hidx & _PTEIDX_GROUP_IX;
814		hptep = htab_address + slot;
815		want_v = hpte_encode_avpn(vpn, psize, ssize);
816		hpte_v = hpte_get_old_v(hptep);
817
818		if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
>>> CID 187347:  Control flow issues  (UNREACHABLE)

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index c81793d47af9..f82ee8a3b561 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -137,10 +137,9 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long index);
shift = mmu_psize_defs[psize].shift;\
for (index = 0; vpn < __end; index++,   \
 vpn += (1L << (shift - VPN_SHIFT))) {  \
-   if (!__split || __rpte_sub_valid(rpte, index))  \
-   do {
+   if (!__split || __rpte_sub_valid(rpte, index))
 
-#define pte_iterate_hashed_end() } while(0); } } while(0)
+#define pte_iterate_hashed_end()  } } while(0)
 
 #define pte_pagesize_index(mm, addr, pte)  \
(((pte) & H_PAGE_COMBO)? MMU_PAGE_4K: MMU_PAGE_64K)
-- 
2.17.1
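
To see what Coverity was complaining about, here is roughly what a caller of
the macro pair expands to before and after this patch (an illustrative sketch
based on the macros quoted above, not literal preprocessor output):

	pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift) {
		/* per-subpage work, e.g. the HPTE lookup in native_flush_hash_range() */
	} pte_iterate_hashed_end();

	/* before: ... if (!__split || __rpte_sub_valid(rpte, index)) do { { body } } while(0); } } while(0)
	 * after:  ... if (!__split || __rpte_sub_valid(rpte, index)) { body } } } while(0)
	 */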



Re: [PATCH v6 1/2] powerpc: Detect the presence of big-cores via "ibm,thread-groups"

2018-08-09 Thread Srikar Dronamraju
* Gautham R. Shenoy  [2018-08-09 11:02:07]:

> 
>  int threads_per_core, threads_per_subcore, threads_shift;
> +bool has_big_cores;
>  cpumask_t threads_core_mask;
>  EXPORT_SYMBOL_GPL(threads_per_core);
>  EXPORT_SYMBOL_GPL(threads_per_subcore);
>  EXPORT_SYMBOL_GPL(threads_shift);
> +EXPORT_SYMBOL_GPL(has_big_cores);

Why do we need EXPORT_SYMBOL_GPL?

>  EXPORT_SYMBOL_GPL(threads_core_mask);
> 
> + *
> + * Returns 0 on success, -EINVAL if the property does not exist,
> + * -ENODATA if property does not have a value, and -EOVERFLOW if the
> + * property data isn't large enough.
> + */
> +int parse_thread_groups(struct device_node *dn,
> + struct thread_groups *tg)
> +{
> + unsigned int nr_groups, threads_per_group, property;
> + int i;
> + u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> + u32 *thread_list;
> + size_t total_threads;
> + int ret;
> +
> + ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> +  thread_group_array, 3);
> +
> + if (ret)
> + goto out_err;
> +
> + property = thread_group_array[0];
> + nr_groups = thread_group_array[1];
> + threads_per_group = thread_group_array[2];
> + total_threads = nr_groups * threads_per_group;
> +

Shouldn't we check property and nr_groups here? If property is not 1 or
nr_groups is less than 1, we should error out; there is no point in calling
the second of_property_read_u32_array() if the property is not right. A
sketch of such a check follows.
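
An untested sketch of that early check, reusing the names from the patch
above:

	property = thread_group_array[0];
	nr_groups = thread_group_array[1];
	threads_per_group = thread_group_array[2];
	if (property != 1 || nr_groups < 1 || threads_per_group < 1)
		return -EINVAL;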


Nit:
Can't we directly assign to tg->property etc., and hence avoid the local
variables property, nr_groups and threads_per_group?

> + ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> +  thread_group_array,
> +  3 + total_threads);
> +
> +static inline bool dt_has_big_core(struct device_node *dn,
> +struct thread_groups *tg)
> +{
> + if (parse_thread_groups(dn, tg))
> + return false;
> +
> + if (tg->property != 1)
> + return false;
> +
> + if (tg->nr_groups < 1)
> + return false;

We can avoid these checks if we do the checking in parse_thread_groups().

>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
>   *  cpu_possible_mask
> @@ -457,6 +605,7 @@ void __init smp_setup_cpu_maps(void)
>   int cpu = 0;
>   int nthreads = 1;
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 755dc98..f5717de 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "cacheinfo.h"
>  #include "setup.h"
> @@ -1025,6 +1026,33 @@ static ssize_t show_physical_id(struct device *dev,
>  }
>  static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> 
> +static ssize_t show_small_core_siblings(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct cpu *cpu = container_of(dev, struct cpu, dev);
> + struct device_node *dn = of_get_cpu_node(cpu->dev.id, NULL);
> + struct thread_groups tg;
> + int i, j;
> + ssize_t ret = 0;
> +

Here we need to check the validity of dn and error out accordingly.


> + if (parse_thread_groups(dn, &tg))
> + return -ENODATA;

Did we miss an of_node_put(dn)?

> +
> + i = get_cpu_thread_group_start(cpu->dev.id, &tg);
> +
> + if (i == -1)
> + return -ENODATA;
> +
> + for (j = 0; j < tg.threads_per_group - 1; j++)
> + ret += sprintf(buf + ret, "%d,", tg.thread_list[i + j]);

Here, we are making the assumption that group_start will always be the
first thread in the thread group. However, we didn't make the same
assumption in get_cpu_thread_group_start.



Re: [PATCH v6 2/2] powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores

2018-08-09 Thread Srikar Dronamraju
* Gautham R. Shenoy  [2018-08-09 11:02:08]:

> 
> 3) ppc64_cpu --smt=2
>SMT domain ceases to exist as each domain consists of just one
>group.
> 

When seen in isolation, the above reads as if the ppc64_cpu --smt=2 output
itself says "SMT domain ceases to exist".

> @@ -999,7 +1012,17 @@ static void add_cpu_to_masks(int cpu)
>  {
>   int first_thread = cpu_first_thread_sibling(cpu);
>   int chipid = cpu_to_chip_id(cpu);
> - int i;
> +
> + struct thread_groups tg;
> + int i, cpu_group_start = -1;
> +
> + if (has_big_cores) {
> + struct device_node *dn = of_get_cpu_node(cpu, NULL);
> +

No check for the validity of dn, and no of_node_put()?

> + parse_thread_groups(dn, &tg);
> + cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
> + cpumask_set_cpu(cpu, cpu_smallcore_sibling_mask(cpu));
> + }
> 
>   /*
>* This CPU will not be in the online mask yet so we need to manually

The rest looks good



Re: [PATCH] lib/test_hexdump: fix failure on big endian cpu

2018-08-09 Thread Michael Ellerman
Andy Shevchenko  writes:
> On Thu, 2018-08-09 at 16:18 +1000, Michael Ellerman wrote:
>> rashmica  writes:
>> > On 08/08/18 17:25, Michael Ellerman wrote:
>> > > Christophe Leroy  writes:
>> > > > 
>> > mpe I sent a patch doing that a while ago and you obviously didn't
>> > like it because you never merged it :P
>
> Hmm... I (as an author of the test case) never saw that patch.

I probably told Rashmica to send it to the linuxppc list and then I was
meant to follow up on who should merge it. So my fault.

>> Sorry, I wasn't sure who should merge it, and never followed up.
>
> lib/* is most of the time under Andrew's responsibility, though since it's
> orphaned in MAINTAINERS anyone can push. Still, I think it's good to
> notify Andrew.

Yeah. I was going to apply it with your Ack because it seems pretty
uncontroversial, and there's nothing else in linux-next touching that
file so it should be conflict-free.

I guess I'll wait and see if Andrew would prefer to take it.

cheers


Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Michal Suchánek
On Thu, 9 Aug 2018 18:33:33 +1000
Nicholas Piggin  wrote:

> On Thu, 9 Aug 2018 13:39:45 +0530
> Ananth N Mavinakayanahalli  wrote:
> 
> > On Thu, Aug 09, 2018 at 06:02:53PM +1000, Nicholas Piggin wrote:  
> > > On Thu, 09 Aug 2018 16:34:07 +1000
> > > Michael Ellerman  wrote:
> > > 
> > > > "Aneesh Kumar K.V"  writes:
> > > > > On 08/08/2018 08:26 PM, Michael Ellerman wrote:  
> > > > >> Mahesh J Salgaonkar  writes:  
> > > > >>> From: Mahesh Salgaonkar 
> > > > >>>
> > > > >>> Introduce recovery action for recovered memory errors
> > > > >>> (MCEs). There are soft memory errors like SLB Multihit,
> > > > >>> which can be a result of a bad hardware OR software BUG.
> > > > >>> Kernel can easily recover from these soft errors by
> > > > >>> flushing SLB contents. After the recovery kernel can still
> > > > >>> continue to function without any issue. But in some
> > > > >>> scenario's we may keep getting these soft errors until the
> > > > >>> root cause is fixed. To be able to analyze and find the
> > > > >>> root cause, best way is to gather enough data and system
> > > > >>> state at the time of MCE. Hence this patch introduces a
> > > > >>> sysctl knob where user can decide either to continue after
> > > > >>> recovery or panic the kernel to capture the dump.  
> > > > >> 
> > > > >> I'm not convinced we want this.
> > > > >> 
> > > > >> As we've discovered it's often not possible to reconstruct
> > > > >> what happened based on a dump anyway.
> > > > >> 
> > > > >> The key thing you need is the content of the SLB and that's
> > > > >> not included in a dump.
> > > > >> 
> > > > >> So I think we should dump the SLB content when we get the
> > > > >> MCE (which this series does) and any other useful info, and
> > > > >> then if we can recover we should.  
> > > > >
> > > > > The reasoning there is what if we got multi-hit due to some
> > > > > corruption in slb_cache_ptr. ie. some part of kernel is
> > > > > wrongly updating the paca data structure due to wrong
> > > > > pointer. Now that is far fetched, but then possible right?.
> > > > > Hence the idea that, if we don't have much insight into why a
> > > > > slb multi-hit occur from the dmesg which include slb content,
> > > > > slb_cache contents etc, there should be an easy way to force
> > > > > a dump that might assist in further debug.  
> > > > 
> > > > If you're debugging something complex that you can't determine
> > > > from the SLB dump then you should be running a debug kernel
> > > > anyway. And if anything you want to drop into xmon and sit
> > > > there, preserving the most state, rather than taking a dump.
> > > 
> > > I'm not saying for a dump specifically, just some form of crash.
> > > And we really should have an option to xmon on panic, but that's
> > > another story.
> > 
> > That's fine during development or in a lab, not something we could
> > enforce in a customer environment, could we?  
> 
> xmon on panic? Not something to enforce but IMO (without thinking
> about it too much but having encountered it several times) it should
> probably be tied to the xmon-on-BUG option.

You should get that with this patch and xmon=on, or am I missing
something?

Thanks

Michal


Re: [PATCH 1/5] powerpc/asm: Add a patch_site macro & helpers for patching instructions

2018-08-09 Thread Michael Ellerman
Christophe LEROY  writes:
> On 08/08/2018 at 18:30, Christophe LEROY wrote:
>> On 23/07/2018 at 17:07, Michael Ellerman wrote:
...
>>> diff --git a/arch/powerpc/include/asm/code-patching.h 
>>> b/arch/powerpc/include/asm/code-patching.h
>>> index 812535f40124..b2051234ada8 100644
>>> --- a/arch/powerpc/include/asm/code-patching.h
>>> +++ b/arch/powerpc/include/asm/code-patching.h
>>> @@ -32,6 +32,8 @@ unsigned int create_cond_branch(const unsigned int *addr,
>>>   int patch_branch(unsigned int *addr, unsigned long target, int flags);
>>>   int patch_instruction(unsigned int *addr, unsigned int instr);
>>>   int raw_patch_instruction(unsigned int *addr, unsigned int instr);
>>> +int patch_instruction_site(s32 *addr, unsigned int instr);
>>> +int patch_branch_site(s32 *site, unsigned long target, int flags);
>> 
>> Why use s32 * instead of unsigned int * as usual for pointers to code?
>
> Forget my stupid question, I didn't see it was a relative address and 
> not an absolute one.

No worries. 

It is a bit non-obvious at first glance: it looks like the s32 * points
to the instruction, but it actually points to the s32 that holds the
instruction's offset relative to the site itself.
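
Concretely, the helpers in the patch recover the instruction address from the
site with:

	addr = (unsigned int *)((unsigned long)site + *site);

so a site word holding, say, -0x100 refers to the instruction 0x100 bytes
before the site itself.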

We could add a typedef to try and make that more obvious, but I
generally don't like typedefs that hide pointerness.

cheers


Re: [PATCH] lib/test_hexdump: fix failure on big endian cpu

2018-08-09 Thread Andy Shevchenko
On Thu, 2018-08-09 at 16:18 +1000, Michael Ellerman wrote:
> rashmica  writes:
> > On 08/08/18 17:25, Michael Ellerman wrote:
> > > Christophe Leroy  writes:
> > > > 
> > mpe I sent a patch doing that a while ago and you obviously didn't
> > like it because you never merged it :P

Hmm... I (as an author of the test case) never saw that patch.

> Sorry, I wasn't sure who should merge it, and never followed up.

lib/* is most of the time under Andrew's responsibility, though since it's
orphaned in MAINTAINERS anyone can push. Still, I think it's good to
notify Andrew.

-- 
Andy Shevchenko 
Intel Finland Oy



Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Nicholas Piggin
On Thu, 9 Aug 2018 13:39:45 +0530
Ananth N Mavinakayanahalli  wrote:

> On Thu, Aug 09, 2018 at 06:02:53PM +1000, Nicholas Piggin wrote:
> > On Thu, 09 Aug 2018 16:34:07 +1000
> > Michael Ellerman  wrote:
> >   
> > > "Aneesh Kumar K.V"  writes:  
> > > > On 08/08/2018 08:26 PM, Michael Ellerman wrote:
> > > >> Mahesh J Salgaonkar  writes:
> > > >>> From: Mahesh Salgaonkar 
> > > >>>
> > > >>> Introduce recovery action for recovered memory errors (MCEs). There are
> > > >>> soft memory errors like SLB Multihit, which can be a result of a bad
> > > >>> hardware OR software BUG. Kernel can easily recover from these soft errors
> > > >>> by flushing SLB contents. After the recovery kernel can still continue to
> > > >>> function without any issue. But in some scenario's we may keep getting
> > > >>> these soft errors until the root cause is fixed. To be able to analyze and
> > > >>> find the root cause, best way is to gather enough data and system state at
> > > >>> the time of MCE. Hence this patch introduces a sysctl knob where user can
> > > >>> decide either to continue after recovery or panic the kernel to capture the
> > > >>> dump.
> > > >> 
> > > >> I'm not convinced we want this.
> > > >> 
> > > >> As we've discovered it's often not possible to reconstruct what 
> > > >> happened
> > > >> based on a dump anyway.
> > > >> 
> > > >> The key thing you need is the content of the SLB and that's not 
> > > >> included
> > > >> in a dump.
> > > >> 
> > > >> So I think we should dump the SLB content when we get the MCE (which
> > > >> this series does) and any other useful info, and then if we can recover
> > > >> we should.
> > > >
> > > > The reasoning there is what if we got multi-hit due to some corruption
> > > > in slb_cache_ptr. ie. some part of kernel is wrongly updating the paca
> > > > data structure due to wrong pointer. Now that is far fetched, but then
> > > > possible right?. Hence the idea that, if we don't have much insight into
> > > > why a slb multi-hit occur from the dmesg which include slb content,
> > > > slb_cache contents etc, there should be an easy way to force a dump that
> > > > might assist in further debug.
> > > 
> > > If you're debugging something complex that you can't determine from the
> > > SLB dump then you should be running a debug kernel anyway. And if
> > > anything you want to drop into xmon and sit there, preserving the most
> > > state, rather than taking a dump.  
> > 
> > I'm not saying for a dump specifically, just some form of crash. And we
> > really should have an option to xmon on panic, but that's another story.  
> 
> That's fine during development or in a lab, not something we could
> enforce in a customer environment, could we?

xmon on panic? Not something to enforce but IMO (without thinking about
it too much but having encountered it several times) it should probably
be tied to the xmon-on-BUG option.

> 
> > I think HA/failover kind of environments use options like this too. If
> > anything starts going bad they don't want to try limping along but stop
> > ASAP.  
> 
> Right. And in this particular case, the question is whether we can
> guarantee no corruption (leading to, or after, the multihit recovery)
> when running a customer workload...

I think that's an element of it. If SLB corruption is caused by
software then we could already have memory corruption. If it's hardware
then presumably we're supposed to have some guarantee of error rates.
But still you would say a machine that has taken no MCEs is less likely
to have a problem than one that has taken some MCEs!

It's not just corruption either, I've run into bugs where we get huge
streams of HMIs for example which all get recovered properly but
performance would have been in the toilet.

Anyway, being policy, maybe we could drop this patch out of the SLB MCE
series and introduce it afterwards if we think it's necessary. For an
SLB multi-hit caused by a software bug in SLB handling, I'd say Michael's
pretty much right about just needing the MCE output with SLB contents.

Thanks,
Nick


[PATCH] powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled

2018-08-09 Thread Christophe Leroy
The symbol memcpy_nocache_branch, defined in order to allow patching
of the memset function once the cache is enabled, leads to confusing
reports by the perf tool.

Using the new patch_site functionality solves this issue.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h | 1 +
 arch/powerpc/kernel/setup_32.c| 7 +++
 arch/powerpc/lib/copy_32.S| 9 ++---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 70fdc5b9b9fb..1f4691ce4126 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -146,6 +146,7 @@ void _kvmppc_save_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr);
 /* Patch sites */
 extern s32 patch__call_flush_count_cache;
 extern s32 patch__flush_count_cache_return;
+extern s32 patch__memset_nocache, patch__memcpy_nocache;
 
 extern long flush_count_cache;
 
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 0e3743343280..ba969278bf4d 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -97,11 +97,10 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
  * We do the initial parsing of the flat device-tree and prepares
  * for the MMU to be fully initialized.
  */
-extern unsigned int memset_nocache_branch; /* Insn to be replaced by NOP */
-
 notrace void __init machine_init(u64 dt_ptr)
 {
-   unsigned int *addr = &memset_nocache_branch;
+   unsigned int *addr = (unsigned int *)((unsigned long)&patch__memset_nocache +
+  patch__memset_nocache);
unsigned long insn;
 
/* Configure static keys first, now that we're relocated. */
@@ -110,7 +109,7 @@ notrace void __init machine_init(u64 dt_ptr)
/* Enable early debugging if any specified (see udbg.h) */
udbg_early_init();
 
-   patch_instruction((unsigned int *)&memcpy, PPC_INST_NOP);
+   patch_instruction_site(&patch__memcpy_nocache, PPC_INST_NOP);
 
insn = create_cond_branch(addr, branch_target(addr), 0x82);
patch_instruction(addr, insn);  /* replace b by bne cr0 */
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index da425bb6b369..ba66846fe973 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include <asm/code-patching-asm.h>
 
 #define COPY_16_BYTES  \
lwz r7,4(r4);   \
@@ -107,8 +108,8 @@ _GLOBAL(memset)
 * Skip optimised bloc until cache is enabled. Will be replaced
 * by 'bne' during boot to use normal procedure if r4 is not zero
 */
-_GLOBAL(memset_nocache_branch)
-   b   2f
+5: b   2f
+   patch_site  5b, patch__memset_nocache
 
clrlwi  r7,r6,32-LG_CACHELINE_BYTES
add r8,r7,r5
@@ -168,7 +169,9 @@ _GLOBAL(memmove)
/* fall through */
 
 _GLOBAL(memcpy)
-   b   generic_memcpy
+1: b   generic_memcpy
+   patch_site  1b, patch__memcpy_nocache
+
add r7,r3,r5/* test if the src & dst overlap */
add r8,r4,r5
cmplw   0,r4,r7
-- 
2.13.3



Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Ananth N Mavinakayanahalli
On Thu, Aug 09, 2018 at 06:02:53PM +1000, Nicholas Piggin wrote:
> On Thu, 09 Aug 2018 16:34:07 +1000
> Michael Ellerman  wrote:
> 
> > "Aneesh Kumar K.V"  writes:
> > > On 08/08/2018 08:26 PM, Michael Ellerman wrote:  
> > >> Mahesh J Salgaonkar  writes:  
> > >>> From: Mahesh Salgaonkar 
> > >>>
> > >>> Introduce recovery action for recovered memory errors (MCEs). There are
> > >>> soft memory errors like SLB Multihit, which can be a result of a bad
> > >>> hardware OR software BUG. Kernel can easily recover from these soft errors
> > >>> by flushing SLB contents. After the recovery kernel can still continue to
> > >>> function without any issue. But in some scenario's we may keep getting
> > >>> these soft errors until the root cause is fixed. To be able to analyze and
> > >>> find the root cause, best way is to gather enough data and system state at
> > >>> the time of MCE. Hence this patch introduces a sysctl knob where user can
> > >>> decide either to continue after recovery or panic the kernel to capture the
> > >>> dump.
> > >> 
> > >> I'm not convinced we want this.
> > >> 
> > >> As we've discovered it's often not possible to reconstruct what happened
> > >> based on a dump anyway.
> > >> 
> > >> The key thing you need is the content of the SLB and that's not included
> > >> in a dump.
> > >> 
> > >> So I think we should dump the SLB content when we get the MCE (which
> > >> this series does) and any other useful info, and then if we can recover
> > >> we should.  
> > >
> > > The reasoning there is what if we got multi-hit due to some corruption 
> > > in slb_cache_ptr. ie. some part of kernel is wrongly updating the paca 
> > > data structure due to wrong pointer. Now that is far fetched, but then 
> > > possible right?. Hence the idea that, if we don't have much insight into 
> > > why a slb multi-hit occur from the dmesg which include slb content, 
> > > slb_cache contents etc, there should be an easy way to force a dump that 
> > > might assist in further debug.  
> > 
> > If you're debugging something complex that you can't determine from the
> > SLB dump then you should be running a debug kernel anyway. And if
> > anything you want to drop into xmon and sit there, preserving the most
> > state, rather than taking a dump.
> 
> I'm not saying for a dump specifically, just some form of crash. And we
> really should have an option to xmon on panic, but that's another story.

That's fine during development or in a lab, not something we could
enforce in a customer environment, could we?

> I think HA/failover kind of environments use options like this too. If
> anything starts going bad they don't want to try limping along but stop
> ASAP.

Right. And in this particular case, the question is whether we can
guarantee no corruption (leading to, or after, the multihit recovery)
when running a customer workload...

Ananth



Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Nicholas Piggin
On Thu, 09 Aug 2018 16:34:07 +1000
Michael Ellerman  wrote:

> "Aneesh Kumar K.V"  writes:
> > On 08/08/2018 08:26 PM, Michael Ellerman wrote:  
> >> Mahesh J Salgaonkar  writes:  
> >>> From: Mahesh Salgaonkar 
> >>>
> >>> Introduce recovery action for recovered memory errors (MCEs). There are
> >>> soft memory errors like SLB Multihit, which can be a result of a bad
> >>> hardware OR software BUG. Kernel can easily recover from these soft errors
> >>> by flushing SLB contents. After the recovery kernel can still continue to
> >>> function without any issue. But in some scenario's we may keep getting
> >>> these soft errors until the root cause is fixed. To be able to analyze and
> >>> find the root cause, best way is to gather enough data and system state at
> >>> the time of MCE. Hence this patch introduces a sysctl knob where user can
> >>> decide either to continue after recovery or panic the kernel to capture the
> >>> dump.
> >> 
> >> I'm not convinced we want this.
> >> 
> >> As we've discovered it's often not possible to reconstruct what happened
> >> based on a dump anyway.
> >> 
> >> The key thing you need is the content of the SLB and that's not included
> >> in a dump.
> >> 
> >> So I think we should dump the SLB content when we get the MCE (which
> >> this series does) and any other useful info, and then if we can recover
> >> we should.  
> >
> > The reasoning there is what if we got multi-hit due to some corruption 
> > in slb_cache_ptr. ie. some part of kernel is wrongly updating the paca 
> > data structure due to wrong pointer. Now that is far fetched, but then 
> > possible right?. Hence the idea that, if we don't have much insight into 
> > why a slb multi-hit occur from the dmesg which include slb content, 
> > slb_cache contents etc, there should be an easy way to force a dump that 
> > might assist in further debug.  
> 
> If you're debugging something complex that you can't determine from the
> SLB dump then you should be running a debug kernel anyway. And if
> anything you want to drop into xmon and sit there, preserving the most
> state, rather than taking a dump.

I'm not saying for a dump specifically, just some form of crash. And we
really should have an option to xmon on panic, but that's another story.

I think HA/failover kind of environments use options like this too. If
anything starts going bad they don't want to try limping along but stop
ASAP.

Thanks,
Nick


RE: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020

2018-08-09 Thread Bharat Bhushan


> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Thursday, August 9, 2018 11:42 AM
> Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> 
> On Thu, 2018-08-09 at 03:28 +0000, Bharat Bhushan wrote:
> > > -Original Message-
> > > From: Scott Wood [mailto:o...@buserror.net]
> > > Sent: Wednesday, August 8, 2018 11:27 PM
> > > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > >
> > > On Wed, 2018-08-08 at 06:28 +0000, Bharat Bhushan wrote:
> > > > > -Original Message-
> > > > > From: Scott Wood [mailto:o...@buserror.net]
> > > > > Sent: Wednesday, August 8, 2018 11:26 AM
> > > > > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > > > >
> > > > > On Wed, 2018-08-08 at 03:44 +0000, Bharat Bhushan wrote:
> > > > > > > -Original Message-
> > > > > > > From: Scott Wood [mailto:o...@buserror.net]
> > > > > > > Sent: Wednesday, August 8, 2018 2:44 AM
> > > > > > > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > > > > > >
> > > > > > > On Fri, 2018-07-27 at 15:18 +0530, Bharat Bhushan wrote:
> > > > > > > > MPIC on NXP (Freescale) P2020 supports following irq
> > > > > > > > ranges:
> > > > > > > >   > 0 - 11  (External interrupt)
> > > > > > > >   > 16 - 79 (Internal interrupt)
> > > > > > > >   > 176 - 183   (Messaging interrupt)
> > > > > > > >   > 224 - 231   (Shared message signaled interrupt)
> > > > > > >
> > > > > > > Why don't you convert to the 4-cell interrupt specifiers
> > > > > > > that make dealing with these ranges less error-prone?
> > > > > >
> > > > > > Ok , will do if we agree to have this series as per comment on
> > > > > > other patch.
> > > > >
> > > > > If you're concerned with errors, this would be a good thing to
> > > > > do regardless. Actually, it seems that p2020si-post.dtsi already
> > > > > uses 4-cell interrupts.
> > > > >
> > > > > What is motivating this patchset?  Is there something wrong in
> > > > > the existing dts files?
> > > >
> > > > There is no error in the device tree. The main motivation is to
> > > > improve the code for the following reasons:
> > > >   - While studying the code it was found that if a reserved irq
> > > > number is used there is no check in the driver; the irq will be
> > > > configured as if it were correct and the interrupt will never fire.
> > >
> > > Again, a wrong interrupt number won't fire, whether an interrupt by
> > > that number exists or not.  I wouldn't mind a sanity check in the
> > > driver if the programming model made it properly discoverable, but I
> > > don't think it's worth messing with device trees just for this (and
> > > even less so given that there don't seem to be new chips coming out
> > > that this would be relevant for).
> >
> > Fair enough, we can use the MPIC version to define the supported
> > interrupt ranges. Will that be acceptable?
> 
> It's better than device tree changes but I'm not convinced it's worthwhile
> just to suppress some simulator warnings. If the warnings really bother
> you, you can use pic-no-reset in the device tree (assuming this isn't some
> new chip that you want to make sure doesn't fall over when the 
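
For reference, a 4-cell FSL MPIC interrupt specifier, as already used by
p2020si-post.dtsi, looks something like this (values illustrative; per the
FSL MPIC binding the cells are interrupt number, level/sense, interrupt type,
and type-specific information):

	interrupts = <16 2 0 0>;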

Re: [PATCH 1/5] powerpc/asm: Add a patch_site macro & helpers for patching instructions

2018-08-09 Thread Christophe LEROY




On 08/08/2018 at 18:30, Christophe LEROY wrote:



On 23/07/2018 at 17:07, Michael Ellerman wrote:

Add a macro and some helper C functions for patching single asm
instructions.

The gas macro means we can do something like:

   1:    nop
   patch_site 1b, patch__foo

Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.

These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.

Signed-off-by: Michael Ellerman 
---
  arch/powerpc/include/asm/code-patching-asm.h | 18 ++
  arch/powerpc/include/asm/code-patching.h |  2 ++
  arch/powerpc/lib/code-patching.c | 16 
  3 files changed, 36 insertions(+)
  create mode 100644 arch/powerpc/include/asm/code-patching-asm.h

diff --git a/arch/powerpc/include/asm/code-patching-asm.h 
b/arch/powerpc/include/asm/code-patching-asm.h

new file mode 100644
index ..ed7b1448493a
--- /dev/null
+++ b/arch/powerpc/include/asm/code-patching-asm.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright 2018, Michael Ellerman, IBM Corporation.
+ */
+#ifndef _ASM_POWERPC_CODE_PATCHING_ASM_H
+#define _ASM_POWERPC_CODE_PATCHING_ASM_H
+
+/* Define a "site" that can be patched */
+.macro patch_site label name
+    .pushsection ".rodata"
+    .balign 4
+    .global \name
+\name:
+    .4byte    \label - .
+    .popsection
+.endm
+
+#endif /* _ASM_POWERPC_CODE_PATCHING_ASM_H */
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 812535f40124..b2051234ada8 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -32,6 +32,8 @@ unsigned int create_cond_branch(const unsigned int *addr,

  int patch_branch(unsigned int *addr, unsigned long target, int flags);
  int patch_instruction(unsigned int *addr, unsigned int instr);
  int raw_patch_instruction(unsigned int *addr, unsigned int instr);
+int patch_instruction_site(s32 *addr, unsigned int instr);
+int patch_branch_site(s32 *site, unsigned long target, int flags);


Why use s32* instead of unsigned int*, as is usual for pointers to code?


Forget my stupid question, I didn't see it was a relative address and 
not an absolute one.


Christophe



Christophe


  int instr_is_relative_branch(unsigned int instr);
  int instr_is_relative_link_branch(unsigned int instr);
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index e0d881ab304e..850f3b8f4da5 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -195,6 +195,22 @@ int patch_branch(unsigned int *addr, unsigned long target, int flags)
  return patch_instruction(addr, create_branch(addr, target, flags));
  }
+int patch_branch_site(s32 *site, unsigned long target, int flags)
+{
+    unsigned int *addr;
+
+    addr = (unsigned int *)((unsigned long)site + *site);
+    return patch_instruction(addr, create_branch(addr, target, flags));
+}
+
+int patch_instruction_site(s32 *site, unsigned int instr)
+{
+    unsigned int *addr;
+
+    addr = (unsigned int *)((unsigned long)site + *site);
+    return patch_instruction(addr, instr);
+}
+
  bool is_offset_in_branch_range(long offset)
  {
  /*
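
For context, a minimal sketch of how these helpers are meant to be driven from C, assuming a hypothetical site named patch__foo (the site name and branch target are illustrative; the helper signatures, BRANCH_SET_LINK and PPC_INST_NOP are real):

#include <asm/code-patching.h>
#include <asm/ppc-opcode.h>		/* PPC_INST_NOP */

extern s32 patch__foo;			/* hypothetical site, declared in asm as:
					 *	1:	nop
					 *	patch_site 1b, patch__foo
					 * The word holds the relative offset
					 * (label - .), hence the s32 type. */
extern void foo_handler(void);		/* hypothetical branch target */

static void __init setup_foo_patch(bool enable)
{
	if (enable)
		/* Turn the nop at the site into "bl foo_handler". */
		patch_branch_site(&patch__foo,
				  ppc_function_entry(foo_handler),
				  BRANCH_SET_LINK);
	else
		/* Restore the plain nop. */
		patch_instruction_site(&patch__foo, PPC_INST_NOP);
}

Because the site stores a self-relative offset rather than an absolute address, it stays 4 bytes and needs no relocation, which is the point Christophe's follow-up concedes.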



Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 08/08/2018 08:26 PM, Michael Ellerman wrote:
>> Mahesh J Salgaonkar  writes:
>>> From: Mahesh Salgaonkar 
>>>
>>> Introduce a recovery action for recovered memory errors (MCEs). There are
>>> soft memory errors like SLB Multihit, which can be the result of bad
>>> hardware OR a software BUG. The kernel can easily recover from these soft
>>> errors by flushing the SLB contents. After the recovery the kernel can
>>> still continue to function without any issue. But in some scenarios we may
>>> keep getting these soft errors until the root cause is fixed. To be able
>>> to analyze and find the root cause, the best way is to gather enough data
>>> and system state at the time of the MCE. Hence this patch introduces a
>>> sysctl knob where the user can decide either to continue after recovery or
>>> panic the kernel to capture a dump.
>> 
>> I'm not convinced we want this.
>> 
>> As we've discovered it's often not possible to reconstruct what happened
>> based on a dump anyway.
>> 
>> The key thing you need is the content of the SLB and that's not included
>> in a dump.
>> 
>> So I think we should dump the SLB content when we get the MCE (which
>> this series does) and any other useful info, and then if we can recover
>> we should.
>
> The reasoning there is: what if we got the multi-hit due to some corruption 
> in slb_cache_ptr, i.e. some part of the kernel wrongly updating the paca 
> data structure through a bad pointer? That is far-fetched, but still 
> possible. Hence the idea that, if we don't have much insight into why an 
> SLB multi-hit occurred from the dmesg output (which includes the SLB 
> contents, slb_cache contents, etc.), there should be an easy way to force a 
> dump that might assist in further debugging.

If you're debugging something complex that you can't determine from the
SLB dump then you should be running a debug kernel anyway. And if
anything, you want to drop into xmon and sit there, preserving the most
state, rather than taking a dump.

The last SLB multi-hit I debugged was this:

  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=db7130d63fd8


Which took quite a while to track down, including a bunch of tracing and
so on. A dump would not have helped in the slightest.

cheers
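
For reference, the SLB dump Michael prefers looks roughly like the sketch below: walk every entry with slbmfee/slbmfev and print the non-empty ones. This is an illustrative reconstruction, not the exact code from the series (the function name and output format are made up; mmu_slb_size and the instructions are real):

#include <linux/printk.h>
#include <asm/mmu.h>		/* mmu_slb_size */

static void dump_slb_contents(void)
{
	unsigned long e, v;
	int i;

	for (i = 0; i < mmu_slb_size; i++) {
		/* Read the ESID and VSID halves of SLB entry i. */
		asm volatile("slbmfee %0,%1" : "=r" (e) : "r" (i));
		asm volatile("slbmfev %0,%1" : "=r" (v) : "r" (i));
		if (e || v)
			pr_err("SLB[%02d]: ESID=%016lx VSID=%016lx\n",
			       i, e, v);
	}
}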


Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020

2018-08-09 Thread Scott Wood
On Thu, 2018-08-09 at 03:28 +, Bharat Bhushan wrote:
> > -Original Message-
> > From: Scott Wood [mailto:o...@buserror.net]
> > Sent: Wednesday, August 8, 2018 11:27 PM
> > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > 
> > On Wed, 2018-08-08 at 06:28 +, Bharat Bhushan wrote:
> > > > -Original Message-
> > > > From: Scott Wood [mailto:o...@buserror.net]
> > > > Sent: Wednesday, August 8, 2018 11:26 AM
> > > > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > > > 
> > > > On Wed, 2018-08-08 at 03:44 +, Bharat Bhushan wrote:
> > > > > > -Original Message-
> > > > > > From: Scott Wood [mailto:o...@buserror.net]
> > > > > > Sent: Wednesday, August 8, 2018 2:44 AM
> > > > > > Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
> > > > > > 
> > > > > > On Fri, 2018-07-27 at 15:18 +0530, Bharat Bhushan wrote:
> > > > > > > MPIC on NXP (Freescale) P2020 supports the following irq ranges:
> > > > > > >   > 0 - 11  (External interrupt)
> > > > > > >   > 16 - 79 (Internal interrupt)
> > > > > > >   > 176 - 183   (Messaging interrupt)
> > > > > > >   > 224 - 231   (Shared message signaled interrupt)
> > > > > > 
> > > > > > Why don't you convert to the 4-cell interrupt specifiers that
> > > > > > make dealing with these ranges less error-prone?
> > > > > 
> > > > > Ok , will do if we agree to have this series as per comment on
> > > > > other patch.
> > > > 
> > > > If you're concerned with errors, this would be a good thing to do
> > > > regardless.
> > > >  Actually, it seems that p2020si-post.dtsi already uses 4-cell
> > > > interrupts.
> > > > 
> > > > What is motivating this patchset?  Is there something wrong in the
> > > > existing dts files?
> > > 
> > > There is no error in the device tree. The main motivation is to improve
> > > the code, for the following reasons:
> > >   - While studying the code it was found that if a reserved irq-number
> > > is used then there is no check in the driver; the irq will be configured
> > > as if it were correct and the interrupt will never fire.
> > 
> > Again, a wrong interrupt number won't fire, whether an interrupt by that
> > number exists or not.  I wouldn't mind a sanity check in the driver if the
> > programming model made it properly discoverable, but I don't think it's
> > worth messing with device trees just for this (and even less so given that
> > there don't seem to be new chips coming out that this would be relevant
> > for).
> 
> Fair enough, we can use the MPIC version to define the supported interrupt
> ranges. Will that be acceptable?

It's better than device tree changes but I'm not convinced it's worthwhile
just to suppress some simulator warnings.  If the warnings really bother you,
you can use pic-no-reset in the device tree (assuming this isn't some new chip
that you want to make sure doesn't fall over when the usual mpic init happens)
and/or convince the hardware people to make the interface properly
discoverable including discontiguous regions (if there *is* some new chip I
haven't heard about).

-Scott
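
To make the "sanity check in the driver" idea concrete, here is a sketch of the kind of range check being discussed, using the P2020 ranges quoted above. The table layout and function name are hypothetical; only the ranges come from the thread:

#include <linux/kernel.h>	/* ARRAY_SIZE */
#include <linux/types.h>

struct mpic_irq_range {
	unsigned int start, end;
};

static const struct mpic_irq_range p2020_irq_ranges[] = {
	{   0,  11 },	/* external interrupts */
	{  16,  79 },	/* internal interrupts */
	{ 176, 183 },	/* messaging interrupts */
	{ 224, 231 },	/* shared message signaled interrupts */
};

static bool p2020_mpic_src_is_valid(unsigned int src)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(p2020_irq_ranges); i++)
		if (src >= p2020_irq_ranges[i].start &&
		    src <= p2020_irq_ranges[i].end)
			return true;
	return false;
}

Scott's objection still stands: without a discoverable programming model, such a table has to be maintained by hand per chip, which is why the thread leans towards not bothering.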



Re: [PATCH] lib/test_hexdump: fix failure on big endian cpu

2018-08-09 Thread Michael Ellerman
rashmica  writes:
> On 08/08/18 17:25, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> diff --git a/lib/test_hexdump.c b/lib/test_hexdump.c
>>> index 3f415d8101f3..626f580b4ff7 100644
>>> --- a/lib/test_hexdump.c
>>> +++ b/lib/test_hexdump.c
>>> @@ -32,16 +32,33 @@ static const char * const test_data_2_le[] __initconst = {
>>> "d14c", "9919", "b143", "0caf",
>>>  };
>>>  
>>> +static const char * const test_data_2_be[] __initconst = {
>>> +   "be32", "db7b", "0a18", "93b2",
>>> +   "70ba", "c424", "7d83", "349b",
>>> +   "a69c", "31ad", "9c0f", "ace9",
>>> +   "4cd1", "1999", "43b1", "af0c",
>>> +};
>>> +
>>>  static const char * const test_data_4_le[] __initconst = {
>>> "7bdb32be", "b293180a", "24c4ba70", "9b34837d",
>>> "ad319ca6", "e9ac0f9c", "9919d14c", "0cafb143",
>>>  };
>>>  
>>> +static const char * const test_data_4_be[] __initconst = {
>>> +   "be32db7b", "0a1893b2", "70bac424", "7d83349b",
>>> +   "a69c31ad", "9c0face9", "4cd11999", "43b1af0c",
>>> +};
>>> +
>> Is there a reason we can't do it all at compile time?
>
> mpe, I sent a patch doing that a while ago and you obviously didn't like
> it because you never merged it :P

Sorry, I wasn't sure who should merge it, and never followed up.

cheers
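
For the curious, the compile-time selection being hinted at would look something like this sketch, picking the expected strings with the preprocessor instead of at run time (the _le/_be array names match the patch; whether the final fix took this exact shape is not shown here):

#include <asm/byteorder.h>	/* in kernel code this defines exactly one
				 * of __BIG_ENDIAN / __LITTLE_ENDIAN */

#ifdef __BIG_ENDIAN
#define test_data_2	test_data_2_be
#define test_data_4	test_data_4_be
#else
#define test_data_2	test_data_2_le
#define test_data_4	test_data_4_le
#endif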


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-09 Thread Christoph Hellwig
On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
> > > - if (xen_domain())
> > > + if (xen_domain() || pseries_secure_vm())
> > >   return true;
> > 
> > I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> > might benefit from the same kind of hack.
> 
> As long as they can provide the same guarantee that the DMA ops are
> completely equivalent between virtio and other PCI devices, at least on
> the same bus, ie, we don't have to go hack special DMA ops.
> 
> I think the latter is really what Christoph wants to avoid for good
> reasons.

Yes.  I also generally want to avoid too much arch specific magic.

FYI, I'm off to a week-long vacation today, don't expect quick replies.
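
For readers without the rest of the thread: the hunk under discussion sits in vring_use_dma_api() in drivers/virtio/virtio_ring.c. A heavily simplified sketch of its shape with the proposed check (pseries_secure_vm() is the RFC's proposal, not an existing kernel API, and the real function has additional checks elided here):

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* Xen guests always need the DMA API so that grant mappings
	 * work; the RFC extends this to secure pseries guests, whose
	 * memory is not directly accessible to the hypervisor. */
	if (xen_domain() || pseries_secure_vm())
		return true;

	return false;
}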