Re: [PATCH] powerpc/mm: Fixup wrong LPCR_VRMASD value

2016-12-16 Thread kbuild test robot
Hi Aneesh,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.9 next-20161216]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-mm-Fixup-wrong-LPCR_VRMASD-value/20161208-161150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-pseries_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/bitops.h:46:0,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/preempt.h:10,
from include/linux/hardirq.h:4,
from include/linux/kvm_host.h:10,
from arch/powerpc/kvm/book3s_hv.c:21:
   arch/powerpc/kvm/book3s_hv.c: In function 'kvmppc_hv_setup_htab_rma':
>> arch/powerpc/include/asm/reg.h:341:35: error: invalid suffix "fUL" on 
>> integer constant
#define   LPCR_VRMASD  (ASM_CONST(1f) << LPCR_VRMASD_SH)
  ^
   arch/powerpc/include/asm/asm-compat.h:14:26: note: in definition of macro 
'__ASM_CONST'
#  define __ASM_CONST(x) x##UL
 ^
>> arch/powerpc/include/asm/reg.h:341:25: note: in expansion of macro 
>> 'ASM_CONST'
#define   LPCR_VRMASD  (ASM_CONST(1f) << LPCR_VRMASD_SH)
^
>> arch/powerpc/kvm/book3s_hv.c:3079:32: note: in expansion of macro 
>> 'LPCR_VRMASD'
 kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
   ^~~
--
   In file included from arch/powerpc/include/asm/bitops.h:46:0,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/preempt.h:10,
from include/linux/hardirq.h:4,
from include/linux/kvm_host.h:10,
from arch/powerpc/kvm/book3s_64_mmu_hv.c:21:
   arch/powerpc/kvm/book3s_64_mmu_hv.c: In function 'kvm_htab_write':
>> arch/powerpc/include/asm/reg.h:341:35: error: invalid suffix "fUL" on 
>> integer constant
#define   LPCR_VRMASD  (ASM_CONST(1f) << LPCR_VRMASD_SH)
  ^
   arch/powerpc/include/asm/asm-compat.h:14:26: note: in definition of macro 
'__ASM_CONST'
#  define __ASM_CONST(x) x##UL
 ^
>> arch/powerpc/include/asm/reg.h:341:25: note: in expansion of macro 
>> 'ASM_CONST'
#define   LPCR_VRMASD  (ASM_CONST(1f) << LPCR_VRMASD_SH)
^
>> arch/powerpc/kvm/book3s_64_mmu_hv.c:1416:35: note: in expansion of macro 
>> 'LPCR_VRMASD'
kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
  ^~~

vim +/fUL +341 arch/powerpc/include/asm/reg.h

   335  #define   LPCR_VPM1 ASM_CONST(0x4000)
   336  #define   LPCR_ISL  ASM_CONST(0x2000)
   337  #define   LPCR_VC_SH    61
   338  #define   LPCR_DPFD_SH  52
   339  #define   LPCR_DPFD (ASM_CONST(7) << LPCR_DPFD_SH)
   340  #define   LPCR_VRMASD_SH    47
 > 341  #define   LPCR_VRMASD   (ASM_CONST(1f) << LPCR_VRMASD_SH)
   342  #define   LPCR_VRMA_L   ASM_CONST(0x0008)
   343  #define   LPCR_VRMA_LP0 ASM_CONST(0x0001)
   344  #define   LPCR_VRMA_LP1 ASM_CONST(0x8000)
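
The error itself is a simple macro-expansion problem: ASM_CONST() ultimately token-pastes "UL" onto its argument, so the argument must already be a complete integer constant on its own. A minimal illustration in C (a sketch, not the patch under test; 0x1f is assumed here only because VRMASD is a 5-bit field):

    #define __ASM_CONST(x)  x##UL
    #define ASM_CONST(x)    __ASM_CONST(x)

    #define LPCR_VRMASD_SH  47

    /* ASM_CONST(1f)   expands to 1fUL   -> the "invalid suffix fUL" error above */
    /* ASM_CONST(0x1f) expands to 0x1fUL -> a valid unsigned long constant       */
    #define LPCR_VRMASD     (ASM_CONST(0x1f) << LPCR_VRMASD_SH)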

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH V2 0/4] OPTPROBES for powerpc

2016-12-16 Thread Anju T Sudhakar

Hi Balbir,



On Friday 16 December 2016 08:16 PM, Balbir Singh wrote:


On 15/12/16 03:18, Anju T Sudhakar wrote:

This is the V2 patchset of the kprobes jump optimization
(a.k.a. OPTPROBES) for powerpc. Kprobes is an essential tool
for kernel developers, so enhancing its performance is of
considerable importance.

Currently kprobes inserts a trap instruction to probe a running kernel.
Jump optimization allows kprobes to replace the trap with a branch,
reducing the probe overhead drastically.

In this series, conditional branch instructions are not considered for
optimization as they have to be assessed carefully in SMP systems.

The kprobe placed on the kretprobe_trampoline during boot time, is also
optimized in this series. Patch 4/4 furnishes this.

The first two patches can go independently of the series. The helper
functions in these patches are invoked in patch 3/4.

Performance:

An optimized kprobe in powerpc is 1.05 to 4.7 times faster than a kprobe.
  
Example:
  
Placed a probe at an offset 0x50 in _do_fork().

*Time Diff here is the difference in time between just before hitting
the probe and just after the probed instruction. mftb() is employed in
kernel/fork.c for this purpose.
  
# echo 0 > /proc/sys/debug/kprobes-optimization

Kprobes globally unoptimized
  [  233.607120] Time Diff = 0x1f0
  [  233.608273] Time Diff = 0x1ee
  [  233.609228] Time Diff = 0x203
  [  233.610400] Time Diff = 0x1ec
  [  233.611335] Time Diff = 0x200
  [  233.612552] Time Diff = 0x1f0
  [  233.613386] Time Diff = 0x1ee
  [  233.614547] Time Diff = 0x212
  [  233.615570] Time Diff = 0x206
  [  233.616819] Time Diff = 0x1f3
  [  233.617773] Time Diff = 0x1ec
  [  233.618944] Time Diff = 0x1fb
  [  233.619879] Time Diff = 0x1f0
  [  233.621066] Time Diff = 0x1f9
  [  233.621999] Time Diff = 0x283
  [  233.623281] Time Diff = 0x24d
  [  233.624172] Time Diff = 0x1ea
  [  233.625381] Time Diff = 0x1f0
  [  233.626358] Time Diff = 0x200
  [  233.627572] Time Diff = 0x1ed
  
# echo 1 > /proc/sys/debug/kprobes-optimization

Kprobes globally optimized
  [   70.797075] Time Diff = 0x103
  [   70.799102] Time Diff = 0x181
  [   70.801861] Time Diff = 0x15e
  [   70.803466] Time Diff = 0xf0
  [   70.804348] Time Diff = 0xd0
  [   70.805653] Time Diff = 0xad
  [   70.806477] Time Diff = 0xe0
  [   70.807725] Time Diff = 0xbe
  [   70.808541] Time Diff = 0xc3
  [   70.810191] Time Diff = 0xc7
  [   70.811007] Time Diff = 0xc0
  [   70.812629] Time Diff = 0xc0
  [   70.813640] Time Diff = 0xda
  [   70.814915] Time Diff = 0xbb
  [   70.815726] Time Diff = 0xc4
  [   70.816955] Time Diff = 0xc0
  [   70.817778] Time Diff = 0xcd
  [   70.818999] Time Diff = 0xcd
  [   70.820099] Time Diff = 0xcb
  [   70.821333] Time Diff = 0xf0
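
For a rough sense of scale from the samples above: a typical unoptimized
diff of 0x1f0 is 496 timebase ticks, while a typical optimized diff of
0xc0 is 192 ticks, roughly a 2.6x reduction, consistent with the 1.05x
to 4.7x range quoted earlier.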

Implementation:
===
  
The trap instruction is replaced by a branch to a detour buffer. To address
the limitation of branch instructions in the Power architecture, the detour
buffer slot is allocated from a reserved area. This ensures that the branch
is within the ± 32 MB range. The current kprobes insn caches allocate the
memory area for insn slots with module_alloc(), which will always be beyond
the ± 32 MB range.
  

The paragraph is a little confusing. We need the detour buffer to be within
+-32 MB, but then you say we always get memory from module_alloc() beyond
32MB.


The last two lines in the paragraph talk about the *current* method which
the regular kprobe uses for allocating instruction slots. So in our case,
we can't use module_alloc(), since there is no guarantee that the slot
allocated will be within the +/- 32MB range.

The detour buffer contains a call to optimized_callback() which in turn
calls the pre_handler(). Once the pre-handler is run, the original
instruction is emulated from the detour buffer itself. Also the detour
buffer is equipped with a branch back to the normal work flow after the
probed instruction is emulated.

Does the branch itself use registers that need to be saved? I presume
we are going to rely on the +-32MB, what are the guarantees of success
of such a mechanism?


For branching back to the next instruction after the execution of the
kprobe's pre-handler, we place the branch instruction in the detour buffer
itself. Hence we don't have to clobber any registers after restoring them.

Before optimizing the kprobe we make sure that both the 'branch to detour
buffer' and the 'branch back from detour buffer' are within the +/- 32MB
range. This ensures that the optimized kprobe works.
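
As a self-contained illustration of the range check being described (a
sketch using the same limits as the is_offset_in_branch_range() helper
added in patch 2/4; the addresses below are made up):

	#include <stdbool.h>
	#include <stdio.h>

	/* A powerpc relative branch encodes a signed, word-aligned 26-bit
	 * offset (24-bit LI field shifted left by 2), i.e. +/- 32MB. */
	static bool is_offset_in_branch_range(long offset)
	{
		return (offset >= -0x2000000 && offset <= 0x1fffffc &&
			!(offset & 0x3));
	}

	int main(void)
	{
		long probe = 0xc000000000a00000L;	/* hypothetical probe address */
		long slot  = 0xc000000000a10000L;	/* detour buffer 64KB away    */
		long far   = 0xc000000004a00000L;	/* a buffer 64MB away         */

		printf("near: %d, far: %d\n",
		       is_offset_in_branch_range(slot - probe),	/* 1: optimizable */
		       is_offset_in_branch_range(far - probe));	/* 0: keep trap   */
		return 0;
	}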



Thanks,
Anju



Balbir Singh.





Re: [PATCH v3 11/15] livepatch: use kstrtobool() in enabled_store()

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 05:55:55PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:36, Josh Poimboeuf wrote:
> > The sysfs enabled value is a boolean, so kstrtobool() is a better fit
> > for parsing the input string since it does the range checking for us.
> > 
> > Suggested-by: Petr Mladek 
> > Signed-off-by: Josh Poimboeuf 
> > ---
> >  kernel/livepatch/core.c | 11 ---
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index 6a137e1..8ca8a0e 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -408,26 +408,23 @@ static ssize_t enabled_store(struct kobject *kobj, 
> > struct kobj_attribute *attr,
> >  {
> > struct klp_patch *patch;
> > int ret;
> > -   unsigned long val;
> > +   bool enabled;
> >  
> > -   ret = kstrtoul(buf, 10, &val);
> > +   ret = kstrtobool(buf, &enabled);
> > if (ret)
> > return -EINVAL;
> 
> I would return "ret" here. It is -EINVAL as well but... ;-)

That was a preexisting issue with the kstrtoul() return code, but I'll
sneak your suggested change into this patch if nobody objects.

> Anyway, feel free to use:
> 
> Reviewed-by: Petr Mladek 
> 
> Best Regards,
> Petr

-- 
Josh


Re: [PATCH v3 03/15] livepatch: temporary stubs for klp_patch_pending() and klp_update_patch_state()

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 03:41:59PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:28, Josh Poimboeuf wrote:
> > Create temporary stubs for klp_patch_pending() and
> > klp_update_patch_state() so we can add TIF_PATCH_PENDING to different
> > architectures in separate patches without breaking build bisectability.
> > 
> > Signed-off-by: Josh Poimboeuf 
> > ---
> >  include/linux/livepatch.h | 7 ++-
> >  kernel/livepatch/core.c   | 3 +++
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> > index 9072f04..60558d8 100644
> > --- a/include/linux/livepatch.h
> > +++ b/include/linux/livepatch.h
> > @@ -123,10 +123,15 @@ void arch_klp_init_object_loaded(struct klp_patch 
> > *patch,
> >  int klp_module_coming(struct module *mod);
> >  void klp_module_going(struct module *mod);
> >  
> > +static inline bool klp_patch_pending(struct task_struct *task) { return 
> > false; }
> 
> I was curious about this. It is implemented correctly in the 13th
> patch and it is never used until 13th patch.

Yep, I'll move it to patch 13.

> 
> > +void klp_update_patch_state(struct task_struct *task);
> 
> It seems that the stub for this function is enough.
> 
> Well, the extra function is just a cosmetic problem. If it could be
> fixed, it would be great. But the patch makes sense:
> 
> Reviewed-by: Petr Mladek 
> 
> Best Regards,
> Petr
> 

-- 
Josh


Re: [PATCH v3 02/15] x86/entry: define _TIF_ALLWORK_MASK flags explicitly

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 03:17:35PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:27, Josh Poimboeuf wrote:
> > The _TIF_ALLWORK_MASK macro automatically includes the least-significant
> > 16 bits of the thread_info flags, which is less than obvious and tends
> > to create confusion and surprises when reading or modifying the code.
> > 
> > Define the flags explicitly.
> > 
> > Signed-off-by: Josh Poimboeuf 
> > ---
> >  arch/x86/include/asm/thread_info.h | 9 -
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/thread_info.h 
> > b/arch/x86/include/asm/thread_info.h
> > index ad6f5eb0..1fe6043 100644
> > --- a/arch/x86/include/asm/thread_info.h
> > +++ b/arch/x86/include/asm/thread_info.h
> > @@ -73,9 +73,6 @@ struct thread_info {
> >   * thread information flags
> >   * - these are process state flags that various assembly files
> >   *   may need to access
> > - * - pending work-to-be-done flags are in LSW
> 
> Yup, this is not true because also some flags from the most
> significant bits are in the _TIF_ALLWORK_MASK.
> 
> > - * - other flags in MSW
> > - * Warning: layout of LSW is hardcoded in entry.S
> >   */
> >  #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
> >  #define TIF_NOTIFY_RESUME  1   /* callback before returning to user */
> > @@ -133,8 +130,10 @@ struct thread_info {
> >  
> >  /* work to do on any return to user space */
> >  #define _TIF_ALLWORK_MASK  \
> > -   ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |   \
> > -   _TIF_NOHZ)
> > +   (_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\
> > +_TIF_SINGLESTEP | _TIF_NEED_RESCHED | _TIF_SYSCALL_EMU |   \
> > +_TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |   \
> > +_TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
> 
> All flags are sorted by the number except for
> _TIF_SINGLESTEP and _TIF_NEED_RESCHED  ;-)

You're right, I'll swap them :-)

> 
> The patch does not change the existing behavior. The same
> existing flags are listed.
> 
> Reviewed-by: Petr Mladek 
> 
> Best Regards,
> Petr

-- 
Josh


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 02:07:39PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:26, Josh Poimboeuf wrote:
> > For live patching and possibly other use cases, a stack trace is only
> > useful if it can be assured that it's completely reliable.  Add a new
> > save_stack_trace_tsk_reliable() function to achieve that.
> > 
> > Scenarios which indicate that a stack trace may be unreliable:
> > 
> > - running task
> 
> It seems that this has to be enforced by save_stack_trace_tsk_reliable()
> caller. It should be mentioned in the function description.

Agreed.

> > - interrupt stack
> 
> I guess that it is detected by saved regs on the stack. And it covers
> also dynamic changes like kprobes. Do I get it correctly, please? 

Right.

> What about ftrace? Is ftrace without regs safe and detected?

Yes, it's safe because the mcount code does the right thing with respect
to frame pointers.  See save_mcount_regs().

> > - preemption
> 
> I wonder if some very active kthreads might almost always be
> preempted using irq in preemptive kernel. Then they block
> the conversion with the non-reliable stacks. Have you noticed
> such problems, please?

I haven't seen such a case and I think it would be quite rare for a
kthread to be CPU-bound like that.

> > - corrupted stack data
> > - stack grows the wrong way
> 
> This is detected in unwind_next_frame() and passed via state->error.
> Am I right?

Right.  I'll add more details to the commit message for all of these.
> 
> 
> > - stack walk doesn't reach the bottom
> > - user didn't provide a large enough entries array
> > 
> > Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
> > determine at build time whether the function is implemented.
> > 
> > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> > index 0653788..3e0cf5e 100644
> > --- a/arch/x86/kernel/stacktrace.c
> > +++ b/arch/x86/kernel/stacktrace.c
> > @@ -74,6 +74,64 @@ void save_stack_trace_tsk(struct task_struct *tsk, 
> > struct stack_trace *trace)
> >  }
> >  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
> >  
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +static int __save_stack_trace_reliable(struct stack_trace *trace,
> > +  struct task_struct *task)
> > +{
> > +   struct unwind_state state;
> > +   struct pt_regs *regs;
> > +   unsigned long addr;
> > +
> > +   for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
> > +unwind_next_frame(&state)) {
> > +
> > +   regs = unwind_get_entry_regs(&state);
> > +   if (regs) {
> > +   /*
> > +* Preemption and page faults on the stack can make
> > +* frame pointers unreliable.
> > +*/
> > +   if (!user_mode(regs))
> > +   return -1;
> 
> By other words, it we find regs on the stack, it almost always mean
> a non-reliable stack. The only exception is when we are in the
> userspace mode. Do I get it correctly, please?

Right.

> > +
> > +   /*
> > +* This frame contains the (user mode) pt_regs at the
> > +* end of the stack.  Finish the unwind.
> > +*/
> > +   unwind_next_frame(&state);
> > +   break;
> > +   }
> > +
> > +   addr = unwind_get_return_address(&state);
> > +   if (!addr || save_stack_address(trace, addr, false))
> > +   return -1;
> > +   }
> > +
> > +   if (!unwind_done(&state) || unwind_error(&state))
> > +   return -1;
> > +
> > +   if (trace->nr_entries < trace->max_entries)
> > +   trace->entries[trace->nr_entries++] = ULONG_MAX;
> > +
> > +   return 0;
> > +}
> 
> Great work! I am surprised that it looks so straightforward.
> 
> I still have to think and investigate it more. But it looks
> very promissing.
> 
> Best Regards,
> Petr

-- 
Josh


Re: [PATCH] perf TUI: Don't throw error for zero length symbols

2016-12-16 Thread Anton Blanchard
Hi Ravi,

> > perf report (with TUI) exits with error when it finds a sample of
> > zero length symbol (i.e. addr == sym->start == sym->end). Actually
> > these are valid samples. Don't exit TUI and show report with such
> > symbols.
> >
> > Link: https://lkml.org/lkml/2016/10/8/189

You can add:

Tested-by: Anton Blanchard 

Also, since this issue makes perf report pretty much useless on
ppc64, can we mark it for stable@, at least to get it into 4.9 where
the ppc64 kernel changes that triggered this appeared?

Anton

> > Reported-by: Anton Blanchard 
> > Signed-off-by: Ravi Bangoria 
> > ---
> >  tools/perf/util/annotate.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index aeb5a44..430d039 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct
> > symbol *sym, struct map *map,
> >
> > pr_debug3("%s: addr=%#" PRIx64 "\n", __func__,
> > map->unmap_ip(map, addr));
> >
> > -   if (addr < sym->start || addr >= sym->end) {
> > +   if ((addr < sym->start || addr >= sym->end) &&
> > +   (addr != sym->end || sym->start != sym->end)) {
> > pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#"
> > PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n", __func__,
> > __LINE__, sym->name, sym->start, addr, sym->end); return -ERANGE;  
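
For the zero-length case described above (addr == sym->start == sym->end),
the new condition works out as follows:

	addr <  sym->start           -> false
	addr >= sym->end             -> true   (the old check alone returned -ERANGE here)
	addr != sym->end             -> false
	sym->start != sym->end       -> false
	(true) && (false || false)   -> false  -> the sample is now accepted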
> 



Re: [PATCH V2 3/4] arch/powerpc: Implement Optprobes

2016-12-16 Thread Anju T Sudhakar

Hi Masami,



On Friday 16 December 2016 07:32 PM, Masami Hiramatsu wrote:

On Wed, 14 Dec 2016 21:48:27 +0530
Anju T Sudhakar  wrote:


Detour buffer contains instructions to create an in memory pt_regs.
After the execution of the pre-handler, a call is made for instruction 
emulation.
The NIP is determined in advance through dummy instruction emulation and a
branch instruction is created to the NIP at the end of the trampoline.

Instruction slot for detour buffer is allocated from the reserved area.
For the time being, 64KB is reserved in memory for this purpose.

Instructions which can be emulated using analyse_instr() are candidates
for optimization. Before optimization, ensure that the address range
between the detour buffer allocated and the instruction being probed
is within ± 32MB.

Thank you for updating!
I think this has no critical issue, but I have some comments on it.
(just cleanup and hardenings)
Please see below.


Signed-off-by: Anju T Sudhakar 
Signed-off-by: Naveen N. Rao 
---
  .../features/debug/optprobes/arch-support.txt  |   2 +-
  arch/powerpc/Kconfig   |   1 +
  arch/powerpc/include/asm/kprobes.h |  23 +-
  arch/powerpc/include/asm/sstep.h   |   1 +
  arch/powerpc/kernel/Makefile   |   1 +
  arch/powerpc/kernel/optprobes.c| 333 +
  arch/powerpc/kernel/optprobes_head.S   | 135 +
  arch/powerpc/lib/sstep.c   |  22 ++
  8 files changed, 516 insertions(+), 2 deletions(-)
  create mode 100644 arch/powerpc/kernel/optprobes.c
  create mode 100644 arch/powerpc/kernel/optprobes_head.S

diff --git a/Documentation/features/debug/optprobes/arch-support.txt 
b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..45bc99d 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
  |   nios2: | TODO |
  |openrisc: | TODO |
  |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
  |s390: | TODO |
  |   score: | TODO |
  |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c7f120a..d563f0a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -98,6 +98,7 @@ config PPC
select HAVE_IOREMAP_PROT
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && 
POWER7_CPU)
select HAVE_KPROBES
+   select HAVE_OPTPROBES if PPC64
select HAVE_ARCH_KGDB
select HAVE_KRETPROBES
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/include/asm/kprobes.h 
b/arch/powerpc/include/asm/kprobes.h
index 2c9759bd..739ddc5 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,7 +38,22 @@ struct pt_regs;
  struct kprobe;
  
  typedef ppc_opcode_t kprobe_opcode_t;

-#define MAX_INSN_SIZE 1
+
+extern kprobe_opcode_t optinsn_slot;
+
+/* Optinsn template address */
+extern kprobe_opcode_t optprobe_template_entry[];
+extern kprobe_opcode_t optprobe_template_op_address[];
+extern kprobe_opcode_t optprobe_template_call_handler[];
+extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_call_emulate[];
+extern kprobe_opcode_t optprobe_template_ret[];
+extern kprobe_opcode_t optprobe_template_end[];
+
+#define MAX_INSN_SIZE  1
+#define MAX_OPTIMIZED_LENGTH   4
+#define MAX_OPTINSN_SIZE   (optprobe_template_end - 
optprobe_template_entry)
+#define RELATIVEJUMP_SIZE  4

These size/length macros seem a bit odd. I guess MAX_INSN_SIZE
is counted in units of kprobe_opcode_t, but the others are in bytes.
Could you fix that? For example, define it with
sizeof(kprobe_opcode_t), and add comments on it, etc.


Sure. I will look into this and define it in  a consistent way.

  
  #ifdef PPC64_ELF_ABI_v2

  /* PPC64 ABIv2 needs local entry point */
@@ -124,6 +139,12 @@ struct kprobe_ctlblk {
struct prev_kprobe prev_kprobe;
  };
  
+struct arch_optimized_insn {

+   kprobe_opcode_t copied_insn[1];
+   /* detour buffer */
+   kprobe_opcode_t *insn;
+};
+
  extern int kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data);
  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index d3a42cc..f7ad425 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -87,3 +87,4 @@ struct instruction_op {
  
  extern int analyse_instr(struct instruction_op *op, struct pt_regs *regs,

 unsigned int instr);
+extern bool is_conditional_branch(unsigned int instr);
diff --git a/arch/powerpc/kernel/Makefile 

Re: [PATCH V2 0/4] OPTPROBES for powerpc

2016-12-16 Thread Naveen N. Rao
On 2016/12/17 01:46AM, Balbir Singh wrote:
> 
> 
> On 15/12/16 03:18, Anju T Sudhakar wrote:
> > This is the V2 patchset of the kprobes jump optimization
> > (a.k.a OPTPROBES)for powerpc. Kprobe being an inevitable tool
> > for kernel developers, enhancing the performance of kprobe has
> > got much importance.
> > 
> > Currently kprobes inserts a trap instruction to probe a running kernel.
> > Jump optimization allows kprobes to replace the trap with a branch,
> > reducing the probe overhead drastically.
> > 
> > In this series, conditional branch instructions are not considered for
> > optimization as they have to be assessed carefully in SMP systems.
> > 
> > The kprobe placed on the kretprobe_trampoline during boot time, is also
> > optimized in this series. Patch 4/4 furnishes this.
> > 
> > The first two patches can go independently of the series. The helper 
> > functions in these patches are invoked in patch 3/4.
> > 
> > Performance:
> > 
> > An optimized kprobe in powerpc is 1.05 to 4.7 times faster than a kprobe.
> >  
> > Example:
> >  
> > Placed a probe at an offset 0x50 in _do_fork().
> > *Time Diff here is, difference in time before hitting the probe and
> > after the probed instruction. mftb() is employed in kernel/fork.c for
> > this purpose.
> >  
> > # echo 0 > /proc/sys/debug/kprobes-optimization
> > Kprobes globally unoptimized
> >  [  233.607120] Time Diff = 0x1f0
> >  [  233.608273] Time Diff = 0x1ee
> >  [  233.609228] Time Diff = 0x203
> >  [  233.610400] Time Diff = 0x1ec
> >  [  233.611335] Time Diff = 0x200
> >  [  233.612552] Time Diff = 0x1f0
> >  [  233.613386] Time Diff = 0x1ee
> >  [  233.614547] Time Diff = 0x212
> >  [  233.615570] Time Diff = 0x206
> >  [  233.616819] Time Diff = 0x1f3
> >  [  233.617773] Time Diff = 0x1ec
> >  [  233.618944] Time Diff = 0x1fb
> >  [  233.619879] Time Diff = 0x1f0
> >  [  233.621066] Time Diff = 0x1f9
> >  [  233.621999] Time Diff = 0x283
> >  [  233.623281] Time Diff = 0x24d
> >  [  233.624172] Time Diff = 0x1ea
> >  [  233.625381] Time Diff = 0x1f0
> >  [  233.626358] Time Diff = 0x200
> >  [  233.627572] Time Diff = 0x1ed
> >  
> > # echo 1 > /proc/sys/debug/kprobes-optimization
> > Kprobes globally optimized
> >  [   70.797075] Time Diff = 0x103
> >  [   70.799102] Time Diff = 0x181
> >  [   70.801861] Time Diff = 0x15e
> >  [   70.803466] Time Diff = 0xf0
> >  [   70.804348] Time Diff = 0xd0
> >  [   70.805653] Time Diff = 0xad
> >  [   70.806477] Time Diff = 0xe0
> >  [   70.807725] Time Diff = 0xbe
> >  [   70.808541] Time Diff = 0xc3
> >  [   70.810191] Time Diff = 0xc7
> >  [   70.811007] Time Diff = 0xc0
> >  [   70.812629] Time Diff = 0xc0
> >  [   70.813640] Time Diff = 0xda
> >  [   70.814915] Time Diff = 0xbb
> >  [   70.815726] Time Diff = 0xc4
> >  [   70.816955] Time Diff = 0xc0
> >  [   70.817778] Time Diff = 0xcd
> >  [   70.818999] Time Diff = 0xcd
> >  [   70.820099] Time Diff = 0xcb
> >  [   70.821333] Time Diff = 0xf0
> > 
> > Implementation:
> > ===
> >  
> > The trap instruction is replaced by a branch to a detour buffer. To address
> > the limitation of branch instruction in power architecture, detour buffer
> > slot is allocated from a reserved area . This will ensure that the branch
> > is within ± 32 MB range. The current kprobes insn caches allocate memory 
> > area for insn slots with module_alloc(). This will always be beyond 
> > ± 32MB range.
> >  
> 
> The paragraph is a little confusing. We need the detour buffer to be within
> +-32 MB, but then you say we always get memory from module_alloc() beyond
> 32MB.

Yes, I think it can be described better. What Anju is mentioning is that 
the existing generic approach for kprobes insn cache uses module_alloc() 
which is not suitable for us due to the 32MB range limit with relative 
branches on powerpc.

Instead, we reserve a 64k block within .text and allocate the detour 
buffer from that area. This puts the detour buffer in range for most of 
the symbols and should be a good start.
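
A rough sketch of the idea (illustrative; apart from optinsn_slot, which is
declared in patch 3/4, the names below are assumptions rather than the
actual code):

	/* optprobes_head.S reserves the area inside the kernel image, e.g.:
	 *
	 *	.global optinsn_slot
	 *	optinsn_slot:
	 *		.space	65536
	 *
	 * Being in .text, it is close to any probed kernel address, so detour
	 * buffers carved out of it stay within +/- 32MB, unlike memory
	 * returned by module_alloc().
	 */
	extern kprobe_opcode_t optinsn_slot;

	static void *ppc_alloc_optinsn_page(void)	/* assumed helper name */
	{
		return &optinsn_slot;
	}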

> 
> > The detour buffer contains a call to optimized_callback() which in turn
> > call the pre_handler(). Once the pre-handler is run, the original
> > instruction is emulated from the detour buffer itself. Also the detour
> > buffer is equipped with a branch back to the normal work flow after the
> > probed instruction is emulated.
> 
> Does the branch itself use registers that need to be saved? I presume

No, we use immediate values to encode the relative address.

> we are going to rely on the +-32MB, what are the guarantees of success
> of such a mechanism?

We explicitly ensure that the return branch is within range as well 
during registration. In fact, this is one of the reasons why we can't 
optimize conditional branches - we can't know in advance where we need 
to jump back.

> 
> Balbir Singh.
> 

Thanks,
- Naveen



RE: [upstream-release] [PATCH net 2/4] fsl/fman: arm: call of_platform_populate() for arm64 platfrom

2016-12-16 Thread Madalin-Cristian Bucur
> From: Scott Wood
> Sent: Thursday, December 15, 2016 8:42 PM
> 
> On 12/15/2016 07:11 AM, Madalin Bucur wrote:
> > From: Igal Liberman 
> >
> > Signed-off-by: Igal Liberman 
> > ---
> >  drivers/net/ethernet/freescale/fman/fman.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/freescale/fman/fman.c
> b/drivers/net/ethernet/freescale/fman/fman.c
> > index dafd9e1..f36b4eb 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman.c
> > @@ -2868,6 +2868,16 @@ static struct fman *read_dts_node(struct
> platform_device *of_dev)
> >
> > fman->dev = &of_dev->dev;
> >
> > +#ifdef CONFIG_ARM64
> > +   /* call of_platform_populate in order to probe sub-nodes on arm64 */
> > +   err = of_platform_populate(fm_node, NULL, NULL, &of_dev->dev);
> > +   if (err) {
> > +   dev_err(&of_dev->dev, "%s: of_platform_populate() failed\n",
> > +   __func__);
> > +   goto fman_free;
> > +   }
> > +#endif
> 
> Should we remove fsl,fman from the PPC of_device_ids[], so this doesn't
> need an ifdef?
> 
> Why is it #ifdef CONFIG_ARM64 rather than #ifndef CONFIG_PPC?
> 
> -Scott

Igal was working on adding ARM64 support when this patch was created, thus the
choice of #ifdef CONFIG_ARM64. Unifying this for PPC and ARM64 by always calling
of_platform_populate() sounds like the best approach. I would need to 
synchronize
the introduction of this code with the removal of the fsl,fman entry from the
of_device_ids[] array.
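
For illustration, the unified version being discussed would presumably boil
down to dropping the #ifdef so the same call runs on both PPC and ARM64 (a
sketch of the proposal, not the actual v2 patch):

	/* probe the FMan sub-nodes on all architectures */
	err = of_platform_populate(fm_node, NULL, NULL, &of_dev->dev);
	if (err) {
		dev_err(&of_dev->dev, "%s: of_platform_populate() failed\n",
			__func__);
		goto fman_free;
	}

with the "fsl,fman" compatible removed from the PPC of_device_ids[] so the
sub-nodes are not probed twice.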

Dave, Michael, Scott, is it ok to add to v2 of this patch set the patch that 
removes
the compatible "fsl,fman" from arch/powerpc/platforms/85xx/corenet_generic.c?

Thanks,
Madalin



Re: [PATCH v3 10/15] livepatch: move patching functions into patch.c

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:35, Josh Poimboeuf wrote:
> Move functions related to the actual patching of functions and objects
> into a new patch.c file.
> 
> Signed-off-by: Josh Poimboeuf 

Looks fine.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH v3 11/15] livepatch: use kstrtobool() in enabled_store()

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:36, Josh Poimboeuf wrote:
> The sysfs enabled value is a boolean, so kstrtobool() is a better fit
> for parsing the input string since it does the range checking for us.
> 
> Suggested-by: Petr Mladek 
> Signed-off-by: Josh Poimboeuf 
> ---
>  kernel/livepatch/core.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 6a137e1..8ca8a0e 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -408,26 +408,23 @@ static ssize_t enabled_store(struct kobject *kobj, 
> struct kobj_attribute *attr,
>  {
>   struct klp_patch *patch;
>   int ret;
> - unsigned long val;
> + bool enabled;
>  
> - ret = kstrtoul(buf, 10, &val);
> + ret = kstrtobool(buf, &enabled);
>   if (ret)
>   return -EINVAL;

I would return "ret" here. It is -EINVAL as well but... ;-)

Anyway, feel free to use:

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH V2 2/4] powerpc: add helper to check if offset is within rel branch range

2016-12-16 Thread Anju T Sudhakar

Hi Masami,


Thank you for reviewing the patch set.


On Friday 16 December 2016 05:22 PM, Masami Hiramatsu wrote:

On Wed, 14 Dec 2016 21:48:30 +0530
Anju T Sudhakar  wrote:


From: "Naveen N. Rao" 


The coding looks OK to me. Please add a description for this patch
here, e.g. what is done by this patch, what kind of branch
instructions will be covered, and why these checks are needed, etc.



Sure. I will give a description for this patch.


Thanks and Regards,

-Anju



Thank you,


Signed-off-by: Naveen N. Rao 
Signed-off-by: Anju T Sudhakar 
---
  arch/powerpc/include/asm/code-patching.h |  1 +
  arch/powerpc/lib/code-patching.c | 24 +++-
  2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 2015b07..75ee4f4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,7 @@
  #define BRANCH_SET_LINK   0x1
  #define BRANCH_ABSOLUTE   0x2
  
+bool is_offset_in_branch_range(long offset);

  unsigned int create_branch(const unsigned int *addr,
   unsigned long target, int flags);
  unsigned int create_cond_branch(const unsigned int *addr,
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..f643451 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -32,6 +32,28 @@ int patch_branch(unsigned int *addr, unsigned long target, 
int flags)
return patch_instruction(addr, create_branch(addr, target, flags));
  }
  
+bool is_offset_in_branch_range(long offset)

+{
+   /*
+* Powerpc branch instruction is :
+*
+*  0 6 30   31
+*  +-++---+---+
+*  | opcode  | LI |AA |LK |
+*  +-++---+---+
+*  Where AA = 0 and LK = 0
+*
+* LI is a signed 24 bits integer. The real branch offset is computed
+* by: imm32 = SignExtend(LI:'0b00', 32);
+*
+* So the maximum forward branch should be:
+*   (0x007fffff << 2) = 0x01fffffc =  0x1fffffc
+* The maximum backward branch should be:
+*   (0xff800000 << 2) = 0xfe000000 = -0x2000000
+*/
+   return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
+}
+
  unsigned int create_branch(const unsigned int *addr,
   unsigned long target, int flags)
  {
@@ -43,7 +65,7 @@ unsigned int create_branch(const unsigned int *addr,
offset = offset - (unsigned long)addr;
  
  	/* Check we can represent the target in the instruction format */

-   if (offset < -0x2000000 || offset > 0x1fffffc || offset & 0x3)
+   if (!is_offset_in_branch_range(offset))
return 0;
  
  	/* Mask out the flags and target, so they don't step on each other. */

--
2.7.4







Re: [PATCH v3 09/15] livepatch: remove unnecessary object loaded check

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:34, Josh Poimboeuf wrote:
> klp_patch_object()'s callers already ensure that the object is loaded,
> so its call to klp_is_object_loaded() is unnecessary.
> 
> This will also make it possible to move the patching code into a
> separate file.

Fair enough.

> Signed-off-by: Josh Poimboeuf 

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH v3 08/15] livepatch: separate enabled and patched states

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:33, Josh Poimboeuf wrote:
> Once we have a consistency model, patches and their objects will be
> enabled and disabled at different times.  For example, when a patch is
> disabled, its loaded objects' funcs can remain registered with ftrace
> indefinitely until the unpatching operation is complete and they're no
> longer in use.
> 
> It's less confusing if we give them different names: patches can be
> enabled or disabled; objects (and their funcs) can be patched or
> unpatched:
> 
> - Enabled means that a patch is logically enabled (but not necessarily
>   fully applied).
> 
> - Patched means that an object's funcs are registered with ftrace and
>   added to the klp_ops func stack.
> 
> Also, since these states are binary, represent them with booleans
> instead of ints.
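
For illustration, the resulting state fields look roughly like this (a
sketch; the real layouts are in include/linux/livepatch.h):

	struct klp_patch {
		/* ... existing fields ... */
		bool enabled;	/* patch is logically enabled (maybe not fully applied) */
	};

	struct klp_func {
		/* ... existing fields ... */
		bool patched;	/* registered with ftrace, on the klp_ops func stack */
	};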
> 
> Signed-off-by: Josh Poimboeuf 

Makes sense. The patch is pretty straightforward.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH v3 05/15] livepatch/powerpc: add TIF_PATCH_PENDING thread flag

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:30, Josh Poimboeuf wrote:
> Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
> per-task consistency model for powerpc.  The bit getting set indicates
> the thread has a pending patch which needs to be applied when the thread
> exits the kernel.
> 
> The bit is included in the _TIF_USER_WORK_MASK macro so that
> do_notify_resume() and klp_update_patch_state() get called when the bit
> is set.
> 
> Signed-off-by: Josh Poimboeuf 

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH v3 04/15] livepatch/x86: add TIF_PATCH_PENDING thread flag

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:29, Josh Poimboeuf wrote:
> Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
> per-task consistency model for x86_64.  The bit getting set indicates
> the thread has a pending patch which needs to be applied when the thread
> exits the kernel.
> 
> The bit is placed in the _TIF_ALLWORK_MASK macro, which results in
> exit_to_usermode_loop() calling klp_update_patch_state() when it's set.
> 
> Signed-off-by: Josh Poimboeuf 

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:54, David Gibson wrote:
> This patch adds a stub (always failing) implementation of the ioctl()s
> for the HPT resizing PAPR extension.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  4 
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 
>  arch/powerpc/kvm/book3s_hv.c| 22 ++
>  3 files changed, 42 insertions(+)

I think I'd just squash this with the patch where you do the real
implementation.

 Thomas



Re: [PATCH] powerpc/livepatch: Remove klp_write_module_reloc() stub

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 11:44:24AM +0530, Kamalesh Babulal wrote:
> commit 425595a7fc20 ("livepatch: reuse module loader code
> to write relocations") offloads livepatch module relocation
> write to arch specific module loader code.
> 
> Remove unused klp_write_module_reloc() function stub.
> 
> Signed-off-by: Kamalesh Babulal 

Acked-by: Josh Poimboeuf 

-- 
Josh


Re: [PATCH V2 0/4] OPTPROBES for powerpc

2016-12-16 Thread Balbir Singh


On 15/12/16 03:18, Anju T Sudhakar wrote:
> This is the V2 patchset of the kprobes jump optimization
> (a.k.a OPTPROBES)for powerpc. Kprobe being an inevitable tool
> for kernel developers, enhancing the performance of kprobe has
> got much importance.
> 
> Currently kprobes inserts a trap instruction to probe a running kernel.
> Jump optimization allows kprobes to replace the trap with a branch,
> reducing the probe overhead drastically.
> 
> In this series, conditional branch instructions are not considered for
> optimization as they have to be assessed carefully in SMP systems.
> 
> The kprobe placed on the kretprobe_trampoline during boot time, is also
> optimized in this series. Patch 4/4 furnishes this.
> 
> The first two patches can go independently of the series. The helper 
> functions in these patches are invoked in patch 3/4.
> 
> Performance:
> 
> An optimized kprobe in powerpc is 1.05 to 4.7 times faster than a kprobe.
>  
> Example:
>  
> Placed a probe at an offset 0x50 in _do_fork().
> *Time Diff here is, difference in time before hitting the probe and
> after the probed instruction. mftb() is employed in kernel/fork.c for
> this purpose.
>  
> # echo 0 > /proc/sys/debug/kprobes-optimization
> Kprobes globally unoptimized
>  [  233.607120] Time Diff = 0x1f0
>  [  233.608273] Time Diff = 0x1ee
>  [  233.609228] Time Diff = 0x203
>  [  233.610400] Time Diff = 0x1ec
>  [  233.611335] Time Diff = 0x200
>  [  233.612552] Time Diff = 0x1f0
>  [  233.613386] Time Diff = 0x1ee
>  [  233.614547] Time Diff = 0x212
>  [  233.615570] Time Diff = 0x206
>  [  233.616819] Time Diff = 0x1f3
>  [  233.617773] Time Diff = 0x1ec
>  [  233.618944] Time Diff = 0x1fb
>  [  233.619879] Time Diff = 0x1f0
>  [  233.621066] Time Diff = 0x1f9
>  [  233.621999] Time Diff = 0x283
>  [  233.623281] Time Diff = 0x24d
>  [  233.624172] Time Diff = 0x1ea
>  [  233.625381] Time Diff = 0x1f0
>  [  233.626358] Time Diff = 0x200
>  [  233.627572] Time Diff = 0x1ed
>  
> # echo 1 > /proc/sys/debug/kprobes-optimization
> Kprobes globally optimized
>  [   70.797075] Time Diff = 0x103
>  [   70.799102] Time Diff = 0x181
>  [   70.801861] Time Diff = 0x15e
>  [   70.803466] Time Diff = 0xf0
>  [   70.804348] Time Diff = 0xd0
>  [   70.805653] Time Diff = 0xad
>  [   70.806477] Time Diff = 0xe0
>  [   70.807725] Time Diff = 0xbe
>  [   70.808541] Time Diff = 0xc3
>  [   70.810191] Time Diff = 0xc7
>  [   70.811007] Time Diff = 0xc0
>  [   70.812629] Time Diff = 0xc0
>  [   70.813640] Time Diff = 0xda
>  [   70.814915] Time Diff = 0xbb
>  [   70.815726] Time Diff = 0xc4
>  [   70.816955] Time Diff = 0xc0
>  [   70.817778] Time Diff = 0xcd
>  [   70.818999] Time Diff = 0xcd
>  [   70.820099] Time Diff = 0xcb
>  [   70.821333] Time Diff = 0xf0
> 
> Implementation:
> ===
>  
> The trap instruction is replaced by a branch to a detour buffer. To address
> the limitation of branch instruction in power architecture, detour buffer
> slot is allocated from a reserved area . This will ensure that the branch
> is within ± 32 MB range. The current kprobes insn caches allocate memory 
> area for insn slots with module_alloc(). This will always be beyond 
> ± 32MB range.
>  

The paragraph is a little confusing. We need the detour buffer to be within
+-32 MB, but then you say we always get memory from module_alloc() beyond
32MB.

> The detour buffer contains a call to optimized_callback() which in turn
> call the pre_handler(). Once the pre-handler is run, the original
> instruction is emulated from the detour buffer itself. Also the detour
> buffer is equipped with a branch back to the normal work flow after the
> probed instruction is emulated.

Does the branch itself use registers that need to be saved? I presume
we are going to rely on the +-32MB, what are the guarantees of success
of such a mechanism?

Balbir Singh.


Re: [PATCH v3 03/15] livepatch: temporary stubs for klp_patch_pending() and klp_update_patch_state()

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:28, Josh Poimboeuf wrote:
> Create temporary stubs for klp_patch_pending() and
> klp_update_patch_state() so we can add TIF_PATCH_PENDING to different
> architectures in separate patches without breaking build bisectability.
> 
> Signed-off-by: Josh Poimboeuf 
> ---
>  include/linux/livepatch.h | 7 ++-
>  kernel/livepatch/core.c   | 3 +++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index 9072f04..60558d8 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -123,10 +123,15 @@ void arch_klp_init_object_loaded(struct klp_patch 
> *patch,
>  int klp_module_coming(struct module *mod);
>  void klp_module_going(struct module *mod);
>  
> +static inline bool klp_patch_pending(struct task_struct *task) { return 
> false; }

I was curious about this. It is implemented correctly in the 13th
patch and it is never used until 13th patch.

> +void klp_update_patch_state(struct task_struct *task);

It seems that the stub for this function is enough.

Well, the extra function is just a cosmetic problem. If it could be
fixed, it would be great. But the patch makes sense:

Reviewed-by: Petr Mladek 

Best Regards,
Petr
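
For reference, the stub pair Petr seems to be suggesting would look roughly
like this (a sketch of the suggestion, not the actual patch):

	static inline bool klp_patch_pending(struct task_struct *task) { return false; }
	static inline void klp_update_patch_state(struct task_struct *task) {}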



Re: [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix

2016-12-16 Thread Balbir Singh


On 16/12/16 06:50, Reza Arbab wrote:
> Memory hotplug is leading to hash page table calls, even on radix:
> 
> ...
> arch_add_memory
> create_section_mapping
> htab_bolt_mapping
> BUG_ON(!ppc_md.hpte_insert);
> 
> To fix, refactor {create,remove}_section_mapping() into hash__ and radix__
> variants. Implement the radix versions by borrowing from existing vmemmap
> and x86 code.
> 
> This passes basic verification of plugging and removing memory, but this 
> stuff is tricky and I'd appreciate extra scrutiny of the series for 
> correctness--in particular, the adaptation of remove_pagetable() from x86.

On quick glance everything seemed alright to me. I'll review each patch and 
Ack/provide comments.
Do we care about alt maps yet?

Balbir Singh.




Re: [PATCH v3 02/15] x86/entry: define _TIF_ALLWORK_MASK flags explicitly

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:27, Josh Poimboeuf wrote:
> The _TIF_ALLWORK_MASK macro automatically includes the least-significant
> 16 bits of the thread_info flags, which is less than obvious and tends
> to create confusion and surprises when reading or modifying the code.
> 
> Define the flags explicitly.
> 
> Signed-off-by: Josh Poimboeuf 
> ---
>  arch/x86/include/asm/thread_info.h | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/thread_info.h 
> b/arch/x86/include/asm/thread_info.h
> index ad6f5eb0..1fe6043 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -73,9 +73,6 @@ struct thread_info {
>   * thread information flags
>   * - these are process state flags that various assembly files
>   *   may need to access
> - * - pending work-to-be-done flags are in LSW

Yup, this is not true because also some flags from the most
significant bits are in the _TIF_ALLWORK_MASK.

> - * - other flags in MSW
> - * Warning: layout of LSW is hardcoded in entry.S
>   */
>  #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
>  #define TIF_NOTIFY_RESUME  1   /* callback before returning to user */
> @@ -133,8 +130,10 @@ struct thread_info {
>  
>  /* work to do on any return to user space */
>  #define _TIF_ALLWORK_MASK\
> - ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |   \
> - _TIF_NOHZ)
> + (_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\
> +  _TIF_SINGLESTEP | _TIF_NEED_RESCHED | _TIF_SYSCALL_EMU |   \
> +  _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |   \
> +  _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)

All flags are sorted by the number except for
_TIF_SINGLESTEP and _TIF_NEED_RESCHED  ;-)

The patch does not change the existing behavior. The same
existing flags are listed.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH V2 4/4] arch/powerpc: Optimize kprobe in kretprobe_trampoline

2016-12-16 Thread Masami Hiramatsu
On Wed, 14 Dec 2016 21:48:28 +0530
Anju T Sudhakar  wrote:

> Kprobe placed on the  kretprobe_trampoline during boot time can be 
> optimized, since the instruction at probe point is a 'nop'.
> 

How simple & clever way! ;)

Acked-by: Masami Hiramatsu 

Thank you!

> Signed-off-by: Anju T Sudhakar 
> ---
>  arch/powerpc/kernel/kprobes.c   | 8 
>  arch/powerpc/kernel/optprobes.c | 7 +++
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index e785cc9..5b0fd07 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -282,6 +282,7 @@ asm(".global kretprobe_trampoline\n"
>   ".type kretprobe_trampoline, @function\n"
>   "kretprobe_trampoline:\n"
>   "nop\n"
> + "blr\n"
>   ".size kretprobe_trampoline, .-kretprobe_trampoline\n");
>  
>  /*
> @@ -334,6 +335,13 @@ static int __kprobes trampoline_probe_handler(struct 
> kprobe *p,
>  
>   kretprobe_assert(ri, orig_ret_address, trampoline_address);
>   regs->nip = orig_ret_address;
> + /*
> +  * Make LR point to the orig_ret_address.
> +  * When the 'nop' inside the kretprobe_trampoline
> +  * is optimized, we can do a 'blr' after executing the
> +  * detour buffer code.
> +  */
> + regs->link = orig_ret_address;
>  
>   reset_current_kprobe();
> + kretprobe_hash_unlock(current, &flags);
> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
> index ecba221..5e4c254 100644
> --- a/arch/powerpc/kernel/optprobes.c
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -72,12 +72,11 @@ static unsigned long can_optimize(struct kprobe *p)
>  
>   /*
>* kprobe placed for kretprobe during boot time
> -  * is not optimizing now.
> -  *
> -  * TODO: Optimize kprobe in kretprobe_trampoline
> +  * has a 'nop' instruction, which can be emulated.
> +  * So further checks can be skipped.
>*/
>   if (p->addr == (kprobe_opcode_t *)_trampoline)
> - return 0;
> + return (unsigned long)p->addr + sizeof(kprobe_opcode_t);
>  
>   /*
>* We only support optimizing kernel addresses, but not
> -- 
> 2.7.4
> 


-- 
Masami Hiramatsu 


Re: [PATCH V2 3/4] arch/powerpc: Implement Optprobes

2016-12-16 Thread Masami Hiramatsu
On Wed, 14 Dec 2016 21:48:27 +0530
Anju T Sudhakar  wrote:

> Detour buffer contains instructions to create an in memory pt_regs.
> After the execution of the pre-handler, a call is made for instruction 
> emulation.
> The NIP is determined in advanced through dummy instruction emulation and a 
> branch
> instruction is created to the NIP at the end of the trampoline.
> 
> Instruction slot for detour buffer is allocated from the reserved area.
> For the time being, 64KB is reserved in memory for this purpose.
> 
> Instructions which can be emulated using analyse_instr() are suppliants
> for optimization. Before optimization ensure that the address range
> between the detour buffer allocated and the instruction being probed
> is within ± 32MB.

Thank you for updating!
I think this has no critical issue, but I have some comments on it.
(just cleanup and hardenings)
Please see below.

> 
> Signed-off-by: Anju T Sudhakar 
> Signed-off-by: Naveen N. Rao 
> ---
>  .../features/debug/optprobes/arch-support.txt  |   2 +-
>  arch/powerpc/Kconfig   |   1 +
>  arch/powerpc/include/asm/kprobes.h |  23 +-
>  arch/powerpc/include/asm/sstep.h   |   1 +
>  arch/powerpc/kernel/Makefile   |   1 +
>  arch/powerpc/kernel/optprobes.c| 333 
> +
>  arch/powerpc/kernel/optprobes_head.S   | 135 +
>  arch/powerpc/lib/sstep.c   |  22 ++
>  8 files changed, 516 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/kernel/optprobes.c
>  create mode 100644 arch/powerpc/kernel/optprobes_head.S
> 
> diff --git a/Documentation/features/debug/optprobes/arch-support.txt 
> b/Documentation/features/debug/optprobes/arch-support.txt
> index b8999d8..45bc99d 100644
> --- a/Documentation/features/debug/optprobes/arch-support.txt
> +++ b/Documentation/features/debug/optprobes/arch-support.txt
> @@ -27,7 +27,7 @@
>  |   nios2: | TODO |
>  |openrisc: | TODO |
>  |  parisc: | TODO |
> -| powerpc: | TODO |
> +| powerpc: |  ok  |
>  |s390: | TODO |
>  |   score: | TODO |
>  |  sh: | TODO |
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c7f120a..d563f0a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -98,6 +98,7 @@ config PPC
>   select HAVE_IOREMAP_PROT
>   select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && 
> POWER7_CPU)
>   select HAVE_KPROBES
> + select HAVE_OPTPROBES if PPC64
>   select HAVE_ARCH_KGDB
>   select HAVE_KRETPROBES
>   select HAVE_ARCH_TRACEHOOK
> diff --git a/arch/powerpc/include/asm/kprobes.h 
> b/arch/powerpc/include/asm/kprobes.h
> index 2c9759bd..739ddc5 100644
> --- a/arch/powerpc/include/asm/kprobes.h
> +++ b/arch/powerpc/include/asm/kprobes.h
> @@ -38,7 +38,22 @@ struct pt_regs;
>  struct kprobe;
>  
>  typedef ppc_opcode_t kprobe_opcode_t;
> -#define MAX_INSN_SIZE 1
> +
> +extern kprobe_opcode_t optinsn_slot;
> +
> +/* Optinsn template address */
> +extern kprobe_opcode_t optprobe_template_entry[];
> +extern kprobe_opcode_t optprobe_template_op_address[];
> +extern kprobe_opcode_t optprobe_template_call_handler[];
> +extern kprobe_opcode_t optprobe_template_insn[];
> +extern kprobe_opcode_t optprobe_template_call_emulate[];
> +extern kprobe_opcode_t optprobe_template_ret[];
> +extern kprobe_opcode_t optprobe_template_end[];
> +
> +#define MAX_INSN_SIZE  1
> +#define MAX_OPTIMIZED_LENGTH 4
> +#define MAX_OPTINSN_SIZE (optprobe_template_end - 
> optprobe_template_entry)
> +#define RELATIVEJUMP_SIZE  4

These size/length macros seem a bit odd. I guess MAX_INSN_SIZE
is counted in units of kprobe_opcode_t, but the others are in bytes.
Could you fix that? For example, define it with 
sizeof(kprobe_opcode_t), and add comments on it, etc.

>  
>  #ifdef PPC64_ELF_ABI_v2
>  /* PPC64 ABIv2 needs local entry point */
> @@ -124,6 +139,12 @@ struct kprobe_ctlblk {
>   struct prev_kprobe prev_kprobe;
>  };
>  
> +struct arch_optimized_insn {
> + kprobe_opcode_t copied_insn[1];
> + /* detour buffer */
> + kprobe_opcode_t *insn;
> +};
> +
>  extern int kprobe_exceptions_notify(struct notifier_block *self,
>   unsigned long val, void *data);
>  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
> diff --git a/arch/powerpc/include/asm/sstep.h 
> b/arch/powerpc/include/asm/sstep.h
> index d3a42cc..f7ad425 100644
> --- a/arch/powerpc/include/asm/sstep.h
> +++ b/arch/powerpc/include/asm/sstep.h
> @@ -87,3 +87,4 @@ struct instruction_op {
>  
>  extern int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
>unsigned int instr);
> +extern bool is_conditional_branch(unsigned int instr);
> diff --git 

Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> advertise whether KVM is capable of handling the PAPR extensions for
> resizing the hashed page table during guest runtime.
> 
> At present, HPT resizing is possible with KVM PR without kernel
> modification, since the HPT is managed within qemu.  It's not possible yet
> with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> use other capabilities which (by accident) reveal whether PR or HV is in
> use to know if it can advertise HPT resizing capability to the guest.
> 
> To avoid ambiguity with existing kernels, the encoding is a bit odd.
> 0 means "unknown" since that's what previous kernels will return
> 1 means "HPT resize possible if available if and only if the HPT is 
> allocated in
>   userspace, rather than in the kernel".  Userspace can check
>   KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
>   this will give the same results as userspace's fallback check.
> 2 will mean "HPT resize available and implemented via ioctl()s
>   KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"
> 
> For now we always return 1, but the intention is to return 2 once HPT
> resize is implemented for KVM HV.
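
A sketch of how userspace might consume this encoding (illustrative, not
actual QEMU code; it assumes that KVM_CAP_PPC_ALLOC_HTAB being set means
the HPT is managed in the kernel):

	int resize = kvm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);

	if (resize >= 2) {
		/* kernel implements KVM_PPC_RESIZE_HPT_PREPARE/COMMIT */
		hpt_resize_ok = true;
	} else {
		/* 0 (old kernel, unknown) or 1: resizable only when the HPT
		 * lives in userspace, so fall back to the existing check */
		hpt_resize_ok = !kvm_check_extension(s, KVM_CAP_PPC_ALLOC_HTAB);
	}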
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/kvm/powerpc.c |  3 +++
>  include/uapi/linux/kvm.h   | 10 ++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index efd1183..bb23923 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_SPAPR_MULTITCE:
>   r = 1;
>   break;
> + case KVM_CAP_SPAPR_RESIZE_HPT:
> + r = 1; /* resize allowed only if HPT is outside kernel */
> + break;
>  #endif
>   case KVM_CAP_PPC_HTM:
>   r = cpu_has_feature(CPU_FTR_TM_COMP) &&
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index cac48ed..904afe0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -685,6 +685,12 @@ struct kvm_ppc_smmu_info {
>   struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
>  };
>  
> +/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
> +struct kvm_ppc_resize_hpt {
> + __u64 flags;
> + __u32 shift;
> +};

I think you should also add a final "__u32 pad" to that struct to make
sure that it is naturally aligned (like it is done with struct
kvm_coalesced_mmio_zone already for example).
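
For illustration, the padded layout being suggested would look like this (a
sketch of the suggestion, not the final patch):

	/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
	struct kvm_ppc_resize_hpt {
		__u64 flags;
		__u32 shift;
		__u32 pad;	/* keeps the struct naturally aligned / 8-byte sized */
	};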

 Thomas



Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:26, Josh Poimboeuf wrote:
> For live patching and possibly other use cases, a stack trace is only
> useful if it can be assured that it's completely reliable.  Add a new
> save_stack_trace_tsk_reliable() function to achieve that.
> 
> Scenarios which indicate that a stack trace may be unreliable:
> 
> - running task

It seems that this has to be enforced by save_stack_trace_tsk_reliable()
caller. It should be mentioned in the function description.


> - interrupt stack

I guess that it is detected by saved regs on the stack. And it covers
also dynamic changes like kprobes. Do I get it correctly, please? 

What about ftrace? Is ftrace without regs safe and detected?


> - preemption

I wonder if some very active kthreads might almost always be
preempted using irq in preemptive kernel. Then they block
the conversion with the non-reliable stacks. Have you noticed
such problems, please?


> - corrupted stack data
> - stack grows the wrong way

This is detected in unwind_next_frame() and passed via state->error.
Am I right?


> - stack walk doesn't reach the bottom
> - user didn't provide a large enough entries array
> 
> Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
> determine at build time whether the function is implemented.
> 
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 0653788..3e0cf5e 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -74,6 +74,64 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct 
> stack_trace *trace)
>  }
>  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
>  
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +static int __save_stack_trace_reliable(struct stack_trace *trace,
> +struct task_struct *task)
> +{
> + struct unwind_state state;
> + struct pt_regs *regs;
> + unsigned long addr;
> +
> + for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
> +  unwind_next_frame(&state)) {
> +
> + regs = unwind_get_entry_regs(&state);
> + if (regs) {
> + /*
> +  * Preemption and page faults on the stack can make
> +  * frame pointers unreliable.
> +  */
> + if (!user_mode(regs))
> + return -1;

In other words, if we find regs on the stack, it almost always means
a non-reliable stack. The only exception is when we are in the
userspace mode. Do I get it correctly, please?

> +
> + /*
> +  * This frame contains the (user mode) pt_regs at the
> +  * end of the stack.  Finish the unwind.
> +  */
> + unwind_next_frame(&state);
> + break;
> + }
> +
> + addr = unwind_get_return_address(&state);
> + if (!addr || save_stack_address(trace, addr, false))
> + return -1;
> + }
> +
> + if (!unwind_done(&state) || unwind_error(&state))
> + return -1;
> +
> + if (trace->nr_entries < trace->max_entries)
> + trace->entries[trace->nr_entries++] = ULONG_MAX;
> +
> + return 0;
> +}

Great work! I am surprised that it looks so straightforward.

I still have to think and investigate it more. But it looks
very promising.

Best Regards,
Petr


Re: [PATCH 3/3] powerpc/corenet: add support for the kmcent2 board

2016-12-16 Thread Valentin Longchamp
On 15/12/16 15:00, Joakim Tjernlund wrote:
> On Thu, 2016-12-15 at 14:22 +0100, Valentin Longchamp wrote:
>> This board is built around Freescale's T1040 SoC.
>>
>> The peripherals used by this design are:
>> - DDR3 RAM with SPD support
>> - parallel NOR Flash as boot medium
>> - 1 PCIe bus (PCIe1 x1)
>> - 3 FMAN Ethernet devices (FMAN1 DTSEC1/2/5)
>> - 4 IFC bus devices:
>>   - NOR flash
>>   - NAND flash
>>   - QRIO reset/power mgmt CPLD
>>   - BFTIC chassis management CPLD
>> - 2 I2C buses
>> - 1 SPI bus
>> - HDLC bus with the QE's UCC1
>> - last but not least, the mandatory serial port
>>
>> The board can be used with the corenet32_smp_defconfig.
>>
>> Signed-off-by: Valentin Longchamp 
>> ---
>>  arch/powerpc/boot/dts/fsl/kmcent2.dts | 303 
>> ++
>>  arch/powerpc/platforms/85xx/corenet_generic.c |   1 +
>>  2 files changed, 304 insertions(+)
>>  create mode 100644 arch/powerpc/boot/dts/fsl/kmcent2.dts
>>
>> diff --git a/arch/powerpc/boot/dts/fsl/kmcent2.dts 
>> b/arch/powerpc/boot/dts/fsl/kmcent2.dts
>> new file mode 100644
>> index 000..47afa43
>> --- /dev/null
>> +++ b/arch/powerpc/boot/dts/fsl/kmcent2.dts
>> @@ -0,0 +1,303 @@
>> +/*
>> + * Keymile kmcent2 Device Tree Source, based on T1040RDB DTS
>> + *
>> + * (C) Copyright 2016
>> + * Valentin Longchamp, Keymile AG, valentin.longch...@keymile.com
>> + *
>> + * Copyright 2014 - 2015 Freescale Semiconductor Inc.
>> + *
>> + * This program is free software; you can redistribute  it and/or modify it
>> + * under  the terms of  the GNU General  Public License as published by the
>> + * Free Software Foundation;  either version 2 of the  License, or (at your
>> + * option) any later version.
>> + */
>> +
> 
> [SNIP]
> 
>> +
>> +ucc_hdlc: ucc@2000 {
>> +device_type = "hdlc";
>> +compatible = "fsl,ucc-hdlc";
>> +rx-clock-name = "clk9";
>> +tx-clock-name = "clk9";
> 
> Should it be clk9 on both tx and rx clock?
> 

Yeah, why not? The bus clock is generated somewhere else and is connected to
the clk9 input. Or maybe I have not understood your question?


Re: [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:54, David Gibson wrote:
> The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
> to force all guest mappings of a particular host page to be set ABSENT, and
> removed from the reverse mappings.
> 
> For HPT resizing, we will have some cases where we want to set just a
> single guest HPTE ABSENT and remove its reverse mappings.  To prepare with
> this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE,
> moving it to a new helper function.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 
> +
>  1 file changed, 44 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 8e5ac2f..cd145eb 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -740,13 +740,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned 
> long hva,
>   return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
>  }
>  
> +/* Must be called with both HPTE and rmap locked */
> +static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
> +   unsigned long *rmapp, unsigned long gfn)
> +{
> + __be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
> + struct revmap_entry *rev = kvm->arch.hpt.rev;
> + unsigned long j, h;
> + unsigned long ptel, psize, rcbits;
> +
> + j = rev[i].forw;
> + if (j == i) {
> + /* chain is now empty */
> + *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
> + } else {
> + /* remove i from chain */
> + h = rev[i].back;
> + rev[h].forw = j;
> + rev[j].back = h;
> + rev[i].forw = rev[i].back = i;
> + *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
> + }
> +
> + /* Now check and modify the HPTE */
> + ptel = rev[i].guest_rpte;
> + psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
> + if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
> + hpte_rpn(ptel, psize) == gfn) {
> + hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
> + kvmppc_invalidate_hpte(kvm, hptep, i);
> + hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
> + /* Harvest R and C */
> + rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
> + *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
> + if (rcbits & HPTE_R_C)
> + kvmppc_update_rmap_change(rmapp, psize);
> + if (rcbits & ~rev[i].guest_rpte) {
> + rev[i].guest_rpte = ptel | rcbits;
> + note_hpte_modification(kvm, &rev[i]);
> + }
> + }   

Superfluous TAB at the end of above line.

Apart from that, the patch looks fine to me.

 Thomas



Re: [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
> table (HPT) that userspace expects a guest VM to have, and is also used to
> clear that HPT when necessary (e.g. guest reboot).
> 
> At present, once the ioctl() is called for the first time, the HPT size can
> never be changed thereafter - it will be cleared but always sized as from
> the first call.
> 
> With upcoming HPT resize implementation, we're going to need to allow
> userspace to resize the HPT at reset (to change it back to the default size
> if the guest changed it).
> 
> So, we need to allow this ioctl() to change the HPT size.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 53 
> -
>  arch/powerpc/kvm/book3s_hv.c|  5 +---
>  3 files changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 41575b8..3b837bc 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
>  
>  extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
>  extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
> -extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> +extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
>  extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
>  extern long kvmppc_prepare_vrma(struct kvm *kvm,
>   struct kvm_userspace_memory_region *mem);
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 68bb228..8e5ac2f 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct 
> kvm_hpt_info *info)
>   info->virt, (long)info->order, kvm->arch.lpid);
>  }
>  
> -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> +void kvmppc_free_hpt(struct kvm_hpt_info *info)
> +{
> + vfree(info->rev);
> + if (info->cma)
> + kvm_free_hpt_cma(virt_to_page(info->virt),
> +  1 << (info->order - PAGE_SHIFT));
> + else
> + free_pages(info->virt, info->order - PAGE_SHIFT);
> + info->virt = 0;
> + info->order = 0;
> +}

Why do you need to move kvmppc_free_hpt() around? Seems like unnecessary
code churn to me?

> +long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
>  {
>   long err = -EBUSY;
> - long order;
> + struct kvm_hpt_info info;
>  
>   mutex_lock(&kvm->lock);
>   if (kvm->arch.hpte_setup_done) {
> @@ -119,8 +131,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
> *htab_orderp)
>   goto out;
>   }
>   }
> - if (kvm->arch.hpt.virt) {
> - order = kvm->arch.hpt.order;
> + if (kvm->arch.hpt.order == order) {
> + /* We already have a suitable HPT */
> +
>   /* Set the entire HPT to 0, i.e. invalid HPTEs */
>   memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
>   /*
> @@ -129,33 +142,23 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
> *htab_orderp)
>   kvmppc_rmap_reset(kvm);
>   /* Ensure that each vcpu will flush its TLB on next entry. */
>   cpumask_setall(&kvm->arch.need_tlb_flush);
> - *htab_orderp = order;
>   err = 0;
> - } else {
> - struct kvm_hpt_info info;
> -
> - err = kvmppc_allocate_hpt(&info, *htab_orderp);
> - if (err < 0)
> - goto out;
> - kvmppc_set_hpt(kvm, &info);
> + goto out;
>   }
> - out:
> +
> + if (kvm->arch.hpt.virt)
> + kvmppc_free_hpt(&kvm->arch.hpt);
> +
> + err = kvmppc_allocate_hpt(&info, order);
> + if (err < 0)
> + goto out;
> + kvmppc_set_hpt(kvm, &info);
> +
> +out:
>   mutex_unlock(&kvm->lock);
>   return err;
>  }
>  
> -void kvmppc_free_hpt(struct kvm_hpt_info *info)
> -{
> - vfree(info->rev);
> - if (info->cma)
> - kvm_free_hpt_cma(virt_to_page(info->virt),
> -  1 << (info->order - PAGE_SHIFT));
> - else
> - free_pages(info->virt, info->order - PAGE_SHIFT);
> - info->virt = 0;
> - info->order = 0;
> -}
> -
>  /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
>  static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 71c5adb..957e473 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
>   r = -EFAULT;

Re: [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
> and sets it up as the active page table for a VM.  For the upcoming HPT
> resize implementation we're going to want to allocate HPTs separately from
> activating them.
> 
> So, split the allocation itself out into kvmppc_allocate_hpt() and perform
> the activation with a new kvmppc_set_hpt() function.  Likewise we split
> kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
> which unsets it as an active HPT, then frees it.
> 
> We also move the logic to fall back to smaller HPT sizes if the first try
> fails into the single caller which used that behaviour,
> kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
> that previously if the initial attempt at CMA allocation failed, we would
> fall back to attempting smaller sizes with the page allocator.  Now, we
> try first CMA, then the page allocator at each size.  As far as I can tell
> this change should be harmless.
> 
> To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
> call to kvmppc_free_lpid() that was there, we move to the single caller.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |  3 ++
>  arch/powerpc/include/asm/kvm_ppc.h   |  5 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 90 
> 
>  arch/powerpc/kvm/book3s_hv.c | 18 +--
>  4 files changed, 65 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 8810327..6dc4004 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -22,6 +22,9 @@
>  
>  #include 
>  
> +/* Power architecture requires HPT is at least 256kB */
> +#define PPC_MIN_HPT_ORDER 18
> +
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu 
> *vcpu)
>  {
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 3db6be9..41575b8 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu 
> *vcpu);
>  extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
>  extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
>  
> -extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
> +extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
> +extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
>  extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> -extern void kvmppc_free_hpt(struct kvm *kvm);
> +extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
>  extern long kvmppc_prepare_vrma(struct kvm *kvm,
>   struct kvm_userspace_memory_region *mem);
>  extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index fe88132..68bb228 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -40,76 +40,70 @@
>  
>  #include "trace_hv.h"
>  
> -/* Power architecture requires HPT is at least 256kB */
> -#define PPC_MIN_HPT_ORDER 18
> -
>  static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
>   long pte_index, unsigned long pteh,
>   unsigned long ptel, unsigned long *pte_idx_ret);
>  static void kvmppc_rmap_reset(struct kvm *kvm);
>  
> -long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> +int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
>  {
>   unsigned long hpt = 0;
> - struct revmap_entry *rev;
> + int cma = 0;
>   struct page *page = NULL;
> - long order = KVM_DEFAULT_HPT_ORDER;
> -
> - if (htab_orderp) {
> - order = *htab_orderp;
> - if (order < PPC_MIN_HPT_ORDER)
> - order = PPC_MIN_HPT_ORDER;
> - }

Not sure, but as far as I can see, *htab_orderp could still be provided
from userspace via kvm_arch_vm_ioctl_hv() -> kvmppc_alloc_reset_hpt() ?
In that case, I think you should still check that the order is bigger
than PPC_MIN_HPT_ORDER, and return an error code otherwise?
Or if you feel confident that this value can never be supplied by
userspace anymore, add at least a BUG_ON(order < PPC_MIN_HPT_ORDER) here?
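For example, a guard at the top of kvmppc_allocate_hpt() along these lines
(the -EINVAL choice is just a suggestion):

int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
{
	/* Reject undersized requests up front instead of silently
	 * rounding them up, since the order may still come from
	 * userspace via kvmppc_alloc_reset_hpt(). */
	if (order < PPC_MIN_HPT_ORDER)
		return -EINVAL;

	/* ... rest of the allocation as in this patch ... */
}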

> + struct revmap_entry *rev;
> + unsigned long npte;
>  
> - kvm->arch.hpt.cma = 0;
> + hpt = 0;
> + cma = 0;

Both hpt and cma are initialized to zero in the variable declarations
already, so the above two lines are redundant.

>   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>   if (page) {
>   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>  

Re: [PATCH V2 2/4] powerpc: add helper to check if offset is within rel branch range

2016-12-16 Thread Masami Hiramatsu
On Wed, 14 Dec 2016 21:48:30 +0530
Anju T Sudhakar  wrote:

> From: "Naveen N. Rao" 
> 

The code looks OK to me. Please add a description for this patch
here, e.g. what is done by this patch, what kind of branch
instructions will be covered, and why these checks are needed, etc.
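For instance, it would help to show how the callers in this series are
expected to use the helper — something like this (illustrative only, the
wrapper name is made up):

/* Can a single relative branch at 'from' reach the target 'to'? */
static bool can_reach_with_rel_branch(unsigned long from, unsigned long to)
{
	return is_offset_in_branch_range((long)(to - from));
}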

Thank you,

> Signed-off-by: Naveen N. Rao 
> Signed-off-by: Anju T Sudhakar 
> ---
>  arch/powerpc/include/asm/code-patching.h |  1 +
>  arch/powerpc/lib/code-patching.c | 24 +++-
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h 
> b/arch/powerpc/include/asm/code-patching.h
> index 2015b07..75ee4f4 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -22,6 +22,7 @@
>  #define BRANCH_SET_LINK  0x1
>  #define BRANCH_ABSOLUTE  0x2
>  
> +bool is_offset_in_branch_range(long offset);
>  unsigned int create_branch(const unsigned int *addr,
>  unsigned long target, int flags);
>  unsigned int create_cond_branch(const unsigned int *addr,
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index d5edbeb..f643451 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -32,6 +32,28 @@ int patch_branch(unsigned int *addr, unsigned long target, 
> int flags)
>   return patch_instruction(addr, create_branch(addr, target, flags));
>  }
>  
> +bool is_offset_in_branch_range(long offset)
> +{
> + /*
> +  * Powerpc branch instruction is :
> +  *
> +  *  0         6                 30   31
> +  *  +---------+----------------+---+---+
> +  *  | opcode  |     LI         |AA |LK |
> +  *  +---------+----------------+---+---+
> +  *  Where AA = 0 and LK = 0
> +  *
> +  * LI is a signed 24 bits integer. The real branch offset is computed
> +  * by: imm32 = SignExtend(LI:'0b00', 32);
> +  *
> +  * So the maximum forward branch should be:
> +  *   (0x007fffff << 2) = 0x01fffffc =  0x1fffffc
> +  * The maximum backward branch should be:
> +  *   (0xff800000 << 2) = 0xfe000000 = -0x2000000
> +  */
> + return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
> +}
> +
>  unsigned int create_branch(const unsigned int *addr,
>  unsigned long target, int flags)
>  {
> @@ -43,7 +65,7 @@ unsigned int create_branch(const unsigned int *addr,
>   offset = offset - (unsigned long)addr;
>  
>   /* Check we can represent the target in the instruction format */
> - if (offset < -0x2000000 || offset > 0x1fffffc || offset & 0x3)
> + if (!is_offset_in_branch_range(offset))
>   return 0;
>  
>   /* Mask out the flags and target, so they don't step on each other. */
> -- 
> 2.7.4
> 


-- 
Masami Hiramatsu 


[PATCH] cpufreq: powernv: Add boost files to export ultra-turbo frequencies

2016-12-16 Thread Shilpasri G Bhat
In P8+, Workload Optimized Frequency (WOF) provides the capability to
boost the cpu frequency based on the utilization of the other cpus
running in the chip. The On-Chip-Controller (OCC) firmware will control
the achievability of these frequencies depending on the power headroom
available in the chip. Currently the ultra-turbo frequencies provided
by this feature are exported along with the turbo and sub-turbo
frequencies as scaling_available_frequencies. This patch exports
the ultra-turbo frequencies separately as scaling_boost_frequencies on
WOF-enabled systems. It also adds the boost sysfs file, which
can be used to enable/disable the ultra-turbo frequencies.

Signed-off-by: Shilpasri G Bhat 
---
 drivers/cpufreq/powernv-cpufreq.c | 48 ---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 37671b5..56dfd91 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -144,6 +144,7 @@ enum throttle_reason_type {
unsigned int max;
unsigned int nominal;
unsigned int nr_pstates;
+   bool wof_enabled;
 } powernv_pstate_info;
 
 /* Use following macros for conversions between pstate_id and index */
@@ -203,6 +204,7 @@ static int init_powernv_pstates(void)
const __be32 *pstate_ids, *pstate_freqs;
u32 len_ids, len_freqs;
u32 pstate_min, pstate_max, pstate_nominal;
+   u32 pstate_turbo, pstate_ultra_turbo;
 
power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
if (!power_mgt) {
@@ -225,6 +227,25 @@ static int init_powernv_pstates(void)
pr_warn("ibm,pstate-nominal not found\n");
return -ENODEV;
}
+
+   if (of_property_read_u32(power_mgt, "ibm,pstate-ultra-turbo",
+   &pstate_ultra_turbo)) {
+   powernv_pstate_info.wof_enabled = false;
+   goto next;
+   }
+
+   if (of_property_read_u32(power_mgt, "ibm,pstate-turbo",
+   &pstate_turbo)) {
+   powernv_pstate_info.wof_enabled = false;
+   goto next;
+   }
+
+   if (pstate_turbo == pstate_ultra_turbo)
+   powernv_pstate_info.wof_enabled = false;
+   else
+   powernv_pstate_info.wof_enabled = true;
+
+next:
pr_info("cpufreq pstate min %d nominal %d max %d\n", pstate_min,
pstate_nominal, pstate_max);
 
@@ -268,6 +289,13 @@ static int init_powernv_pstates(void)
powernv_pstate_info.nominal = i;
else if (id == pstate_min)
powernv_pstate_info.min = i;
+
+   if (powernv_pstate_info.wof_enabled && id == pstate_turbo) {
+   int j;
+
+   for (j = i - 1; j >= (int)powernv_pstate_info.max; j--)
+   powernv_freqs[j].flags = CPUFREQ_BOOST_FREQ;
+   }
}
 
/* End of list marker entry */
@@ -305,9 +333,12 @@ static ssize_t cpuinfo_nominal_freq_show(struct 
cpufreq_policy *policy,
 struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
__ATTR_RO(cpuinfo_nominal_freq);
 
+#define SCALING_BOOST_FREQS_ATTR_INDEX 2
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
&cpufreq_freq_attr_scaling_available_freqs,
&cpufreq_freq_attr_cpuinfo_nominal_freq,
+   &cpufreq_freq_attr_scaling_boost_freqs,
NULL,
 };
 
@@ -1013,11 +1044,22 @@ static int __init powernv_cpufreq_init(void)
register_reboot_notifier(&powernv_cpufreq_reboot_nb);
opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
 
+   if (powernv_pstate_info.wof_enabled)
+   powernv_cpufreq_driver.boost_enabled = true;
+   else
+   powernv_cpu_freq_attr[SCALING_BOOST_FREQS_ATTR_INDEX] = NULL;
+
rc = cpufreq_register_driver(&powernv_cpufreq_driver);
-   if (!rc)
-   return 0;
+   if (rc) {
+   pr_info("Failed to register the cpufreq driver (%d)\n", rc);
+   goto clean_notifiers;
+   }
 
-   pr_info("Failed to register the cpufreq driver (%d)\n", rc);
+   if (powernv_pstate_info.wof_enabled)
+   cpufreq_enable_boost_support();
+
+   return 0;
+clean_notifiers:
unregister_all_notifiers();
clean_chip_info();
 out:
-- 
1.8.3.1
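
For reference, a minimal userspace sketch of how the new knobs can be
inspected, assuming the standard cpufreq sysfs layout (the paths and the
program itself are illustrative only):

#include <stdio.h>

int main(void)
{
	char buf[256];
	FILE *f;

	/* Per-policy list of ultra-turbo (boost) frequencies */
	f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_boost_frequencies", "r");
	if (f) {
		while (fgets(buf, sizeof(buf), f))
			fputs(buf, stdout);
		fclose(f);
	}

	/* Global switch created once boost support is enabled */
	f = fopen("/sys/devices/system/cpu/cpufreq/boost", "r");
	if (f) {
		if (fgets(buf, sizeof(buf), f))
			printf("boost enabled: %s", buf);
		fclose(f);
	}

	return 0;
}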



Re: [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> Currently the kvm_hpt_info structure stores the hashed page table's order,
> and also the number of HPTEs it contains and a mask for its size.  The
> last two can be easily derived from the order, so remove them and just
> calculate them as necessary with a couple of helper inlines.
> 
> Signed-off-by: David Gibson 
> ---
[...]
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index b5799d1..fe88132 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>  
>   kvm->arch.hpt.virt = hpt;
>   kvm->arch.hpt.order = order;
> - /* HPTEs are 2**4 bytes long */
> - kvm->arch.hpt.npte = 1ul << (order - 4);
> - /* 128 (2**7) bytes in each HPTEG */
> - kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
>  
>   atomic64_set(&kvm->arch.mmio_update, 0);
>  
>   /* Allocate reverse map array */
> - rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
> + rev = vmalloc(sizeof(struct revmap_entry) * 
> kvmppc_hpt_npte(&kvm->arch.hpt));
>   if (!rev) {
>   pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
>   goto out_freehpt;
> @@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
> kvm_memory_slot *memslot,
>   if (npages > 1ul << (40 - porder))
>   npages = 1ul << (40 - porder);
>   /* Can't use more than 1 HPTE per HPTEG */
> - if (npages > kvm->arch.hpt.mask + 1)
> - npages = kvm->arch.hpt.mask + 1;
> + if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
> + npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
>  
>   hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
>   HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
> @@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
> kvm_memory_slot *memslot,
>   for (i = 0; i < npages; ++i) {
>   addr = i << porder;
>   /* can't use hpt_hash since va > 64 bits */
> - hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & 
> kvm->arch.hpt.mask;
> + hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
> + & kvmppc_hpt_mask(&kvm->arch.hpt);
>   /*
>* We assume that the hash table is empty and no
>* vcpus are using it at this stage.  Since we create

kvmppc_hpt_mask() is now called three times in kvmppc_map_vrma() ... you
could use a local variable to store the value so that the calculation
has to be done only once here.
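Something like this (hpt_mask is just a suggested name; kvm is the existing
local in kvmppc_map_vrma()):

	unsigned long hpt_mask = kvmppc_hpt_mask(&kvm->arch.hpt);

	/* ... */
	if (npages > hpt_mask + 1)
		npages = hpt_mask + 1;
	/* ... */
	hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & hpt_mask;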

> @@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
> __user *buf,
>  
>   /* Skip uninteresting entries, i.e. clean on not-first pass */
>   if (!first_pass) {
> - while (i < kvm->arch.hpt.npte &&
> + while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  !hpte_dirty(revp, hptp)) {
>   ++i;
>   hptp += 2;
> @@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
> __user *buf,
>   hdr.index = i;
>  
>   /* Grab a series of valid entries */
> - while (i < kvm->arch.hpt.npte &&
> + while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  hdr.n_valid < 0x &&
>  nb + HPTE_SIZE < count &&
>  record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
> @@ -1332,7 +1329,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
> __user *buf,
>   ++revp;
>   }
>   /* Now skip invalid entries while we can */
> - while (i < kvm->arch.hpt.npte &&
> + while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  hdr.n_invalid < 0x &&
>  record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
>   /* found an invalid entry */

Ditto, you could use a local variable to store the value from
kvmppc_hpt_npte().

Anyway, apart from these nits, patch looks fine to me, so:

Reviewed-by: Thomas Huth 



Re: [PATCH] perf TUI: Don't throw error for zero length symbols

2016-12-16 Thread Ravi Bangoria
Hi Arnaldo,

Can you please pick this up if it looks good?

-Ravi

On Tuesday 22 November 2016 02:10 PM, Ravi Bangoria wrote:
> perf report (with TUI) exits with an error when it finds a sample of a
> zero-length symbol (i.e. addr == sym->start == sym->end). These are actually
> valid samples, so don't exit the TUI and show the report with such symbols.
>
> Link: https://lkml.org/lkml/2016/10/8/189
>
> Reported-by: Anton Blanchard 
> Signed-off-by: Ravi Bangoria 
> ---
>  tools/perf/util/annotate.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index aeb5a44..430d039 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, 
> struct map *map,
>
>   pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, 
> addr));
>
> - if (addr < sym->start || addr >= sym->end) {
> + if ((addr < sym->start || addr >= sym->end) &&
> + (addr != sym->end || sym->start != sym->end)) {
>   pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", 
> addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
>  __func__, __LINE__, sym->name, sym->start, addr, 
> sym->end);
>   return -ERANGE;



Re: [PATCH] powerpc/livepatch: Remove klp_write_module_reloc() stub

2016-12-16 Thread Petr Mladek
On Fri 2016-12-16 11:44:24, Kamalesh Babulal wrote:
> commit 425595a7fc20 ("livepatch: reuse module loader code
> to write relocations") offloads livepatch module relocation
> write to arch specific module loader code.
> 
> Remove unused klp_write_module_reloc() function stub.

Yup, the function is no longer used.

> Signed-off-by: Kamalesh Babulal 

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> advertise whether KVM is capable of handling the PAPR extensions for
> resizing the hashed page table during guest runtime.
> 
> At present, HPT resizing is possible with KVM PR without kernel
> modification, since the HPT is managed within qemu.  It's not possible yet
> with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> use other capabilities which (by accident) reveal whether PR or HV is in
> use to know if it can advertise HPT resizing capability to the guest.
> 
> To avoid ambiguity with existing kernels, the encoding is a bit odd.
> 0 means "unknown" since that's what previous kernels will return
> 1 means "HPT resize possible if available if and only if the HPT is 
> allocated in
>   userspace, rather than in the kernel".  Userspace can check
>   KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
>   this will give the same results as userspace's fallback check.
> 2 will mean "HPT resize available and implemented via ioctl()s
>   KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"

This encoding IMHO clearly needs some proper documentation in
Documentation/virtual/kvm/api.txt ... and maybe also some dedicated
#defines in an uapi header file.
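Something along these lines, for example (the names are made up for
illustration, pick whatever fits the existing naming scheme):

/* Return values of KVM_CAP_SPAPR_RESIZE_HPT (illustrative names only) */
#define KVM_SPAPR_RESIZE_HPT_UNKNOWN	0	/* older kernels */
#define KVM_SPAPR_RESIZE_HPT_USERSPACE	1	/* only if the HPT is allocated in userspace */
#define KVM_SPAPR_RESIZE_HPT_IOCTL	2	/* PREPARE/COMMIT ioctls implemented */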

 Thomas



Re: [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> Currently, the powerpc kvm_arch structure contains a number of variables
> tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
> patch gathers them all together into a single kvm_hpt_info substructure.
> This makes life more convenient for the upcoming HPT resizing
> implementation.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_host.h | 16 ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 
> ++---
>  arch/powerpc/kvm/book3s_hv.c|  2 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 -
>  4 files changed, 87 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index e59b172..2673271 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -241,12 +241,20 @@ struct kvm_arch_memory_slot {
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  };
>  
> +struct kvm_hpt_info {
> + unsigned long virt;
> + struct revmap_entry *rev;
> + unsigned long npte;
> + unsigned long mask;
> + u32 order;
> + int cma;
> +};

While you're at it, it would be really great if you could add a comment
at the end of each line with a short description of what the variables
are about. E.g. if I just read "virt" and do not have much clue of the
code yet, I have a hard time figuring out what this means...
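Something like this, for example (the wording of the comments is only
derived from how the fields are used elsewhere in the series):

struct kvm_hpt_info {
	unsigned long virt;		/* kernel virtual address of the HPT */
	struct revmap_entry *rev;	/* reverse mapping array, one entry per HPTE */
	unsigned long npte;		/* number of HPTEs in the table */
	unsigned long mask;		/* HPTEG index mask */
	u32 order;			/* log2 of the HPT size in bytes */
	int cma;			/* non-zero if allocated from the CMA pool */
};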

 Thomas



Re: [PATCH] powerpc/livepatch: Remove klp_write_module_reloc() stub

2016-12-16 Thread Balbir Singh


On 16/12/16 17:14, Kamalesh Babulal wrote:
> commit 425595a7fc20 ("livepatch: reuse module loader code
> to write relocations") offloads livepatch module relocation
> write to arch specific module loader code.
> 
> Remove unused klp_write_module_reloc() function stub.
> 
> Signed-off-by: Kamalesh Babulal 
> ---
>  arch/powerpc/include/asm/livepatch.h | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/livepatch.h 
> b/arch/powerpc/include/asm/livepatch.h
> index a402f7f..47a03b9 100644
> --- a/arch/powerpc/include/asm/livepatch.h
> +++ b/arch/powerpc/include/asm/livepatch.h
> @@ -28,13 +28,6 @@ static inline int klp_check_compiler_support(void)
>   return 0;
>  }
>  
> -static inline int klp_write_module_reloc(struct module *mod, unsigned long
> - type, unsigned long loc, unsigned long value)
> -{
> - /* This requires infrastructure changes; we need the loadinfos. */
> - return -ENOSYS;
> -}
> -

Makes sense

Acked-by: Balbir Singh 


Re: [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity

2016-12-16 Thread Thomas Huth
On 15.12.2016 06:53, David Gibson wrote:
> The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
> all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
> by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
> it with CMA only.
> 
> To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
> Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 
>  arch/powerpc/kvm/book3s_hv_builtin.c | 8 
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 2da67bf..3db6be9 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>   unsigned long tce_value, unsigned long npages);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>unsigned long ioba);
> -extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> -extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
> +extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
> +extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
>  extern int kvmppc_core_init_vm(struct kvm *kvm);
>  extern void kvmppc_core_destroy_vm(struct kvm *kvm);
>  extern void kvmppc_core_free_memslot(struct kvm *kvm,
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index b795dd1..ae17cdd 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>   }
>  
>   kvm->arch.hpt_cma_alloc = 0;
> - page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
> + page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>   if (page) {
>   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>   memset((void *)hpt, 0, (1ul << order));
> @@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>  
>   out_freehpt:
>   if (kvm->arch.hpt_cma_alloc)
> - kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
> + kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
>   else
>   free_pages(hpt, order - PAGE_SHIFT);
>   return -ENOMEM;
> @@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
>   kvmppc_free_lpid(kvm->arch.lpid);
>   vfree(kvm->arch.revmap);
>   if (kvm->arch.hpt_cma_alloc)
> - kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
> - 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
> + kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
> +  1 << (kvm->arch.hpt_order - PAGE_SHIFT));
>   else
>   free_pages(kvm->arch.hpt_virt,
>  kvm->arch.hpt_order - PAGE_SHIFT);
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 5bb24be..4c4aa47 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
>  }
>  early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
>  
> -struct page *kvm_alloc_hpt(unsigned long nr_pages)
> +struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
>  {
>   VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>  
>   return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
>  }
> -EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> +EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
>  
> -void kvm_release_hpt(struct page *page, unsigned long nr_pages)
> +void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
>  {
>   cma_release(kvm_cma, page, nr_pages);
>  }
> -EXPORT_SYMBOL_GPL(kvm_release_hpt);
> +EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  
>  /**
>   * kvm_cma_reserve() - reserve area for kvm hash pagetable

Reviewed-by: Thomas Huth 



Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function

2016-12-16 Thread Marc Zyngier
On 16/12/16 08:43, Qiang Zhao wrote:
> On 16/12/16 04:33, Marc Zyngier  wrote:
> 
>> -Original Message-
>> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
>> Sent: Friday, December 16, 2016 4:33 PM
>> To: Qiang Zhao ; o...@buserror.net; t...@linutronix.de
>> Cc: ja...@lakedaemon.net; Xiaobo Xie ; linux-
>> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from 
>> platforms to
>> a common function
>>
>> On 28/09/16 04:25, Zhao Qiang wrote:
>>> The codes of qe_ic init from a variety of platforms are redundant,
>>> merge them to a common function and put it to irqchip/irq-qeic.c
>>>
>>> For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0,
>>> qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of
>>> "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);".
>>>
>>> qe_ic_cascade_muxed_mpic was used for boards that have the same interrupt
>>> number for low interrupt and high interrupt, qe_ic_init has checked if
>>> "low interrupt == high interrupt"
>>>
>>> Signed-off-by: Zhao Qiang 
>>> ---
>>> Changes for v2:
>>> - modify subject and commit msg
>>> - add check for qeic by type
>>> Changes for v3:
>>> - na
>>> Changes for v4:
>>> - na
>>> Changes for v5:
>>> - na
>>> Changes for v6:
>>> - rebase
>>>
>>>  arch/powerpc/platforms/83xx/misc.c| 15 ---
>>>  arch/powerpc/platforms/85xx/corenet_generic.c |  9 -
>>>  arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 --
>>>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 
>>>  arch/powerpc/platforms/85xx/twr_p102x.c   | 14 --
>>>  drivers/irqchip/irq-qeic.c| 16 
>>>  6 files changed, 16 insertions(+), 68 deletions(-)
>>>
>>
>> [...]
>>
>>> --- a/drivers/irqchip/irq-qeic.c
>>> +++ b/drivers/irqchip/irq-qeic.c
>>> @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void)
>>> return 0;
>>>  }
>>>
>>> +static int __init qeic_of_init(void)
>>> +{
>>> +   struct device_node *np;
>>> +
>>> +   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
>>> +   if (!np) {
>>> +   np = of_find_node_by_type(NULL, "qeic");
>>> +   if (!np)
>>> +   return;
>>> +   }
>>> +   qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
>>> +  qe_ic_cascade_high_mpic);
>>> +   of_node_put(np);
>>> +}
>>
>> Have you actually compiled that code?
> 
> Yes.

And the abundance of warnings that the compiler certainly spits doesn't
strike you as an issue that should be addressed before posting the patches?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function

2016-12-16 Thread Qiang Zhao
On 16/12/16 04:33, Marc Zyngier  wrote:

> -Original Message-
> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
> Sent: Friday, December 16, 2016 4:33 PM
> To: Qiang Zhao ; o...@buserror.net; t...@linutronix.de
> Cc: ja...@lakedaemon.net; Xiaobo Xie ; linux-
> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms 
> to
> a common function
> 
> On 28/09/16 04:25, Zhao Qiang wrote:
> > The codes of qe_ic init from a variety of platforms are redundant,
> > merge them to a common function and put it to irqchip/irq-qeic.c
> >
> > For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0,
> > qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of
> > "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);".
> >
> > qe_ic_cascade_muxed_mpic was used for boards that have the same interrupt
> > number for low interrupt and high interrupt, qe_ic_init has checked if
> > "low interrupt == high interrupt"
> >
> > Signed-off-by: Zhao Qiang 
> > ---
> > Changes for v2:
> > - modify subject and commit msg
> > - add check for qeic by type
> > Changes for v3:
> > - na
> > Changes for v4:
> > - na
> > Changes for v5:
> > - na
> > Changes for v6:
> > - rebase
> >
> >  arch/powerpc/platforms/83xx/misc.c| 15 ---
> >  arch/powerpc/platforms/85xx/corenet_generic.c |  9 -
> >  arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 --
> >  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 
> >  arch/powerpc/platforms/85xx/twr_p102x.c   | 14 --
> >  drivers/irqchip/irq-qeic.c| 16 
> >  6 files changed, 16 insertions(+), 68 deletions(-)
> >
> 
> [...]
> 
> > --- a/drivers/irqchip/irq-qeic.c
> > +++ b/drivers/irqchip/irq-qeic.c
> > @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void)
> > return 0;
> >  }
> >
> > +static int __init qeic_of_init(void)
> > +{
> > +   struct device_node *np;
> > +
> > +   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
> > +   if (!np) {
> > +   np = of_find_node_by_type(NULL, "qeic");
> > +   if (!np)
> > +   return;
> > +   }
> > +   qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
> > +  qe_ic_cascade_high_mpic);
> > +   of_node_put(np);
> > +}
> 
> Have you actually compiled that code?

Yes.

Best Regards
Zhao Qiang


Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function

2016-12-16 Thread Marc Zyngier
On 28/09/16 04:25, Zhao Qiang wrote:
> The codes of qe_ic init from a variety of platforms are redundant,
> merge them to a common function and put it to irqchip/irq-qeic.c
> 
> For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0,
> qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of
> "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);".
> 
> qe_ic_cascade_muxed_mpic was used for boards that have the same interrupt
> number for low interrupt and high interrupt, qe_ic_init has checked
> if "low interrupt == high interrupt"
> 
> Signed-off-by: Zhao Qiang 
> ---
> Changes for v2:
>   - modify subject and commit msg
>   - add check for qeic by type
> Changes for v3:
>   - na
> Changes for v4:
>   - na
> Changes for v5:
>   - na
> Changes for v6:
>   - rebase
> 
>  arch/powerpc/platforms/83xx/misc.c| 15 ---
>  arch/powerpc/platforms/85xx/corenet_generic.c |  9 -
>  arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 --
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 
>  arch/powerpc/platforms/85xx/twr_p102x.c   | 14 --
>  drivers/irqchip/irq-qeic.c| 16 
>  6 files changed, 16 insertions(+), 68 deletions(-)
> 

[...]

> --- a/drivers/irqchip/irq-qeic.c
> +++ b/drivers/irqchip/irq-qeic.c
> @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void)
>   return 0;
>  }
>  
> +static int __init qeic_of_init(void)
> +{
> + struct device_node *np;
> +
> + np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
> + if (!np) {
> + np = of_find_node_by_type(NULL, "qeic");
> + if (!np)
> + return;
> + }
> + qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
> +qe_ic_cascade_high_mpic);
> + of_node_put(np);
> +}

Have you actually compiled that code?
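
For comparison, a variant that would at least compile cleanly (the -ENODEV
error value is an assumption; the point is that an __init function declared
int must actually return a value on every path):

static int __init qeic_of_init(void)
{
	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
	if (!np) {
		np = of_find_node_by_type(NULL, "qeic");
		if (!np)
			return -ENODEV;		/* no QEIC node found */
	}
	qe_ic_init(np, 0, qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);
	of_node_put(np);

	return 0;
}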

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...