Re: [RFC PATCH] powerpc: fsl_pci: fix inbound ATMU entries for systems with >4G RAM

2016-08-25 Thread Scott Wood
On 08/26/2016 12:26 AM, Tillmann Heidsieck wrote:
> Hello Scott,
> 
> thanks for the fast reply!
> 
> On 2016-08-24 23:39, Scott Wood wrote:
> [..]
>>
>> The second inbound window is at 256G, and it maps all of RAM.  Note 
>> that
>> for accesses in this window, the PCI address is different from the host
>> address.
> 
> I probably really misunderstand the manual here ...
> 
> 1 << 40 = 0x100_0000_0000 right? So if I put
> 
> (1 << 40) >> 44 = 0 in PEXIWBEAR and
> (1 << 40) >> 12 = 0x1000_0000 in PEXIWBAR
> 
> the window base (extended) address would be (1 << 40) right? And at the
> risk of showing my complete lack of skill doing basic arithmetic ...
> isn't (1 << 40) = 128GB? So what am I missing here?

Sorry, I meant 1 TiB, not 256 GiB.  1 << 40 = 1 TiB.

> I also read that the maximum allowed window size for PCIe inbound 
> windows
> is 64G; this would result in the need for more ATMU entries in case of >68G
> system memory (the question is whether this needs to be fixed because
> maybe nobody wants to build such a computer).

Hmm, it does say that.  This code was written when the physical address
size was 36 bits rather than 40.  I wonder if there's a real 64 GiB
limitation or if the manual just wasn't updated when the physical
address size grew... though in any case we should follow what the manual
says unless it's confirmed otherwise.

>> By trying to identity-map everything you also break the assumption that
>> we don't need swiotlb to avoid the PEXCSRBAR region on PCI devices
>> capable of generating 64-bit addresses.
> 
> Can you point me to where I might read up on this? I know far too little
> about all of this, that's for sure.

PEXCSRBAR (called PCICSRBAR in the Linux driver) is an inbound window
intended for MSIs, configured via a config space BAR rather than the
MMIO ATMU registers.  It only accepts 32-bit PCI addresses and can't be
disabled or pointed at anything but CCSR.  Any RAM that aliases the
PEXCSRBAR PCI address can't be DMAed to without either using swiotlb or
a non-identity mapping.

>>> Tested on a T4240 with 12G of RAM and an AMD E6760 PCIe card working with
>>> the in-tree radeon driver.
>>
>> I have no problem using an e1000e card on a t4240 with 24 GiB RAM.
>>
>> Looking at the radeon driver I see that there's a possibility of a dma
>> mask that is exactly 40 bits.  I think there's an off-by-one error in
>> fsl_pci_dma_set_mask().  If you change "dma_mask >=
>> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)" to "dma_mask >
>> DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)", does that make radeon work without
>> this patch?
> 
> This works in the sense that the radeon driver uses only 32-bit DMA
> addresses after applying it.
> I don't think that is what was intended, since the card clearly works
> with higher addresses.

Right, step one is to fix the bug.  Step two is to make things faster. :-)

For the second step I suggest setting the high window's address/size
based on the amount of RAM present (rounded up to a power of 2) rather
than trying to map the entire 40-bit address space.  Thus if you have 12
GiB RAM, you'd get a 16 GiB window at PCI address 16 GiB, and any PCI
device with a DMA mask of at least 35 bits could use direct DMA to the
high window.  I'll put a patch together.

>>
>> This change is unrelated.
> 
> Yeah, sorry for sneaking that one in here. Are you interested in this
> change? It cleans up the code a little bit by using existing functions.
> I think it makes the intent clearer, but that's up for interpretation ;-)

Sure.  If you resend it, please CC me at o...@buserror.net rather than
the NXP address.

BTW, we can probably just drop that "Setting PCI inbound window greater
than memory size" message.  We routinely silently map beyond the end of
RAM with the high mapping...

>> BTW, for some reason your patch is not showing up in Patchwork.
> 
> Are there some known pitfalls when sending patches to Patchwork?

It's not the first time I've seen certain people's patches not show up
there, but I don't know what the root cause is.

-Scott



Re: [RFC PATCH] powerpc: fsl_pci: fix inbound ATMU entries for systems with >4G RAM

2016-08-25 Thread Tillmann Heidsieck

Hello Scott,

thanks for the fast reply!

On 2016-08-24 23:39, Scott Wood wrote:
[..]


The second inbound window is at 256G, and it maps all of RAM.  Note that
for accesses in this window, the PCI address is different from the host
address.


I probably really misunderstand the manual here ...

1 << 40 = 0x100_0000_0000 right? So if I put

(1 << 40) >> 44 = 0 in PEXIWBEAR and
(1 << 40) >> 12 = 0x1000_0000 in PEXIWBAR

the window base (extended) address would be (1 << 40) right? And at the
risk of showing my complete lack of skill doing basic arithmetic ...
isn't (1 << 40) = 128GB? So what am I missing here?

I also read that the maximum allowed window size for PCIe inbound windows
is 64G; this would result in the need for more ATMU entries in case of >68G
system memory (the question is whether this needs to be fixed because
maybe nobody wants to build such a computer).



The corresponding errors can be observed using the EDAC driver for
MPC85XX.


This patch changes this behaviour by adding the second window starting
at 4G. The current implementation still leaves memory beyond 68G
unmapped as this would require yet another ATMU entry.


Windows must be size-aligned, as per the description of PEXIWBARn[WBA].
This window is illegal.


OK, I did mess up that one. I fiddled around with the starting address
and it seems that the chip might align the window silently, making the
window start at 0x0.  This solves the puzzle of why this window works
for me. However, this means that my solution is still wrong, both
because of the illegal window size and because it (potentially) does
not map the whole memory.



By trying to identity-map everything you also break the assumption that
we don't need swiotlb to avoid the PEXCSRBAR region on PCI devices
capable of generating 64-bit addresses.


Can you point me to where I might read up on this? I know far too little
about all of this, that's for sure.



Tested on a T4240 with 12G of RAM and an AMD E6760 PCIe card working with
the in-tree radeon driver.


I have no problem using an e1000e card on a t4240 with 24 GiB RAM.

Looking at the radeon driver I see that there's a possibility of a dma
mask that is exactly 40 bits.  I think there's an off-by-one error in
fsl_pci_dma_set_mask().  If you change "dma_mask >=
DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)" to "dma_mask >
DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)", does that make radeon work without
this patch?


This works in the sense that the radeon driver uses only 32-bit DMA
addresses after applying it.
I don't think that is what was intended, since the card clearly works
with higher addresses.

To elaborate on the problem I am trying to fix: after applying my
accepted EDAC patch, I get errors like this (in this case for the snd
device which is part of the radeon card)

[8.429635] PCIe ERR_DR register: 0x0010
[8.433893] PCIe ERR_CAP_STAT register: 0x0005
[8.438671] PCIe ERR_CAP_R0 register: 0x0420
[8.443276] PCIe ERR_CAP_R1 register: 0xff000103
[8.447882] PCIe ERR_CAP_R2 register: 0x0200
[8.452486] PCIe ERR_CAP_R3 register: 0x0080a4ec

which I read as
"There is some incoming PCIe transaction to address 0x2_ec4a8000 and
there is no mapping for it".

According to the radeon driver, it did put one of the rings at this
address (viewed from the CPU).





Signed-off-by: Tillmann Heidsieck 
---
 arch/powerpc/sysdev/fsl_pci.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 0ef9df49f0f2..260983037904 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -349,17 +349,13 @@ static void setup_pci_atmu(struct pci_controller *hose)
	}

	sz = min(mem, paddr_lo);
-	mem_log = ilog2(sz);
+	mem_log = order_base_2(sz);

	/* PCIe can overmap inbound & outbound since RX & TX are separated */
	if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
-		/* Size window to exact size if power-of-two or one size up */
-		if ((1ull << mem_log) != mem) {
-			mem_log++;
-			if ((1ull << mem_log) > mem)
-				pr_info("%s: Setting PCI inbound window "
-					"greater than memory size\n", name);
-		}
+		if ((1ull << mem_log) > mem)
+			pr_info("%s: Setting PCI inbound window greater than memory size\n",
+				name);

		piwar |= ((mem_log - 1) & PIWAR_SZ_MASK);



This change is unrelated.


Yeah, sorry for sneaking that one in here. Are you interested in this
change? It cleans up the code a little bit by using existing functions.
I think it makes the intent clearer, but that's up for interpretation ;-)



BTW, for some reason your patch is not showing up in Patchwork.


Are there some known pitfalls when sending patches to Patchwork?

Re: linux-next: build warnings after merge of the kbuild tree

2016-08-25 Thread Nicholas Piggin
On Mon, 22 Aug 2016 20:47:58 +1000
Nicholas Piggin  wrote:

> On Fri, 19 Aug 2016 20:44:55 +1000
> Nicholas Piggin  wrote:
> 
> > On Fri, 19 Aug 2016 10:37:00 +0200
> > Michal Marek  wrote:
> >   
> > > On 2016-08-19 07:09, Stephen Rothwell wrote:
> 
> [snip]
> 
> > > > 
> > > > I may be missing something, but genksyms generates the crc's off the
> > > > preprocessed C source code and we don't have any for the asm files ...  
> > > > 
> > > 
> > > Of course you are right. Which means that we are losing type information
> > > for these exports for CONFIG_MODVERSIONS purposes. I guess it's
> > > acceptable, since the asm functions are pretty basic and their
> > > signatures do not change.
> > 
> > I don't completely agree. It would be nice to have the functionality
> > still there.
> > 
> > What happens if you just run cmd_modversions on the as rule? It relies on
> > !defined(__ASSEMBLY__), but we're feeding the result to genksyms, not as.
> > It would require the header be included in the .S file and be protected for
> > asm builds.  
> 
> 
> This seems like it *could* be made to work, but there's a few problems.
> 
> - .h files are not made for asm consumption. It's a matter of manually
> adding the ifdef guards, which isn't terrible.
> 
> - .S files do not all include their .h where the C declaration is. Also
> will cause some churn but doable and maybe not completely unreasonable.
> 
> - genksyms parser barfs when it hits the assembly of the .S file. Best
> way to fix that seems just send the #include and EXPORT_SYMBOL lines
> from the .S to the preprocessor. That's a bit of a rabbit hole too, with
> some .S files being included, etc.
> 
> I'm not sure what to do here. If nobody cares and we lose CRCs for .S
> exports, then okay we can whitelist those relocs easily. If we don't want
> to lose the functionality, the above might work but it's a bit intrusive
> and is going to require another cycle of prep patches to go through arch
> code first.
> 
> Or suggestions for alternative approach?

Here is a quick patch that I think should catch missing CRCs in an
architecture-independent way. If we merge something like this, we
can whitelist the symbols in arch/powerpc so people get steered to
the right place.

Powerpc seems to be the only one really catching this, and it's
only as a side effect of a test run for CONFIG_RELOCATABLE kernels,
which means version failures probably slipped through other archs.

I'll clean it up, do some more testing, and submit it unless
anybody dislikes it or has a better way to do it.

Thanks,
Nick


diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 4b8ffd3..1efc454 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -609,6 +609,7 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 {
unsigned int crc;
enum export export;
+   int is_crc = 0;
 
if ((!is_vmlinux(mod->name) || mod->is_dot_o) &&
strncmp(symname, "__ksymtab", 9) == 0)
@@ -618,6 +619,7 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 
/* CRC'd symbol */
if (strncmp(symname, CRC_PFX, strlen(CRC_PFX)) == 0) {
+   is_crc = 1;
crc = (unsigned int) sym->st_value;
sym_update_crc(symname + strlen(CRC_PFX), mod, crc,
export);
@@ -663,6 +665,10 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
else
symname++;
 #endif
+   if (is_crc && !mod->is_dot_o) {
+   const char *e = is_vmlinux(mod->name) ?"":".ko";
+   warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n", symname + strlen(CRC_PFX), mod->name, e);
+   }
mod->unres = alloc_symbol(symname,
  ELF_ST_BIND(sym->st_info) == STB_WEAK,
  mod->unres);


[PATCH] powerpc: Clean up tm_abort duplication in hash_utils_64.c

2016-08-25 Thread Rui Teng
The same logic appears twice and should probably be pulled out into a function.

Suggested-by: Michael Ellerman 
Signed-off-by: Rui Teng 
---
 arch/powerpc/mm/hash_utils_64.c | 45 +
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0821556..69ef702 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1460,6 +1460,23 @@ out_exit:
local_irq_restore(flags);
 }
 
+/*
+ * Transactions are not aborted by tlbiel, only tlbie.
+ * Without, syncing a page back to a block device w/ PIO could pick up
+ * transactional data (bad!) so we force an abort here.  Before the
+ * sync the page will be made read-only, which will flush_hash_page.
+ * BIG ISSUE here: if the kernel uses a page from userspace without
+ * unmapping it first, it may see the speculated version.
+ */
+void local_tm_abort(int local)
+{
+   if (local && cpu_has_feature(CPU_FTR_TM) && current->thread.regs &&
+   MSR_TM_ACTIVE(current->thread.regs->msr)) {
+   tm_enable();
+   tm_abort(TM_CAUSE_TLBI);
+   }
+}
+
 /* WARNING: This is called from hash_low_64.S, if you change this prototype,
  *  do not forget to update the assembly call site !
  */
@@ -1487,19 +1504,7 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
} pte_iterate_hashed_end();
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-   /* Transactions are not aborted by tlbiel, only tlbie.
-* Without, syncing a page back to a block device w/ PIO could pick up
-* transactional data (bad!) so we force an abort here.  Before the
-* sync the page will be made read-only, which will flush_hash_page.
-* BIG ISSUE here: if the kernel uses a page from userspace without
-* unmapping it first, it may see the speculated version.
-*/
-   if (local && cpu_has_feature(CPU_FTR_TM) &&
-   current->thread.regs &&
-   MSR_TM_ACTIVE(current->thread.regs->msr)) {
-   tm_enable();
-   tm_abort(TM_CAUSE_TLBI);
-   }
+   local_tm_abort(local);
 #endif
 }
 
@@ -1558,19 +1563,7 @@ void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
}
 tm_abort:
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-   /* Transactions are not aborted by tlbiel, only tlbie.
-* Without, syncing a page back to a block device w/ PIO could pick up
-* transactional data (bad!) so we force an abort here.  Before the
-* sync the page will be made read-only, which will flush_hash_page.
-* BIG ISSUE here: if the kernel uses a page from userspace without
-* unmapping it first, it may see the speculated version.
-*/
-   if (local && cpu_has_feature(CPU_FTR_TM) &&
-   current->thread.regs &&
-   MSR_TM_ACTIVE(current->thread.regs->msr)) {
-   tm_enable();
-   tm_abort(TM_CAUSE_TLBI);
-   }
+   local_tm_abort(local);
 #endif
return;
 }
-- 
2.7.4



Re: [PATCH 0/2] ibmvfc: FC-TAPE Support

2016-08-25 Thread Martin K. Petersen
> "Tyrel" == Tyrel Datwyler  writes:

Tyrel> This patchset introduces optional FC-TAPE/FC Class 3 Error
Tyrel> Recovery to the ibmvfc client driver.

Applied to 4.9/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 2/2] powerpc/fadump: parse fadump reserve memory size based on memory range

2016-08-25 Thread Dave Young
On 08/25/16 at 11:00pm, Hari Bathini wrote:
> 
> 
> On Thursday 25 August 2016 12:31 PM, Dave Young wrote:
> > On 08/10/16 at 03:35pm, Hari Bathini wrote:
> > > When fadump is enabled, by default 5% of system RAM is reserved for
> > > fadump kernel. While that works for most cases, it is not good enough
> > > for every case.
> > > 
> > > Currently, to override the default value, fadump supports specifying
> > > memory to reserve with fadump_reserve_mem=size, where only a fixed size
> > > can be specified. This patch adds support to specify memory size to
> > > reserve for different memory ranges as below:
> > > 
> > >   fadump_reserve_mem=<range1>:<size1>[,<range2>:<size2>,...]
> > Hi, Hari
> 
> Hi Dave,
> 
> > I do not understand why you need introduce the new cmdline param, what's
> > the difference between the "fadump reserved" memory and the memory
> 
> I am not introducing a new parameter but adding a new syntax for
> an existing parameter.

Apologies for that; I was not aware of it because it is not documented
in kernel-parameters.txt.

> 
> > reserved by "crashkernel="? Can fadump just use crashkernel= to reserve
> > memory?
> 
> Not all syntaxes supported by crashkernel apply for fadump_reserve_mem.
> Nonetheless, it is worth considering reuse of crashkernel parameter instead
> of fadump_reserve_mem. Let me see what I can do about this..

Thanks! I originally thought fadump will reserve memory in firmware
code, if it is in kernel then it will be better to just extend and reuse
crashkernel=.

Dave
> 
> Thanks
> Hari
> 
> > Thanks
> > Dave
> > 
> > > Supporting range based input for "fadump_reserve_mem" parameter helps
> > > using the same commandline parameter for different system memory sizes.
> > > 
> > > Signed-off-by: Hari Bathini 
> > > Reviewed-by: Mahesh J Salgaonkar 
> > > ---
> > > 
> > > Changes from v2:
> > > 1. Updated changelog
> > > 
> > > 
> > >   arch/powerpc/kernel/fadump.c |   63 --
> > >   1 file changed, 54 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > > index b3a6633..7c01b5b 100644
> > > --- a/arch/powerpc/kernel/fadump.c
> > > +++ b/arch/powerpc/kernel/fadump.c
> > > @@ -193,6 +193,55 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
> > >   return addr;
> > >   }
> > > +/*
> > > + * This function parses command line for fadump_reserve_mem=
> > > + *
> > > + * Supports the below two syntaxes:
> > > + *1. fadump_reserve_mem=size
> > > + *2. fadump_reserve_mem=ramsize-range:size[,...]
> > > + *
> > > + * Sets fw_dump.reserve_bootvar with the memory size
> > > + * provided, 0 otherwise
> > > + *
> > > + * The function returns -EINVAL on failure, 0 otherwise.
> > > + */
> > > +static int __init parse_fadump_reserve_mem(void)
> > > +{
> > > + char *name = "fadump_reserve_mem=";
> > > + char *fadump_cmdline = NULL, *cur;
> > > +
> > > + fw_dump.reserve_bootvar = 0;
> > > +
> > > + /* find fadump_reserve_mem and use the last one if there are many */
> > > + cur = strstr(boot_command_line, name);
> > > + while (cur) {
> > > + fadump_cmdline = cur;
> > > + cur = strstr(cur+1, name);
> > > + }
> > > +
> > > + /* when no fadump_reserve_mem= cmdline option is provided */
> > > + if (!fadump_cmdline)
> > > + return 0;
> > > +
> > > + fadump_cmdline += strlen(name);
> > > +
> > > + /* for fadump_reserve_mem=size cmdline syntax */
> > > + if (!is_colon_in_param(fadump_cmdline)) {
> > > + fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
> > > + return 0;
> > > + }
> > > +
> > > + /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
> > > + cur = fadump_cmdline;
> > > + fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
> > > + &cur, memblock_phys_mem_size());
> > > + if (cur == fadump_cmdline) {
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > >   /**
> > >* fadump_calculate_reserve_size(): reserve variable boot area 5% of 
> > > System RAM
> > >*
> > > @@ -212,12 +261,17 @@ static inline unsigned long fadump_calculate_reserve_size(void)
> > >   {
> > >   unsigned long size;
> > > + /* sets fw_dump.reserve_bootvar */
> > > + parse_fadump_reserve_mem();
> > > +
> > >   /*
> > >* Check if the size is specified through fadump_reserve_mem= 
> > > cmdline
> > >* option. If yes, then use that.
> > >*/
> > >   if (fw_dump.reserve_bootvar)
> > >   return fw_dump.reserve_bootvar;
> > > + else
> > > + printk(KERN_INFO "fadump: calculating default boot size\n");
> > >   /* divide by 20 to get 5% of value */
> > >   size = memblock_end_of_DRAM() / 20;
> > > @@ -348,15 +402,6 @@ static int __init early_fadump_param(char *p)
> > >   }
> > >   

Re: Suspected regression?

2016-08-25 Thread Scott Wood
On Tue, 2016-08-23 at 13:34 +0200, Christophe Leroy wrote:
> 
> On 23/08/2016 11:20, Alessio Igor Bogani wrote:
> > 
> > Hi Christophe,
> > 
> > Sorry for delay in reply I was on vacation.
> > 
> > On 6 August 2016 at 11:29, christophe leroy 
> > wrote:
> > > 
> > > Alessio,
> > > 
> > > 
> > > On 05/08/2016 09:51, Christophe Leroy wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On 19/07/2016 23:52, Scott Wood wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 2016-07-19 at 12:00 +0200, Alessio Igor Bogani wrote:
> > > > > > 
> > > > > > 
> > > > > > Hi all,
> > > > > > 
> > > > > > I have got two boards MVME5100 (MPC7410 cpu) and MVME7100
> > > > > > (MPC8641D
> > > > > > cpu) for which I use the same cross-compiler (ppc7400).
> > > > > > 
> > > > > > I tested these against kernel HEAD to found that these don't boot
> > > > > > anymore (PID 1 crash).
> > > > > > 
> > > > > > Bisecting results in first offending commit:
> > > > > > 7aef4136566b0539a1a98391181e188905e33401
> > > > > > 
> > > > > > Removing it from HEAD make boards boot properly again.
> > > > > > 
> > > > > > A third system based on P2010 isn't affected at all.
> > > > > > 
> > > > > > Is it a regression or I have made something wrong?
> > > > > 
> > > > > I booted both my next branch, and Linus's master on MPC8641HPCN and
> > > > > didn't see
> > > > > this -- though possibly your RFS is doing something
> > > > > different.  Maybe
> > > > > that's
> > > > > the difference with P2010 as well.
> > > > > 
> > > > > Is there any way you can debug the cause of the crash?  Or send me a
> > > > > minimal
> > > > > RFS that demonstrates the problem (ideally with debug symbols on the
> > > > > userspace
> > > > > binaries)?
> > > > > 
> > > > I got from Alessio the below information:
> > > > 
> > > > systemd[1]: Caught , core dump failed (child 137, code=killed,
> > > > status=7/BUS).
> > > > systemd[1]: Freezing execution.
> > > > 
> > > > 
> > > > What can generate SIGBUS?
> > > > And shouldn't we also get some KERN_ERR trace, something like
> > > > "unhandled
> > > > signal 7 at ." ?
> > > > 
> > > As far as I can see, SIGBUS is mainly generated from alignment
> > > exception.
> > > According to 7410 Reference Manual, alignment exception can happen in
> > > the
> > > following cases:
> > > * An operand of a dcbz instruction is on a page that is write-through or
> > > cache-inhibited for a virtual mode access.
> > > * An attempt to execute a dcbz instruction occurs when the cache is
> > > disabled
> > > or locked.
> > > 
> > > Could you try the patch below to check if the dcbz insn is causing the
> > > SIGBUS?
> > Unfortunately that patch doesn't solve the problem.
> > 
> > Is there a chance that cache behavior could settled by board firmware
> > (PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
> > In that case what do you suggest me to looking for?
> If the removal of dcbz doesn't solve the issue, I don't think it is a
> cache-related issue.
> As far as I understood, your init gets a SIGBUS signal, right? Then we
> must identify the reason for that SIGBUS.

My guess would be errors demand-loading a page via NFS.

One approach might be to hack up the code so that both versions of
csum_partial_copy_generic() are present, and call both each time.  If the
results differ or the copied bytes are wrong, then spit out a dump of the
details.

-Scott



Re: [RFC PATCH v3 00/12] powerpc: "paca->soft_enabled" based local atomic operation implementation

2016-08-25 Thread Madhavan Srinivasan



On Thursday 25 August 2016 12:45 PM, Nicholas Piggin wrote:

On Thu, 25 Aug 2016 11:59:51 +0530
Madhavan Srinivasan  wrote:


Local atomic operations are fast and highly reentrant per-CPU counters,
used for percpu variable updates. Local atomic operations only guarantee
atomicity of variable modification with respect to the CPU which owns the
data, and these need to be executed in a preemption-safe way.

This is looking really nice. I like how you're able to specify
the mask nicely in the handler, and test the mask without adding
any instructions to fastpaths.


Thanks.



So far, I only have a few trivial nitpicks as you can see. I'll
apply the series and give it a more careful look tomorrow.

Yes. Please.

Also, I just noticed that I broke the Booke build; will fix that and
respin it right away.

Thanks for the review

Maddy


Thanks,
Nick





Re: [RFC PATCH v3 11/12] powerpc: Support to replay PMIs

2016-08-25 Thread Madhavan Srinivasan



On Thursday 25 August 2016 12:38 PM, Nicholas Piggin wrote:

On Thu, 25 Aug 2016 12:00:02 +0530
Madhavan Srinivasan  wrote:


Code to replay the Performance Monitoring Interrupts(PMI).
In the masked_interrupt handler, for PMIs we reset the MSR[EE]
and return. In the __check_irq_replay(), replay the PMI interrupt
by calling performance_monitor_common handler.

Patch also adds a new soft_irq_set_mask() to update paca->soft_enabled.
New Kconfig is added "CONFIG_IRQ_DEBUG_SUPPORT" to add a warn_on
to alert the usage of soft_irq_set_mask() for disabling lower
bitmask interrupts.

Have also moved the code under the CONFIG_TRACE_IRQFLAGS in
arch_local_irq_restore() to new Kconfig as suggested.

Should you make a single patch out of this and patch 10?
It doesn't make sense to mask perf interrupts if we can't
replay them.

Perhaps split the CONFIG_IRQ_DEBUG_SUPPORT change into its
own patch first and have the PMU masking and replaying as
a single patch?


Makes sense. Will make the changes.

Maddy



Just a suggestion.





Re: [RFC PATCH v3 08/12] powerpc: Introduce new mask bit for soft_enabled

2016-08-25 Thread Madhavan Srinivasan



On Thursday 25 August 2016 12:35 PM, Nicholas Piggin wrote:

On Thu, 25 Aug 2016 11:59:59 +0530
Madhavan Srinivasan  wrote:


diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index c19169ac1fbb..e457438c6fdf 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
   */
  #define IRQ_DISABLE_MASK_NONE 0
  #define IRQ_DISABLE_MASK_LINUX1
+#define IRQ_DISABLE_MASK_PMU   2
  
  #endif /* CONFIG_PPC64 */

This bit belongs in patch 10, I think?
Yes. I had that in patch 10, but since this patch talks about making
soft_enabled a mask, having the third value here made sense. But I can
move it back to patch 10.


Maddy







Re: [PATCH -next] ibmvnic: convert to use simple_open()

2016-08-25 Thread David Miller
From: Wei Yongjun 
Date: Wed, 24 Aug 2016 13:50:03 +

> From: Wei Yongjun 
> 
> Remove an open coded simple_open() function and replace file
> operations references to the function with simple_open()
> instead.
> 
> Generated by: scripts/coccinelle/api/simple_open.cocci
> 
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH -next] ibmvnic: fix error return code in ibmvnic_probe()

2016-08-25 Thread David Miller
From: Wei Yongjun 
Date: Wed, 24 Aug 2016 13:47:58 +

> From: Wei Yongjun 
> 
> Fix to return error code -ENOMEM from the dma_map_single error
> handling case instead of 0, as done elsewhere in this function.
> 
> Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH v3 0/5] kexec_file: Add buffer hand-over for the next kernel

2016-08-25 Thread Thiago Jung Bauermann
On Thursday, 25 August 2016, 14:12:43, Andrew Morton wrote:
> I grabbed these two patch series.  I also merged the "IMA:
> Demonstration code for kexec buffer passing." demonstration patch just
> to get things a bit of testing.

Thank you very much!

> I assume that once the "ima: carry the
> measurement list across kexec" series has stabilised, I should drop the
> demo patch and also grab those?  If so, please start cc'ing me.

I'm not sure how Mimi is planning to upstream that series.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v3 0/5] kexec_file: Add buffer hand-over for the next kernel

2016-08-25 Thread Andrew Morton
On Thu, 25 Aug 2016 15:18:26 -0300 Thiago Jung Bauermann 
 wrote:

> Hello,
> 
> This patch series implements a mechanism which allows the kernel to pass
> on a buffer to the kernel that will be kexec'd. This buffer is passed
> as a segment which is added to the kimage when it is being prepared
> by kexec_file_load.
> 
> How the second kernel is informed of this buffer is architecture-specific.
> On powerpc, this is done via the device tree, by checking
> the properties /chosen/linux,kexec-handover-buffer-start and
> /chosen/linux,kexec-handover-buffer-end, which is analogous to how the
> kernel finds the initrd.
> 
> This is needed because the Integrity Measurement Architecture subsystem
> needs to preserve its measurement list across the kexec reboot. The
> following patch series for the IMA subsystem uses this feature for that
> purpose:
> 
> https://lists.infradead.org/pipermail/kexec/2016-August/016745.html
> 
> This is so that IMA can implement trusted boot support on the OpenPower
> platform, because on such systems an intermediary Linux instance running
> as part of the firmware is used to boot the target operating system via
> kexec. Using this mechanism, IMA on this intermediary instance can
> hand over to the target OS the measurements of the components that were
> used to boot it.
> 
> Because there could be additional measurement events between the
> kexec_file_load call and the actual reboot, IMA needs a way to update the
> buffer with those additional events before rebooting. One can minimize
> the interval between the kexec_file_load and the reboot syscalls, but as
> small as it can be, there is always the possibility that the measurement
> list will be out of date at the time of reboot.
> 
> To address this issue, this patch series also introduces
> kexec_update_segment, which allows a reboot notifier to change the
> contents of the image segment during the reboot process.
> 
> The last patch is not intended to be merged, it just demonstrates how
> this feature can be used.
> 
> This series applies on top of v6 of the "kexec_file_load implementation
> for PowerPC" patch series (which applies on top of v4.8-rc1):
> 
> https://lists.infradead.org/pipermail/kexec/2016-August/016960.html

I grabbed these two patch series.  I also merged the "IMA:
Demonstration code for kexec buffer passing." demonstration patch just
to get things a bit of testing.  I assume that once the "ima: carry the
measurement list across kexec" series has stabilised, I should drop the
demo patch and also grab those?  If so, please start cc'ing me.



Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-08-25 Thread Dmitry Torokhov
On Thu, Aug 25, 2016 at 12:41 PM, Luis R. Rodriguez  wrote:
> On Thu, Aug 25, 2016 at 01:05:44PM +0200, Daniel Vetter wrote:
>> On Wed, Aug 24, 2016 at 10:39 PM, Luis R. Rodriguez  
>> wrote:

>> > Can they use initramfs for this ?
>>
>> Apparently that's also uncool with the embedded folks.
>
> What's uncool with embedded folks? To use initramfs for firmware ?
> If so can you explain why ?

Because it is not needed? If you are embedded, you have the option of
compiling only the drivers and services that you need directly into the
kernel; an initramfs is then just an extra piece that complicates your
boot process.

Thanks.

-- 
Dmitry


Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-08-25 Thread Luis R. Rodriguez
On Thu, Aug 25, 2016 at 10:10:52PM +0200, Daniel Vetter wrote:
> Cutting down since a lot of this is probably better discussed at
> ks/lpc. Aside, if you want to check out Chris Wilson's work on our new
> dependency handling, it's called kfence.
> 
> https://lkml.org/lkml/2016/7/17/37

Thanks, more reading... :)

> On Thu, Aug 25, 2016 at 9:41 PM, Luis R. Rodriguez  wrote:
> >> > So .. I agree, let's avoid the hacks. Patches welcomed.
> >>
> >> Hm, this is a definite change of tack - back when I discussed this
> >> with Greg about 2 ks ago it sounded like "don't do this". The only
> >> thing we need is some way to wait for rootfs before we do the
> >> request_firmware. Everything else we handle already in the kernel.
> >
> > OK so lets just get this userspace event done, and we're set.
> 
> Ah great. As long as that special wait-for-rootfs-pls firmware request
> is still allowed, i915 folks will be happy. We will call it from
> probe though ;-) but all from our own workers.

We should strive for this to be transparent to drivers, i.e. this safeguard of
waiting before looking for firmware should be handled by the kernel itself, as
otherwise we're just forcing the race condition reviewed in the last thread.

  Luis


Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-08-25 Thread Daniel Vetter
Cutting down since a lot of this is probably better discussed at
ks/lpc. Aside, if you want to check out Chris Wilson's work on our new
dependency handling, it's called kfence.

https://lkml.org/lkml/2016/7/17/37

On Thu, Aug 25, 2016 at 9:41 PM, Luis R. Rodriguez  wrote:
>> > So .. I agree, let's avoid the hacks. Patches welcomed.
>>
>> Hm, this is a definite change of tack - back when I discussed this
>> with Greg about 2 ks ago it sounded like "don't do this". The only
>> thing we need is some way to wait for rootfs before we do the
>> request_firmware. Everything else we handle already in the kernel.
>
> OK so lets just get this userspace event done, and we're set.

Ah great. As long as that special wait-for-rootfs-pls firmware request
is still allowed, i915 folks will be happy. We will call it from
probe though ;-) but all from our own workers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-08-25 Thread Luis R. Rodriguez
Summoning Felix for the embedded aspect on initramfs below.
Jörg might be interested in the async facilities you speak of as well.

On Thu, Aug 25, 2016 at 01:05:44PM +0200, Daniel Vetter wrote:
> On Wed, Aug 24, 2016 at 10:39 PM, Luis R. Rodriguez  wrote:
> > On Wed, Aug 24, 2016 at 08:55:55AM +0200, Daniel Vetter wrote:
> >> On Fri, Jun 17, 2016 at 12:54 AM, Luis R. Rodriguez  
> >> wrote:
> >> > Thou shalt not make firmware calls early on init or probe.
> >
> > <-- snip -->
> >
> >> > There are 4 offenders at this time:
> >> >
> >> > mcgrof@ergon ~/linux-next (git::20160609)$ export 
> >> > COCCI=scripts/coccinelle/api/request_firmware.cocci
> >> > mcgrof@ergon ~/linux-next (git::20160609)$ make coccicheck MODE=report
> >> >
> >> > drivers/fmc/fmc-fakedev.c: ERROR: driver call request firmware call on 
> >> > its init routine on line 321.
> >> > drivers/fmc/fmc-write-eeprom.c: ERROR: driver call request firmware call 
> >> > on its probe routine on line 136.
> >> > drivers/tty/serial/rp2.c: ERROR: driver call request firmware call on 
> >> > its probe routine on line 796.
> >> > drivers/tty/serial/ucc_uart.c: ERROR: driver call request firmware call 
> >> > on its probe routine on line 1246.
> >>
> >> Plus all gpu drivers which need firmware. And yes we must load them at
> >> probe
> >
> > Do you have an upstream driver in mind that does this ? Is it on device
> > driver module probe or a DRM subsystem specific probe call of some sort ?
> 
> i915 is the one I care about for obvious reasons ;-) It's all from the
> pci device probe function, but nested really deeply.

The above SmPL grammar should capture deeply nested calls within probe,
so I'm curious why it didn't pick up i915. Let's see.

i915_pci_probe() --> i915_driver_load() -->
i915_load_modeset_init() --> (drivers/gpu/drm/i915/i915_drv.c)
a) intel_csr_ucode_init() (drivers/gpu/drm/i915/intel_csr.c)
...
b) intel_guc_init() (drivers/gpu/drm/i915/intel_guc_loader.c)

The two firmwares come from:

a) intel_csr_ucode_init() --> schedule_work(&dev_priv->csr.work);
   csr_load_work_fn() --> request_firmware()

b) intel_guc_init() --> guc_fw_fetch() --> request_firmware()

---

a) is not caught because the grammar does not follow scheduled work;
however, using a work item to offload the firmware load seems reasonable here.
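For reference, the usual shape of that offload is the asynchronous
request_firmware_nowait() API (or a driver-private worker, as i915's CSR
loader does with schedule_work()). A rough kernel-side sketch, not
standalone-buildable, with all mydrv_* names invented:

```c
/* Sketch of deferring the firmware fetch out of probe. All mydrv_*
 * names are hypothetical; only the firmware_class calls are real. */
static void mydrv_fw_cont(const struct firmware *fw, void *context)
{
	struct mydrv_priv *priv = context;

	if (!fw) {
		dev_err(priv->dev, "firmware not available\n");
		return;
	}
	mydrv_program_fw(priv, fw->data, fw->size);
	release_firmware(fw);
}

static int mydrv_probe(struct platform_device *pdev)
{
	struct mydrv_priv *priv = mydrv_setup(pdev);

	/* Returns immediately; mydrv_fw_cont() runs later from a worker,
	 * so probe never blocks waiting for the firmware (or for rootfs). */
	return request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG,
				       "mydrv/mydrv.bin", &pdev->dev,
				       GFP_KERNEL, priv, mydrv_fw_cont);
}
```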

b) This should have been picked up, but it's not. Upon closer inspection
   the grammar currently expects the module init and probe to be on the
   same file. Loosening this:

diff --git 
a/scripts/coccinelle/api/request_firmware-avoid-init-probe-init.cocci 
b/scripts/coccinelle/api/request_firmware-avoid-init-probe-init.cocci
index cf180c59e042..e19e6d3dfc0f 100644
--- a/scripts/coccinelle/api/request_firmware-avoid-init-probe-init.cocci
+++ b/scripts/coccinelle/api/request_firmware-avoid-init-probe-init.cocci
@@ -49,7 +49,7 @@ identifier init;
 
 module_init(init);
 
-@ has_probe depends on defines_module_init@
+@ has_probe @
 identifier drv_calls, drv_probe;
 type bus_driver;
 identifier probe_op =~ "(probe)";
@@ -59,7 +59,7 @@ bus_driver drv_calls = {
.probe_op = drv_probe,
 };
 
-@hascall depends on !after_start && defines_module_init@
+@hascall depends on !after_start @
 position p;
 @@
 

I get a lot more complaints but still -- i915 b) case is not yet covered:

./drivers/bluetooth/ath3k.c: ERROR: driver call request firmware call on its 
probe routine on line 546.
./drivers/bluetooth/bcm203x.c: ERROR: driver call request firmware call on its 
probe routine on line 193.
./drivers/bluetooth/bcm203x.c: ERROR: driver call request firmware call on its 
probe routine on line 218.
./drivers/bluetooth/bfusb.c: ERROR: driver call request firmware call on its 
probe routine on line 655.
./drivers/fmc/fmc-fakedev.c: ERROR: driver call request firmware call on its 
init routine on line 321.
./drivers/fmc/fmc-write-eeprom.c: ERROR: driver call request firmware call on 
its probe routine on line 136.
./drivers/tty/serial/ucc_uart.c: ERROR: driver call request firmware call on 
its probe routine on line 1246.
./sound/soc/codecs/wm2000.c: ERROR: driver call request firmware call on its 
probe routine on line 893.
./sound/soc/sh/siu_dai.c: ERROR: driver call request firmware call on its probe 
routine on line 747.
./drivers/net/wireless/intersil/orinoco/orinoco_usb.c: ERROR: driver call 
request firmware call on its probe routine on line 1661.
./sound/soc/intel/common/sst-acpi.c: ERROR: driver call request firmware call 
on its probe routine on line 161.
./drivers/input/touchscreen/goodix.c: ERROR: driver call request firmware call 
on its probe routine on line 744.
./drivers/media/usb/go7007/go7007-loader.c: ERROR: driver call request firmware 
call on its probe routine on line 78.
./drivers/media/usb/go7007/go7007-loader.c: ERROR: driver call request firmware 
call on its probe routine on line 93.
./drivers/tty/serial/rp2.c: ERROR: driver call 

[PATCH v3 5/5] IMA: Demonstration code for kexec buffer passing.

2016-08-25 Thread Thiago Jung Bauermann
This shows how kernel code can use the kexec buffer passing mechanism
to pass information to the next kernel.

This patch is not intended to be committed.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/ima.h   | 11 +
 kernel/kexec_file.c   |  4 ++
 security/integrity/ima/ima.h  |  5 +++
 security/integrity/ima/ima_init.c | 26 +++
 security/integrity/ima/ima_template.c | 85 +++
 5 files changed, 131 insertions(+)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 0eb7c2e7f0d6..96528d007139 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -11,6 +11,7 @@
 #define _LINUX_IMA_H
 
 #include 
+#include 
 struct linux_binprm;
 
 #ifdef CONFIG_IMA
@@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
 
+#ifdef CONFIG_KEXEC_FILE
+extern void ima_add_kexec_buffer(struct kimage *image);
+#endif
+
 #else
 static inline int ima_bprm_check(struct linux_binprm *bprm)
 {
@@ -60,6 +65,12 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+static inline void ima_add_kexec_buffer(struct kimage *image)
+{
+}
+#endif
+
 #endif /* CONFIG_IMA */
 
 #ifdef CONFIG_IMA_APPRAISE
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 6a48519b5c5b..f1a0207a6742 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -248,6 +249,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
}
}
 
+   /* IMA needs to pass the measurement list to the next kernel. */
+   ima_add_kexec_buffer(image);
+
/* Call arch image load handlers */
ldata = arch_kexec_kernel_image_load(image);
 
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index db25f54a04fe..0334001055d7 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -102,6 +102,11 @@ struct ima_queue_entry {
 };
 extern struct list_head ima_measurements;  /* list of all measurements */
 
+#ifdef CONFIG_KEXEC_FILE
+extern void *kexec_buffer;
+extern size_t kexec_buffer_size;
+#endif
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 32912bd54ead..a1924d0f3b2b 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -104,6 +105,29 @@ void __init ima_load_x509(void)
 }
 #endif
 
+#ifdef CONFIG_KEXEC_FILE
+static void ima_load_kexec_buffer(void)
+{
+   int rc;
+
+   /* Fetch the buffer from the previous kernel, if any. */
+   rc = kexec_get_handover_buffer(&kexec_buffer, &kexec_buffer_size);
+   if (rc == 0) {
+   /* Demonstrate that buffer handover works. */
+   pr_err("kexec buffer contents: %s\n", (char *) kexec_buffer);
+   pr_err("kexec buffer contents after update: %s\n",
+  (char *) kexec_buffer + 4 * PAGE_SIZE + 10);
+
+   kexec_free_handover_buffer();
+   } else if (rc == -ENOENT)
+   pr_debug("No kexec buffer from the previous kernel.\n");
+   else
+   pr_debug("Error restoring kexec buffer: %d\n", rc);
+}
+#else
+static void ima_load_kexec_buffer(void) { }
+#endif
+
 int __init ima_init(void)
 {
u8 pcr_i[TPM_DIGEST_SIZE];
@@ -134,5 +158,7 @@ int __init ima_init(void)
 
ima_init_policy();
 
+   ima_load_kexec_buffer();
+
return ima_fs_init();
 }
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index febd12ed9b55..6dd3f902567d 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -15,6 +15,8 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
+#include 
 #include "ima.h"
 #include "ima_template_lib.h"
 
@@ -182,6 +184,89 @@ static int template_desc_init_fields(const char 
*template_fmt,
return 0;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+void *kexec_buffer = NULL;
+size_t kexec_buffer_size = 0;
+
+/* Physical address of the measurement buffer in the next kernel. */
+unsigned long kexec_buffer_load_addr = 0;
+
+/*
+ * Called during reboot. IMA can add here new events that were generated after
+ * the kexec image was loaded.
+ */
+static int ima_update_kexec_buffer(struct notifier_block *self,
+  unsigned long action, void *data)
+{
+   int ret;
+
+   if (!kexec_in_progress)
+   return NOTIFY_OK;
+
+   /*
+* Add content deep in the buffer to show that we can update
+

[PATCH v3 4/5] kexec_file: Add mechanism to update kexec segments.

2016-08-25 Thread Thiago Jung Bauermann
kexec_update_segment allows a given segment in kexec_image to have
its contents updated. This is useful if the current kernel wants to
send the next kernel information that is up-to-date at the time of
reboot.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h |  2 ++
 kernel/kexec_core.c   | 99 +++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index edadff6c86ff..ff3aa93649e2 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -256,6 +256,8 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
*image,
  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 const char *name);
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz);
 extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 561675589511..11ca5f8678df 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -721,6 +721,105 @@ static struct page *kimage_alloc_page(struct kimage 
*image,
return page;
 }
 
+/**
+ * kexec_update_segment - update the contents of a kimage segment
+ * @buffer:New contents of the segment.
+ * @bufsz: @buffer size.
+ * @load_addr: Segment's physical address in the next kernel.
+ * @memsz: Segment size.
+ *
+ * This function assumes kexec_mutex is held.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz)
+{
+   int i;
+   unsigned long entry;
+   unsigned long *ptr = NULL;
+   void *dest = NULL;
+
+   if (kexec_image == NULL) {
+   pr_err("Can't update segment: no kexec image loaded.\n");
+   return -EINVAL;
+   }
+
+   /*
+* kexec_add_buffer rounds up segment sizes to PAGE_SIZE, so
+* we have to do it here as well.
+*/
+   memsz = ALIGN(memsz, PAGE_SIZE);
+
+   for (i = 0; i < kexec_image->nr_segments; i++)
+   /* We only support updating whole segments. */
+   if (load_addr == kexec_image->segment[i].mem &&
+   memsz == kexec_image->segment[i].memsz) {
+   if (!kexec_image->segment[i].skip_checksum) {
+   pr_err("Trying to update non-modifiable segment.\n");
+   return -EINVAL;
+   }
+
+   break;
+   }
+   if (i == kexec_image->nr_segments) {
+   pr_err("Couldn't find segment to update: 0x%lx, size 0x%zx\n",
+  load_addr, memsz);
+   return -EINVAL;
+   }
+
+   for (entry = kexec_image->head; !(entry & IND_DONE) && memsz;
+entry = *ptr++) {
+   void *addr = (void *) (entry & PAGE_MASK);
+
+   switch (entry & IND_FLAGS) {
+   case IND_DESTINATION:
+   dest = addr;
+   break;
+   case IND_INDIRECTION:
+   ptr = __va(entry & PAGE_MASK);
+   break;
+   case IND_SOURCE:
+   /* Shouldn't happen, but verify just to be safe. */
+   if (dest == NULL) {
+   pr_err("Invalid kexec entries list.");
+   return -EINVAL;
+   }
+
+   if (dest == (void *) load_addr) {
+   struct page *page;
+   char *ptr;
+   size_t uchunk, mchunk;
+
+   page = kmap_to_page(addr);
+
+   ptr = kmap_atomic(page);
+   ptr += load_addr & ~PAGE_MASK;
+   mchunk = min_t(size_t, memsz,
+  PAGE_SIZE - (load_addr & ~PAGE_MASK));
+   uchunk = min(bufsz, mchunk);
+   memcpy(ptr, buffer, uchunk);
+
+   kunmap_atomic(ptr);
+
+   bufsz -= uchunk;
+   load_addr += mchunk;
+   buffer += mchunk;
+   memsz -= mchunk;
+   }
+   dest += PAGE_SIZE;
+   }
+
+   /* Shouldn't happen, but verify just to be safe. */
+   if (ptr == NULL) {
+   pr_err("Invalid kexec entries list.");
+   return 

[PATCH v3 3/5] kexec_file: Allow skipping checksum calculation for some segments.

2016-08-25 Thread Thiago Jung Bauermann
Add skip_checksum member to struct kexec_buf to specify whether the
corresponding segment should be part of the checksum calculation.

The next patch will add a way to update segments after a kimage is loaded.
Segments that will be updated in this way should not be checksummed,
otherwise they will cause the purgatory checksum verification to fail
when the machine is rebooted.

As a bonus, we don't need to special-case the purgatory segment anymore
to avoid checksumming it.

Places using struct kexec_buf get false as the default value for
skip_checksum since they all use designated initializers.  Therefore,
there is no behavior change with this patch and all segments except the
purgatory are checksummed.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h | 23 ++-
 kernel/kexec_file.c   | 15 +++
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 16561e96a6d7..edadff6c86ff 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,9 @@ struct kexec_segment {
size_t bufsz;
unsigned long mem;
size_t memsz;
+
+   /* Whether this segment is ignored in the checksum calculation. */
+   bool skip_checksum;
 };
 
 #ifdef CONFIG_COMPAT
@@ -151,15 +154,16 @@ struct kexec_file_ops {
 
 /**
  * struct kexec_buf - parameters for finding a place for a buffer in memory
- * @image: kexec image in which memory to search.
- * @buffer:Contents which will be copied to the allocated memory.
- * @bufsz: Size of @buffer.
- * @mem:   On return will have address of the buffer in memory.
- * @memsz: Size for the buffer in memory.
- * @buf_align: Minimum alignment needed.
- * @buf_min:   The buffer can't be placed below this address.
- * @buf_max:   The buffer can't be placed above this address.
- * @top_down:  Allocate from top of memory.
+ * @image: kexec image in which memory to search.
+ * @buffer:Contents which will be copied to the allocated memory.
+ * @bufsz: Size of @buffer.
+ * @mem:   On return will have address of the buffer in memory.
+ * @memsz: Size for the buffer in memory.
+ * @buf_align: Minimum alignment needed.
+ * @buf_min:   The buffer can't be placed below this address.
+ * @buf_max:   The buffer can't be placed above this address.
+ * @top_down:  Allocate from top of memory.
+ * @skip_checksum: Don't verify checksum for this buffer in purgatory.
  */
 struct kexec_buf {
struct kimage *image;
@@ -171,6 +175,7 @@ struct kexec_buf {
unsigned long buf_min;
unsigned long buf_max;
bool top_down;
+   bool skip_checksum;
 };
 
 int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 1e10689f7662..6a48519b5c5b 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -584,6 +584,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
ksegment->bufsz = kbuf->bufsz;
ksegment->mem = kbuf->mem;
ksegment->memsz = kbuf->memsz;
+   ksegment->skip_checksum = kbuf->skip_checksum;
kbuf->image->nr_segments++;
return 0;
 }
@@ -598,7 +599,6 @@ static int kexec_calculate_store_digests(struct kimage 
*image)
char *digest;
void *zero_buf;
struct kexec_sha_region *sha_regions;
-   struct purgatory_info *pi = &image->purgatory_info;
 
zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT);
zero_buf_sz = PAGE_SIZE;
@@ -638,11 +638,7 @@ static int kexec_calculate_store_digests(struct kimage 
*image)
struct kexec_segment *ksegment;
 
ksegment = &image->segment[i];
-   /*
-* Skip purgatory as it will be modified once we put digest
-* info in purgatory.
-*/
-   if (ksegment->kbuf == pi->purgatory_buf)
+   if (ksegment->skip_checksum)
continue;
 
ret = crypto_shash_update(desc, ksegment->kbuf,
@@ -714,7 +710,7 @@ static int __kexec_load_purgatory(struct kimage *image, 
unsigned long min,
Elf_Shdr *sechdrs = NULL;
struct kexec_buf kbuf = { .image = image, .bufsz = 0, .buf_align = 1,
  .buf_min = min, .buf_max = max,
- .top_down = top_down };
+ .top_down = top_down, .skip_checksum = true };
 
/*
 * sechdrs_c points to section headers in purgatory and are read
@@ -819,7 +815,10 @@ static int __kexec_load_purgatory(struct kimage *image, 
unsigned long min,
if (kbuf.buf_align < bss_align)
kbuf.buf_align = bss_align;
 
-   /* Add buffer to segment list */
+   /*
+* Add buffer to segment list. Don't checksum the segment as
+* it will be modified once we 

[PATCH v3 2/5] powerpc: kexec_file: Add buffer hand-over support for the next kernel

2016-08-25 Thread Thiago Jung Bauermann
The buffer hand-over mechanism allows the currently running kernel to pass
data to the kernel that will be kexec'd via a kexec segment. The second kernel
can check whether the previous kernel sent data and retrieve it.

This is the architecture-specific part.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec.h   |  12 +++-
 arch/powerpc/kernel/kexec_elf_64.c |   2 +-
 arch/powerpc/kernel/machine_kexec_64.c | 114 +++--
 3 files changed, 120 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 09813f3ea3c2..cfc702f60726 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -92,12 +92,20 @@ static inline bool kdump_in_progress(void)
 }
 
 #ifdef CONFIG_KEXEC_FILE
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   phys_addr_t handover_buffer_addr;
+   unsigned long handover_buffer_size;
+};
+
 int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr, unsigned long stack_top,
int debug);
-int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
- unsigned long initrd_len, const char *cmdline);
+int setup_new_fdt(const struct kimage *image, void *fdt,
+ unsigned long initrd_load_addr, unsigned long initrd_len,
+ const char *cmdline);
 bool find_debug_console(const void *fdt);
 #endif /* CONFIG_KEXEC_FILE */
 
diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
b/arch/powerpc/kernel/kexec_elf_64.c
index a2b8ddc44b74..6127a495a774 100644
--- a/arch/powerpc/kernel/kexec_elf_64.c
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -209,7 +209,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
goto out;
}
 
-   ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline);
+   ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
if (ret)
goto out;
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 5d59ccdc39f5..59f1e5d4b6c4 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -489,6 +489,60 @@ int arch_kimage_file_post_load_cleanup(struct kimage 
*image)
return image->fops->cleanup(image->image_loader_data);
 }
 
+bool kexec_can_hand_over_buffer(void)
+{
+   return true;
+}
+
+int arch_kexec_add_handover_buffer(struct kimage *image,
+  unsigned long load_addr, unsigned long size)
+{
+   image->arch.handover_buffer_addr = load_addr;
+   image->arch.handover_buffer_size = size;
+
+   return 0;
+}
+
+int kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   int ret;
+   u64 start_addr, end_addr;
+
+   ret = of_property_read_u64(of_chosen,
+  "linux,kexec-handover-buffer-start",
+  &start_addr);
+   if (ret == -EINVAL)
+   return -ENOENT;
+   else if (ret)
+   return -EINVAL;
+
+   ret = of_property_read_u64(of_chosen, "linux,kexec-handover-buffer-end",
+  &end_addr);
+   if (ret == -EINVAL)
+   return -ENOENT;
+   else if (ret)
+   return -EINVAL;
+
+   *addr = __va(start_addr);
+   /* -end is the first address after the buffer. */
+   *size = end_addr - start_addr;
+
+   return 0;
+}
+
+int kexec_free_handover_buffer(void)
+{
+   int ret;
+   void *addr;
+   unsigned long size;
+
+   ret = kexec_get_handover_buffer(&addr, &size);
+   if (ret)
+   return ret;
+
+   return memblock_free((phys_addr_t) addr, size);
+}
+
 /**
  * arch_kexec_walk_mem() - call func(data) for each unreserved memory block
  * @kbuf:  Context info for the search. Also passed to @func.
@@ -686,9 +740,52 @@ int setup_purgatory(struct kimage *image, const void 
*slave_code,
return 0;
 }
 
-/*
- * setup_new_fdt() - modify /chosen and memory reservation for the next kernel
- * @fdt:
+/**
+ * setup_handover_buffer() - add properties and reservation for the handover 
buffer
+ * @image: kexec image being loaded.
+ * @fdt:   Flattened device tree for the next kernel.
+ * @chosen_node:   Offset to the chosen node.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+static int setup_handover_buffer(const struct kimage *image, void *fdt,
+int chosen_node)
+{
+   int ret;
+
+   if (image->arch.handover_buffer_addr == 0)
+   return 0;
+
+   ret = fdt_setprop_u64(fdt, chosen_node,
+ "linux,kexec-handover-buffer-start",
+ image->arch.handover_buffer_addr);
+   if 

[PATCH v3 0/5] kexec_file: Add buffer hand-over for the next kernel

2016-08-25 Thread Thiago Jung Bauermann
Hello,

This patch series implements a mechanism which allows the kernel to pass
on a buffer to the kernel that will be kexec'd. This buffer is passed
as a segment which is added to the kimage when it is being prepared
by kexec_file_load.

How the second kernel is informed of this buffer is architecture-specific.
On powerpc, this is done via the device tree, by checking
the properties /chosen/linux,kexec-handover-buffer-start and
/chosen/linux,kexec-handover-buffer-end, which is analogous to how the
kernel finds the initrd.
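Concretely, the second kernel would find something like the following in
its device tree (the addresses below are made up for illustration; each
property is a single u64 encoded as two cells):

```
chosen {
	/* Physical range of the handover buffer; -end is the first
	 * address after the buffer. Values are illustrative only. */
	linux,kexec-handover-buffer-start = <0x0 0x20000000>;
	linux,kexec-handover-buffer-end = <0x0 0x20010000>;
};
```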

This is needed because the Integrity Measurement Architecture subsystem
needs to preserve its measurement list across the kexec reboot. The
following patch series for the IMA subsystem uses this feature for that
purpose:

https://lists.infradead.org/pipermail/kexec/2016-August/016745.html

This is so that IMA can implement trusted boot support on the OpenPower
platform, because on such systems an intermediary Linux instance running
as part of the firmware is used to boot the target operating system via
kexec. Using this mechanism, IMA on this intermediary instance can
hand over to the target OS the measurements of the components that were
used to boot it.

Because there could be additional measurement events between the
kexec_file_load call and the actual reboot, IMA needs a way to update the
buffer with those additional events before rebooting. One can minimize
the interval between the kexec_file_load and the reboot syscalls, but as
small as it can be, there is always the possibility that the measurement
list will be out of date at the time of reboot.

To address this issue, this patch series also introduces
kexec_update_segment, which allows a reboot notifier to change the
contents of the image segment during the reboot process.

The last patch is not intended to be merged, it just demonstrates how
this feature can be used.

This series applies on top of v6 of the "kexec_file_load implementation
for PowerPC" patch series (which applies on top of v4.8-rc1):

https://lists.infradead.org/pipermail/kexec/2016-August/016960.html

Changes for v3:
- Rebased series on kexec_file_load patch series v6.
  Both patch series apply cleanly on today's Linus master branch, except
  for a few lines of fuzz in arch/powerpc/Makefile and arch/powerpc/Kconfig.
- Patch "kexec_file: Add buffer hand-over support for the next kernel"
  - Fix compilation warning in <linux/kexec.h> by adding a struct kexec_buf
forward declaration when CONFIG_KEXEC_FILE=n. (Fenguang Wu)
- Patch "kexec_file: Allow skipping checksum calculation for some segments."
  - Substitute checksum argument in kexec_add_buffer with skip_checksum
member in struct kexec_buf, as suggested by Dave Young.
- Patch "kexec_file: Add mechanism to update kexec segments."
  - Use kmap_atomic in kexec_update_segment, as suggested by Andrew Morton.
  - Fix build warning on m68k by passing unsigned long value to __va instead
of void *. (Fenguang Wu)
  - Change bufsz and memsz arguments of kexec_update_segment to size_t to fix
compilation warning. (Fenguang Wu)
- Patch "kexec: Share logic to copy segment page contents."
  - Dropped this patch.
- Patch "IMA: Demonstration code for kexec buffer passing."
  - Update to use kexec_buf.skip_checksum instead of passing it in
kexec_add_buffer.

Changes for v2:
- Rebased on v5 of kexec_file_load implementation for PowerPC patch series.
- Patch "kexec_file: Add buffer hand-over support for the next kernel"
  - Changed kexec_add_handover_buffer to receive a struct kexec_buf, as
suggested by Dave Young.
- Patch "powerpc: kexec_file: Add buffer hand-over support for the next kernel"
  - Moved setup_handover_buffer from kexec_elf_64.c to machine_kexec_64.c.
  - Call setup_handover_buffer from setup_new_fdt instead of elf64_load.
  - Changed kexec_get_handover_buffer to read from the expanded device tree
instead of the flattened device tree.
- Patch "kexec_file: Add mechanism to update kexec segments.":
  - Removed unnecessary "#include " in kexec_file.c.
  - Round up memsz argument to PAGE_SIZE.
  - Check if kexec_image is NULL in kexec_update_segment.
- Patch "IMA: Demonstration code for kexec buffer passing."
  - Avoid registering reboot notifier again if kexec_file_load is called
more than once.


Thiago Jung Bauermann (5):
  kexec_file: Add buffer hand-over support for the next kernel
  powerpc: kexec_file: Add buffer hand-over support for the next kernel
  kexec_file: Allow skipping checksum calculation for some segments.
  kexec_file: Add mechanism to update kexec segments.
  IMA: Demonstration code for kexec buffer passing.

 arch/powerpc/include/asm/kexec.h   |  12 +++-
 arch/powerpc/kernel/kexec_elf_64.c |   2 +-
 arch/powerpc/kernel/machine_kexec_64.c | 114 +++--
 include/linux/ima.h|  11 
 include/linux/kexec.h  |  56 +---
 kernel/kexec_core.c|  99 
 

[PATCH v3 1/5] kexec_file: Add buffer hand-over support for the next kernel

2016-08-25 Thread Thiago Jung Bauermann
The buffer hand-over mechanism allows the currently running kernel to pass
data to the kernel that will be kexec'd via a kexec segment. The second kernel
can check whether the previous kernel sent data and retrieve it.

This is the architecture-independent part of the feature.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h | 31 +++
 kernel/kexec_file.c   | 68 +++
 2 files changed, 99 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d419d0e51fe5..16561e96a6d7 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -383,6 +383,37 @@ static inline void *boot_phys_to_virt(unsigned long entry)
return phys_to_virt(boot_phys_to_phys(entry));
 }
 
+#ifdef CONFIG_KEXEC_FILE
+bool __weak kexec_can_hand_over_buffer(void);
+int __weak arch_kexec_add_handover_buffer(struct kimage *image,
+ unsigned long load_addr,
+ unsigned long size);
+int kexec_add_handover_buffer(struct kexec_buf *kbuf);
+int __weak kexec_get_handover_buffer(void **addr, unsigned long *size);
+int __weak kexec_free_handover_buffer(void);
+#else
+struct kexec_buf;
+
+static inline bool kexec_can_hand_over_buffer(void)
+{
+   return false;
+}
+
+static inline int kexec_add_handover_buffer(struct kexec_buf *kbuf)
+{
+   return -ENOTSUPP;
+}
+
+static inline int kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   return -ENOTSUPP;
+}
+
+static inline int kexec_free_handover_buffer(void)
+{
+   return -ENOTSUPP;
+}
+#endif /* CONFIG_KEXEC_FILE */
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 3125d1689712..1e10689f7662 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -113,6 +113,74 @@ void kimage_file_post_load_cleanup(struct kimage *image)
image->image_loader_data = NULL;
 }
 
+/**
+ * kexec_can_hand_over_buffer - can we pass data to the kexec'd kernel?
+ */
+bool __weak kexec_can_hand_over_buffer(void)
+{
+   return false;
+}
+
+/**
+ * arch_kexec_add_handover_buffer - do arch-specific steps to handover buffer
+ *
+ * Architectures should use this function to pass on the handover buffer
+ * information to the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak arch_kexec_add_handover_buffer(struct kimage *image,
+ unsigned long load_addr,
+ unsigned long size)
+{
+   return -ENOTSUPP;
+}
+
+/**
+ * kexec_add_handover_buffer - add buffer to be used by the next kernel
+ * @kbuf:  Buffer contents and memory parameters.
+ *
+ * This function assumes that kexec_mutex is held.
+ * On successful return, @kbuf->mem will have the physical address of
+ * the buffer in the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_add_handover_buffer(struct kexec_buf *kbuf)
+{
+   int ret;
+
+   if (!kexec_can_hand_over_buffer())
+   return -ENOTSUPP;
+
+   ret = kexec_add_buffer(kbuf);
+   if (ret)
+   return ret;
+
+   return arch_kexec_add_handover_buffer(kbuf->image, kbuf->mem,
+ kbuf->memsz);
+}
+
+/**
+ * kexec_get_handover_buffer - get the handover buffer from the previous kernel
+ * @addr:  On successful return, set to point to the buffer contents.
+ * @size:  On successful return, set to the buffer size.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   return -ENOTSUPP;
+}
+
+/**
+ * kexec_free_handover_buffer - free memory used by the handover buffer
+ */
+int __weak kexec_free_handover_buffer(void)
+{
+   return -ENOTSUPP;
+}
+
 /*
  * In file mode list of segments is prepared by kernel. Copy relevant
  * data from user space, do error checking, prepare segment list
-- 
1.9.1



Re: [RFCv3 00/17] PAPR HPT resizing, guest & host side

2016-08-25 Thread David Gibson
On Thu, Aug 25, 2016 at 10:38:34PM +1000, Paul Mackerras wrote:
> On Mon, Mar 21, 2016 at 02:53:07PM +1100, David Gibson wrote:
> > This is an implementation of the kernel parts of the PAPR hashed page
> > table (HPT) resizing extension.
> > 
> > It contains a complete guest-side implementation - or as complete as
> > it can be until we have a final PAPR change.
> > 
> > It also contains a draft host side implementation for KVM HV (the KVM
> > PR and TCG host-side implementations live in qemu).  This works, but
> > is very slow in the critical section (where the guest must be
> > stopped).  It is significantly slower than the TCG/PR implementation;
> > unusably slow for large hash tables (~2.8s for a 1G HPT).
> > 
> > I'm still looking into what's the cause of the slowness, and I'm not
> > sure yet if the current approach can be tweaked to be fast enough, or
> > if it will require a new approach.
> 
> I have finally managed to have a close look at this series.  The
> approach and implementation seem basically sane,

Ok, good to know.

> though I think the
> rehash function could be optimized a bit.  I also have an optimized
> implementation of hpte_page_size() and hpte_base_page_size() which
> should be a lot quicker than the 2d linear (areal?) search which we do
> at present.

Ok, sounds like with those optimizations this approach might be good
enough.  I aim to send a revised version of these some time after the
RHEL 7.3 crunch.

In the meantime, any word on the PAPR proposal?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] powerpc/powernv/pci: Use kmalloc_array() in two functions

2016-08-25 Thread walter harms


Am 24.08.2016 22:36, schrieb SF Markus Elfring:
> From: Markus Elfring 
> Date: Wed, 24 Aug 2016 22:26:37 +0200
> 
> A multiplication for the size determination of a memory allocation
> indicated that an array data structure should be processed.
> Thus reuse the corresponding function "kmalloc_array".
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index fd9444f..2366552 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1305,7 +1305,9 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
> u16 num_vfs)
>   else
>   m64_bars = 1;
>  
> - pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
> + pdn->m64_map = kmalloc_array(m64_bars,
> +  sizeof(*pdn->m64_map),
> +  GFP_KERNEL);
>   if (!pdn->m64_map)
>   return -ENOMEM;
>   /* Initialize the m64_map to IODA_INVALID_M64 */
> @@ -1572,8 +1574,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
> num_vfs)
>  
>   /* Allocating pe_num_map */
>   if (pdn->m64_single_mode)
> - pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * 
> num_vfs,
> - GFP_KERNEL);
> + pdn->pe_num_map = kmalloc_array(num_vfs,
> + 
> sizeof(*pdn->pe_num_map),
> + GFP_KERNEL);
>   else
>   pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), 
> GFP_KERNEL);
>  


what is the value of num_vfs in the !pdn->m64_single_mode case ?
otherwise someone could make it like:

if (!pdn->m64_single_mode)
   num_vfs=1;

pdn->pe_num_map = kmalloc_array(num_vfs, 

so it looks a bit oversophisticated.

re,
 wh






[PATCH 11/44] usb: gadget: udc: fsl_qe_udc: don't print on ENOMEM

2016-08-25 Thread Wolfram Sang
All kmalloc-based functions print enough information on failures.

Signed-off-by: Wolfram Sang 
---
 drivers/usb/gadget/udc/fsl_qe_udc.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_qe_udc.c 
b/drivers/usb/gadget/udc/fsl_qe_udc.c
index cf8819a5c5b263..9d6b2c8eed4294 100644
--- a/drivers/usb/gadget/udc/fsl_qe_udc.c
+++ b/drivers/usb/gadget/udc/fsl_qe_udc.c
@@ -421,10 +421,8 @@ static int qe_ep_rxbd_update(struct qe_ep *ep)
bd = ep->rxbase;
 
ep->rxframe = kmalloc(sizeof(*ep->rxframe), GFP_ATOMIC);
-   if (ep->rxframe == NULL) {
-   dev_err(ep->udc->dev, "malloc rxframe failed\n");
+   if (!ep->rxframe)
return -ENOMEM;
-   }
 
qe_frame_init(ep->rxframe);
 
@@ -435,9 +433,7 @@ static int qe_ep_rxbd_update(struct qe_ep *ep)
 
size = (ep->ep.maxpacket + USB_CRC_SIZE + 2) * (bdring_len + 1);
ep->rxbuffer = kzalloc(size, GFP_ATOMIC);
-   if (ep->rxbuffer == NULL) {
-   dev_err(ep->udc->dev, "malloc rxbuffer failed,size=%d\n",
-   size);
+   if (!ep->rxbuffer) {
kfree(ep->rxframe);
return -ENOMEM;
}
@@ -668,10 +664,8 @@ static int qe_ep_init(struct qe_udc *udc,
 
if ((ep->tm == USBP_TM_CTL) || (ep->dir == USB_DIR_IN)) {
ep->txframe = kmalloc(sizeof(*ep->txframe), GFP_ATOMIC);
-   if (ep->txframe == NULL) {
-   dev_err(udc->dev, "malloc txframe failed\n");
+   if (!ep->txframe)
goto en_done2;
-   }
qe_frame_init(ep->txframe);
}
 
@@ -2347,10 +2341,8 @@ static struct qe_udc *qe_udc_config(struct 
platform_device *ofdev)
u32 offset;
 
udc = kzalloc(sizeof(*udc), GFP_KERNEL);
-   if (udc == NULL) {
-   dev_err(>dev, "malloc udc failed\n");
+   if (!udc)
goto cleanup;
-   }
 
udc->dev = &ofdev->dev;
 
-- 
2.9.3



[PATCH 00/44] usb: don't print on ENOMEM

2016-08-25 Thread Wolfram Sang
Here is my next series to save memory by removing unneeded strings. It removes
in the usb subsystem all unspecific error messages after calling malloc-based
functions, i.e. (devm_)k[zcm]alloc. kmalloc prints enough information in that
case. If the message was specific (e.g. "can't save CLEAR_TT_BUFFER state"), I
left it. This series saves ~4.5KB of "out of memory" permutations in .text and
.rodata. For modified lines, (x == NULL) was replaced with (!x) as well.
This seems to be the dominant style in this subsystem and checkpatch recommends
it as well (and I prefer it, too).

Wolfram Sang (44):
  usb: atm: cxacru: don't print on ENOMEM
  usb: atm: speedtch: don't print on ENOMEM
  usb: atm: ueagle-atm: don't print on ENOMEM
  usb: atm: usbatm: don't print on ENOMEM
  usb: class: usbtmc: don't print on ENOMEM
  usb: core: hcd: don't print on ENOMEM
  usb: core: hub: don't print on ENOMEM
  usb: core: message: don't print on ENOMEM
  usb: core: urb: don't print on ENOMEM
  usb: dwc2: gadget: don't print on ENOMEM
  usb: gadget: udc: fsl_qe_udc: don't print on ENOMEM
  usb: gadget: udc: goku_udc: don't print on ENOMEM
  usb: gadget: udc: udc-xilinx: don't print on ENOMEM
  usb: host: fhci-hcd: don't print on ENOMEM
  usb: host: max3421-hcd: don't print on ENOMEM
  usb: host: uhci-hcd: don't print on ENOMEM
  usb: host: xhci-tegra: don't print on ENOMEM
  usb: host: xhci: don't print on ENOMEM
  usb: misc: adutux: don't print on ENOMEM
  usb: misc: appledisplay: don't print on ENOMEM
  usb: misc: cypress_cy7c63: don't print on ENOMEM
  usb: misc: cytherm: don't print on ENOMEM
  usb: misc: ftdi-elan: don't print on ENOMEM
  usb: misc: idmouse: don't print on ENOMEM
  usb: misc: iowarrior: don't print on ENOMEM
  usb: misc: ldusb: don't print on ENOMEM
  usb: misc: legousbtower: don't print on ENOMEM
  usb: misc: lvstest: don't print on ENOMEM
  usb: misc: trancevibrator: don't print on ENOMEM
  usb: misc: usblcd: don't print on ENOMEM
  usb: misc: usbsevseg: don't print on ENOMEM
  usb: misc: uss720: don't print on ENOMEM
  usb: misc: yurex: don't print on ENOMEM
  usb: musb: am35x: don't print on ENOMEM
  usb: musb: da8xx: don't print on ENOMEM
  usb: renesas_usbhs: mod_gadget: don't print on ENOMEM
  usb: renesas_usbhs: mod_host: don't print on ENOMEM
  usb: renesas_usbhs: pipe: don't print on ENOMEM
  usb: storage: alauda: don't print on ENOMEM
  usb: storage: sddr09: don't print on ENOMEM
  usb: usb-skeleton: don't print on ENOMEM
  usb: wusbcore: crypto: don't print on ENOMEM
  usb: wusbcore: security: don't print on ENOMEM
  usb: wusbcore: wa-nep: don't print on ENOMEM

 drivers/usb/atm/cxacru.c   |  4 +---
 drivers/usb/atm/speedtch.c |  1 -
 drivers/usb/atm/ueagle-atm.c   |  9 ++---
 drivers/usb/atm/usbatm.c   |  7 +--
 drivers/usb/class/usbtmc.c |  4 +---
 drivers/usb/core/hcd.c |  4 +---
 drivers/usb/core/hub.c |  9 +++--
 drivers/usb/core/message.c |  5 +
 drivers/usb/core/urb.c |  4 +---
 drivers/usb/dwc2/gadget.c  |  8 ++--
 drivers/usb/gadget/udc/fsl_qe_udc.c| 16 
 drivers/usb/gadget/udc/goku_udc.c  |  3 +--
 drivers/usb/gadget/udc/udc-xilinx.c|  4 +---
 drivers/usb/host/fhci-hcd.c|  4 +---
 drivers/usb/host/max3421-hcd.c |  8 ++--
 drivers/usb/host/uhci-hcd.c|  5 +
 drivers/usb/host/xhci-tegra.c  |  1 -
 drivers/usb/host/xhci.c|  4 +---
 drivers/usb/misc/adutux.c  | 13 +++--
 drivers/usb/misc/appledisplay.c|  3 ---
 drivers/usb/misc/cypress_cy7c63.c  |  5 +
 drivers/usb/misc/cytherm.c | 32 
 drivers/usb/misc/ftdi-elan.c   |  1 -
 drivers/usb/misc/idmouse.c |  1 -
 drivers/usb/misc/iowarrior.c   | 20 ++--
 drivers/usb/misc/ldusb.c   | 20 +---
 drivers/usb/misc/legousbtower.c| 16 
 drivers/usb/misc/lvstest.c |  4 +---
 drivers/usb/misc/trancevibrator.c  |  3 +--
 drivers/usb/misc/usblcd.c  |  9 ++---
 drivers/usb/misc/usbsevseg.c   |  8 ++--
 drivers/usb/misc/uss720.c  |  4 +---
 drivers/usb/misc/yurex.c   |  8 ++--
 drivers/usb/musb/am35x.c   |  4 +---
 drivers/usb/musb/da8xx.c   |  4 +---
 drivers/usb/renesas_usbhs/mod_gadget.c |  6 +-
 drivers/usb/renesas_usbhs/mod_host.c   | 10 ++
 drivers/usb/renesas_usbhs/pipe.c   |  4 +---
 drivers/usb/storage/alauda.c   | 11 +++
 drivers/usb/storage/sddr09.c   | 14 --
 drivers/usb/usb-skeleton.c |  9 ++---
 drivers/usb/wusbcore/crypto.c  |  4 +---
 drivers/usb/wusbcore/security.c|  4 +---
 drivers/usb/wusbcore/wa-nep.c  |  5 +
 44 files changed, 78 

Re: [PATCH v3 2/2] powerpc/fadump: parse fadump reserve memory size based on memory range

2016-08-25 Thread Hari Bathini



On Thursday 25 August 2016 12:31 PM, Dave Young wrote:

On 08/10/16 at 03:35pm, Hari Bathini wrote:

When fadump is enabled, by default 5% of system RAM is reserved for
fadump kernel. While that works for most cases, it is not good enough
for every case.

Currently, to override the default value, fadump supports specifying
memory to reserve with fadump_reserve_mem=size, where only a fixed size
can be specified. This patch adds support to specify memory size to
reserve for different memory ranges as below:

fadump_reserve_mem=ramsize-range:size[,ramsize-range:size,...]

Hi, Hari


Hi Dave,


I do not understand why you need introduce the new cmdline param, what's
the difference between the "fadump reserved" memory and the memory


I am not introducing a new parameter but adding a new syntax for
an existing parameter.


reserved by "crashkernel="? Can fadump just use crashkernel= to reserve
memory?


Not all syntaxes supported by crashkernel apply for fadump_reserve_mem.
Nonetheless, it is worth considering reuse of crashkernel parameter instead
of fadump_reserve_mem. Let me see what I can do about this..

Thanks
Hari


Thanks
Dave


Supporting range based input for "fadump_reserve_mem" parameter helps
using the same commandline parameter for different system memory sizes.

Signed-off-by: Hari Bathini 
Reviewed-by: Mahesh J Salgaonkar 
---

Changes from v2:
1. Updated changelog


  arch/powerpc/kernel/fadump.c |   63 --
  1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b3a6633..7c01b5b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -193,6 +193,55 @@ static unsigned long init_fadump_mem_struct(struct 
fadump_mem_struct *fdm,
return addr;
  }
  
+/*
+ * This function parses command line for fadump_reserve_mem=
+ *
+ * Supports the below two syntaxes:
+ *    1. fadump_reserve_mem=size
+ *    2. fadump_reserve_mem=ramsize-range:size[,...]
+ *
+ * Sets fw_dump.reserve_bootvar with the memory size
+ * provided, 0 otherwise
+ *
+ * The function returns -EINVAL on failure, 0 otherwise.
+ */
+static int __init parse_fadump_reserve_mem(void)
+{
+   char *name = "fadump_reserve_mem=";
+   char *fadump_cmdline = NULL, *cur;
+
+   fw_dump.reserve_bootvar = 0;
+
+   /* find fadump_reserve_mem and use the last one if there are many */
+   cur = strstr(boot_command_line, name);
+   while (cur) {
+   fadump_cmdline = cur;
+   cur = strstr(cur+1, name);
+   }
+
+   /* when no fadump_reserve_mem= cmdline option is provided */
+   if (!fadump_cmdline)
+   return 0;
+
+   fadump_cmdline += strlen(name);
+
+   /* for fadump_reserve_mem=size cmdline syntax */
+   if (!is_colon_in_param(fadump_cmdline)) {
+   fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
+   return 0;
+   }
+
+   /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
+   cur = fadump_cmdline;
+   fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
+   &cur, memblock_phys_mem_size());
+   if (cur == fadump_cmdline) {
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
  /**
   * fadump_calculate_reserve_size(): reserve variable boot area 5% of System 
RAM
   *
@@ -212,12 +261,17 @@ static inline unsigned long 
fadump_calculate_reserve_size(void)
  {
unsigned long size;
  
+	/* sets fw_dump.reserve_bootvar */
+   parse_fadump_reserve_mem();
+
/*
 * Check if the size is specified through fadump_reserve_mem= cmdline
 * option. If yes, then use that.
 */
if (fw_dump.reserve_bootvar)
return fw_dump.reserve_bootvar;
+   else
+   printk(KERN_INFO "fadump: calculating default boot size\n");
  
	/* divide by 20 to get 5% of value */
size = memblock_end_of_DRAM() / 20;
@@ -348,15 +402,6 @@ static int __init early_fadump_param(char *p)
  }
  early_param("fadump", early_fadump_param);
  
-/* Look for fadump_reserve_mem= cmdline option */
-static int __init early_fadump_reserve_mem(char *p)
-{
-   if (p)
-   fw_dump.reserve_bootvar = memparse(p, &p);
-   return 0;
-}
-early_param("fadump_reserve_mem", early_fadump_reserve_mem);
-
  static void register_fw_dump(struct fadump_mem_struct *fdm)
  {
int rc;


___
kexec mailing list
ke...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec





Re: [alsa-devel] [PATCH] ALSA: snd-aoa: enable sound on PowerBook G4 12"

2016-08-25 Thread Takashi Iwai
On Wed, 24 Aug 2016 22:35:58 +0200,
Aaro Koskinen wrote:
> 
> Hi,
> 
> On Wed, Aug 24, 2016 at 09:43:23PM +0200, Johannes Berg wrote:
> > On Wed, 2016-08-24 at 20:57 +0300, Aaro Koskinen wrote:
> > > Enable sound on PowerBook G4 12".
> > 
> > Looks good to me, I assume you tested it and it works :)
> 
> Yes, I have this laptop in use.

OK, applied the patch now.  Thanks.


Takashi


Re: [RFC/PATCH 1/2] cpuidle: Allow idle-states to be disabled at start

2016-08-25 Thread Daniel Lezcano
On 08/25/2016 03:46 PM, Balbir Singh wrote:
> 
> 
> On 25/08/16 01:06, Daniel Lezcano wrote:
>> On 08/24/2016 04:48 PM, Balbir Singh wrote:
>>>
>>>
>>> On 25/08/16 00:44, Daniel Lezcano wrote:
 On 08/19/2016 12:26 AM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
>
> Currently all the idle states registered by a cpu-idle driver are
> enabled by default. This patch adds a mechanism which allows the
> driver to hint if an idle-state should start in a disabled state. The
> cpu-idle core will use this hint to appropriately initialize the
> usage->disable knob of the CPU device idle state.

 Why do you need to do that ?

>>>
>>> I think patch 2/2 explains the reason as it uses this infrastructure
>>
>> Ok, let me elaborate the question, I was not clear.
>>
>> Why the userspace can't setup the system environment at boot time by
>> disabling the state instead of adding extra code to disable it at boot
>> time in the kernel and then re-enable it from userspace ?
> 
> Gautham's patches don't want to have those states enabled by default.
> They are unlikely to be what production systems need, but likely
> what a knowledgeable person can look into selectively enable for
> experimentation.

Why not invert the logic ?

A knowledgeable person can look into selectively disable for production.

In addition, a kernel command line option to specify which state to
disable would be appropriate and beneficial for all existing drivers.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [RFC/PATCH 1/2] cpuidle: Allow idle-states to be disabled at start

2016-08-25 Thread Balbir Singh


On 25/08/16 01:06, Daniel Lezcano wrote:
> On 08/24/2016 04:48 PM, Balbir Singh wrote:
>>
>>
>> On 25/08/16 00:44, Daniel Lezcano wrote:
>>> On 08/19/2016 12:26 AM, Gautham R. Shenoy wrote:
 From: "Gautham R. Shenoy" 

 Currently all the idle states registered by a cpu-idle driver are
 enabled by default. This patch adds a mechanism which allows the
 driver to hint if an idle-state should start in a disabled state. The
 cpu-idle core will use this hint to appropriately initialize the
 usage->disable knob of the CPU device idle state.
>>>
>>> Why do you need to do that ?
>>>
>>
>> I think patch 2/2 explains the reason as it uses this infrastructure
> 
> Ok, let me elaborate the question, I was not clear.
> 
> Why the userspace can't setup the system environment at boot time by
> disabling the state instead of adding extra code to disable it at boot
> time in the kernel and then re-enable it from userspace ?

Gautham's patches don't want to have those states enabled by default.
They are unlikely to be what production systems need, but likely
what a knowledgeable person can look into selectively enable for
experimentation.

@Gautham?


Balbir Singh.


Re: [PATCH 3/5] of/platform: introduce a generic way to declare a platform bus

2016-08-25 Thread Rob Herring
On Mon, Aug 22, 2016 at 9:06 PM, Kevin Hao  wrote:
> The specific buses which need to be probed at boot time are different
> between platforms. Instead of put all the buses into the default
> of_default_bus_match_table[] match tables, this patch introduces a
> general way to declare a platform bus.

I'd prefer to not do this with linker sections if possible. Doesn't
PPC have machine descriptors that you could add the match table to? If
that table exists then arch_want_default_of_probe could return a
pointer to it.

Are there any platforms that work with the default match table? I'd be
fine with adding some strings to the default if that helps.

{ .type = "soc", },

device_type is deprecated for FDT, so this shouldn't be needed except
for really old stuff.

{ .compatible = "soc", },

Doesn't appear in kernel dts files. Could be out of tree or old ones?

{ .compatible = "simple-bus" },

Already handled.

{ .compatible = "gianfar" },

Seems like the children of this should be probed by the gianfar driver.

{ .compatible = "gpio-leds", },

I don't think this is needed.

{ .type = "qe", },

Again, deprecated.

{ .compatible = "fsl,qe", },

No issue to add this. Though, you could also probe it from mpc85xx_qe_init().

Rob


Re: [PATCH] powerpc: Remove suspect CONFIG_PPC_BOOK3E #ifdefs in nohash/64/pgtable.h

2016-08-25 Thread Aneesh Kumar K.V
Rui Teng  writes:

> There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h.
> And there should be no configurations possible which use nohash/64/pgtable.h
> but don't also enable CONFIG_PPC_BOOK3E.
>
> Suggested-by: Michael Ellerman 
> Signed-off-by: Rui Teng 


Reviewed-by: Aneesh Kumar K.V 

> ---
>  arch/powerpc/include/asm/nohash/64/pgtable.h | 14 +-
>  1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
> b/arch/powerpc/include/asm/nohash/64/pgtable.h
> index d4d808c..6213fc1 100644
> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
> @@ -26,15 +26,11 @@
>  #else
>  #define PMD_CACHE_INDEX  PMD_INDEX_SIZE
>  #endif
> +
>  /*
>   * Define the address range of the kernel non-linear virtual area
>   */
> -
> -#ifdef CONFIG_PPC_BOOK3E
>  #define KERN_VIRT_START ASM_CONST(0x8000000000000000)
> -#else
> -#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
> -#endif
>  #define KERN_VIRT_SIZE   ASM_CONST(0x0000100000000000)
>  
>  /*
> @@ -43,11 +39,7 @@
>   * (we keep a quarter for the virtual memmap)
>   */
>  #define VMALLOC_STARTKERN_VIRT_START
> -#ifdef CONFIG_PPC_BOOK3E
>  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 2)
> -#else
> -#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
> -#endif
>  #define VMALLOC_END  (VMALLOC_START + VMALLOC_SIZE)
>  
>  /*
> @@ -85,12 +77,8 @@
>   * Defines the address of the vmemap area, in its own region on
>   * hash table CPUs and after the vmalloc space on Book3E
>   */
> -#ifdef CONFIG_PPC_BOOK3E
>  #define VMEMMAP_BASE VMALLOC_END
>  #define VMEMMAP_END  KERN_IO_START
> -#else
> -#define VMEMMAP_BASE (VMEMMAP_REGION_ID << REGION_SHIFT)
> -#endif
>  #define vmemmap  ((struct page *)VMEMMAP_BASE)
>  
>  
> -- 
> 2.7.4



Re: [PATCH] hwrng: pasemi_rng.c: Migrate to managed API

2016-08-25 Thread PrasannaKumar Muralidharan
> I will propose to use devm_ioremap_resource() instead for removing this 
> hardcoded 0x100, but i cannot find any user of this driver in any dts. (And 
> so cannot check that this 0x100 is given in any DT resource node)
>
> Is this normal ?

I wanted to use devm_ioremap_resource but could not find DT entry
required for this driver in any of the .dts files. So did not change
that. I could not find any dts/dtsi for this platform. So I assume
that the dtb is not present in the kernel, dtb is supplied by the
bootloader. I may be wrong in this. Can anyone confirm this?

Regards,
PrasannaKumar


Re: [RFCv3 00/17] PAPR HPT resizing, guest & host side

2016-08-25 Thread Paul Mackerras
On Mon, Mar 21, 2016 at 02:53:07PM +1100, David Gibson wrote:
> This is an implementation of the kernel parts of the PAPR hashed page
> table (HPT) resizing extension.
> 
> It contains a complete guest-side implementation - or as complete as
> it can be until we have a final PAPR change.
> 
> It also contains a draft host side implementation for KVM HV (the KVM
> PR and TCG host-side implementations live in qemu).  This works, but
> is very slow in the critical section (where the guest must be
> stopped).  It is significantly slower than the TCG/PR implementation;
> unusably slow for large hash tables (~2.8s for a 1G HPT).
> 
> I'm still looking into what's the cause of the slowness, and I'm not
> sure yet if the current approach can be tweaked to be fast enough, or
> if it will require a new approach.

I have finally managed to have a close look at this series.  The
approach and implementation seem basically sane, though I think the
rehash function could be optimized a bit.  I also have an optimized
implementation of hpte_page_size() and hpte_base_page_size() which
should be a lot quicker than the 2d linear (areal?) search which we do
at present.

Paul.


Re: [PATCH] hwrng: pasemi_rng.c: Migrate to managed API

2016-08-25 Thread LABBE Corentin
On Thu, Aug 25, 2016 at 05:04:16PM +0530, PrasannaKumar Muralidharan wrote:
> Use devm_ioremap and devm_hwrng_register instead of ioremap and
> hwrng_register. This removes unregistering and error handling code.
> 
> This patch is not tested with hardware as I don't have access to it.
> 
> Signed-off-by: PrasannaKumar Muralidharan 
> ---
>  drivers/char/hw_random/pasemi-rng.c | 26 +++---
>  1 file changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/char/hw_random/pasemi-rng.c 
> b/drivers/char/hw_random/pasemi-rng.c
> index 699b725..0f03397 100644
> --- a/drivers/char/hw_random/pasemi-rng.c
> +++ b/drivers/char/hw_random/pasemi-rng.c
> @@ -100,37 +100,18 @@ static int rng_probe(struct platform_device *ofdev)
>   void __iomem *rng_regs;
>   struct device_node *rng_np = ofdev->dev.of_node;
>   struct resource res;
> - int err = 0;
>  
> - err = of_address_to_resource(rng_np, 0, &res);
> - if (err)
> + if (of_address_to_resource(rng_np, 0, &res))
>   return -ENODEV;
>  
> - rng_regs = ioremap(res.start, 0x100);
> -
> + rng_regs = devm_ioremap(&ofdev->dev, res.start, 0x100);
>   if (!rng_regs)
>   return -ENOMEM;
>  

I will propose to use devm_ioremap_resource() instead for removing this 
hardcoded 0x100, but i cannot find any user of this driver in any dts. (And so 
cannot check that this 0x100 is given in any DT resource node)

Is this normal ?

Regard



[PATCH] hwrng: pasemi_rng.c: Migrate to managed API

2016-08-25 Thread PrasannaKumar Muralidharan
Use devm_ioremap and devm_hwrng_register instead of ioremap and
hwrng_register. This removes unregistering and error handling code.

This patch is not tested with hardware as I don't have access to it.

Signed-off-by: PrasannaKumar Muralidharan 
---
 drivers/char/hw_random/pasemi-rng.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/drivers/char/hw_random/pasemi-rng.c 
b/drivers/char/hw_random/pasemi-rng.c
index 699b725..0f03397 100644
--- a/drivers/char/hw_random/pasemi-rng.c
+++ b/drivers/char/hw_random/pasemi-rng.c
@@ -100,37 +100,18 @@ static int rng_probe(struct platform_device *ofdev)
void __iomem *rng_regs;
struct device_node *rng_np = ofdev->dev.of_node;
struct resource res;
-   int err = 0;
 
-   err = of_address_to_resource(rng_np, 0, &res);
-   if (err)
+   if (of_address_to_resource(rng_np, 0, &res))
return -ENODEV;
 
-   rng_regs = ioremap(res.start, 0x100);
-
+   rng_regs = devm_ioremap(&ofdev->dev, res.start, 0x100);
if (!rng_regs)
return -ENOMEM;
 
pasemi_rng.priv = (unsigned long)rng_regs;
 
pr_info("Registering PA Semi RNG\n");
-
-   err = hwrng_register(&pasemi_rng);
-
-   if (err)
-   iounmap(rng_regs);
-
-   return err;
-}
-
-static int rng_remove(struct platform_device *dev)
-{
-   void __iomem *rng_regs = (void __iomem *)pasemi_rng.priv;
-
-   hwrng_unregister(&pasemi_rng);
-   iounmap(rng_regs);
-
-   return 0;
+   return devm_hwrng_register(&ofdev->dev, &pasemi_rng);
 }
 
 static const struct of_device_id rng_match[] = {
@@ -146,7 +127,6 @@ static struct platform_driver rng_driver = {
.of_match_table = rng_match,
},
.probe  = rng_probe,
-   .remove = rng_remove,
 };
 
 module_platform_driver(rng_driver);
-- 
2.5.0



Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-08-25 Thread Daniel Vetter
On Wed, Aug 24, 2016 at 10:39 PM, Luis R. Rodriguez  wrote:
> On Wed, Aug 24, 2016 at 08:55:55AM +0200, Daniel Vetter wrote:
>> On Fri, Jun 17, 2016 at 12:54 AM, Luis R. Rodriguez  
>> wrote:
>> > Thou shalt not make firmware calls early on init or probe.
>
> <-- snip -->
>
>> > There are 4 offenders at this time:
>> >
>> > mcgrof@ergon ~/linux-next (git::20160609)$ export 
>> > COCCI=scripts/coccinelle/api/request_firmware.cocci
>> > mcgrof@ergon ~/linux-next (git::20160609)$ make coccicheck MODE=report
>> >
>> > drivers/fmc/fmc-fakedev.c: ERROR: driver call request firmware call on its 
>> > init routine on line 321.
>> > drivers/fmc/fmc-write-eeprom.c: ERROR: driver call request firmware call 
>> > on its probe routine on line 136.
>> > drivers/tty/serial/rp2.c: ERROR: driver call request firmware call on its 
>> > probe routine on line 796.
>> > drivers/tty/serial/ucc_uart.c: ERROR: driver call request firmware call on 
>> > its probe routine on line 1246.
>>
>> Plus all gpu drivers which need firmware. And yes we must load them at
>> probe
>
> Do you have an upstream driver in mind that does this ? Is it on device
> drier module probe or a DRM subsystem specific probe call of some sort ?

i915 is the one I care about for obvious reasons ;-) It's all from the
pci device probe function, but nested really deeply.

>> because people are generally pissed when they boot their machine
>> and the screen goes black. On top of that a lot of people want their
>> gpu drivers to be built-in, but can't ship the firmware blobs in the
>> kernel image because gpl. Yep, there's a bit a contradiction there ...
>
> Can they use initramfs for this ?

Apparently that's also uncool with the embedded folks. Tbh I don't
know exactly why. Also I thought initramfs is available only after
built-in drivers have finished loading, but maybe that changed.

> Also just curious -- as with other subsystems, is it possible to load
> a generic driver first, say vesa, and then a more enhanced one later ?
> I am not saying this is ideal or am I suggesting this, I'd just like
> to know the feasibility of this.

Some users want a fully running gfx stack 2s after power-on. There's
not even enough time to load an uefi or vga driver first. i915
directly initializes the gpu from power-on state on those.

>> I think what would work is loading the different subsystems of the
>> driver in parallel (we already do that largely)
>
> Init level stuff is actually pretty synchronous, and in fact both
> init and probe are called serially. There are a few subsystems which
> have been doing things a bit differently, but these are exceptions.
>
> When you say we already do this largely, can you describe a bit more
> precisely what you mean ?

Oh, this isn't subsystems as in linux device/driver model, but
different parts within the driver. We fire up a bunch of struct work
to get various bits done asynchronously.

>>, and then if one
>> firmware blob isn't there yet simply stall that async worker until it
>> shows up.
>
> Is this an existing framework or do you mean if we add something
> generic to do this async loading of subsystems ?

normal workers, and flush_work and friends to sync up. We have some
really fancy ideas for essentially async init tasks that can declare
their depencies systemd-unit file style, but that's only in the
prototype stage. We need this (eventually) since handling the ordering
correctly is getting unwieldy. But again just struct work launched
from the main pci probe function.

>> But the answers I've gotten thus far from request_firmware()
>> folks (well at least Greg) is don't do that ...
>
> Well in this patch set I'm adding myself as a MAINTAINER and I've
> been extending the firmware API recently to help with a few new
> features I want, I've been wanting to hear more feedback from
> other's needs and I had actually not gotten much -- except
> only recently with the usermode helper and reasons why some
> folks thought they could not use direct firmware loading from
> the fs. I'm keen to hear or more use cases and needs specially if
> they have to do with improving boot time and asynchronous boot.
>
>> Is that problem still somewhere on the radar?
>
> Not mine.
>
>> Atm there's various
>> wait_for_rootfs hacks out there floating in vendor/product trees.
>
> This one I've heard about recently, and I suggested two possible
> solutions, with a preference for a simple notification from userspace
> of when the rootfs is available.
>
>> "Avoid at all costs" sounds like upstream prefers to not care about
>> android/cros in those cases (yes, I know most ARM SoCs don't need
>> firmware, which would make the problem just a subset of all
>> devices).
>
> In my days of dealing with Android I learned that most folks frankly did not
> care too much about the upstream-first model. That means things were reactive.
> That's a business mind set and that's fine. However for upstream we want
> what is best and to 

Re: PowerPC agpmode issues

2016-08-25 Thread Benjamin Herrenschmidt
On Thu, 2016-08-25 at 05:09 +0200, Mike wrote:
> Any improvement on your ends? Seems -1 is now the quirk. But do
> your trackpads work? Did an update after getting a new (and the latest
> released) PowerBook up. Also found an interesting interface which can
> replace our IDE drives, intended for iPod classics, but it can fit in
> the bay and has an mSATA interface.

There are bigger issues with Apple's AGP implementation, but yes, that's
one of them. Another one is that because it's not cache coherent, AGP
pages also shouldn't be mapped cacheable in Linux via the linear
mapping, as the prefetcher could create cache aliases of them, which
would be very bad. Sadly, Linux uses BATs on ppc32 to map the linear
mapping and so we can't unmap selected pages.

So sadly, while slow, I'm afraid PCI mode is the way to go for those
old things.

To revive those old Mac laptops, one of the more interesting things to
do would be to port all my old power management code from radeonfb to
radeon KMS so sleep can work again ;-)

> On 5 Feb 2016 15:32, "Herminio Hernandez Jr."
>  wrote:
> > I have been experiencing the same thing with my iBook and
> > PowerBook. 
> > 
> > Sent from my iPhone
> > 
> > On Feb 4, 2016, at 8:47 PM, Mike  wrote:
> > 
> > > Hi. 
> > > Managed to get the Radeon R300 running on Mesa 11.1.1 with an old
> > > 2013 patch from Michel Dänzer; the next problem is of course enabling
> > > agpmode. Running in PCI mode with radeon.agpmode=-1 works, but
> > > is of course slow, and seems to load the CPU a lot.
> > > 
> > > Upon initial investigation I could not believe AGP
> > > could be this broken for this long, until I found this.
> > >  "committed with Ben Skeggs on Feb 26, 2013"
> > > https://github.com/DespairFactor/bullhead/commit/650e1203c11354ba84d69ba445abc0efcfe3890a
> > > http://lxr.free-electrons.com/source/drivers/gpu/drm/nouveau/nouveau_agp.c?v=4.2
> > > #ifdef __powerpc__
> > >   /* Disable AGP by default on all PowerPC machines for
> > >    * now -- At least some UniNorth-2 AGP bridges are
> > >    * known to be broken: DMA from the host to the card
> > >    * works just fine, but writeback from the card to the
> > >    * host goes straight to memory untranslated bypassing
> > >    * the GATT somehow, making them quite painful to deal
> > >    * with...
> > >    */
> > >   if (nouveau_agpmode == -1)
> > >   return false;
> > > #endif
> > >  
> > >  and now later this: 
> > > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
> > > #ifdef __powerpc__
> > >   /* Disable AGP by default on all PowerPC machines for now -- At
> > >* least some UniNorth-2 AGP bridges are known to be broken:
> > >* DMA from the host to the card works just fine, but writeback
> > >* from the card to the host goes straight to memory
> > >* untranslated bypassing that GATT somehow, making them quite
> > >* painful to deal with...
> > >*/
> > >   mode = 0;
> > > #endif
> > > 
> > > All of this seems to point to serious issues around the time of the
> > > change from UMS to KMS, and a serious regression hitting the Linux
> > > kernel? No?
> > > 
> > > Cheers
> > > -Mike
> > > ___
> > > Linuxppc-dev mailing list
> > > Linuxppc-dev@lists.ozlabs.org
> > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 


RE: [PATCH 1/1] pci: host: pci-layerscape: add missing of_node_put after calling of_parse_phandle

2016-08-25 Thread Peter Chen
 
>
>On Fri, Aug 12, 2016 at 09:34:41AM +0800, Peter Chen wrote:
>> of_node_put() needs to be called when the device node obtained from
>> of_parse_phandle() is no longer needed.
>>
>> Cc: Minghuan Lian 
>> Cc: Mingkai Hu 
>> Cc: Roy Zang 
>> Signed-off-by: Peter Chen 
>> ---
>>  drivers/pci/host/pci-layerscape.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/pci/host/pci-layerscape.c
>> b/drivers/pci/host/pci-layerscape.c
>> index 114ba81..573b996 100644
>> --- a/drivers/pci/host/pci-layerscape.c
>> +++ b/drivers/pci/host/pci-layerscape.c
>> @@ -173,6 +173,8 @@ static int ls_pcie_msi_host_init(struct pcie_port *pp,
>>  return -EINVAL;
>>  }
>>
>> +of_node_put(msi_node);
>> +
>
>Can you please look for and fix similar errors in other drivers in 
>drivers/pci/host/*?
>
>For example:
>
>  advk_pcie_probe() and iproc_pcie_msi_enable() call
>  of_parse_phandle() but don't call of_node_put() in failure paths.
>
>  dra7xx_pcie_init_irq_domain(), nwl_pcie_init_irq_domain(), and
>  xilinx_pcie_init_irq_domain() call of_get_next_child() but don't
>  call of_node_put() in failure paths.
>
>  ks_pcie_get_irq_controller_info() calls of_find_node_by_name().
>

Would you agree that I fix the issues for drivers that call of_parse_phandle,
of_get_next_child, or of_find_node_by_name? I can grep for the symbols and
check whether of_node_put is called properly.
I will group all the fixes into one patch set.

>I think there may be others, e.g., the pci_host_bridge_msi_domain() path calls
>of_parse_phandle(), but I'm not sure of_node_put() is called on failure paths.
>

I find that pci_host_bridge_msi_domain does not call of_parse_phandle directly;
only of_msi_get_domain calls of_parse_phandle, and
of_node_put is called properly.

Peter


>When we find bugs like this, it's nice to fix one occurrence, but it's really 
>great if we
>can squash similar bugs nearby so the bug isn't copied into new drivers.
>
>>  return 0;
>>  }
>>
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci"
>> in the body of a message to majord...@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 00/12] powerpc: "paca->soft_enabled" based local atomic operation implementation

2016-08-25 Thread Nicholas Piggin
On Thu, 25 Aug 2016 11:59:51 +0530
Madhavan Srinivasan  wrote:

> Local atomic operations are fast and highly reentrant per-CPU counters,
> used for percpu variable updates. Local atomic operations only guarantee
> variable modification atomicity wrt the CPU which owns the data, and
> these need to be executed in a preemption-safe way.

This is looking really nice. I like how you're able to specify
the mask nicely in the handler, and test the mask without adding
any instructions to fastpaths.

So far, I only have a few trivial nitpicks as you can see. I'll
apply the series and give it a more careful look tomorrow.

Thanks,
Nick


Re: [RFC PATCH v3 11/12] powerpc: Support to replay PMIs

2016-08-25 Thread Nicholas Piggin
On Thu, 25 Aug 2016 12:00:02 +0530
Madhavan Srinivasan  wrote:

> Code to replay the Performance Monitoring Interrupts(PMI).
> In the masked_interrupt handler, for PMIs we reset the MSR[EE]
> and return. In the __check_irq_replay(), replay the PMI interrupt
> by calling performance_monitor_common handler.
> 
> Patch also adds a new soft_irq_set_mask() to update paca->soft_enabled.
> New Kconfig is added "CONFIG_IRQ_DEBUG_SUPPORT" to add a warn_on
> to alert the usage of soft_irq_set_mask() for disabling lower
> bitmask interrupts.
> 
> Have also moved the code under the CONFIG_TRACE_IRQFLAGS in
> arch_local_irq_restore() to new Kconfig as suggested.

Should you make a single patch out of this and patch 10?
It doesn't make sense to mask perf interrupts if we can't
replay them.

Perhaps split the CONFIG_IRQ_DEBUG_SUPPORT change into its
own patch first and have the PMU masking and replaying as
a single patch?

Just a suggestion.


Re: [RFC PATCH v3 08/12] powerpc: Introduce new mask bit for soft_enabled

2016-08-25 Thread Nicholas Piggin
On Thu, 25 Aug 2016 11:59:59 +0530
Madhavan Srinivasan  wrote:

> diff --git a/arch/powerpc/include/asm/hw_irq.h 
> b/arch/powerpc/include/asm/hw_irq.h
> index c19169ac1fbb..e457438c6fdf 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -32,6 +32,7 @@
>   */
>  #define IRQ_DISABLE_MASK_NONE0
>  #define IRQ_DISABLE_MASK_LINUX   1
> +#define IRQ_DISABLE_MASK_PMU 2
>  
>  #endif /* CONFIG_PPC64 */

This bit belongs in patch 10, I think?


Re: [PATCH v3 2/2] powerpc/fadump: parse fadump reserve memory size based on memory range

2016-08-25 Thread Dave Young
On 08/10/16 at 03:35pm, Hari Bathini wrote:
> When fadump is enabled, by default 5% of system RAM is reserved for
> fadump kernel. While that works for most cases, it is not good enough
> for every case.
> 
> Currently, to override the default value, fadump supports specifying
> memory to reserve with fadump_reserve_mem=size, where only a fixed size
> can be specified. This patch adds support to specify memory size to
> reserve for different memory ranges as below:
> 
>   fadump_reserve_mem=:[,:,...]

Hi, Hari

I do not understand why you need to introduce the new cmdline param. What's
the difference between the "fadump reserved" memory and the memory
reserved by "crashkernel="? Can fadump just use crashkernel= to reserve
memory?

Thanks
Dave

> 
> Supporting range based input for "fadump_reserve_mem" parameter helps
> using the same commandline parameter for different system memory sizes.
> 
> Signed-off-by: Hari Bathini 
> Reviewed-by: Mahesh J Salgaonkar 
> ---
> 
> Changes from v2:
> 1. Updated changelog
> 
> 
>  arch/powerpc/kernel/fadump.c |   63 
> --
>  1 file changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index b3a6633..7c01b5b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -193,6 +193,55 @@ static unsigned long init_fadump_mem_struct(struct 
> fadump_mem_struct *fdm,
>   return addr;
>  }
>  
> +/*
> + * This function parses command line for fadump_reserve_mem=
> + *
> + * Supports the below two syntaxes:
> + *1. fadump_reserve_mem=size
> + *2. fadump_reserve_mem=ramsize-range:size[,...]
> + *
> + * Sets fw_dump.reserve_bootvar with the memory size
> + * provided, 0 otherwise
> + *
> + * The function returns -EINVAL on failure, 0 otherwise.
> + */
> +static int __init parse_fadump_reserve_mem(void)
> +{
> + char *name = "fadump_reserve_mem=";
> + char *fadump_cmdline = NULL, *cur;
> +
> + fw_dump.reserve_bootvar = 0;
> +
> + /* find fadump_reserve_mem and use the last one if there are many */
> + cur = strstr(boot_command_line, name);
> + while (cur) {
> + fadump_cmdline = cur;
> + cur = strstr(cur+1, name);
> + }
> +
> + /* when no fadump_reserve_mem= cmdline option is provided */
> + if (!fadump_cmdline)
> + return 0;
> +
> + fadump_cmdline += strlen(name);
> +
> + /* for fadump_reserve_mem=size cmdline syntax */
> + if (!is_colon_in_param(fadump_cmdline)) {
> + fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
> + return 0;
> + }
> +
> + /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
> + cur = fadump_cmdline;
> + fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
> + &cur, memblock_phys_mem_size());
> + if (cur == fadump_cmdline) {
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  /**
>   * fadump_calculate_reserve_size(): reserve variable boot area 5% of System 
> RAM
>   *
> @@ -212,12 +261,17 @@ static inline unsigned long 
> fadump_calculate_reserve_size(void)
>  {
>   unsigned long size;
>  
> + /* sets fw_dump.reserve_bootvar */
> + parse_fadump_reserve_mem();
> +
>   /*
>* Check if the size is specified through fadump_reserve_mem= cmdline
>* option. If yes, then use that.
>*/
>   if (fw_dump.reserve_bootvar)
>   return fw_dump.reserve_bootvar;
> + else
> + printk(KERN_INFO "fadump: calculating default boot size\n");
>  
>   /* divide by 20 to get 5% of value */
>   size = memblock_end_of_DRAM() / 20;
> @@ -348,15 +402,6 @@ static int __init early_fadump_param(char *p)
>  }
>  early_param("fadump", early_fadump_param);
>  
> -/* Look for fadump_reserve_mem= cmdline option */
> -static int __init early_fadump_reserve_mem(char *p)
> -{
> - if (p)
> - fw_dump.reserve_bootvar = memparse(p, &p);
> - return 0;
> -}
> -early_param("fadump_reserve_mem", early_fadump_reserve_mem);
> -
>  static void register_fw_dump(struct fadump_mem_struct *fdm)
>  {
>   int rc;
> 


[RFC PATCH v3 05/12] powerpc: reverse the soft_enable logic

2016-08-25 Thread Madhavan Srinivasan
"paca->soft_enabled" is used as a flag to mask some interrupts.
Currently supported flag values and their details:

soft_enabledMSR[EE]

0   0   Disabled (PMI and HMI not masked)
1   1   Enabled

"paca->soft_enabled" is initialized to 1 to mark interrupts as
enabled. arch_local_irq_disable() will toggle the value when interrupts
need to be disabled. At this point, the interrupts are not actually disabled;
instead, the interrupt vector has code to check for the flag and mask the
interrupt when it occurs.
By "mask it", it updates paca->irq_happened and returns.
arch_local_irq_restore() is called to re-enable interrupts, which checks and
replays interrupts if any occurred.

Now, as mentioned, the current logic does not mask "performance monitoring
interrupts", and PMIs are implemented as NMIs. But this patchset depends on
local_irq_* for a successful local_* update. Meaning: mask all possible
interrupts during the local_* update and replay them after the update.

So the idea here is to reverse the "paca->soft_enabled" logic. New values and
details:

soft_enabledMSR[EE]

1   0   Disabled  (PMI and HMI not masked)
0   1   Enabled

The reason for this change is to create the foundation for a third flag value,
"2", for "soft_enabled", to add support for masking PMIs.

This is a foundation patch to support checking of the new flag value for
"paca->soft_enabled". It modifies the condition checking for "soft_enabled".

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 4 ++--
 arch/powerpc/kernel/entry_64.S| 5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index a7564b8a4831..c19169ac1fbb 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -30,8 +30,8 @@
 /*
  * flags for paca->soft_enabled
  */
-#define IRQ_DISABLE_MASK_NONE  1
-#define IRQ_DISABLE_MASK_LINUX 0
+#define IRQ_DISABLE_MASK_NONE  0
+#define IRQ_DISABLE_MASK_LINUX 1
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3078de64824b..b50d79e5bfbc 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -131,8 +131,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 */
 #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG)
lbz r10,PACASOFTIRQEN(r13)
-   xorir10,r10,IRQ_DISABLE_MASK_NONE
-1: tdnei   r10,0
+1: tdnei   r10,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
@@ -1012,7 +1011,7 @@ _GLOBAL(enter_rtas)
 * check it with the asm equivalent of WARN_ON
 */
lbz r0,PACASOFTIRQEN(r13)
-1: tdnei   r0,IRQ_DISABLE_MASK_LINUX
+1: tdeqi   r0,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif

-- 
2.7.4



[RFC PATCH v3 04/12] powerpc: Use set_soft_enabled api to update paca->soft_enabled

2016-08-25 Thread Madhavan Srinivasan
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/kvm_ppc.h | 2 +-
 arch/powerpc/kernel/irq.c  | 2 +-
 arch/powerpc/kernel/setup_64.c | 4 ++--
 arch/powerpc/kernel/time.c | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 740ee309cea8..5624ec233664 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -707,7 +707,7 @@ static inline void kvmppc_fix_ee_before_entry(void)
 
/* Only need to enable IRQs by hard enabling them after this */
local_paca->irq_happened = 0;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   set_soft_enabled(IRQ_DISABLE_MASK_NONE);
 #endif
 }
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7afa6bf96671..40af16a102bb 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -332,7 +332,7 @@ bool prep_irq_for_idle(void)
 * of entering the low power state.
 */
local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   set_soft_enabled(IRQ_DISABLE_MASK_NONE);
 
/* Tell the caller to enter the low power state */
return true;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f31930b9bfc1..4333cfc305f8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -197,7 +197,7 @@ static void __init fixup_boot_paca(void)
/* Allow percpu accesses to work until we setup percpu data */
get_paca()->data_offset = 0;
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
 }
 
 static void __init configure_exceptions(void)
@@ -334,7 +334,7 @@ void __init early_setup(unsigned long dt_ptr)
 void early_setup_secondary(void)
 {
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = 0;
+   set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
 
/* Initialize the hash table or TLB handling */
early_init_mmu_secondary();
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 7105757cdb90..d2aa6888db43 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -268,7 +268,7 @@ void accumulate_stolen_time(void)
 * needs to reflect that so various debug stuff doesn't
 * complain
 */
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
 
sst = scan_dispatch_log(acct->starttime_user);
ust = scan_dispatch_log(acct->starttime);
@@ -276,7 +276,7 @@ void accumulate_stolen_time(void)
acct->user_time -= ust;
local_paca->stolen_time += ust + sst;
 
-   local_paca->soft_enabled = save_soft_enabled;
+   set_soft_enabled(save_soft_enabled);
 }
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
-- 
2.7.4



[RFC PATCH v3 03/12] powerpc: move set_soft_enabled()

2016-08-25 Thread Madhavan Srinivasan
Move set_soft_enabled() from arch/powerpc/kernel/irq.c to
asm/hw_irq.h. This way, updates of paca->soft_enabled
can be forced through set_soft_enabled() wherever possible.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 6 ++
 arch/powerpc/kernel/irq.c | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 1fcc2fd7275a..a7564b8a4831 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -47,6 +47,12 @@ extern void unknown_exception(struct pt_regs *regs);
 #ifdef CONFIG_PPC64
 #include 
 
+static inline notrace void set_soft_enabled(unsigned long enable)
+{
+   __asm__ __volatile__("stb %0,%1(13)"
+   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
unsigned long flags;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ed1123125063..7afa6bf96671 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -107,12 +107,6 @@ static inline notrace unsigned long get_irq_happened(void)
return happened;
 }
 
-static inline notrace void set_soft_enabled(unsigned long enable)
-{
-   __asm__ __volatile__("stb %0,%1(13)"
-   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
-}
-
 static inline notrace int decrementer_check_overflow(void)
 {
u64 now = get_tb_or_rtc();
-- 
2.7.4



[PATCH] powerpc: Remove suspect CONFIG_PPC_BOOK3E #ifdefs in nohash/64/pgtable.h

2016-08-25 Thread Rui Teng
There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h.
There should be no possible configuration which uses nohash/64/pgtable.h
but does not also enable CONFIG_PPC_BOOK3E.

Suggested-by: Michael Ellerman 
Signed-off-by: Rui Teng 
---
 arch/powerpc/include/asm/nohash/64/pgtable.h | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d4d808c..6213fc1 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -26,15 +26,11 @@
 #else
 #define PMD_CACHE_INDEXPMD_INDEX_SIZE
 #endif
+
 /*
  * Define the address range of the kernel non-linear virtual area
  */
-
-#ifdef CONFIG_PPC_BOOK3E
 #define KERN_VIRT_START ASM_CONST(0x8000)
-#else
-#define KERN_VIRT_START ASM_CONST(0xD000)
-#endif
 #define KERN_VIRT_SIZE ASM_CONST(0x1000)
 
 /*
@@ -43,11 +39,7 @@
  * (we keep a quarter for the virtual memmap)
  */
 #define VMALLOC_START  KERN_VIRT_START
-#ifdef CONFIG_PPC_BOOK3E
 #define VMALLOC_SIZE   (KERN_VIRT_SIZE >> 2)
-#else
-#define VMALLOC_SIZE   (KERN_VIRT_SIZE >> 1)
-#endif
 #define VMALLOC_END(VMALLOC_START + VMALLOC_SIZE)
 
 /*
@@ -85,12 +77,8 @@
  * Defines the address of the vmemap area, in its own region on
  * hash table CPUs and after the vmalloc space on Book3E
  */
-#ifdef CONFIG_PPC_BOOK3E
 #define VMEMMAP_BASE   VMALLOC_END
 #define VMEMMAP_ENDKERN_IO_START
-#else
-#define VMEMMAP_BASE   (VMEMMAP_REGION_ID << REGION_SHIFT)
-#endif
 #define vmemmap((struct page *)VMEMMAP_BASE)
 
 
-- 
2.7.4



[RFC PATCH v3 07/12] powerpc: Add new _EXCEPTION_PROLOG_1 macro

2016-08-25 Thread Madhavan Srinivasan
Factor out the EXCEPTION_PROLOG_1 macro, so that STD_EXCEPTION_*
and MASKABLE_EXCEPTION_* can have separate versions. This is needed
to support the addition of a new parameter, "bitmask", for the MASKABLE_*
macros to specify the interrupt mask.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 75e262466b85..dd3253bd0d8e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -161,18 +161,40 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,area+EX_R10(r13);   /* save r10 - r12 */\
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 
-#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+#define __EXCEPTION_PROLOG_1_PRE(area) \
OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \
OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);  \
SAVE_CTR(r10, area);\
-   mfcrr9; \
-   extra(vec); \
+   mfcrr9;
+
+#define __EXCEPTION_PROLOG_1_POST(area)\
std r11,area+EX_R11(r13);   \
std r12,area+EX_R12(r13);   \
GET_SCRATCH0(r10);  \
std r10,area+EX_R13(r13)
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 will carry
+ * an additional parameter called "bitmask" to support
+ * checking of the interrupt maskable level in the SOFTEN_TEST.
+ * Intended to be used in MASKABLE_EXCEPTION_* macros.
+ */
+#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec); \
+   __EXCEPTION_PROLOG_1_POST(area);
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 is intended
+ * to be used in STD_EXCEPTION* macros
+ */
+#define _EXCEPTION_PROLOG_1(area, extra, vec)  \
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec); \
+   __EXCEPTION_PROLOG_1_POST(area);
+
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
-   __EXCEPTION_PROLOG_1(area, extra, vec)
+   _EXCEPTION_PROLOG_1(area, extra, vec)
 
 #define __EXCEPTION_PROLOG_PSERIES_1(label, h) \
ld  r12,PACAKBASE(r13); /* get high part of  */   \
-- 
2.7.4



[RFC PATCH v3 12/12] powerpc: rewrite local_t using soft_irq

2016-08-25 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for percpu variable updates. Local atomic operations only guarantee
variable modification atomicity wrt the CPU which owns the data, and
these need to be executed in a preemption-safe way.

Here is the design of this patch. Since local_* operations
only need to be atomic with respect to interrupts (IIUC), we have two options:
either replay the "op" if interrupted, or replay the interrupt after
the "op". The initial patchset posted was based on implementing local_*
operations using CR5, which replays the "op". That patchset had issues when
rewinding the address pointer from an array, making the slow path
really slow. Since the CR5-based implementation proposed using __ex_table to
find the rewind address, this raised concerns about the size of __ex_table and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

This patch instead uses arch_local_irq_*() to soft-disable
interrupts (including PMIs). After finishing the "op", arch_local_irq_restore()
is called and, correspondingly, interrupts are replayed if any occurred.

The patch rewrites the current local_* functions to use arch_local_irq_disable.
The base flow for each function is:

{
soft_irq_set_mask()
load
..
store
arch_local_irq_restore()
}

The reason for this approach is that, currently, an l[w/d]arx/st[w/d]cx.
instruction pair is used for local_* operations, which is heavy
in cycle count, and there is no local variant. So to
see whether the new implementation helps, I used a modified
version of Rusty's benchmark code on local_t.

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
- Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t Without Patch   With Patch

_inc28  8
_add28  8
_read   3   3
_add_return 28  7

Currently only asm/local.h has been rewritten, and
the entire change is tested only on PPC64 (pseries guest)
and a PPC64 host (LE).

TODO:
- local_cmpxchg and local_xchg needs modification.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/local.h | 91 +++-
 1 file changed, 63 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index b8da91363864..4ab726e99fea 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -14,24 +14,50 @@ typedef struct
 #define local_read(l)  atomic_long_read(&(l)->a)
 #define local_set(l,i) atomic_long_set(&(l)->a, (i))
 
-#define local_add(i,l) atomic_long_add((i),(&(l)->a))
-#define local_sub(i,l) atomic_long_sub((i),(&(l)->a))
-#define local_inc(l)   atomic_long_inc(&(l)->a)
-#define local_dec(l)   atomic_long_dec(&(l)->a)
+static __inline__ void local_add(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   flags = soft_irq_set_mask(IRQ_DISABLE_MASK_PMU | IRQ_DISABLE_MASK_LINUX);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   add %0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   arch_local_irq_restore(flags);
+}
+
+static __inline__ void local_sub(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   flags = soft_irq_set_mask(IRQ_DISABLE_MASK_PMU | IRQ_DISABLE_MASK_LINUX);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   subf%0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   arch_local_irq_restore(flags);
+}
 
 static __inline__ long local_add_return(long a, local_t *l)
 {
long t;
+   unsigned long flags;
 
+   flags = soft_irq_set_mask(IRQ_DISABLE_MASK_PMU | IRQ_DISABLE_MASK_LINUX);
__asm__ __volatile__(
-"1:"   PPC_LLARX(%0,0,%2,0) "  # local_add_return\n\
+   PPC_LL" %0,0(%2)\n\
add %0,%1,%0\n"
-   PPC405_ERR77(0,%2)
-   PPC_STLCX   "%0,0,%2 \n\
-   bne-1b"
+   PPC_STL "%0,0(%2)\n"
: "=&r" (t)
: "r" (a), "r" (&(l->a.counter))
: "cc", "memory");
+   arch_local_irq_restore(flags);
 
return t;
 }
@@ -41,16 +67,18 @@ static __inline__ long local_add_return(long a, local_t *l)
 static __inline__ long local_sub_return(long a, local_t *l)
 {
long t;
+   unsigned long flags;
+
+   flags = soft_irq_set_mask(IRQ_DISABLE_MASK_PMU | IRQ_DISABLE_MASK_LINUX);
 
__asm__ __volatile__(
-"1:"   PPC_LLARX(%0,0,%2,0) "  # local_sub_return\n\
+"1:"   PPC_LL" %0,0(%2)\n\
subf%0,%1,%0\n"
-   PPC405_ERR77(0,%2)
-   PPC_STLCX   "%0,0,%2 \n\
-   bne-1b"
+   PPC_STL "%0,0(%2)\n"
: "=&r" (t)
: "r" (a), "r" 

[RFC PATCH v3 11/12] powerpc: Support to replay PMIs

2016-08-25 Thread Madhavan Srinivasan
Code to replay the Performance Monitoring Interrupts (PMIs).
In the masked_interrupt handler, for PMIs we reset MSR[EE]
and return. In __check_irq_replay(), replay the PMI interrupt
by calling the performance_monitor_common handler.

The patch also adds a new soft_irq_set_mask() to update paca->soft_enabled.
A new Kconfig option, "CONFIG_IRQ_DEBUG_SUPPORT", is added with a WARN_ON
to flag the usage of soft_irq_set_mask() for disabling lower-
bitmask interrupts.

Have also moved the code under the CONFIG_TRACE_IRQFLAGS in
arch_local_irq_restore() to new Kconfig as suggested.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/Kconfig |  4 
 arch/powerpc/include/asm/hw_irq.h| 17 +
 arch/powerpc/kernel/entry_64.S   |  5 +
 arch/powerpc/kernel/exceptions-64s.S |  2 ++
 arch/powerpc/kernel/irq.c| 12 ++--
 5 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 927d2ab2ce08..878f05925340 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -51,6 +51,10 @@ config TRACE_IRQFLAGS_SUPPORT
bool
default y
 
+config IRQ_DEBUG_SUPPORT
+   bool
+   default n
+
 config LOCKDEP_SUPPORT
bool
default y
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 415734c07cfa..9f71559ce868 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -81,6 +81,23 @@ static inline unsigned long arch_local_irq_disable(void)
return flags;
 }
 
+static inline unsigned long soft_irq_set_mask(int value)
+{
+   unsigned long flags, zero;
+
+#ifdef CONFIG_IRQ_DEBUG_SUPPORT
+   WARN_ON(value <= IRQ_DISABLE_MASK_LINUX);
+#endif
+   asm volatile(
+   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
+   : "=r" (flags), "=&r" (zero)
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+"i" (value)
+   : "memory");
+
+   return flags;
+}
+
 extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 99bab5c65734..b79baff3dae4 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -933,6 +933,11 @@ restore_check_irq_replay:
addir3,r1,STACK_FRAME_OVERHEAD;
bl  do_IRQ
b   ret_from_except
+1: cmpwi   cr0,r3,0xf00
+   bne 1f
+   addir3,r1,STACK_FRAME_OVERHEAD;
+   bl  performance_monitor_exception
+   b   ret_from_except
 1: cmpwi   cr0,r3,0xe60
bne 1f
addir3,r1,STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index bc2a838e1926..44c871324228 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -655,6 +655,8 @@ _GLOBAL(__replay_interrupt)
beq decrementer_common
cmpwi   r3,0x500
beq hardware_interrupt_common
+   cmpwi   r3,0xf00
+   beq performance_monitor_common
 BEGIN_FTR_SECTION
cmpwi   r3,0xe80
beq h_doorbell_common
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 40af16a102bb..57343a55111e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -159,6 +159,14 @@ notrace unsigned int __check_irq_replay(void)
if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
return 0x900;
 
+   /*
+* In masked_handler() for PMI, we disable MSR[EE] and return.
+* When replaying it here, clear the pending PMI flag first.
+*/
+   local_paca->irq_happened &= ~PACA_IRQ_PMI;
+   if (happened & PACA_IRQ_PMI)
+   return 0xf00;
+
/* Finally check if an external interrupt happened */
local_paca->irq_happened &= ~PACA_IRQ_EE;
if (happened & PACA_IRQ_EE)
@@ -235,7 +243,7 @@ notrace void arch_local_irq_restore(unsigned long en)
 */
if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
__hard_irq_disable();
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_IRQ_DEBUG_SUPPORT
else {
/*
 * We should already be hard disabled here. We had bugs
@@ -246,7 +254,7 @@ notrace void arch_local_irq_restore(unsigned long en)
if (WARN_ON(mfmsr() & MSR_EE))
__hard_irq_disable();
}
-#endif /* CONFIG_TRACE_IRQFLAGS */
+#endif /* CONFIG_IRQ_DEBUG_SUPPORT */
 
set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
 
-- 
2.7.4



[RFC PATCH v3 10/12] powerpc: Add support to mask perf interrupts

2016-08-25 Thread Madhavan Srinivasan
To support masking of PMI interrupts, a couple of new interrupt handler
macros are added: MASKABLE_EXCEPTION_PSERIES_OOL and
MASKABLE_RELON_EXCEPTION_PSERIES_OOL.

A new irq flag "PACA_IRQ_PMI" and a "SOFTEN_VALUE_0xf00" define are also
added, for use in the exception code to check for PMI interrupts.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 13 +
 arch/powerpc/include/asm/hw_irq.h|  1 +
 arch/powerpc/kernel/exceptions-64s.S |  4 ++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 41be0c2d7658..ca40b5c59869 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -427,6 +427,7 @@ label##_relon_hv:   
\
 #define SOFTEN_VALUE_0xe62 PACA_IRQ_HMI
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
 #define SOFTEN_VALUE_0xea2 PACA_IRQ_EE
+#define SOFTEN_VALUE_0xf00 PACA_IRQ_PMI
 
 #define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
@@ -462,6 +463,12 @@ label##_pSeries:   
\
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_STD, SOFTEN_TEST_PR, bitmask)
 
+#define MASKABLE_EXCEPTION_PSERIES_OOL(vec, label, bitmask)\
+   .globl label##_pSeries; \
+label##_pSeries:   \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec, bitmask); \
+   EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD);
+
 #define MASKABLE_EXCEPTION_HV(loc, vec, label, bitmask)
\
. = loc;\
.globl label##_hv;  \
@@ -490,6 +497,12 @@ label##_relon_pSeries: 
\
_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,   \
  EXC_STD, SOFTEN_NOTEST_PR, bitmask)
 
+#define MASKABLE_RELON_EXCEPTION_PSERIES_OOL(vec, label, bitmask)  \
+   .globl label##_relon_pSeries;   \
+label##_relon_pSeries: \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_PR, vec, bitmask);\
+   EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD);
+
 #define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label, bitmask)  \
. = loc;\
.globl label##_relon_hv;\
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index e457438c6fdf..415734c07cfa 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -26,6 +26,7 @@
 #define PACA_IRQ_DEC   0x08 /* Or FIT */
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
+#define PACA_IRQ_PMI   0x40
 
 /*
  * flags for paca->soft_enabled
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 6c56554cfcc3..bc2a838e1926 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -580,7 +580,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xea2)
 
/* moved from 0xf00 */
-   STD_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+   MASKABLE_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor, 
IRQ_DISABLE_MASK_PMU)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf00)
STD_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf20)
@@ -1126,7 +1126,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
MASKABLE_RELON_EXCEPTION_HV_OOL(0xea0, h_virt_irq,
IRQ_DISABLE_MASK_LINUX)
 
-   STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+   MASKABLE_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor, 
IRQ_DISABLE_MASK_PMU)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
-- 
2.7.4



[RFC PATCH v3 09/12] powerpc: Add "bitmask" parameter to MASKABLE_* macros

2016-08-25 Thread Madhavan Srinivasan
Make explicit the interrupt masking level supported by a
given interrupt handler. The patch correspondingly extends
the MASKABLE_* macros with an additional parameter. The
"bitmask" parameter is passed to the SOFTEN_TEST macro to
decide on masking the interrupt.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 62 
 arch/powerpc/kernel/exceptions-64s.S | 36 ---
 2 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 1eea4ab75607..41be0c2d7658 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -179,9 +179,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
  * checking of the interrupt maskable level in the SOFTEN_TEST.
  * Intended to be used in MASKABLE_EXCPETION_* macros.
  */
-#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+#define __EXCEPTION_PROLOG_1(area, extra, vec, bitmask)
\
__EXCEPTION_PROLOG_1_PRE(area); \
-   extra(vec); \
+   extra(vec, bitmask);\
__EXCEPTION_PROLOG_1_POST(area);
 
 /*
@@ -428,79 +428,79 @@ label##_relon_hv: 
\
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
 #define SOFTEN_VALUE_0xea2 PACA_IRQ_EE
 
-#define __SOFTEN_TEST(h, vec)  \
+#define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
-   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX; \
+   andi.   r10,r10,bitmask;\
li  r10,SOFTEN_VALUE_##vec; \
bne masked_##h##interrupt
-#define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
+#define _SOFTEN_TEST(h, vec, bitmask)  __SOFTEN_TEST(h, vec, bitmask)
 
-#define SOFTEN_TEST_PR(vec)\
+#define SOFTEN_TEST_PR(vec, bitmask)   \
KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_STD, vec)
+   _SOFTEN_TEST(EXC_STD, vec, bitmask)
 
-#define SOFTEN_TEST_HV(vec)\
+#define SOFTEN_TEST_HV(vec, bitmask)   \
KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_HV, vec)
+   _SOFTEN_TEST(EXC_HV, vec, bitmask)
 
-#define SOFTEN_NOTEST_PR(vec)  _SOFTEN_TEST(EXC_STD, vec)
-#define SOFTEN_NOTEST_HV(vec)  _SOFTEN_TEST(EXC_HV, vec)
+#define SOFTEN_NOTEST_PR(vec, bitmask) _SOFTEN_TEST(EXC_STD, vec, 
bitmask)
+#define SOFTEN_NOTEST_HV(vec, bitmask) _SOFTEN_TEST(EXC_HV, vec, 
bitmask)
 
-#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
+#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)\
SET_SCRATCH0(r13);/* save r13 */\
EXCEPTION_PROLOG_0(PACA_EXGEN); \
-   __EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);   \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec, bitmask);  \
EXCEPTION_PROLOG_PSERIES_1(label##_common, h);
 
-#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)  \
-   __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
+#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask) \
+   __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)
 
-#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label)\
+#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label, bitmask)   \
. = loc;\
.globl label##_pSeries; \
 label##_pSeries:   \
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
-   EXC_STD, SOFTEN_TEST_PR)
+   EXC_STD, SOFTEN_TEST_PR, bitmask)
 
-#define MASKABLE_EXCEPTION_HV(loc, vec, label) \
+#define MASKABLE_EXCEPTION_HV(loc, vec, label, bitmask)
\
. = loc;\
.globl label##_hv;  \
 label##_hv:\
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
-   EXC_HV, SOFTEN_TEST_HV)
+   EXC_HV, SOFTEN_TEST_HV, bitmask)
 
-#define 

[RFC PATCH v3 08/12] powerpc: Introduce new mask bit for soft_enabled

2016-08-25 Thread Madhavan Srinivasan
Currently soft_enabled is used as a flag to determine
the interrupt state. This patch extends soft_enabled
to be used as a mask instead of a flag. Each MASKABLE_*
macro will carry an additional "bitmask" parameter to
specify the interrupt masking level.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 4 ++--
 arch/powerpc/include/asm/hw_irq.h| 1 +
 arch/powerpc/include/asm/irqflags.h  | 4 ++--
 arch/powerpc/kernel/entry_64.S   | 4 ++--
 arch/powerpc/kernel/exceptions-64e.S | 6 +++---
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index dd3253bd0d8e..1eea4ab75607 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -430,9 +430,9 @@ label##_relon_hv:   
\
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; 
\
+   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX; \
li  r10,SOFTEN_VALUE_##vec; \
-   beq masked_##h##interrupt
+   bne masked_##h##interrupt
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
 
 #define SOFTEN_TEST_PR(vec)\
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index c19169ac1fbb..e457438c6fdf 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
  */
 #define IRQ_DISABLE_MASK_NONE  0
 #define IRQ_DISABLE_MASK_LINUX 1
+#define IRQ_DISABLE_MASK_PMU   2
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/include/asm/irqflags.h 
b/arch/powerpc/include/asm/irqflags.h
index d0ed2a7d7d10..9ff09747a226 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -48,11 +48,11 @@
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACASOFTIRQEN(r13);\
lbz __rB,PACAIRQHAPPENED(r13);  \
-   cmpwi   cr0,__rA,IRQ_DISABLE_MASK_LINUX;\
+   andi.   __rA,__rA,IRQ_DISABLE_MASK_LINUX;\
li  __rA,IRQ_DISABLE_MASK_LINUX;\
ori __rB,__rB,PACA_IRQ_HARD_DIS;\
stb __rB,PACAIRQHAPPENED(r13);  \
-   beq 44f;\
+   bne 44f;\
stb __rA,PACASOFTIRQEN(r13);\
TRACE_DISABLE_INTS; \
 44:
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index b50d79e5bfbc..99bab5c65734 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -764,8 +764,8 @@ restore:
 */
ld  r5,SOFTE(r1)
lbz r6,PACASOFTIRQEN(r13)
-   cmpwi   cr0,r5,IRQ_DISABLE_MASK_LINUX
-   beq restore_irq_off
+   andi.   r5,r5,IRQ_DISABLE_MASK_LINUX
+   bne restore_irq_off
 
/* We are enabling, were we already enabled ? Yes, just return */
cmpwi   cr0,r6,IRQ_DISABLE_MASK_NONE
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 5c628b5696f6..8e40df2c2f30 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -212,8 +212,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
/* Interrupts had better not already be enabled... */
twnei   r6,IRQ_DISABLE_MASK_LINUX
 
-   cmpwi   cr0,r5,IRQ_DISABLE_MASK_LINUX
-   beq 1f
+   andi.   r5,r5,IRQ_DISABLE_MASK_LINUX
+   bne 1f
 
TRACE_ENABLE_INTS
stb r5,PACASOFTIRQEN(r13)
@@ -352,7 +352,7 @@ ret_from_mc_except:
 
 #define PROLOG_ADDITION_MASKABLE_GEN(n)
\
lbz r10,PACASOFTIRQEN(r13); /* are irqs soft-disabled ? */  \
-   cmpwi   cr0,r10,IRQ_DISABLE_MASK_LINUX;/* yes -> go out of line */ \
+   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX;/* yes -> go out of line */ \
beq masked_interrupt_book3e_##n
 
 #define PROLOG_ADDITION_2REGS_GEN(n)   \
-- 
2.7.4



[RFC PATCH v3 01/12] powerpc: Add #defs for paca->soft_enabled flags

2016-08-25 Thread Madhavan Srinivasan
Two #defines, IRQ_DISABLE_MASK_NONE and IRQ_DISABLE_MASK_LINUX,
are added to be used when updating paca->soft_enabled.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index c7d82ff62a33..df5def1f635a 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -27,6 +27,12 @@
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
 
+/*
+ * flags for paca->soft_enabled
+ */
+#define IRQ_DISABLE_MASK_NONE  1
+#define IRQ_DISABLE_MASK_LINUX 0
+
 #endif /* CONFIG_PPC64 */
 
 #ifndef __ASSEMBLY__
-- 
2.7.4



[RFC PATCH v3 06/12] powerpc: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*

2016-08-25 Thread Madhavan Srinivasan
Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1
in the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_*
use only __EXCEPTION_PROLOG_1. There is no logic change.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 38272fe8a757..75e262466b85 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -450,7 +450,7 @@ label##_hv: 
\
 #define MASKABLE_EXCEPTION_HV_OOL(vec, label)  \
.globl label##_hv;  \
 label##_hv:\
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);\
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);  \
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
 
 #define __MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra)   \
@@ -478,7 +478,7 @@ label##_relon_hv:   
\
 #define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label)\
.globl label##_relon_hv;\
 label##_relon_hv:  \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);  \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);
\
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
 
 /*
-- 
2.7.4



[RFC PATCH v3 02/12] powerpc: Cleanup to use IRQ_DISABLE_MASK_* macros for paca->soft_enabled update

2016-08-25 Thread Madhavan Srinivasan
Replace the hardcoded values used when updating
paca->soft_enabled with the IRQ_DISABLE_MASK_* #defines.
No logic change.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h |  2 +-
 arch/powerpc/include/asm/hw_irq.h| 15 ---
 arch/powerpc/include/asm/irqflags.h  |  6 +++---
 arch/powerpc/include/asm/kvm_ppc.h   |  2 +-
 arch/powerpc/kernel/entry_64.S   | 16 
 arch/powerpc/kernel/exceptions-64e.S |  6 +++---
 arch/powerpc/kernel/head_64.S|  5 +++--
 arch/powerpc/kernel/idle_book3e.S|  2 +-
 arch/powerpc/kernel/idle_power4.S|  3 ++-
 arch/powerpc/kernel/irq.c|  9 +
 arch/powerpc/kernel/process.c|  3 ++-
 arch/powerpc/kernel/setup_64.c   |  3 +++
 arch/powerpc/kernel/time.c   |  2 +-
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/perf/core-book3s.c  |  2 +-
 15 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index bed66e5743b3..38272fe8a757 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -408,7 +408,7 @@ label##_relon_hv:   
\
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,0;  \
+   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; 
\
li  r10,SOFTEN_VALUE_##vec; \
beq masked_##h##interrupt
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index df5def1f635a..1fcc2fd7275a 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -64,9 +64,10 @@ static inline unsigned long arch_local_irq_disable(void)
unsigned long flags, zero;
 
asm volatile(
-   "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
+   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
: "=r" (flags), "=&r" (zero)
-   : "i" (offsetof(struct paca_struct, soft_enabled))
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "i" (IRQ_DISABLE_MASK_LINUX)
: "memory");
 
return flags;
@@ -76,7 +77,7 @@ extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
 {
-   arch_local_irq_restore(1);
+   arch_local_irq_restore(IRQ_DISABLE_MASK_NONE);
 }
 
 static inline unsigned long arch_local_irq_save(void)
@@ -86,7 +87,7 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
 {
-   return flags == 0;
+   return flags == IRQ_DISABLE_MASK_LINUX;
 }
 
 static inline bool arch_irqs_disabled(void)
@@ -106,9 +107,9 @@ static inline bool arch_irqs_disabled(void)
u8 _was_enabled;\
__hard_irq_disable();   \
_was_enabled = local_paca->soft_enabled;\
-   local_paca->soft_enabled = 0;   \
+   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;\
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;  \
-   if (_was_enabled)   \
+   if (_was_enabled == IRQ_DISABLE_MASK_NONE)  \
trace_hardirqs_off();   \
 } while(0)
 
@@ -131,7 +132,7 @@ static inline void may_hard_irq_enable(void)
 
 static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 {
-   return !regs->softe;
+   return (regs->softe == IRQ_DISABLE_MASK_LINUX);
 }
 
 extern bool prep_irq_for_idle(void);
diff --git a/arch/powerpc/include/asm/irqflags.h 
b/arch/powerpc/include/asm/irqflags.h
index f2149066fe5d..d0ed2a7d7d10 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -48,8 +48,8 @@
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACASOFTIRQEN(r13);\
lbz __rB,PACAIRQHAPPENED(r13);  \
-   cmpwi   cr0,__rA,0; \
-   li  __rA,0; \
+   cmpwi   cr0,__rA,IRQ_DISABLE_MASK_LINUX;\
+   li  __rA,IRQ_DISABLE_MASK_LINUX;\
ori __rB,__rB,PACA_IRQ_HARD_DIS;\
stb __rB,PACAIRQHAPPENED(r13);  \
beq 44f;\
@@ -63,7 +63,7 @@
 
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACAIRQHAPPENED(r13);  \
-   li  __rB,0; \
+   li  __rB,IRQ_DISABLE_MASK_LINUX;\
ori __rA,__rA,PACA_IRQ_HARD_DIS;\

[RFC PATCH v3 00/12] powerpc: "paca->soft_enabled" based local atomic operation implementation

2016-08-25 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for percpu variable updates. Local atomic operations only guarantee
variable-modification atomicity with respect to the CPU which owns the
data, and they need to be executed in a preemption-safe way.

Here is the design of the patchset. Since local_* operations only need
to be atomic with respect to interrupts (IIUC), we have two options:
either replay the "op" if interrupted, or replay the interrupt after
the "op". The initial patchset posted was based on implementing local_*
operations using CR5, which replays the "op". That patchset had issues
with rewinding the address pointer from an array, which made the slow
path really slow. Since the CR5-based implementation proposed using
__ex_table to find the rewind address, this raised concerns about the
size of __ex_table and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

But this patchset uses Benjamin Herrenschmidt's suggestion of using
arch_local_irq_disable() to soft-disable interrupts (including PMIs).
After finishing the "op", arch_local_irq_restore() is called, and any
interrupts that occurred in between are replayed.

The current paca->soft_enabled logic is reversed, and the
MASKABLE_EXCEPTION_* macros are extended to support this feature.

The patchset rewrites the current local_* functions to use
arch_local_irq_disable(). The base flow for each function is:

 {
soft_irq_set_mask()
load
..
store
arch_local_irq_restore()
 }

The reason for this approach is that the l[w/d]arx/st[w/d]cx.
instruction pair currently used for local_* operations is heavy
on cycle count, and these instructions don't have a local variant.
To see whether the new implementation helps, a modified version of
Rusty's benchmark code for local_t was used:

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
 - Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t         Without Patch   With Patch

_inc            28              8
_add            28              8
_read           3               3
_add_return     28              7

Currently only asm/local.h has been rewritten, and the entire
change is tested only on PPC64 (pseries guest) and a PPC64 LE host.

The first four are cleanup patches which lay the foundation to make
things easier. The fifth patch in the patchset reverses the current
soft_enabled logic, and its commit message details the reason and need
for this change. The sixth and seventh patches refactor the
__EXCEPTION_PROLOG_1 code to support the addition of a new parameter
to the MASKABLE_* macros. The new parameter gives the possible mask
for the interrupt. The rest of the patches add support for maskable
PMIs and the implementation of local_t using arch_local_irq_*().

Since the patchset is experimental, the changes are focused on the
pseries and powernv platforms only. I would really like to hear
comments on this approach before extending it to other powerpc
platforms.

Changelog RFC v2:
1)Renamed IRQ_DISABLE_LEVEL_* to IRQ_DISABLE_MASK_* and made logic changes
  to treat soft_enabled as a mask and not a flag or level.
2)Added a new Kconfig variable to support a WARN_ON
3)Refactored the patchset for easier review.
4)Made changes to commit messages.
5)Made changes for BOOK3E version

Changelog RFC v1:

1)Commit messages are improved.
2)Renamed the arch_local_irq_disable_var to soft_irq_set_level as suggested
3)Renamed the LAZY_INTERRUPT* macro to IRQ_DISABLE_LEVEL_* as suggested
4)Extended the MASKABLE_EXCEPTION* macros to support additional parameter.
5)Each MASKABLE_EXCEPTION_* macro will carry a "mask_level"
6)Logic to decide on jump to maskable_handler in SOFTEN_TEST is now based on
  "mask_level"
7)__EXCEPTION_PROLOG_1 is factored out to support "mask_level" parameter.
  This reduced the code changes needed for supporting "mask_level" parameters.

Madhavan Srinivasan (12):
  powerpc: Add #defs for paca->soft_enabled flags
  powerpc: Cleanup to use IRQ_DISABLE_MASK_* macros for
paca->soft_enabled update
  powerpc: move set_soft_enabled()
  powerpc: Use set_soft_enabled api to update paca->soft_enabled
  powerpc: reverse the soft_enable logic
  powerpc: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*
  powerpc: Add new _EXCEPTION_PROLOG_1 macro
  powerpc: Introduce new mask bit for soft_enabled
  powerpc: Add "bitmask" parameter to MASKABLE_* macros
  powerpc: Add support to mask perf interrupts
  powerpc: Support to replay PMIs
  powerpc: rewrite local_t using soft_irq

 arch/powerpc/Kconfig |   4 ++
 arch/powerpc/include/asm/exception-64s.h | 103 +--
 arch/powerpc/include/asm/hw_irq.h|  46 +++---
 arch/powerpc/include/asm/irqflags.h  |   8 +--
 arch/powerpc/include/asm/kvm_ppc.h   |   2 +-
 arch/powerpc/include/asm/local.h |  91 ++-
 arch/powerpc/kernel/entry_64.S   |  24