Re: 32-bit powerpc, aty128fb: vmap allocation for size 135168 failed

2017-08-18 Thread Meelis Roos
> Meelis Roos  writes:
> 
> > I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G 
> > RAM and after some time of sddm respawning and X trying to restart, 
> > dmesg is full of messages about vmap allocation failures.
> 
> Did it just start happening? ie. did rc4 work?

It goes back to at least 4.0 - that's the oldest kernel I had lying
around precompiled. The messages about ROM signature changed somewhere
between 4.0 and 4.7 (4.7 is already like 4.13) but after some time, the
same vmalloc errors appear.

Maybe the userspace has changed with more respawning that brings the 
problem out.

I tried to read the code but I do not understand it yet. The warning 
seems to come from generic pci_map_rom() checking ROM size, and 
returning rom pointer to aty128fb (it returns resource size too but that 
is ignored). aty128fb starts to look at the x86 PCI ROM signature again 
but does not tell that the signature is missing. How come?

-- 
Meelis Roos (mr...@linux.ee)


Re: [RFC v7 26/25] mm/mprotect, powerpc/mm/pkeys, x86/mm/pkeys: Add sysfs interface

2017-08-18 Thread Thiago Jung Bauermann

Ram Pai  writes:

> On Fri, Aug 11, 2017 at 02:34:43PM -0300, Thiago Jung Bauermann wrote:
>> Expose useful information for programs using memory protection keys.
>> Provide implementation for powerpc and x86.
>> 
>> On a powerpc system with pkeys support, here is what is shown:
>> 
>> $ head /sys/kernel/mm/protection_keys/*
>> ==> /sys/kernel/mm/protection_keys/disable_execute_supported <==
>> true
>
> We should not just call out disable_execute_supported.
> disable_access_supported and disable_write_supported should also 
> be called out.

Ok, will do in the next version.

>> ==> /sys/kernel/mm/protection_keys/total_keys <==
>> 32
>> 
>
>> ==> /sys/kernel/mm/protection_keys/usable_keys <==
>> 30
>
> This is a little nebulous.  It depends on how we define
> 'usable'.  Is it the number of keys that are available
> to the app?  If that is the case, that value is dynamic.
> Sometimes the OS steals one key for the execute-only key.
> And anything that is dynamic can be inherently racy.
> So I think we should define 'usable' as the guaranteed number
> of keys available to the app

Yes, that is how I defined it: the difference between the number of keys
provided by the platform and the keys reserved by the OS. I do need to
spell it out somewhere inside Documentation/ though.

> and display a value that is one less than what is available.
>
> in the above example the value should be 29.

Good point, I didn't account for the execute-only key. I will make that
change in the next version.
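
For reference, the arithmetic discussed above can be sketched in C as
follows. This is only an illustration: the function name and its
parameters are hypothetical and do not come from the patch series. It
assumes 'usable' means the guaranteed number of keys, i.e. the platform
total minus the keys reserved by the OS, minus one more for the
execute-only key the OS may take later.

```c
/*
 * Hypothetical helper, not taken from the patch series: returns the
 * number of pkeys an application is guaranteed to be able to allocate.
 */
static int pkeys_guaranteed_usable(int total_keys, int os_reserved_keys)
{
	/* Keys left once the OS-reserved ones are taken out. */
	int usable = total_keys - os_reserved_keys;

	/* The OS may later steal one more key for the execute-only pkey. */
	usable -= 1;

	/* Never report a negative count. */
	return usable < 0 ? 0 : usable;
}
```

With the numbers from the example output (32 total keys, 2 reserved by
the OS), this gives 29 rather than 30.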

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-18 Thread Ram Pai
On Fri, Aug 18, 2017 at 10:04:10AM -0700, Ram Pai wrote:
> On Fri, Aug 18, 2017 at 02:48:31PM +1000, Michael Ellerman wrote:
> > Ram Pai  writes:
> > > On Fri, Aug 11, 2017 at 08:26:30PM +1000, Michael Ellerman wrote:
> > >> Thiago Jung Bauermann  writes:
> > >> 
> > >> > Ram Pai  writes:
> > >> >
> > >> >> The value of the AMR register at the time of exception
> > >> >> is made available in gp_regs[PT_AMR] of the siginfo.
> > ...
> > >> 
> > >> I don't understand why we are putting it in there at all?
> > >> 
> > >> Is there some special handling of the actual register on signals? I
> > >> haven't seen it. In which case the process can get the value of AMR by
> > >> reading the register. ??
> > >
> > > The value of AMR register at the time of the key-exception may not be
> > > the same when the signal handler is invoked. 
> > 
> > Why not?
> 
> Assume two threads of a task.
> 
> T1:  mprotect_key(foo, PAGE_SIZE, pkey=4);
> T1:  set AMR to disable access for pkey 4;
> T1:  key fault
> T2: set AMR to enable access to pkey 4;
> T1:  fault handler called.
>   This fault handler will see the new AMR and not the
>   one at the time of the fault.

Ok. Ben debunked my above reason. So at this point I don't have a solid
reason to defend my statement --
"The value of AMR register at the time of the key-exception may not be
the same when the signal handler is invoked."

Coming back to your main question, "why we need to provide the
contents of AMR register to the signal handler?" --   the only reason
I can see is, probably tools like gdb and ptrace may find it useful.

And since it was suggested that content of IAMR is also useful to the
application, the value of which cannot be accessed from userspace,
it may make sense to provide both the contents.

Please suggest.
RP



Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-18 Thread Ram Pai
On Sat, Aug 19, 2017 at 07:54:20AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2017-08-18 at 10:04 -0700, Ram Pai wrote:
> > Assume two threads of a task.
> > 
> > T1:  mprotect_key(foo, PAGE_SIZE, pkey=4);
> > T1:  set AMR to disable access for pkey 4;
> > T1:  key fault
> > T2: set AMR to enable access to pkey 4;
> > T1:  fault handler called.
> > This fault handler will see the new AMR and not the
> > one at the time of the fault.
> 
> You aren't context switching AMR with the threads ? Ugh... something is
> very wrong then.

I do store and restore AMR across context switch. So never mind; the
above problem cannot happen.
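
To illustrate why, here is a small user-space model in C of the T1/T2
sequence. It is a sketch under stated assumptions, not kernel code: a
single global stands in for the AMR register, the structure and function
names are made up for the illustration, and the bit layout (two bits per
key, low bits first) is simplified and is not the real AMR layout.

```c
#include <stdint.h>

/* Simplified model: one global stands in for the CPU's AMR register. */
static uint64_t hw_amr;

/* Hypothetical per-thread state: AMR value saved while off-CPU. */
struct thread_model {
	uint64_t saved_amr;
};

/* Two access-disable bits per key; this layout is illustrative only. */
static uint64_t amr_disable_bits(int pkey)
{
	return (uint64_t)0x3 << (2 * pkey);
}

/* Save the outgoing thread's AMR, then load the incoming thread's AMR. */
static void switch_to_model(struct thread_model *prev,
			    struct thread_model *next)
{
	prev->saved_amr = hw_amr;
	hw_amr = next->saved_amr;
}

/* Replays the T1/T2 sequence; returns the AMR seen by T1's handler. */
static uint64_t replay_t1_t2(void)
{
	struct thread_model t1 = { 0 }, t2 = { 0 };

	hw_amr = t1.saved_amr;			/* T1 is running */
	hw_amr |= amr_disable_bits(4);		/* T1: disable access for pkey 4 */
	/* T1 takes the key fault here, then T2 gets the CPU. */
	switch_to_model(&t1, &t2);
	hw_amr &= ~amr_disable_bits(4);		/* T2: enable access to pkey 4 */
	switch_to_model(&t2, &t1);		/* back to T1 for its handler */
	return hw_amr;
}
```

In this model T1's handler still sees pkey 4 disabled, because T2's
write only changes T2's own copy of the register.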

RP



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche
On Fri, 2017-08-18 at 16:57 -0500, Brian King wrote:
> To add to my analysis above, #9 should not be there... It looks like
> jiffies_at_alloc would also be getting reinitialized in this case, resulting in
> a perpetual retry, which is what I was seeing.

Hello Brian,

Some time ago I noticed that jiffies_at_alloc is indeed set while a command
is being prepared instead of at command allocation time. I think that
behavior was introduced in 2005 through commit b21a41385118 ("[SCSI] add
global timeout to the scsi mid-layer"). At that time SCSI commands were
allocated at prep time and freed at unprep time. Recently that has been
changed such that a SCSI command (struct scsi_cmnd) has the same lifetime as
struct request. In other words, it was not possible in 2005 but it is
possible today to set jiffies_at_alloc at command allocation time instead of
when a command is being prepared. Do you want me to submit a patch that
implements this change?

Bart.



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King
On 08/18/2017 04:41 PM, Bart Van Assche wrote:
> On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
>> I think I have an understanding of what is going on and why Bart's patch is causing problems for ipr.
>> I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix
>> in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck
>> in ipr. The issue is we are retrying failed commands until we finally run out of time. This is
>> what I see:
>>
>> 1. sd_revalidate_disk calls scsi_report_opcode
>> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
>> 3. ipr returns the command with DID_ERROR
>> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
>> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
>> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
>> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
>> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>>    quite a long time...
>> 9. Finally, the command has been retried so long we trip over the overall retry timer
>>    in scsi_softirq_done and we time out the command.
>>
>> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
>> the retry counter in the scsi command, as this will likely cause issues with other drivers.
> 
> Hello Brian,
> 
> Thanks for the detailed analysis. This is very helpful. Have you considered
> to change the ipr driver such that it terminates REPORT SUPPORTED OPERATION
> CODES commands with the appropriate check condition code instead of DID_ERROR?

Yes. That data is actually in the sense buffer, but since I'm also setting DID_ERROR,
scsi_decide_disposition isn't using it. I've got a patch to do just as you suggest,
to stop setting DID_ERROR when there is more detailed error data available,
but it will need some additional testing before I submit, as it will impact much
more than just this case.

To add to my analysis above, #9 should not be there... It looks like
jiffies_at_alloc would also be getting reinitialized in this case, resulting in
a perpetual retry, which is what I was seeing.
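
The effect can be shown with a small stand-alone model in C. This is a
sketch only, not the SCSI mid-layer code: the structure, the function
names and the one-tick-per-attempt clock are all made up for the
illustration. It assumes that prep both zeroes the retry counter and
resets jiffies_at_alloc, and that the command fails on every attempt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few scsi_cmnd fields that matter here. */
struct cmd_model {
	int retries;			/* retries done so far */
	int allowed;			/* retries permitted */
	unsigned long jiffies_at_alloc;	/* start of the overall retry timer */
};

/* Model of prep as described above: both fields are reinitialized. */
static void prep_model(struct cmd_model *cmd, unsigned long now)
{
	cmd->retries = 0;
	cmd->jiffies_at_alloc = now;
}

/*
 * Counts how often a command that always fails is issued before it is
 * given up. max_issues only bounds the simulation so that it terminates.
 */
static int count_issues(bool reprep_on_requeue, int allowed,
			unsigned long wait_for, int max_issues)
{
	struct cmd_model cmd = { .allowed = allowed };
	unsigned long now = 0;
	int issues = 0;

	prep_model(&cmd, now);
	while (issues < max_issues) {
		issues++;
		now++;		/* each failed attempt costs one tick */

		/* Overall retry timer expired: give up. */
		if (now - cmd.jiffies_at_alloc > wait_for)
			break;
		/* Retry budget used up: give up. */
		if (++cmd.retries > cmd.allowed)
			break;
		/* Requeue; with the change this goes through prep again. */
		if (reprep_on_requeue)
			prep_model(&cmd, now);
	}
	return issues;
}
```

Without the re-prep the command is issued allowed + 1 times and then
given up. With the re-prep neither limit is ever reached, so the model
only stops at the max_issues bound.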

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-18 Thread Benjamin Herrenschmidt
On Fri, 2017-08-18 at 10:04 -0700, Ram Pai wrote:
> Assume two threads of a task.
> 
> T1:  mprotect_key(foo, PAGE_SIZE, pkey=4);
> T1:  set AMR to disable access for pkey 4;
> T1:  key fault
> T2: set AMR to enable access to pkey 4;
> T1:  fault handler called.
> This fault handler will see the new AMR and not the
> one at the time of the fault.

You aren't context switching AMR with the threads ? Ugh... something is
very wrong then.

Ben.




Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche
On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I have an understanding of what is going on and why Bart's patch is causing problems for ipr.
> I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix
> in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck
> in ipr. The issue is we are retrying failed commands until we finally run out of time. This is
> what I see:
> 
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>    quite a long time...
> 9. Finally, the command has been retried so long we trip over the overall retry timer
>    in scsi_softirq_done and we time out the command.
> 
> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
> the retry counter in the scsi command, as this will likely cause issues with other drivers.

Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered
to change the ipr driver such that it terminates REPORT SUPPORTED OPERATION
CODES commands with the appropriate check condition code instead of DID_ERROR?

Thanks,

Bart.

[PATCH] ipr: Set no_report_opcodes for RAID arrays

2017-08-18 Thread Brian King
Since ipr RAID arrays do not support the MAINTENANCE_IN /
MI_REPORT_SUPPORTED_OPERATION_CODES, set no_report_opcodes
to prevent it from being sent.

Signed-off-by: Brian King 
---

Index: linux-2.6.git/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/ipr.c
+++ linux-2.6.git/drivers/scsi/ipr.c
@@ -4935,6 +4935,7 @@ static int ipr_slave_configure(struct sc
 	}
 	if (ipr_is_vset_device(res)) {
 		sdev->scsi_level = SCSI_SPC_3;
+		sdev->no_report_opcodes = 1;
 		blk_queue_rq_timeout(sdev->request_queue,
 				     IPR_VSET_RW_TIMEOUT);
 		blk_queue_max_hw_sectors(sdev->request_queue, IPR_VSET_MAX_SECTORS);



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King
On 08/17/2017 10:52 AM, Bart Van Assche wrote:
> On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
>> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
>>> On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
>>>> As of next-20170809, linux-next on powerpc boot hung with below trace
>>>> message.
>>>>
>>>> [ ... ]
>>>>
>>>> A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
>>>> Always unprepare ...) in the merge branch 'scsi/for-next'
>>>>
>>>> System booted fine when the below commit is reverted:
>>>>
>>>> commit 270065e92c317845d69095ec8e3d18616b5b39d5
>>>> Author: Bart Van Assche
>>>> Date:   Thu Aug 3 14:40:14 2017 -0700
>>>>
>>>>     scsi: scsi-mq: Always unprepare before requeuing a request
>>>
>>> Hello Brian and Michael,
>>>
>>> Do you agree that this probably indicates a bug in the PowerPC block driver
>>> that is used to access the boot disk? Anyway, since a solution is not yet
>>> available, I will submit a revert for this patch.
>>
>> I've been looking at this a bit, and can recreate the issue, but haven't
>> got to root cause of the issue as of yet. If I do a sysrq-w while the system is hung
>> during boot I see this:
>>
>> [   25.561523] Workqueue: events_unbound async_run_entry_fn
>> [   25.561527] Call Trace:
>> [   25.561529] [c001697873f0] [c00169701600] 0xc00169701600 (unreliable)
>> [   25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
>> [   25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
>> [   25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
>> [   25.561548] [c00169787720] [c0921e40] schedule_timeout+0x200/0x430
>> [   25.561553] [c00169787810] [c091db10] io_schedule_timeout+0x30/0x70
>> [   25.561558] [c00169787840] [c091e978] wait_for_common_io.constprop.3+0x178/0x280
>> [   25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
>> [   25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
>> [   25.561572] [c00169787990] [c060d29c] scsi_report_opcode+0xbc/0x170
>> [   25.561577] [c00169787a50] [d4fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
>> [   25.561583] [c00169787b80] [d4fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
>> [   25.561588] [c00169787c00] [c010fc44] async_run_entry_fn+0x74/0x210
>> [   25.561593] [c00169787c90] [c0102f48] process_one_work+0x198/0x480
>> [   25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
>> [   25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
>> [   25.561608] [c00169787e30] [c000b3a4] ret_from_kernel_thread+0x5c/0xb8
>>
>> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID arrays don't support
>> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting sdev->no_report_opcodes = 1
>> in ipr's slave configure. This seems to eliminate the boot hang for me, but is only working around
>> the issue. Since this command is not supported by ipr, it should return with an illegal request.
>> When I'm hung at this point, there is nothing outstanding to the adapter / driver. I'll continue
>> debugging...
> 
> (+linux-scsi)
> 
> Hello Brian,
> 
> Is kernel debugging enabled on your test system? Is lockdep enabled?
> Anyway, stack traces like the above usually mean that a request got stuck in
> a block or scsi driver (ipr in this case). Information about pending requests,
> including the SCSI CDB, is available under /sys/kernel/debug/block (see also
> commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

I think I have an understanding of what is going on and why Bart's patch is causing problems for ipr.
I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix
in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck
in ipr. The issue is we are retrying failed commands until we finally run out of time. This is
what I see:

1. sd_revalidate_disk calls scsi_report_opcode
2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
3. ipr returns the command with DID_ERROR
4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
   since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
   quite a long time...
9. Finally, the command has been retried so long we trip over the overall retry timer
   in scsi_soft

Re: [PATCH] tpm: vtpm: constify vio_device_id

2017-08-18 Thread Jason Gunthorpe
On Fri, Aug 18, 2017 at 09:32:46PM +1000, Michael Ellerman wrote:

> >>  drivers/char/tpm/tpm_ibmvtpm.c | 2 +-
> 
> Who merges changes for this driver? I assume it's Jarkko?

Yes

Jason


Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-18 Thread Ram Pai
On Fri, Aug 18, 2017 at 02:48:31PM +1000, Michael Ellerman wrote:
> Ram Pai  writes:
> > On Fri, Aug 11, 2017 at 08:26:30PM +1000, Michael Ellerman wrote:
> >> Thiago Jung Bauermann  writes:
> >> 
> >> > Ram Pai  writes:
> >> >
> >> >> The value of the AMR register at the time of exception
> >> >> is made available in gp_regs[PT_AMR] of the siginfo.
> ...
> >> 
> >> I don't understand why we are putting it in there at all?
> >> 
> >> Is there some special handling of the actual register on signals? I
> >> haven't seen it. In which case the process can get the value of AMR by
> >> reading the register. ??
> >
> > The value of AMR register at the time of the key-exception may not be
> > the same when the signal handler is invoked. 
> 
> Why not?

Assume two threads of a task.

T1:  mprotect_key(foo, PAGE_SIZE, pkey=4);
T1:  set AMR to disable access for pkey 4;
T1:  key fault
T2: set AMR to enable access to pkey 4;
T1:  fault handler called.
This fault handler will see the new AMR and not the
one at the time of the fault.

RP



Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-18 Thread Ram Pai
On Fri, Aug 18, 2017 at 12:26:33PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael Ellerman  writes:
> 
> > Ram Pai  writes:
> >> On Thu, Aug 17, 2017 at 05:30:27PM -0300, Thiago Jung Bauermann wrote:
> >>> Ram Pai  writes:
> >>> > On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
> >>> >> Ram Pai  writes:
> >>> >> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct *mm)
> >>> >> >  	mm->context.execute_only_pkey = -1;
> >>> >> >  }
> >>> >> >
> >>> >> > +static inline void pkey_mmu_values(int total_data, int total_execute)
> >>> >> > +{
> >>> >> > +	/*
> >>> >> > +	 * since any pkey can be used for data or execute, we
> >>> >> > +	 * will  just  treat all keys as equal and track them
> >>> >> > +	 * as one entity.
> >>> >> > +	 */
> >>> >> > +	pkeys_total = total_data + total_execute;
> >>> >> > +}
> >>> >> 
> >>> >> Right now this works because the firmware reports 0 execute keys in the
> >>> >> device tree, but if (when?) it is fixed to report 32 execute keys as
> >>> >> well as 32 data keys (which are the same keys), any place using
> >>> >> pkeys_total expecting it to mean the number of keys that are available
> >>> >> will be broken. This includes pkey_initialize and mm_pkey_is_allocated.
> >>> >
> >>> > Good point. we should just ignore total_execute. It should
> >>> > be the same value as total_data on the latest platforms.
> >>> > On older platforms it will continue to be zero.
> >>> 
> >>> Indeed. There should just be a special case to disable execute
> >>> protection for P7.
> >>
> >> Ok. we should disable execute protection for P7 and earlier generations of CPU.
> >
> > You should do what the device tree says you can do.
> >
> > If it says there are no execute keys then you shouldn't touch the IAMR.
> 
> The downside of that approach is that the device tree in P8 LPARs
> currently says there are no execute keys even though there are. We'd
> have to require customers to upgrade their firmware to a fixed version
> if they want to use execute keys.

Correct. The device tree for this property currently does not correctly
capture the number of execute keys.

On skiboot based systems, there is no device tree property to refer to
as well. Thiago has a patch to fix it, but existing systems without the
skiboot fix will not expose that property.

So unfortunately we will have to rely on multiple pieces of information
to enable the pkey system in the kernel.
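
As a rough sketch of the two points above (no double counting, and more
than one source of information for execute keys), consider the C
fragment below. All three function names and the
cpu_known_to_have_execute_keys flag are hypothetical; nothing here is
taken from the patch series, and the actual policy is still under
discussion in this thread.

```c
#include <stdbool.h>

/* What the quoted pkey_mmu_values() does today: sum both counts. */
static int pkeys_total_summed(int total_data, int total_execute)
{
	return total_data + total_execute;
}

/* Alternative: data and execute keys are the same keys, count once. */
static int pkeys_total_data_only(int total_data, int total_execute)
{
	(void)total_execute;	/* deliberately ignored */
	return total_data;
}

/*
 * Execute keys are enabled if the device tree reports some, or if some
 * other (hypothetical) platform check says the CPU has them even though
 * the firmware reports zero.
 */
static bool execute_keys_enabled(int total_execute,
				 bool cpu_known_to_have_execute_keys)
{
	return total_execute > 0 || cpu_known_to_have_execute_keys;
}
```

With firmware that reports 32 data keys and 32 execute keys, the summed
variant yields 64 while only 32 distinct keys exist. The data-only
variant yields 32 for both the current and the fixed firmware.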


RP

> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center

-- 
Ram Pai



Re: [PATCH 7/7] powerpc: use helper functions to get and set hash slots

2017-08-18 Thread Ram Pai
On Fri, Aug 18, 2017 at 10:18:31PM +1000, Michael Ellerman wrote:
> Ram Pai  writes:
> 
> > replace redundant code in __hash_page_64K(), __hash_page_huge(),
> > __hash_page_4K(), __hash_page_4K() and flush_hash_page()   with
> > helper functions pte_get_hash_gslot() and   pte_set_hash_slot()
> 
> This seems out of order.
> 
> At least some of these are patching or even entirely replacing code you
> just added.
> 
> > diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
> > index 5964b6d..e6dcd50 100644
> > --- a/arch/powerpc/mm/hugetlbpage-hash64.c
> > +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
> > @@ -112,18 +103,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> > return -1;
> > }
> >  
> > -#ifdef CONFIG_PPC_64K_PAGES
> > -   /*
> > -* Insert slot number & secondary bit in PTE second half.
> > -*/
> > -   hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> > -   rpte.hidx &= ~(0xfUL);
> > -   *hidxp = rpte.hidx  | (slot & 0xfUL);
> > -   /*
> > -* check __real_pte for details on matching smp_rmb()
> > -*/
> > -   smp_wmb();
> > -#endif /* CONFIG_PPC_64K_PAGES */
> > +   new_pte |= pte_set_hash_slot(ptep, rpte, 0, slot);
> > }
> 
> Here for example. That entire chunk was just added in patch 2.

Had it that way in my earlier patch series. But others found it
difficult to review its correctness. So had to have the code inline
first, followed by the modularization later; in this patch series.
Looks like you prefer it the earlier way. Will do in the next series.

RP



Re: [PATCH kernel] PCI: Disable IOV before pcibios_sriov_disable()

2017-08-18 Thread Bjorn Helgaas
On Fri, Aug 18, 2017 at 08:05:42AM +1000, Alexey Kardashevskiy wrote:
> On 11/08/17 18:19, Alexey Kardashevskiy wrote:
> > From: Gavin Shan 
> > 
> > The PowerNV platform is the only user of pcibios_sriov_disable().
> > The IOV BAR could be shifted by pci_iov_update_resource(). The
> > warning message in the function is printed if the IOV capability
> > is in enabled (PCI_SRIOV_CTRL_VFE && PCI_SRIOV_CTRL_MSE) state.
> > 
> > This is the backtrace of what is happening:
> >pci_disable_sriov
> >sriov_disable
> >pnv_pci_sriov_disable
> >pnv_pci_vf_resource_shift
> >pci_update_resource
> >pci_iov_update_resource
> > 
> > This fixes the issue by disabling IOV capability before calling
> > pcibios_sriov_disable(). With it, the disabling path matches
> > the enabling path: pcibios_sriov_enable() is called before the
> > IOV capability is enabled.
> > 
> > Cc: shan.ga...@gmail.com
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul Mackerras 
> > Reported-by: Carol L Soto 
> > Signed-off-by: Gavin Shan 
> > Tested-by: Carol L Soto 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > 
> > This is a repost. Since Gavin left the team, I am trying to push it out.
> > The previous conversation is here: https://patchwork.ozlabs.org/patch/732653/
> > 
> > Two questions were raised then. I'll try to comment on this below.
> 
> Bjorn, ping? Thanks.

Thanks for the reminder.  This is in patchwork, so it's on my to-do
list.

My last response in the thread above was:

  I'm not going to merge this without a comment in
  pnv_pci_vf_resource_shift() that addresses the two questions I
  raised in my very first response.  I don't think the existing
  comment about "After doing so, there would be a 'hole'" is
  sufficient.  If it were sufficient, I wouldn't have raised the
  questions in the first place.

The problem here is that I'm looking for a comment *in the code*, and
you and Gavin are giving responses and clarifications in email.

What we need to do is transfer this email information into something
useful when reading the code, i.e., a comment in the code.

> >> 1) "res" is already in the resource tree, so we shouldn't be changing
> >>   its start address, because that may make the tree inconsistent,
> >>   e.g., the resource may no longer be completely contained in its
> >>   parent, it may conflict with a sibling, etc.
> > 
> > We should not, yes. But...
> > 
> > At boot time the IOV BAR gets as much MMIO space as it can possibly use.
> > (Embarrassingly I cannot trace where this is coming from, 8GB is selected
> > via pci_assign_unassigned_root_bus_resources() path somehow).
> > For example, it is 256*32MB=8GB where 256 is the maximum number of PEs and 32MB
> > is a PF/VF BAR size. Whatever shifting we do afterwards, the boundaries of
> > that 8GB area do not change and we test it in pnv_pci_vf_resource_shift():
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/powernv/pci-ioda.c#n987
> > 
> >> 2) If we update "res->start", shouldn't we update "res->end"
> >>   correspondingly?
> > 
> > We have to update the PF's IOV BAR address as we allocate PEs dynamically
> > and we do not know in advance where our VF numbers start in that
> > 8GB window. So we change the IOV BAR start. Changing the end may make it
> > look more like there is a free area to use but in reality it won't be
> > usable, just like the area we "release" by shifting the start address.
> > 
> > We could probably move that M64 MMIO window by the same delta in
> > opposite direction so the IOV BAR start address would remain the same
> > but its VF#0 would be mapped to let's say PF#5. I am just afraid there
> > is an alignment requirement for these M64 window start address; and this
> > would be even more tricky to manage.
> > 
> > We could also create reserved areas for the amount of space "released" by
> > moving the start address, not sure how though.
> > 
> > So how do we proceed with this particular patch now? Thanks.
> > ---
> >  drivers/pci/iov.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 120485d6f352..ac41c8be9200 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -331,7 +331,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> > while (i--)
> > pci_iov_remove_virtfn(dev, i, 0);
> >  
> > -   pcibios_sriov_disable(dev);
> >  err_pcibios:
> > iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> > pci_cfg_access_lock(dev);
> > @@ -339,6 +338,8 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> > ssleep(1);
> > pci_cfg_access_unlock(dev);
> >  
> > +   pcibios_sriov_disable(dev);
> > +
> > if (iov->link != dev->devfn)
> > sysfs_remove_link(&dev->dev.kobj, "dep_link");
> >  
> > @@ -357,14 +358,14 @@ static void sriov_disable(struct pci_dev *dev)
> > for (i = 0; i < iov->num_VFs; i++)
> > 

Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-18 Thread Thiago Jung Bauermann

Michael Ellerman  writes:

> Ram Pai  writes:
>> On Thu, Aug 17, 2017 at 05:30:27PM -0300, Thiago Jung Bauermann wrote:
>>> Ram Pai  writes:
>>> > On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
>>> >> Ram Pai  writes:
>>> >> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct *mm)
>>> >> >  	mm->context.execute_only_pkey = -1;
>>> >> >  }
>>> >> >
>>> >> > +static inline void pkey_mmu_values(int total_data, int total_execute)
>>> >> > +{
>>> >> > +  /*
>>> >> > +   * since any pkey can be used for data or execute, we
>>> >> > +   * will  just  treat all keys as equal and track them
>>> >> > +   * as one entity.
>>> >> > +   */
>>> >> > +  pkeys_total = total_data + total_execute;
>>> >> > +}
>>> >> 
>>> >> Right now this works because the firmware reports 0 execute keys in the
>>> >> device tree, but if (when?) it is fixed to report 32 execute keys as
>>> >> well as 32 data keys (which are the same keys), any place using
>>> >> pkeys_total expecting it to mean the number of keys that are available
>>> >> will be broken. This includes pkey_initialize and mm_pkey_is_allocated.
>>> >
>>> > Good point. we should just ignore total_execute. It should
>>> > be the same value as total_data on the latest platforms.
>>> > On older platforms it will continue to be zero.
>>> 
>>> Indeed. There should just be a special case to disable execute
>>> protection for P7.
>>
>> Ok. we should disable execute protection for P7 and earlier generations of CPU.
>
> You should do what the device tree says you can do.
>
> If it says there are no execute keys then you shouldn't touch the IAMR.

The downside of that approach is that the device tree in P8 LPARs
currently says there are no execute keys even though there are. We'd
have to require customers to upgrade their firmware to a fixed version
if they want to use execute keys.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v7 1/4] powerpc/fadump: reduce memory consumption for capture kernel

2017-08-18 Thread Hari Bathini



On Friday 18 August 2017 05:27 PM, Michal Suchánek wrote:

On Fri, 18 Aug 2017 16:20:53 +0530
Hari Bathini  wrote:


Hi Michal,


Thanks for the patches. I tried testing with the patches:

[0.00] fadump: Firmware-assisted dump is active.
[0.00] fadump: Modifying command line to enforce the
additional parameters passed through 'fadump_extra_args='
[0.00] fadump: Original command line:
BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root
ro crashkernel=2048M fadump=on fadump_reserve_mem=1024M
"fadump_extra_args=nr_cpus=1 numa=off udev.childern-max=2"
rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap
[0.00] fadump: Modified command line:
BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root
ro crashkernel=2048M fadump=on fadump_reserve_mem=1024M
"fadump_extra_args nr_cpus=1 numa=off udev.childern-max=2"
rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap

Looks like the quotes are retained not enforcing the parameters...

Hello,

You are passing an argument >>"fadump_extra_args<<  - that is an
argument containing a quote in the name. I did not test this scenario


Actually, this was not intentional..
I passed fadump_extra_args="nr_cpus=1 numa=off udev.childern-max=2" 
through grub loader
but chosen/bootargs ended up having "fadump_extra_args=nr_cpus=1 
numa=off udev.childern-max=2".

Need to check why that is..

Thanks
Hari


assuming the argument name would not match in this case. It would
probably not match if the quote was in the middle of the argument
name but at the start it is skipped. Note that due to the requirement
to remove quotes symmetrically which is added in the third patch this
case does not break the commandline - it merely makes the arguments
ineffective.

The format suggested in the documentation is

>>fadump_extra_args="nr_cpus=1 numa=off udev.childern-max=2"<< - that
is quotes around the value. This format worked in my testing.
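
As an illustration of the documented format only, here is a stand-alone
C sketch that accepts quotes around the value and strips them
symmetrically. It is not the code from the patch and it does not
reproduce parse_args(): the function name is made up, and unlike
parse_args() it simply rejects a quote in front of the argument name
instead of skipping it.

```c
#include <stddef.h>
#include <string.h>

#define EXTRA_ARGS_PREFIX "fadump_extra_args="

/*
 * Hypothetical helper: if param starts with fadump_extra_args=, return
 * a pointer to the value inside param and store its length in *len.
 * One pair of surrounding double quotes is stripped. Returns NULL when
 * the name does not match.
 */
static const char *extra_args_value(const char *param, size_t *len)
{
	size_t plen = strlen(EXTRA_ARGS_PREFIX);
	const char *val;
	size_t n;

	/* A quote in front of the name means the name does not match. */
	if (strncmp(param, EXTRA_ARGS_PREFIX, plen) != 0)
		return NULL;

	val = param + plen;
	n = strlen(val);

	/* Strip quotes only when they are present on both ends. */
	if (n >= 2 && val[0] == '"' && val[n - 1] == '"') {
		val++;
		n -= 2;
	}

	*len = n;
	return val;
}
```

The returned pointer refers into the original string, so the caller
must use the length rather than rely on a terminating NUL.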

Unfortunately, the format with quote before argument name would
probably require another extra parameter to the callback to detect
properly.

Thanks

Michal


I am yet to test the patches in other scenarios though..


Thanks

Hari


On Friday 18 August 2017 01:44 AM, Michal Suchanek wrote:

From: Hari Bathini 

With fadump (dump capture) kernel booting like a regular kernel, it
needs almost the same amount of memory to boot as the production
kernel, which is unwarranted for a dump capture kernel. But with no
option to disable some of the unnecessary subsystems in fadump
kernel, that much memory is wasted on fadump, depriving the
production kernel of that memory.

Introduce kernel parameter 'fadump_extra_args=' that would take
regular parameters as a space separated quoted string, to be
enforced when fadump is active. This 'fadump_extra_args=' parameter
can be leveraged to pass parameters like nr_cpus=1,
cgroup_disable=memory and numa=off, to disable unwarranted
resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed
early in the boot process to put the subsequent fadump messages in
context.

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
Signed-off-by: Michal Suchanek 
---
Changes from v6:
Correct and simplify quote handling. Ideally I would like to extend
parse_args to give the length of the original quoted value to
callback. However, parse_args removes at most one doubel-quote from
the start and one from the end so that is easy to detect. Otherwise
all other users will have to be updated to trash the new argument.
---
   arch/powerpc/include/asm/fadump.h |   2 +
   arch/powerpc/kernel/fadump.c  | 109
--
arch/powerpc/kernel/prom.c|   7 +++ 3 files changed, 115
insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h
b/arch/powerpc/include/asm/fadump.h index
ce88bbe1d809..98ae00943fb3 100644 ---
a/arch/powerpc/include/asm/fadump.h +++
b/arch/powerpc/include/asm/fadump.h @@ -208,11 +208,13 @@ extern
int early_init_dt_scan_fw_dump(unsigned long node, const char
*uname, int depth, void *data); extern int fadump_reserve_mem(void);
   extern int setup_fadump(void);
+extern void enforce_fadump_extra_args(char *cmdline);
   extern int is_fadump_active(void);
   extern void crash_fadump(struct pt_regs *, const char *);
   extern void fadump_cleanup(void);

   #else/* CONFIG_FA_DUMP */
+static inline void enforce_fadump_extra_args(char *cmdline) { }
   static inline int is_fadump_active(void) { return 0; }
   static inline void crash_fadump(struct pt_regs *regs, const char
*str) { } #endif
diff --git a/arch/powerpc/kernel/fadump.c
b/arch/powerpc/kernel/fadump.c index dc0c49cfd90a..a1614d9b8a21
100644 --- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned
long node,
 * dump data waiting for us.
 */
fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump",
NULL);
-   if (fdm_active)
+   if (fdm_active)

Re: [PATCH] powerpc: powernv: Fix build error on const discarding

2017-08-18 Thread Corentin Labbe
On Thu, Aug 17, 2017 at 10:52:11PM +1000, Michael Ellerman wrote:
> Corentin Labbe  writes:
> 
> > When building a random powerpc kernel I hit this build error:
> >   CC  arch/powerpc/platforms/powernv/opal-imc.o
> > arch/powerpc/platforms/powernv/opal-imc.c: In function « 
> > disable_nest_pmu_counters »:
> > arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment 
> > discards « const » qualifier from pointer target type 
> > [-Werror=discarded-qualifiers]
> >l_cpumask = cpumask_of_node(nid);
> >  ^
> > This patch simply adds const to l_cpumask to fix this issue.
> 
> Thanks. I'm not sure why we haven't seen that.
> 
> Do you mind attaching your .config ?
> 
> cheers

Yes

Regards
#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.13.0-rc5 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_GENERIC_CPU=y
# CONFIG_CELL_CPU is not set
# CONFIG_POWER4_CPU is not set
# CONFIG_POWER5_CPU is not set
# CONFIG_POWER6_CPU is not set
# CONFIG_POWER7_CPU is not set
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_FORCE_SMP=y
CONFIG_SMP=y
CONFIG_NR_CPUS=4
CONFIG_PPC_DOORBELL=y
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
# CONFIG_CPU_LITTLE_ENDIAN is not set
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=33
CONFIG_ARCH_MMAP_RND_BITS_MIN=18
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_LOCKBREAK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=0
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_UDBG_16550=y
CONFIG_GENERIC_TBSYNC=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_EPAPR_BOOT=y
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
# CONFIG_PPC_OF_PLATFORM_PCI is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PPC_EMULATE_SSTEP=y
CONFIG_ZONE_DMA32=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_XZ is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_ARCH_HAS_TICK_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=19
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CPU

Re: [PATCH 2/2] kvm/xive: Add missing barriers and document them

2017-08-18 Thread Guilherme G. Piccoli
On 08/17/2017 11:10 PM, Benjamin Herrenschmidt wrote:
> This adds missing memory barriers to order updates/tests of
> the virtual CPPR and MFRR, thus fixing a lost IPI problem.
> 
> While at it also document all barriers in this file
> 
> This fixes a bug causing guest IPIs to occasionally get lost.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Thanks Ben. Shouldn't this be marked to stable (v4.12+)?
Also, if a Fixes tag is required:

Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE
interrupt controller").

Feel free to add my:
Tested-by: Guilherme G. Piccoli 

Cheers,


Guilherme

> ---
>  arch/powerpc/kvm/book3s_xive_template.c | 57 
> +++--
>  1 file changed, 55 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive_template.c 
> b/arch/powerpc/kvm/book3s_xive_template.c
> index 150be86b1018..d1ed2c41b5d2 100644
> --- a/arch/powerpc/kvm/book3s_xive_template.c
> +++ b/arch/powerpc/kvm/book3s_xive_template.c
> @@ -17,6 +17,12 @@ static void GLUE(X_PFX,ack_pending)(struct 
> kvmppc_xive_vcpu *xc)
>   u16 ack;
> 
>   /*
> +  * Ensure any previous store to CPPR is ordered vs.
> +  * the subsequent loads from PIPR or ACK.
> +  */
> + eieio();
> +
> + /*
>* DD1 bug workaround: If PIPR is less favored than CPPR
>* ignore the interrupt or we might incorrectly lose an IPB
>* bit.
> @@ -244,6 +250,11 @@ static u32 GLUE(X_PFX,scan_interrupts)(struct 
> kvmppc_xive_vcpu *xc,
>   /*
>* If we found an interrupt, adjust what the guest CPPR should
>* be as if we had just fetched that interrupt from HW.
> +  *
> +  * Note: This can only make xc->cppr smaller as the previous
> +  * loop will only exit with hirq != 0 if prio is lower than
> +  * the current xc->cppr. Thus we don't need to re-check xc->mfrr
> +  * for pending IPIs.
>*/
>   if (hirq)
>   xc->cppr = prio;
> @@ -390,6 +401,12 @@ X_STATIC int GLUE(X_PFX,h_cppr)(struct kvm_vcpu *vcpu, 
> unsigned long cppr)
>   xc->cppr = cppr;
> 
>   /*
> +  * Order the above update of xc->cppr with the subsequent
> +  * read of xc->mfrr inside push_pending_to_hw()
> +  */
> + smp_mb();
> +
> + /*
>* We are masking less, we need to look for pending things
>* to deliver and set VP pending bits accordingly to trigger
>* a new interrupt otherwise we might miss MFRR changes for
> @@ -429,21 +446,37 @@ X_STATIC int GLUE(X_PFX,h_eoi)(struct kvm_vcpu *vcpu, 
> unsigned long xirr)
>* used to signal MFRR changes is EOId when fetched from
>* the queue.
>*/
> - if (irq == XICS_IPI || irq == 0)
> + if (irq == XICS_IPI || irq == 0) {
> + /*
> +  * This barrier orders the setting of xc->cppr vs.
> +  * subsquent test of xc->mfrr done inside
> +  * scan_interrupts and push_pending_to_hw
> +  */
> + smp_mb();
>   goto bail;
> + }
> 
>   /* Find interrupt source */
>   sb = kvmppc_xive_find_source(xive, irq, &src);
>   if (!sb) {
>   pr_devel(" source not found !\n");
>   rc = H_PARAMETER;
> + /* Same as above */
> + smp_mb();
>   goto bail;
>   }
>   state = &sb->irq_state[src];
>   kvmppc_xive_select_irq(state, &hw_num, &xd);
> 
>   state->in_eoi = true;
> - mb();
> +
> + /*
> +  * This barrier orders both setting of in_eoi above vs,
> +  * subsequent test of guest_priority, and the setting
> +  * of xc->cppr vs. subsquent test of xc->mfrr done inside
> +  * scan_interrupts and push_pending_to_hw
> +  */
> + smp_mb();
> 
>  again:
>   if (state->guest_priority == MASKED) {
> @@ -470,6 +503,14 @@ X_STATIC int GLUE(X_PFX,h_eoi)(struct kvm_vcpu *vcpu, 
> unsigned long xirr)
> 
>   }
> 
> + /*
> +  * This barrier orders the above guest_priority check
> +  * and spin_lock/unlock with clearing in_eoi below.
> +  *
> +  * It also has to be a full mb() as it must ensure
> +  * the MMIOs done in source_eoi() are completed before
> +  * state->in_eoi is visible.
> +  */
>   mb();
>   state->in_eoi = false;
>  bail:
> @@ -504,6 +545,18 @@ X_STATIC int GLUE(X_PFX,h_ipi)(struct kvm_vcpu *vcpu, 
> unsigned long server,
>   /* Locklessly write over MFRR */
>   xc->mfrr = mfrr;
> 
> + /*
> +  * The load of xc->cppr below and the subsequent MMIO store
> +  * to the IPI must happen after the above mfrr update is
> +  * globally visible so that:
> +  *
> +  * - Synchronize with another CPU doing an H_EOI or a H_CPPR
> +  *   updating xc->cppr then reading xc->mfrr.
> +  *
> +  * - The target of the IPI sees the xc->mfrr update
> +  */
> + mb();
> +
>   /* Shoot the IPI if most favored than target cppr */
>   if (mfrr < xc->cppr)
>
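
For readers following the barrier comments in the h_ipi and h_cppr hunks above: the requirement is the classic store-buffering pattern. One side stores mfrr and then loads cppr, the other stores cppr and then loads mfrr, and without a full barrier on both sides each can miss the other's store, which is how an IPI gets lost. The sketch below is a userspace model using C11 atomics, not the kernel code. The names struct vcpu_model, model_h_ipi() and model_h_cppr() are made up for illustration, and the seq_cst fence stands in for the kernel's full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields of kvmppc_xive_vcpu discussed above. */
struct vcpu_model {
	_Atomic unsigned char cppr;
	_Atomic unsigned char mfrr;
};

/* Sender side: publish mfrr, full fence, then look at the target's cppr. */
static bool model_h_ipi(struct vcpu_model *xc, unsigned char mfrr)
{
	assert(xc);
	atomic_store_explicit(&xc->mfrr, mfrr, memory_order_relaxed);
	/* Full fence: the mfrr store is ordered before the cppr load. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Numerically lower means more favored, so this decides "shoot the IPI". */
	return mfrr < atomic_load_explicit(&xc->cppr, memory_order_relaxed);
}

/* Target side: publish cppr, full fence, then re-check mfrr for pending IPIs. */
static bool model_h_cppr(struct vcpu_model *xc, unsigned char cppr)
{
	assert(xc);
	atomic_store_explicit(&xc->cppr, cppr, memory_order_relaxed);
	/* Full fence: the cppr store is ordered before the mfrr load. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&xc->mfrr, memory_order_relaxed) < cppr;
}
```

With both fences in place, when the two functions race on different CPUs at least one of them observes the other's store, so either the sender shoots the IPI or the target notices the pending mfrr. Dropping either fence allows both loads to return the old values.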

Re: [1/1] selftests/powerpc: Improve tm-resched-dscr

2017-08-18 Thread Michael Ellerman
On Thu, 2017-08-17 at 01:06:47 UTC, Sam Bobroff wrote:
> The tm-resched-dscr self test can, in some situations, run for
> several minutes before being successfully interrupted by the context
> switch it needs in order to perform the test. This often seems to
> occur when the test is being run in a virtual machine.
> 
> Improve the test by running it under eat_cpu() to guarantee
> contention for the CPU and increase the chance of a context switch.
> 
> In practice this seems to reduce the test time, in some cases, from
> more than two minutes to under a second.
> 
> Also remove the "progress dots" so that if the test does run for a
> long time, it doesn't produce large amounts of unnecessary output.
> 
> Signed-off-by: Sam Bobroff 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/99597ceda00b57396a176e08bd5e5c

cheers


Re: powerpc/perf: Fix usage of nest_imc_refc

2017-08-18 Thread Michael Ellerman
On Wed, 2017-08-16 at 16:21:34 UTC, Madhavan Srinivasan wrote:
> nest_imc_refc is a reference count struct,
> used to track number of active perf sessions
> using the nest units.
> 
> It is preferred to access nest_imc_refc using
> per-cpu pointer 'local_nest_imc_refc'. Since,
> nest_imc_refc is not initialized using node_id
> as array index. Patch to fix the same.
> 
> Fixes: 885dcd709ba91 ('powerpc/perf: Add nest IMC PMU support')
> Reported-by: Dan Carpenter 
> Signed-off-by: Madhavan Srinivasan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/711bd207a233141308d0aea0d2e286

cheers


Re: [2/5] powerpc: Fix missing CR before {

2017-08-18 Thread Michael Ellerman
On Wed, 2017-08-16 at 06:01:15 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Patches 2-5 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6a303833b5e3acbb4c97cc11cc6886

cheers


Re: [2/3] block/ps3vram: Check return of ps3vram_cache_init

2017-08-18 Thread Michael Ellerman
On Mon, 2017-08-07 at 20:09:20 UTC, Geoff Levand wrote:
> Cc: Markus Elfring 
> Cc: Jim Paris 
> Cc: Jens Axboe 
> Signed-off-by: Geoff Levand 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/00e7c259e9c44f414ead5fc9bb3c45

cheers


Re: [1/3] block/ps3vram: Delete an error message for a failed memory allocation in ps3vram_cache_init()

2017-08-18 Thread Michael Ellerman
On Mon, 2017-08-07 at 20:09:20 UTC, Geoff Levand wrote:
> From: Markus Elfring 
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Link: 
> http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
> 
> Signed-off-by: Markus Elfring 
> Cc: Jim Paris 
> Cc: Jens Axboe 
> Signed-off-by: Geoff Levand 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fd1335e048a961ef63f7da1a0c8f39

cheers


Re: powerpc: add const to bin_attribute structures

2017-08-18 Thread Michael Ellerman
On Wed, 2017-08-02 at 18:07:38 UTC, Bhumika Goyal wrote:
> Declare bin_attribute structures as const as they are only passed as an
> argument to the function sysfs_create_bin_file. This argument is of
> type const, so declare the structure as const.
> 
> Signed-off-by: Bhumika Goyal 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8bfa42ab84910841336218265fcee9

cheers


Re: [v4, 1/3] mm/hugetlb: Allow arch to override and call the weak function

2017-08-18 Thread Michael Ellerman
On Fri, 2017-07-28 at 05:01:25 UTC, "Aneesh Kumar K.V" wrote:
> When running in guest mode ppc64 supports a different mechanism for hugetlb
> allocation/reservation. The LPAR management application called HMC can
> be used to reserve a set of hugepages and we pass the details of
> reserved pages via device tree to the guest. (more details in
> htab_dt_scan_hugepage_blocks()) . We do the memblock_reserve of the range
> and later in the boot sequence, we add the reserved range to huge_boot_pages.
> 
> But to enable 16G hugetlb on baremetal config (when we are not running as 
> guest)
> we want to do memblock reservation during boot. Generic code already does this
> 
> Signed-off-by: Aneesh Kumar K.V 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e24a1307ba1f99fc62a0bd61d5e87f

cheers


Re: [05/11] powerpc/topology: Remove the unused parent_node() macro

2017-08-18 Thread Michael Ellerman
On Wed, 2017-07-26 at 13:34:30 UTC, Dou Liyang wrote:
> Commit a7be6e5a7f8d ("mm: drop useless local parameters of
> __register_one_node()") removes the last user of parent_node().
> 
> The parent_node() macro in POWERPC platform is unnecessary.
> 
> Remove it for cleanup.
> 
> Reported-by: Michael Ellerman 
> Signed-off-by: Dou Liyang 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Acked-by: Michael Ellerman 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7baebe54a64af44dc34d0bd51462f0

cheers


Re: [1/2] powerpc: string: implement optimized memset variants

2017-08-18 Thread Michael Ellerman
On Mon, 2017-03-27 at 19:37:40 UTC, "Naveen N. Rao" wrote:
> Based on Matthew Wilcox's patches for other architectures.
> 
> Signed-off-by: Naveen N. Rao 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/694fc88ce271fd48f7939c032c1247

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-4.13-7 tag

2017-08-18 Thread Michael Ellerman
Hi Linus,

Please pull another powerpc fix for 4.13:

The following changes since commit 96ea91e7b6ee2c406598d859e7348b4829404eea:

  powerpc/watchdog: add locking around init/exit functions (2017-08-09 23:45:33 
+1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.13-7

for you to fetch changes up to 5a69aec945d27e78abac9fd032533d3aaebf7c1e:

  powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC 
(2017-08-16 19:35:54 +1000)


powerpc fixes for 4.13 #7

A bug in the VSX register saving that could cause userspace FP/VMX register
corruption. Never seen to happen (that we know of), was found by code
inspection, but still tagged for stable given the consequences.


Benjamin Herrenschmidt (1):
  powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC

 arch/powerpc/kernel/process.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)




Re: 32-bit powerpc, aty128fb: vmap allocation for size 135168 failed

2017-08-18 Thread Meelis Roos
> > I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G 
> > RAM and after some time of sddm respawning and X trying to restart, 
> > dmesg is full of messages about vmap allocation failures.
> 
> Did it just start happening? ie. did rc4 work?

No, rc4 was the first one I tried after 4.12 and it already had the 
problem. Not sure about 4.12 to be honest, will see some day.

-- 
Meelis Roos (mr...@linux.ee)


Re: 32-bit powerpc, aty128fb: vmap allocation for size 135168 failed

2017-08-18 Thread Michael Ellerman
Meelis Roos  writes:

> I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G 
> RAM and after some time of sddm respawning and X trying to restart, 
> dmesg is full of messages about vmap allocation failures.

Did it just start happening? ie. did rc4 work?

cheers


Re: [PATCH 7/7] powerpc: use helper functions to get and set hash slots

2017-08-18 Thread Michael Ellerman
Ram Pai  writes:

> replace redundant code in __hash_page_64K(), __hash_page_huge(),
> __hash_page_4K(), __hash_page_4K() and flush_hash_page()   with
> helper functions pte_get_hash_gslot() and   pte_set_hash_slot()

This seems out of order.

At least some of these are patching or even entirely replacing code you
just added.

> diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
> b/arch/powerpc/mm/hugetlbpage-hash64.c
> index 5964b6d..e6dcd50 100644
> --- a/arch/powerpc/mm/hugetlbpage-hash64.c
> +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
> @@ -112,18 +103,7 @@ int __hash_page_huge(unsigned long ea, unsigned long 
> access, unsigned long vsid,
>   return -1;
>   }
>  
> -#ifdef CONFIG_PPC_64K_PAGES
> - /*
> -  * Insert slot number & secondary bit in PTE second half.
> -  */
> - hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> - rpte.hidx &= ~(0xfUL);
> - *hidxp = rpte.hidx  | (slot & 0xfUL);
> - /*
> -  * check __real_pte for details on matching smp_rmb()
> -  */
> - smp_wmb();
> -#endif /* CONFIG_PPC_64K_PAGES */
> + new_pte |= pte_set_hash_slot(ptep, rpte, 0, slot);
>   }

Here for example. That entire chunk was just added in patch 2.

cheers


Re: [PATCH v7 1/4] powerpc/fadump: reduce memory consumption for capture kernel

2017-08-18 Thread Michal Suchánek
On Fri, 18 Aug 2017 16:20:53 +0530
Hari Bathini  wrote:

> Hi Michal,
> 
> 
> Thanks for the patches. I tried testing with the patches:
> 
> [0.00] fadump: Firmware-assisted dump is active.
> [0.00] fadump: Modifying command line to enforce the
> additional parameters passed through 'fadump_extra_args='
> [0.00] fadump: Original command line: 
> BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root
> ro crashkernel=2048M fadump=on fadump_reserve_mem=1024M 
> "fadump_extra_args=nr_cpus=1 numa=off udev.childern-max=2" 
> rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap
> [0.00] fadump: Modified command line: 
> BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root
> ro crashkernel=2048M fadump=on fadump_reserve_mem=1024M
> "fadump_extra_args nr_cpus=1 numa=off udev.childern-max=2"
> rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap
> 
> Looks like the quotes are retained not enforcing the parameters...

Hello,

You are passing an argument >>"fadump_extra_args<<  - that is an
argument containing a quote in the name. I did not test this scenario
assuming the argument name would not match in this case. It would
probably not match if the quote was in the middle of the argument
name but at the start it is skipped. Note that due to the requirement
to remove quotes symmetrically which is added in the third patch this
case does not break the commandline - it merely makes the arguments
ineffective.

The format suggested in the documentation is
>>fadump_extra_args="nr_cpus=1 numa=off udev.childern-max=2"<< - that
is quotes around the value. This format worked in my testing.

Unfortunately, the format with quote before argument name would
probably require another extra parameter to the callback to detect
properly.
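
To make the difference between the two formats concrete, here is a minimal userspace sketch. It is not the kernel's parse_args code: split_param() is a hypothetical helper that takes one already-isolated token, and it only models the two behaviours described above (a quote in front of the name is skipped, and quotes around the value are removed as a pair).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: split one command-line token into name and value.
 * The token is modified in place. *val is NULL when there is no '='.
 */
static void split_param(char *arg, char **name, char **val)
{
	size_t len;
	int quoted = 0;
	char *eq;

	assert(arg && name && val);

	/* A quote in front of the name is skipped, so the name still matches. */
	if (*arg == '"') {
		arg++;
		quoted = 1;
	}

	len = strlen(arg);
	*name = arg;
	*val = NULL;

	eq = strchr(arg, '=');
	if (eq) {
		*eq = '\0';
		*val = eq + 1;
		/* Quote at the start of the value: drop it and its partner. */
		if (**val == '"') {
			(*val)++;
			if (len > 0 && arg[len - 1] == '"')
				arg[len - 1] = '\0';
		}
	}

	/* Token was quoted as a whole: drop the closing quote, if still there. */
	if (quoted && len > 0 && arg[len - 1] == '"')
		arg[len - 1] = '\0';
}
```

In this model both spellings yield the same name and value. The only difference is where the quotes sat in the original string, and that is the information a callback does not get to see.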

Thanks

Michal

> 
> I am yet to test the patches in other scenarios though..
> 
> 
> Thanks
> 
> Hari
> 
> 
> On Friday 18 August 2017 01:44 AM, Michal Suchanek wrote:
> > From: Hari Bathini 
> >
> > With fadump (dump capture) kernel booting like a regular kernel, it
> > needs almost the same amount of memory to boot as the production
> > kernel, which is unwarranted for a dump capture kernel. But with no
> > option to disable some of the unnecessary subsystems in fadump
> > kernel, that much memory is wasted on fadump, depriving the
> > production kernel of that memory.
> >
> > Introduce kernel parameter 'fadump_extra_args=' that would take
> > regular parameters as a space separated quoted string, to be
> > enforced when fadump is active. This 'fadump_extra_args=' parameter
> > can be leveraged to pass parameters like nr_cpus=1,
> > cgroup_disable=memory and numa=off, to disable unwarranted
> > resources/subsystems.
> >
> > Also, ensure the log "Firmware-assisted dump is active" is printed
> > early in the boot process to put the subsequent fadump messages in
> > context.
> >
> > Suggested-by: Michael Ellerman 
> > Signed-off-by: Hari Bathini 
> > Signed-off-by: Michal Suchanek 
> > ---
> > Changes from v6:
> > Correct and simplify quote handling. Ideally I would like to extend
> > parse_args to give the length of the original quoted value to
> > callback. However, parse_args removes at most one double-quote from
> > the start and one from the end so that is easy to detect. Otherwise
> > all other users will have to be updated to trash the new argument.
> > ---
> >   arch/powerpc/include/asm/fadump.h |   2 +
> >   arch/powerpc/kernel/fadump.c  | 109
> > --
> > arch/powerpc/kernel/prom.c|   7 +++ 3 files changed, 115
> > insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/fadump.h
> > b/arch/powerpc/include/asm/fadump.h index
> > ce88bbe1d809..98ae00943fb3 100644 ---
> > a/arch/powerpc/include/asm/fadump.h +++
> > b/arch/powerpc/include/asm/fadump.h @@ -208,11 +208,13 @@ extern
> > int early_init_dt_scan_fw_dump(unsigned long node, const char
> > *uname, int depth, void *data); extern int fadump_reserve_mem(void);
> >   extern int setup_fadump(void);
> > +extern void enforce_fadump_extra_args(char *cmdline);
> >   extern int is_fadump_active(void);
> >   extern void crash_fadump(struct pt_regs *, const char *);
> >   extern void fadump_cleanup(void);
> >
> >   #else /* CONFIG_FA_DUMP */
> > +static inline void enforce_fadump_extra_args(char *cmdline) { }
> >   static inline int is_fadump_active(void) { return 0; }
> >   static inline void crash_fadump(struct pt_regs *regs, const char
> > *str) { } #endif
> > diff --git a/arch/powerpc/kernel/fadump.c
> > b/arch/powerpc/kernel/fadump.c index dc0c49cfd90a..a1614d9b8a21
> > 100644 --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned
> > long node,
> >  * dump data waiting for us.
> >  */
> > fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump",
> > NULL);
> > -   if (fdm_active)
> > +   if (fdm_active) {
> > +   pr_info("Firmware-as

Re: [PATCH] tpm: vtpm: constify vio_device_id

2017-08-18 Thread Michael Ellerman
Jason Gunthorpe  writes:

> On Thu, Aug 17, 2017 at 11:04:21PM +0530, Arvind Yadav wrote:
>> vio_device_id are not supposed to change at runtime. All functions
>> working with vio_device_id provided by  work with
>> const vio_device_id. So mark the non-const structs as const.
>> 
>> Signed-off-by: Arvind Yadav 
>
> Reviewed-by: Jason Gunthorpe 

Thanks Jason.

>>  drivers/char/tpm/tpm_ibmvtpm.c | 2 +-

Who merges changes for this driver? I assume it's Jarkko?

cheers


Re: [PATCH 5/5] Use __func__ instead of function name

2017-08-18 Thread Michael Ellerman
Michal Suchánek  writes:

> On 2017-07-29 09:24, SZ Lin wrote:
>> Fix following checkpatch.pl warning:
>> WARNING: Prefer using '"%s...", __func__' to using
>> the function's name, in a string
>> 
>> Signed-off-by: SZ Lin 
>> ---
>>  drivers/char/tpm/tpm_ibmvtpm.c | 12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c 
>> b/drivers/char/tpm/tpm_ibmvtpm.c
>> index e75a674b44ac..2d33acc43e25 100644
>> --- a/drivers/char/tpm/tpm_ibmvtpm.c
>> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
>> @@ -151,7 +151,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip,
>> u8 *buf, size_t count)
>>  rc = ibmvtpm_send_crq(ibmvtpm->vdev, be64_to_cpu(word[0]),
>>be64_to_cpu(word[1]));
>>  if (rc != H_SUCCESS) {
>> -dev_err(ibmvtpm->dev, "tpm_ibmvtpm_send failed rc=%d\n", rc);
>> +dev_err(ibmvtpm->dev, "%s failed rc=%d\n", __func__, rc);
>
> Can function name contain a %?

Let's hope not.

$ git grep "%s.*__func__" | wc -l
16937

cheers


Re: [PATCH v7 1/4] powerpc/fadump: reduce memory consumption for capture kernel

2017-08-18 Thread Hari Bathini

Hi Michal,


Thanks for the patches. I tried testing with the patches:

[0.00] fadump: Firmware-assisted dump is active.
[0.00] fadump: Modifying command line to enforce the additional 
parameters passed through 'fadump_extra_args='
[0.00] fadump: Original command line: 
BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root ro 
crashkernel=2048M fadump=on fadump_reserve_mem=1024M 
"fadump_extra_args=nr_cpus=1 numa=off udev.childern-max=2" 
rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap
[0.00] fadump: Modified command line: 
BOOT_IMAGE=/vmlinux-4.13.0-rc1-bz155783+ root=/dev/mapper/rhel-root ro 
crashkernel=2048M fadump=on fadump_reserve_mem=1024M "fadump_extra_args 
nr_cpus=1 numa=off udev.childern-max=2" rd.lvm.lv=rhel/root 
rd.lvm.lv=rhel/swap


Looks like the quotes are retained not enforcing the parameters...

I am yet to test the patches in other scenarios though..


Thanks

Hari


On Friday 18 August 2017 01:44 AM, Michal Suchanek wrote:

From: Hari Bathini 

With fadump (dump capture) kernel booting like a regular kernel, it needs
almost the same amount of memory to boot as the production kernel, which is
unwarranted for a dump capture kernel. But with no option to disable some
of the unnecessary subsystems in fadump kernel, that much memory is wasted
on fadump, depriving the production kernel of that memory.

Introduce kernel parameter 'fadump_extra_args=' that would take regular
parameters as a space separated quoted string, to be enforced when fadump
is active. This 'fadump_extra_args=' parameter can be leveraged to pass
parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable
unwarranted resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed early
in the boot process to put the subsequent fadump messages in context.

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
Signed-off-by: Michal Suchanek 
---
Changes from v6:
Correct and simplify quote handling. Ideally I would like to extend
parse_args to give the length of the original quoted value to callback.
However, parse_args removes at most one double-quote from the start and
one from the end so that is easy to detect. Otherwise all other users
will have to be updated to trash the new argument.
---
  arch/powerpc/include/asm/fadump.h |   2 +
  arch/powerpc/kernel/fadump.c  | 109 --
  arch/powerpc/kernel/prom.c|   7 +++
  3 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index ce88bbe1d809..98ae00943fb3 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,11 +208,13 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data);
  extern int fadump_reserve_mem(void);
  extern int setup_fadump(void);
+extern void enforce_fadump_extra_args(char *cmdline);
  extern int is_fadump_active(void);
  extern void crash_fadump(struct pt_regs *, const char *);
  extern void fadump_cleanup(void);

  #else /* CONFIG_FA_DUMP */
+static inline void enforce_fadump_extra_args(char *cmdline) { }
  static inline int is_fadump_active(void) { return 0; }
  static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
  #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index dc0c49cfd90a..a1614d9b8a21 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 * dump data waiting for us.
 */
fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-   if (fdm_active)
+   if (fdm_active) {
+   pr_info("Firmware-assisted dump is active.\n");
fw_dump.dump_active = 1;
+   }

/* Get the sizes required to store dump data for the firmware provided
 * dump sections.
@@ -332,8 +334,11 @@ int __init fadump_reserve_mem(void)
  {
unsigned long base, size, memory_boundary;

-   if (!fw_dump.fadump_enabled)
+   if (!fw_dump.fadump_enabled) {
+   if (fw_dump.dump_active)
+   pr_warn("Firmware-assisted dump was active but kernel booted 
with fadump disabled!\n");
return 0;
+   }

if (!fw_dump.fadump_supported) {
printk(KERN_INFO "Firmware-assisted dump is not supported on"
@@ -373,7 +378,6 @@ int __init fadump_reserve_mem(void)
memory_boundary = memblock_end_of_DRAM();

if (fw_dump.dump_active) {
-   printk(KERN_INFO "Firmware-assisted dump is active.\n");
/*
 * If last boot has crashed then reserve all the memory
 * above boot_memory_size so that we don't touch it until
@@ -460,6 +464,105 @@ static int __init early_fadump_reserve_mem(char *p)
  }
  early_param(

Re: [PATCH v2 4/8] powerpc/xive: introduce xive_esb_write()

2017-08-18 Thread David Gibson
On Fri, Aug 11, 2017 at 04:23:37PM +0200, Cédric Le Goater wrote:

Rationale in the commit message, maybe.

> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/sysdev/xive/common.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index 8a58662ed793..ac5f18a66742 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -203,6 +203,15 @@ static u8 xive_esb_read(struct xive_irq_data *xd, u32 
> offset)
>   return (u8)val;
>  }
>  
> +static void xive_esb_write(struct xive_irq_data *xd, u32 offset, u64 data)
> +{
> + /* Handle HW errata */
> + if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG)
> + offset |= offset << 4;
> +
> + out_be64(xd->eoi_mmio + offset, data);
> +}
> +
>  #ifdef CONFIG_XMON
>  static void xive_dump_eq(const char *name, struct xive_q *q)
>  {
> @@ -297,7 +306,7 @@ void xive_do_source_eoi(u32 hw_irq, struct xive_irq_data 
> *xd)
>  {
>   /* If the XIVE supports the new "store EOI facility, use it */
>   if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
> - out_be64(xd->eoi_mmio + XIVE_ESB_STORE_EOI, 0);
> + xive_esb_write(xd, XIVE_ESB_STORE_EOI, 0);
>   else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
>   /*
>* The FW told us to call it. This happens for some

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
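
For readers following the errata handling in the quoted xive_esb_write(), here is a minimal user-space sketch of what `offset |= offset << 4` does to an ESB offset. DEMO_FLAG_SHIFT_BUG and demo_esb_offset() are hypothetical stand-ins, not kernel symbols, and the offsets used when calling it are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for XIVE_IRQ_FLAG_SHIFT_BUG; the real value
 * is defined in the kernel headers and is not assumed here. */
#define DEMO_FLAG_SHIFT_BUG 0x1u

/* Mirror of the offset adjustment in the quoted patch: when the errata
 * flag is set, the offset is duplicated four bits higher. */
static uint32_t demo_esb_offset(uint32_t flags, uint32_t offset)
{
	if (flags & DEMO_FLAG_SHIFT_BUG)
		offset |= offset << 4;

	return offset;
}
```

With the flag set, an illustrative offset of 0x400 becomes 0x4400; without it the offset is passed through unchanged, which seems to be why the write helper can be shared by both kinds of hardware.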




Re: [PATCH v2 3/8] powerpc/xive: rename xive_poke_esb() in xive_esb_read()

2017-08-18 Thread David Gibson
On Fri, Aug 11, 2017 at 04:23:36PM +0200, Cédric Le Goater wrote:
> xive_poke_esb() is performing a load/read so it is better named as
> xive_esb_read() as we will need to introduce a xive_esb_write()
> routine. Also use the XIVE_ESB_LOAD_EOI offset when EOI'ing LSI
> interrupts.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
> 
>  Changes since v1:
> 
>  - fixed naming.
>  
>  arch/powerpc/sysdev/xive/common.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index 8774af7a4105..8a58662ed793 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -190,7 +190,7 @@ static u32 xive_scan_interrupts(struct xive_cpu *xc, bool 
> just_peek)
>   * This is used to perform the magic loads from an ESB
>   * described in xive.h
>   */
> -static u8 xive_poke_esb(struct xive_irq_data *xd, u32 offset)
> +static u8 xive_esb_read(struct xive_irq_data *xd, u32 offset)
>  {
>   u64 val;
>  
> @@ -227,7 +227,7 @@ void xmon_xive_do_dump(int cpu)
>   xive_dump_eq("IRQ", &xc->queue[xive_irq_priority]);
>  #ifdef CONFIG_SMP
>   {
> - u64 val = xive_poke_esb(&xc->ipi_data, XIVE_ESB_GET);
> + u64 val = xive_esb_read(&xc->ipi_data, XIVE_ESB_GET);
>   xmon_printf("  IPI state: %x:%c%c\n", xc->hw_ipi,
>   val & XIVE_ESB_VAL_P ? 'P' : 'p',
>   val & XIVE_ESB_VAL_P ? 'Q' : 'q');
> @@ -326,9 +326,9 @@ void xive_do_source_eoi(u32 hw_irq, struct xive_irq_data 
> *xd)
>* properly.
>*/
>   if (xd->flags & XIVE_IRQ_FLAG_LSI)
> - in_be64(xd->eoi_mmio);
> + xive_esb_read(xd, XIVE_ESB_LOAD_EOI);
>   else {
> - eoi_val = xive_poke_esb(xd, XIVE_ESB_SET_PQ_00);
> + eoi_val = xive_esb_read(xd, XIVE_ESB_SET_PQ_00);
>   DBG_VERBOSE("eoi_val=%x\n", offset, eoi_val);
>  
>   /* Re-trigger if needed */
> @@ -383,12 +383,12 @@ static void xive_do_source_set_mask(struct 
> xive_irq_data *xd,
>* ESB accordingly on unmask.
>*/
>   if (mask) {
> - val = xive_poke_esb(xd, XIVE_ESB_SET_PQ_01);
> + val = xive_esb_read(xd, XIVE_ESB_SET_PQ_01);
>   xd->saved_p = !!(val & XIVE_ESB_VAL_P);
>   } else if (xd->saved_p)
> - xive_poke_esb(xd, XIVE_ESB_SET_PQ_10);
> + xive_esb_read(xd, XIVE_ESB_SET_PQ_10);
>   else
> - xive_poke_esb(xd, XIVE_ESB_SET_PQ_00);
> + xive_esb_read(xd, XIVE_ESB_SET_PQ_00);
>  }
>  
>  /*
> @@ -768,7 +768,7 @@ static int xive_irq_retrigger(struct irq_data *d)
>* To perform a retrigger, we first set the PQ bits to
>* 11, then perform an EOI.
>*/
> - xive_poke_esb(xd, XIVE_ESB_SET_PQ_11);
> + xive_esb_read(xd, XIVE_ESB_SET_PQ_11);
>  
>   /*
>* Note: We pass "0" to the hw_irq argument in order to
> @@ -803,7 +803,7 @@ static int xive_irq_set_vcpu_affinity(struct irq_data *d, 
> void *state)
>   irqd_set_forwarded_to_vcpu(d);
>  
>   /* Set it to PQ=10 state to prevent further sends */
> - pq = xive_poke_esb(xd, XIVE_ESB_SET_PQ_10);
> + pq = xive_esb_read(xd, XIVE_ESB_SET_PQ_10);
>  
>   /* No target ? nothing to do */
>   if (xd->target == XIVE_INVALID_TARGET) {
> @@ -832,7 +832,7 @@ static int xive_irq_set_vcpu_affinity(struct irq_data *d, 
> void *state)
>* for sure the queue slot is no longer in use.
>*/
>   if (pq & 2) {
> - pq = xive_poke_esb(xd, XIVE_ESB_SET_PQ_11);
> + pq = xive_esb_read(xd, XIVE_ESB_SET_PQ_11);
>   xd->saved_p = true;
>  
>   /*

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH 6/6] dpaa_eth: check allocation result

2017-08-18 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ef30038..ff7f153 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2557,6 +2557,9 @@ static struct dpaa_bp *dpaa_bp_alloc(struct device *dev)
 
dpaa_bp->bpid = FSL_DPAA_BPID_INV;
dpaa_bp->percpu_count = devm_alloc_percpu(dev, *dpaa_bp->percpu_count);
+   if (!dpaa_bp->percpu_count)
+   return ERR_PTR(-ENOMEM);
+
dpaa_bp->config_count = FSL_DPAA_ETH_MAX_BUF_COUNT;
 
dpaa_bp->seed_cb = dpaa_bp_seed;
-- 
2.1.0



[PATCH 3/6] dpaa_eth: enable Rx hashing control

2017-08-18 Thread Madalin Bucur
Allow ethtool control of the Rx flow hashing. RSS is enabled by
default; this allows turning it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 113 +
 1 file changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index aad825088..965f652 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -399,6 +399,117 @@ static void dpaa_get_strings(struct net_device *net_dev, 
u32 stringset,
memcpy(strings, dpaa_stats_global, size);
 }
 
+static int dpaa_get_hash_opts(struct net_device *dev,
+ struct ethtool_rxnfc *cmd)
+{
+   cmd->data = 0;
+
+   switch (cmd->flow_type) {
+   case TCP_V4_FLOW:
+   case TCP_V6_FLOW:
+   case UDP_V4_FLOW:
+   case UDP_V6_FLOW:
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   /* Fall through */
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   case SCTP_V4_FLOW:
+   case SCTP_V6_FLOW:
+   case AH_ESP_V4_FLOW:
+   case AH_ESP_V6_FLOW:
+   case AH_V4_FLOW:
+   case AH_V6_FLOW:
+   case ESP_V4_FLOW:
+   case ESP_V6_FLOW:
+   cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+   break;
+   default:
+   cmd->data = 0;
+   break;
+   }
+
+   return 0;
+}
+
+static int dpaa_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
+ u32 *unused)
+{
+   int ret = -EOPNOTSUPP;
+
+   switch (cmd->cmd) {
+   case ETHTOOL_GRXFH:
+   ret = dpaa_get_hash_opts(dev, cmd);
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
+static void dpaa_set_hash(struct net_device *net_dev, bool enable)
+{
+   struct mac_device *mac_dev;
+   struct fman_port *rxport;
+   struct dpaa_priv *priv;
+
+   priv = netdev_priv(net_dev);
+   mac_dev = priv->mac_dev;
+   rxport = mac_dev->port[0];
+
+   fman_port_use_kg_hash(rxport, enable);
+}
+
+static int dpaa_set_hash_opts(struct net_device *dev,
+ struct ethtool_rxnfc *nfc)
+{
+   int ret = -EINVAL;
+
+   /* we support hashing on IPv4/v6 src/dest IP and L4 src/dest port */
+   if (nfc->data &
+   ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3))
+   return -EINVAL;
+
+   switch (nfc->flow_type) {
+   case TCP_V4_FLOW:
+   case TCP_V6_FLOW:
+   case UDP_V4_FLOW:
+   case UDP_V6_FLOW:
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   case SCTP_V4_FLOW:
+   case SCTP_V6_FLOW:
+   case AH_ESP_V4_FLOW:
+   case AH_ESP_V6_FLOW:
+   case AH_V4_FLOW:
+   case AH_V6_FLOW:
+   case ESP_V4_FLOW:
+   case ESP_V6_FLOW:
+   dpaa_set_hash(dev, !!nfc->data);
+   ret = 0;
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
+static int dpaa_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+   int ret = -EOPNOTSUPP;
+
+   switch (cmd->cmd) {
+   case ETHTOOL_SRXFH:
+   ret = dpaa_set_hash_opts(dev, cmd);
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
 const struct ethtool_ops dpaa_ethtool_ops = {
.get_drvinfo = dpaa_get_drvinfo,
.get_msglevel = dpaa_get_msglevel,
@@ -412,4 +523,6 @@ const struct ethtool_ops dpaa_ethtool_ops = {
.get_strings = dpaa_get_strings,
.get_link_ksettings = dpaa_get_link_ksettings,
.set_link_ksettings = dpaa_set_link_ksettings,
+   .get_rxnfc = dpaa_get_rxnfc,
+   .set_rxnfc = dpaa_set_rxnfc,
 };
-- 
2.1.0
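
As a reading aid for dpaa_set_hash_opts() in the patch above, the following is a small user-space sketch of the "reject unsupported bits, then enable if anything is set" check. The DEMO_RXH_* values and demo_check_hash_fields() are hypothetical stand-ins chosen for illustration; the real RXH_* flags come from the ethtool UAPI header.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RXH_* flags used by the patch. */
#define DEMO_RXH_IP_SRC   (1u << 4)
#define DEMO_RXH_IP_DST   (1u << 5)
#define DEMO_RXH_L4_B_0_1 (1u << 6)
#define DEMO_RXH_L4_B_2_3 (1u << 7)

/* Returns 0 and stores the on/off decision in *enable when only
 * supported fields are requested, or -EINVAL otherwise. */
static int demo_check_hash_fields(uint64_t data, int *enable)
{
	const uint64_t supported = DEMO_RXH_IP_SRC | DEMO_RXH_IP_DST |
				   DEMO_RXH_L4_B_0_1 | DEMO_RXH_L4_B_2_3;

	/* Any bit outside the supported set is an error. */
	if (data & ~supported)
		return -EINVAL;

	/* An empty field list turns hashing off, anything else on. */
	*enable = !!data;
	return 0;
}
```

This matches the behaviour the patch describes: an empty field set disables hashing, and any supported, non-empty combination enables it for the whole interface.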



[PATCH 5/6] Documentation: networking: add RSS information

2017-08-18 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 Documentation/networking/dpaa.txt | 68 ++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/dpaa.txt 
b/Documentation/networking/dpaa.txt
index 76e016d..f88194f 100644
--- a/Documentation/networking/dpaa.txt
+++ b/Documentation/networking/dpaa.txt
@@ -13,6 +13,7 @@ Contents
- Configuring DPAA Ethernet in your kernel
- DPAA Ethernet Frame Processing
- DPAA Ethernet Features
+   - DPAA IRQ Affinity and Receive Side Scaling
- Debugging
 
 DPAA Ethernet Overview
@@ -147,7 +148,10 @@ gradually.
 
 The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
 checksum offload feature is enabled by default and cannot be controlled through
-ethtool.
+ethtool. Also, rx-flow-hash and rx-hashing were added. The addition of RSS
+provides a big performance boost for the forwarding scenarios, allowing
+different traffic flows received by one interface to be processed by different
+CPUs in parallel.
 
 The driver has support for multiple prioritized Tx traffic classes. Priorities
 range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
@@ -166,6 +170,68 @@ classes as follows:
 tc qdisc add dev  root handle 1: \
 mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
 
+DPAA IRQ Affinity and Receive Side Scaling
+==
+
+Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
+queues is seen by the CPU as ingress traffic on a certain portal.
+The DPAA QMan portal interrupts are affined each to a certain CPU.
+The same portal interrupt services all the QMan portal consumers.
+
+By default the DPAA Ethernet driver enables RSS, making use of the
+DPAA FMan Parser and Keygen blocks to distribute traffic on 128
+hardware frame queues using a hash on IP v4/v6 source and destination
+and L4 source and destination ports, if present in the received frame.
+When RSS is disabled, all traffic received by a certain interface is
+received on the default Rx frame queue. The default DPAA Rx frame
+queues are configured to put the received traffic into a pool channel
+that allows any available CPU portal to dequeue the ingress traffic.
+The default frame queues have the HOLDACTIVE option set, ensuring that
+traffic bursts from a certain queue are serviced by the same CPU.
+This ensures a very low rate of frame reordering. A drawback of this
+is that only one CPU at a time can service the traffic received by a
+certain interface when RSS is not enabled.
+
+To implement RSS, the DPAA Ethernet driver allocates an extra set of
+128 Rx frame queues that are configured to dedicated channels, in a
+round-robin manner. The mapping of the frame queues to CPUs is now
+hardcoded, there is no indirection table to move traffic for a certain
+FQ (hash result) to another CPU. The ingress traffic arriving on one
+of these frame queues will arrive at the same portal and will always
+be processed by the same CPU. This ensures intra-flow order preservation
+and workload distribution for multiple traffic flows.
+
+RSS can be turned off for a certain interface using ethtool, i.e.
+
+   # ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
+
+To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6:
+
+   # ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
+
+There is no independent control for individual protocols, any command
+run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
+going to control the rx-flow-hashing for all protocols on that interface.
+
+Besides using the FMan Keygen computed hash for spreading traffic on the
+128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
+the NETIF_F_RXHASH feature is on (active by default). This can be turned
+on or off through ethtool, i.e.:
+
+   # ethtool -K fm1-mac9 rx-hashing off
+   # ethtool -k fm1-mac9 | grep hash
+   receive-hashing: off
+   # ethtool -K fm1-mac9 rx-hashing on
+   Actual changes:
+   receive-hashing: on
+   # ethtool -k fm1-mac9 | grep hash
+   receive-hashing: on
+
+Please note that Rx hashing depends upon the rx-flow-hashing being on
+for that interface - turning off rx-flow-hashing will also disable the
+rx-hashing (without ethtool reporting it as off as that depends on the
+NETIF_F_RXHASH feature flag).
+
 Debugging
 =
 
-- 
2.1.0



[PATCH 4/6] dpaa_eth: add NETIF_F_RXHASH

2017-08-18 Thread Madalin Bucur
Set the skb hash when the FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 19 ---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |  9 +++--
 drivers/net/ethernet/freescale/fman/fman_port.c| 11 +++
 drivers/net/ethernet/freescale/fman/fman_port.h|  2 ++
 5 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6d89e74..ef30038 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -236,7 +236,7 @@ static int dpaa_netdev_init(struct net_device *net_dev,
net_dev->max_mtu = dpaa_get_max_mtu();
 
net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-NETIF_F_LLTX);
+NETIF_F_LLTX | NETIF_F_RXHASH);
 
net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA;
/* The kernels enables GSO automatically, if we declare NETIF_F_SG.
@@ -2237,12 +2237,13 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
dma_addr_t addr = qm_fd_addr(fd);
enum qm_fd_format fd_format;
struct net_device *net_dev;
-   u32 fd_status;
+   u32 fd_status, hash_offset;
struct dpaa_bp *dpaa_bp;
struct dpaa_priv *priv;
unsigned int skb_len;
struct sk_buff *skb;
int *count_ptr;
+   void *vaddr;
 
fd_status = be32_to_cpu(fd->status);
fd_format = qm_fd_get_format(fd);
@@ -2288,7 +2289,8 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
 
/* prefetch the first 64 bytes of the frame or the SGT start */
-   prefetch(phys_to_virt(addr) + qm_fd_get_offset(fd));
+   vaddr = phys_to_virt(addr);
+   prefetch(vaddr + qm_fd_get_offset(fd));
 
fd_format = qm_fd_get_format(fd);
/* The only FD types that we may receive are contig and S/G */
@@ -2309,6 +2311,14 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
 
skb->protocol = eth_type_trans(skb, net_dev);
 
+   if (net_dev->features & NETIF_F_RXHASH && priv->keygen_in_use &&
+   !fman_port_get_hash_result_offset(priv->mac_dev->port[RX],
+ &hash_offset))
+   skb_set_hash(skb, be32_to_cpu(*(u32 *)(vaddr + hash_offset)),
+// if L4 exists, it was used in the hash generation
+be32_to_cpu(fd->status) & FM_FD_STAT_L4CV ?
+   PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
+
skb_len = skb->len;
 
if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
@@ -2774,6 +2784,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
if (err)
goto init_ports_failed;
 
+   /* Rx traffic distribution based on keygen hashing defaults to on */
+   priv->keygen_in_use = true;
+
priv->percpu_priv = devm_alloc_percpu(dev, *priv->percpu_priv);
if (!priv->percpu_priv) {
dev_err(dev, "devm_alloc_percpu() failed\n");
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 496a12c..bd94220 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -159,6 +159,7 @@ struct dpaa_priv {
struct list_head dpaa_fq_list;
 
u8 num_tc;
+   bool keygen_in_use;
u32 msg_enable; /* net_device message level */
 
struct {
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 965f652..faea674 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -402,6 +402,8 @@ static void dpaa_get_strings(struct net_device *net_dev, 
u32 stringset,
 static int dpaa_get_hash_opts(struct net_device *dev,
  struct ethtool_rxnfc *cmd)
 {
+   struct dpaa_priv *priv = netdev_priv(dev);
+
cmd->data = 0;
 
switch (cmd->flow_type) {
@@ -409,7 +411,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
case TCP_V6_FLOW:
case UDP_V4_FLOW:
case UDP_V6_FLOW:
-   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   if (priv->keygen_in_use)
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
/* Fall through */
case IPV4_FLOW:
case IPV6_FLOW:
@@ -421,7 +424,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
case AH_V6_FLOW:
case ESP_V4_FLOW:
case ESP_V6_FLOW:
-   cm

[PATCH 1/6] fsl/fman: enable FMan Keygen

2017-08-18 Thread Madalin Bucur
From: Iordache Florinel-R70177 

Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel 
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/Makefile  |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c|  26 +
 drivers/net/ethernet/freescale/fman/fman.h|   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 ++
 drivers/net/ethernet/freescale/fman/fman_keygen.h |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c   |  40 +-
 drivers/net/ethernet/freescale/fman/fman_port.h   |   5 +
 7 files changed, 902 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile 
b/drivers/net/ethernet/freescale/fman/Makefile
index 6049177..2c38119 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
 obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
 obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
 
-fsl_fman-objs  := fman_muram.o fman.o fman_sp.o
+fsl_fman-objs  := fman_muram.o fman.o fman_sp.o fman_keygen.o
 fsl_fman_port-objs := fman_port.o
 fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index e714b8f..491a5ac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -34,6 +34,7 @@
 
 #include "fman.h"
 #include "fman_muram.h"
+#include "fman_keygen.h"
 
 #include 
 #include 
@@ -56,6 +57,7 @@
 /* Modules registers offsets */
 #define BMI_OFFSET 0x0008
 #define QMI_OFFSET 0x00080400
+#define KG_OFFSET  0x000C1000
 #define DMA_OFFSET 0x000C2000
 #define FPM_OFFSET 0x000C3000
 #define IMEM_OFFSET0x000C4000
@@ -617,6 +619,7 @@ struct fman {
struct fman_qmi_regs __iomem *qmi_regs;
struct fman_dma_regs __iomem *dma_regs;
struct fman_hwp_regs __iomem *hwp_regs;
+   struct fman_kg_regs __iomem *kg_regs;
fman_exceptions_cb *exception_cb;
fman_bus_error_cb *bus_error_cb;
/* Spinlock for FMan use */
@@ -631,6 +634,8 @@ struct fman {
/* Fifo in MURAM */
unsigned long fifo_offset;
size_t fifo_size;
+   /* KeyGen handle */
+   struct fman_keygen *keygen;
 
u32 liodn_base[64];
u32 liodn_offset[64];
@@ -1811,6 +1816,7 @@ static int fman_config(struct fman *fman)
fman->qmi_regs = base_addr + QMI_OFFSET;
fman->dma_regs = base_addr + DMA_OFFSET;
fman->hwp_regs = base_addr + HWP_OFFSET;
+   fman->kg_regs = base_addr + KG_OFFSET;
fman->base_addr = base_addr;
 
spin_lock_init(&fman->spinlock);
@@ -2083,6 +2089,11 @@ static int fman_init(struct fman *fman)
/* Init HW Parser */
hwp_init(fman->hwp_regs);
 
+   /* Init KeyGen */
+   fman->keygen = keygen_init(fman->kg_regs);
+   if (!fman->keygen)
+   return -EINVAL;
+
err = enable(fman, cfg);
if (err != 0)
return err;
@@ -2562,6 +2573,21 @@ int fman_get_rx_extra_headroom(void)
 EXPORT_SYMBOL(fman_get_rx_extra_headroom);
 
 /**
+ * fman_get_keygen
+ *
+ * @fman:  A Pointer to FMan device
+ *
+ * Get the handle to KeyGen module part of FM driver
+ *
+ * Return: Handle to KeyGen
+ */
+struct fman_keygen *fman_get_keygen(struct fman *fman)
+{
+   return fman->keygen;
+}
+EXPORT_SYMBOL(fman_get_keygen);
+
+/**
  * fman_bind
  * @dev:   FMan OF device pointer
  *
diff --git a/drivers/net/ethernet/freescale/fman/fman.h 
b/drivers/net/ethernet/freescale/fman/fman.h
index f53e147..291990e 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -320,6 +320,8 @@ u16 fman_get_max_frm(void);
 
 int fman_get_rx_extra_headroom(void);
 
+struct fman_keygen *fman_get_keygen(struct fman *fman);
+
 struct fman *fman_bind(struct device *dev);
 
 #endif /* __FM_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.c 
b/drivers/net/ethernet/freescale/fman/fman_keygen.c
new file mode 100644
index 000..f54da3c
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.c
@@ -0,0 +1,783 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the foll

[PATCH 2/6] dpaa_eth: use multiple Rx frame queues

2017-08-18 Thread Madalin Bucur
Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 50 +++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  1 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |  3 ++
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index c7fa285..6d89e74 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -158,7 +158,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 #define DPAA_RX_PRIV_DATA_SIZE (u16)(DPAA_TX_PRIV_DATA_SIZE + \
dpaa_rx_extra_headroom)
 
-#define DPAA_ETH_RX_QUEUES 128
+#define DPAA_ETH_PCD_RXQ_NUM   128
 
 #define DPAA_ENQUEUE_RETRIES   10
 
@@ -169,6 +169,7 @@ struct fm_port_fqs {
struct dpaa_fq *tx_errq;
struct dpaa_fq *rx_defq;
struct dpaa_fq *rx_errq;
+   struct dpaa_fq *rx_pcdq;
 };
 
 /* All the dpa bps in use at any moment */
@@ -628,6 +629,7 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq, int 
idx)
fq->wq = 5;
break;
case FQ_TYPE_RX_DEFAULT:
+   case FQ_TYPE_RX_PCD:
fq->wq = 6;
break;
case FQ_TYPE_TX:
@@ -688,6 +690,7 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct 
list_head *list,
  struct fm_port_fqs *port_fqs)
 {
struct dpaa_fq *dpaa_fq;
+   u32 fq_base, fq_base_aligned, i;
 
dpaa_fq = dpaa_fq_alloc(dev, 0, 1, list, FQ_TYPE_RX_ERROR);
if (!dpaa_fq)
@@ -701,6 +704,26 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct 
list_head *list,
 
port_fqs->rx_defq = &dpaa_fq[0];
 
+   /* the PCD FQIDs range needs to be aligned for correct operation */
+   if (qman_alloc_fqid_range(&fq_base, 2 * DPAA_ETH_PCD_RXQ_NUM))
+   goto fq_alloc_failed;
+
+   fq_base_aligned = ALIGN(fq_base, DPAA_ETH_PCD_RXQ_NUM);
+
+   for (i = fq_base; i < fq_base_aligned; i++)
+   qman_release_fqid(i);
+
+   for (i = fq_base_aligned + DPAA_ETH_PCD_RXQ_NUM;
+i < (fq_base + 2 * DPAA_ETH_PCD_RXQ_NUM); i++)
+   qman_release_fqid(i);
+
+   dpaa_fq = dpaa_fq_alloc(dev, fq_base_aligned, DPAA_ETH_PCD_RXQ_NUM,
+   list, FQ_TYPE_RX_PCD);
+   if (!dpaa_fq)
+   goto fq_alloc_failed;
+
+   port_fqs->rx_pcdq = &dpaa_fq[0];
+
if (!dpaa_fq_alloc(dev, 0, DPAA_ETH_TXQ_NUM, list, FQ_TYPE_TX_CONF_MQ))
goto fq_alloc_failed;
 
@@ -870,13 +893,14 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
  const struct dpaa_fq_cbs *fq_cbs,
  struct fman_port *tx_port)
 {
-   int egress_cnt = 0, conf_cnt = 0, num_portals = 0, cpu;
+   int egress_cnt = 0, conf_cnt = 0, num_portals = 0, portal_cnt = 0, cpu;
const cpumask_t *affine_cpus = qman_affine_cpus();
-   u16 portals[NR_CPUS];
+   u16 channels[NR_CPUS];
struct dpaa_fq *fq;
 
for_each_cpu(cpu, affine_cpus)
-   portals[num_portals++] = qman_affine_channel(cpu);
+   channels[num_portals++] = qman_affine_channel(cpu);
+
if (num_portals == 0)
dev_err(priv->net_dev->dev.parent,
"No Qman software (affine) channels found");
@@ -890,6 +914,12 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
case FQ_TYPE_RX_ERROR:
dpaa_setup_ingress(priv, fq, &fq_cbs->rx_errq);
break;
+   case FQ_TYPE_RX_PCD:
+   if (!num_portals)
+   continue;
+   dpaa_setup_ingress(priv, fq, &fq_cbs->rx_defq);
+   fq->channel = channels[portal_cnt++ % num_portals];
+   break;
case FQ_TYPE_TX:
dpaa_setup_egress(priv, fq, tx_port,
  &fq_cbs->egress_ern);
@@ -1039,7 +1069,8 @@ static int dpaa_fq_init(struct dpaa_fq *dpaa_fq, bool 
td_enable)
/* Put all the ingress queues in our "ingress CGR". */
if (priv->use_ingress_cgr &&
(dpaa_fq->fq_type == FQ_TYPE_RX_DEFAULT ||
-dpaa_fq->fq_type == FQ_TYPE_RX_ERROR)) {
+dpaa_fq->fq_type == FQ_TYPE_RX_ERROR ||
+dpaa_fq->fq_type == FQ_TYPE_RX_PCD)) {
initfq.we_mask |= cpu_to_be16(QM_INITFQ_WE_CGID);
initfq.fqd.fq_ctrl |= cpu_to_be16(QM_FQCTRL_CGE);
i
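
The aligned PCD FQID range set up in dpaa_alloc_all_fqs() above can be sketched in user space as follows. The idea, as read from the patch, is to allocate twice the needed number of IDs, round the base up to a multiple of the count, and release the unused head and tail. struct demo_pcd_range and demo_split_fqid_range() are hypothetical names, and the rounding assumes the count is a power of two (128 in the patch).

```c
#include <stdint.h>

/* Hypothetical result type: where the aligned block starts and how
 * many IDs would be released before and after it. */
struct demo_pcd_range {
	uint32_t aligned_base;
	uint32_t head_released;
	uint32_t tail_released;
};

/* fq_base is the start of an allocation of 2 * count IDs;
 * count must be a power of two. */
static struct demo_pcd_range demo_split_fqid_range(uint32_t fq_base,
						   uint32_t count)
{
	struct demo_pcd_range r;

	/* Round up to the next multiple of count. */
	r.aligned_base = (fq_base + count - 1) & ~(count - 1);
	/* IDs in [fq_base, aligned_base) are not needed. */
	r.head_released = r.aligned_base - fq_base;
	/* IDs in [aligned_base + count, fq_base + 2 * count) are not needed. */
	r.tail_released = (fq_base + 2 * count) - (r.aligned_base + count);

	return r;
}
```

Whatever the starting base, the head and tail together always add up to count IDs, so exactly count aligned IDs remain for the PCD frame queues.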

[PATCH 0/6] Add RSS to DPAA 1.x Ethernet driver

2017-08-18 Thread Madalin Bucur
This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Also added a small fix.

Iordache Florinel-R70177 (1):
  fsl/fman: enable FMan Keygen

Madalin Bucur (5):
  dpaa_eth: use multiple Rx frame queues
  dpaa_eth: enable Rx hashing control
  dpaa_eth: add NETIF_F_RXHASH
  Documentation: networking: add RSS information
  dpaa_eth: check allocation result

 Documentation/networking/dpaa.txt  |  68 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  72 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 118 
 drivers/net/ethernet/freescale/fman/Makefile   |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c |  26 +
 drivers/net/ethernet/freescale/fman/fman.h |   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c  | 783 +
 drivers/net/ethernet/freescale/fman/fman_keygen.h  |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c|  51 +-
 drivers/net/ethernet/freescale/fman/fman_port.h|   7 +
 12 files changed, 1167 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

-- 
2.1.0



Re: [PATCH 1/6] ASoC: fsl: make snd_pcm_hardware const

2017-08-18 Thread Nicolin Chen
On Thu, Aug 17, 2017 at 03:46:07PM +0530, Bhumika Goyal wrote:
> Make these const as they are only passed as the 2nd argument to the
> function snd_soc_set_runtime_hwparams, which is const.
> Done using Coccinelle.
> 
> Signed-off-by: Bhumika Goyal 

Acked-by: Nicolin Chen 


Re: [PATCH 5/5] Use __func__ instead of function name

2017-08-18 Thread Michal Suchánek

On 2017-07-29 09:24, SZ Lin wrote:

Fix following checkpatch.pl warning:
WARNING: Prefer using '"%s...", __func__' to using
the function's name, in a string

Signed-off-by: SZ Lin 
---
 drivers/char/tpm/tpm_ibmvtpm.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c 
b/drivers/char/tpm/tpm_ibmvtpm.c

index e75a674b44ac..2d33acc43e25 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -151,7 +151,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip,
u8 *buf, size_t count)
rc = ibmvtpm_send_crq(ibmvtpm->vdev, be64_to_cpu(word[0]),
  be64_to_cpu(word[1]));
if (rc != H_SUCCESS) {
-   dev_err(ibmvtpm->dev, "tpm_ibmvtpm_send failed rc=%d\n", rc);
+   dev_err(ibmvtpm->dev, "%s failed rc=%d\n", __func__, rc);


Can a function name contain a %?

I would prefer dev_err(ibmvtpm->dev, __func__ " failed rc=%d\n", rc);

It's not what checkpatch advises in the above message, though.

Presumably with many messages from the same function using %s would
save space but that is not the usual case.

Thanks

Michal
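
A note on the two forms discussed above, with a small user-space sketch. In C99, `__func__` is a predefined identifier that behaves like a static const char array and is not a string literal, so adjacent-literal concatenation such as `__func__ " failed"` is not expected to compile with current compilers; passing it as a "%s" argument is the portable form. format_failure() and demo_send() below are hypothetical helpers for illustration, not driver code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "<func> failed rc=<rc>\n" into buf.
 * The name is passed as data, so any '%' in it is not interpreted. */
static int format_failure(char *buf, size_t len, const char *func, int rc)
{
	return snprintf(buf, len, "%s failed rc=%d\n", func, rc);
}

/* Hypothetical stand-in for a driver function that reports its own
 * name through __func__. */
static int demo_send(char *buf, size_t len, int rc)
{
	return format_failure(buf, len, __func__, rc);
}
```

Because the name travels as an argument and not as part of the format string, a '%' in it would be printed literally. C identifiers cannot contain '%' in any case.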




Re: [PATCH v2 1/1] Split VGA default device handler out of VGA arbiter

2017-08-18 Thread Daniel Vetter
On Thu, Aug 17, 2017 at 09:30:28PM +1000, Daniel Axtens wrote:
> A system without PCI legacy resources (e.g. ARM64) may find that no
> default/boot VGA device has been marked, because the VGA arbiter
> checks for legacy resource decoding before marking a card as default.
> 
> Split the small bit of code that does default VGA handling out from
> the arbiter. Add a Kconfig option to allow the kernel to be built
> with just the default handling, or the arbiter and default handling.
> 
> Add handling for devices that should be marked as default but aren't
> handled by the vga arbiter by adding a late initcall and a class
> enable hook. If there is no default from vgaarb then the first card
> that is enabled, has a driver bound, and can decode memory or I/O
> will be marked as default.
> 
> Signed-off-by: Daniel Axtens 

Looks reasonable, but I have no clue at all about this. Can you pls get
some proper review from pci/platform folks (ppc would be good too)? I can
apply to drm-misc once that's done.

Just documentation comments below.

Thanks, Daniel
> 
> ---
> 
> v2: Tested on:
>  - x86_64 laptop
>  - arm64 D05 board with hibmc card
>  - qemu powerpc with tcg and bochs std-vga
> 
> I know this adds another config option and that's a bit sad, but
> we can't include it unconditionally as it depends on PCI.
> Suggestions welcome.
> ---
>  arch/ia64/pci/fixup.c|   2 +-
>  arch/powerpc/kernel/pci-common.c |   2 +-
>  arch/x86/pci/fixup.c |   2 +-
>  arch/x86/video/fbdev.c   |   2 +-
>  drivers/gpu/vga/Kconfig  |  12 +++
>  drivers/gpu/vga/Makefile |   1 +
>  drivers/gpu/vga/vga_default.c| 159 +++
>  drivers/gpu/vga/vga_switcheroo.c |   2 +-
>  drivers/gpu/vga/vgaarb.c |  41 +-
>  drivers/pci/pci-sysfs.c  |   2 +-
>  include/linux/vga_default.h  |  44 +++
>  include/linux/vgaarb.h   |  14 
>  12 files changed, 225 insertions(+), 58 deletions(-)
>  create mode 100644 drivers/gpu/vga/vga_default.c
>  create mode 100644 include/linux/vga_default.h
> 
> diff --git a/arch/ia64/pci/fixup.c b/arch/ia64/pci/fixup.c
> index 41caa99add51..b35d1cf4501a 100644
> --- a/arch/ia64/pci/fixup.c
> +++ b/arch/ia64/pci/fixup.c
> @@ -5,7 +5,7 @@
>  
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #include 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 341a7469cab8..4fd890a51d18 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -31,7 +31,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  #include 
>  #include 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index 11e407489db0..b1254bc09a45 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -5,7 +5,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
> index 9fd24846d094..62cfa74ea86e 100644
> --- a/arch/x86/video/fbdev.c
> +++ b/arch/x86/video/fbdev.c
> @@ -9,7 +9,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  int fb_is_primary_device(struct fb_info *info)
>  {
> diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig
> index 29437eabe095..81d4105aecf6 100644
> --- a/drivers/gpu/vga/Kconfig
> +++ b/drivers/gpu/vga/Kconfig
> @@ -1,3 +1,14 @@
> +config VGA_DEFAULT
> + bool "VGA Default Device Support" if EXPERT
> + default y
> + depends on PCI
> + help
> +   Some programs find it helpful to know what VGA device is the default.
> +   On platforms like x86 this means the device used by the BIOS to show
> +   early boot messages. On other platforms this may be an arbitrary PCI
> +   graphics card. Select this to have a default device recorded within
> +   the kernel and exposed to userspace through sysfs.
> +
>  config VGA_ARB
>   bool "VGA Arbitration" if EXPERT
>   default y
> @@ -22,6 +33,7 @@ config VGA_SWITCHEROO
>   depends on X86
>   depends on ACPI
>   select VGA_ARB
> + select VGA_DEFAULT
>   help
> Many laptops released in 2008/9/10 have two GPUs with a multiplexer
> to switch between them. This adds support for dynamic switching when
> diff --git a/drivers/gpu/vga/Makefile b/drivers/gpu/vga/Makefile
> index 14ca30b75d0a..1e30f90d40fb 100644
> --- a/drivers/gpu/vga/Makefile
> +++ b/drivers/gpu/vga/Makefile
> @@ -1,2 +1,3 @@
>  obj-$(CONFIG_VGA_ARB)  += vgaarb.o
> +obj-$(CONFIG_VGA_DEFAULT) += vga_default.o
>  obj-$(CONFIG_VGA_SWITCHEROO) += vga_switcheroo.o
> diff --git a/drivers/gpu/vga/vga_default.c b/drivers/gpu/vga/vga_default.c
> new file mode 100644
> index ..f6fcb0eb1507
> --- /dev/null
> +++ b/drivers/gpu/vga/vga_default.c
> @@ -0,0 +1,159 @@
> +/*
> + * vga_default.c: What is the default/boot PCI VGA device?
> + *
> + * (C) Copyright