Re: [PATCH] arm64: PCI: Enable SMC conduit

2021-03-25 Thread Jon Masters
Hi Marcin,

Many thanks for your thoughtful, heartfelt response, and I don't
disagree with your sentiments.

The truth is that we have a messy situation. As a collective community
of people who are passionate about the success of Arm in general
purpose systems, I know many who would share my personal feeling that
this is all beyond very unfortunate. That other architecture has
working, robust, PCI IP that adheres to standards (more or less)
correctly. There is no reason we can't either. But it takes a
collective industry wide effort, alongside leadership from Arm (and
others) to push things forward. I'm very impressed with where
SystemReady is headed and there are great people behind making that
happen. So I have faith that things will improve. Now is a good time
to unite as an industry behind improving both the status quo (quirks)
and future IP so that it is properly compliant. My opinion is that now
is not a good moment to rework entirely how we do PCI enumeration to
use an alternative scheme.

Please see the below for more.

On Thu, Mar 25, 2021 at 4:45 PM Marcin Wojtas  wrote:

> So what we have after 4 years:
> * Direct convincing of IP vendors still being a plan.

Things need to improve here. I've *expressed* as much to certain folks
around the industry. I'm not afraid to get more vocal. There is too
much IP out there even now that is doing inexcusably non-compliant
things. When I would talk to these vendors, they didn't seem to take
standards compliance seriously (to any standard) because they're used
to making a BSP for some platform, and nobody has stood over them, to
the point of extreme discomfort, until they changed their
approach. It is now past time that we stand over these folks and get
them to change. I am not afraid to get much more intense here in my
approach and I would hope that others who feel similarly about
standardization would also choose to engage with extreme vigor.
Extreme vigor. It must become an extreme embarrassment for any of them
to continue to have any IP that claims to be "PCI" but is not PCI.

> * Reverting the original approach towards MCFG quirks.
> * Double-standards in action as displayed by 2 cases above.

The truth is we've had an inconsistent approach. But an understandable
one. It's painful to take quirks. I am grateful that the maintainers
are willing to consider this approach now in order to get to where we
want to be, but I completely understand the hesitance in the past.
Along with the above, we all need to do all we can to ensure that
quirks are an absolute last resort. It's one thing to have a corner
case issue that couldn't be tested pre-silicon, but there is *no
excuse* in 2021 to ever tape out another chip that hasn't had at least
a basic level of ECAM testing (and obviously it should be much more).
Emulation time should catch the vast majority of bugs as real PCIe
devices are used against a design using speed bridges and the like.
There's no excuse not to test. And frankly it boggles my mind that
anyone would think that was a prudent way to do business. You can have
every distro "just work" by doing it right, or you can have years of
pain by doing it wrong. And too many still think the BSP hack-it-up
model is the way to go. We ought to be dealing predominantly with the
long tail of stuff that is using obviously busted IP that was already
baked. We can use quirks for that. But then they need to go away and
be replaced with real ECAM that works on future platforms.

> I'm sorry for my bitter tone, but I think this time could and should
> have been spent better - I doubt it managed to push us in any
> significant way towards wide fully-standard compliant PCIE IP
> adoption.

Truthfully there will be some parts of the Arm story that will be
unpleasant footnotes in the historical telling. How we haven't moved
the third party IP vendors faster is a significant one. I think we
have a chance to finally change that now that Arm is gaining traction.
I am very sad that some of the early comers who tried to do the right
thing had to deal with the state of third party IP at the time.

Jon.



Re: [PATCH v3 2/2] arm64: mm: reserve CMA and crashkernel in ZONE_DMA32

2021-03-22 Thread Jon Masters

On 3/22/21 2:34 PM, Jon Masters wrote:

Hi Nicolas,

On 11/7/19 4:56 AM, Nicolas Saenz Julienne wrote:

With the introduction of ZONE_DMA in arm64 we moved the default CMA and
crashkernel reservation into that area. This caused a regression on big
machines that need big CMA and crashkernel reservations. Note that
ZONE_DMA is only 1GB big.

Restore the previous behavior as the wide majority of devices are OK
with reserving these in ZONE_DMA32. The ones that need them in ZONE_DMA
will configure it explicitly.

Reported-by: Qian Cai 
Signed-off-by: Nicolas Saenz Julienne 
---
  arch/arm64/mm/init.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 580d1052ac34..8385d3c0733f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -88,7 +88,7 @@ static void __init reserve_crashkernel(void)
  if (crash_base == 0) {
  /* Current arm64 boot protocol requires 2MB alignment */
-    crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+    crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
  crash_size, SZ_2M);
  if (crash_base == 0) {
  pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
@@ -454,7 +454,7 @@ void __init arm64_memblock_init(void)
  high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
-    dma_contiguous_reserve(arm64_dma_phys_limit ? : arm64_dma32_phys_limit);
+    dma_contiguous_reserve(arm64_dma32_phys_limit);
  }
  void __init bootmem_init(void)


Can we get a bit more of a backstory about what the regression was on 
larger machines? If the 32-bit DMA region is too small, but the machine 
otherwise has plenty of memory, the crashkernel reservation will fail. 
Most e.g. enterprise users aren't going to respond to that situation by 
determining the placement manually, they'll just not have a crashkernel.


Nevermind, looks like Catalin already changed this logic in Jan 2021 by 
removing arm64_dma32_phys_limit and I'm out of date.


Jon.

--
Computer Architect


Re: [PATCH v3 2/2] arm64: mm: reserve CMA and crashkernel in ZONE_DMA32

2021-03-22 Thread Jon Masters

Hi Nicolas,

On 11/7/19 4:56 AM, Nicolas Saenz Julienne wrote:

With the introduction of ZONE_DMA in arm64 we moved the default CMA and
crashkernel reservation into that area. This caused a regression on big
machines that need big CMA and crashkernel reservations. Note that
ZONE_DMA is only 1GB big.

Restore the previous behavior as the wide majority of devices are OK
with reserving these in ZONE_DMA32. The ones that need them in ZONE_DMA
will configure it explicitly.

Reported-by: Qian Cai 
Signed-off-by: Nicolas Saenz Julienne 
---
  arch/arm64/mm/init.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 580d1052ac34..8385d3c0733f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -88,7 +88,7 @@ static void __init reserve_crashkernel(void)
  
  	if (crash_base == 0) {

/* Current arm64 boot protocol requires 2MB alignment */
-   crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+   crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
crash_size, SZ_2M);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
@@ -454,7 +454,7 @@ void __init arm64_memblock_init(void)
  
  	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
  
-	dma_contiguous_reserve(arm64_dma_phys_limit ? : arm64_dma32_phys_limit);

+   dma_contiguous_reserve(arm64_dma32_phys_limit);
  }
  
  void __init bootmem_init(void)


Can we get a bit more of a backstory about what the regression was on 
larger machines? If the 32-bit DMA region is too small, but the machine 
otherwise has plenty of memory, the crashkernel reservation will fail. 
Most e.g. enterprise users aren't going to respond to that situation by 
determining the placement manually, they'll just not have a crashkernel.
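For context in the archive: the "configure it explicitly" escape hatch from the commit message is the crashkernel= kernel parameter, which accepts an optional placement (syntax per Documentation/admin-guide/kernel-parameters.txt; the sizes and addresses below are only illustrative):

	crashkernel=512M          # let the kernel pick a base (2 MiB aligned on arm64)
	crashkernel=512M@2G       # force the reservation at physical address 2 GiB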


Jon.

--
Computer Architect


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-24 Thread Jon Masters

On 1/7/21 1:09 PM, Florian Fainelli wrote:

On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:

On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:

Hi Greg and Konrad,

This change is intended to be non-arch specific. Any arch that lacks DMA access
control and has devices not behind an IOMMU can make use of it. Could you share
why you think this should be arch specific?


The idea behind non-arch specific code is it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch specific code.


In premise the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.

More than 1 architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture agnostic since past 2, you need something that scales.

There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.




Just a note for history/archives that this approach would not be 
appropriate on general purpose Arm systems, such as SystemReady-ES 
edge/non-server platforms seeking to run general purpose distros. I want 
to have that in the record before someone at Arm (or NVidia, or a bunch 
of others that come to mind who have memory firewalls) gets an idea.


If you're working at an Arm vendor and come looking at this later 
thinking "wow, what a great idea!", please fix your hardware to have a 
real IOMMU/SMMU and real PCIe. You'll be pointed at this reply.


Jon.

--
Computer Architect


Re: [PATCH] arm64: PCI: Enable SMC conduit

2021-01-07 Thread Jon Masters

Hi Will, everyone,

On 1/7/21 1:14 PM, Will Deacon wrote:


On Mon, Jan 04, 2021 at 10:57:35PM -0600, Jeremy Linton wrote:

Given that most arm64 platform's PCI implementations needs quirks
to deal with problematic config accesses, this is a good place to
apply a firmware abstraction. The ARM PCI SMMCCC spec details a
standard SMC conduit designed to provide a simple PCI config
accessor. This specification enhances the existing ACPI/PCI
abstraction and expects power, config, etc functionality is handled
by the platform. It also is very explicit that the resulting config
space registers must behave as is specified by the pci specification.

Lets hook the normal ACPI/PCI config path, and when we detect
missing MADT data, attempt to probe the SMC conduit. If the conduit
exists and responds for the requested segment number (provided by the
ACPI namespace) attach a custom pci_ecam_ops which redirects
all config read/write requests to the firmware.

This patch is based on the Arm PCI Config space access document @
https://developer.arm.com/documentation/den0115/latest


Why does firmware need to be involved with this at all? Can't we just
quirk Linux when these broken designs show up in production? We'll need
to modify Linux _anyway_ when the firmware interface isn't implemented
correctly...


I agree with Will on this. I think we want to find a way to address some 
of the non-compliance concerns through quirks in Linux. However...


Several folks here (particularly Lorenzo) have diligently worked hard 
over the past few years - and pushed their patience - to accommodate 
hardware vendors with early "not quite compliant" systems. They've taken 
lots of quirks that frankly shouldn't continue to be necessary were it 
even remotely a priority in the vendor ecosystem to get a handle on 
addressing PCIe compliance once and for all. But, again frankly, it 
hasn't been enough of a priority to get this fixed. The third party IP 
vendors *need* to address this, and their customers *need* to push back.


We can't keep having a situation in which kinda-sorta compliant stuff 
comes to market that would work out of the box but for whatever the 
quirk is this time around. There have been multiple OS releases for the 
past quite a few years on which this stuff could be tested prior to ever 
taping out a chip, and so it ought not to be possible to come to market 
now with an excuse that it wasn't tested. And yet here we still are. All 
these years and still the message isn't quite being received properly. I 
do know it takes time to make hardware, and some of it was designed 
years before and is still trickling down into these threads. But I also 
think there are cases where much more could have been done earlier.


None of these vendors can possibly want this deep down. Their engineers 
almost certainly realize that just having compliant ECAM would mean that 
the hardware was infinitely more valuable being able to run out of the 
box software that much more easily. And it's not just ECAM. Inevitably, 
that is just the observable syndrome for worse issues, often with the 
ITS and forcing quirked systems to have lousy legacy interrupts, etc. 
Alas, this level of nuance is likely lost by the time it reaches upper 
management, where "Linux" is all the same to them. I would hope that can 
change. I would also remind them that if they want to run non-Linux 
OSes, they will also want to be actually compliant. The willingness of 
kind folks like Lorenzo and others here to entertain quirks is not 
necessarily something you will find in every part of the industry.


But that all said, we have a situation in which there are still 
platforms out there that aren't fully compliant and something has to be 
done to support them because otherwise it's going to be even more ugly 
with vendor trees, distro hacks, and other stuff.


Some of you in recent weeks have asked what I and others can do to help 
from the distro and standardization side of things. To do my part, I'm 
going to commit to reach out to assorted vendors and have a heart to 
heart with them about really, truly fully addressing their compliance 
issues. That includes Cadence, Synopsys, and others who need to stop 
shipping IP that requires quirks, as well as SoC vendors who need to do 
more to test their silicon with stock kernels prior to taping out. And I 
would like to involve the good folks here who are trying to navigate.


I would also politely suggest that we collectively consider how much 
wiggle room there can be to use quirks for what we are stuck with rather 
than an SMC-based solution. We all know that quirks can't be a free ride 
forever. Those who need them should offer something strong in return. A 
firm commitment that they will never come asking for the same stuff in 
the future. Is there a way we can do something like that?


Jon.

--
Computer Architect


Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

2020-11-25 Thread Jon Masters

On 11/11/20 12:43 AM, Ben Widawsky wrote:


+   case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
+   dev_dbg(&cxlm->pdev->dev,
+  "found UNSUPPORTED Secondary Mailbox capability\n");


Per spec, the secondary mailbox is intended for use by platform 
firmware, so Linux should never be using it anyway. Maybe that message 
is slightly misleading?


Jon.

P.S. Related - I've severe doubts about the mailbox approach being 
proposed by CXL and have begun to push back through the spec org.


--
Computer Architect


Re: Litmus test for question from Al Viro

2020-10-02 Thread Jon Masters

On 10/1/20 12:15 PM, Alan Stern wrote:

On Wed, Sep 30, 2020 at 09:51:16PM -0700, Paul E. McKenney wrote:

Hello!

Al Viro posted the following query:



 fun question regarding barriers, if you have time for that
 V->A = V->B = 1;

 CPU1:
 to_free = NULL
 spin_lock()
 if (!smp_load_acquire(&V->B))
 to_free = V
 V->A = 0
 spin_unlock()
 kfree(to_free)

 CPU2:
 to_free = V;
 if (READ_ONCE(V->A)) {
 spin_lock()
 if (V->A)
 to_free = NULL
 smp_store_release(&V->B, 0);
 spin_unlock()
 }
 kfree(to_free);
 1) is it guaranteed that V will be freed exactly once and that
  no accesses to *V will happen after freeing it?
 2) do we need smp_store_release() there?  I.e. will anything
  break if it's replaced with plain V->B = 0?


Here are my answers to Al's questions:

1) It is guaranteed that V will be freed exactly once.  It is not
guaranteed that no accesses to *V will occur after it is freed, because
the test contains a data race.  CPU1's plain "V->A = 0" write races with
CPU2's READ_ONCE; if the plain write were replaced with
"WRITE_ONCE(V->A, 0)" then the guarantee would hold.  Equally well,
CPU1's smp_load_acquire could be replaced with a plain read while the
plain write is replaced with smp_store_release.

2) The smp_store_release in CPU2 is not needed.  Replacing it with a
plain V->B = 0 will not break anything.
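Spelled out, Alan's first suggested repair applied to the snippet reads as follows (sketch; only CPU1's plain write changes, everything else stays):

	CPU1:
	to_free = NULL
	spin_lock()
	if (!smp_load_acquire(&V->B))
		to_free = V
	WRITE_ONCE(V->A, 0)	/* was the plain V->A = 0 that raced */
	spin_unlock()
	kfree(to_free)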


This was my interpretation also. I made the mistake of reading this 
right before trying to go to bed the other night and ended up tweeting 
at Paul that I'd regret it if he gave me scary dreams. Thought about it 
and read your write up and it is still exactly how I see it.


Jon.

--
Computer Architect


Re: dma-coherent property for PCIe Root

2020-09-30 Thread Jon Masters

On 9/14/20 1:23 AM, Valmiki wrote:

Hi All,

How does "dma-coherent" property will work for PCIe as RC on an
ARM SOC ?
Because the end point device drivers are the one which will request dma 
buffers and Root port driver doesn't involve in data path of end point

except for handling interrupts.

How does EP DMA buffers will be hardware coherent if RC driver exposes
dma-coherent property ?


This simply means that the RC supports maintaining coherency, it doesn't 
mean that the RC driver does anything. It's a property of the hardware.
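In device-tree terms the property sits on the RC node and applies to DMA done by devices behind it, so the EP drivers' dma_map_*() calls can skip cache maintenance. An illustrative fragment (compatible string and addresses are made up):

	pcie@40000000 {
		compatible = "vendor,soc-pcie";
		device_type = "pci";
		dma-coherent;
		...
	};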


Jon.

--
Computer Architect


Re: Linux 5.3-rc8

2019-10-03 Thread Jon Masters
On 9/10/19 12:21 AM, Ahmed S. Darwish wrote:

> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

Tangent: I asked aloud on Twitter last night if anyone had exploited
Rowhammer-like effects to generate entropy...and sure enough, the usual
suspects have: https://arxiv.org/pdf/1808.04286.pdf

While this requires low level access to a memory controller, it's
perhaps an example of something a platform designer could look at as a
source to introduce boot-time entropy for e.g. EFI_RNG_PROTOCOL even on
an existing platform without dedicated hardware for the purpose.

Just a thought.

Jon.


Re: [PATCH v3] Documentation: Add section about CPU vulnerabilities for Spectre

2019-06-17 Thread Jon Masters
On 6/17/19 4:22 PM, Jon Masters wrote:

>> +   For kernel code that has been identified where data pointers could
>> +   potentially be influenced for Spectre attacks, new "nospec" accessor
>> +   macros are used to prevent speculative loading of data.
> 
> Maybe explain that nospec (speculative clamping) relies on the absence
> of value prediction in the masking (in current hardware). It may NOT
> always be a safe approach in future hardware, where Spectre-v1 attacks
> are likely to persist but hardware may speculate about the mask value.

Something like the Arm CSDB barrier would seem to be potentially useful
for $FUTURE_X86 as a fence with lighter-weight semantics than an *fence.

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop


Re: [PATCH v3] Documentation: Add section about CPU vulnerabilities for Spectre

2019-06-17 Thread Jon Masters
Hi Tim,

Nice writeup. A few suggestions inline.

On 6/17/19 3:11 PM, Tim Chen wrote:

> +In Spectre variant 2 attacks, the attacker can steer speculative indirect
> +branches in the victim to gadget code by poisoning the branch target
> +buffer of a CPU used for predicting indirect branch addresses. Such
> +poisoning could be done by indirect branching into existing code, with the
> +address offset of the indirect branch under the attacker's control. Since
> +the branch prediction hardware does not fully disambiguate branch address
> +and uses the offset for prediction, this could cause privileged code's
> +indirect branch to jump to a gadget code with the same offset.

Maybe mention "on impacted hardware" (implied).

> +One other variant 2 attack vector is for the attacker to poison the
> +return stack buffer (RSB) [13] to cause speculative RET execution to go
> +to an gadget.  An attacker's imbalanced CALL instructions might "poison"
> +entries in the return stack buffer which are later consumed by a victim's
> +RET instruction.  This attack can be mitigated by flushing the return
> +stack buffer on context switch, or VM exit.

Maybe replace CALL and RET with generic equivalents or label as x86
examples.

> +   For kernel code that has been identified where data pointers could
> +   potentially be influenced for Spectre attacks, new "nospec" accessor
> +   macros are used to prevent speculative loading of data.

Maybe explain that nospec (speculative clamping) relies on the absence
of value prediction in the masking (in current hardware). It may NOT
always be a safe approach in future hardware, where Spectre-v1 attacks
are likely to persist but hardware may speculate about the mask value.

> +   On x86, a user process can protect itself against Spectre variant
> +   2 attacks by using the prctl() syscall to disable indirect branch
> +   speculation for itself.

This is not x86 specific. The same prctl is wired up elsewhere also.

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop


Re: [LSF/MM TOPIC] FS, MM, and stable trees

2019-03-20 Thread Jon Masters
On 3/20/19 2:28 AM, Greg KH wrote:
> On Wed, Mar 20, 2019 at 02:14:09AM -0400, Jon Masters wrote:
>> On 3/20/19 1:06 AM, Greg KH wrote:
>>> On Tue, Mar 19, 2019 at 11:46:09PM -0400, Jon Masters wrote:
>>>> On 2/13/19 2:52 PM, Greg KH wrote:
>>>>> On Wed, Feb 13, 2019 at 02:25:12PM -0500, Sasha Levin wrote:
>>>>
>>>>>> So really, it sounds like a low hanging fruit: we don't really need to
>>>>>> write much more testing code code nor do we have to refactor existing
>>>>>> test suites. We just need to make sure the right tests are running on
>>>>>> stable kernels. I really want to clarify what each subsystem sees as
>>>>>> "sufficient" (and have that documented somewhere).
>>>>>
>>>>> kernel.ci and 0-day and Linaro are starting to add the fs and mm tests
>>>>> to their test suites to address these issues (I think 0-day already has
>>>>> many of them).  So this is happening, but not quite obvious.  I know I
>>>>> keep asking Linaro about this :(
>>>>
>>>> We're working on investments for LDCG[0] in 2019 that include kernel CI
>>>> changes for server use cases. Please keep us informed of what you folks
>>>> ultimately want to see, and I'll pass on to the steering committee too.
>>>>
>>>> Ultimately I've been pushing for a kernel 0-day project for Arm. That's
>>>> probably going to require a lot of duplicated effort since the original
>>>> 0-day project isn't open, but creating an open one could help everyone.
>>>
>>> Why are you trying to duplicate it on your own?  That's what kernel.ci
>>> should be doing, please join in and invest in that instead.  It's an
>>> open source project with its own governance and needs sponsors, why
>>> waste time and money doing it all on your own?
>>
>> To clarify, I'm pushing for investment in kernel.ci to achieve that goal
>> that it could provide the same 0-day capability for Arm and others.
> 
> Great, that's what I was trying to suggest :)
> 
>> It'll ultimately result in duplicated effort vs if 0-day were open.
> 
> "Half" of 0-day is open, but it's that other half that is still
> needed...

;) I'm hoping this might also help that to happen...

Best,

Jon.


Re: [LSF/MM TOPIC] FS, MM, and stable trees

2019-03-20 Thread Jon Masters
On 3/20/19 1:06 AM, Greg KH wrote:
> On Tue, Mar 19, 2019 at 11:46:09PM -0400, Jon Masters wrote:
>> On 2/13/19 2:52 PM, Greg KH wrote:
>>> On Wed, Feb 13, 2019 at 02:25:12PM -0500, Sasha Levin wrote:
>>
>>>> So really, it sounds like a low hanging fruit: we don't really need to
>>>> write much more testing code code nor do we have to refactor existing
>>>> test suites. We just need to make sure the right tests are running on
>>>> stable kernels. I really want to clarify what each subsystem sees as
>>>> "sufficient" (and have that documented somewhere).
>>>
>>> kernel.ci and 0-day and Linaro are starting to add the fs and mm tests
>>> to their test suites to address these issues (I think 0-day already has
>>> many of them).  So this is happening, but not quite obvious.  I know I
>>> keep asking Linaro about this :(
>>
>> We're working on investments for LDCG[0] in 2019 that include kernel CI
>> changes for server use cases. Please keep us informed of what you folks
>> ultimately want to see, and I'll pass on to the steering committee too.
>>
>> Ultimately I've been pushing for a kernel 0-day project for Arm. That's
>> probably going to require a lot of duplicated effort since the original
>> 0-day project isn't open, but creating an open one could help everyone.
> 
> Why are you trying to duplicate it on your own?  That's what kernel.ci
> should be doing, please join in and invest in that instead.  It's an
> open source project with its own governance and needs sponsors, why
> waste time and money doing it all on your own?

To clarify, I'm pushing for investment in kernel.ci to achieve that goal
that it could provide the same 0-day capability for Arm and others.
It'll ultimately result in duplicated effort vs if 0-day were open.

Jon.


Re: [LSF/MM TOPIC] FS, MM, and stable trees

2019-03-19 Thread Jon Masters
On 2/13/19 2:52 PM, Greg KH wrote:
> On Wed, Feb 13, 2019 at 02:25:12PM -0500, Sasha Levin wrote:

>> So really, it sounds like a low hanging fruit: we don't really need to
>> write much more testing code code nor do we have to refactor existing
>> test suites. We just need to make sure the right tests are running on
>> stable kernels. I really want to clarify what each subsystem sees as
>> "sufficient" (and have that documented somewhere).
> 
> kernel.ci and 0-day and Linaro are starting to add the fs and mm tests
> to their test suites to address these issues (I think 0-day already has
> many of them).  So this is happening, but not quite obvious.  I know I
> keep asking Linaro about this :(

We're working on investments for LDCG[0] in 2019 that include kernel CI
changes for server use cases. Please keep us informed of what you folks
ultimately want to see, and I'll pass on to the steering committee too.

Ultimately I've been pushing for a kernel 0-day project for Arm. That's
probably going to require a lot of duplicated effort since the original
0-day project isn't open, but creating an open one could help everyone.

Jon.

[0] Linaro DataCenter Group (formerly "LEG")


Re: [PATCH v2 00/11] arch/x86: AMD QoS support

2018-11-02 Thread Jon Masters
On 10/5/18 4:55 PM, Moger, Babu wrote:

> The public specification for this feature is available at
> https://www.amd.com/system/files/TechDocs/56375_Quality_of_Service_Extensions.pdf

404 error


Re: [SCHEDULER] Performance drop in 4.19 compared to 4.18 kernel

2018-10-04 Thread Jon Masters
On 9/7/18 5:34 AM, Jirka Hladky wrote:

> We would also be more than happy to test the new patches for the
> performance - please let us know if you are interested.  We have a
> pool of 1 NUMA up to 8 NUMA boxes for that, both AMD and Intel,
> covering different CPU generations from Sandy Bridge till Skylake.

I've followed up internally on ensuring Arm is added to your mix. Can
get you suitable 1 and 2P systems for internal use.

Jon.




Re: [RFC 00/60] Coscheduling for Linux

2018-10-04 Thread Jon Masters
On 9/7/18 5:39 PM, Jan H. Schönherr wrote:

> The collective context switch from one coscheduled set of tasks to another
> -- while fast -- is not atomic. If a use-case needs the absolute guarantee
> that all tasks of the previous set have stopped executing before any task
> of the next set starts executing, an additional hand-shake/barrier needs to
> be added.

In case nobody else brought it up yet, you're going to need a handshake
to strengthen protection against L1TF attacks. Otherwise, there's still
a small window where an attack can occur during the reschedule. Perhaps
one could then cause this to happen artificially by repeatedly having a VM
do some kind of pause/mwait type operation that might do a reschedule.

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop


Re: [PATCH 2/2] x86/speculation: Provide application property based STIBP protection

2018-10-02 Thread Jon Masters
Quick reply: I agree, I'm just supporting this :)

-- 
Computer Architect


> On Oct 2, 2018, at 11:43, Jiri Kosina  wrote:
> 
> On Tue, 2 Oct 2018, Jon Masters wrote:
> 
>>> This patch provides an application property based spectre_v2
>>> protection with STIBP against attack from another app from
>>> a sibling hyper-thread.  For security sensitive non-dumpable
>>> app, STIBP will be turned on before switching to it for Intel
>>> processors vulnerable to spectre_v2.
>> 
>> A general comment. I think in practice this will be similar to the
>> speculative store buffer bypass (aka "variant 4") issue in terms of
>> opt-in mitigation. Many users won't want to take the performance hit of
>> having STIBP by default for peer threads. We should make sure that we
>> don't force users into a mitigation but retain an option. Whether it's
>> default-on or not can be debated, though I think the vendors lean toward
>> having default-off with an opt-in, and customers will probably agree. So
>> anyway, I encourage a pragmatic approach similar to that for SSBD.
> 
> Which is what Tim's patchset is implementing on top.
> 
> Thanks,
> 
> -- 
> Jiri Kosina
> SUSE Labs
> 


Re: [PATCH 2/2] x86/speculation: Provide application property based STIBP protection

2018-10-02 Thread Jon Masters
On 9/19/18 5:35 PM, Tim Chen wrote:
> This patch provides an application property based spectre_v2
> protection with STIBP against attack from another app from
> a sibling hyper-thread.  For security sensitive non-dumpable
> app, STIBP will be turned on before switching to it for Intel
> processors vulnerable to spectre_v2.

A general comment. I think in practice this will be similar to the
speculative store buffer bypass (aka "variant 4") issue in terms of
opt-in mitigation. Many users won't want to take the performance hit of
having STIBP by default for peer threads. We should make sure that we
don't force users into a mitigation but retain an option. Whether it's
default-on or not can be debated, though I think the vendors lean toward
having default-off with an opt-in, and customers will probably agree. So
anyway, I encourage a pragmatic approach similar to that for SSBD.
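The opt-in policy being argued for here can be modeled in a few lines. The following is a simplified userspace sketch of the context-switch decision, not the kernel's actual code; the mode and struct names are invented for illustration.

```c
#include <stdbool.h>

/* Simplified model of the decision at context switch: STIBP is applied
 * only when switching to a task that marked itself non-dumpable (i.e.
 * security sensitive), and only when the administrator opted in. All
 * identifiers are illustrative, not the kernel's. */
enum stibp_mode { STIBP_OFF, STIBP_OPT_IN, STIBP_FORCED };

struct task_model {
    bool dumpable;              /* false => treated as security sensitive */
};

static bool needs_stibp(enum stibp_mode mode, const struct task_model *next)
{
    switch (mode) {
    case STIBP_FORCED:
        return true;            /* always-on: maximal protection, max cost */
    case STIBP_OPT_IN:
        return !next->dumpable; /* only for apps that opted in */
    case STIBP_OFF:
    default:
        return false;           /* admin chose performance over STIBP */
    }
}
```

The default-off-with-opt-in stance in the mail corresponds to running this model in STIBP_OPT_IN mode.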

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop


Re: [RFC PATCH 00/11] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64

2018-09-13 Thread Jon Masters
Tnx

-- 
Computer Architect


> On Sep 13, 2018, at 11:52, Will Deacon  wrote:
> 
>> On Fri, Sep 07, 2018 at 02:36:08AM -0400, Jon Masters wrote:
>>> On 09/05/2018 08:28 AM, Will Deacon wrote:
>>>> On Tue, Sep 04, 2018 at 02:38:02PM -0400, Jon Masters wrote:
>>>>> On 08/24/2018 11:52 AM, Will Deacon wrote:
>>>>> 
>>>>> I hacked up this RFC on the back of the recent changes to the mmu_gather
>>>>> stuff in mainline. It's had a bit of testing and it looks pretty good so
>>>>> far.
>>>> 
>>>> I will request the server folks go and test this. You'll probably
>>>> remember a couple of parts we've seen where aggressive walker caches
>>>> ended up (correctly) seeing stale page table entries and we had all
>>>> manner of horrifically hard to debug problems. We have some fairly nice
>>>> reproducers that were able to find this last time that we can test.
>>> 
>>> Cheers, Jon, that would be very helpful. You're probably best off using
>>> my (rebasing) tlb branch rather than picking the RFC:
>>> 
>>>  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git tlb
>>> 
>>> Let me know if you'd prefer something stable (I can tag it with a date).
>> 
>> That would be useful. I've prodded each of the Arm server SoC vendors I
>> work with via our weekly call to have them each specifically check this.
>> A tag would be helpful to that effort I expect. They all claim to be
>> watching this thread now, so we'll see if they see cabbages here.
> 
> This is now all queued up in the (stable) arm64 for-next/core branch:
> 
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/core
> 
> so that's the best place to grab the patches.
> 
> Will


Re: [RFC PATCH 00/11] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64

2018-09-07 Thread Jon Masters
On 09/05/2018 08:28 AM, Will Deacon wrote:
> On Tue, Sep 04, 2018 at 02:38:02PM -0400, Jon Masters wrote:
>> On 08/24/2018 11:52 AM, Will Deacon wrote:
>>
>>> I hacked up this RFC on the back of the recent changes to the mmu_gather
>>> stuff in mainline. It's had a bit of testing and it looks pretty good so
>>> far.
>>
>> I will request the server folks go and test this. You'll probably
>> remember a couple of parts we've seen where aggressive walker caches
>> ended up (correctly) seeing stale page table entries and we had all
>> manner of horrifically hard to debug problems. We have some fairly nice
>> reproducers that were able to find this last time that we can test.
> 
> Cheers, Jon, that would be very helpful. You're probably best off using
> my (rebasing) tlb branch rather than picking the RFC:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git tlb
> 
> Let me know if you'd prefer something stable (I can tag it with a date).

That would be useful. I've prodded each of the Arm server SoC vendors I
work with via our weekly call to have them each specifically check this.
A tag would be helpful to that effort I expect. They all claim to be
watching this thread now, so we'll see if they see cabbages here.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [RFC PATCH 00/11] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64

2018-09-04 Thread Jon Masters
On 08/24/2018 11:52 AM, Will Deacon wrote:

> I hacked up this RFC on the back of the recent changes to the mmu_gather
> stuff in mainline. It's had a bit of testing and it looks pretty good so
> far.

I will request the server folks go and test this. You'll probably
remember a couple of parts we've seen where aggressive walker caches
ended up (correctly) seeing stale page table entries and we had all
manner of horrifically hard to debug problems. We have some fairly nice
reproducers that were able to find this last time that we can test.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [PATCH 2/3] ACPI: SPCR: Add support for AMD CT/SZ

2018-04-16 Thread Jon Masters
On 03/13/2018 08:36 PM, Daniel Kurtz wrote:
> AMD Carrizo / Stoneyridge use a DesignWare 8250 UART that uses a special
> earlycon setup handler to configure its input clock in order to compute
> baud rate divisor registers.
> Detect them by examining the OEMID field in the SPCR header, and
> then pass uart type amdcz to earlycon.

FYI I spoke with Microsoft some time ago about standardizing the ability
to convey non-standard baud rates in the SPCR. If you're interested in
driving, I can put you in touch with the right person over there.
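The detection mechanism the quoted patch describes is a simple match on the ACPI table header's OEMID. The sketch below illustrates that mechanism only; the struct is a stand-in for the real ACPI header, and the "AMDC" value is an assumed example, not copied from the patch.

```c
#include <string.h>
#include <stdbool.h>

/* Illustrative model: SPCR code recognizes the platform by comparing
 * the OEMID field of the ACPI table header, then hands earlycon the
 * "amdcz" uart type. The struct and the "AMDC" match value here are
 * assumptions for demonstration, not the kernel's definitions. */
struct acpi_header_model {
    char oem_id[6];      /* fixed-width, not NUL-terminated, per ACPI */
};

static bool is_amd_cz_spcr(const struct acpi_header_model *h)
{
    return memcmp(h->oem_id, "AMDC", 4) == 0;
}
```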

Jon.

-- 
Computer Architect


FLR on AER

2018-02-22 Thread Jon Masters
Hi Bjorn,

It looks like the AER driver won’t do a device FLR but instead will default to 
progressively bigger hammers. Am I missing something?
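The "progressively bigger hammers" escalation being asked about can be modeled as an ordering over reset kinds, with a per-device FLR as the smallest hammer when available. This is a toy model of the question, not the AER driver's actual recovery path; all names are invented.

```c
#include <stdbool.h>

/* Toy model of reset escalation: prefer the least disruptive reset the
 * device supports (FLR), then fall back to bigger hammers (link reset,
 * slot reset). Illustrative only; not the AER driver's code. */
enum reset_kind { RESET_NONE, RESET_FLR, RESET_LINK, RESET_SLOT };

static enum reset_kind next_reset(enum reset_kind tried, bool has_flr)
{
    switch (tried) {
    case RESET_NONE:
        return has_flr ? RESET_FLR : RESET_LINK; /* smallest hammer first */
    case RESET_FLR:
        return RESET_LINK;
    case RESET_LINK:
    default:
        return RESET_SLOT;
    }
}
```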

Jon.

-- 
Computer Architect | Sent from my 64-bit #ARM Powered phone

Re: Patch "[Variant 2/Spectre-v2] arm64: Implement branch predictor hardening for Falkor" has been added to the 4.14-stable tree

2018-02-19 Thread Jon Masters
On 02/14/2018 11:16 AM, Timur Tabi wrote:
> On Wed, Feb 14, 2018 at 7:53 AM,   wrote:
>>
>> This is a note to let you know that I've just added the patch titled
>>
>> [Variant 2/Spectre-v2] arm64: Implement branch predictor hardening for 
>> Falkor
>>
>> to the 4.14-stable tree which can be found at:
>> 
>> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>>
>> The filename of the patch is:
>>  arm64-implement-branch-predictor-hardening-for-falkor.patch
>> and it can be found in the queue-4.14 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let  know about it.
> 
> Please note that there is a follow-on patch, also from Shanker, that
> fixes this patch (it was slightly mangled when merged into 4.16-rc1):
> 
> https://www.spinics.net/lists/arm-kernel/msg633726.html
> 
> I would love for it to be included in 4.14.20, but it hasn't been
> merged into Linus' tree yet.  I will send a patch request when it does
> land in 4.16-rc2.

That has now landed in Linus's tree.

Jon.


Re: [PATCH] arm64: Make L1_CACHE_SHIFT configurable

2018-02-19 Thread Jon Masters
On 02/12/2018 07:17 PM, Florian Fainelli wrote:
> On 02/12/2018 04:10 PM, Timur Tabi wrote:
>> On 02/12/2018 05:57 PM, Florian Fainelli wrote:
>>> That is debatable, is there a good publicly available table of what the
>>> typical L1 cache line size is on ARMv8 platforms?

With a server hat on...

There are many ARMv8 server platforms that do 64b today, but future
designs are likely to head toward 128b (for a variety of reasons). Many
of the earlier designs were 64b because that's what certain other arches
were using in their server cores. I doubt Vulcan will remain a unique
and special case for very long. On the CCIX side of things, I've been
trying to push people to go with 128b lines in future designs too.
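For anyone wanting to check what a particular machine actually reports rather than assuming a compile-time L1_CACHE_SHIFT, glibc exposes the L1 data cache line size through sysconf(). A small userspace sketch:

```c
#include <unistd.h>

/* Query the reported L1 data cache line size at runtime via glibc's
 * sysconf(). Returns -1 or 0 when the platform does not report a
 * value. A userspace illustration, not a kernel-side change. */
static long l1_dcache_line_size(void)
{
    return sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
}
```

On most current arm64 server parts this reports 64; the mail's point is that future designs may move to 128.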

Jon.


Re: [PATCH 2/2] x86/speculation: Support "Enhanced IBRS" on future CPUs

2018-02-19 Thread Jon Masters
On 02/16/2018 07:10 AM, David Woodhouse wrote:
> On Fri, 2018-02-16 at 12:04 +0100, Paolo Bonzini wrote:
>> On 16/02/2018 11:21, David Woodhouse wrote:

>>> Even if the guest doesn't have/support IBRS_ALL, and is frobbing the
>>> (now emulated) MSR on every kernel entry/exit, that's *still* going to
>>> be a metric shitload faster than what it *thought* it was doing.

Is there any indication/log to the admin that VM doesn't know about
IBRS_ALL and is constantly uselessly writing to an emulated MSR?

While it's probably true that the overhead in time is similar to (or
better than) an actual IBRS MSR write, if the admin/user knows the VM
needs updating, then there's a fighting chance that they might do so.
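For context, the IBRS_ALL ("Enhanced IBRS") capability under discussion is advertised by bit 1 of the IA32_ARCH_CAPABILITIES MSR (0x10a); a guest that knows about it sets SPEC_CTRL.IBRS once instead of toggling it on every kernel entry/exit. The helper below just models the capability check; it is an illustrative sketch, not kernel code.

```c
#include <stdint.h>
#include <stdbool.h>

/* IBRS_ALL is bit 1 of IA32_ARCH_CAPABILITIES (MSR 0x10a), per Intel's
 * documentation. A guest unaware of this bit keeps writing the
 * (emulated) SPEC_CTRL MSR on every entry/exit, which is the mail's
 * logging concern. Sketch only. */
#define MSR_IA32_ARCH_CAPABILITIES 0x10a
#define ARCH_CAP_IBRS_ALL          (1ULL << 1)

static bool has_enhanced_ibrs(uint64_t arch_capabilities_msr)
{
    return (arch_capabilities_msr & ARCH_CAP_IBRS_ALL) != 0;
}
```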

Jon.


Re: [PATCH v3] ACPI / tables: Add IORT to injectable table list

2018-02-19 Thread Jon Masters
On 02/02/2018 07:12 AM, Wang, Dongsheng wrote:
> Hey, Hanjun,
> 
> On 2018/2/2 19:54:24, "Hanjun Guo"  wrote:
> 
>> On 2018/2/2 18:25, Yang Shunyong wrote:
>>> Loading IORT table from initrd is used to fix firmware IORT defects.
>>
>> I don't think this fix "firmware defects", it just for debug purpose,
>> we will not use that for production purpose, right? I think above line
>> can be removed.
>>
> I think the upgrade feature is not only for debug. It's an important
> way to fix bugs that come from the firmware.
> 
> Documentation/acpi/initrd_table_override.txt

I think we should message that this is for debug/development use.
Ultimately, it's not the way to address firmware problems. Firmware
needs to address that by shipping the correct(ed) tables :)

-- 
Computer Architect


Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

2018-02-19 Thread Jon Masters
Hi Bjorn, Rafael, others,

On 02/15/2018 06:39 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2018 at 10:57:25PM +0100, Rafael J. Wysocki wrote:
>> On Wednesday, February 14, 2018 9:16:53 PM CET Bjorn Helgaas wrote:
>>> On Wed, Feb 14, 2018 at 04:58:08PM +0530, George Cherian wrote:
 On 02/13/2018 08:39 PM, Bjorn Helgaas wrote:
> On Fri, Feb 02, 2018 at 07:00:46AM +, George Cherian wrote:
>> The PCIe Controller on Cavium ThunderX2 processors does not
>> respond to downstream CFG/ECFG cycles when root port is
>> in power management D3-hot state.
>
> I think you're talking about the CPU initiating a config cycle to
> a device below the root port, right?
 Yes
>>>
>>> If a bridge, e.g., a Root Port in your case, is in D3hot, we should be
>>> able to access config space of the bridge itself, but the secondary
>>> bus will be in B2 or B3 and we won't be able to access config space
>>> for any devices below the bridge.  This is true for *all* bridges, not
>>> just this Cavium Root Port.
>>
>> Right.
>>
>> But AFAICS config space reads from devices that aren't there (which
>> effectively is what happens if the bridge is in D3hot) are at least
>> expected to return all ones.
> 
> Yes.  AIUI, the PCIe spec doesn't actually *require* all ones

Indeed. This was my reading of the spec last year when I originally
discovered this bug (and suggested the temporary bandaid of the runtime
kernel parameter to disable pm for the port). I've seen this on certain
Cavium ThunderX2 systems in specific configurations, but in my debug
sessions it seemed that the problem was we're expecting all 1s and we
don't get those, so we then ultimately get an SError and go to lunch.
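The behaviour being described — software expecting all-ones from a config read that cannot reach a device below a D3hot root port — can be modeled simply. The sketch below is a userspace stand-in for the thread's expectation, not the kernel's or the quirk's actual code.

```c
#include <stdint.h>

/* Model of the expectation in this thread: a config read through a
 * bridge (root port) in D3hot cannot reach the downstream device, and
 * software expects a synthesized all-ones value (~0) rather than an
 * SError. Structs and power-state names are simplified stand-ins. */
enum pci_power_model { MODEL_D0, MODEL_D3HOT };

struct bridge_model {
    enum pci_power_model state;
};

static uint32_t config_read_below(const struct bridge_model *bridge,
                                  uint32_t device_value)
{
    if (bridge->state != MODEL_D0)
        return ~0u;   /* downstream bus unreachable: synthesize all ones */
    return device_value;
}
```

The erratum is precisely that the hardware fails to produce the all-ones case this model assumes.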



> But from the discussion below, it sounds like this may have helped
> uncover a more serious Linux bug, i.e., we don't resume a device
> before trying to use it.

I suspected this too, but didn't get chance to followup. I had expected
the above would have been posted many months ago.

Jon.

-- 
Computer Architect


Re: [PATCH v4 00/17] arm64: Add SMCCC v1.1 support and CVE-2017-5715 (Spectre variant 2) mitigation

2018-02-15 Thread Jon Masters
Hi Marc, all,

On 02/06/2018 12:56 PM, Marc Zyngier wrote:
> ARM has recently published a SMC Calling Convention (SMCCC)
> specification update[1] that provides an optimised calling convention
> and optional, discoverable support for mitigating CVE-2017-5715. ARM
> Trusted Firmware (ATF) has already gained such an implementation[2].

I'm probably just missing something, but does this end up reported
somewhere conveniently user visible? In particular, if the new SMC is
*not* provided, does the user end up easily seeing this?
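The discovery sequence the series implements — firmware must advertise SMCCC v1.1 or later and then report ARCH_WORKAROUND_1 as implemented before the kernel uses it — can be condensed as below. Function IDs follow the SMCCC specification; the helper itself is an illustrative model taking the firmware's answers as plain values, not the kernel's conduit code.

```c
#include <stdint.h>
#include <stdbool.h>

/* SMCCC versions encode as (major << 16) | minor, so v1.1 is 0x10001.
 * SMCCC_ARCH_FEATURES queries whether a given function, here
 * ARCH_WORKAROUND_1 (0x80008000), is implemented; a negative result
 * means it is not. Sketch of the discovery logic only. */
#define SMCCC_VERSION_1_1       0x10001
#define SMCCC_ARCH_WORKAROUND_1 0x80008000u

static bool have_bp_hardening(uint32_t smccc_version,
                              int64_t arch_features_result)
{
    if (smccc_version < SMCCC_VERSION_1_1)
        return false;   /* pre-1.1 firmware: nothing discoverable */
    return arch_features_result >= 0;
}
```

The question in the mail is whether the false branches here are surfaced anywhere the user can easily see.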

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

2018-01-28 Thread Jon Masters
Hi Peter, David, all,

First a quick note on David's earlier comment, about this optimization
being still up for debate. The problem with this optimization as-is is
that it doesn't protect userspace-to-userspace unless applications are
rebuilt and we get the infrastructure to handle that (ELF, whatever).

But...

On 01/21/2018 06:22 AM, Peter Zijlstra wrote:
> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen 
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable.  This protects high value processes like gpg
>> better, without having too high performance overhead.
> 
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?

More importantly, rebuilding the world introduces a lot of challenges
that need to be discussed heavily before they happen (I would like to
see someone run a session at one of the various upcoming events on
userspace, I've already prodded a few people to nudge that forward). In
particular, we don't have the infrastructure in gcc/glibc to dynamically
patch userspace call sites to enable/disable retpolines.

We discussed nasty hacks last year (I even suggested an ugly kernel
exported page similar to VDSO that could be implementation patched for
different uarches), but the bottom line is there isn't anything in place
to provide a similar userspace experience to what the kernel can do, and
that would need to be solved in addition to the ELF/ABI bits.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?
> 
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So I would just caution that doing this isn't necessarily bad, but it's
far more than just ELF bits and rebuilding. Once userspace is rebuilt
with un-nopable retpolines, they're there whether you need them on
$future_hardware or not, and that fancy branch predictor is useless. So
we really need a way to allow for userspace patchable calls, or at least
some kind of plan before everyone runs away with rebuilding.

(unless they're embedded/Gentoo/whatever...have fun in that case)

Jon.

P.S. This is why for certain downstream distros you'll see IBPB use like
prior to this patch - it'll prevent certain attacks that can't be
otherwise mitigated without going and properly solving the tools issue.
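The ELF-flag idea Peter floats, combined with the non-dumpable heuristic, amounts to a small decision function at context switch. The sketch below models that decision only; the "spectre-safe" flag is the hypothetical opt-out under discussion, not an existing ABI.

```c
#include <stdbool.h>

/* Model of the IBPB-on-switch decision: skip the barrier when the next
 * task has (hypothetically) declared itself spectre-safe via an ELF
 * flag or personality bit, otherwise issue it for high-value
 * (non-dumpable) tasks. Illustrative names only. */
static bool needs_ibpb_on_switch(bool next_nondumpable, bool next_spectre_safe)
{
    if (next_spectre_safe)
        return false;           /* audited/retpolined: barrier unnecessary */
    return next_nondumpable;    /* protect security-sensitive tasks */
}
```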



Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

2018-01-28 Thread Jon Masters
Hi Peter, David, all,

First a quick note on David's earlier comment, about this optimization
being still up for debate. The problem with this optimization as-is is
that it doesn't protect userspace-to-userspace unless applications are
rebuilt and we get the infrastructure to handle that (ELF, whatever).

But...

On 01/21/2018 06:22 AM, Peter Zijlstra wrote:
> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen 
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable.  This protects high value processes like gpg
>> better, without having too high performance overhead.
> 
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?

More importantly, rebuilding the world introduces a lot of challenges
that need to be discussed heavily before they happen (I would like to
see someone run a session at one of the various upcoming events on
userspace, I've already prodded a few people to nudge that forward). In
particular, we don't have the infrastructure in gcc/glibc to dynamically
patch userspace call sites to enable/disable retpolines.

We discussed nasty hacks last year (I even suggested an ugly kernel
exported page similar to VDSO that could be implementation patched for
different uarches), but the bottom line is there isn't anything in place
to provide a similar userspace experience to what the kernel can do, and
that would need to be solved in addition to the ELF/ABI bits.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?
> 
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So I would just caution that doing this isn't necessarily bad, but it's
far more than just ELF bits and rebuilding. Once userspace is rebuilt
with un-nopable retpolines, they're there whether you need them on
$future_hardware or not, and that fancy branch predictor is useless. So
we really need a way to allow for userspace patchable calls, or at least
some kind of plan before everyone runs away with rebuilding.

(unless they're embedded/Gentoo/whatever...have fun in that case)

Jon.

P.S. This is why, in certain downstream distros, you'll see IBPB used much
as in this patch - it prevents certain attacks that can't otherwise be
mitigated without going and properly solving the tools issue.



Re: [patch V2 1/2] sysfs/cpu: Add vulnerability folder

2018-01-28 Thread Jon Masters
On 01/07/2018 04:48 PM, Thomas Gleixner wrote:

> +#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES
> +
> +ssize_t __weak cpu_show_meltdown(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "Not affected\n");
> +}
> +
> +ssize_t __weak cpu_show_spectre_v1(struct device *dev,
> +struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "Not affected\n");
> +}
> +
> +ssize_t __weak cpu_show_spectre_v2(struct device *dev,
> +struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "Not affected\n");
> +}

Just wondering aloud (after the merge) here but shouldn't the default be
"unknown", at least for Spectre? It's pervasive enough.

Jon.



Re: [PATCH v3 1/2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-22 Thread Jon Masters
On 01/22/2018 06:33 AM, Will Deacon wrote:
> On Fri, Jan 19, 2018 at 04:22:47AM -0800, Jayachandran C wrote:
>> Use PSCI based mitigation for speculative execution attacks targeting
>> the branch predictor. We use the same mechanism as the one used for
>> Cortex-A CPUs, we expect the PSCI version call to have a side effect
>> of clearing the BTBs.
>>
>> Signed-off-by: Jayachandran C 
>> ---
>>  arch/arm64/kernel/cpu_errata.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index 70e5f18..45ff9a2 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -338,6 +338,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>>  .capability = ARM64_HARDEN_BP_POST_GUEST_EXIT,
>>  MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR_V1),
>>  },
>> +{
>> +.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
>> +MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
>> +.enable = enable_psci_bp_hardening,
>> +},
>> +{
>> +.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
>> +MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
>> +.enable = enable_psci_bp_hardening,
>> +},
>>  #endif
> 
> Thanks.
> 
> Acked-by: Will Deacon 

Thanks. I have separately asked for a specification tweak to allow us to
discover whether firmware has been augmented to provide the necessary
support that we need. That applies beyond Cavium.

(for now in RHEL, we've asked the vendors for a temporary patch so that we
can match DMI or other data later in boot and warn users)

Jon.



Re: [PATCH v3 2/2] arm64: Turn on KPTI only on CPUs that need it

2018-01-22 Thread Jon Masters
On 01/22/2018 06:41 AM, Will Deacon wrote:
> On Fri, Jan 19, 2018 at 04:22:48AM -0800, Jayachandran C wrote:
>> Whitelist Broadcom Vulcan/Cavium ThunderX2 processors in
>> unmap_kernel_at_el0(). These CPUs are not vulnerable to
>> CVE-2017-5754 and do not need KPTI when KASLR is off.
>>
>> Signed-off-by: Jayachandran C 
>> ---
>>  arch/arm64/kernel/cpufeature.c | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 647d44b..fb698ca 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -866,6 +866,13 @@ static bool unmap_kernel_at_el0(const struct 
>> arm64_cpu_capabilities *entry,
>>  if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>>  return true;
>>  
>> +/* Don't force KPTI for CPUs that are not vulnerable */
>> +switch (read_cpuid_id() & MIDR_CPU_MODEL_MASK) {
>> +case MIDR_CAVIUM_THUNDERX2:
>> +case MIDR_BRCM_VULCAN:
>> +return false;
>> +}
>> +
>>  /* Defer to CPU feature registers */
>>  return !cpuid_feature_extract_unsigned_field(pfr0,
>>   ID_AA64PFR0_CSV3_SHIFT);
> 
> We'll need to re-jig this to work properly with big/little because this is
> only called once, but that's ok for now:
> 
> Acked-by: Will Deacon 
> 
> Suzuki has a series reworking much of the cpufeatures code so that we can
> do this properly for 4.17.

Thanks, much appreciated.

Jon.



Re: [PATCH v3 1/2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-19 Thread Jon Masters
On 01/19/2018 07:22 AM, Jayachandran C wrote:
> Use PSCI based mitigation for speculative execution attacks targeting
> the branch predictor. We use the same mechanism as the one used for
> Cortex-A CPUs, we expect the PSCI version call to have a side effect
> of clearing the BTBs.
> 
> Signed-off-by: Jayachandran C 
> ---
>  arch/arm64/kernel/cpu_errata.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 70e5f18..45ff9a2 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -338,6 +338,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>   .capability = ARM64_HARDEN_BP_POST_GUEST_EXIT,
>   MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR_V1),
>   },
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
> + .enable = enable_psci_bp_hardening,
> + },
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
> + .enable = enable_psci_bp_hardening,
> + },
>  #endif
>   {
>   }
> 

Both of these patches seem reasonable to me.



Re: [PATCH v2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-18 Thread Jon Masters
Hi JC, Will,

On 01/18/2018 06:28 PM, Jayachandran C wrote:
> On Thu, Jan 18, 2018 at 01:27:15PM -0500, Jon Masters wrote:
>> On 01/18/2018 12:56 PM, Jayachandran C wrote:
>>> On Thu, Jan 18, 2018 at 01:53:55PM +, Will Deacon wrote:

>>> I think in this case we have to choose between crashing or giving a false
>>> sense of security when a guest compiled with HARDEN_BRANCH_PREDICTOR is
>>> booted on an hypervisor that does not support hardening. Crashing maybe
>>> a reasonable option.
>>
>> Crashing is a completely unreasonable option and is totally
>> unacceptable. We never do this in enterprise, period.
>>
>> It's reasonable to give an output in dmesg that a system isn't hardened,
>> but it's not reasonable to crash. On x86, we added a new qemu machine
>> type for those guests that would have IBRS exposed, and ask users to
>> switch that on explicitly, but even if they boot the new kernels on
>> unpatched infrastructure, we'll detect the lack of the branch predictor
>> control interface and just log that.
>>
>> The exact same thing should happen on ARM.
> 
> With the current patchset from ARM, there is no way of detecting if the
> hypervisor is hardened or not, to provide the warning.  The only other
> option I have is to call get version blindly and provide a false sense of
> security.

Agreed that (unless I'm missing something) the current arm patchset
doesn't have an enumeration mechanism to see whether firmware supports
the branch predictor hardening or not.

On the three other affected arches we're tracking, there's an
enumeration mechanism. On x86, there's a new set of CPUID bits. On
POWER, there's a new hcall that tells us whether the millicode supports
what we need, and on z there's a new facility code we can test for that
is also passed into VMs. So we need to have a similar enumeration
mechanism on ARM that is passed into guests as well.

> Since both options are bad, I don't have a good solution here. If RedHat
> has a preference here on what would be better, I can go with that.

We need an enumeration mechanism that determines whether the hypervisor
is patched. In the absence of that, blindly calling in and hoping that
the firmware is updated is better than nothing. I'll look to see if
there's a generic upstream solution for enumeration that I've missed (or
that can be added, perhaps a new SMC enumeration mechanism). If there
isn't a short term fix, we'll work with you guys directly to add
something RHEL specific by checking some firmware version somehow.

Jon.



Re: [v2,03/11] arm64: Take into account ID_AA64PFR0_EL1.CSV3

2018-01-18 Thread Jon Masters
On 01/09/2018 05:00 AM, Will Deacon wrote:
> On Mon, Jan 08, 2018 at 08:06:27PM -0800, Jayachandran C wrote:
>> On Mon, Jan 08, 2018 at 05:51:00PM +, Will Deacon wrote:
>>> On Mon, Jan 08, 2018 at 09:40:17AM -0800, Jayachandran C wrote:
 On Mon, Jan 08, 2018 at 09:20:09AM +, Marc Zyngier wrote:
> On 08/01/18 07:24, Jayachandran C wrote:
>> diff --git a/arch/arm64/kernel/cpufeature.c 
>> b/arch/arm64/kernel/cpufeature.c
>> index 19ed09b..202b037 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -862,6 +862,13 @@ static bool unmap_kernel_at_el0(const struct 
>> arm64_cpu_capabilities *entry,
>> return __kpti_forced > 0;
>> }
>>  
>> +   /* Don't force KPTI for CPUs that are not vulnerable */
>> +   switch (read_cpuid_id() & MIDR_CPU_MODEL_MASK) {
>> +   case MIDR_CAVIUM_THUNDERX2:
>> +   case MIDR_BRCM_VULCAN:
>> +   return false;
>> +   }
>> +
>> /* Useful for KASLR robustness */
>> if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>> return true;
>>
>
> KPTI is also an improvement for KASLR. Why would you deprive a user of
> the choice to further secure their system?

 The user has a choice with kpti= at the kernel command line, so we are
 not depriving the user of a choice. KASLR is expected to be enabled by
 distributions, and KPTI will be enabled by default as well.

 On systems that are not vulnerable to variant 3, this is an unnecessary
 overhead.
>>>
>>> KASLR can be bypassed on CPUs that are not vulnerable to variant 3 simply
>>> by timing how long accesses to kernel addresses from EL0 take -- please read
>>> the original KAISER paper for details about that attack on x86. kpti
>>> mitigates that. If you don't care about KASLR, don't enable it (arguably
>>> it's useless without kpti).
>>
>> The code above assumes that all ARM CPUs (now and future) will be vulnerable
>> to timing attacks that can bypass KASLR. I don't think that is a correct
>> assumption to make.
> 
> Well, the code is assuming that the difference between a TLB hit and a miss
> can be measured and that permission faulting entries can be cached in the
> TLB. I think that's a safe assumption for the moment. You can also disable
> kaslr on the command line and at compile-time if you don't want to use it,
> and the same thing applies to kpti. I really see this more as user
> preference, rather than something that should be keyed off the MIDR and we
> already provide those controls via the command line.
> 
> To be clear: I'll take the MIDR whitelisting, but only after the KASLR check
> above.
> 
>> If ThunderX2 is shown to be vulnerable to any timing based attack we can
>> certainly move the MIDR check after the check for the CONFIG_RANDOMIZE_BASE.
>> But I don't think that is the case now, if you have any PoC code to check
>> this I can run on the processor and make the change.
> 
> I haven't tried, but if you have a TLB worth its salt, I suspect you can
> defeat kaslr by timing prefetches or faulting loads to kernel addresses.
> 
>> It is pretty clear that we need a whitelist check either before or after the
>> CONFIG_RANDOMIZE_BASE check.
> 
> Please send a patch implementing this after the check.

JC: what's the plan here from Cavium? I didn't see such a patch (but
might have missed it). I've asked that within Red Hat we follow the same
logic as on x86 and default to disabling (k)pti on hardware known not to
be vulnerable to that explicit attack. Sure, KASLR bypass is "not
good"(TM) and there are ton(ne)s of new ways to do that found all the
time, but the performance hit is non-zero, and there is a difference
between breaking randomization and leaking cache data. HPC folks are
among those who are going to come asking why they need to turn off PTI
all over the place. The x86 equivalent would be that one vendor always
has PTI enabled, while the other enables it only if the user explicitly
requests it at boot time.

Jon.


Re: [PATCH v2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-18 Thread Jon Masters
On 01/18/2018 12:56 PM, Jayachandran C wrote:
> On Thu, Jan 18, 2018 at 01:53:55PM +, Will Deacon wrote:
>> Hi JC,
>>
>> On Tue, Jan 16, 2018 at 03:45:54PM -0800, Jayachandran C wrote:
>>> On Tue, Jan 16, 2018 at 04:52:53PM -0500, Jon Masters wrote:
>>>> On 01/09/2018 07:47 AM, Jayachandran C wrote:
>>>>
>>>>> Use PSCI based mitigation for speculative execution attacks targeting
>>>>> the branch predictor. The approach is similar to the one used for
>>>>> Cortex-A CPUs, but in case of ThunderX2 we add another SMC call to
>>>>> test if the firmware supports the capability.
>>>>>
>>>>> If the secure firmware has been updated with the mitigation code to
>>>>> invalidate the branch target buffer, we use the PSCI version call to
>>>>> invoke it.
>>>>
>>>> What's the status of this patch currently? Previously you had suggested
>>>> to hold while the SMC got standardized, but then you seemed happy with
>>>> pulling in. What's the latest?
>>>
>>> My understanding is that the SMC standardization is being worked on
>>> but will take more time, and the KPTI current patchset will go to
>>> mainline before that.
>>>
>>> Given that, I would expect arm64 maintainers to pick up this patch for
>>> ThunderX2, but I have not seen any comments so far.
>>>
>>> Will/Marc, please let me know if you are planning to pick this patch
>>> into the KPTI tree.
>>
>> Are you really sure you want us to apply this? If we do, then you can't run
>> KVM guests anymore because your IMPDEF SMC results in an UNDEF being
>> injected (crash below).
>>
>> I really think that you should just hook up the enable_psci_bp_hardening
>> callback like we've done for the Cortex CPUs. We can optimise this later
>> once the SMC standarisation work has been completed (which is nearly final
>> now and works in a backwards-compatible manner).
> 
> I think Marc's patch here:
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/kpti=d35e77fae4b70331310c3bc1796bb43b93f9a85e
> handles returning for undefined smc calls in guest.
> 
> I think in this case we have to choose between crashing or giving a false
> sense of security when a guest compiled with HARDEN_BRANCH_PREDICTOR is
> booted on an hypervisor that does not support hardening. Crashing maybe
> a reasonable option.

Crashing is a completely unreasonable option and is totally
unacceptable. We never do this in enterprise, period.

It's reasonable to give an output in dmesg that a system isn't hardened,
but it's not reasonable to crash. On x86, we added a new qemu machine
type for those guests that would have IBRS exposed, and ask users to
switch that on explicitly, but even if they boot the new kernels on
unpatched infrastructure, we'll detect the lack of the branch predictor
control interface and just log that.

The exact same thing should happen on ARM.

Jon.


Re: [PATCH v2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-17 Thread Jon Masters
On 01/16/2018 06:45 PM, Jayachandran C wrote:
> On Tue, Jan 16, 2018 at 04:52:53PM -0500, Jon Masters wrote:
>> On 01/09/2018 07:47 AM, Jayachandran C wrote:
>>
>>> Use PSCI based mitigation for speculative execution attacks targeting
>>> the branch predictor. The approach is similar to the one used for
>>> Cortex-A CPUs, but in case of ThunderX2 we add another SMC call to
>>> test if the firmware supports the capability.
>>>
>>> If the secure firmware has been updated with the mitigation code to
>>> invalidate the branch target buffer, we use the PSCI version call to
>>> invoke it.
>>
>> What's the status of this patch currently? Previously you had suggested
>> to hold while the SMC got standardized, but then you seemed happy with
>> pulling in. What's the latest?
> 
> My understanding is that the SMC standardization is being worked on
> but will take more time, and the KPTI current patchset will go to
> mainline before that.
> 
> Given that, I would expect arm64 maintainers to pick up this patch for
> ThunderX2, but I have not seen any comments so far.
> 
> Will/Marc, please let me know if you are planning to pick this patch
> into the KPTI tree.

We've pulled mitigations for QCOM Falkor into our internal development
branch (for future releases; this isn't about existing stuff), but we
can't pull in mitigations for other vendors until they're upstream, and
this patch isn't in any tree we track yet.

Therefore, I encourage all of the vendors to get this upstream. Until
that's true, it will be difficult to continue carrying out-of-tree bits.

Jon.



Re: [PATCH 2/6] s390: implement nospec_[load|ptr]

2018-01-17 Thread Jon Masters
On 01/17/2018 07:41 AM, Jiri Kosina wrote:
> On Wed, 17 Jan 2018, Martin Schwidefsky wrote:
> 
>> Implement nospec_load() and nospec_ptr() for s390 with the new
>> gmb() barrier between the boundary condition and the load that
>> may not be done speculatively.

Thanks for the patches, Martin et al. I tested various earlier versions
and will run these latest ones through some tests and add a tested by.

> FWIW the naming seems to be changing constantly. The latest patchset from 
> Dan Williams [1] uses ifence_...().
> 
> [1] 
> lkml.kernel.org/r/151586744180.5820.13215059696964205856.st...@dwillia2-desk3.amr.corp.intel.com

This is getting a little silly. Not to bikeshed this to death, but
obviously gmb (what was that ever supposed to stand for, global?) was
the wrong name. We favored seb (speculative execution barrier), etc.
Still, "ifence"? What is that supposed to mean? That sounds very
architecture specific vs. what we're actually trying to ensure, which is
that we don't speculatively load a pointer.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH v2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-16 Thread Jon Masters
On 01/09/2018 07:47 AM, Jayachandran C wrote:

> Use PSCI based mitigation for speculative execution attacks targeting
> the branch predictor. The approach is similar to the one used for
> Cortex-A CPUs, but in case of ThunderX2 we add another SMC call to
> test if the firmware supports the capability.
> 
> If the secure firmware has been updated with the mitigation code to
> invalidate the branch target buffer, we use the PSCI version call to
> invoke it.

What's the status of this patch currently? Previously you had suggested
holding off while the SMC got standardized, but then you seemed happy with
pulling it in. What's the latest?

Jon.



Re: [PATCH v2] arm64: Branch predictor hardening for Cavium ThunderX2

2018-01-16 Thread Jon Masters
On 01/09/2018 07:47 AM, Jayachandran C wrote:
> Use PSCI based mitigation for speculative execution attacks targeting
> the branch predictor. The approach is similar to the one used for
> Cortex-A CPUs, but in case of ThunderX2 we add another SMC call to
> test if the firmware supports the capability.
> 
> If the secure firmware has been updated with the mitigation code to
> invalidate the branch target buffer, we use the PSCI version call to
> invoke it.

What's the status of this patch for TX2? Previously you were holding off
for the new SMC number, but I think the consensus was to pull this in now?

Jon.

-- 
Computer Architect


Re: Improve retpoline for Skylake

2018-01-15 Thread Jon Masters
On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
> On Fri, 12 Jan 2018, Andi Kleen wrote:
>>> Skylake still loses if it takes an SMI, right? 
>>
>> SMMs are usually rare, especially on servers, and are usually
>> not very predictible, and even if you have
> 
> FWIW, a data point: SMIs can be generated on demand by userspace on
> thinkpad laptops, but they will be triggered from within a kernel
> context.  I very much doubt this is a rare pattern...

Sure. Just touch some "legacy" hardware that the vendor emulates in a
nasty SMI handler. It's definitely not acceptable to assume that SMIs
can't be generated under the control of some malicious user code.

Our numbers on Skylake weren't bad, and there seem to be all kinds of
corner cases, so again, it seems as if IBRS is the safest choice.

Jon.



Re: [PATCH 09/11] powerpc/64s: Allow control of RFI flush via sysfs

2018-01-09 Thread Jon Masters
On 01/09/2018 03:05 AM, Greg KH wrote:
> On Tue, Jan 09, 2018 at 01:06:23AM -0500, Jon Masters wrote:
>> Knowing that the IBM team was going to post with this sysfs interface,
>> our trees contain the rfi_flush file. I mentioned it to some folks on
>> this end (because we know we don't want to add things in sysfs
>> generally, debugfs is a good substitute, per Andrea, and I raised this
>> with him yesterday as a concern in the backport here) but in the end it
>> seemed reasonable to pull this in because it was what got posted, and as
>> Michael says, it's gone into other distro kernels beyond just ours.
> 
> What distro kernels end up enabling does not really reflect on what we
> end up doing in mainline.  The api for this should NOT be arch-specific
> if at all possible, that way lies madness.  Do you want to write
> userspace tools to handle the 60+ different arch implementations?
> 
> Don't let the fragmentation problems of the period in which no one was
> allowed to talk to each other, result in a unchangable mess, that would
> be insane.

Totally fine :) Just saying we tried to do reasonable things with what
we had. Whatever happens upstream in the end is, of course, what we'll
make sure fits into updates that go into the likes of RHEL.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH 09/11] powerpc/64s: Allow control of RFI flush via sysfs

2018-01-08 Thread Jon Masters
On 01/08/2018 05:09 PM, Michael Ellerman wrote:
> Thomas Gleixner  writes:
> 
>> On Tue, 9 Jan 2018, Michael Ellerman wrote:
>>
>> Sorry, I wasn't aware about your efforts and did not cc you. I've just
>> queued a more generic sysfs interface for this whole mess:
> 
> No worries.
> 
>> https://lkml.kernel.org/r/20180107214913.096657...@linutronix.de
>>
>> It should be simple to extend for write and it would be great if all
>> affected architectures could share it.
> 
> As you say this has all been a bit of a mess

Indeed. All of us wish this had gone very differently.

I've been testing various versions of these patches since before the
holidays. For those doing backports to older kernels, note that the
IBM team added OOL (Out Of Line) exception handlers and reworked all of
that code in the years since older kernels (e.g. 3.10), so you might
hit problems on those if you enable the debug entry. I've got notes on
how to backport the OOL exceptions to older kernels if anyone cares.

> and as a result we already have people running kernels with this patch,
> so we don't want to remove the 'rfi_flush' file.

Knowing that the IBM team was going to post with this sysfs interface,
our trees contain the rfi_flush file. I mentioned it to some folks on
this end (because we know we don't want to add things in sysfs
generally, debugfs is a good substitute, per Andrea, and I raised this
with him yesterday as a concern in the backport here) but in the end it
seemed reasonable to pull this in because it was what got posted, and as
Michael says, it's gone into other distro kernels beyond just ours.

> But we will certainly add support on powerpc for the files you have
> created, in addition to 'rfi_flush'.

Thanks,

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH v2 07/11] arm64: Add skeleton to harden the branch predictor against aliasing attacks

2018-01-07 Thread Jon Masters
On 01/05/2018 08:12 AM, Will Deacon wrote:

> Aliasing attacks against CPU branch predictors can allow an attacker to
> redirect speculative control flow on some CPUs and potentially divulge
> information from one context to another.
> 
> This patch adds initial skeleton code behind a new Kconfig option to
> enable implementation-specific mitigations against these attacks for
> CPUs that are affected.

Thanks to Qualcomm for the (typically prompt and immediate) followup. As
a reminder to the other usual server suspects (all of whom we've spoken
with about mitigations for this, so we know there are things coming), I'm
expecting to see your patches for this hit the list within the next 48
hours. You know who you are, and I'll be doing the rounds over the next
24 hours to check your status as to when you'll be posting these.

Thanks,

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Jon Masters
On 01/04/2018 07:54 PM, Thomas Gleixner wrote:
> On Thu, 4 Jan 2018, Jon Masters wrote:
>> P.S. I've an internal document where I've been tracking "nice to haves"
>> for later, and one of them is whether it makes sense to tag binaries as
>> "trusted" (e.g. extended attribute, label, whatever). It was something I
>> wanted to bring up at some point as potentially worth considering.
> 
> Scratch that. There is no such thing as a trusted binary.

I agree with your sentiment, but for those mitigations that carry a
significant performance overhead (for example IBRS at the moment, and on
some other architectures where we might not end up with retpolines)
there /could/ be some value in leaving them on by default but allowing a
sysadmin to decide to trust a given application/container and accept the
risk. Sure, it's selectively weakened security, I get that. I am not
necessarily advocating this, just suggesting it be discussed.

[ I also totally get that you can extend variant 2 to have any
application that interacts with another abuse it (even over a pipe or a
socket, etc. provided they share the same cache and take untrusted data
that can lead to some kind of load within a speculation window), and
there are a ton of ways to still cause an attack in that case. ]

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Jon Masters
On 01/04/2018 02:57 PM, Jon Masters wrote:
> + Jeff Law, Nick Clifton
> 
> On 01/04/2018 03:20 AM, Woodhouse, David wrote:
>> On Thu, 2018-01-04 at 03:11 +0100, Paolo Bonzini wrote:
>>> On 04/01/2018 02:59, Alan Cox wrote:
>>>>> But then, exactly because the retpoline approach adds quite some cruft
>>>>> and leaves something to be desired, why even bother?
>>>>
>>>> Performance
>>>
>>> Dunno.  If I care about mitigating this threat, I wouldn't stop at
>>> retpolines even if the full solution has pretty bad performance (it's
>>> roughly in the same ballpark as PTI).  But if I don't care, I wouldn't
>>> want retpolines either, since they do introduce a small slowdown (10-20
>>> cycles per indirect branch, meaning that after a thousand such papercuts
>>> they become slower than the full solution).
>>>
>>> A couple manually written asm retpolines may be good as mitigation to
>>> block the simplest PoCs (Linus may disagree), but patching the compiler,
>>> getting alternatives right, etc. will take a while.  The only redeeming
>>> grace of retpolines is that they don't require a microcode update, but
>>> the microcode will be out there long before these patches are included
>>> and trickle down to distros...  I just don't see the point in starting
>>> from retpolines or drawing the line there.
>>
>> No, really. The full mitigation with the microcode update and IBRS
>> support is *slow*. Horribly slow.
> 
> It is horribly slow, though the story changes with CPU generation as
> others noted (and with what needs disabling in the microcode). We did
> various analyses of the retpoline patches, including benchmarks, and we
> decided that the fastest and safest approach for Tue^W yesterday was to
> use the new MSRs, especially in light of the corner cases we would need
> to address for an empty RSB, etc. I'm adding Jeff Law because he and the
> tools team have done analysis on this and he may have thoughts.
> 
> There's also a cross-architecture concern here in that different
> solutions are needed across architectures. Retpolines are not endorsed
> or recommended by every architecture vendor at this time. It's important
> to make sure the necessary cross-vendor discussion happens now that it
> can happen in the open.
> 
> Longer term, it'll be good to see BTBs tagged using the full address
> space (including any address space IDs...) in future silicon.

P.S. I've an internal document where I've been tracking "nice to haves"
for later, and one of them is whether it makes sense to tag binaries as
"trusted" (e.g. extended attribute, label, whatever). It was something I
wanted to bring up at some point as potentially worth considering.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Jon Masters
On 01/04/2018 01:33 PM, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 3:26 AM, Pavel Machek  wrote:
>> On Wed 2018-01-03 15:51:35, Linus Torvalds wrote:
>>>
>>> A *competent* CPU engineer would fix this by making sure speculation
>>> doesn't happen across protection domains. Maybe even a L1 I$ that is
>>> keyed by CPL.
>>
>> Would that be enough?
> 
> No, you'd need to add the CPL to the branch target buffer itself, not the I$ 
> L1.
> 
> And as somebody pointed out, that only helps the user space messing
> with the kernel. It doesn't help the "one user context fools another
> user context to mispredict". (Where the user contexts might be a
> JIT'ed JS vs the rest of the web browser).
> 
> So you really would want to just make sure the full address is used to
> index (or at least verify) the BTB lookup, and even then you'd then
> need to invalidate the BTB on context switches so that one context
> can't fill in data for another context.

IMO the correct hardware fix is to index the BTB using the full VA,
including the ASID/PCID, and to guarantee (as is already the case for
those identifiers) that there is never a live conflict between address
space identifiers and existing entries.

The sad thing is that even the latest academic courses recommend
"optimizing" branch predictors by indexing with only a few low-order
address bits (e.g. 31 bits in Intel's case, various other widths for
other vendors). The fix for variant 3 is similarly not that difficult
in new hardware: don't allow the speculated load to happen by enforcing
the permission check at the right time. The last several editions of
Computer Architecture spell this out in Appendix B (page 37 or
thereabouts).

Jon.


-- 
Computer Architect | Sent from my Fedora powered laptop


Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Jon Masters
+ Jeff Law, Nick Clifton

On 01/04/2018 03:20 AM, Woodhouse, David wrote:
> On Thu, 2018-01-04 at 03:11 +0100, Paolo Bonzini wrote:
>> On 04/01/2018 02:59, Alan Cox wrote:
>>>> But then, exactly because the retpoline approach adds quite some cruft
>>>> and leaves something to be desired, why even bother?
>>>
>>> Performance
>>
>> Dunno.  If I care about mitigating this threat, I wouldn't stop at
>> retpolines even if the full solution has pretty bad performance (it's
>> roughly in the same ballpark as PTI).  But if I don't care, I wouldn't
>> want retpolines either, since they do introduce a small slowdown (10-20
>> cycles per indirect branch, meaning that after a thousand such papercuts
>> they become slower than the full solution).
>>
>> A couple manually written asm retpolines may be good as mitigation to
>> block the simplest PoCs (Linus may disagree), but patching the compiler,
>> getting alternatives right, etc. will take a while.  The only redeeming
>> grace of retpolines is that they don't require a microcode update, but
>> the microcode will be out there long before these patches are included
>> and trickle down to distros...  I just don't see the point in starting
>> from retpolines or drawing the line there.
> 
> No, really. The full mitigation with the microcode update and IBRS
> support is *slow*. Horribly slow.

It is horribly slow, though the story changes with CPU generation as
others noted (and with what needs disabling in the microcode). We did
various analyses of the retpoline patches, including benchmarks, and we
decided that the fastest and safest approach for Tue^W yesterday was to
use the new MSRs, especially in light of the corner cases we would need
to address for an empty RSB, etc. I'm adding Jeff Law because he and the
tools team have done analysis on this and he may have thoughts.

There's also a cross-architecture concern here in that different
solutions are needed across architectures. Retpolines are not endorsed
or recommended by every architecture vendor at this time. It's important
to make sure the necessary cross-vendor discussion happens now that it
can happen in the open.

Longer term, it'll be good to see BTBs tagged using the full address
space (including any address space IDs...) in future silicon.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Jon Masters
+ Jeff Law, Nick Clifton

On 01/04/2018 03:20 AM, Woodhouse, David wrote:
> On Thu, 2018-01-04 at 03:11 +0100, Paolo Bonzini wrote:
>> On 04/01/2018 02:59, Alan Cox wrote:
 But then, exactly because the retpoline approach adds quite some cruft
 and leaves something to be desired, why even bother?
>>>
>>> Performance
>>
>> Dunno.  If I care about mitigating this threat, I wouldn't stop at
>> retpolines even if the full solution has pretty bad performance (it's
>> roughly in the same ballpark as PTI).  But if I don't care, I wouldn't
>> want retpolines either, since they do introduce a small slowdown (10-20
>> cycles per indirect branch, meaning that after a thousand such papercuts
>> they become slower than the full solution).
>>
>> A couple manually written asm retpolines may be good as mitigation to
>> block the simplest PoCs (Linus may disagree), but patching the compiler,
>> getting alternatives right, etc. will take a while.  The only redeeming
>> grace of retpolines is that they don't require a microcode update, but
>> the microcode will be out there long before these patches are included
>> and trickle down to distros...  I just don't see the point in starting
>> from retpolines or drawing the line there.
> 
> No, really. The full mitigation with the microcode update and IBRS
> support is *slow*. Horribly slow.

It is horribly slow, though the story changes with CPU generation as
others noted (and what needs disabling in the microcode). We did various
analyses of the retpoline patches, including benchmarks, and we decided
that the fastest and safest approach for Tue^W yesterday was to use the
new MSRs. Especially in light of the corner cases we would need to
address for an empty RSB, etc. I'm adding Jeff Law because he and the
tools team have done analysis on this and he may have thoughts.
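
For readers following along, the retpoline construct under discussion
replaces an indirect branch such as `jmp *%rax` with a call through a
thunk along these lines (a sketch of the widely published sequence;
label names are illustrative):

```asm
__x86_indirect_thunk_rax:
        call    .Lset_up_target   # pushes &.Lcapture_spec as return addr
.Lcapture_spec:
        pause                     # speculation of the ret lands here...
        lfence
        jmp     .Lcapture_spec    # ...and spins harmlessly
.Lset_up_target:
        mov     %rax, (%rsp)      # overwrite return addr with real target
        ret                       # architecturally jumps to *%rax
```

The RSB predicts the `ret` back into the capture loop, so speculation
never reaches an attacker-controlled target, while the architectural
path lands on the real one. The per-indirect-branch cost Paolo cites
above comes from this call/ret dance.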

There's also a cross-architecture concern here in that different
solutions are needed across architectures. Retpolines are not endorsed
or recommended by every architecture vendor at this time. It's important
to make sure the necessary cross-vendor discussion happens now that it
can happen in the open.

Longer term, it'll be good to see BTBs tagged using the full address
space (including any address space IDs...) in future silicon.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH 0/2] acpi, x86: Add SPCR table support

2017-12-10 Thread Jon Masters
On 12/08/2017 09:29 AM, Prarit Bhargava wrote:

> If I disable "Serial Port Console Debug" in my BIOS I still see the SPCR 
> configured:
> 
> [root@prarit-lab ~]# dmesg | grep SPCR
> [0.00] ACPI: SPCR 0x69031000 50 (v01
>   )
> 
> AFAICT the SPCR is always enabled on some systems.

It gets incorporated as part of some of the Windows design flows, and
it should always be present.

Jon (not talking about AArch64/ARM/arm/ARMv8-A/64-bit ARM/Go ARM Go).




Re: [PATCH] x86/microcode/AMD: Add support for fam17h microcode loading

2017-12-10 Thread Jon Masters
On 11/30/2017 05:46 PM, Tom Lendacky wrote:
> The size for the Microcode Patch Block (MPB) for an AMD family 17h
> processor is 3200 bytes.  Add a #define for fam17h so that it does
> not default to 2048 bytes and fail a microcode load/update.
> 
> Cc: <sta...@vger.kernel.org> # 4.1.x
> Signed-off-by: Tom Lendacky <thomas.lenda...@amd.com>

Tested-by: Jon Masters <j...@redhat.com>

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [PATCH v4 00/12] arm+arm64: vdso unification to lib/vdso/

2017-10-31 Thread Jon Masters
On 10/31/2017 02:30 PM, Mark Salyzyn wrote:
> Take an effort to recode the arm64 vdso code from assembler to C
> previously submitted by Andrew Pinski, rework
> it for use in both arm and arm64, overlapping any optimizations
> for each architecture. But instead of landing it in arm64, land the
> result into lib/vdso and unify both implementations to simplify
> future maintenance. This will act as the basis for implementing
> arm64 vdso32 in the future.

In the original patch series, our QE folks found a problem that led to
its reversion from our internal trees. I've pinged them to check this
latest version and follow up if we see the same failures now.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [PATCH v3 0/7] Support PPTT for ARM64

2017-10-31 Thread Jon Masters
On 10/12/2017 03:48 PM, Jeremy Linton wrote:

> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topology. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
> 
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
> to the physical id as the remainder of the kernel expects.

Just wanted to thank you for doing this Jeremy. As you know, we're
tracking these patches and working with multiple vendors to ensure that
firmware has accurate PPTTs populated to match. We're expecting to pull
these patches and replace our current RHEL-only kludge asap. RHEL
currently has to kludge topology based upon magic "known" meanings of
the MPIDRs on various server platforms. It's (known to be) ugly and is
one of the reasons that we pushed for what became PPTT.

Beyond scheduler efficiency, in general, it's very important that Arm
systems correctly report topology following x86-style industry conventions -
especially sockets - since (and I told Arm this years ago, and other
non-Linux vendors backed me up) it's typical on server platforms to use
either "memory" or "number of sockets" when making licensing and
subscription calculations in various tooling. This became a problem
early on even with X-Gene1 and Seattle showing as 8 socket boxes ;)

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH] ahci: Add support for Cavium's fifth generation SATA controller

2017-10-31 Thread Jon Masters
On 10/17/2017 02:58 AM, Christoph Hellwig wrote:
> On Tue, Oct 10, 2017 at 10:37:51PM -0700, Radha Mohan Chintakuntla wrote:
>> From: Radha Mohan Chintakuntla 
>>
>> This patch adds support for Cavium's fifth generation SATA controller.
>> It is an on-chip controller and complies with AHCI 1.3.1. As the
>> controller uses 64-bit addresses it cannot use the standard AHCI BAR5
>> and so uses BAR4.
> 
> Looks like it isn't actually AHCI 1.3.1 compliant after all then :)

I've asked various folks to followup with Intel to see if the AHCI
specification can be fixed to handle the case in which a 64-bit ABAR is
required. That should be something they'd be interested in for x86 too.

Jon.



Cleaning up non-standard PCIe ECAM on Arm servers

2017-10-31 Thread Jon Masters
On 10/06/2017 12:39 PM, Ard Biesheuvel wrote:
> Some implementations of the Synopsys DesignWare PCIe controller implement
> a so-called ECAM shift mode, which allows a static memory window to be
> configured that covers the configuration space of the entire bus range.

Side note that we gave a presentation at Arm TechCon last week with
Cadence about a new program they're offering to perform verification of
PCIe pre-silicon using Palladium with speedbridges and running full
server Operating Systems booting using UEFI/ACPI under emulation. We've
been able to boot RHEL for Arm on these Palladium based platforms for a
while and are collaborating to turn this into a comprehensive program.

Once that Cadence effort was announced, I pinged Synopsys to ask
them to go clean things up properly for their IP as well. Ultimately
we'll get to all the major IP vendors, and a number of PCIe specific
vendors have already had prodding from me directly over the years. So if
folks see this thread, are in the business of selling PCIe RC IP for Arm
server designs, and we haven't spoken yet, you should ping me. And you
should also talk with smart folks like Ard on how to do this right.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH 0/3] arm64: cpuinfo: make /proc/cpuinfo more human-readable

2017-10-20 Thread Jon Masters
On 10/20/2017 01:24 PM, Jon Masters wrote:

> 1). The first thing people do when they get an Arm server is to cat
> /proc/cpuinfo. They then come complaining that it's not like x86. They
> can't get the output they're looking for and this results in bug filing,
> and countless hours on phone calls discussing over and over again.
> Worse, there are some parts of the stack that really need this.

Within 6 hours of sending this, I get a ping about this week's "Works On
Arm" newsletter and...people reporting bugs with not getting CPU
capabilities in /proc/cpuinfo. This madness is going to end. Soon.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


