Re: [PATCH RESEND v10 00/10] Application Data Integrity feature introduced by SPARC M7

2017-11-16 Thread Anthony Yznaga

> On Nov 16, 2017, at 6:38 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
> 
> Changelog v10:
> 
>   - Patch 1/10: Updated si_codes definitions for SEGV to match 4.14
>   - Patch 2/10: No changes
>   - Patch 3/10: Updated copyright
>   - Patch 4/10: No changes
>   - Patch 5/10: No changes
>   - Patch 6/10: Updated copyright
>   - Patch 7/10: No changes
>   - Patch 8/10: No changes
>   - Patch 9/10: No changes
>   - Patch 10/10: Added code to return from kernel path to set
> PSTATE.mcde if kernel continues execution in another thread
> (Suggested by Anthony)

Looks good, Khalid.  Thanks for making the changes.

For the entire series:

Reviewed-by: Anthony Yznaga <anthony.yzn...@oracle.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 00/10] Application Data Integrity feature introduced by SPARC M7

2017-11-08 Thread Anthony Yznaga

> On Oct 20, 2017, at 9:57 AM, Khalid Aziz  wrote:
> 
> Patch 9/10
>  When a processor supports additional metadata on memory pages, that
>  additional metadata needs to be copied to new memory pages when those
>  pages are moved. This patch allows architecture specific code to
>  replace the default copy_highpage() routine with arch specific
>  version that copies the metadata as well besides the data on the page.
> 
> Patch 10/10
>  This patch adds support for a user space task to enable ADI and enable
>  tag checking for subsets of its address space. As part of enabling
>  this feature, this patch adds to support manipulation of precise
>  exception for memory corruption detection, adds code to save and
>  restore tags on page swap and migration, and adds code to handle ADI
>  tagged addresses for DMA.
> 
> Changelog v9:
> 
>   - Patch 1/10: No changes
>   - Patch 2/10: No changes
>   - Patch 3/10: No changes
>   - Patch 4/10: No changes
>   - Patch 5/10: No changes
>   - Patch 6/10: No changes
>   - Patch 7/10: No changes
>   - Patch 8/10: No changes
>   - Patch 9/10: New patch
>   - Patch 10/10: Patch 9 from v8. Added code to copy ADI tags when
> pages are migrated. Updated code to detect overflow and underflow
> of addresses when allocating tag storage.

Patch 09/10 wasn't delivered to me, but I reviewed the copy on lkml.org.

The changes looks good, but there is one remaining functional issue
which I've pointed out twice now in previous comments that still has not
been addressed:

The code paths through rtrap that overwrite PSTATE need to also set
PSTATE.mcde=1 since additional kernel work done after PSTATE is
overwritten could access ADI-enabled user memory and depend on version
checking being enabled.  For example, rtrap may call SCHEDULE_USER and
resume execution in another thread.  Without a fix, the resumed thread
will run with PSTATE.mcde=0 until it completes a return to user mode or
is rescheduled on a CPU where PSTATE.mcde is set.  If the thread
accesses ADI-enabled user memory with a versioned address (e.g. to
complete some I/O) in that timeframe then the access will fail.

Here is what you need to fix it:

diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S
index dff86fa..07c82a7 100644
--- a/arch/sparc/kernel/rtrap_64.S
+++ b/arch/sparc/kernel/rtrap_64.S
@@ -24,13 +24,21 @@
.align  32
 __handle_preemption:
callSCHEDULE_USER
-wrpr   %g0, RTRAP_PSTATE, %pstate
+661:wrpr   %g0, RTRAP_PSTATE, %pstate
+   .section.sun_m7_1insn_patch, "ax"
+   .word   661b
+   wrpr%g0, RTRAP_PSTATE | PSTATE_MCDE, %pstate
+   .previous
ba,pt   %xcc, __handle_preemption_continue
 wrpr   %g0, RTRAP_PSTATE_IRQOFF, %pstate
 
 __handle_user_windows:
callfault_in_user_windows
-wrpr   %g0, RTRAP_PSTATE, %pstate
+661:wrpr   %g0, RTRAP_PSTATE, %pstate
+   .section.sun_m7_1insn_patch, "ax"
+   .word   661b
+   wrpr%g0, RTRAP_PSTATE | PSTATE_MCDE, %pstate
+   .previous
ba,pt   %xcc, __handle_preemption_continue
 wrpr   %g0, RTRAP_PSTATE_IRQOFF, %pstate
 
@@ -47,7 +55,11 @@ __handle_signal:
add %sp, PTREGS_OFF, %o0
mov %l0, %o2
calldo_notify_resume
-wrpr   %g0, RTRAP_PSTATE, %pstate
+661:wrpr   %g0, RTRAP_PSTATE, %pstate
+   .section.sun_m7_1insn_patch, "ax"
+   .word   661b
+   wrpr%g0, RTRAP_PSTATE | PSTATE_MCDE, %pstate
+   .previous
wrpr%g0, RTRAP_PSTATE_IRQOFF, %pstate
 
/* Signal delivery can modify pt_regs tstate, so we must
diff --git a/arch/sparc/kernel/urtt_fill.S b/arch/sparc/kernel/urtt_fill.S
index 364af32..3a7f2d8 100644
--- a/arch/sparc/kernel/urtt_fill.S
+++ b/arch/sparc/kernel/urtt_fill.S
@@ -49,7 +49,11 @@ user_rtt_fill_fixup_common:
SET_GL(0)
.previous
 
-   wrpr%g0, RTRAP_PSTATE, %pstate
+661:   wrpr%g0, RTRAP_PSTATE, %pstate
+   .section.sun_m7_1insn_patch, "ax"
+   .word   661b
+   wrpr%g0, RTRAP_PSTATE | PSTATE_MCDE, %pstate
+   .previous
 
mov %l1, %g6
ldx [%g6 + 

Re: [PATCH v8 9/9] sparc64: Add support for ADI (Application Data Integrity)

2017-10-13 Thread Anthony Yznaga

> On Oct 13, 2017, at 9:18 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
> 
> On 10/13/2017 08:14 AM, Khalid Aziz wrote:
>> On 10/12/2017 02:27 PM, Anthony Yznaga wrote:
>>> 
>>>> On Oct 12, 2017, at 7:44 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
>>>> 
>>>> 
>>>> On 10/06/2017 04:12 PM, Anthony Yznaga wrote:
>>>>>> On Sep 25, 2017, at 9:49 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
>>>>>> 
>>>>>> This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
>>>>>> MCD (Memory Corruption Detection) on selected memory ranges, enable
>>>>>> TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
>>>>>> version tags on page swap out/in or migration. ADI is not enabled by
>>>>> I still don't believe migration is properly supported.  Your
>>>>> implementation is relying on a fault happening on a page while its
>>>>> migration is in progress so that do_swap_page() will be called, but
>>>>> I don't see how do_swap_page() will be called if a fault does not
>>>>> happen until after the migration has completed.
>>>> 
>>>> User pages are on LRU list and for the mapped pages on LRU list, 
>>>> migrate_pages() ultimately calls try_to_unmap_one and makes a migration 
>>>> swap entry for the page being migrated. This forces a page fault upon 
>>>> access on the destination node and the page is swapped back in from swap 
>>>> cache. The fault is forced by the migration swap entry, rather than fault 
>>>> being an accidental event. If page fault happens on the destination node 
>>>> while migration is in progress, do_swap_page() waits until migration is 
>>>> done. Please take a look at the code in __unmap_and_move().
>>> 
>>> I looked at the code again, and I now believe ADI tags are never restored 
>>> for migrated pages.  Here's why:
>>> 
>> I will take a look at it again. I have run extensive tests migrating pages 
>> of a process across multiple NUMA nodes over and over again and ADI tags 
>> were never lost, so this does work. I won't rule out the possibility of 
>> having missed a code path where tags are not restored and I will look for it.
> 
> Anthony,
> 
> I just ran my migration test again which:
> 
> - malloc's 16 GB of memory
> - Assigns a rotating ADI tag every 64 bytes to the malloc'd buffer
> - Writes a pattern to the entire buffer
> - Verifies the pattern it wrote using ADI tagged addresses.

The verification will appear to succeed if the tags have been cleared.

To be complete the test should also manually verify that the in-memory tag 
values remain non-zero after migration.  migrate_page_copy() will call 
copy_huge_page() or copy_highpage() which will result in the tags being cleared 
at the destination because the stores will be done to kernel physical mapping 
VAs using block initializing stores.

Anthony

> 
> While this test was running, I had a script migrate test program pages across 
> two NUMA nodes every 30 seconds using migratepages command. I did not see an 
> ADI tag mismatch over multiple runs of this test. This test shows migration 
> is working.
> 
> Can you give me a test that shows the failure you think we should see and I 
> will debug it.
> 
> Thanks,
> Khalid
> 
> --
> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 9/9] sparc64: Add support for ADI (Application Data Integrity)

2017-10-12 Thread Anthony Yznaga

> On Oct 12, 2017, at 7:44 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
> 
> Hi Anthony,
> 
> Please quote only the relevant parts of the patch with comments. That makes 
> it much easier to find the comments.

Okay.

> 
> On 10/06/2017 04:12 PM, Anthony Yznaga wrote:
>>> On Sep 25, 2017, at 9:49 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
>>> 
>>> This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
>>> MCD (Memory Corruption Detection) on selected memory ranges, enable
>>> TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
>>> version tags on page swap out/in or migration. ADI is not enabled by
>> I still don't believe migration is properly supported.  Your
>> implementation is relying on a fault happening on a page while its
>> migration is in progress so that do_swap_page() will be called, but
>> I don't see how do_swap_page() will be called if a fault does not
>> happen until after the migration has completed.
> 
> User pages are on LRU list and for the mapped pages on LRU list, 
> migrate_pages() ultimately calls try_to_unmap_one and makes a migration swap 
> entry for the page being migrated. This forces a page fault upon access on 
> the destination node and the page is swapped back in from swap cache. The 
> fault is forced by the migration swap entry, rather than fault being an 
> accidental event. If page fault happens on the destination node while 
> migration is in progress, do_swap_page() waits until migration is done. 
> Please take a look at the code in __unmap_and_move().

I looked at the code again, and I now believe ADI tags are never restored for 
migrated pages.  Here's why:

A successful call to try_to_unmap() by __unmap_and_move() will have unmapped 
the page, replaced the pte with a migration pte, and saved the ADI tags.

If an access to the unmapped VA range is attempted while the migration pte is 
in place, handle_pte_fault() will call do_swap_page() because the page present 
flag is not set in the pte.  do_swap_page() will see that the pte is a 
migration pte and call migration_entry_wait() where it will block until the 
migration pte is removed.  do_swap_page() will then return so that the fault is 
retried.

remove_migration_pte() replaces the migration pte with a regular pte.  The 
regular pte will have the page present flag set.  Whether due to a retry or 
not, the next fault on the VA range will therefore not call do_swap_page() and 
the tags will not be restored.

> 
> 
>>> +#define finish_arch_post_lock_switch   finish_arch_post_lock_switch
>>> +static inline void finish_arch_post_lock_switch(void)
>>> +{
>>> +   /* Restore the state of MCDPER register for the new process
>>> +* just switched to.
>>> +*/
>>> +   if (adi_capable()) {
>>> +   register unsigned long tmp_mcdper;
>>> +
>>> +   tmp_mcdper = test_thread_flag(TIF_MCDPER);
>>> +   __asm__ __volatile__(
>>> +   "mov %0, %%g1\n\t"
>>> +   ".word 0x9d81\n\t"  /* wr %g0, %g1, %mcdper" */
>>> +   ".word 0xaf902001\n\t"  /* wrpr %g0, 1, %pmcdper */
>>> +   :
>>> +   : "ir" (tmp_mcdper)
>>> +   : "g1");
>>> +   if (current && current->mm && current->mm->context.adi) {
>>> +   struct pt_regs *regs;
>>> +
>>> +   regs = task_pt_regs(current);
>>> +   regs->tstate |= TSTATE_MCDE;
>> This works, but it costs additional cycles on every context switch to
>> keep setting TSTATE_MCDE.  PSTATE.mcde=1 only affects loads and stores
>> to memory mapped with TTE.mcd=1 so there is no impact if it is set and
>> no memory is mapped with TTE.mcd=1.  That is why I suggested just
>> setting TSTATE_MCDE once when a process thread is initialized.
> 
> This change was suggested by David Miller. I believe there is merit to that 
> suggestion.

I'm not saying it's without merit.  I just wanted to point out that the 
solution adds a bit of additional work to every context switch and that it's 
possible to avoid it.  I'm fine if David still prefers his solution.


> 
>>> +   /* Tag storage has not been allocated for this vma and space
>>> +* is available in tag storage descriptor. Since this page is
>>> +* being swapped out, there is high probability subsequent pages
>>> +* in the VMA will be swapped out as well. Allocate pages to
>>> +* store tags for as many pages in this vma as p

Re: [PATCH v8 9/9] sparc64: Add support for ADI (Application Data Integrity)

2017-10-06 Thread Anthony Yznaga

> On Sep 25, 2017, at 9:49 AM, Khalid Aziz <khalid.a...@oracle.com> wrote:
> 
> ADI is a new feature supported on SPARC M7 and newer processors to allow
> hardware to catch rogue accesses to memory. ADI is supported for data
> fetches only and not instruction fetches. An app can enable ADI on its
> data pages, set version tags on them and use versioned addresses to
> access the data pages. Upper bits of the address contain the version
> tag. On M7 processors, upper four bits (bits 63-60) contain the version
> tag. If a rogue app attempts to access ADI enabled data pages, its
> access is blocked and processor generates an exception. Please see
> Documentation/sparc/adi.txt for further details.
> 
> This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
> MCD (Memory Corruption Detection) on selected memory ranges, enable
> TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
> version tags on page swap out/in or migration. ADI is not enabled by

I still don't believe migration is properly supported.  Your
implementation is relying on a fault happening on a page while its
migration is in progress so that do_swap_page() will be called, but
I don't see how do_swap_page() will be called if a fault does not
happen until after the migration has completed.


> default for any task. A task must explicitly enable ADI on a memory
> range and set version tag for ADI to be effective for the task.
> 
> Signed-off-by: Khalid Aziz <khalid.a...@oracle.com>
> Cc: Khalid Aziz <kha...@gonehiking.org>
> ---
> v8: 
>   - Added note to doc about non-faulting loads not triggering
> ADI tag mismatch and more details on special tag values
> of 0x0 and 0xf, as suggested by Anthony Yznaga)
>   - Added an IPI on mprotect(...PROT_ADI...) call to set
> TSTATE.MCDE on threads running on other processors and
> restore of TSTATE.MCDE on context switch (suggested by
> David Miller)
>   - Removed restriction on enabling ADI on read-only memory
> (suggested by Anthony Yznaga)
>   - Changed kzalloc() for tag storage to use GFP_NOWAIT
>   - Added code to handle overflow and underflow when allocating
> tag storage, as suggested by Anthony Yznaga
>   - Replaced sun_m7_patch_1insn_range() with sun4v_patch_1insn_range()
> which is functionally identical (suggested by Anthony Yznaga)
>   - Added membar after restoring ADI tags in copy_user_highpage(),
> as suggested by David Miller
> 
> v7:
>   - Enhanced arch_validate_prot() to enable ADI only on writable
> addresses backed by physical RAM
>   - Added support for saving/restoring ADI tags for each ADI
> block size address range on a page on swap in/out
>   - Added code to copy ADI tags on COW
>   - Updated values for auxiliary vectors to not conflict with
> values on other architectures to avoid conflict in glibc. glibc
> consolidates all auxiliary vectors into its headers and
> duplicate values in consolidated header are problematic
>   - Disable same page merging on ADI enabled pages since ADI tags
> may not match on pages with identical data
>   - Broke the patch up further into smaller patches
> 
> v6:
>   - Eliminated instructions to read and write PSTATE as well as
> MCDPER and PMCDPER on every access to userspace addresses
> by setting PSTATE and PMCDPER correctly upon entry into
> kernel. PSTATE.mcde and PMCDPER are set upon entry into
> kernel when running on an M7 processor. PSTATE.mcde being
> set only affects memory accesses that have TTE.mcd set.
> PMCDPER being set only affects writes to memory addresses
> that have TTE.mcd set. This ensures any faults caused by
> ADI tag mismatch on a write are exposed before kernel returns
> to userspace.
> 
> v5:
>   - Fixed indentation issues and instrcuctions in assembly code
>   - Removed CONFIG_SPARC64 from mdesc.c
>   - Changed to maintain state of MCDPER register in thread info
> flags as opposed to in mm context. MCDPER is a per-thread
> state and belongs in thread info flag as opposed to mm context
> which is shared across threads. Added comments to clarify this
> is a lazily maintained state and must be updated on context
> switch and copy_process()
>   - Updated code to use the new arch_do_swap_page() and
> arch_unmap_one() functions
> 
> v4:
>   - Broke patch up into smaller patches
> 
> v3:
>   - Removed CONFIG_SPARC_ADI
>   - Replaced prctl commands with mprotect
>   - Added auxiliary vectors for ADI paramet

Re: [PATCH v7 9/9] sparc64: Add support for ADI (Application Data Integrity)

2017-08-31 Thread Anthony Yznaga
Hi Khalid,

> On Aug 30, 2017, at 3:27 PM, Khalid Aziz <khalid.a...@oracle.com> wrote:
> 
> Hi Anthony,
> 
> Thanks for taking the time to provide feedback. My comments inline below.
> 
> On 08/25/2017 04:31 PM, Anthony Yznaga wrote:
>>> On Aug 9, 2017, at 2:26 PM, Khalid Aziz <khalid.a...@oracle.com> wrote:
>>> ..deleted..
>>> +provided by the hypervisor to the kernel.  Kernel returns the value of
>>> +ADI block size to userspace using auxiliary vector along with other ADI
>>> +info. Following auxiliary vectors are provided by the kernel:
>>> +
>>> +   AT_ADI_BLKSZADI block size. This is the granularity and
>>> +   alignment, in bytes, of ADI versioning.
>>> +   AT_ADI_NBITSNumber of ADI version bits in the VA
>> The previous patch series also defined AT_ADI_UEONADI.  Why was that
>> removed?
> 
> This was based upon a conversation we had when you mentioned future 
> processors may not implement this or change the way this is interpreted and 
> any applications depending upon this value would break at that point. I 
> removed it to eliminate building an unreliable dependency. If I misunderstood 
> what you said, please let me know.

On M7 there is an array of versions maintained for cachelines in the L2
cache. If a UE is detected in this array it results in the flush of all
eight ways of the array.  Clean lines go away, but dirty lines are
written back to memory with the version forced to 0xE.  The ue-on-adp MD
property communicates this tag value that may result from a UE in order
to give the guest the opportunity to avoid using the tag value.  An
application that intentionally used ADI in a way that relied on ADI
exceptions for its functionality may not want to have to consider
whether the mismatch was legitimate or due to a UE.

On M8 the HW implementation is changed and a tag value will never be
forced to another value.  That said, I think the ue-on-adp property
value was unfortunately inadvertently carried forward to M8.

It could probably be argued that the likelihood of seeing the UE is so
low that SW can ignore the possibility, but including the information
in an auxvec shouldn't break anything.


> 
>>> +
>>> +
>>> +IMPORTANT NOTES:
>>> +
>>> +- Version tag values of 0x0 and 0xf are reserved.
>> The documentation should probably state more specifically that an
>> in-memory tag value of 0x0 or 0xf is treated as "match all" by the HW
>> meaning that a mismatch exception will never be generated regardless
>> of the tag bits set in the VA accessing the memory.
> 
> Will do.
> 
>>> +
>>> +- Version tags are set on virtual addresses from userspace even though
>>> +  tags are stored in physical memory. Tags are set on a physical page
>>> +  after it has been allocated to a task and a pte has been created for
>>> +  it.
>>> +
>>> +- When a task frees a memory page it had set version tags on, the page
>>> +  goes back to free page pool. When this page is re-allocated to a task,
>>> +  kernel clears the page using block initialization ASI which clears the
>>> +  version tags as well for the page. If a page allocated to a task is
>>> +  freed and allocated back to the same task, old version tags set by the
>>> +  task on that page will no longer be present.
>> The specifics should be included here, too, so someone doesn't have
>> to guess what's going on if they make changes and the tags are no longer
>> cleared.  The HW clears the tag for a cacheline for block initializing
>> stores to 64-byte aligned addresses if PSTATE.mcde=0 or TTE.mcd=0.
>> PSTATE.mce is set when executing in the kernel, but pages are cleared
>> using kernel physical mapping VAs which are mapped with TTE.mcd=0.
>> Another HW behavior that should be mentioned is that tag mismatches
>> are not detected for non-faulting loads.
> 
> Sure, I can add that.
> 
>>> +
>>> +- Kernel does not set any tags for user pages and it is entirely a
>>> +  task's responsibility to set any version tags. Kernel does ensure the
>>> +  version tags are preserved if a page is swapped out to the disk and
>>> +  swapped back in. It also preserves that version tags if a page is
>>> +  migrated.
>> I only have a cursory understanding of how page migration works, but
>> I could not see how the tags would be preserved if a page were migrated.
>> I figured the place to copy the tags would be migrate_page_copy(), but
>> I don't see changes there.
> 
> For migrating user pages, the way I understand the code works is if the page 
> is mapped (which is th