Re: [PATCH v10 20/20] x86/efi: EFI stub DRTM launch support for Secure Launch

2024-08-29 Thread Ard Biesheuvel
On Thu, 29 Aug 2024 at 15:24, Jonathan McDowell  wrote:
>
> On Wed, Aug 28, 2024 at 01:19:16PM -0700, ross.philip...@oracle.com wrote:
> > On 8/28/24 10:14 AM, Ard Biesheuvel wrote:
> > > On Wed, 28 Aug 2024 at 19:09, kernel test robot  wrote:
> > > >
> > > > Hi Ross,
> > > >
> > > > kernel test robot noticed the following build warnings:
> > > >
> > > > [auto build test WARNING on tip/x86/core]
> > > > [also build test WARNING on char-misc/char-misc-testing 
> > > > char-misc/char-misc-next char-misc/char-misc-linus 
> > > > herbert-cryptodev-2.6/master efi/next linus/master v6.11-rc5]
> > > > [cannot apply to herbert-crypto-2.6/master next-20240828]
> > > > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > > > And when submitting patch, we suggest to use '--base' as documented in
> > > > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> > > >
> > > > url:
> > > > https://github.com/intel-lab-lkp/linux/commits/Ross-Philipson/Documentation-x86-Secure-Launch-kernel-documentation/20240827-065225
> > > > base:   tip/x86/core
> > > > patch link:
> > > > https://lore.kernel.org/r/20240826223835.3928819-21-ross.philipson%40oracle.com
> > > > patch subject: [PATCH v10 20/20] x86/efi: EFI stub DRTM launch support 
> > > > for Secure Launch
> > > > config: i386-randconfig-062-20240828 (https://download.01.org/0day-ci/archive/20240829/202408290030.febuhhbr-...@intel.com/config)
> > >
> > >
This is an i386 32-bit build, which makes no sense: this stuff should
just declare 'depends on 64BIT'.
> >
> > Our config entry already has 'depends on X86_64' which in turn depends on
> > 64BIT. I would think that would be enough. Do you think it needs an explicit
> > 'depends on 64BIT' in our entry as well?
>
> The error is in x86-stub.c, which is pre-existing and compiled for 32-bit
> as well, so you need more than a "depends" here.
>

Ugh, that means this is my fault - apologies. Replacing the #ifdef
with IS_ENABLED() makes the code visible to the 32-bit compiler, even
though the code is disregarded.

I'd still prefer IS_ENABLED(), but this would require the code in
question to live in a separate compilation unit (which depends on
CONFIG_SECURE_LAUNCH). If that is too fiddly, feel free to bring back
the #ifdef CONFIG_SECURE_LAUNCH here (and retain my R-b)
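For reference, the split Ard describes is the common kernel idiom. A
minimal sketch, with hypothetical file names (the real symbols live in
the Secure Launch patches themselves):

/* Makefile: build the unit only when the option is enabled:
 *   obj-$(CONFIG_SECURE_LAUNCH) += sl-helper.o
 */

/* sl-helper.h: out-of-line function when enabled, empty stub otherwise */
#ifdef CONFIG_SECURE_LAUNCH
void efi_secure_launch(struct boot_params *boot_params);
#else
static inline void efi_secure_launch(struct boot_params *boot_params) { }
#endif

With that split, call sites need no preprocessor guards at all. By
contrast, `if (IS_ENABLED(CONFIG_SECURE_LAUNCH))` inside a shared file
keeps the guarded code visible to every compiler, including the 32-bit
one, which is exactly why the build warnings above appeared once the
#ifdef was replaced.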



Re: [PATCH v10 20/20] x86/efi: EFI stub DRTM launch support for Secure Launch

2024-08-28 Thread Ard Biesheuvel
On Wed, 28 Aug 2024 at 19:09, kernel test robot  wrote:
>
> Hi Ross,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on tip/x86/core]
> [also build test WARNING on char-misc/char-misc-testing 
> char-misc/char-misc-next char-misc/char-misc-linus 
> herbert-cryptodev-2.6/master efi/next linus/master v6.11-rc5]
> [cannot apply to herbert-crypto-2.6/master next-20240828]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:
> https://github.com/intel-lab-lkp/linux/commits/Ross-Philipson/Documentation-x86-Secure-Launch-kernel-documentation/20240827-065225
> base:   tip/x86/core
> patch link:
> https://lore.kernel.org/r/20240826223835.3928819-21-ross.philipson%40oracle.com
> patch subject: [PATCH v10 20/20] x86/efi: EFI stub DRTM launch support for 
> Secure Launch
> config: i386-randconfig-062-20240828 
> (https://download.01.org/0day-ci/archive/20240829/202408290030.febuhhbr-...@intel.com/config)


This is an i386 32-bit build, which makes no sense: this stuff should
just declare 'depends on 64BIT'.


> compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 
> 617a15a9eac96088ae5e9134248d8236e34b91b1)
> reproduce (this is a W=1 build): 
> (https://download.01.org/0day-ci/archive/20240829/202408290030.febuhhbr-...@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version 
> of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202408290030.febuhhbr-...@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
> >> drivers/firmware/efi/libstub/x86-stub.c:945:41: sparse: sparse: non size-preserving pointer to integer cast
>drivers/firmware/efi/libstub/x86-stub.c:953:65: sparse: sparse: non size-preserving pointer to integer cast
> >> drivers/firmware/efi/libstub/x86-stub.c:980:70: sparse: sparse: non size-preserving integer to pointer cast
>drivers/firmware/efi/libstub/x86-stub.c:1014:45: sparse: sparse: non size-preserving integer to pointer cast
>
> vim +945 drivers/firmware/efi/libstub/x86-stub.c
>
>927
>928  static bool efi_secure_launch_update_boot_params(struct slr_table *slrt,
>929   struct boot_params *boot_params)
>930  {
>931  struct slr_entry_intel_info *txt_info;
>932  struct slr_entry_policy *policy;
>933  struct txt_os_mle_data *os_mle;
>934  bool updated = false;
>935  int i;
>936
>937  txt_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
>938  if (!txt_info)
>939  return false;
>940
>941  os_mle = txt_os_mle_data_start((void *)txt_info->txt_heap);
>942  if (!os_mle)
>943  return false;
>944
>  > 945  os_mle->boot_params_addr = (u64)boot_params;
>946
>947  policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
>948  if (!policy)
>949  return false;
>950
>951  for (i = 0; i < policy->nr_entries; i++) {
>952  if (policy->policy_entries[i].entity_type == SLR_ET_BOOT_PARAMS) {
>953  policy->policy_entries[i].entity = (u64)boot_params;
>954  updated = true;
>955  break;
>956  }
>957  }
>958
>959  /*
>960   * If this is a PE entry into EFI stub the mocked up boot params will
>961   * be missing some of the setup header data needed for the second stage
>962   * of the Secure Launch boot.
>963   */
>964  if (image) {
>965  struct setup_header *hdr = (struct setup_header *)((u8 *)image->image_base +
>966  offsetof(struct boot_params, hdr));
>967  u64 cmdline_ptr;
>968
>969  boot_params->hdr.setup_sects = hdr->setup_sects;
>970  boot_params->hdr.syssize = hdr->syssize;
>971  boot_params->hdr.version = hdr->version;
>972  boot_params->hdr.loadflags = hdr->loadflags;
>973  boot_params->hdr.kernel_alignment = hdr->kernel_alignment;
>974  boot_params->hdr.min_alignment = hdr->min_alignment;
>975  boot_params->hdr.xloadflags = hdr->xloadflags;
>976  boot_params->hdr.init_size = hdr->init_size;
>977  boot_params->hdr.kernel_info_offset = hdr->kernel_info_offset;
>978  efi_set_u64_fo
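The warning at line 945 is the classic 32-bit pitfall: casting a pointer
straight to u64 is not size-preserving when pointers are only 32 bits
wide. A minimal sketch of the conventional idiom, without assuming which
fix the authors ultimately chose:

/* Sketch: widen via unsigned long (the kernel's pointer-sized integer)
 * so each individual cast preserves size; sparse stays quiet on both
 * 32-bit and 64-bit builds.
 */
static inline u64 ptr_to_u64(const void *ptr)
{
        return (u64)(unsigned long)ptr;
}

/* e.g.: os_mle->boot_params_addr = ptr_to_u64(boot_params); */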

Re: [PATCH v10 08/20] x86/boot: Place TXT MLE header in the kernel_info section

2024-08-27 Thread Ard Biesheuvel
On Tue, 27 Aug 2024 at 00:42, Ross Philipson  wrote:
>
> The MLE (measured launch environment) header must be locatable by the
> boot loader and TXT must be setup to do a launch with this header's
> location. While the offset to the kernel_info structure does not need
> to be at a fixed offset, the offsets in the header must be relative
> offsets from the start of the setup kernel. The support in the linker
> file achieves this.
>
> Signed-off-by: Ross Philipson 
> Suggested-by: Ard Biesheuvel 

Reviewed-by: Ard Biesheuvel 

> ---
>  arch/x86/boot/compressed/kernel_info.S | 50 +++---
>  arch/x86/boot/compressed/vmlinux.lds.S |  7 
>  2 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/kernel_info.S 
> b/arch/x86/boot/compressed/kernel_info.S
> index f818ee8fba38..a0604a0d1756 100644
> --- a/arch/x86/boot/compressed/kernel_info.S
> +++ b/arch/x86/boot/compressed/kernel_info.S
> @@ -1,12 +1,20 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>
> +#include 
>  #include 
>
> -   .section ".rodata.kernel_info", "a"
> +/*
> + * The kernel_info structure is not placed at a fixed offset in the
> + * kernel image. So this macro and the support in the linker file
> + * allow the relative offsets for the MLE header within the kernel
> + * image to be configured at build time.
> + */
> +#define roffset(X) ((X) - kernel_info)
>
> -   .global kernel_info
> +   .section ".rodata.kernel_info", "a"
>
> -kernel_info:
> +   .balign 16
> +SYM_DATA_START(kernel_info)
> /* Header, Linux top (structure). */
> .ascii  "LToP"
> /* Size. */
> @@ -17,6 +25,40 @@ kernel_info:
> /* Maximal allowed type for setup_data and setup_indirect structs. */
> .long   SETUP_TYPE_MAX
>
> +   /* Offset to the MLE header structure */
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +   .long   roffset(mle_header_offset)
> +#else
> +   .long   0
> +#endif
> +
>  kernel_info_var_len_data:
> /* Empty for time being... */
> -kernel_info_end:
> +SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
> +
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +   /*
> +* The MLE Header per the TXT Specification, section 2.1
> +* MLE capabilities, see table 4. Capabilities set:
> +* bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
> +* bit 1: Support for RLP wakeup using MONITOR address
> +* bit 2: The ECX register will contain the pointer to the MLE page table
> +* bit 5: TPM 1.2 family: Details/authorities PCR usage support
> +* bit 9: Supported format of TPM 2.0 event log - TCG compliant
> +*/
> +SYM_DATA_START(mle_header)
> +   .long   0x9082ac5a  /* UUID0 */
> +   .long   0x74a7476f  /* UUID1 */
> +   .long   0xa2555c0f  /* UUID2 */
> +   .long   0x42b651cb  /* UUID3 */
> +   .long   0x0034  /* MLE header size */
> +   .long   0x00020002  /* MLE version 2.2 */
> +   .long   roffset(sl_stub_entry_offset) /* Linear entry point of MLE (virt. address) */
> +   .long   0x  /* First valid page of MLE */
> +   .long   0x  /* Offset within binary of first byte of MLE */
> +   .long   roffset(_edata_offset) /* Offset within binary of last byte + 1 of MLE */
> +   .long   0x0227  /* Bit vector of MLE-supported capabilities */
> +   .long   0x  /* Starting linear address of command line (unused) */
> +   .long   0x  /* Ending linear address of command line (unused) */
> +SYM_DATA_END(mle_header)
> +#endif
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
> b/arch/x86/boot/compressed/vmlinux.lds.S
> index 083ec6d7722a..f82184801462 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -118,3 +118,10 @@ SECTIONS
> }
> ASSERT(SIZEOF(.rela.dyn) == 0, "Unexpected run-time relocations (.rela) detected!")
>  }
> +
> +#ifdef CONFIG_SECURE_LAUNCH
> +PROVIDE(kernel_info_offset  = ABSOLUTE(kernel_info - startup_32));
> +PROVIDE(mle_header_offset   = kernel_info_offset + ABSOLUTE(mle_header - startup_32));
> +PROVIDE(sl_stub_entry_offset= kernel_info_offset + ABSOLUTE(sl_stub_entry - startup_32));
> +PROVIDE(_edata_offset   = kernel_info_offset + ABSOLUTE(_edata - startup_32));
> +#endif
> --
> 2.39.3
>
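As background for the relative-offset scheme above, here is a hedged
sketch of how a consumer might walk from the image to the MLE header.
The struct layout follows the boot.rst excerpts quoted later in this
digest (mle_header_offset at 0x10, zero meaning no Secure Launch
support); the exact base the stored offset is taken against was itself a
review point, so treat the final line as an assumption rather than the
authoritative kernel behavior:

#include <stdint.h>
#include <string.h>

struct kernel_info_hdr {
        char     header[4];         /* ASCII "LToP" */
        uint32_t size;              /* size of the fixed part */
        uint32_t size_total;        /* size including var-len data */
        uint32_t setup_type_max;    /* offset 0x0c, per boot.rst */
        uint32_t mle_header_offset; /* offset 0x10; 0 = no Secure Launch */
};

static const void *find_mle_header(const uint8_t *image_base,
                                   uint32_t kernel_info_offset)
{
        const struct kernel_info_hdr *ki =
                (const void *)(image_base + kernel_info_offset);

        if (memcmp(ki->header, "LToP", 4) != 0 || !ki->mle_header_offset)
                return NULL; /* kernel has no Secure Launch capability */

        /* Assumption: offset taken from the start of the image. */
        return image_base + ki->mle_header_offset;
}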



Re: [PATCH v10 20/20] x86/efi: EFI stub DRTM launch support for Secure Launch

2024-08-27 Thread Ard Biesheuvel
On Tue, 27 Aug 2024 at 00:44, Ross Philipson  wrote:
>
> This support allows the DRTM launch to be initiated after an EFI stub
> launch of the Linux kernel is done. This is accomplished by providing
> a handler to jump to when a Secure Launch is in progress. This has to be
> called after the EFI stub does Exit Boot Services.
>
> Signed-off-by: Ross Philipson 

Reviewed-by: Ard Biesheuvel 

> ---
>  drivers/firmware/efi/libstub/efistub.h  |  8 ++
>  drivers/firmware/efi/libstub/x86-stub.c | 98 +
>  2 files changed, 106 insertions(+)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h 
> b/drivers/firmware/efi/libstub/efistub.h
> index d33ccbc4a2c6..baf42d6d0796 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -135,6 +135,14 @@ void efi_set_u64_split(u64 data, u32 *lo, u32 *hi)
> *hi = upper_32_bits(data);
>  }
>
> +static inline
> +void efi_set_u64_form(u32 lo, u32 hi, u64 *data)
> +{
> +   u64 upper = hi;
> +
> +   *data = lo | upper << 32;
> +}
> +
>  /*
>   * Allocation types for calls to boottime->allocate_pages.
>   */
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c 
> b/drivers/firmware/efi/libstub/x86-stub.c
> index f8e465da344d..04786c1b3b5d 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -9,6 +9,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #include 
>  #include 
> @@ -923,6 +925,99 @@ static efi_status_t efi_decompress_kernel(unsigned long 
> *kernel_entry)
> return efi_adjust_memory_range_protection(addr, kernel_text_size);
>  }
>
> +static bool efi_secure_launch_update_boot_params(struct slr_table *slrt,
> +struct boot_params *boot_params)
> +{
> +   struct slr_entry_intel_info *txt_info;
> +   struct slr_entry_policy *policy;
> +   struct txt_os_mle_data *os_mle;
> +   bool updated = false;
> +   int i;
> +
> +   txt_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
> +   if (!txt_info)
> +   return false;
> +
> +   os_mle = txt_os_mle_data_start((void *)txt_info->txt_heap);
> +   if (!os_mle)
> +   return false;
> +
> +   os_mle->boot_params_addr = (u64)boot_params;
> +
> +   policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
> +   if (!policy)
> +   return false;
> +
> +   for (i = 0; i < policy->nr_entries; i++) {
> +   if (policy->policy_entries[i].entity_type == SLR_ET_BOOT_PARAMS) {
> +   policy->policy_entries[i].entity = (u64)boot_params;
> +   updated = true;
> +   break;
> +   }
> +   }
> +
> +   /*
> +* If this is a PE entry into EFI stub the mocked up boot params will
> +* be missing some of the setup header data needed for the second stage
> +* of the Secure Launch boot.
> +*/
> +   if (image) {
> +   struct setup_header *hdr = (struct setup_header *)((u8 *)image->image_base +
> +   offsetof(struct boot_params, hdr));
> +   u64 cmdline_ptr;
> +
> +   boot_params->hdr.setup_sects = hdr->setup_sects;
> +   boot_params->hdr.syssize = hdr->syssize;
> +   boot_params->hdr.version = hdr->version;
> +   boot_params->hdr.loadflags = hdr->loadflags;
> +   boot_params->hdr.kernel_alignment = hdr->kernel_alignment;
> +   boot_params->hdr.min_alignment = hdr->min_alignment;
> +   boot_params->hdr.xloadflags = hdr->xloadflags;
> +   boot_params->hdr.init_size = hdr->init_size;
> +   boot_params->hdr.kernel_info_offset = hdr->kernel_info_offset;
> +   efi_set_u64_form(boot_params->hdr.cmd_line_ptr, boot_params->ext_cmd_line_ptr,
> +&cmdline_ptr);
> +   boot_params->hdr.cmdline_size = strlen((const char *)cmdline_ptr);
> +   }
> +
> +   return updated;
> +}
> +
> +static void efi_secure_launch(struct boot_params *boot_params)
> +{
> +   struct slr_entry_dl_info *dlinfo;
> +   efi_guid_t guid = SLR_TABLE_GUID;
> +   dl_handler_func handler_callback;
> +   struct slr_table *slrt;
> +
> +   if (!IS_ENABLED(CONFIG_SECURE_LAUNCH))
> +   return;
&

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-06-04 Thread Ard Biesheuvel
On Tue, 4 Jun 2024 at 19:34,  wrote:
>
> On 6/4/24 10:27 AM, Ard Biesheuvel wrote:
> > On Tue, 4 Jun 2024 at 19:24,  wrote:
> >>
> >> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
> >>> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel  wrote:
> >>>>
> >>>> Hello Ross,
> >>>>
> >>>> On Fri, 31 May 2024 at 03:32, Ross Philipson  
> >>>> wrote:
> >>>>>
> >>>>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> >>>>> later AMD SKINIT) to vector to during the late launch. The symbol
> >>>>> sl_stub_entry is that entry point and its offset into the kernel is
> >>>>> conveyed to the launching code using the MLE (Measured Launch
> >>>>> Environment) header in the structure named mle_header. The offset of the
> >>>>> MLE header is set in the kernel_info. The routine sl_stub contains the
> >>>>> very early late launch setup code responsible for setting up the basic
> >>>>> environment to allow the normal kernel startup_32 code to proceed. It is
> >>>>> also responsible for properly waking and handling the APs on Intel
> >>>>> platforms. The routine sl_main which runs after entering 64b mode is
> >>>>> responsible for measuring configuration and module information before
> >>>>> it is used like the boot params, the kernel command line, the TXT heap,
> >>>>> an external initramfs, etc.
> >>>>>
> >>>>> Signed-off-by: Ross Philipson 
> >>>>> ---
> >>>>>Documentation/arch/x86/boot.rst|  21 +
> >>>>>arch/x86/boot/compressed/Makefile  |   3 +-
> >>>>>arch/x86/boot/compressed/head_64.S |  30 +
> >>>>>arch/x86/boot/compressed/kernel_info.S |  34 ++
> >>>>>arch/x86/boot/compressed/sl_main.c | 577 
> >>>>>arch/x86/boot/compressed/sl_stub.S | 725 
> >>>>> +
> >>>>>arch/x86/include/asm/msr-index.h   |   5 +
> >>>>>arch/x86/include/uapi/asm/bootparam.h  |   1 +
> >>>>>arch/x86/kernel/asm-offsets.c  |  20 +
> >>>>>9 files changed, 1415 insertions(+), 1 deletion(-)
> >>>>>create mode 100644 arch/x86/boot/compressed/sl_main.c
> >>>>>create mode 100644 arch/x86/boot/compressed/sl_stub.S
> >>>>>
> >>>>> diff --git a/Documentation/arch/x86/boot.rst 
> >>>>> b/Documentation/arch/x86/boot.rst
> >>>>> index 4fd492cb4970..295cdf9bcbdb 100644
> >>>>> --- a/Documentation/arch/x86/boot.rst
> >>>>> +++ b/Documentation/arch/x86/boot.rst
> >>>>> @@ -482,6 +482,14 @@ Protocol:  2.00+
> >>>>>   - If 1, KASLR enabled.
> >>>>>   - If 0, KASLR disabled.
> >>>>>
> >>>>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> >>>>> +
> >>>>> +   - Used internally by the setup kernel to communicate
> >>>>> + Secure Launch status to kernel proper.
> >>>>> +
> >>>>> +   - If 1, Secure Launch enabled.
> >>>>> +   - If 0, Secure Launch disabled.
> >>>>> +
> >>>>>  Bit 5 (write): QUIET_FLAG
> >>>>>
> >>>>>   - If 0, print early messages.
> >>>>> @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
> >>>>>
> >>>>>  This field contains maximal allowed type for setup_data and 
> >>>>> setup_indirect structs.
> >>>>>
> >>>>> +   =
> >>>>> +Field name:mle_header_offset
> >>>>> +Offset/size:   0x0010/4
> >>>>> +   =
> >>>>> +
> >>>>> +  This field contains the offset to the Secure Launch Measured Launch 
> >>>>> Environment
> >>>>> +  (MLE) header. This offset is used to locate information needed 
> >>>>> during a secure
> >>>>> +  late launch using Intel TXT. If the offset is zero, the kernel does 
> >>>>> not have
> >>>>> +  Secure Launch capabilities. The MLE entry point is called from TXT

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-06-04 Thread Ard Biesheuvel
On Tue, 4 Jun 2024 at 19:24,  wrote:
>
> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
> > On Fri, 31 May 2024 at 13:00, Ard Biesheuvel  wrote:
> >>
> >> Hello Ross,
> >>
> >> On Fri, 31 May 2024 at 03:32, Ross Philipson  
> >> wrote:
> >>>
> >>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> >>> later AMD SKINIT) to vector to during the late launch. The symbol
> >>> sl_stub_entry is that entry point and its offset into the kernel is
> >>> conveyed to the launching code using the MLE (Measured Launch
> >>> Environment) header in the structure named mle_header. The offset of the
> >>> MLE header is set in the kernel_info. The routine sl_stub contains the
> >>> very early late launch setup code responsible for setting up the basic
> >>> environment to allow the normal kernel startup_32 code to proceed. It is
> >>> also responsible for properly waking and handling the APs on Intel
> >>> platforms. The routine sl_main which runs after entering 64b mode is
> >>> responsible for measuring configuration and module information before
> >>> it is used like the boot params, the kernel command line, the TXT heap,
> >>> an external initramfs, etc.
> >>>
> >>> Signed-off-by: Ross Philipson 
> >>> ---
> >>>   Documentation/arch/x86/boot.rst|  21 +
> >>>   arch/x86/boot/compressed/Makefile  |   3 +-
> >>>   arch/x86/boot/compressed/head_64.S |  30 +
> >>>   arch/x86/boot/compressed/kernel_info.S |  34 ++
> >>>   arch/x86/boot/compressed/sl_main.c | 577 
> >>>   arch/x86/boot/compressed/sl_stub.S | 725 +
> >>>   arch/x86/include/asm/msr-index.h   |   5 +
> >>>   arch/x86/include/uapi/asm/bootparam.h  |   1 +
> >>>   arch/x86/kernel/asm-offsets.c  |  20 +
> >>>   9 files changed, 1415 insertions(+), 1 deletion(-)
> >>>   create mode 100644 arch/x86/boot/compressed/sl_main.c
> >>>   create mode 100644 arch/x86/boot/compressed/sl_stub.S
> >>>
> >>> diff --git a/Documentation/arch/x86/boot.rst 
> >>> b/Documentation/arch/x86/boot.rst
> >>> index 4fd492cb4970..295cdf9bcbdb 100644
> >>> --- a/Documentation/arch/x86/boot.rst
> >>> +++ b/Documentation/arch/x86/boot.rst
> >>> @@ -482,6 +482,14 @@ Protocol:  2.00+
> >>>  - If 1, KASLR enabled.
> >>>  - If 0, KASLR disabled.
> >>>
> >>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> >>> +
> >>> +   - Used internally by the setup kernel to communicate
> >>> + Secure Launch status to kernel proper.
> >>> +
> >>> +   - If 1, Secure Launch enabled.
> >>> +   - If 0, Secure Launch disabled.
> >>> +
> >>> Bit 5 (write): QUIET_FLAG
> >>>
> >>>  - If 0, print early messages.
> >>> @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
> >>>
> >>> This field contains maximal allowed type for setup_data and 
> >>> setup_indirect structs.
> >>>
> >>> +   =
> >>> +Field name:mle_header_offset
> >>> +Offset/size:   0x0010/4
> >>> +   =
> >>> +
> >>> +  This field contains the offset to the Secure Launch Measured Launch 
> >>> Environment
> >>> +  (MLE) header. This offset is used to locate information needed during 
> >>> a secure
> >>> +  late launch using Intel TXT. If the offset is zero, the kernel does 
> >>> not have
> >>> +  Secure Launch capabilities. The MLE entry point is called from TXT on 
> >>> the BSP
> >>> +  following a successful measured launch. The specific state of the 
> >>> processors is
> >>> +  outlined in the TXT Software Development Guide, the latest can be 
> >>> found here:
> >>> +  
> >>> https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> >>> +
> >>>
> >>
> >> Could we just repaint this field as the offset relative to the start
> >> of kernel_info rather than relative to the start of the image?

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-05-31 Thread Ard Biesheuvel
On Fri, 31 May 2024 at 16:04, Ard Biesheuvel  wrote:
>
> On Fri, 31 May 2024 at 15:33, Ard Biesheuvel  wrote:
> >
> > On Fri, 31 May 2024 at 13:00, Ard Biesheuvel  wrote:
> > >
> > > Hello Ross,
> > >
> > > On Fri, 31 May 2024 at 03:32, Ross Philipson  
> > > wrote:
> > > >
> > > > The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> > > > later AMD SKINIT) to vector to during the late launch. The symbol
> > > > sl_stub_entry is that entry point and its offset into the kernel is
> > > > conveyed to the launching code using the MLE (Measured Launch
> > > > Environment) header in the structure named mle_header. The offset of the
> > > > MLE header is set in the kernel_info. The routine sl_stub contains the
> > > > very early late launch setup code responsible for setting up the basic
> > > > environment to allow the normal kernel startup_32 code to proceed. It is
> > > > also responsible for properly waking and handling the APs on Intel
> > > > platforms. The routine sl_main which runs after entering 64b mode is
> > > > responsible for measuring configuration and module information before
> > > > it is used like the boot params, the kernel command line, the TXT heap,
> > > > an external initramfs, etc.
> > > >
> > > > Signed-off-by: Ross Philipson 
> > > > ---
> > > >  Documentation/arch/x86/boot.rst|  21 +
> > > >  arch/x86/boot/compressed/Makefile  |   3 +-
> > > >  arch/x86/boot/compressed/head_64.S |  30 +
> > > >  arch/x86/boot/compressed/kernel_info.S |  34 ++
> > > >  arch/x86/boot/compressed/sl_main.c | 577 
> > > >  arch/x86/boot/compressed/sl_stub.S | 725 +
> > > >  arch/x86/include/asm/msr-index.h   |   5 +
> > > >  arch/x86/include/uapi/asm/bootparam.h  |   1 +
> > > >  arch/x86/kernel/asm-offsets.c  |  20 +
> > > >  9 files changed, 1415 insertions(+), 1 deletion(-)
> > > >  create mode 100644 arch/x86/boot/compressed/sl_main.c
> > > >  create mode 100644 arch/x86/boot/compressed/sl_stub.S
> > > >
> > > > diff --git a/Documentation/arch/x86/boot.rst 
> > > > b/Documentation/arch/x86/boot.rst
> > > > index 4fd492cb4970..295cdf9bcbdb 100644
> > > > --- a/Documentation/arch/x86/boot.rst
> > > > +++ b/Documentation/arch/x86/boot.rst
> > > > @@ -482,6 +482,14 @@ Protocol:  2.00+
> > > > - If 1, KASLR enabled.
> > > > - If 0, KASLR disabled.
> > > >
> > > > +  Bit 2 (kernel internal): SLAUNCH_FLAG
> > > > +
> > > > +   - Used internally by the setup kernel to communicate
> > > > + Secure Launch status to kernel proper.
> > > > +
> > > > +   - If 1, Secure Launch enabled.
> > > > +   - If 0, Secure Launch disabled.
> > > > +
> > > >Bit 5 (write): QUIET_FLAG
> > > >
> > > > - If 0, print early messages.
> > > > @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
> > > >
> > > >This field contains maximal allowed type for setup_data and 
> > > > setup_indirect structs.
> > > >
> > > > +   =
> > > > +Field name:mle_header_offset
> > > > +Offset/size:   0x0010/4
> > > > +   =
> > > > +
> > > > +  This field contains the offset to the Secure Launch Measured Launch 
> > > > Environment
> > > > +  (MLE) header. This offset is used to locate information needed 
> > > > during a secure
> > > > +  late launch using Intel TXT. If the offset is zero, the kernel does 
> > > > not have
> > > > +  Secure Launch capabilities. The MLE entry point is called from TXT 
> > > > on the BSP
> > > > +  following a successful measured launch. The specific state of the 
> > > > processors is
> > > > +  outlined in the TXT Software Development Guide, the latest can be 
> > > > found here:
> > > > +  
> > > > https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> > > > +
> > > >
> > >
> > > Could we just repaint this field as the offset relative to the start
> > > of kernel_info rather than relative to the start of the image?

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-05-31 Thread Ard Biesheuvel
On Fri, 31 May 2024 at 15:33, Ard Biesheuvel  wrote:
>
> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel  wrote:
> >
> > Hello Ross,
> >
> > On Fri, 31 May 2024 at 03:32, Ross Philipson  
> > wrote:
> > >
> > > The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> > > later AMD SKINIT) to vector to during the late launch. The symbol
> > > sl_stub_entry is that entry point and its offset into the kernel is
> > > conveyed to the launching code using the MLE (Measured Launch
> > > Environment) header in the structure named mle_header. The offset of the
> > > MLE header is set in the kernel_info. The routine sl_stub contains the
> > > very early late launch setup code responsible for setting up the basic
> > > environment to allow the normal kernel startup_32 code to proceed. It is
> > > also responsible for properly waking and handling the APs on Intel
> > > platforms. The routine sl_main which runs after entering 64b mode is
> > > responsible for measuring configuration and module information before
> > > it is used like the boot params, the kernel command line, the TXT heap,
> > > an external initramfs, etc.
> > >
> > > Signed-off-by: Ross Philipson 
> > > ---
> > >  Documentation/arch/x86/boot.rst|  21 +
> > >  arch/x86/boot/compressed/Makefile  |   3 +-
> > >  arch/x86/boot/compressed/head_64.S |  30 +
> > >  arch/x86/boot/compressed/kernel_info.S |  34 ++
> > >  arch/x86/boot/compressed/sl_main.c | 577 
> > >  arch/x86/boot/compressed/sl_stub.S | 725 +
> > >  arch/x86/include/asm/msr-index.h   |   5 +
> > >  arch/x86/include/uapi/asm/bootparam.h  |   1 +
> > >  arch/x86/kernel/asm-offsets.c  |  20 +
> > >  9 files changed, 1415 insertions(+), 1 deletion(-)
> > >  create mode 100644 arch/x86/boot/compressed/sl_main.c
> > >  create mode 100644 arch/x86/boot/compressed/sl_stub.S
> > >
> > > diff --git a/Documentation/arch/x86/boot.rst 
> > > b/Documentation/arch/x86/boot.rst
> > > index 4fd492cb4970..295cdf9bcbdb 100644
> > > --- a/Documentation/arch/x86/boot.rst
> > > +++ b/Documentation/arch/x86/boot.rst
> > > @@ -482,6 +482,14 @@ Protocol:  2.00+
> > > - If 1, KASLR enabled.
> > > - If 0, KASLR disabled.
> > >
> > > +  Bit 2 (kernel internal): SLAUNCH_FLAG
> > > +
> > > +   - Used internally by the setup kernel to communicate
> > > + Secure Launch status to kernel proper.
> > > +
> > > +   - If 1, Secure Launch enabled.
> > > +   - If 0, Secure Launch disabled.
> > > +
> > >Bit 5 (write): QUIET_FLAG
> > >
> > > - If 0, print early messages.
> > > @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
> > >
> > >This field contains maximal allowed type for setup_data and 
> > > setup_indirect structs.
> > >
> > > +   =
> > > +Field name:mle_header_offset
> > > +Offset/size:   0x0010/4
> > > +   =
> > > +
> > > +  This field contains the offset to the Secure Launch Measured Launch 
> > > Environment
> > > +  (MLE) header. This offset is used to locate information needed during 
> > > a secure
> > > +  late launch using Intel TXT. If the offset is zero, the kernel does 
> > > not have
> > > +  Secure Launch capabilities. The MLE entry point is called from TXT on 
> > > the BSP
> > > +  following a successful measured launch. The specific state of the 
> > > processors is
> > > +  outlined in the TXT Software Development Guide, the latest can be 
> > > found here:
> > > +  
> > > https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> > > +
> > >
> >
> > Could we just repaint this field as the offset relative to the start
> > of kernel_info rather than relative to the start of the image? That
> > way, there is no need for patch #1, and given that the consumer of
> > this field accesses it via kernel_info, I wouldn't expect any issues
> > in applying this offset to obtain the actual address.
> >
> >
> > >  The Image Checksum
> > >  ==
> > > diff --git a/arch/x86/boot/compressed/Makefile 
> > > b/arch/x86/boot/compressed/Makefile
&

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-05-31 Thread Ard Biesheuvel
On Fri, 31 May 2024 at 13:00, Ard Biesheuvel  wrote:
>
> Hello Ross,
>
> On Fri, 31 May 2024 at 03:32, Ross Philipson  
> wrote:
> >
> > The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> > later AMD SKINIT) to vector to during the late launch. The symbol
> > sl_stub_entry is that entry point and its offset into the kernel is
> > conveyed to the launching code using the MLE (Measured Launch
> > Environment) header in the structure named mle_header. The offset of the
> > MLE header is set in the kernel_info. The routine sl_stub contains the
> > very early late launch setup code responsible for setting up the basic
> > environment to allow the normal kernel startup_32 code to proceed. It is
> > also responsible for properly waking and handling the APs on Intel
> > platforms. The routine sl_main which runs after entering 64b mode is
> > responsible for measuring configuration and module information before
> > it is used like the boot params, the kernel command line, the TXT heap,
> > an external initramfs, etc.
> >
> > Signed-off-by: Ross Philipson 
> > ---
> >  Documentation/arch/x86/boot.rst|  21 +
> >  arch/x86/boot/compressed/Makefile  |   3 +-
> >  arch/x86/boot/compressed/head_64.S |  30 +
> >  arch/x86/boot/compressed/kernel_info.S |  34 ++
> >  arch/x86/boot/compressed/sl_main.c | 577 
> >  arch/x86/boot/compressed/sl_stub.S | 725 +
> >  arch/x86/include/asm/msr-index.h   |   5 +
> >  arch/x86/include/uapi/asm/bootparam.h  |   1 +
> >  arch/x86/kernel/asm-offsets.c  |  20 +
> >  9 files changed, 1415 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/x86/boot/compressed/sl_main.c
> >  create mode 100644 arch/x86/boot/compressed/sl_stub.S
> >
> > diff --git a/Documentation/arch/x86/boot.rst 
> > b/Documentation/arch/x86/boot.rst
> > index 4fd492cb4970..295cdf9bcbdb 100644
> > --- a/Documentation/arch/x86/boot.rst
> > +++ b/Documentation/arch/x86/boot.rst
> > @@ -482,6 +482,14 @@ Protocol:  2.00+
> > - If 1, KASLR enabled.
> > - If 0, KASLR disabled.
> >
> > +  Bit 2 (kernel internal): SLAUNCH_FLAG
> > +
> > +   - Used internally by the setup kernel to communicate
> > + Secure Launch status to kernel proper.
> > +
> > +   - If 1, Secure Launch enabled.
> > +   - If 0, Secure Launch disabled.
> > +
> >Bit 5 (write): QUIET_FLAG
> >
> > - If 0, print early messages.
> > @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
> >
> >This field contains maximal allowed type for setup_data and 
> > setup_indirect structs.
> >
> > +   =
> > +Field name:mle_header_offset
> > +Offset/size:   0x0010/4
> > +   =
> > +
> > +  This field contains the offset to the Secure Launch Measured Launch 
> > Environment
> > +  (MLE) header. This offset is used to locate information needed during a 
> > secure
> > +  late launch using Intel TXT. If the offset is zero, the kernel does not 
> > have
> > +  Secure Launch capabilities. The MLE entry point is called from TXT on 
> > the BSP
> > +  following a successful measured launch. The specific state of the 
> > processors is
> > +  outlined in the TXT Software Development Guide, the latest can be found 
> > here:
> > +  
> > https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> > +
> >
>
> Could we just repaint this field as the offset relative to the start
> of kernel_info rather than relative to the start of the image? That
> way, there is no need for patch #1, and given that the consumer of
> this field accesses it via kernel_info, I wouldn't expect any issues
> in applying this offset to obtain the actual address.
>
>
> >  The Image Checksum
> >  ==
> > diff --git a/arch/x86/boot/compressed/Makefile 
> > b/arch/x86/boot/compressed/Makefile
> > index 9189a0e28686..9076a248d4b4 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
> >  vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
> >  vmlinux-objs-$(CONFIG_EFI_STUB) += 
> > $(objtree)/drivers/firmware/efi/libstub/lib.a
> >
> > -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
> > $(obj)/early_sha256.o
> > +vmlinux-obj

Re: [PATCH v9 19/19] x86: EFI stub DRTM launch support for Secure Launch

2024-05-31 Thread Ard Biesheuvel
On Fri, 31 May 2024 at 03:32, Ross Philipson  wrote:
>
> This support allows the DRTM launch to be initiated after an EFI stub
> launch of the Linux kernel is done. This is accomplished by providing
> a handler to jump to when a Secure Launch is in progress. This has to be
> called after the EFI stub does Exit Boot Services.
>
> Signed-off-by: Ross Philipson 

Just some minor remarks below. The overall approach in this patch
looks fine now.


> ---
>  drivers/firmware/efi/libstub/x86-stub.c | 98 +
>  1 file changed, 98 insertions(+)
>
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c 
> b/drivers/firmware/efi/libstub/x86-stub.c
> index d5a8182cf2e1..a1143d006202 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -9,6 +9,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #include 
>  #include 
> @@ -830,6 +832,97 @@ static efi_status_t efi_decompress_kernel(unsigned long 
> *kernel_entry)
> return efi_adjust_memory_range_protection(addr, kernel_text_size);
>  }
>
> +#if (IS_ENABLED(CONFIG_SECURE_LAUNCH))

IS_ENABLED() is mostly used for C conditionals, not CPP ones.

It would be nice if this #if could be dropped, and replaced with ... (see below)


> +static bool efi_secure_launch_update_boot_params(struct slr_table *slrt,
> +struct boot_params *boot_params)
> +{
> +   struct slr_entry_intel_info *txt_info;
> +   struct slr_entry_policy *policy;
> +   struct txt_os_mle_data *os_mle;
> +   bool updated = false;
> +   int i;
> +
> +   txt_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
> +   if (!txt_info)
> +   return false;
> +
> +   os_mle = txt_os_mle_data_start((void *)txt_info->txt_heap);
> +   if (!os_mle)
> +   return false;
> +
> +   os_mle->boot_params_addr = (u32)(u64)boot_params;
> +

Why is this safe?

> +   policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
> +   if (!policy)
> +   return false;
> +
> +   for (i = 0; i < policy->nr_entries; i++) {
> +   if (policy->policy_entries[i].entity_type == SLR_ET_BOOT_PARAMS) {
> +   policy->policy_entries[i].entity = (u64)boot_params;
> +   updated = true;
> +   break;
> +   }
> +   }
> +
> +   /*
> +* If this is a PE entry into EFI stub the mocked up boot params will
> +* be missing some of the setup header data needed for the second stage
> +* of the Secure Launch boot.
> +*/
> +   if (image) {
> +   struct setup_header *hdr = (struct setup_header *)((u8 *)image->image_base + 0x1f1);

Could we use something other than a bare 0x1f1 constant here? struct
boot_params has a struct setup_header at the correct offset, so with
some casting and offsetof() use, we can make this look a lot more
self-explanatory.


> +   u64 cmdline_ptr, hi_val;
> +
> +   boot_params->hdr.setup_sects = hdr->setup_sects;
> +   boot_params->hdr.syssize = hdr->syssize;
> +   boot_params->hdr.version = hdr->version;
> +   boot_params->hdr.loadflags = hdr->loadflags;
> +   boot_params->hdr.kernel_alignment = hdr->kernel_alignment;
> +   boot_params->hdr.min_alignment = hdr->min_alignment;
> +   boot_params->hdr.xloadflags = hdr->xloadflags;
> +   boot_params->hdr.init_size = hdr->init_size;
> +   boot_params->hdr.kernel_info_offset = hdr->kernel_info_offset;
> +   hi_val = boot_params->ext_cmd_line_ptr;

We have efi_set_u64_split() for this.

> +   cmdline_ptr = boot_params->hdr.cmd_line_ptr | hi_val << 32;
> +   boot_params->hdr.cmdline_size = strlen((const char *)cmdline_ptr);;
> +   }
> +
> +   return updated;
> +}
> +
> +static void efi_secure_launch(struct boot_params *boot_params)
> +{
> +   struct slr_entry_dl_info *dlinfo;
> +   efi_guid_t guid = SLR_TABLE_GUID;
> +   dl_handler_func handler_callback;
> +   struct slr_table *slrt;
> +

... a C conditional here, e.g.,

if (!IS_ENABLED(CONFIG_SECURE_LAUNCH))
return;

The difference is that all the code will get compile test coverage
every time, instead of only in configs that enable
CONFIG_SECURE_LAUNCH.

This significantly reduces the risk that your stuff will get broken
inadvertently.

> +   /*
> +* The presence of this table indicates a Secure Launch
> +* is being requested.
> +*/
> +   slrt = (struct slr_table *)get_efi_config_table(guid);
> +   if (!slrt || slrt->magic != SLR_TABLE_MAGIC)
> +   return;
> +
> +   /*
> +* Since the EFI stub library creates its own boot_params on entry, the
> +* SLRT and TXT heap have to be updated with this ve
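The bare 0x1f1 constant Ard flags is the historical file offset of the
setup header; v10 (quoted earlier in this digest) adopts the offsetof()
form. A self-contained sketch of the relationship, using stand-in types
(the real ones come from asm/bootparam.h):

#include <stddef.h>
#include <stdint.h>

struct setup_header { uint8_t setup_sects; /* ... */ };
struct boot_params {
        uint8_t _pad[0x1f1];     /* fields preceding the setup header */
        struct setup_header hdr; /* lands at the historical 0x1f1 */
};

_Static_assert(offsetof(struct boot_params, hdr) == 0x1f1,
               "setup_header sits at offset 0x1f1");

/* So instead of:  (u8 *)image->image_base + 0x1f1
 * v10 writes:     (u8 *)image->image_base + offsetof(struct boot_params, hdr)
 * which documents where the constant comes from.
 */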

Re: [PATCH v9 08/19] x86: Secure Launch kernel early boot stub

2024-05-31 Thread Ard Biesheuvel
Hello Ross,

On Fri, 31 May 2024 at 03:32, Ross Philipson  wrote:
>
> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> later AMD SKINIT) to vector to during the late launch. The symbol
> sl_stub_entry is that entry point and its offset into the kernel is
> conveyed to the launching code using the MLE (Measured Launch
> Environment) header in the structure named mle_header. The offset of the
> MLE header is set in the kernel_info. The routine sl_stub contains the
> very early late launch setup code responsible for setting up the basic
> environment to allow the normal kernel startup_32 code to proceed. It is
> also responsible for properly waking and handling the APs on Intel
> platforms. The routine sl_main which runs after entering 64b mode is
> responsible for measuring configuration and module information before
> it is used like the boot params, the kernel command line, the TXT heap,
> an external initramfs, etc.
>
> Signed-off-by: Ross Philipson 
> ---
>  Documentation/arch/x86/boot.rst|  21 +
>  arch/x86/boot/compressed/Makefile  |   3 +-
>  arch/x86/boot/compressed/head_64.S |  30 +
>  arch/x86/boot/compressed/kernel_info.S |  34 ++
>  arch/x86/boot/compressed/sl_main.c | 577 
>  arch/x86/boot/compressed/sl_stub.S | 725 +
>  arch/x86/include/asm/msr-index.h   |   5 +
>  arch/x86/include/uapi/asm/bootparam.h  |   1 +
>  arch/x86/kernel/asm-offsets.c  |  20 +
>  9 files changed, 1415 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/boot/compressed/sl_main.c
>  create mode 100644 arch/x86/boot/compressed/sl_stub.S
>
> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
> index 4fd492cb4970..295cdf9bcbdb 100644
> --- a/Documentation/arch/x86/boot.rst
> +++ b/Documentation/arch/x86/boot.rst
> @@ -482,6 +482,14 @@ Protocol:  2.00+
> - If 1, KASLR enabled.
> - If 0, KASLR disabled.
>
> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> +
> +   - Used internally by the setup kernel to communicate
> + Secure Launch status to kernel proper.
> +
> +   - If 1, Secure Launch enabled.
> +   - If 0, Secure Launch disabled.
> +
>Bit 5 (write): QUIET_FLAG
>
> - If 0, print early messages.
> @@ -1028,6 +1036,19 @@ Offset/size: 0x000c/4
>
>This field contains maximal allowed type for setup_data and setup_indirect 
> structs.
>
> +   =
> +Field name:mle_header_offset
> +Offset/size:   0x0010/4
> +   =
> +
> +  This field contains the offset to the Secure Launch Measured Launch 
> Environment
> +  (MLE) header. This offset is used to locate information needed during a 
> secure
> +  late launch using Intel TXT. If the offset is zero, the kernel does not 
> have
> +  Secure Launch capabilities. The MLE entry point is called from TXT on the 
> BSP
> +  following a successful measured launch. The specific state of the processors 
> is
> +  outlined in the TXT Software Development Guide, the latest can be found 
> here:
> +  
> https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> +
>

Could we just repaint this field as the offset relative to the start
of kernel_info rather than relative to the start of the image? That
way, there is no need for patch #1, and given that the consumer of
this field accesses it via kernel_info, I wouldn't expect any issues
in applying this offset to obtain the actual address.


>  The Image Checksum
>  ==
> diff --git a/arch/x86/boot/compressed/Makefile 
> b/arch/x86/boot/compressed/Makefile
> index 9189a0e28686..9076a248d4b4 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>  vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>  vmlinux-objs-$(CONFIG_EFI_STUB) += 
> $(objtree)/drivers/firmware/efi/libstub/lib.a
>
> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
> $(obj)/early_sha256.o
> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
> $(obj)/early_sha256.o \
> +   $(obj)/sl_main.o $(obj)/sl_stub.o
>
>  $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
> $(call if_changed,ld)
> diff --git a/arch/x86/boot/compressed/head_64.S 
> b/arch/x86/boot/compressed/head_64.S
> index 1dcb794c5479..803c9e2e6d85 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
> pushq   $0
> popfq
>
> +#ifdef CONFIG_SECURE_LAUNCH
> +   /* Ensure the relocation region is coverd by a PMR */

covered

> +   movq%rbx, %rdi
> +   movl$(_bss - startup_32), %esi
> +   callq   sl_check_region
> +#endif
> +
>  /*
>   * Copy the compressed kernel to the end of our buffer
>   * where decompression in place becomes safe.
> @@ -462,6 
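The SLAUNCH_FLAG bit documented in the boot.rst hunk above can be
sketched as follows. The bit position comes from the quoted text, but
the macro name here is assumed for illustration (the real definition
lands in the bootparam UAPI header):

#include <stdbool.h>
#include <stdint.h>

#define SLAUNCH_FLAG (1 << 2) /* loadflags bit 2, kernel internal */

static bool secure_launch_active(uint8_t loadflags)
{
        return loadflags & SLAUNCH_FLAG;
}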

Re: [PATCH v8 00/11] ACPI/IORT: Support for IORT RMR node

2022-03-14 Thread Ard Biesheuvel
On Mon, 14 Mar 2022 at 11:37, Eric Auger  wrote:
>
> Hi Robin
>
> On 3/11/22 11:34 AM, Robin Murphy wrote:
> > On 2022-03-11 08:19, Eric Auger wrote:
> >> Hi guys,
> >>
> >> On 2/21/22 4:43 PM, Shameer Kolothum wrote:
> >>> Hi,
> >>>
> >>> Since we now have an updated verion[0] of IORT spec(E.d) which
> >>> addresses the memory attributes issues discussed here [1],
> >>> this series now make use of it.
> >>>
> >>> The pull request for ACPICA E.d related changes are already
> >>> raised and can be found here,
> >>> https://github.com/acpica/acpica/pull/752
> >>>
> >>> v7 --> v8
> >>>- Patch #1 has temp definitions for RMR related changes till
> >>>  the ACPICA header changes are part of kernel.
> >>>- No early parsing of RMR node info and is only parsed at the
> >>>  time of use.
> >>>- Changes to the RMR get/put API format compared to the
> >>>  previous version.
> >>>- Support for RMR descriptor shared by multiple stream IDs.
> >>>
> >>> Please take a look and let me know your thoughts.
> >>>
> >>> Thanks,
> >>> Shameer
> >>> [0] https://developer.arm.com/documentation/den0049/ed/
> >> I still have a question on the IORT E.d spec (unrelated to this series).
> >>
> >> The spec mandates that if RMR nodes are presented in the IORT,
> >> _DSM function #5 for the PCIe host bridge ACPI device object must return
> >> 0, indicating the OS must honour the PCI config that the FW computed at
> >> boot time.
> >>
> >> However implementing this _DSM #5 as above is known to prevent PCI
> >> devices with IO ports from working, on aarch64 linux.
> >>
> >> "
> >> The reason is that EFI creates I/O port mappings below
> >>  0x1000 (in fact, at 0). However Linux, for legacy reasons, does not
> >>  support I/O ports <= 0x1000 on PCI, so the I/O assignment
> >> created by EFI
> >>  is rejected.
> >>  EFI creates the mappings primarily for itself, and up until
> >> DSM #5
> >>  started to be enforced, all PCI resource allocations that
> >> existed at
> >>  boot were ignored by Linux and recreated from scratch.
> >> "
> >>
> >> This is an excerpt of a qemu commit message that reverted the _DMS #5
> >> change (Revert "acpi/gpex: Inform os to keep firmware resource map").
> >> Has the situation changed since July 2021 (ie. has UEFI been reworked?).
> >> [+ Ard]
> >
> > FWIW I wasn't aware of that, but if it's an issue then it will need to
> > be fixed in Linux or UEFI's PCI resource code (arguably if UEFI has
> > already allocated from the bottom of I/O space then Linux should be
> > safe to assume that there are no legacy PC I/O resources to worry
> > about). The DSM is required to prevent bus numbers being reassigned,
> > because if that happens then any PCI StreamIDs referenced in IORT may
> > suddenly become meaningless and the association of root complex nodes
> > and RMRs to physical hardware lost.
>
> Thank you for confirming and explaining the need for DSM #5. Ard, please
> could you confirm that the incompatibility with PCI devices with IO
> ports is still there?
>

Yes, and this needs to be fixed in Linux. The firmware complies with
the pertinent specifications, and it is Linux that deviates from this
for legacy reasons.

IIRC, this came up on the mailing list at some point, and one of the
issues is that I/O port 0x0 is mistaken for 'no resource' or some
other exceptional case like that, so even if we fix the arbitrary
limit of 0x1000, we may still run into trouble when devices use I/O
port 0x0.


Re: [PATCH v2] dma-direct: improve DMA_ATTR_NO_KERNEL_MAPPING

2021-11-04 Thread Ard Biesheuvel
On Thu, 4 Nov 2021 at 14:40, Walter Wu  wrote:
>
> On Thu, 2021-11-04 at 13:47 +0100, Ard Biesheuvel wrote:
> > On Thu, 4 Nov 2021 at 13:31, Walter Wu 
> > wrote:
> > >
> > > On Thu, 2021-11-04 at 09:57 +0100, Ard Biesheuvel wrote:
> > > > On Thu, 4 Nov 2021 at 09:53, Christoph Hellwig 
> > > > wrote:
> > > > >
> > > > > On Thu, Nov 04, 2021 at 10:32:21AM +0800, Walter Wu wrote:
> > > > > > diff --git a/include/linux/set_memory.h
> > > > > > b/include/linux/set_memory.h
> > > > > > index f36be5166c19..6c7d1683339c 100644
> > > > > > --- a/include/linux/set_memory.h
> > > > > > +++ b/include/linux/set_memory.h
> > > > > > @@ -7,11 +7,16 @@
> > > > > >
> > > > > >  #ifdef CONFIG_ARCH_HAS_SET_MEMORY
> > > > > >  #include 
> > > > > > +
> > > > > > +#ifndef CONFIG_RODATA_FULL_DEFAULT_ENABLED
> > > > >
> > > > > This is an arm64-specific symbol, and one that only controls a
> > > > > default.  I don't think it is suitable to key off stubs in
> > > > > common
> > > > > code.
> > > > >
> > > > > > +static inline int set_memory_valid(unsigned long addr, int
> > > > > > numpages, int enable) { return 0; }
> > > > >
> > > > > Please avoid overly long lines.
> > > > >
> > > > > > + if
> > > > > > (IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED))
> > > > > > {
> > > > > > + kaddr = (unsigned
> > > > > > long)phys_to_virt(dma_to_phys(dev, *dma_handle));
> > > > >
> > > > > This can just use page_address.
> > > > >
> > > > > > + /* page remove kernel mapping for arm64
> > > > > > */
> > > > > > + set_memory_valid(kaddr, size >>
> > > > > > PAGE_SHIFT,
> > > > > > 0);
> > > > > > + }
> > > > >
> > > > > But more importantly:  set_memory_valid only exists on arm64,
> > > > > this
> > > > > will break compile everywhere else.  And this API is complete
> > > > > crap.
> > > > > Passing kernel virtual addresses as unsigned long just sucks,
> > > > > and
> > > > > passing an integer argument for valid/non-valid also is a
> > > > > horrible
> > > > > API.
> > > > >
> > > >
> > > > ... and as I pointed out before, you can still pass rodata=off on
> > > > arm64, and get the old behavior, in which case bad things will
> > > > happen
> > > > if you try to use an API that expects to operate on page mappings
> > > > with
> > > > a 1 GB block mapping.
> > > >
> > >
> > > Thanks for your suggestion.
> > >
> > >
> > > > And you still haven't explained what the actual problem is: is
> > > > this
> > > > about CPU speculation corrupting non-cache coherent inbound DMA?
> > >
> > > No corruption; only the CPU reads it. We hope to fix the behavior.
> > >
> >
> > Fix which behavior? Please explain
> >
> > 1) the current behavior
> We call dma_direct_alloc() with DMA_ATTR_NO_KERNEL_MAPPING to get the
> allocated buffer, but the kernel mapping still exists. Our goal is that
> this buffer cannot be accessed by the CPU. Unfortunately, we see CPU
> speculation reading it. So we need to fix that, and we don't want to
> use the no-map approach.
>
> > 2) why the current behavior is problematic for you
> dma_direct_alloc() with DMA_ATTR_NO_KERNEL_MAPPING still leaves a kernel
> mapping, so CPU speculation can still read the buffer. Although we have
> hardware to protect the buffer, we still hope to fix it in software.
>

But *why* is this a problem? You are saying that the speculative
accesses are not causing corruption, so they are causing other issues
that you want to address. So which issues are we talking about here?


> > 3) how this patch changes the current behavior
> When dma_direct_alloc() is called with DMA_ATTR_NO_KERNEL_MAPPING, it
> now removes the kernel mapping that belongs to the buffer.
>
> > 4) why the new behavior fixes your problem.
> If I understand correctly, to block CPU speculation we need to unmap
> the buffer in the stage 1 and stage 2 page tables and invalidate the
> TLB. This patch does the stage 1 unmap at EL1.
>
> >
> > There is no penalty for using too many words.
>
> Thanks.
> Walter
>


Re: [PATCH v2] dma-direct: improve DMA_ATTR_NO_KERNEL_MAPPING

2021-11-04 Thread Ard Biesheuvel
On Thu, 4 Nov 2021 at 13:31, Walter Wu  wrote:
>
> On Thu, 2021-11-04 at 09:57 +0100, Ard Biesheuvel wrote:
> > On Thu, 4 Nov 2021 at 09:53, Christoph Hellwig  wrote:
> > >
> > > On Thu, Nov 04, 2021 at 10:32:21AM +0800, Walter Wu wrote:
> > > > diff --git a/include/linux/set_memory.h
> > > > b/include/linux/set_memory.h
> > > > index f36be5166c19..6c7d1683339c 100644
> > > > --- a/include/linux/set_memory.h
> > > > +++ b/include/linux/set_memory.h
> > > > @@ -7,11 +7,16 @@
> > > >
> > > >  #ifdef CONFIG_ARCH_HAS_SET_MEMORY
> > > >  #include 
> > > > +
> > > > +#ifndef CONFIG_RODATA_FULL_DEFAULT_ENABLED
> > >
> > > This is an arm64-specific symbol, and one that only controls a
> > > default.  I don't think it is suitable to key off stubs in common
> > > code.
> > >
> > > > +static inline int set_memory_valid(unsigned long addr, int
> > > > numpages, int enable) { return 0; }
> > >
> > > Please avoid overly long lines.
> > >
> > > > + if (IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED))
> > > > {
> > > > + kaddr = (unsigned
> > > > long)phys_to_virt(dma_to_phys(dev, *dma_handle));
> > >
> > > This can just use page_address.
> > >
> > > > + /* page remove kernel mapping for arm64 */
> > > > + set_memory_valid(kaddr, size >> PAGE_SHIFT,
> > > > 0);
> > > > + }
> > >
> > > But more importantly:  set_memory_valid only exists on arm64, this
> > > will break compile everywhere else.  And this API is complete crap.
> > > Passing kernel virtual addresses as unsigned long just sucks, and
> > > passing an integer argument for valid/non-valid also is a horrible
> > > API.
> > >
> >
> > ... and as I pointed out before, you can still pass rodata=off on
> > arm64, and get the old behavior, in which case bad things will happen
> > if you try to use an API that expects to operate on page mappings
> > with
> > a 1 GB block mapping.
> >
>
> Thanks for your suggestion.
>
>
> > And you still haven't explained what the actual problem is: is this
> > about CPU speculation corrupting non-cache coherent inbound DMA?
>
> No corruption; only the CPU reads it. We hope to fix the behavior.
>

Fix which behavior? Please explain

1) the current behavior
2) why the current behavior is problematic for you
3) how this patch changes the current behavior
4) why the new behavior fixes your problem.

There is no penalty for using too many words.


Re: [PATCH v2] dma-direct: improve DMA_ATTR_NO_KERNEL_MAPPING

2021-11-04 Thread Ard Biesheuvel
On Thu, 4 Nov 2021 at 09:53, Christoph Hellwig  wrote:
>
> On Thu, Nov 04, 2021 at 10:32:21AM +0800, Walter Wu wrote:
> > diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> > index f36be5166c19..6c7d1683339c 100644
> > --- a/include/linux/set_memory.h
> > +++ b/include/linux/set_memory.h
> > @@ -7,11 +7,16 @@
> >
> >  #ifdef CONFIG_ARCH_HAS_SET_MEMORY
> >  #include 
> > +
> > +#ifndef CONFIG_RODATA_FULL_DEFAULT_ENABLED
>
> This is an arm64-specific symbol, and one that only controls a
> default.  I don't think it is suitable to key off stubs in common
> code.
>
> > +static inline int set_memory_valid(unsigned long addr, int numpages, int 
> > enable) { return 0; }
>
> Please avoid overly long lines.
>
> > + if (IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED)) {
> > + kaddr = (unsigned long)phys_to_virt(dma_to_phys(dev, 
> > *dma_handle));
>
> This can just use page_address.
>
> > + /* page remove kernel mapping for arm64 */
> > + set_memory_valid(kaddr, size >> PAGE_SHIFT, 0);
> > + }
>
> But more importantly:  set_memory_valid only exists on arm64, this
> will break compile everywhere else.  And this API is complete crap.
> Passing kernel virtual addresses as unsigned long just sucks, and
> passing an integer argument for valid/non-valid also is a horrible
> API.
>

... and as I pointed out before, you can still pass rodata=off on
arm64, and get the old behavior, in which case bad things will happen
if you try to use an API that expects to operate on page mappings with
a 1 GB block mapping.

And you still haven't explained what the actual problem is: is this
about CPU speculation corrupting non-cache coherent inbound DMA?
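For context on why this keeps getting pushed back: set_memory_valid()
can only flip individual pages when the linear map was built at page
granularity, which arm64 guarantees only under rodata=full and similar
configurations. A hedged sketch of the guard such code would need,
assuming arm64's can_set_direct_map() helper as the runtime check (this
is illustrative, not the patch under review):

static bool try_unmap_from_linear_map(struct page *page, size_t size)
{
        unsigned long kaddr = (unsigned long)page_address(page);

        if (!can_set_direct_map())
                return false; /* linear map may use block mappings */

        /* safe only at page granularity; unmaps from the linear map */
        set_memory_valid(kaddr, size >> PAGE_SHIFT, 0);
        return true;
}

Note the use of page_address() here, per Christoph's review comment
above, instead of open-coding phys_to_virt(dma_to_phys(...)).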


Re: [PATCH] dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING

2021-11-01 Thread Ard Biesheuvel
On Mon, 1 Nov 2021 at 13:21, Walter Wu  wrote:
>
> Hi Ard,
>
> On Mon, 2021-11-01 at 09:34 +0100, Ard Biesheuvel wrote:
> > On Mon, 1 Nov 2021 at 04:17, Walter Wu 
> > wrote:
> > >
> > > DMA_ATTR_NO_KERNEL_MAPPING is meant to avoid creating a kernel
> > > mapping for the allocated buffer, but in the current implementation
> > > the PTE of the allocated buffer in the kernel page table remains
> > > valid. So we should mark the PTE of the allocated buffer invalid,
> > > so that there is no kernel mapping for it.
> > >
> > > In some cases, we don't want the allocated buffer to be read by
> > > the CPU or by speculative execution, so we use
> > > DMA_ATTR_NO_KERNEL_MAPPING to remove the kernel mapping and
> > > achieve this goal.
> > >
> > > Signed-off-by: Walter Wu 
> > > Cc: Christoph Hellwig 
> > > Cc: Marek Szyprowski 
> > > Cc: Robin Murphy 
> > > Cc: Matthias Brugger 
> > > Cc: Andrew Morton 
> > > ---
> > >  kernel/dma/direct.c | 8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > index 4c6c5e0635e3..aa10b4c5d762 100644
> > > --- a/kernel/dma/direct.c
> > > +++ b/kernel/dma/direct.c
> > > @@ -13,6 +13,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include "direct.h"
> > >
> > >  /*
> > > @@ -169,6 +170,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> > > if (!PageHighMem(page))
> > > arch_dma_prep_coherent(page, size);
> > > *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
> > > +   /* remove kernel mapping for pages */
> > > +   set_memory_valid((unsigned long)phys_to_virt(dma_to_phys(dev, *dma_handle)),
> > > +   size >> PAGE_SHIFT, 0);
> >
> > This only works if the memory is mapped at page granularity in the
> > linear region, and you cannot rely on that. Many architectures prefer
> > block mappings for the linear region, and arm64 will only use page
> > mappings if rodata=full is set (which is set by default but can be
> > overridden on the kernel command line)
> >
>
> We mainly want to solve arm64 arch.

Solve what?

> RODATA_FULL_DEFAULT_ENABLED should
> be the arm64 config. If we use CONFIG_RODATA_FULL_DEFAULT_ENABLED to
> check whether to call set_memory_valid(), it should avoid affecting
> other architectures. Do you think this method will work?
>

No. Try it with rodata=off (or rodata=on for that matter) and see what happens.

> Thanks for your explanation and suggestion.
>

YW

>
> >
> > > /* return the page pointer as the opaque cookie */
> > > return page;
> > > }
> > > @@ -278,6 +282,10 @@ void dma_direct_free(struct device *dev, size_t size,
> > >
> > > if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> > > !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
> > > +   size = PAGE_ALIGN(size);
> > > +   /* create kernel mapping for pages */
> > > +   set_memory_valid((unsigned long)phys_to_virt(dma_to_phys(dev, dma_addr)),
> > > +   size >> PAGE_SHIFT, 1);
> > > /* cpu_addr is a struct page cookie, not a kernel address */
> > > dma_free_contiguous(dev, cpu_addr, size);
> > > return;
> > > --
> > > 2.18.0
> > >
> > >


Re: [PATCH] dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING

2021-11-01 Thread Ard Biesheuvel
On Mon, 1 Nov 2021 at 04:17, Walter Wu  wrote:
>
> DMA_ATTR_NO_KERNEL_MAPPING is meant to avoid creating a kernel
> mapping for the allocated buffer, but in the current implementation
> the PTE of the allocated buffer in the kernel page table remains
> valid. So we should mark the PTE of the allocated buffer invalid,
> so that there is no kernel mapping for it.
>
> In some cases, we don't want the allocated buffer to be read by
> the CPU or by speculative execution, so we use DMA_ATTR_NO_KERNEL_MAPPING
> to remove the kernel mapping and achieve this goal.
>
> Signed-off-by: Walter Wu 
> Cc: Christoph Hellwig 
> Cc: Marek Szyprowski 
> Cc: Robin Murphy 
> Cc: Matthias Brugger 
> Cc: Andrew Morton 
> ---
>  kernel/dma/direct.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4c6c5e0635e3..aa10b4c5d762 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "direct.h"
>
>  /*
> @@ -169,6 +170,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> if (!PageHighMem(page))
> arch_dma_prep_coherent(page, size);
> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
> +   /* remove kernel mapping for pages */
> +   set_memory_valid((unsigned long)phys_to_virt(dma_to_phys(dev, 
> *dma_handle)),
> +   size >> PAGE_SHIFT, 0);

This only works if the memory is mapped at page granularity in the
linear region, and you cannot rely on that. Many architectures prefer
block mappings for the linear region, and arm64 will only use page
mappings if rodata=full is set (which is set by default but can be
overridden on the kernel command line)


> /* return the page pointer as the opaque cookie */
> return page;
> }
> @@ -278,6 +282,10 @@ void dma_direct_free(struct device *dev, size_t size,
>
> if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
> +   size = PAGE_ALIGN(size);
> +   /* create kernel mapping for pages */
> +   set_memory_valid((unsigned long)phys_to_virt(dma_to_phys(dev, 
> dma_addr)),
> +   size >> PAGE_SHIFT, 1);
> /* cpu_addr is a struct page cookie, not a kernel address */
> dma_free_contiguous(dev, cpu_addr, size);
> return;
> --
> 2.18.0
>
>


Re: [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node

2021-08-05 Thread Ard Biesheuvel
On Thu, 5 Aug 2021 at 15:35, Shameerali Kolothum Thodi
 wrote:
>
>
>
> > -----Original Message-----
> > From: Ard Biesheuvel [mailto:a...@kernel.org]
> > Sent: 05 August 2021 14:23
> > To: Shameerali Kolothum Thodi 
> > Cc: Linux ARM ; ACPI Devel Maling List
> > ; Linux IOMMU
> > ; Linuxarm ;
> > Lorenzo Pieralisi ; Joerg Roedel
> > ; Robin Murphy ; Will Deacon
> > ; wanghuiqiang ; Guohanjun
> > (Hanjun Guo) ; Steven Price
> > ; Sami Mujawar ; Jon
> > Nettleton ; Eric Auger ;
> > yangyicong 
> > Subject: Re: [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node
> >
> > On Thu, 5 Aug 2021 at 10:10, Shameer Kolothum
> >  wrote:
> > >
> > > Hi,
> > >
> > > The series adds support to IORT RMR nodes specified in IORT
> > > Revision E.b -ARM DEN 0049E[0]. RMR nodes are used to describe
> > > memory ranges that are used by endpoints and require a unity
> > > mapping in SMMU.
> > >
> > > We have faced issues with 3408iMR RAID controller cards which
> > > fail to boot when SMMU is enabled. This is because these
> > > controllers make use of host memory for various caching related
> > > purposes and when SMMU is enabled the iMR firmware fails to
> > > access these memory regions as there is no mapping for them.
> > > IORT RMR provides a way for UEFI to describe and report these
> > > memory regions so that the kernel can make a unity mapping for
> > > these in SMMU.
> > >
> >
> > Does this mean we are ignoring the RMR memory ranges, and exposing the
> > entire physical address space to devices using the stream IDs in
> > question?
>
> Nope. RMR node is used to describe the memory ranges used by end points
> behind SMMU. And this information is used to create 1 : 1 mappings for those
> ranges in SMMU. Anything outside those ranges will result in translation
> fault(if there are no other dynamic DMA mappings).
>

Excellent! It was not obvious to me from looking at the patches, so I
had to ask.

Thanks,
Ard.

>
> >
> > > Change History:
> > >
> > > v6 --> v7
> > >
> > > The only change from v6 is the fix pointed out by Steve to
> > > the SMMUv2 SMR bypass install in patch #8.
> > >
> > > Thanks to the Tested-by tags by Laurentiu with SMMUv2 and
> > > Hanjun/Huiqiang with SMMUv3 for v6. I haven't added the tags
> > > yet as the series still needs more review[1].
> > >
> > > Feedback and tests on this series is very much appreciated.
> > >
> > > v5 --> v6
> > > - Addressed comments from Robin & Lorenzo.
> > >   : Moved iort_parse_rmr() to acpi_iort_init() from
> > > iort_init_platform_devices().
> > >   : Removed use of struct iort_rmr_entry during the initial
> > > parse. Using struct iommu_resv_region instead.
> > >   : Report RMR address alignment and overlap errors, but continue.
> > >   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> > > - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> > > - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
> > >   on Type of RMR region. Suggested by Jon N.
> > >
> > > Thanks,
> > > Shameer
> > > [0] https://developer.arm.com/documentation/den0049/latest/
> > > [1]
> > https://lore.kernel.org/linux-acpi/20210716083442.1708-1-shameerali.kolothum.th...@huawei.com/T/#m043c95b869973a834b2fd57f3e1ed0325c84f3b7
> > > --
> > > v4 --> v5
> > >  -Added a fw_data union to struct iommu_resv_region and removed
> > >   struct iommu_rmr (Based on comments from Joerg/Robin).
> > >  -Added iommu_put_rmrs() to release mem.
> > >  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
> > >   yet because of the above changes.
> > >
> > > v3 -->v4
> > > -Included the SMMUv2 SMR bypass install changes suggested by
> > >  Steve(patch #7)
> > > -As per Robin's comments, RMR reserve implementation is now
> > >  more generic  (patch #8) and dropped v3 patches 8 and 10.
> > > -Rebase to 5.13-rc1
> > >
> > > RFC v2 --> v3
> > >  -Dropped RFC tag as the ACPICA header changes are now ready to be
> > >   part of 5.13[0]. But this series still has a dependency on that patch.
> > >  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
> > >   PCIe).
> > >  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>
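
As an aside, the mechanism that turns such an RMR range into a unity
mapping is the IOMMU core's reserved-region handling. A rough,
untested sketch using the generic iommu_resv_region API the series
builds on; the base/size values are made-up placeholders, not values
from the series:

#include <linux/iommu.h>
#include <linux/sizes.h>

static void rmr_add_direct_region(struct list_head *head)
{
	phys_addr_t base = 0x10000000;	/* placeholder */
	size_t size = SZ_16M;		/* placeholder */
	struct iommu_resv_region *region;

	/* IOMMU_RESV_DIRECT asks the core to install a 1:1 mapping. */
	region = iommu_alloc_resv_region(base, size,
					 IOMMU_READ | IOMMU_WRITE,
					 IOMMU_RESV_DIRECT);
	if (region)
		list_add_tail(&region->list, head);
}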

Re: [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node

2021-08-05 Thread Ard Biesheuvel
On Thu, 5 Aug 2021 at 10:10, Shameer Kolothum
 wrote:
>
> Hi,
>
> The series adds support to IORT RMR nodes specified in IORT
> Revision E.b -ARM DEN 0049E[0]. RMR nodes are used to describe
> memory ranges that are used by endpoints and require a unity
> mapping in SMMU.
>
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these
> controllers make use of host memory for various caching related
> purposes and when SMMU is enabled the iMR firmware fails to
> access these memory regions as there is no mapping for them.
> IORT RMR provides a way for UEFI to describe and report these
> memory regions so that the kernel can make a unity mapping for
> these in SMMU.
>

Does this mean we are ignoring the RMR memory ranges, and exposing the
entire physical address space to devices using the stream IDs in
question?


> Change History:
>
> v6 --> v7
>
> The only change from v6 is the fix pointed out by Steve to
> the SMMUv2 SMR bypass install in patch #8.
>
> Thanks to the Tested-by tags by Laurentiu with SMMUv2 and
> Hanjun/Huiqiang with SMMUv3 for v6. I haven't added the tags
> yet as the series still needs more review[1].
>
> Feedback and tests on this series is very much appreciated.
>
> v5 --> v6
> - Addressed comments from Robin & Lorenzo.
>   : Moved iort_parse_rmr() to acpi_iort_init() from
> iort_init_platform_devices().
>   : Removed use of struct iort_rmr_entry during the initial
> parse. Using struct iommu_resv_region instead.
>   : Report RMR address alignment and overlap errors, but continue.
>   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
>   on Type of RMR region. Suggested by Jon N.
>
> Thanks,
> Shameer
> [0] https://developer.arm.com/documentation/den0049/latest/
> [1] 
> https://lore.kernel.org/linux-acpi/20210716083442.1708-1-shameerali.kolothum.th...@huawei.com/T/#m043c95b869973a834b2fd57f3e1ed0325c84f3b7
> --
> v4 --> v5
>  -Added a fw_data union to struct iommu_resv_region and removed
>   struct iommu_rmr (Based on comments from Joerg/Robin).
>  -Added iommu_put_rmrs() to release mem.
>  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
>   yet because of the above changes.
>
> v3 -->v4
> -Included the SMMUv2 SMR bypass install changes suggested by
>  Steve(patch #7)
> -As per Robin's comments, RMR reserve implementation is now
>  more generic  (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1
>
> RFC v2 --> v3
>  -Dropped RFC tag as the ACPICA header changes are now ready to be
>   part of 5.13[0]. But this series still has a dependency on that patch.
>  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
>   PCIe).
>  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>   discussion here[1].
>  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> --
>
> Jon Nettleton (1):
>   iommu/arm-smmu: Get associated RMR info and install bypass SMR
>
> Shameer Kolothum (8):
>   iommu: Introduce a union to struct iommu_resv_region
>   ACPI/IORT: Add support for RMR node parsing
>   iommu/dma: Introduce generic helper to retrieve RMR info
>   ACPI/IORT: Add a helper to retrieve RMR memory regions
>   iommu/arm-smmu-v3: Introduce strtab init helper
>   iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force
> bypass
>   iommu/arm-smmu-v3: Get associated RMR info and install bypass STE
>   iommu/dma: Reserve any RMR regions associated with a dev
>
>  drivers/acpi/arm64/iort.c   | 172 +++-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  76 +++--
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  48 ++
>  drivers/iommu/dma-iommu.c   |  89 +-
>  include/linux/acpi_iort.h   |   7 +
>  include/linux/dma-iommu.h   |  13 ++
>  include/linux/iommu.h   |  11 ++
>  7 files changed, 393 insertions(+), 23 deletions(-)
>
> --
> 2.17.1
>
>


Re: [PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-22 Thread Ard Biesheuvel
On Wed, 21 Oct 2020 at 14:35, Nicolas Saenz Julienne
 wrote:
>
> Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> physical address addressable by all DMA masters in the system. It's
> specially useful for setting memory zones sizes at early boot time.
>
> Signed-off-by: Nicolas Saenz Julienne 
>
> ---
>
> Changes since v3:
>  - use u64 with cpu_end
>
> Changes since v2:
>  - Use PHYS_ADDR_MAX
>  - return phys_dma_t
>  - Rename function
>  - Correct subject
>  - Add support to start parsing from an arbitrary device node in order
>for the function to work with unit tests
>
>  drivers/of/address.c | 42 ++
>  include/linux/of.h   |  7 +++
>  2 files changed, 49 insertions(+)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index eb9ab4f1e80b..47dfe5881e18 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
> struct bus_dma_region **map)
>  }
>  #endif /* CONFIG_HAS_DMA */
>
> +/**
> + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> + * @np: The node to start searching from or NULL to start from the root
> + *
> + * Gets the highest CPU physical address that is addressable by all DMA 
> masters
> + * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
> + * DMA constrained device is found, it returns PHYS_ADDR_MAX.
> + */
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> +   struct of_range_parser parser;
> +   phys_addr_t subtree_max_addr;
> +   struct device_node *child;
> +   struct of_range range;
> +   const __be32 *ranges;
> +   u64 cpu_end = 0;
> +   int len;
> +
> +   if (!np)
> +   np = of_root;
> +
> +   ranges = of_get_property(np, "dma-ranges", &len);
> +   if (ranges && len) {
> +   of_dma_range_parser_init(&parser, np);
> +   for_each_of_range(&parser, &range)
> +   if (range.cpu_addr + range.size > cpu_end)
> +   cpu_end = range.cpu_addr + range.size;

Shouldn't this be 'range.cpu_addr + range.size - 1' ?

> +
> +   if (max_cpu_addr > cpu_end)
> +   max_cpu_addr = cpu_end;
> +   }
> +
> +   for_each_available_child_of_node(np, child) {
> +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> +   if (max_cpu_addr > subtree_max_addr)
> +   max_cpu_addr = subtree_max_addr;
> +   }
> +
> +   return max_cpu_addr;
> +}
> +
>  /**
>   * of_dma_is_coherent - Check if device is coherent
>   * @np:device node
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 481ec0467285..db8db8f2c967 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
>const char *map_name, const char *map_mask_name,
>struct device_node **target, u32 *id_out);
>
> +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> +
>  #else /* CONFIG_OF */
>
>  static inline void of_core_init(void)
> @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 
> id,
> return -EINVAL;
>  }
>
> +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   return PHYS_ADDR_MAX;
> +}
> +
>  #define of_match_ptr(_ptr) NULL
>  #define of_match_node(_matches, _node) NULL
>  #endif /* CONFIG_OF */
> --
> 2.28.0
>
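
Combining the inclusive end computed with '- 1' (as suggested above)
with tracking cpu_end as a u64 (as suggested in the v3 review, quoted
further down in this digest), and reusing the declarations from the
quoted function, the loop might end up as this untested sketch:

	u64 cpu_end = 0;

	of_dma_range_parser_init(&parser, np);
	for_each_of_range(&parser, &range)
		if (range.cpu_addr + range.size - 1 > cpu_end)
			cpu_end = range.cpu_addr + range.size - 1;

	/*
	 * cpu_end holds an inclusive address in a u64, so a range
	 * ending exactly at 2^32 no longer wraps to zero, and the
	 * assignment below only happens when cpu_end already fits
	 * in phys_addr_t.
	 */
	if (max_cpu_addr > cpu_end)
		max_cpu_addr = cpu_end;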


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-15 Thread Ard Biesheuvel
On Thu, 15 Oct 2020 at 12:31, Lorenzo Pieralisi
 wrote:
>
> On Wed, Oct 14, 2020 at 09:12:09PM +0200, Nicolas Saenz Julienne wrote:
>
> [...]
>
> > +unsigned int __init acpi_iort_get_zone_dma_size(void)
> > +{
> > + struct acpi_table_iort *iort;
> > + struct acpi_iort_node *node, *end;
> > + acpi_status status;
> > + u8 limit = 32;
> > + int i;
> > +
> > + if (acpi_disabled)
> > + return limit;
> > +
> > + status = acpi_get_table(ACPI_SIG_IORT, 0,
> > + (struct acpi_table_header **)&iort);
> > + if (ACPI_FAILURE(status))
> > + return limit;
> > +
> > + node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
> > + end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
> > +
> > + for (i = 0; i < iort->node_count; i++) {
> > + if (node >= end)
> > + break;
> > +
> > + switch (node->type) {
> > + struct acpi_iort_named_component *ncomp;
> > + struct acpi_iort_root_complex *rc;
> > +
> > + case ACPI_IORT_NODE_NAMED_COMPONENT:
> > + ncomp = (struct acpi_iort_named_component *)node->node_data;
> > + if (ncomp->memory_address_limit)
> > + limit = min(limit, ncomp->memory_address_limit);
> > + break;
> > +
> > + case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
> > + rc = (struct acpi_iort_root_complex *)node->node_data;
> > + if (rc->memory_address_limit)
>
> You need to add a node revision check here, see rc_dma_get_range() in
> drivers/acpi/arm64/iort.c, otherwise we may be reading junk data
> in older IORT tables - acpica structures are always referring to the
> latest specs.
>

Indeed - apologies for not mentioning that when handing over the patch.

Also, we could use min_not_zero() here instead of the if ()
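
i.e., something along these lines (untested, and assuming the revision
semantics match rc_dma_get_range(): the limit field only exists from
node revision 1 onwards):

	case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
		rc = (struct acpi_iort_root_complex *)node->node_data;
		/* memory_address_limit was only added in node revision 1 */
		if (node->revision >= 1)
			limit = min_not_zero(limit, rc->memory_address_limit);
		break;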


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-15 Thread Ard Biesheuvel
On Fri, 16 Oct 2020 at 08:51, Hanjun Guo  wrote:
>
> On 2020/10/16 2:03, Catalin Marinas wrote:
> > On Thu, Oct 15, 2020 at 10:26:18PM +0800, Hanjun Guo wrote:
> >> On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:
> >>> From: Ard Biesheuvel 
> >>>
> >>> We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
> >>> incorporating masters that can address less than 32 bits of DMA, in
> >>> particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
> >>> peripherals that can only address up to 1 GB (and its PCIe host
> >>> bridge can only access the bottom 3 GB)
> >>>
> >>> Instructing the DMA layer about these limitations is straight-forward,
> >>> even though we had to fix some issues regarding memory limits set in
> >>> the IORT for named components, and regarding the handling of ACPI _DMA
> >>> methods. However, the DMA layer also needs to be able to allocate
> >>> memory that is guaranteed to meet those DMA constraints, for bounce
> >>> buffering as well as allocating the backing for consistent mappings.
> >>>
> >>> This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
> >>> it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
> >>> problems with kdump, and potentially in other places where allocations
> >>> cannot cross zone boundaries. Therefore, we should avoid having two
> >>> separate DMA zones when possible.
> >>>
> >>> So let's do an early scan of the IORT, and only create the ZONE_DMA
> >>> if we encounter any devices that need it. This puts the burden on
> >>> the firmware to describe such limitations in the IORT, which may be
> >>> redundant (and less precise) if _DMA methods are also being provided.
> >>> However, it should be noted that this situation is highly unusual for
> >>> arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
> >>> the _DMA method if implemented, and so we will not lose the ability to
> >>> perform streaming DMA outside the ZONE_DMA if the _DMA method permits
> >>> it.
> >>
> >> Sorry, I'm still a little bit confused. With this patch, if we have
> >> a device which set the right _DMA method (DMA size >= 32), but with the
> >> wrong DMA size in IORT, we still have the ZONE_DMA created which
> >> is actually not needed?
> >
> > With the current kernel, we get a ZONE_DMA already with an arbitrary
> > size of 1GB that matches what RPi4 needs. We are trying to eliminate
> > such unnecessary ZONE_DMA based on some heuristics (well, something that
> > looks "better" than a OEM ID based quirk). Now, if we learn that IORT
> > for platforms in the field is that broken as to describe few bits-wide
> > DMA masks, we may have to go back to the OEM ID quirk.
>
> Some platforms use 0 as the memory size limit, for example D05 [0] and
> D06 [1], I think we need to go back to the OEM ID quirk.
>
> For D05/D06, there are multiple interrupt controllers named mbigen;
> mbigen uses a named component to describe the mappings with
> the ITS controller, and mbigen uses 0 as the memory size limit.
>
> Also, since the memory size limit for the PCI RC was introduced by a
> later IORT revision, firmware people may think it's fine to set it
> to 0 because the system works without it.
>

Hello Hanjun,

The patch only takes the address limit field into account if its value > 0.

Also, before commit 7fb89e1d44cb6aec ("ACPI/IORT: take _DMA methods
into account for named components"), the _DMA method was not taken
into account for named components at all, and only the IORT limit was
used, so I do not anticipate any problems with that.


Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-15 Thread Ard Biesheuvel
On Thu, 15 Oct 2020 at 11:16, Nicolas Saenz Julienne
 wrote:
>
> On Thu, 2020-10-15 at 08:56 +0200, Ard Biesheuvel wrote:
> > On Thu, 15 Oct 2020 at 00:03, Rob Herring  wrote:
> > > On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
> > >  wrote:
> > > > Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> > > > physical address addressable by all DMA masters in the system. It's
> > > > specially useful for setting memory zones sizes at early boot time.
> > > >
> > > > Signed-off-by: Nicolas Saenz Julienne 
> > > >
> > > > ---
> > > >
> > > > Changes since v2:
> > > >  - Use PHYS_ADDR_MAX
> > > >  - return phys_dma_t
> > > >  - Rename function
> > > >  - Correct subject
> > > >  - Add support to start parsing from an arbitrary device node in order
> > > >for the function to work with unit tests
> > > >
> > > >  drivers/of/address.c | 42 ++
> > > >  include/linux/of.h   |  7 +++
> > > >  2 files changed, 49 insertions(+)
> > > >
> > > > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > > > index eb9ab4f1e80b..b5a9695aaf82 100644
> > > > --- a/drivers/of/address.c
> > > > +++ b/drivers/of/address.c
> > > > @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, 
> > > > const struct bus_dma_region **map)
> > > >  }
> > > >  #endif /* CONFIG_HAS_DMA */
> > > >
> > > > +/**
> > > > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for 
> > > > DMA
> > > > + * @np: The node to start searching from or NULL to start from the root
> > > > + *
> > > > + * Gets the highest CPU physical address that is addressable by all 
> > > > DMA masters
> > > > + * in the system (or subtree when np is non-NULL). If no DMA 
> > > > constrained device
> > > > + * is found, it returns PHYS_ADDR_MAX.
> > > > + */
> > > > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > > > +{
> > > > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > >
> > > One issue with using phys_addr_t is it may be 32-bit even though the
> > > DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
> > > the truncation is fine here? Maybe not.
> > >
> >
> > PHYS_ADDR_MAX is the max addressable CPU address on the system, and so
> > it makes sense to use it for the return type, and for the preliminary
> > return value: this is actually what /prevents/ truncation, because we
> > will only overwrite max_cpu_addr if the new u64 value is lower.
> >
>
> Actually I now see how things might go south.
>
> > > > +   if (ranges && len) {
> > > > +   of_dma_range_parser_init(&parser, np);
> > > > +   for_each_of_range(&parser, &range)
> > > > +   if (range.cpu_addr + range.size > cpu_end)
> > > > +   cpu_end = range.cpu_addr + range.size;
>
> If cpu_end hits 0x1_, it'll overflow to 0. This is possible on 32-bit
> systems (LPAE or not). And something similar might happen on LPAE disabled
> systems.
>
> I could add some extra logic, something like:
>
> /* We overflowed */
> if (cpu_end < range.cpu_addr)
> cpu_end = PHYS_ADDR_MAX;
>
> Which is not perfect but will cover most sensible cases.
>
> Or simply deal internally in u64s, and upon returning, check if "max_cpu_addr"
> falls higher than PHYS_ADDR_MAX.
>

Just use a u64 for cpu_end

> > > > +
> > > > +   if (max_cpu_addr > cpu_end)
> > > > +   max_cpu_addr = cpu_end;

... then this comparison and assignment will work as expected.

> > > > +   }
> > > > +
> > > > +   for_each_available_child_of_node(np, child) {
> > > > +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> > > > +   if (max_cpu_addr > subtree_max_addr)
> > > > +   max_cpu_addr = subtree_max_addr;
> > > > +   }
> > > > +
> > > > +   return max_cpu_addr;
> > > > +}
>
> Regards,
> Nicolas
>


Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-14 Thread Ard Biesheuvel
On Thu, 15 Oct 2020 at 00:03, Rob Herring  wrote:
>
> On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
>  wrote:
> >
> > Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> > physical address addressable by all DMA masters in the system. It's
> > specially useful for setting memory zones sizes at early boot time.
> >
> > Signed-off-by: Nicolas Saenz Julienne 
> >
> > ---
> >
> > Changes since v2:
> >  - Use PHYS_ADDR_MAX
> >  - return phys_dma_t
> >  - Rename function
> >  - Correct subject
> >  - Add support to start parsing from an arbitrary device node in order
> >for the function to work with unit tests
> >
> >  drivers/of/address.c | 42 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 49 insertions(+)
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..b5a9695aaf82 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
> > struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> >
> > +/**
> > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> > + * @np: The node to start searching from or NULL to start from the root
> > + *
> > + * Gets the highest CPU physical address that is addressable by all DMA 
> > masters
> > + * in the system (or subtree when np is non-NULL). If no DMA constrained 
> > device
> > + * is found, it returns PHYS_ADDR_MAX.
> > + */
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
>
> One issue with using phys_addr_t is it may be 32-bit even though the
> DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
> the truncation is fine here? Maybe not.
>

PHYS_ADDR_MAX is the max addressable CPU address on the system, and so
it makes sense to use it for the return type, and for the preliminary
return value: this is actually what /prevents/ truncation, because we
will only overwrite max_cpu_addr if the new u64 value is lower.


> > +   struct of_range_parser parser;
> > +   phys_addr_t subtree_max_addr;
> > +   struct device_node *child;
> > +   phys_addr_t cpu_end = 0;
> > +   struct of_range range;
> > +   const __be32 *ranges;
> > +   int len;
> > +
> > +   if (!np)
> > +   np = of_root;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", &len);
>
> I'm not really following why you changed the algorithm here. You're
> skipping disabled nodes which is good. Was there some other reason?
>
> > +   if (ranges && len) {
> > +   of_dma_range_parser_init(&parser, np);
> > +   for_each_of_range(&parser, &range)
> > +   if (range.cpu_addr + range.size > cpu_end)
> > +   cpu_end = range.cpu_addr + range.size;
> > +
> > +   if (max_cpu_addr > cpu_end)
> > +   max_cpu_addr = cpu_end;
> > +   }
> > +
> > +   for_each_available_child_of_node(np, child) {
> > +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> > +   if (max_cpu_addr > subtree_max_addr)
> > +   max_cpu_addr = subtree_max_addr;
> > +   }
> > +
> > +   return max_cpu_addr;
> > +}
> > +
> >  /**
> >   * of_dma_is_coherent - Check if device is coherent
> >   * @np:device node
> > diff --git a/include/linux/of.h b/include/linux/of.h
> > index 481ec0467285..db8db8f2c967 100644
> > --- a/include/linux/of.h
> > +++ b/include/linux/of.h
> > @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
> >const char *map_name, const char *map_mask_name,
> >struct device_node **target, u32 *id_out);
> >
> > +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> > +
> >  #else /* CONFIG_OF */
> >
> >  static inline void of_core_init(void)
> > @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, 
> > u32 id,
> > return -EINVAL;
> >  }
> >
> > +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node 
> > *np)
> > +{
> > +   return PHYS_ADDR_MAX;
> > +}
> > +
> >  #define of_match_ptr(_ptr) NULL
> >  #define of_match_node(_matches, _node) NULL
> >  #endif /* CONFIG_OF */
> > --
> > 2.28.0
> >


Re: [PATCH v2 1/5] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-12 Thread Ard Biesheuvel
On Mon, 12 Oct 2020 at 13:37, Catalin Marinas  wrote:
>
> On Sat, Oct 10, 2020 at 05:12:31PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index f6902a2b4ea6..0eca5865dcb1 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -196,14 +196,16 @@ static void __init zone_sizes_init(unsigned long min, 
> > unsigned long max)
> >   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> >
> >  #ifdef CONFIG_ZONE_DMA
> > + zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > +
> >   if (IS_ENABLED(CONFIG_ACPI)) {
> >   extern unsigned int acpi_iort_get_zone_dma_size(void);
> >
> >   zone_dma_bits = min(zone_dma_bits,
> >   acpi_iort_get_zone_dma_size());
> > - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> >   }
> >
> > + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> >   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> >  #endif
> >  #ifdef CONFIG_ZONE_DMA32
> > @@ -394,11 +396,6 @@ void __init arm64_memblock_init(void)
> >
> >   early_init_fdt_scan_reserved_mem();
> >
> > - if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> > - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
> > - }
>
> arm64_dma_phys_limit is used by memblock_alloc_low() (via
> ARCH_LOW_ADDRESS_LIMIT). I think it's too late to leave its
> initialisation to zone_sizes_init().
>

The only generic caller of memblock_alloc_low() is swiotlb_init(),
which is called much later. So at that point, we definitely need
ARCH_LOW_ADDRESS_LIMIT to be set correctly, but that means doing it in
zone_sizes_init() is early enough.

So the only problematic reference seems to be crashkernel_reserve() afaict.


Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-11 Thread Ard Biesheuvel
Hi Nicolas,

$SUBJECT is out of sync with the patch below. Also, for legibility, it
helps if the commit log is intelligible by itself, rather than relying
on $SUBJECT being the first line of the first paragraph.

On Sat, 10 Oct 2020 at 17:12, Nicolas Saenz Julienne
 wrote:
>
> The function provides the CPU physical address addressable by the most
> constrained bus in the system. It might be useful in order to
> dynamically set up memory zones during boot.
>
> Signed-off-by: Nicolas Saenz Julienne 
> ---
>  drivers/of/address.c | 34 ++
>  include/linux/of.h   |  7 +++
>  2 files changed, 41 insertions(+)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index eb9ab4f1e80b..755e97b65096 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> struct bus_dma_region **map)
>  }
>  #endif /* CONFIG_HAS_DMA */
>
> +/**
> + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> + *
> + * Gets the CPU physical address limit for safe DMA addressing system wide by
> + * searching for the most constraining dma-range. Otherwise it returns ~0ULL.
> + */
> +u64 __init of_dma_safe_phys_limit(void)

I don't think 'safe' strikes the right tone here. You are looking for
the highest CPU address that is addressable by all DMA masters in the
system.

Something like

of_dma_get_max_cpu_address(void)

perhaps? Also, since this is generic code, phys_addr_t is probably a
better type to return.


> +{
> +   struct device_node *np = NULL;
> +   struct of_range_parser parser;
> +   const __be32 *ranges = NULL;

I think you can drop these NULL initializers.

> +   u64 phys_dma_limit = ~0ULL;

PHYS_ADDR_MAX

> +   struct of_range range;
> +   int len;
> +
> +   for_each_of_allnodes(np) {
> +   dma_addr_t cpu_end = 0;
> +
> +   ranges = of_get_property(np, "dma-ranges", &len);
> +   if (!ranges || !len)
> +   continue;
> +
> +   of_dma_range_parser_init(&parser, np);
> +   for_each_of_range(&parser, &range)
> +   if (range.cpu_addr + range.size > cpu_end)
> +   cpu_end = range.cpu_addr + range.size;
> +
> +   if (phys_dma_limit > cpu_end)
> +   phys_dma_limit = cpu_end;
> +   }
> +
> +   return phys_dma_limit;
> +}
> +
>  /**
>   * of_dma_is_coherent - Check if device is coherent
>   * @np:device node
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 481ec0467285..958c64cffa92 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
>const char *map_name, const char *map_mask_name,
>struct device_node **target, u32 *id_out);
>
> +u64 of_dma_safe_phys_limit(void);
> +
>  #else /* CONFIG_OF */
>
>  static inline void of_core_init(void)
> @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 
> id,
> return -EINVAL;
>  }
>
> +static inline u64 of_dma_safe_phys_limit(void)
> +{
> +   return ~0ULL;
> +}
> +
>  #define of_match_ptr(_ptr) NULL
>  #define of_match_node(_matches, _node) NULL
>  #endif /* CONFIG_OF */
> --
> 2.28.0
>


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-10 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 19:10, Catalin Marinas  wrote:
>
> On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> >  wrote:
> > > We can move this check to IORT code and call it from arm64 if it
> > > can be made to work.
> >
> > Finding the smallest value in the IORT, and assigning it to
> > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > how is this going to fix the underlying issue?
>
> If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> allocations fall back to ZONE_DMA).
>
> kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> (it's been a while since I looked), the kdump allocation couldn't span
> multiple zones.
>
> In a separate thread, we try to fix kdump to use allocations above 4G as
> a fallback but this only fixes platforms with enough RAM (and maybe it's
> only those platforms that care about kdump).
>

One thing that strikes me as odd is that we are applying the same
shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
DRAM starts outside of the zone, it is shifted upwards.

On a typical ARM box, this gives me

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000]   DMA32    [mem 0x00000000c0000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000fffffffff]

i.e., the 30-bit addressable range has bit 31 set, which is weird.

I wonder if it wouldn't be better (and less problematic in the general
case) to drop this logic for ZONE_DMA, and simply let it remain empty
unless there is really some memory there.
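
The upward shift falls out of logic along these lines (a paraphrase of
arm64's max_zone_phys() from memory, not the literal source): the DRAM
start is masked down to a zone_bits boundary and the zone size is added
on top, so DRAM starting at 0x80000000 yields a 30-bit "DMA" zone of
[0x80000000, 0xbfffffff]:

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
	phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);

	return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
}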


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
 wrote:
>
> On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> >  wrote:
> > >
> > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne 
> > > > > wrote:
> > > > > > Sadly I just realised that the series is incomplete, we have RPi4 
> > > > > > users that
> > > > > > want to boot using ACPI, and this series would break things for 
> > > > > > them. I'll
> > > > > > have a word with them to see what we can do for their use-case.
> > > > >
> > > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > > interface? And why would we as Linux developers care to support such
> > > > > aims?
> > > >
> > > > The point is really whether we want to revert changes in Linux that
> > > > made both DT and ACPI boot work without quirks on RPi4.
> > >
> > > Well, and broke a large number of devices that were otherwise fine.
> > >
> >
> > Yeah that was unfortunate.
> >
> > > > Having to check the RPi4 compatible string or OEM id in core init code 
> > > > is
> > > > awful, regardless of whether you boot via ACPI or via DT.
> > > >
> > > > The problem with this hardware is that it uses a DMA mask which is
> > > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > > with that at all. On DT, we have DMA ranges properties and the likes
> > > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > > DMA range attributes in the IORT, both of which are now handled
> > > > correctly. So all the information is there, we just have to figure out
> > > > how to consume it early on.
> > >
> > > Is it worth the effort just for a single board? I don't know about ACPI 
> > > but
> > > parsing dma-ranges that early at boot time is not trivial. My intuition 
> > > tells
> > > me that it'd be even harder for ACPI, being a more complex data structure.
> > >
> >
> > Yes, it will be harder, especially for the _DMA methods.
> >
> > > > Interestingly, this limitation always existed in the SoC, but it
> > > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > > it became a problem. This means issues like this could resurface in
> > > > the future with existing SoCs when they get shipped with more memory,
> > > > and so I would prefer fixing this in a generic way.
> > >
> > > Actually what I proposed here is pretty generic. Specially from arm64's
> > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits 
> > > based on
> > > whatever it finds in DT. Both those operations are architecture 
> > > independent.
> > > arm64 arch code doesn't care about the logic involved in ascertaining
> > > zone_dma_bits. I get that the last step isn't generic. But it's all setup 
> > > so as
> > > to make it as such whenever it's worth the effort.
> > >
> >
> > The problem is that, while we are providing a full description of the
> > SoC's capabilities, we short circuit this by inserting knowledge into
> > the code (that is shared between all DT architectures) that
> > "brcm,bcm2711" is special, and needs a DMA zone override.
> >
> > I think for ACPI boot, we might be able to work around this by cold
> > plugging the memory above 1 GB, but I have to double check whether it
> > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> > for me here from the top of their head)
>
> Is this information that we can infer from IORT nodes and make it
> generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?
>

Yes, that seems feasible.

> We can move this check to IORT code and call it from arm64 if it
> can be made to work.
>

Finding the smallest value in the IORT, and assigning it to
zone_dma_bits if it is < 32 should be easy. But as I understand it,
having these separate DMA and DMA32 zones is what breaks kdump, no? So
how is this going to fix the underlying issue?

Nicolas, were there any other reported regressions caused by the
introduction of ZONE_DMA?


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
 wrote:
>
> On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > > that
> > > > want to boot using ACPI, and this series would break things for them. 
> > > > I'll
> > > > have a word with them to see what we can do for their use-case.
> > >
> > > Stupid question:  why do these users insist on a totally unsuitable
> > > interface? And why would we as Linux developers care to support such
> > > aims?
> >
> > The point is really whether we want to revert changes in Linux that
> > made both DT and ACPI boot work without quirks on RPi4.
>
> Well, and broke a large number of devices that were otherwise fine.
>

Yeah that was unfortunate.

> > Having to check the RPi4 compatible string or OEM id in core init code is
> > awful, regardless of whether you boot via ACPI or via DT.
> >
> > The problem with this hardware is that it uses a DMA mask which is
> > narrower than 32, and the arm64 kernel is simply not set up to deal
> > with that at all. On DT, we have DMA ranges properties and the likes
> > to describe such limitations, on ACPI we have _DMA methods as well as
> > DMA range attributes in the IORT, both of which are now handled
> > correctly. So all the information is there, we just have to figure out
> > how to consume it early on.
>
> Is it worth the effort just for a single board? I don't know about ACPI but
> parsing dma-ranges that early at boot time is not trivial. My intuition tells
> me that it'd be even harder for ACPI, being a more complex data structure.
>

Yes, it will be harder, especially for the _DMA methods.

> > Interestingly, this limitation always existed in the SoC, but it
> > wasn't until they started shipping it with more than 1 GB of DRAM that
> > it became a problem. This means issues like this could resurface in
> > the future with existing SoCs when they get shipped with more memory,
> > and so I would prefer fixing this in a generic way.
>
> Actually what I proposed here is pretty generic. Specially from arm64's
> perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based 
> on
> whatever it finds in DT. Both those operations are architecture independent.
> arm64 arch code doesn't care about the logic involved in ascertaining
> zone_dma_bits. I get that the last step isn't generic. But it's all setup so 
> as
> to make it as such whenever it's worth the effort.
>

The problem is that, while we are providing a full description of the
SoC's capabilities, we short circuit this by inserting knowledge into
the code (that is shared between all DT architectures) that
"brcm,bcm2711" is special, and needs a DMA zone override.

I think for ACPI boot, we might be able to work around this by cold
plugging the memory above 1 GB, but I have to double check whether it
won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
for me here from the top of their head)


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot using ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Stupid question:  why do these users insist on a totally unsuitable
> interface?  And why would we as Linux developers care to support such
> aims?
>

The point is really whether we want to revert changes in Linux that
made both DT and ACPI boot work without quirks on RPi4. Having to
check the RPi4 compatible string or OEM id in core init code is awful,
regardless of whether you boot via ACPI or via DT.

The problem with this hardware is that it uses a DMA mask which is
narrower than 32, and the arm64 kernel is simply not set up to deal
with that at all. On DT, we have DMA ranges properties and the likes
to describe such limitations, on ACPI we have _DMA methods as well as
DMA range attributes in the IORT, both of which are now handled
correctly. So all the information is there, we just have to figure out
how to consume it early on.

Interestingly, this limitation always existed in the SoC, but it
wasn't until they started shipping it with more than 1 GB of DRAM that
it became a problem. This means issues like this could resurface in
the future with existing SoCs when they get shipped with more memory,
and so I would prefer fixing this in a generic way.

Also, I assume papering over the issue like this does not fix the
kdump issue fundamentally, it just works around it, and so we might
run into this again in the future.


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-08 Thread Ard Biesheuvel
(+ Lorenzo)

On Thu, 8 Oct 2020 at 12:14, Catalin Marinas  wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne 
> > > > > > wrote:
> > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > --- a/drivers/of/fdt.c
> > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > @@ -25,6 +25,7 @@
> > > > > > >  #include 
> > > > > > >  #include 
> > > > > > >  #include 
> > > > > > > +#include   /* for zone_dma_bits */
> > > > > > >
> > > > > > >  #include   /* for COMMAND_LINE_SIZE */
> > > > > > >  #include 
> > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > >  }
> > > > > > >
> > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > +{
> > > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > +
> > > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > +   zone_dma_bits = 30;
> > > > > > > +}
> > > > > >
> > > > > > I think we could keep this entirely in the arm64 
> > > > > > setup_machine_fdt() and
> > > > > > not pollute the core code with RPi4-specific code.
> > > > >
> > > > > Actually, even better, could we not move the check to
> > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > >
> > > > I did it this way as I vaguely remembered Rob saying he wanted to 
> > > > centralise
> > > > all early boot fdt code in one place. But I'll be happy to move it 
> > > > there.
> > >
> > > I can see Rob replied and I'm fine if that's his preference. However,
> > > what I don't particularly like is that in the arm64 code, if
> > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > get a platform that actually needs 24 here (I truly hope not, but just
> > > the principle of relying on magic values)?
> > >
> > > So rather than guessing, I'd prefer if the arch code can override
> > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > need to explicitly touch the zone_dma_bits variable.
> >
> > Yes, sounds like the way to go. TBH I wasn't happy with that solution either,
> > but couldn't think of a nicer alternative.
> >
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot using ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Is there a way to get some SoC information from ACPI?
>

This is unfortunate. We used ACPI _DMA methods as they were designed
to communicate the DMA limit of the XHCI controller to the OS.

It shouldn't be too hard to match the OEM id field in the DSDT, and
switch to the smaller mask. But it sucks to have to add a quirk like
that.
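
For illustration only, the quirk would be something like the sketch
below; "RPI4  " is a made-up placeholder, not the board's real OEM ID:

#include <linux/acpi.h>
#include <linux/string.h>

static bool __init dsdt_oem_matches_rpi4(void)
{
	struct acpi_table_header *tbl;
	acpi_status status;
	bool match;

	status = acpi_get_table(ACPI_SIG_DSDT, 0, &tbl);
	if (ACPI_FAILURE(status))
		return false;

	/* oem_id is a fixed-width, space-padded 6-byte field. */
	match = !memcmp(tbl->oem_id, "RPI4  ", ACPI_OEM_ID_SIZE);
	acpi_put_table(tbl);

	return match;
}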


[PATCH] iommu/arm-smmu: support SMMU module probing from the IORT

2019-11-22 Thread Ard Biesheuvel
Add support for SMMU drivers built as modules to the ACPI/IORT device
probing path, by deferring the probe of the master if the SMMU driver is
known to exist but has not been loaded yet. Given that the IORT code
registers a platform device for each SMMU that it discovers, we can
easily trigger the udev based autoloading of the SMMU drivers by making
the platform device identifier part of the module alias.

Signed-off-by: Ard Biesheuvel 
---
 drivers/acpi/arm64/iort.c   | 4 ++--
 drivers/iommu/arm-smmu-v3.c | 1 +
 drivers/iommu/arm-smmu.c| 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 5a7551d060f2..a696457a9b11 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -850,9 +850,9 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 {
switch (type) {
case ACPI_IORT_NODE_SMMU_V3:
-   return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+   return IS_ENABLED(CONFIG_ARM_SMMU_V3);
case ACPI_IORT_NODE_SMMU:
-   return IS_BUILTIN(CONFIG_ARM_SMMU);
+   return IS_ENABLED(CONFIG_ARM_SMMU);
default:
pr_warn("IORT node type %u does not describe an SMMU\n", type);
return false;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7669beafc493..bf6a1e8eb9b0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3733,4 +3733,5 @@ module_platform_driver(arm_smmu_driver);
 
 MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations");
 MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu-v3");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d55acc48aee3..db5106b0955b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2292,4 +2292,5 @@ module_platform_driver(arm_smmu_driver);
 
 MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
 MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu");
 MODULE_LICENSE("GPL v2");
-- 
2.20.1
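
For context, the deferral described in the commit message boils down to
something like this in the IORT translation path (paraphrased sketch,
not the literal diff):

	const struct iommu_ops *ops = iommu_ops_from_fwnode(iort_get_fwnode(node));

	/*
	 * If the IORT describes an SMMU whose driver is enabled (built-in
	 * or module) but whose iommu_ops are not registered yet, ask the
	 * driver core to retry the master's probe once the module loads.
	 */
	if (!ops && iort_iommu_driver_enabled(node->type))
		return ERR_PTR(-EPROBE_DEFER);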



Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Ard Biesheuvel
On Fri, 22 Nov 2019 at 17:01, Rob Herring  wrote:
>
> On Fri, Nov 22, 2019 at 8:55 AM Will Deacon  wrote:
> >
> > [+Ard]
> >
> > Hi Rob,
> >
> > On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote:
> > > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
> > > >
> > > > Commit 8e12257dead7 ("of: property: Add device link support for iommus,
> > > > mboxes and io-channels") added device link support for IOMMU linkages
> > > > described using the "iommus" property. For PCI devices, this property
> > > > is not present and instead the "iommu-map" property is used on the host
> > > > bridge node to map the endpoint RequesterIDs to their corresponding
> > > > IOMMU instance.
> > > >
> > > > Add support for "iommu-map" to the device link supplier bindings so that
> > > > probing of PCI devices can be deferred until after the IOMMU is
> > > > available.
> > > >
> > > > Cc: Greg Kroah-Hartman 
> > > > Cc: Rob Herring 
> > > > Cc: Saravana Kannan 
> > > > Cc: Robin Murphy 
> > > > Signed-off-by: Will Deacon 
> > > > ---
> > > >
> > > > Applies against driver-core/driver-core-next.
> > > > Tested on AMD Seattle (arm64).
> > >
> > > Guess that answers my question whether anyone uses Seattle with DT.
> > > Seattle uses the old SMMU binding, and there's not even an IOMMU
> > > associated with the PCI host. I raise this mainly because the dts
> > > files for Seattle either need some love or perhaps should be removed.
> >
> > I'm using the new DT bindings on my Seattle, thanks to the firmware fairy
> > (Ard) visiting my flat with a dediprog. The patches I've posted to enable
> > modular builds of the arm-smmu driver require that the old binding is
> > disabled [1].
>
> Going to post those dts changes?
>

Last time I tried upstreaming seattle DT changes I got zero response,
so I didn't bother since.


> > > No issues with the patch itself though. I'll queue it after rc1.
> >
> > Thanks, although I think Greg has already queued it [2] due to the
> > dependencies on other patches in his tree.
>
> Okay, forgot to check my spam from Greg folder and missed that.
>
> Rob
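
For background on the binding under discussion: each "iommu-map" entry
on the host bridge is a <rid-base iommu-phandle iommu-base length>
tuple, and translation is a fixed offset within each window. A hedged C
sketch of that lookup (illustrative names and types, not the kernel's
actual of_map_rid() implementation):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed "iommu-map" window; the phandle is omitted since this
 * sketch assumes a single IOMMU instance. */
struct iommu_map_entry {
	uint32_t rid_base;	/* first PCI RequesterID covered */
	uint32_t iommu_base;	/* StreamID the window starts at */
	uint32_t length;	/* number of IDs in the window   */
};

static bool map_rid(const struct iommu_map_entry *map, size_t n,
		    uint32_t rid, uint32_t *sid)
{
	for (size_t i = 0; i < n; i++) {
		if (rid >= map[i].rid_base &&
		    rid - map[i].rid_base < map[i].length) {
			*sid = map[i].iommu_base + (rid - map[i].rid_base);
			return true;
		}
	}
	return false;	/* RID not covered by this host bridge's map */
}

int main(void)
{
	/* e.g. iommu-map = <0x0 &smmu 0x400 0x10000>; (made-up values) */
	struct iommu_map_entry map[] = { { 0x0, 0x400, 0x10000 } };
	uint32_t sid;

	if (map_rid(map, 1, 0x0008, &sid))	/* RID of device 00:01.0 */
		printf("sid=0x%x\n", sid);	/* prints sid=0x408 */
	return 0;
}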


Re: [PATCH v2 1/1] iommu/arm-smmu: Log CBFRSYNRA register on context fault

2019-04-23 Thread Ard Biesheuvel
On Tue, 23 Apr 2019 at 13:13, Robin Murphy  wrote:
>
> On 22/04/2019 08:10, Vivek Gautam wrote:
> > Bits [15:0] of the CBFRSYNRA register contain the StreamID of the
> > incoming transaction that generated the fault. Dump the CBFRSYNRA
> > register to get this info. This is especially useful in a distributed
> > SMMU architecture where multiple masters are connected to the SMMU;
> > the SID information helps to quickly identify the faulting master
> > device.
> >
> > Signed-off-by: Vivek Gautam 
> > Reviewed-by: Bjorn Andersson 
> > ---
> >
> > Changes since v1:
> >   - Addressed review comments, given by Bjorn, for nits.
> >
> >   drivers/iommu/arm-smmu-regs.h | 2 ++
> >   drivers/iommu/arm-smmu.c  | 7 +--
> >   2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> > index a1226e4ab5f8..e9132a926761 100644
> > --- a/drivers/iommu/arm-smmu-regs.h
> > +++ b/drivers/iommu/arm-smmu-regs.h
> > @@ -147,6 +147,8 @@ enum arm_smmu_s2cr_privcfg {
> >   #define CBAR_IRPTNDX_SHIFT  24
> >   #define CBAR_IRPTNDX_MASK   0xff
> >
> > +#define ARM_SMMU_GR1_CBFRSYNRA(n)(0x400 + ((n) << 2))
> > +
> >   #define ARM_SMMU_GR1_CBA2R(n)   (0x800 + ((n) << 2))
> >   #define CBA2R_RW64_32BIT(0 << 0)
> >   #define CBA2R_RW64_64BIT(1 << 0)
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..e000473f8205 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -575,7 +575,9 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
> > *dev)
> >   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >   struct arm_smmu_device *smmu = smmu_domain->smmu;
> > + void __iomem *gr1_base = ARM_SMMU_GR1(smmu);
> >   void __iomem *cb_base;
> > + u32 cbfrsynra;
>
> Nit: I would simply add to the existing "u32 fsr, fsynr;" declaration,
> but that's the sort of thing that could hopefully be fixed up when
> applying (or otherwise I might bulldoze it anyway in my eventual rework
> of register accesses throughout the driver). Regardless,
>
> Reviewed-by: Robin Murphy 
>
> >   cb_base = ARM_SMMU_CB(smmu, cfg->cbndx);
> >   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
> > @@ -585,10 +587,11 @@ static irqreturn_t arm_smmu_context_fault(int irq, 
> > void *dev)
> >
> >   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
> >   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
> > + cbfrsynra = readl_relaxed(gr1_base + 
> > ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
> >
> >   dev_err_ratelimited(smmu->dev,
> > - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
> > cb=%d\n",
> > - fsr, iova, fsynr, cfg->cbndx);
> > + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
> > cbfrsynra=0x%x, cb=%d\n",
> > + fsr, iova, fsynr, cbfrsynra, cfg->cbndx);
> >
> >   writel(fsr, cb_base + ARM_SMMU_CB_FSR);
> >   return IRQ_HANDLED;
> >
>

This is something I've had to hack up locally in the past, so

Acked-by: Ard Biesheuvel 
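
A hedged decode helper to go with the new print (hypothetical, not part
of the patch): the StreamID the commit message refers to is simply bits
[15:0] of the dumped register value.

#include <stdint.h>

/* Hypothetical helper: extract the faulting master's StreamID from a
 * CBFRSYNRA value as logged by the new dev_err_ratelimited() line. */
static inline uint16_t cbfrsynra_to_sid(uint32_t cbfrsynra)
{
	return cbfrsynra & 0xffff;	/* SID lives in bits [15:0] */
}

So a (made-up) log value of cbfrsynra=0x10412 would point at SID 0x412.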


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-29 Thread Ard Biesheuvel
(+ Bjorn)

On Mon, 28 Jan 2019 at 12:27, Vivek Gautam  wrote:
>
> Hi Ard,
>
> On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel
>  wrote:
> >
> > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  
> > wrote:
> > >
> > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
> > >  wrote:
> > > >
> > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> > > > >
> > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  
> > > > > > wrote:
> > > > > >>
> > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > > > >>>  wrote:
> > > > > >>>>
> > > > > >>>> Hi,
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > > > >>>>  wrote:
> > > > > >>>>>
> > > > > >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > > > >>>>>  wrote:
> > > > > >>>>>>
> > > > > >>>>>> Qualcomm SoCs have an additional level of cache called the
> > > > > >>>>>> System cache, aka. Last level cache (LLC). This cache sits 
> > > > > >>>>>> right
> > > > > >>>>>> before the DDR, and is tightly coupled with the memory 
> > > > > >>>>>> controller.
> > > > > >>>>>> The clients using this cache request their slices from this
> > > > > >>>>>> system cache, make it active, and can then start using it.
> > > > > >>>>>> For these clients with smmu, to start using the system cache 
> > > > > >>>>>> for
> > > > > >>>>>> buffers and, related page tables [1], memory attributes need 
> > > > > >>>>>> to be
> > > > > >>>>>> set accordingly. This series adds the required support.
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>> Does this actually improve performance on reads from a device? 
> > > > > >>>>> The
> > > > > >>>>> non-cache coherent DMA routines perform an unconditional D-cache
> > > > > >>>>> invalidate by VA to the PoC before reading from the buffers 
> > > > > >>>>> filled by
> > > > > >>>>> the device, and I would expect the PoC to be defined as lying 
> > > > > >>>>> beyond
> > > > > >>>>> the LLC to still guarantee the architected behavior.
> > > > > >>>>
> > > > > >>>> We have seen performance improvements when running Manhattan
> > > > > >>>> GFXBench benchmarks.
> > > > > >>>>
> > > > > >>>
> > > > > >>> Ah ok, that makes sense, since in that case, the data flow is 
> > > > > >>> mostly
> > > > > >>> to the device, not from the device.
> > > > > >>>
> > > > > >>>> As for the PoC, from my knowledge on sdm845 the system cache, aka
> > > > > >>>> Last level cache (LLC) lies beyond the point of coherency.
> > > > > >>>> Non-cache coherent buffers will not be cached to system cache 
> > > > > >>>> also, and
> > > > > >>>> no additional software cache maintenance ops are required for 
> > > > > >>>> system cache.
> > > > > >>>> Pratik can add more if I am missing something.
> > > > > >>>>
> > > > > >>>> To take care of the memory attributes from DMA APIs side, we can 
> > > > > >>>> add a
> > > > > >>>> DMA_ATTR definition to take care of any dma non-coherent APIs 
> > > > > >>>> calls.
> > > > > >>>>
> > > > > >>>
> > > > > >>> So does the device use the correct inner non-cacheable, outer
> > > > > >>> writeback cacheable attributes if the SMMU is in pass-through?

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-23 Thread Ard Biesheuvel
On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  wrote:
>
> On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
>  wrote:
> >
> > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> > >
> > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
> > > >>
> > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > >>>  wrote:
> > > >>>>
> > > >>>> Hi,
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > >>>>  wrote:
> > > >>>>>
> > > >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > >>>>>  wrote:
> > > >>>>>>
> > > >>>>>> Qualcomm SoCs have an additional level of cache called the
> > > >>>>>> System cache, aka. Last level cache (LLC). This cache sits right
> > > >>>>>> before the DDR, and is tightly coupled with the memory controller.
> > > >>>>>> The clients using this cache request their slices from this
> > > >>>>>> system cache, make it active, and can then start using it.
> > > >>>>>> For these clients with smmu, to start using the system cache for
> > > >>>>>> buffers and, related page tables [1], memory attributes need to be
> > > >>>>>> set accordingly. This series adds the required support.
> > > >>>>>>
> > > >>>>>
> > > >>>>> Does this actually improve performance on reads from a device? The
> > > >>>>> non-cache coherent DMA routines perform an unconditional D-cache
> > > >>>>> invalidate by VA to the PoC before reading from the buffers filled 
> > > >>>>> by
> > > >>>>> the device, and I would expect the PoC to be defined as lying beyond
> > > >>>>> the LLC to still guarantee the architected behavior.
> > > >>>>
> > > >>>> We have seen performance improvements when running Manhattan
> > > >>>> GFXBench benchmarks.
> > > >>>>
> > > >>>
> > > >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> > > >>> to the device, not from the device.
> > > >>>
> > > >>>> As for the PoC, from my knowledge on sdm845 the system cache, aka
> > > >>>> Last level cache (LLC) lies beyond the point of coherency.
> > > >>>> Non-cache coherent buffers will not be cached to system cache also, 
> > > >>>> and
> > > >>>> no additional software cache maintenance ops are required for system 
> > > >>>> cache.
> > > >>>> Pratik can add more if I am missing something.
> > > >>>>
> > > >>>> To take care of the memory attributes from DMA APIs side, we can add 
> > > >>>> a
> > > >>>> DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> > > >>>>
> > > >>>
> > > >>> So does the device use the correct inner non-cacheable, outer
> > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > >>>
> > > >>> We have been looking into another use case where the fact that the
> > > >>> SMMU overrides memory attributes is causing issues (WC mappings used
> > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> > > >>> existing attributes, would you still need the SMMU changes?
> > > >>
> > > >> Even if we could force a stage 2 mapping with the weakest pagetable
> > > >> attributes (such that combining would work), there would still need to
> > > >> be a way to set the TCR attributes appropriately if this behaviour is
> > > >> wanted for the SMMU's own table walks as well.
> > > >>
> > > >
> > > > Isn't that just a matter of implementing support for SMMUs that lack
> > > > the 'dma-coherent' attribute?
> > >
> > > Not quite - in general they need INC-ONC attributes in case there
> > > actually is something in the architectural outer-cacheable domain.

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
>
> On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
> >>
> >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> >>>>  wrote:
> >>>>>
> >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> >>>>>  wrote:
> >>>>>>
> >>>>>> Qualcomm SoCs have an additional level of cache called the
> >>>>>> System cache, aka. Last level cache (LLC). This cache sits right
> >>>>>> before the DDR, and is tightly coupled with the memory controller.
> >>>>>> The clients using this cache request their slices from this
> >>>>>> system cache, make it active, and can then start using it.
> >>>>>> For these clients with smmu, to start using the system cache for
> >>>>>> buffers and, related page tables [1], memory attributes need to be
> >>>>>> set accordingly. This series adds the required support.
> >>>>>>
> >>>>>
> >>>>> Does this actually improve performance on reads from a device? The
> >>>>> non-cache coherent DMA routines perform an unconditional D-cache
> >>>>> invalidate by VA to the PoC before reading from the buffers filled by
> >>>>> the device, and I would expect the PoC to be defined as lying beyond
> >>>>> the LLC to still guarantee the architected behavior.
> >>>>
> >>>> We have seen performance improvements when running Manhattan
> >>>> GFXBench benchmarks.
> >>>>
> >>>
> >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> >>> to the device, not from the device.
> >>>
> >>>> As for the PoC, from my knowledge on sdm845 the system cache, aka
> >>>> Last level cache (LLC) lies beyond the point of coherency.
> >>>> Non-cache coherent buffers will not be cached to system cache also, and
> >>>> no additional software cache maintenance ops are required for system 
> >>>> cache.
> >>>> Pratik can add more if I am missing something.
> >>>>
> >>>> To take care of the memory attributes from DMA APIs side, we can add a
> >>>> DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> >>>>
> >>>
> >>> So does the device use the correct inner non-cacheable, outer
> >>> writeback cacheable attributes if the SMMU is in pass-through?
> >>>
> >>> We have been looking into another use case where the fact that the
> >>> SMMU overrides memory attributes is causing issues (WC mappings used
> >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> >>> existing attributes, would you still need the SMMU changes?
> >>
> >> Even if we could force a stage 2 mapping with the weakest pagetable
> >> attributes (such that combining would work), there would still need to
> >> be a way to set the TCR attributes appropriately if this behaviour is
> >> wanted for the SMMU's own table walks as well.
> >>
> >
> > Isn't that just a matter of implementing support for SMMUs that lack
> > the 'dma-coherent' attribute?
>
> Not quite - in general they need INC-ONC attributes in case there
> actually is something in the architectural outer-cacheable domain.

But is it a problem to use INC-ONC attributes for the SMMU PTW on this
chip? AIUI, the reason for the SMMU changes is to avoid the
performance hit of snooping, which is more expensive than cache
maintenance of SMMU page tables. So are you saying that the by-VA cache
maintenance is not relayed to this system cache, resulting in page
table updates being invisible to masters using INC-ONC attributes?

> The
> case of the outer cacheablility being not that but a hint to control
> non-CPU traffic through some not-quite-transparent cache behind the PoC
> definitely stays wrapped up in qcom-specific magic ;)
>

I'm not surprised ...


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
>
> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  
> > wrote:
> >>
> >> Hi,
> >>
> >>
> >> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> >>  wrote:
> >>>
> >>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  
> >>> wrote:
> >>>>
> >>>> Qualcomm SoCs have an additional level of cache called the
> >>>> System cache, aka. Last level cache (LLC). This cache sits right
> >>>> before the DDR, and is tightly coupled with the memory controller.
> >>>> The clients using this cache request their slices from this
> >>>> system cache, make it active, and can then start using it.
> >>>> For these clients with smmu, to start using the system cache for
> >>>> buffers and, related page tables [1], memory attributes need to be
> >>>> set accordingly. This series adds the required support.
> >>>>
> >>>
> >>> Does this actually improve performance on reads from a device? The
> >>> non-cache coherent DMA routines perform an unconditional D-cache
> >>> invalidate by VA to the PoC before reading from the buffers filled by
> >>> the device, and I would expect the PoC to be defined as lying beyond
> >>> the LLC to still guarantee the architected behavior.
> >>
> >> We have seen performance improvements when running Manhattan
> >> GFXBench benchmarks.
> >>
> >
> > Ah ok, that makes sense, since in that case, the data flow is mostly
> > to the device, not from the device.
> >
> >> As for the PoC, from my knowledge on sdm845 the system cache, aka
> >> Last level cache (LLC) lies beyond the point of coherency.
> >> Non-cache coherent buffers will not be cached to system cache also, and
> >> no additional software cache maintenance ops are required for system cache.
> >> Pratik can add more if I am missing something.
> >>
> >> To take care of the memory attributes from DMA APIs side, we can add a
> >> DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> >>
> >
> > So does the device use the correct inner non-cacheable, outer
> > writeback cacheable attributes if the SMMU is in pass-through?
> >
> > We have been looking into another use case where the fact that the
> > SMMU overrides memory attributes is causing issues (WC mappings used
> > by the radeon and amdgpu driver). So if the SMMU would honour the
> > existing attributes, would you still need the SMMU changes?
>
> Even if we could force a stage 2 mapping with the weakest pagetable
> attributes (such that combining would work), there would still need to
> be a way to set the TCR attributes appropriately if this behaviour is
> wanted for the SMMU's own table walks as well.
>

Isn't that just a matter of implementing support for SMMUs that lack
the 'dma-coherent' attribute?


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  wrote:
>
> Hi,
>
>
> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>  wrote:
> >
> > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  
> > wrote:
> > >
> > > Qualcomm SoCs have an additional level of cache called the
> > > System cache, aka. Last level cache (LLC). This cache sits right
> > > before the DDR, and is tightly coupled with the memory controller.
> > > The clients using this cache request their slices from this
> > > system cache, make it active, and can then start using it.
> > > For these clients with smmu, to start using the system cache for
> > > buffers and, related page tables [1], memory attributes need to be
> > > set accordingly. This series adds the required support.
> > >
> >
> > Does this actually improve performance on reads from a device? The
> > non-cache coherent DMA routines perform an unconditional D-cache
> > invalidate by VA to the PoC before reading from the buffers filled by
> > the device, and I would expect the PoC to be defined as lying beyond
> > the LLC to still guarantee the architected behavior.
>
> We have seen performance improvements when running Manhattan
> GFXBench benchmarks.
>

Ah ok, that makes sense, since in that case, the data flow is mostly
to the device, not from the device.

> As for the PoC, from my knowledge on sdm845 the system cache, aka
> Last level cache (LLC) lies beyond the point of coherency.
> Non-cache coherent buffers will not be cached to system cache also, and
> no additional software cache maintenance ops are required for system cache.
> Pratik can add more if I am missing something.
>
> To take care of the memory attributes from DMA APIs side, we can add a
> DMA_ATTR definition to take care of any dma non-coherent APIs calls.
>

So does the device use the correct inner non-cacheable, outer
writeback cacheable attributes if the SMMU is in pass-through?

We have been looking into another use case where the fact that the
SMMU overrides memory attributes is causing issues (WC mappings used
by the radeon and amdgpu driver). So if the SMMU would honour the
existing attributes, would you still need the SMMU changes?


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-20 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:
>
> Qualcomm SoCs have an additional level of cache called the
> System cache, aka. Last level cache (LLC). This cache sits right
> before the DDR, and is tightly coupled with the memory controller.
> The clients using this cache request their slices from this
> system cache, make it active, and can then start using it.
> For these clients with smmu, to start using the system cache for
> buffers and, related page tables [1], memory attributes need to be
> set accordingly. This series adds the required support.
>

Does this actually improve performance on reads from a device? The
non-cache coherent DMA routines perform an unconditional D-cache
invalidate by VA to the PoC before reading from the buffers filled by
the device, and I would expect the PoC to be defined as lying beyond
the LLC to still guarantee the architected behavior.
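
To make the maintenance in question concrete, a minimal sketch of an
invalidate-by-VA-to-PoC loop (illustrative arm64 code assuming 64-byte
lines and a privileged context; the kernel's real routine is the
assembly in arch/arm64/mm/cache.S):

#include <stdint.h>

#define LINE_SZ 64	/* assumed D-cache line size for this sketch */

/* Invalidate [start, end) by VA to the Point of Coherency so the next
 * CPU read observes what the device wrote. Maintenance to the PoC only
 * affects caches between the CPU and the PoC; a system cache that sits
 * beyond the PoC is not affected by it, which is the crux of the
 * question above. */
static void dcache_inval_poc(uintptr_t start, uintptr_t end)
{
	uintptr_t p;

	for (p = start & ~(uintptr_t)(LINE_SZ - 1); p < end; p += LINE_SZ)
		asm volatile("dc ivac, %0" : : "r" (p) : "memory");
	asm volatile("dsb sy" : : : "memory");
}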



> This change is a realisation of the following changes from downstream msm-4.9:
> iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
>
> Changes since v2:
>  - Split the patches into io-pgtable-arm driver and arm-smmu driver.
>  - Converted smmu domain attributes to a bitmap, so multiple attributes
>can be managed easily.
>  - With addition of non-coherent page table mapping support [4], this
>patch series now aligns with the understanding of upgrading the
>non-coherent devices to use some level of outer cache.
>  - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
>  - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
>stage-1 mapping.
>  - Added change to disable the attribute from arm_smmu_domain_set_attr()
>when needed.
>  - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
>level.
>
> Goes on top of the non-coherent page tables support patch series [4]
>
> [1] https://patchwork.kernel.org/patch/10302791/
> [2] 
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51
> [3] 
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> [4] https://lore.kernel.org/patchwork/cover/1032938/
>
> Vivek Gautam (3):
>   iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes
>   iommu/io-pgtable-arm: Add support to use system cache
>   iommu/arm-smmu: Add support to use system cache
>
>  drivers/iommu/arm-smmu.c   | 28 
>  drivers/iommu/io-pgtable-arm.c | 15 +--
>  drivers/iommu/io-pgtable.h |  4 
>  include/linux/iommu.h  |  2 ++
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
>


Re: [PATCH v2 0/7] Stop losing firmware-set DMA masks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 13:31, Christoph Hellwig  wrote:
> Hi Robin,
>
> this series looks fine to me.  Do you want me to pull this into the
> dma-mapping tree?  I'd like to see a few more ACKs from the ACPI/OF/
> IOMMU maintainers though.
>

This fixes the issue I reported with the on-SoC NIC of the Socionext
SynQuacer, so assuming it works as advertised:

Acked-by: Ard Biesheuvel 

for the series.


Re: ARM64 SMMU Setup

2018-01-31 Thread Ard Biesheuvel


> On 31 Jan 2018, at 23:31, Thor Thayer  wrote:
> 
> Hi Ard,
> 
>> On 01/31/2018 04:21 PM, Ard Biesheuvel wrote:
>>> On 31 January 2018 at 22:09, Thor Thayer  
>>> wrote:
>>> Hi,
>>> 
>>> I'm enabling the ARM SMMU-500 on an ARM64 A53. I'm hitting a data abort in
>>> the probe functions because I'm accessing the registers in EL1 mode.
>>> 
>> Why do you think the fact that you are running at EL1 is causing the data 
>> abort?
> Yes, I may not be diagnosing this correctly. I narrowed it down to failing on 
> the first read in arm_smmu_device_cfg_probe().
> 
> When I read the same register with a debugger (I'm in EL1), it fails. I just 
> realized the read also fails in EL2.
> 
> If I elevate the read to EL3, I can read the default reset value from the 
> register but I mistakenly thought this was EL2.
> 

This confirms my suspicion. It is up to the secure firmware running in EL3 (or 
secure EL1) to configure the SMMU itself and the interconnect in a way that 
allows code running in EL2 or non-secure EL1 to access it (note that EL3 is 
always secure and EL2 is always non-secure).

I don’t know the details of the MMU-500 or the way it is integrated into your 
platform, so I can’t be of any more help, unfortunately.


> 
>>> Linux starts in EL2 mode but drops down to EL1 mode by the time it reaches
>>> the arm-smmu probe function.
>>> 
>>> Is there something else I need to add to either move the arm-smmu probe
>>> earlier or remain in EL2 until after the arm-smmu probe?
>>> 
>> No. I am pretty sure EL2 vs EL1 is not causing your issue. Could you
>> elaborate on the nature of the exception you are getting?
> The error I'm getting is:
> 
> Unhandled fault: synchronous external abort (0x96000010) at 0x08e40020
> Internal error: : 96000010 [#1] PREEMPT SMP

> To be honest I got lost trying to decode 0x96000010 and I may have used the 
> wrong ARM document. I ended up interpreting this as a Data Abort taken without 
> a change in Exception level, per the ARMv8 ARM.
> 
> Additionally, the SMMU documentation seemed to indicate the Hypervisor would 
> need to run at EL2 so I ran with that.
> 
> Thank you for the reply and I'll go back and examine my assumptions again.
> 
> Thor
> 
>>> A Google search didn't turn up anything so I hoped posting to the list would
>>> help.
>>> 
>>> Thanks in advance,
>>> 
>>> Thor
>>> 
>>> 
> 
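
For reference, that fault status decodes mechanically; a minimal
userspace sketch (field layout per the ARMv8 ARM; illustrative, not
kernel code):

#include <stdio.h>

int main(void)
{
	unsigned int esr  = 0x96000010;		/* from the report above */
	unsigned int ec   = (esr >> 26) & 0x3f;	/* exception class */
	unsigned int dfsc = esr & 0x3f;		/* data fault status code */

	/* EC 0x25: Data Abort without a change in Exception level;
	 * DFSC 0x10: synchronous external abort, i.e. the interconnect
	 * or firmware refused the access rather than the MMU faulting
	 * a translation. */
	printf("EC=0x%02x DFSC=0x%02x\n", ec, dfsc);
	return 0;
}

which matches the diagnosis above: the access is being rejected by the
system, not faulted by the translation tables.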

Re: ARM64 SMMU Setup

2018-01-31 Thread Ard Biesheuvel
On 31 January 2018 at 22:09, Thor Thayer  wrote:
> Hi,
>
> I'm enabling the ARM SMMU-500 on an ARM64 A53. I'm hitting a data abort in
> the probe functions because I'm accessing the registers in EL1 mode.
>

Why do you think the fact that you are running at EL1 is causing the data abort?

> Linux starts in EL2 mode but drops down to EL1 mode by the time it reaches
> the arm-smmu probe function.
>
> Is there something else I need to add to either move the arm-smmu probe
> earlier or remain in EL2 until after the arm-smmu probe?
>

No. I am pretty sure EL2 vs EL1 is not causing your issue. Could you
elaborate on the nature of the exception you are getting?

> A Google search didn't turn up anything so I hoped posting to the list would
> help.
>
> Thanks in advance,
>
> Thor
>
>


Re: [PATCH 0/4] Optimise 64-bit IOVA allocations

2017-07-19 Thread Ard Biesheuvel
On 18 July 2017 at 17:57, Robin Murphy  wrote:
> Hi all,
>
> In the wake of the ARM SMMU optimisation efforts, it seems that certain
> workloads (e.g. storage I/O with large scatterlists) probably remain quite
> heavily influenced by IOVA allocation performance. Separately, Ard also
> reported massive performance drops for a graphical desktop on AMD Seattle
> when enabling SMMUs via IORT, which we traced to dma_32bit_pfn in the DMA
> ops domain getting initialised differently for ACPI vs. DT, and exposing
> the overhead of the rbtree slow path. Whilst we could go around trying to
> close up all the little gaps that lead to hitting the slowest case, it
> seems a much better idea to simply make said slowest case a lot less slow.
>
> I had a go at rebasing Leizhen's last IOVA series[1], but ended up finding
> the changes rather too hard to follow, so I've taken the liberty here of
> picking the whole thing up and reimplementing the main part in a rather
> less invasive manner.
>
> Robin.
>
> [1] 
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg17753.html
>
> Robin Murphy (1):
>   iommu/iova: Extend rbtree node caching
>
> Zhen Lei (3):
>   iommu/iova: Optimise rbtree searching
>   iommu/iova: Optimise the padding calculation
>   iommu/iova: Make dma_32bit_pfn implicit
>
>  drivers/gpu/drm/tegra/drm.c  |   3 +-
>  drivers/gpu/host1x/dev.c |   3 +-
>  drivers/iommu/amd_iommu.c|   7 +--
>  drivers/iommu/dma-iommu.c|  18 +--
>  drivers/iommu/intel-iommu.c  |  11 ++--
>  drivers/iommu/iova.c | 112 
> ---
>  drivers/misc/mic/scif/scif_rma.c |   3 +-
>  include/linux/iova.h |   8 +--
>  8 files changed, 60 insertions(+), 105 deletions(-)
>

These patches look suspiciously like the ones I have been using over
the past couple of weeks (modulo the tegra and host1x changes) from
your git tree. They work fine on my AMD Overdrive B1, both in DT and
in ACPI/IORT modes, although it is difficult to quantify any
performance deltas on my setup.

Tested-by: Ard Biesheuvel 


Re: [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address

2017-05-06 Thread Ard Biesheuvel
On 5 May 2017 at 18:11, Borislav Petkov  wrote:
> On Tue, Apr 18, 2017 at 04:18:31PM -0500, Tom Lendacky wrote:
>> Add a function that will return the E820 type associated with an address
>> range.
>
> ...
>
>> @@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum 
>> e820_type type)
>>* coverage of the desired range exists:
>>*/
>>   if (start >= end)
>> - return 1;
>> + return entry;
>>   }
>> - return 0;
>> +
>> + return NULL;
>> +}
>> +
>> +/*
>> + * This function checks if the entire range  is mapped with type.
>> + */
>> +bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
>> +{
>> + return __e820__mapped_all(start, end, type) ? 1 : 0;
>
> return !!__e820__mapped_all(start, end, type);
>

Even the !! double negation is redundant, given that the function returns bool.
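
Concretely (a stand-alone illustration, not the kernel code): conversion
to bool already normalises any nonzero value, so returning the pointer
directly is enough.

#include <stdbool.h>
#include <stdio.h>

static bool mapped(const void *entry)
{
	return entry;	/* implicit conversion: NULL -> false, else true */
}

int main(void)
{
	int x;

	/* prints "1 0": neither "? 1 : 0" nor "!!" is needed */
	printf("%d %d\n", mapped(&x), mapped(NULL));
	return 0;
}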


Re: [PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-05-03 Thread Ard Biesheuvel

> On 3 May 2017, at 11:32, Robin Murphy  wrote:
> 
>> On 28/04/17 14:22, Ard Biesheuvel wrote:
>>> On 28 April 2017 at 14:17, Will Deacon  wrote:
>>>> On Fri, Apr 28, 2017 at 02:14:49PM +0100, Ard Biesheuvel wrote:
>>>>> On 28 April 2017 at 14:11, Will Deacon  wrote:
>>>>> Hi Ard,
>>>>> 
>>>>> [+ devicetree@]
>>>>> 
>>>>>> On Fri, Apr 14, 2017 at 01:43:15PM +0100, Ard Biesheuvel wrote:
>>>>>> DT nodes may have a status property, and if they do, such nodes should
>>>>>> only be considered present if the status property is set to 'okay'.
>>>>>> 
>>>>>> Currently, we call the init function of IOMMUs described by the device
>>>>>> tree without taking this into account, which may result in the output
>>>>>> below on systems where some SMMUs may be legally disabled.
>>>>>> 
>>>>>> Failed to initialise IOMMU /smb/smmu@e020
>>>>>> Failed to initialise IOMMU /smb/smmu@e0c0
>>>>>> arm-smmu e0a0.smmu: probing hardware configuration...
>>>>>> arm-smmu e0a0.smmu: SMMUv1 with:
>>>>>> arm-smmu e0a0.smmu:  stage 2 translation
>>>>>> arm-smmu e0a0.smmu:  coherent table walk
>>>>>> arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 
>>>>>> 0x7fff
>>>>>> arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
>>>>>> arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
>>>>>> arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
>>>>>> Failed to initialise IOMMU /smb/smmu@e060
>>>>>> Failed to initialise IOMMU /smb/smmu@e080
>>>>>> 
>>>>>> Since this is not an error condition, only call the init function if
>>>>>> the device is enabled, which also inhibits the spurious error messages.
>>>>>> 
>>>>>> Signed-off-by: Ard Biesheuvel 
>>>>>> ---
>>>>>> drivers/iommu/of_iommu.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> 
>>>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>>>>>> index 2683e9fc0dcf..2dd1206e6c0d 100644
>>>>>> --- a/drivers/iommu/of_iommu.c
>>>>>> +++ b/drivers/iommu/of_iommu.c
>>>>>> @@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
>>>>>>  for_each_matching_node_and_match(np, matches, &match) {
>>>>>>  const of_iommu_init_fn init_fn = match->data;
>>>>>> 
>>>>>> - if (init_fn(np))
>>>>>> + if (of_device_is_available(np) && init_fn(np))
>>>>>>  pr_err("Failed to initialise IOMMU %s\n",
>>>>>>  of_node_full_name(np));
>>>>>>  }
>>>>> 
>>>>> Is there a definition of what status = "disabled" is supposed to mean for 
>>>>> an
>>>>> IOMMU? For example, that could mean that the firmware has pre-programmed 
>>>>> the
>>>>> SMMU with particular translations or memory attributes (a bit like the
>>>>> CCA=1, CPM=1, DACS=0 case in ACPI IORT), or even disabled DMA traffic
>>>>> altogether.
>>>>> 
>>>>> So I think we'd need an update to the generic IOMMU binding text to say
>>>>> exactly what the semantics are supposed to be here.
>>>>> 
>>>> 
>>>> I agree that it might make sense to describe the behavior of the IOMMU
>>>> when it is left in the state we found it in. But that is not the same
>>>> as status=disabled.
>>>> 
>>>> The DTS subtree contains loads and loads of boilerplate
>>>> configurations, where only some pieces are enabled in the final image
>>>> by setting status=okay. So a node that has status 'disabled' should be
>>>> treated as 'not present', not as 'present but can be ignored under
>>>> assumptions such and such'
>>>> 
>>>> In other words, I think we are talking about two different issues here.
>>> 
>>> I'm not so sure... if we have a master device that has an iommus= property
>>> pointing to an IOMMU with status="disabled", I really don't know whether we
>>> should:
>>>
>>>   1. Assume the master can do DMA with a 1:1 mapping of memory and no
>>>  changes to memory attributes
>>>
>>>   2. Assume the master can do DMA with a 1:1 mapping of memory, but
>>>  potentially with changes to the attributes
>>>
>>>   3. Assume the master can do DMA, but with some pre-existing translation
>>>  (what?)
>>>
>>>   4. Assume the master can't do DMA
>>>
>>> and I also don't know whether the "dma-coherent" property remains valid.

Re: [PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-04-28 Thread Ard Biesheuvel
On 28 April 2017 at 14:17, Will Deacon  wrote:
> On Fri, Apr 28, 2017 at 02:14:49PM +0100, Ard Biesheuvel wrote:
>> On 28 April 2017 at 14:11, Will Deacon  wrote:
>> > Hi Ard,
>> >
>> > [+ devicetree@]
>> >
>> > On Fri, Apr 14, 2017 at 01:43:15PM +0100, Ard Biesheuvel wrote:
>> >> DT nodes may have a status property, and if they do, such nodes should
>> >> only be considered present if the status property is set to 'okay'.
>> >>
>> >> Currently, we call the init function of IOMMUs described by the device
>> >> tree without taking this into account, which may result in the output
>> >> below on systems where some SMMUs may be legally disabled.
>> >>
>> >>  Failed to initialise IOMMU /smb/smmu@e020
>> >>  Failed to initialise IOMMU /smb/smmu@e0c0
>> >>  arm-smmu e0a0.smmu: probing hardware configuration...
>> >>  arm-smmu e0a0.smmu: SMMUv1 with:
>> >>  arm-smmu e0a0.smmu:  stage 2 translation
>> >>  arm-smmu e0a0.smmu:  coherent table walk
>> >>  arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 
>> >> 0x7fff
>> >>  arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
>> >>  arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
>> >>  arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
>> >>  Failed to initialise IOMMU /smb/smmu@e060
>> >>  Failed to initialise IOMMU /smb/smmu@e080
>> >>
>> >> Since this is not an error condition, only call the init function if
>> >> the device is enabled, which also inhibits the spurious error messages.
>> >>
>> >> Signed-off-by: Ard Biesheuvel 
>> >> ---
>> >>  drivers/iommu/of_iommu.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> >> index 2683e9fc0dcf..2dd1206e6c0d 100644
>> >> --- a/drivers/iommu/of_iommu.c
>> >> +++ b/drivers/iommu/of_iommu.c
>> >> @@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
>> >>   for_each_matching_node_and_match(np, matches, &match) {
>> >>   const of_iommu_init_fn init_fn = match->data;
>> >>
>> >> - if (init_fn(np))
>> >> + if (of_device_is_available(np) && init_fn(np))
>> >>   pr_err("Failed to initialise IOMMU %s\n",
>> >>   of_node_full_name(np));
>> >>   }
>> >
>> > Is there a definition of what status = "disabled" is supposed to mean for 
>> > an
>> > IOMMU? For example, that could mean that the firmware has pre-programmed 
>> > the
>> > SMMU with particular translations or memory attributes (a bit like the
>> > CCA=1, CPM=1, DACS=0 case in ACPI IORT), or even disabled DMA traffic
>> > altogether.
>> >
>> > So I think we'd need an update to the generic IOMMU binding text to say
>> > exactly what the semantics are supposed to be here.
>> >
>>
>> I agree that it might make sense to describe the behavior of the IOMMU
>> when it is left in the state we found it in. But that is not the same
>> as status=disabled.
>>
>> The DTS subtree contains loads and loads of boilerplate
>> configurations, where only some pieces are enabled in the final image
>> by setting status=okay. So a node that has status 'disabled' should be
>> treated as 'not present', not as 'present but can be ignored under
>> assumptions such and such'
>>
>> In other words, I think we are talking about two different issues here.
>
> I'm not so sure... if we have a master device that has an iommus= property
> pointing to an IOMMU with status="disabled", I really don't know whether we
> should:
>
>   1. Assume the master can do DMA with a 1:1 mapping of memory and no
>  changes to memory attributes
>
>   2. Assume the master can do DMA with a 1:1 mapping of memory, but
>  potentially with changes to the attributes
>
>   3. Assume the master can do DMA, but with some pre-existing translation
>  (what?)
>
>   4. Assume the master can't do DMA
>
> and I also don't know whether the "dma-coherent" property remains valid.
>

Ah yes. Good point.

So indeed, there should be some IOMMU-specific status property that
can convey all of the above, or at a minimum options 1 and 4.


Re: [PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-04-28 Thread Ard Biesheuvel
On 28 April 2017 at 14:11, Will Deacon  wrote:
> Hi Ard,
>
> [+ devicetree@]
>
> On Fri, Apr 14, 2017 at 01:43:15PM +0100, Ard Biesheuvel wrote:
>> DT nodes may have a status property, and if they do, such nodes should
>> only be considered present if the status property is set to 'okay'.
>>
>> Currently, we call the init function of IOMMUs described by the device
>> tree without taking this into account, which may result in the output
>> below on systems where some SMMUs may be legally disabled.
>>
>>  Failed to initialise IOMMU /smb/smmu@e020
>>  Failed to initialise IOMMU /smb/smmu@e0c0
>>  arm-smmu e0a0.smmu: probing hardware configuration...
>>  arm-smmu e0a0.smmu: SMMUv1 with:
>>  arm-smmu e0a0.smmu:  stage 2 translation
>>  arm-smmu e0a0.smmu:  coherent table walk
>>  arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 
>> 0x7fff
>>  arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
>>  arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
>>  arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
>>  Failed to initialise IOMMU /smb/smmu@e060
>>  Failed to initialise IOMMU /smb/smmu@e080
>>
>> Since this is not an error condition, only call the init function if
>> the device is enabled, which also inhibits the spurious error messages.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  drivers/iommu/of_iommu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 2683e9fc0dcf..2dd1206e6c0d 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
>>   for_each_matching_node_and_match(np, matches, &match) {
>>   const of_iommu_init_fn init_fn = match->data;
>>
>> - if (init_fn(np))
>> + if (of_device_is_available(np) && init_fn(np))
>>   pr_err("Failed to initialise IOMMU %s\n",
>>   of_node_full_name(np));
>>   }
>
> Is there a definition of what status = "disabled" is supposed to mean for an
> IOMMU? For example, that could mean that the firmware has pre-programmed the
> SMMU with particular translations or memory attributes (a bit like the
> CCA=1, CPM=1, DACS=0 case in ACPI IORT), or even disabled DMA traffic
> altogether.
>
> So I think we'd need an update to the generic IOMMU binding text to say
> exactly what the semantics are supposed to be here.
>

I agree that it might make sense to describe the behavior of the IOMMU
when it is left in the state we found it in. But that is not the same
as status=disabled.

The DTS subtree contains loads and loads of boilerplate
configurations, where only some pieces are enabled in the final image
by setting status=okay. So a node that has status 'disabled' should be
treated as 'not present', not as 'present but can be ignored under
assumptions such and such'.

In other words, I think we are talking about two different issues here.


[PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-04-14 Thread Ard Biesheuvel
DT nodes may have a status property, and if they do, such nodes should
only be considered present if the status property is set to 'okay'.

Currently, we call the init function of IOMMUs described by the device
tree without taking this into account, which may result in the output
below on systems where some SMMUs may be legally disabled.

 Failed to initialise IOMMU /smb/smmu@e020
 Failed to initialise IOMMU /smb/smmu@e0c0
 arm-smmu e0a0.smmu: probing hardware configuration...
 arm-smmu e0a0.smmu: SMMUv1 with:
 arm-smmu e0a0.smmu:  stage 2 translation
 arm-smmu e0a0.smmu:  coherent table walk
 arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 0x7fff
 arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
 arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
 arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
 Failed to initialise IOMMU /smb/smmu@e060
 Failed to initialise IOMMU /smb/smmu@e080

Since this is not an error condition, only call the init function if
the device is enabled, which also inhibits the spurious error messages.

Signed-off-by: Ard Biesheuvel 
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 2683e9fc0dcf..2dd1206e6c0d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
for_each_matching_node_and_match(np, matches, &match) {
const of_iommu_init_fn init_fn = match->data;
 
-   if (init_fn(np))
+   if (of_device_is_available(np) && init_fn(np))
pr_err("Failed to initialise IOMMU %s\n",
of_node_full_name(np));
}
-- 
2.9.3
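
A hedged sketch of the availability rule the patch now applies
(simplified; the real helper is of_device_is_available() in
drivers/of/base.c, which also treats a missing status property as
available):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the DT rule: a node is available when it has no
 * "status" property at all, or when status is "okay" (or legacy "ok"). */
static bool node_is_available(const char *status)
{
	return status == NULL ||
	       strcmp(status, "okay") == 0 ||
	       strcmp(status, "ok") == 0;
}

So nodes carrying status = "disabled" are skipped before their init
function ever runs, which is what silences the spurious "Failed to
initialise IOMMU" messages.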
