[PATCH] scsi: isci: Fix a typo in a comment

2020-10-02 Thread Christophe JAILLET
s/remtoe/remote/
and add a missing '.'

Signed-off-by: Christophe JAILLET 
---
 drivers/scsi/isci/remote_node_table.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/isci/remote_node_table.h 
b/drivers/scsi/isci/remote_node_table.h
index 721ab982d2ac..0ddfdda2b248 100644
--- a/drivers/scsi/isci/remote_node_table.h
+++ b/drivers/scsi/isci/remote_node_table.h
@@ -61,7 +61,7 @@
 /**
  *
  *
- * Remote node sets are sets of remote node index in the remtoe node table The
+ * Remote node sets are sets of remote node index in the remote node table. The
  * SCU hardware requires that STP remote node entries take three consecutive
  * remote node index so the table is arranged in sets of three. The bits are
  * used as 0111 0111 to make a byte and the bits define the set of three remote
-- 
2.25.1



Re: [PATCH v2] net: usb: rtl8150: prevent set_ethernet_addr from setting uninit address

2020-10-02 Thread Anant Thazhemadam


On 03-10-2020 04:08, David Miller wrote:
> From: Anant Thazhemadam 
> Date: Fri, 2 Oct 2020 17:04:13 +0530
>
>> But this patch is about ensuring that an uninitialized variable's
>> value (whatever that may be) is not set as the ethernet address
>> blindly (without any form of checking if get_registers() worked
>> as expected, or not).
> Right, and if you are going to check for errors then you have to
> handle the error properly.
>
> And the proper way to handle this error is to set a random ethernet
> address on the device.

Yes, I've understood that now.
I've prepared and tested a v3 accordingly, and will have it sent in soon enough.
Thank you so much for this!  :)

Thanks,
Anant



Re: [PATCH v4] staging: qlge: fix build breakage with dumping enabled

2020-10-02 Thread Benjamin Poirier
On 2020-10-03 07:59 +0800, Coiby Xu wrote:
> This fixes commit 0107635e15ac
> ("staging: qlge: replace pr_err with netdev_err"), which introduced a
> build breakage from a missing `struct ql_adapter *qdev` in some functions
> and a type-mismatch warning with dumping enabled, i.e.,
> 
> $ make CFLAGS_MODULE="-DQL_ALL_DUMP -DQL_OB_DUMP -DQL_CB_DUMP \
> -DQL_IB_DUMP -DQL_REG_DUMP -DQL_DEV_DUMP" M=drivers/staging/qlge
> 
> qlge_dbg.c: In function ‘ql_dump_ob_mac_rsp’:
> qlge_dbg.c:2051:13: error: ‘qdev’ undeclared (first use in this function); 
> did you mean ‘cdev’?
>  2051 |  netdev_err(qdev->ndev, "%s\n", __func__);
>   | ^~~~
> qlge_dbg.c: In function ‘ql_dump_routing_entries’:
> qlge_dbg.c:1435:10: warning: format ‘%s’ expects argument of type ‘char *’, 
> but argument 3 has type ‘int’ [-Wformat=]
>  1435 |"%s: Routing Mask %d = 0x%.08x\n",
>   | ~^
>   |  |
>   |  char *
>   | %d
>  1436 |i, value);
>   |~
>   ||
>   |int
> qlge_dbg.c:1435:37: warning: format ‘%x’ expects a matching ‘unsigned int’ 
> argument [-Wformat=]
>  1435 |"%s: Routing Mask %d = 0x%.08x\n",
>   | ^
>   | |
>   | unsigned int
> 
> Note that now ql_dump_rx_ring/ql_dump_tx_ring won't check if the passed
> parameter is a null pointer.
> 
> Fixes: 0107635e15ac ("staging: qlge: replace pr_err with netdev_err")
> Reported-by: Benjamin Poirier 
> Suggested-by: Benjamin Poirier 
> Signed-off-by: Coiby Xu 
> ---

Reviewed-by: Benjamin Poirier 


Re: [PATCH v2] net: usb: rtl8150: prevent set_ethernet_addr from setting uninit address

2020-10-02 Thread Anant Thazhemadam


On 02-10-2020 19:59, Petko Manolov wrote:
> On 20-10-02 17:35:25, Anant Thazhemadam wrote:
>> Yes, this clears things up for me. I'll see to it that this gets done in a 
>> v3.
> If set_ethernet_addr() fails, don't return an error; use eth_hw_addr_random() 
> instead to set a random MAC address and continue with the probing.
>
> You can take a look here:
> https://lore.kernel.org/netdev/20201002075604.44335-1-petko.mano...@konsulko.com/
>
>
> cheers,
> Petko
Thank you for this reference. :)

Thanks,
Anant


Re: [PATCH] selftests/vm: 10x speedup for hmm-tests

2020-10-02 Thread John Hubbard

On 10/2/20 10:23 PM, SeongJae Park wrote:

On Fri, 2 Oct 2020 18:17:21 -0700 John Hubbard  wrote:


This patch reduces the running time for hmm-tests from about 10+
seconds, to just under 1.0 second, for an approximately 10x speedup.
That brings it in line with most of the other tests in selftests/vm,
which mostly run in < 1 sec.

This is done with a one-line change that simply reduces the number of
iterations of several tests, from 256, to 10.


Could this result in reduced test capacity?  If so, how about making the number
easily tweakable?



The choice of iterations was somewhat arbitrary. Unless and until we
have specific bugs that show up at a given number of iterations, we
should avoid running large iteration counts that blow the testing time
budget. Here, I'm not aware of any bugs that show up between 11 and 256
iterations, which is why I think 10 is an acceptable iteration count.

But, you are right: it's a nice thought to make the iteration count
adjustable via the command line. That would allow hmm-tests to act as a
quick selftest item, and also to provide a little bit more of a stress
test, when manually invoked with a higher iteration count.

That's a separate patch, though, because TEST_F() and the related unit
test macros used in this area expect to run the same way every time, and
they don't really want to be fed iteration arguments. Or maybe they do,
and I've missed it on my first quick pass through.

And in fact, maybe it's not a good fit, if TEST_F() and kselftest are
pushing for more of a CUnit/gtest style of coding.

There are some design and policy questions there. It reminds me that
some programs here don't use TEST_F() at all, actually. But anyway, I'd
definitely like to sidestep all of that for now, and start with just
"get the test run time down under 1 second".

thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-10-02 Thread Ethan Zhao
Sinan,

On Sat, Oct 3, 2020 at 12:08 AM Sinan Kaya  wrote:
>
> On 9/30/2020 3:05 AM, Ethan Zhao wrote:
> > When uncorrectable error happens, AER driver and DPC driver interrupt
> > handlers likely call
> >
> >pcie_do_recovery()
> >->pci_walk_bus()
> >  ->report_frozen_detected()
> >
> > with pci_channel_io_frozen at the same time.
>
> We need some more data on this. If DPC is supported by HW, errors
> should be triggered by DPC not AER.
>
> If I remember right, there is a register that tells which AER errors
> should be handled by DPC.

When uncorrectable errors of non-fatal severity happen, AER and
DPC could be triggered at the same time.


Thanks,
Ethan

>


[PATCH v3 2/2] vhost-vdpa: fix page pinning leakage in error path

2020-10-02 Thread Si-Wei Liu
Pinned pages are not properly accounted, particularly when a
mapping error occurs on IOTLB update. Clean up the dangling
pinned pages in the error path. The inflight pinned pages,
specifically for a memory region that strides across multiple
chunks, would need more than one free page for bookkeeping
and accounting. For simplicity, pin the pages for all memory
in the IOVA range in one go rather than make multiple
pin_user_pages() calls to cover the entire region. This way
it is easier to track and account the pages already mapped,
particularly for clean-up in the error path.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu 
---
Changes in v3:
- Factor out vhost_vdpa_map() change to a separate patch

Changes in v2:
- Fix incorrect target SHA1 referenced

 drivers/vhost/vdpa.c | 119 ++-
 1 file changed, 71 insertions(+), 48 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 0f27919..dad41dae 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -595,21 +595,19 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
struct vhost_dev *dev = &v->vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
-   unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
+   struct vm_area_struct **vmas;
unsigned int gup_flags = FOLL_LONGTERM;
-   unsigned long npages, cur_base, map_pfn, last_pfn = 0;
-   unsigned long locked, lock_limit, pinned, i;
+   unsigned long map_pfn, last_pfn = 0;
+   unsigned long npages, lock_limit;
+   unsigned long i, nmap = 0;
u64 iova = msg->iova;
+   long pinned;
int ret = 0;
 
if (vhost_iotlb_itree_first(iotlb, msg->iova,
msg->iova + msg->size - 1))
return -EEXIST;
 
-   page_list = (struct page **) __get_free_page(GFP_KERNEL);
-   if (!page_list)
-   return -ENOMEM;
-
if (msg->perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
 
@@ -617,61 +615,86 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
if (!npages)
return -EINVAL;
 
+   page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+   vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+   if (!page_list || !vmas) {
+   ret = -ENOMEM;
+   goto free;
+   }
+
mmap_read_lock(dev->mm);
 
-   locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-   if (locked > lock_limit) {
+   if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
ret = -ENOMEM;
-   goto out;
+   goto unlock;
}
 
-   cur_base = msg->uaddr & PAGE_MASK;
-   iova &= PAGE_MASK;
+   pinned = pin_user_pages(msg->uaddr & PAGE_MASK, npages, gup_flags,
+   page_list, vmas);
+   if (npages != pinned) {
+   if (pinned < 0) {
+   ret = pinned;
+   } else {
+   unpin_user_pages(page_list, pinned);
+   ret = -ENOMEM;
+   }
+   goto unlock;
+   }
 
-   while (npages) {
-   pinned = min_t(unsigned long, npages, list_size);
-   ret = pin_user_pages(cur_base, pinned,
-gup_flags, page_list, NULL);
-   if (ret != pinned)
-   goto out;
-
-   if (!last_pfn)
-   map_pfn = page_to_pfn(page_list[0]);
-
-   for (i = 0; i < ret; i++) {
-   unsigned long this_pfn = page_to_pfn(page_list[i]);
-   u64 csize;
-
-   if (last_pfn && (this_pfn != last_pfn + 1)) {
-   /* Pin a contiguous chunk of memory */
-   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
-   if (vhost_vdpa_map(v, iova, csize,
-  map_pfn << PAGE_SHIFT,
-  msg->perm))
-   goto out;
-   map_pfn = this_pfn;
-   iova += csize;
+   iova &= PAGE_MASK;
+   map_pfn = page_to_pfn(page_list[0]);
+
+   /* One more iteration to avoid extra vdpa_map() call out of loop. */
+   for (i = 0; i <= npages; i++) {
+   unsigned long this_pfn;
+   u64 csize;
+
+   /* The last chunk may have no valid PFN next to it */
+   this_pfn = i < npages ? page_to_pfn(page_list[i]) : -1UL;
+
+   if (last_pfn && (this_pfn == -1UL ||
+this_pfn != 

[PATCH v3 1/2] vhost-vdpa: fix vhost_vdpa_map() on error condition

2020-10-02 Thread Si-Wei Liu
vhost_vdpa_map() should remove the iotlb entry just added
if the corresponding mapping fails to set up properly.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu 
---
 drivers/vhost/vdpa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 796fe97..0f27919 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -565,6 +565,9 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
  perm_to_iommu_flags(perm));
}
 
+   if (r)
+   vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
+
return r;
 }
 
-- 
1.8.3.1



[PATCH v3 0/2] vhost-vdpa mapping error path fixes

2020-10-02 Thread Si-Wei Liu
Commit 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
has following issues in the failure path of IOTLB update:

1) vhost_vdpa_map() does not clean up dangling iotlb entry
   upon mapping failure

2) vhost_vdpa_process_iotlb_update() has leakage of pinned
   pages in case of vhost_vdpa_map() failure

This patchset attempts to address the above issues.

Changes in v3:
- Factor out changes in vhost_vdpa_map() and the fix for
  page pinning leak to separate patches (Jason)

---
Si-Wei Liu (2):
  vhost-vdpa: fix vhost_vdpa_map() on error condition
  vhost-vdpa: fix page pinning leakage in error path

 drivers/vhost/vdpa.c | 122 +++
 1 file changed, 74 insertions(+), 48 deletions(-)

-- 
1.8.3.1



[PULL REQUEST] i2c for 5.9

2020-10-02 Thread Wolfram Sang
Linus,

some more driver fixes from I2C.

Please pull.

Thanks,

   Wolfram


The following changes since commit ba4f184e126b751d1bffad5897f263108befc780:

  Linux 5.9-rc6 (2020-09-20 16:33:55 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current

for you to fetch changes up to 8947efc077168c53b84d039881a7c967086a248a:

  i2c: npcm7xx: Clear LAST bit after a failed transaction. (2020-09-27 20:05:27 
+0200)


Jean Delvare (1):
  i2c: i801: Exclude device from suspend direct complete optimization

Nicolas VINCENT (1):
  i2c: cpm: Fix i2c_ram structure

Tali Perry (1):
  i2c: npcm7xx: Clear LAST bit after a failed transaction.

 drivers/i2c/busses/i2c-cpm.c | 3 +++
 drivers/i2c/busses/i2c-i801.c| 1 +
 drivers/i2c/busses/i2c-npcm7xx.c | 9 +
 3 files changed, 13 insertions(+)




Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface

2020-10-02 Thread kernel test robot
Hi Ricardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master driver-core/driver-core-testing 
linus/master v5.9-rc7 next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
238c91115cd05c71447ea071624a4c9fe661f970
config: x86_64-randconfig-a012-20201002 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/6e37c052cde780c58a9dd815e667d89538e30579
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
git checkout 6e37c052cde780c58a9dd815e667d89538e30579
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
   return boot_cpu_has(X86_FEATURE_HYBRID_CPU);
   ^
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
   20 errors generated.

vim +/X86_FEATURE_HYBRID_CPU +169 arch/x86/kernel/cpu/topology.c

   166  
   167  bool arch_has_cpu_type(void)
   168  {
 > 169  return boot_cpu_has(X86_FEATURE_HYBRID_CPU);
   170  }
   171  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH] selftests/vm: 10x speedup for hmm-tests

2020-10-02 Thread SeongJae Park
On Fri, 2 Oct 2020 18:17:21 -0700 John Hubbard  wrote:

> This patch reduces the running time for hmm-tests from about 10+
> seconds, to just under 1.0 second, for an approximately 10x speedup.
> That brings it in line with most of the other tests in selftests/vm,
> which mostly run in < 1 sec.
> 
> This is done with a one-line change that simply reduces the number of
> iterations of several tests, from 256, to 10.

Could this result in reduced test capacity?  If so, how about making the number
easily tweakable?


Thanks,
SeongJae Park

> Thanks to Ralph Campbell for suggesting changing NTIMES as a way to get the
> speedup.
> 
> Suggested-by: Ralph Campbell 
> Signed-off-by: John Hubbard 
> ---
> 
> This is based on mmotm.
> 
>  tools/testing/selftests/vm/hmm-tests.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/vm/hmm-tests.c 
> b/tools/testing/selftests/vm/hmm-tests.c
> index 6b79723d7dc6..5d1ac691b9f4 100644
> --- a/tools/testing/selftests/vm/hmm-tests.c
> +++ b/tools/testing/selftests/vm/hmm-tests.c
> @@ -49,7 +49,7 @@ struct hmm_buffer {
>  #define TWOMEG   (1 << 21)
>  #define HMM_BUFFER_SIZE (1024 << 12)
>  #define HMM_PATH_MAX64
> -#define NTIMES   256
> +#define NTIMES   10
>  
>  #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
>  
> -- 
> 2.28.0


mmotm 2020-10-02-22-22 uploaded

2020-10-02 Thread akpm
The mm-of-the-moment snapshot 2020-10-02-22-22 has been uploaded to

   https://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

https://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
https://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

https://github.com/hnaz/linux-mm

The directory https://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.9-rc7:
(patches marked "*" will be included in linux-next)

  origin.patch
* mm-slub-restore-initial-kmem_cache-flags.patch
* mm-page_alloc-handle-a-missing-case-for-memalloc_nocma_save-restore-apis.patch
* scripts-spellingtxt-fix-malformed-entry.patch
* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* 
mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch
* 
mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged-v5.patch
* mm-swapfile-avoid-split_swap_cluster-null-pointer-dereference.patch
* compiler-clang-add-build-check-for-clang-1001.patch
* revert-kbuild-disable-clangs-default-use-of-fmerge-all-constants.patch
* revert-arm64-bti-require-clang-=-1001-for-in-kernel-bti-support.patch
* revert-arm64-vdso-fix-compilation-with-clang-older-than-8.patch
* 
partially-revert-arm-8905-1-emit-__gnu_mcount_nc-when-using-clang-1000-or-newer.patch
* kasan-remove-mentions-of-unsupported-clang-versions.patch
* compiler-gcc-improve-version-error.patch
* compilerh-avoid-escaped-section-names.patch
* exporth-fix-section-name-for-config_trim_unused_ksyms-for-clang.patch
* kbuild-doc-describe-proper-script-invocation.patch
* increase-error-prone-spell-checking.patch
* scripts-decodecode-add-the-capability-to-supply-the-program-counter.patch
* ntfs-add-check-for-mft-record-size-in-superblock.patch
* fs-ocfs2-delete-repeated-words-in-comments.patch
* ocfs2-fix-potential-soft-lockup-during-fstrim.patch
* ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs.patch
* ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
* ramfs-support-o_tmpfile.patch
* fs-xattrc-fix-kernel-doc-warnings-for-setxattr-removexattr.patch
* fs_parse-mark-fs_param_bad_value-as-static.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* mm-slabc-clean-code-by-removing-redundant-if-condition.patch
* include-linux-slabh-fix-a-typo-error-in-comment.patch
* mm-slub-branch-optimization-in-free-slowpath.patch
* mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch
* mm-slub-make-add_full-condition-more-explicit.patch
* mm-kmemleak-rely-on-rcu-for-task-stack-scanning.patch
* mmkmemleak-testc-move-kmemleak-testc-to-samples-dir.patch
* x86-numa-cleanup-configuration-dependent-command-line-options.patch
* x86-numa-add-nohmat-option.patch
* x86-numa-add-nohmat-option-fix.patch
* efi-fake_mem-arrange-for-a-resource-entry-per-efi_fake_mem-instance.patch
* acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device.patch
* 
acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device-fix.patch
* resource-report-parent-to-walk_iomem_res_desc-callback.patch
* mm-memory_hotplug-introduce-default-phys_to_target_node-implementation.patch
* 
mm-memory_hotplug-introduce-default-phys_to_target_node-implementation-fix.patch
* acpi-hmat-attach-a-device-for-each-soft-reserved-range.patch
* acpi-hmat-attach-a-device-for-each-soft-reserved-range-fix.patch
* device-dax-drop-the-dax_regionpfn_flags-attribute.patch
* device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
* device-dax-make-pgmap-optional-for-instance-creation.patch
* device-dax-kmem-introduce-dax_kmem_range.patch
* device-dax-kmem-move-resource-name-tracking-to-drvdata.patch
* 

Re: [PATCH v39 16/24] x86/sgx: Add a page reclaimer

2020-10-02 Thread Haitao Huang
When I turn on CONFIG_PROVE_LOCKING, the kernel reports the following
suspicious RCU usages. Not sure if it is an issue; just reporting here:


[ +34.337095] =
[  +0.01] WARNING: suspicious RCU usage
[  +0.02] 5.9.0-rc6-lock-sgx39 #1 Not tainted
[  +0.01] -
[  +0.01] ./include/linux/xarray.h:1165 suspicious  
rcu_dereference_check() usage!

[  +0.01]
  other info that might help us debug this:

[  +0.01]
  rcu_scheduler_active = 2, debug_locks = 1
[  +0.01] 1 lock held by enclaveos-runne/4238:
[  +0.01]  #0: ffff9cc6657e45e8 (&mm->mmap_lock#2){++++}-{3:3}, at:  
vm_mmap_pgoff+0xa1/0x120

[  +0.05]
  stack backtrace:
[  +0.02] CPU: 1 PID: 4238 Comm: enclaveos-runne Not tainted  
5.9.0-rc6-lock-sgx39 #1
[  +0.01] Hardware name: Microsoft Corporation Virtual Machine/Virtual  
Machine, BIOS Hyper-V UEFI Release v4.1 04/02/2020

[  +0.02] Call Trace:
[  +0.03]  dump_stack+0x7d/0x9f
[  +0.03]  lockdep_rcu_suspicious+0xce/0xf0
[  +0.04]  xas_start+0x14c/0x1c0
[  +0.03]  xas_load+0xf/0x50
[  +0.02]  xas_find+0x25c/0x2c0
[  +0.04]  sgx_encl_may_map+0x87/0x1c0
[  +0.06]  sgx_mmap+0x29/0x70
[  +0.03]  mmap_region+0x3ee/0x710
[  +0.06]  do_mmap+0x3f1/0x5e0
[  +0.04]  vm_mmap_pgoff+0xcd/0x120
[  +0.07]  ksys_mmap_pgoff+0x1de/0x240
[  +0.05]  __x64_sys_mmap+0x33/0x40
[  +0.02]  do_syscall_64+0x37/0x80
[  +0.03]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.02] RIP: 0033:0x7fe34efe06ba
[  +0.02] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d  
89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05  
<48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[  +0.01] RSP: 002b:7ffee83eac08 EFLAGS: 0206 ORIG_RAX:  
0009
[  +0.01] RAX: ffda RBX: 0001 RCX:  
7fe34efe06ba
[  +0.01] RDX: 0001 RSI: 1000 RDI:  
07fff000
[  +0.01] RBP: 0004 R08: 0004 R09:  

[  +0.01] R10: 0011 R11: 0206 R12:  
07fff000
[  +0.01] R13: 1000 R14: 0011 R15:  



[  +0.10] =
[  +0.01] WARNING: suspicious RCU usage
[  +0.01] 5.9.0-rc6-lock-sgx39 #1 Not tainted
[  +0.01] -
[  +0.01] ./include/linux/xarray.h:1181 suspicious  
rcu_dereference_check() usage!

[  +0.01]
  other info that might help us debug this:

[  +0.01]
  rcu_scheduler_active = 2, debug_locks = 1
[  +0.01] 1 lock held by enclaveos-runne/4238:
[  +0.01]  #0: ffff9cc6657e45e8 (&mm->mmap_lock#2){++++}-{3:3}, at:  
vm_mmap_pgoff+0xa1/0x120

[  +0.03]
  stack backtrace:
[  +0.01] CPU: 1 PID: 4238 Comm: enclaveos-runne Not tainted  
5.9.0-rc6-lock-sgx39 #1
[  +0.01] Hardware name: Microsoft Corporation Virtual Machine/Virtual  
Machine, BIOS Hyper-V UEFI Release v4.1 04/02/2020

[  +0.01] Call Trace:
[  +0.01]  dump_stack+0x7d/0x9f
[  +0.03]  lockdep_rcu_suspicious+0xce/0xf0
[  +0.03]  xas_descend+0x116/0x120
[  +0.04]  xas_load+0x42/0x50
[  +0.02]  xas_find+0x25c/0x2c0
[  +0.04]  sgx_encl_may_map+0x87/0x1c0
[  +0.06]  sgx_mmap+0x29/0x70
[  +0.02]  mmap_region+0x3ee/0x710
[  +0.06]  do_mmap+0x3f1/0x5e0
[  +0.04]  vm_mmap_pgoff+0xcd/0x120
[  +0.07]  ksys_mmap_pgoff+0x1de/0x240
[  +0.05]  __x64_sys_mmap+0x33/0x40
[  +0.02]  do_syscall_64+0x37/0x80
[  +0.02]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.01] RIP: 0033:0x7fe34efe06ba
[  +0.01] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d  
89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05  
<48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[  +0.01] RSP: 002b:7ffee83eac08 EFLAGS: 0206 ORIG_RAX:  
0009
[  +0.01] RAX: ffda RBX: 0001 RCX:  
7fe34efe06ba
[  +0.01] RDX: 0001 RSI: 1000 RDI:  
07fff000
[  +0.01] RBP: 0004 R08: 0004 R09:  

[  +0.01] R10: 0011 R11: 0206 R12:  
07fff000
[  +0.01] R13: 1000 R14: 0011 R15:  



[  +0.001117] =
[  +0.01] WARNING: suspicious RCU usage
[  +0.01] 5.9.0-rc6-lock-sgx39 #1 Not tainted
[  +0.01] -
[  +0.01] ./include/linux/xarray.h:1181 suspicious  
rcu_dereference_check() usage!

[  +0.01]
  other info that might help us debug this:

[  +0.01]
  rcu_scheduler_active = 2, debug_locks = 1
[  +0.01] 1 lock held by enclaveos-runne/4238:
[  +0.01]  #0: ffff9cc6657e45e8 (&mm->mmap_lock#2){++++}-{3:3}, 

Re: [PATCH] FIX the comment of struct jbd2_journal_handle

2020-10-02 Thread Theodore Y. Ts'o
On Wed, Sep 23, 2020 at 01:12:31AM +0800, Hui Su wrote:
> The struct name was modified long ago, but the comment still
> uses struct handle_s.
> 
> Signed-off-by: Hui Su 

Thanks, applied.  I updated the commit summary to be:

jbd2: fix the comment of struct jbd2_journal_handle

- Ted


Re: [PATCH] ext4: fix leaking sysfs kobject after failed mount

2020-10-02 Thread Theodore Y. Ts'o
On Thu, Sep 24, 2020 at 11:08:59AM +0200, Jan Kara wrote:
> On Tue 22-09-20 09:24:56, Eric Biggers wrote:
> > From: Eric Biggers 
> > 
> > ext4_unregister_sysfs() only deletes the kobject.  The reference to it
> > needs to be put separately, like ext4_put_super() does.
> > 
> > This addresses the syzbot report
> > "memory leak in kobject_set_name_vargs (3)"
> > (https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).
> > 
> > Reported-by: syzbot+9f864abad79fae7c1...@syzkaller.appspotmail.com
> > Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable 
> > quotas on mount")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Eric Biggers 
> 
> Looks good. You can add:
> 
> Reviewed-by: Jan Kara 

Thanks, applied.

- Ted


[PATCH v39 21/24] x86/vdso: Implement a vDSO for Intel SGX enclave call

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

An SGX runtime must be aware of the exceptions that happen inside an
enclave. Introduce a vDSO call that wraps the EENTER/ERESUME cycle and
returns the CPU exception back to the caller exactly when it happens.

The kernel fixes up the exception information into RDI, RSI and RDX. The
SGX vDSO call handler fills this information into the user-provided buffer
or alternatively triggers a user-provided callback at the time of the
exception.

The calling convention supports providing the parameters in the standard
RDI, RSI, RDX, RCX, R8 and R9 registers, i.e. it is possible to declare
the vDSO call as a C prototype, but other than that there is no specific
support for the SystemV ABI. Storing XSAVE state etc. is entirely the
responsibility of the enclave and the associated run-time.

Suggested-by: Andy Lutomirski 
Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Signed-off-by: Sean Christopherson 
Co-developed-by: Cedric Xing 
Signed-off-by: Cedric Xing 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/entry/vdso/Makefile|   2 +
 arch/x86/entry/vdso/vdso.lds.S  |   1 +
 arch/x86/entry/vdso/vsgx.S  | 157 
 arch/x86/include/asm/enclu.h|   9 ++
 arch/x86/include/uapi/asm/sgx.h |  98 
 5 files changed, 267 insertions(+)
 create mode 100644 arch/x86/entry/vdso/vsgx.S
 create mode 100644 arch/x86/include/asm/enclu.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 3f183d0b8826..27e7635e31d3 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -29,6 +29,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)   := y
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
 vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
+vobjs-$(VDSO64-y)  += vsgx.o
 
 # files to link into kernel
 obj-y  += vma.o extable.o
@@ -100,6 +101,7 @@ $(vobjs): KBUILD_CFLAGS := $(filter-out 
$(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS
 CFLAGS_REMOVE_vclock_gettime.o = -pg
 CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg
 CFLAGS_REMOVE_vgetcpu.o = -pg
+CFLAGS_REMOVE_vsgx.o = -pg
 
 #
 # X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 36b644e16272..4bf48462fca7 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -27,6 +27,7 @@ VERSION {
__vdso_time;
clock_getres;
__vdso_clock_getres;
+   __vdso_sgx_enter_enclave;
local: *;
};
 }
diff --git a/arch/x86/entry/vdso/vsgx.S b/arch/x86/entry/vdso/vsgx.S
new file mode 100644
index ..5c047e588f16
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx.S
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "extable.h"
+
+/* Relative to %rbp. */
+#define SGX_ENCLAVE_OFFSET_OF_RUN  16
+
+/* The offsets relative to struct sgx_enclave_run. */
+#define SGX_ENCLAVE_RUN_TCS0
+#define SGX_ENCLAVE_RUN_USER_HANDLER   8
+#define SGX_ENCLAVE_RUN_USER_DATA  16 /* unused */
+#define SGX_ENCLAVE_RUN_LEAF   24
+#define SGX_ENCLAVE_RUN_EXCEPTION_VECTOR   28
+#define SGX_ENCLAVE_RUN_EXCEPTION_ERROR_CODE   30
+#define SGX_ENCLAVE_RUN_EXCEPTION_ADDR 32
+#define SGX_ENCLAVE_RUN_RESERVED_START 40
+#define SGX_ENCLAVE_RUN_RESERVED_END   64
+
+.code64
+.section .text, "ax"
+
+SYM_FUNC_START(__vdso_sgx_enter_enclave)
+   /* Prolog */
+   .cfi_startproc
+   push%rbp
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %rbp, 0
+   mov %rsp, %rbp
+   .cfi_def_cfa_register   %rbp
+   push%rbx
+   .cfi_rel_offset %rbx, -8
+
+   mov %ecx, %eax
+.Lenter_enclave:
+   /* EENTER <= leaf <= ERESUME */
+   cmp $EENTER, %eax
+   jb  .Linvalid_input
+   cmp $ERESUME, %eax
+   ja  .Linvalid_input
+
+   mov SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rcx
+
+   /* Validate that the reserved area contains only zeros. */
+   push%rax
+   push%rbx
+   mov $SGX_ENCLAVE_RUN_RESERVED_START, %rbx
+1:
+   mov (%rcx, %rbx), %rax
+   cmpq    $0, %rax
+   jne .Linvalid_input
+
+   add $8, %rbx
+   cmpq    $SGX_ENCLAVE_RUN_RESERVED_END, %rbx
+   jne 1b
+   pop %rbx
+   pop %rax
+
+   /* Load TCS and AEP */
+   mov SGX_ENCLAVE_RUN_TCS(%rcx), %rbx
+   lea .Lasync_exit_pointer(%rip), %rcx
+
+   /* Single ENCLU serving as both EENTER and AEP (ERESUME) */
+.Lasync_exit_pointer:
+.Lenclu_eenter_eresume:
+   enclu
+
+   /* EEXIT jumps here unless the enclave is doing something fancy. */
+   mov SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
+
+   /* Set exit_reason. */
+   movl$EEXIT, 

[PATCH v39 13/24] x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES

2020-10-02 Thread Jarkko Sakkinen
Add an ioctl, which performs ENCLS[EADD] that adds a new visible page to an
enclave, and optionally ENCLS[EEXTEND] operations that hash the page into
the enclave measurement. By visible we mean a page that can be mapped to the
address range of an enclave.

Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Tested-by: Haitao Huang 
Tested-by: Chunyang Hui 
Tested-by: Jordan Hand 
Tested-by: Nathaniel McCallum 
Tested-by: Seth Moore 
Tested-by: Darren Kenny 
Reviewed-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Co-developed-by: Suresh Siddha 
Signed-off-by: Suresh Siddha 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/uapi/asm/sgx.h |  30 
 arch/x86/kernel/cpu/sgx/ioctl.c | 299 
 arch/x86/kernel/cpu/sgx/sgx.h   |   1 +
 3 files changed, 330 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index c75b375f3770..10cd48d06318 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -8,10 +8,21 @@
#include <linux/types.h>
#include <linux/ioctl.h>
 
+/**
+ * enum sgx_page_flags - page control flags
+ * %SGX_PAGE_MEASURE:  Measure the page contents with a sequence of
+ * ENCLS[EEXTEND] operations.
+ */
+enum sgx_page_flags {
+   SGX_PAGE_MEASURE= 0x01,
+};
+
 #define SGX_MAGIC 0xA4
 
 #define SGX_IOC_ENCLAVE_CREATE \
_IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
+#define SGX_IOC_ENCLAVE_ADD_PAGES \
+   _IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -22,4 +33,23 @@ struct sgx_enclave_create  {
__u64   src;
 };
 
+/**
+ * struct sgx_enclave_add_pages - parameter structure for the
+ *%SGX_IOC_ENCLAVE_ADD_PAGES ioctl
+ * @src:   start address for the page data
+ * @offset:starting page offset
+ * @length:length of the data (multiple of the page size)
+ * @secinfo:   address for the SECINFO data
+ * @flags: page control flags
+ * @count: number of bytes added (multiple of the page size)
+ */
+struct sgx_enclave_add_pages {
+   __u64 src;
+   __u64 offset;
+   __u64 length;
+   __u64 secinfo;
+   __u64 flags;
+   __u64 count;
+};
+
 #endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9bb4694e57c1..e13e04737683 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -194,6 +194,302 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
return ret;
 }
 
+static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+unsigned long offset,
+u64 secinfo_flags)
+{
+   struct sgx_encl_page *encl_page;
+   unsigned long prot;
+
+   encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+   if (!encl_page)
+   return ERR_PTR(-ENOMEM);
+
+   encl_page->desc = encl->base + offset;
+   encl_page->encl = encl;
+
+   prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
+  _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
+  _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
+
+   /*
+* TCS pages must always have RW set for CPU access while the SECINFO
+* permissions are *always* zero - the CPU ignores the user provided
+* values and silently overwrites them with zero permissions.
+*/
+   if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
+   prot |= PROT_READ | PROT_WRITE;
+
+   /* Calculate maximum of the VM flags for the page. */
+   encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+
+   return encl_page;
+}
+
+static int sgx_validate_secinfo(struct sgx_secinfo *secinfo)
+{
+   u64 perm = secinfo->flags & SGX_SECINFO_PERMISSION_MASK;
+   u64 pt = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
+
+   if (pt != SGX_SECINFO_REG && pt != SGX_SECINFO_TCS)
+   return -EINVAL;
+
+   if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
+   return -EINVAL;
+
+   /*
+* The CPU will silently overwrite the permissions with zero, which means
+* that we need to validate it ourselves.
+*/
+   if (pt == SGX_SECINFO_TCS && perm)
+   return -EINVAL;
+
+   if (secinfo->flags & SGX_SECINFO_RESERVED_MASK)
+   return -EINVAL;
+
+   if (memchr_inv(secinfo->reserved, 0, sizeof(secinfo->reserved)))
+   return -EINVAL;
+
+   return 0;
+}
+
+static int __sgx_encl_add_page(struct sgx_encl *encl,
+  struct sgx_encl_page *encl_page,
+  struct sgx_epc_page *epc_page,
+  struct sgx_secinfo *secinfo, unsigned long src)
+{
+   

[PATCH v39 15/24] x86/sgx: Add SGX_IOC_ENCLAVE_PROVISION

2020-10-02 Thread Jarkko Sakkinen
Provisioning Certification Enclave (PCE), the root of trust for other
enclaves, generates a signing key from a fused key called Provisioning
Certification Key. PCE can then use this key to certify an attestation key
of a Quoting Enclave (QE), i.e. we get the chain of trust down to the
hardware if the Intel-signed PCE is used.

To use the needed keys, ATTRIBUTE.PROVISIONKEY is required, but it should
only be allowed for those who actually need it, so that only trusted
parties can certify QEs.

Obviously, the attestation service should know the public key of the PCE in
use and can that way detect illegitimate attestation, but whitelisting the
legitimate users still adds an additional layer of defence.

Add a new device file called /dev/sgx/provision. The sole purpose of this
file is to provide file descriptors that act as privilege tokens, allowing
enclaves to be built with ATTRIBUTE.PROVISIONKEY set. A new ioctl called
SGX_IOC_ENCLAVE_PROVISION is used to assign such a token to an enclave.

Cc: linux-security-mod...@vger.kernel.org
Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Suggested-by: Andy Lutomirski 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/uapi/asm/sgx.h  | 11 
 arch/x86/kernel/cpu/sgx/driver.c | 18 +
 arch/x86/kernel/cpu/sgx/driver.h |  2 ++
 arch/x86/kernel/cpu/sgx/ioctl.c  | 46 
 4 files changed, 77 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index e401fa72eaab..b6ba036a9b82 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -25,6 +25,8 @@ enum sgx_page_flags {
_IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
 #define SGX_IOC_ENCLAVE_INIT \
_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
+#define SGX_IOC_ENCLAVE_PROVISION \
+   _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -63,4 +65,13 @@ struct sgx_enclave_init {
__u64 sigstruct;
 };
 
+/**
+ * struct sgx_enclave_provision - parameter structure for the
+ *   %SGX_IOC_ENCLAVE_PROVISION ioctl
+ * @attribute_fd:  file handle of the attribute file, e.g. "/dev/sgx/provision"
+ */
+struct sgx_enclave_provision {
+   __u64 attribute_fd;
+};
+
 #endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 7bdb49dfcca6..d01b28f7ce4a 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -134,6 +134,10 @@ static const struct file_operations sgx_encl_fops = {
.get_unmapped_area  = sgx_get_unmapped_area,
 };
 
+const struct file_operations sgx_provision_fops = {
+   .owner  = THIS_MODULE,
+};
+
 static struct miscdevice sgx_dev_enclave = {
.minor = MISC_DYNAMIC_MINOR,
.name = "enclave",
@@ -141,6 +145,13 @@ static struct miscdevice sgx_dev_enclave = {
+   .fops = &sgx_encl_fops,
 };
 
+static struct miscdevice sgx_dev_provision = {
+   .minor = MISC_DYNAMIC_MINOR,
+   .name = "provision",
+   .nodename = "sgx/provision",
+   .fops = &sgx_provision_fops,
+};
+
 int __init sgx_drv_init(void)
 {
unsigned int eax, ebx, ecx, edx;
@@ -181,5 +192,12 @@ int __init sgx_drv_init(void)
return ret;
}
 
+   ret = misc_register(&sgx_dev_provision);
+   if (ret) {
+   pr_err("Creating /dev/sgx/provision failed with %d.\n", ret);
+   misc_deregister(&sgx_dev_enclave);
+   return ret;
+   }
+
return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index e4063923115b..72747d01c046 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -23,6 +23,8 @@ extern u64 sgx_attributes_reserved_mask;
 extern u64 sgx_xfrm_reserved_mask;
 extern u32 sgx_xsave_size_tbl[64];
 
+extern const struct file_operations sgx_provision_fops;
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
 
 int sgx_drv_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index cf5a43d6daa2..3c04798e83e5 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -679,6 +679,49 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
return ret;
 }
 
+/**
+ * sgx_ioc_enclave_provision - handler for %SGX_IOC_ENCLAVE_PROVISION
+ * @enclave:   an enclave pointer
+ * @arg:   userspace pointer to a struct sgx_enclave_provision instance
+ *
+ * Mark the enclave as being allowed to access a restricted attribute bit.
+ * The requested attribute is specified via the attribute_fd field in the
+ * provided struct sgx_enclave_provision.  The attribute_fd must be a
+ * handle to an SGX attribute file, e.g. "/dev/sgx/provision".
+ *
+ * Failure to explicitly request access to a restricted attribute will cause
+ * sgx_ioc_enclave_init() to fail.  

[PATCH v39 16/24] x86/sgx: Add a page reclaimer

2020-10-02 Thread Jarkko Sakkinen
There is a limited amount of EPC available. Therefore, some of it must be
copied to regular memory, and only a subset kept in the SGX reserved
memory. While the kernel cannot directly access enclave memory, SGX
provides a set of ENCLS leaf functions to perform reclaiming.

Implement a page reclaimer by using these leaf functions. It picks the
victim pages in LRU fashion from all the enclaves running in the system.
The thread ksgxswapd reclaims pages when the number of free EPC pages
goes below SGX_NR_LOW_PAGES, and keeps reclaiming until it reaches
SGX_NR_HIGH_PAGES.

sgx_alloc_epc_page() can optionally reclaim pages directly when @reclaim
is set true. A caller must also supply an owner for each page so that the
reclaimer can access the associated enclaves. This is needed for locking,
as most of the ENCLS leafs cannot be executed concurrently for an
enclave. The owner is also needed for accessing SECS, which is required
to be resident when its child pages are being reclaimed.

Cc: linux...@kvack.org
Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Tested-by: Jordan Hand 
Tested-by: Nathaniel McCallum 
Tested-by: Chunyang Hui 
Tested-by: Seth Moore 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/driver.c |   1 +
 arch/x86/kernel/cpu/sgx/encl.c   | 344 +-
 arch/x86/kernel/cpu/sgx/encl.h   |  41 +++
 arch/x86/kernel/cpu/sgx/ioctl.c  |  78 -
 arch/x86/kernel/cpu/sgx/main.c   | 481 +++
 arch/x86/kernel/cpu/sgx/sgx.h|   9 +
 6 files changed, 947 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index d01b28f7ce4a..0446781cc7a2 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -29,6 +29,7 @@ static int sgx_open(struct inode *inode, struct file *file)
atomic_set(&encl->flags, 0);
kref_init(&encl->refcount);
xa_init(&encl->page_array);
+   INIT_LIST_HEAD(&encl->va_pages);
mutex_init(&encl->lock);
INIT_LIST_HEAD(&encl->mm_list);
spin_lock_init(&encl->mm_lock);
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c2c4a77af36b..54326efa6c2f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,9 +12,88 @@
 #include "encls.h"
 #include "sgx.h"
 
+/*
+ * ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC
+ * Pages" in the SDM.
+ */
+static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
+  struct sgx_epc_page *epc_page,
+  struct sgx_epc_page *secs_page)
+{
+   unsigned long va_offset = SGX_ENCL_PAGE_VA_OFFSET(encl_page);
+   struct sgx_encl *encl = encl_page->encl;
+   struct sgx_pageinfo pginfo;
+   struct sgx_backing b;
+   pgoff_t page_index;
+   int ret;
+
+   if (secs_page)
+   page_index = SGX_ENCL_PAGE_INDEX(encl_page);
+   else
+   page_index = PFN_DOWN(encl->size);
+
+   ret = sgx_encl_get_backing(encl, page_index, &b);
+   if (ret)
+   return ret;
+
+   pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page);
+   pginfo.contents = (unsigned long)kmap_atomic(b.contents);
+   pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
+ b.pcmd_offset;
+
+   if (secs_page)
+   pginfo.secs = (u64)sgx_get_epc_addr(secs_page);
+   else
+   pginfo.secs = 0;
+
+   ret = __eldu(&pginfo, sgx_get_epc_addr(epc_page),
+sgx_get_epc_addr(encl_page->va_page->epc_page) +
+ va_offset);
+   if (ret) {
+   if (encls_failed(ret))
+   ENCLS_WARN(ret, "ELDU");
+
+   ret = -EFAULT;
+   }
+
+   kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+   kunmap_atomic((void *)(unsigned long)pginfo.contents);
+
+   sgx_encl_put_backing(&b, false);
+
+   return ret;
+}
+
+static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
+ struct sgx_epc_page *secs_page)
+{
+   unsigned long va_offset = SGX_ENCL_PAGE_VA_OFFSET(encl_page);
+   struct sgx_encl *encl = encl_page->encl;
+   struct sgx_epc_page *epc_page;
+   int ret;
+
+   epc_page = sgx_alloc_epc_page(encl_page, false);
+   if (IS_ERR(epc_page))
+   return epc_page;
+
+   ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
+   if (ret) {
+   sgx_free_epc_page(epc_page);
+   return ERR_PTR(ret);
+   }
+
+   sgx_free_va_slot(encl_page->va_page, va_offset);
+   list_move(&encl_page->va_page->list, &encl->va_pages);
+   encl_page->desc &= ~SGX_ENCL_PAGE_VA_OFFSET_MASK;
+   encl_page->epc_page = epc_page;
+
+   return epc_page;
+}
+
 static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
   

[PATCH v39 19/24] x86/fault: Add helper function to sanitize error code

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Add a helper function to sanitize the error code, in preparation for vDSO
exception fixup, which will expose the error code to userspace. The fixup
runs before set_signal_archinfo(), i.e. it suppresses the signal when the
fixup is successful.

Acked-by: Jethro Beekman 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/mm/fault.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 90ee91c244c6..24ab833ede41 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -602,6 +602,18 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
oops_end(flags, regs, sig);
 }
 
+static void sanitize_error_code(unsigned long address,
+   unsigned long *error_code)
+{
+   /*
+* To avoid leaking information about the kernel page
+* table layout, pretend that user-mode accesses to
+* kernel addresses are always protection faults.
+*/
+   if (address >= TASK_SIZE_MAX)
+   *error_code |= X86_PF_PROT;
+}
+
 static void set_signal_archinfo(unsigned long address,
unsigned long error_code)
 {
@@ -658,6 +670,8 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 * faulting through the emulate_vsyscall() logic.
 */
if (current->thread.sig_on_uaccess_err && signal) {
+   sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);
 
/* XXX: hwpoison faults will set the wrong code. */
@@ -806,13 +820,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
if (is_errata100(regs, address))
return;
 
-   /*
-* To avoid leaking information about the kernel page table
-* layout, pretend that user-mode accesses to kernel addresses
-* are always protection faults.
-*/
-   if (address >= TASK_SIZE_MAX)
-   error_code |= X86_PF_PROT;
+   sanitize_error_code(address, &error_code);
 
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);
@@ -931,6 +939,8 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
if (is_prefetch(regs, error_code, address))
return;
 
+   sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);
 
 #ifdef CONFIG_MEMORY_FAILURE
-- 
2.25.1



[PATCH v39 20/24] x86/traps: Attempt to fixup exceptions in vDSO before signaling

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

vDSO functions can now leverage an exception fixup mechanism similar to
kernel exception fixup.  For vDSO exception fixup, the initial user is
Intel's Software Guard Extensions (SGX), which will wrap the low-level
transitions to/from the enclave, i.e. EENTER and ERESUME instructions,
in a vDSO function and leverage fixup to intercept exceptions that would
otherwise generate a signal.  This allows the vDSO wrapper to return the
fault information directly to its caller, obviating the need for SGX
applications and libraries to juggle signal handlers.

Attempt to fixup vDSO exceptions immediately prior to populating and
sending signal information.  Except for the delivery mechanism, an
exception in a vDSO function should be treated like any other exception
in userspace, e.g. any fault that is successfully handled by the kernel
should not be directly visible to userspace.

Although it's debatable whether or not all exceptions are of interest to
enclaves, defer to the vDSO fixup to decide whether to do fixup or
generate a signal.  Future users of vDSO fixup, if there ever are any,
will undoubtedly have different requirements than SGX enclaves, e.g. the
fixup vs. signal logic can be made function specific if/when necessary.

Suggested-by: Andy Lutomirski 
Acked-by: Jethro Beekman 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/traps.c | 10 ++
 arch/x86/mm/fault.c |  7 +++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index e2c6fd4dde8e..13dbc59c6bc5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
#include <asm/umip.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/vdso.h>
 
 #ifdef CONFIG_X86_64
 #include 
@@ -117,6 +118,9 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
die(str, regs, error_code);
+   } else {
+   if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+   return 0;
}
 
/*
@@ -550,6 +554,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
 
+   if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
+   return;
+
show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
force_sig(SIGSEGV);
goto exit;
@@ -1031,6 +1038,9 @@ static void math_error(struct pt_regs *regs, int trapnr)
if (!si_code)
goto exit;
 
+   if (fixup_vdso_exception(regs, trapnr, 0, 0))
+   return;
+
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs));
 exit:
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 24ab833ede41..bda67aba6553 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -30,6 +30,7 @@
#include <asm/cpu_entry_area.h>/* exception stack  */
#include <asm/pgtable_areas.h> /* VMALLOC_START, ...   */
#include <asm/kvm_para.h>  /* kvm_handle_async_pf  */
+#include <asm/vdso.h>  /* fixup_vdso_exception()   */
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -822,6 +823,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 
sanitize_error_code(address, &error_code);
 
+   if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+   return;
+
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);
 
@@ -941,6 +945,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 
sanitize_error_code(address, &error_code);
 
+   if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+   return;
+
set_signal_archinfo(address, error_code);
 
 #ifdef CONFIG_MEMORY_FAILURE
-- 
2.25.1



[PATCH v39 18/24] x86/vdso: Add support for exception fixup in vDSO functions

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

The basic concept and implementation is very similar to the kernel's
exception fixup mechanism.  The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function.  Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions.  Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file.  The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array.  The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPL3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process.  Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions.  This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave.  Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves.  This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions.  The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Note that unlike the kernel's _ASM_EXTABLE_HANDLE implementation, the
'C' version of _ASM_VDSO_EXTABLE_HANDLE doesn't use a pre-compiled
assembly macro.  Duplicating four lines of code is simpler than adding
the necessary infrastructure to generate pre-compiled assembly and the
intended benefit of massaging GCC's inlining algorithm is unlikely to be
realized in the vDSO any time soon, if ever.

Suggested-by: Andy Lutomirski 
Acked-by: Jethro Beekman 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/entry/vdso/Makefile  |  6 ++--
 arch/x86/entry/vdso/extable.c | 46 
 arch/x86/entry/vdso/extable.h | 28 +++
 arch/x86/entry/vdso/vdso-layout.lds.S |  9 -
 arch/x86/entry/vdso/vdso2c.h  | 50 ++-
 arch/x86/include/asm/vdso.h   |  5 +++
 6 files changed, 139 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/entry/vdso/extable.c
 create mode 100644 arch/x86/entry/vdso/extable.h


[PATCH v39 17/24] x86/sgx: Add ptrace() support for the SGX driver

2020-10-02 Thread Jarkko Sakkinen
Intel Software Guard eXtensions (SGX) allows creation of executable blobs
called enclaves, which cannot be accessed by default when not executing
inside the enclave. Enclaves can be entered only by using predefined memory
addresses, which are defined when the enclave is loaded.

However, enclaves can be defined as debug enclaves at load time. In debug
enclaves, data can be read and/or written a memory word at a time by using
the ENCLS[EDBGRD] and ENCLS[EDBGWR] leaf instructions.

Add a sgx_vma_access() function that implements the 'access' virtual
function of struct vm_operations_struct. Use the aforementioned leaf
instructions to provide read and write primitives for enclave memory.

Cc: linux...@kvack.org
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Acked-by: Jethro Beekman 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/encl.c | 89 ++
 1 file changed, 89 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 54326efa6c2f..ae45f8f0951e 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -337,10 +337,99 @@ static int sgx_vma_mprotect(struct vm_area_struct *vma,
return mprotect_fixup(vma, pprev, start, end, newflags);
 }
 
+static int sgx_encl_debug_read(struct sgx_encl *encl, struct sgx_encl_page *page,
+  unsigned long addr, void *data)
+{
+   unsigned long offset = addr & ~PAGE_MASK;
+   int ret;
+
+   ret = __edbgrd(sgx_get_epc_addr(page->epc_page) + offset, data);
+   if (ret)
+   return -EIO;
+
+   return 0;
+}
+
+static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *page,
+   unsigned long addr, void *data)
+{
+   unsigned long offset = addr & ~PAGE_MASK;
+   int ret;
+
+   ret = __edbgwr(sgx_get_epc_addr(page->epc_page) + offset, data);
+   if (ret)
+   return -EIO;
+
+   return 0;
+}
+
+static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
+ void *buf, int len, int write)
+{
+   struct sgx_encl *encl = vma->vm_private_data;
+   struct sgx_encl_page *entry = NULL;
+   char data[sizeof(unsigned long)];
+   unsigned long align;
+   unsigned int flags;
+   int offset;
+   int cnt;
+   int ret = 0;
+   int i;
+
+   /*
+* If process was forked, VMA is still there but vm_private_data is set
+* to NULL.
+*/
+   if (!encl)
+   return -EFAULT;
+
+   flags = atomic_read(&encl->flags);
+
+   if (!(flags & SGX_ENCL_DEBUG) || !(flags & SGX_ENCL_INITIALIZED) ||
+   (flags & SGX_ENCL_DEAD))
+   return -EFAULT;
+
+   for (i = 0; i < len; i += cnt) {
+   entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
+   if (IS_ERR(entry)) {
+   ret = PTR_ERR(entry);
+   break;
+   }
+
+   align = ALIGN_DOWN(addr + i, sizeof(unsigned long));
+   offset = (addr + i) & (sizeof(unsigned long) - 1);
+   cnt = sizeof(unsigned long) - offset;
+   cnt = min(cnt, len - i);
+
+   ret = sgx_encl_debug_read(encl, entry, align, data);
+   if (ret)
+   goto out;
+
+   if (write) {
+   memcpy(data + offset, buf + i, cnt);
+   ret = sgx_encl_debug_write(encl, entry, align, data);
+   if (ret)
+   goto out;
+   } else {
+   memcpy(buf + i, data + offset, cnt);
+   }
+
+out:
+   mutex_unlock(&encl->lock);
+
+   if (ret)
+   break;
+   }
+
+   return ret < 0 ? ret : i;
+}
+
 const struct vm_operations_struct sgx_vm_ops = {
.open = sgx_vma_open,
.fault = sgx_vma_fault,
.mprotect = sgx_vma_mprotect,
+   .access = sgx_vma_access,
 };
 
 /**
-- 
2.25.1



[PATCH v39 23/24] docs: x86/sgx: Document SGX micro architecture and kernel internals

2020-10-02 Thread Jarkko Sakkinen
Document the Intel SGX kernel architecture. The fine-grained
microarchitecture details can be looked up from Intel SDM Volume 3D.

Cc: linux-...@vger.kernel.org
Acked-by: Randy Dunlap 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/sgx.rst   | 284 
 2 files changed, 285 insertions(+)
 create mode 100644 Documentation/x86/sgx.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 740ee7f87898..b9db893c8aee 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -32,3 +32,4 @@ x86-specific Documentation
i386/index
x86_64/index
sva
+   sgx
diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
new file mode 100644
index ..7b742c331247
--- /dev/null
+++ b/Documentation/x86/sgx.rst
@@ -0,0 +1,284 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Software Guard eXtensions (SGX)
+===============================
+
+Architecture
+============
+
+*Software Guard eXtensions (SGX)* is a set of instructions that enable ring-3
+applications to set aside private regions of code and data. These regions are
+called enclaves. An enclave can be entered at a fixed set of entry points. Only
+a CPU running inside the enclave can access its code and data.
+
+The support can be determined by
+
+   ``grep sgx /proc/cpuinfo``
+
+Enclave Page Cache
+==================
+
+SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
+with an enclave. It is contained in a BIOS-reserved region of physical memory.
+Unlike pages used for regular memory, pages can only be accessed outside the
+enclave for different purposes with the instructions **ENCLS**, **ENCLV** and
+**ENCLU**.
+
+Direct memory accesses at an enclave can be only done by a CPU executing inside
+the enclave. An enclave can be entered with **ENCLU[EENTER]** to a fixed set of
+entry points. However, a CPU executing inside the enclave can do outside memory
+accesses.
+
+Page Types
+--
+
+**SGX Enclave Control Structure (SECS)**
+   Enclave's address range, attributes and other global data are defined
+   by this structure.
+
+**Regular (REG)**
+   Regular EPC pages contain the code and data of an enclave.
+
+**Thread Control Structure (TCS)**
+   Thread Control Structure pages define the entry points to an enclave and
+   track the execution state of an enclave thread.
+
+**Version Array (VA)**
+   Version Array pages contain 512 slots, each of which can contain a version
+   number for a page evicted from the EPC.
+
+Enclave Page Cache Map
+--
+
+The processor tracks EPC pages via the *Enclave Page Cache Map (EPCM)*.  EPCM
+contains an entry for each EPC page, which describes the owning enclave, access
+rights and page type among the other things.
+
+The permissions from EPCM are consulted if and only if walking the kernel page
+tables succeeds. The total permissions are thus a conjunction between page table
+and EPCM permissions.
+
+For all intents and purposes, the SGX architecture allows the processor to
+invalidate all EPCM entries at will, i.e. requires that software be prepared to
+handle an EPCM fault at any time. The contents of EPC are encrypted with an
+ephemeral key, which is lost on power transitions.
+
+EPC management
+==
+
+EPC pages do not have ``struct page`` instances. They are IO memory from the
+kernel's perspective. The consequence is that they are always mapped as shared
+memory. The kernel defines ``/dev/sgx/enclave``, which can be mapped as
+``MAP_SHARED`` to
+define the address range for an enclave.
+
+EPC Over-subscription
+=
+
+When the amount of free EPC pages goes below a low watermark the swapping thread
+starts reclaiming pages. The pages that do not have the **A** bit set are
+selected as victim pages.
+
+Launch Control
+==
+
+SGX provides a launch control mechanism. After all enclave pages have been
+copied, the kernel executes **ENCLS[EINIT]**, which initializes the enclave.
+Only after this can the CPU execute inside the enclave.
+
+This leaf function takes an RSA-3072 signature of the enclave measurement and an
+optional cryptographic token. Linux does not take advantage of launch tokens.
+The leaf instruction checks that the measurement is correct and signature is
+signed with the key hashed to the four
+**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}**
+MSRs representing the SHA256 of a public key.
+
+Those MSRs can be configured by the BIOS to be either readable or writable.
+Linux supports only writable configuration in order to give full control to the
+kernel on launch control policy. Readable configuration requires the use of
+previously mentioned launch tokens.
+
+The launch is performed by setting the MSRs to the hash of the enclave signer's
+public key. The alternative would be to have *a launch 

[PATCH v39 22/24] selftests/x86: Add a selftest for SGX

2020-10-02 Thread Jarkko Sakkinen
Add a selftest for SGX. It is a trivial test where a simple enclave
copies one 64-bit word of memory between two memory locations.

Cc: Shuah Khan 
Cc: linux-kselft...@vger.kernel.org
Signed-off-by: Jarkko Sakkinen 
---
 tools/testing/selftests/Makefile  |   1 +
 tools/testing/selftests/sgx/.gitignore|   2 +
 tools/testing/selftests/sgx/Makefile  |  53 +++
 tools/testing/selftests/sgx/call.S|  44 ++
 tools/testing/selftests/sgx/defines.h |  21 +
 tools/testing/selftests/sgx/load.c| 277 
 tools/testing/selftests/sgx/main.c| 243 +++
 tools/testing/selftests/sgx/main.h|  38 ++
 tools/testing/selftests/sgx/sigstruct.c   | 395 ++
 tools/testing/selftests/sgx/test_encl.c   |  20 +
 tools/testing/selftests/sgx/test_encl.lds |  40 ++
 .../selftests/sgx/test_encl_bootstrap.S   |  89 
 12 files changed, 1223 insertions(+)
 create mode 100644 tools/testing/selftests/sgx/.gitignore
 create mode 100644 tools/testing/selftests/sgx/Makefile
 create mode 100644 tools/testing/selftests/sgx/call.S
 create mode 100644 tools/testing/selftests/sgx/defines.h
 create mode 100644 tools/testing/selftests/sgx/load.c
 create mode 100644 tools/testing/selftests/sgx/main.c
 create mode 100644 tools/testing/selftests/sgx/main.h
 create mode 100644 tools/testing/selftests/sgx/sigstruct.c
 create mode 100644 tools/testing/selftests/sgx/test_encl.c
 create mode 100644 tools/testing/selftests/sgx/test_encl.lds
 create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9018f45d631d..fee80cda6304 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -68,6 +68,7 @@ TARGETS += user
 TARGETS += vm
 TARGETS += x86
 TARGETS += zram
+TARGETS += sgx
 #Please keep the TARGETS list alphabetically sorted
 # Run "make quicktest=1 run_tests" or
 # "make quicktest=1 kselftest" from top level Makefile
diff --git a/tools/testing/selftests/sgx/.gitignore b/tools/testing/selftests/sgx/.gitignore
new file mode 100644
index ..fbaf0bda9a92
--- /dev/null
+++ b/tools/testing/selftests/sgx/.gitignore
@@ -0,0 +1,2 @@
+test_sgx
+test_encl.elf
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
new file mode 100644
index ..95e5c4df8014
--- /dev/null
+++ b/tools/testing/selftests/sgx/Makefile
@@ -0,0 +1,53 @@
+top_srcdir = ../../../..
+
+include ../lib.mk
+
+.PHONY: all clean
+
+CAN_BUILD_X86_64 := $(shell ../x86/check_cc.sh $(CC) \
+   ../x86/trivial_64bit_program.c)
+
+ifndef OBJCOPY
+OBJCOPY := $(CROSS_COMPILE)objcopy
+endif
+
+INCLUDES := -I$(top_srcdir)/tools/include
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack
+ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+  -fno-stack-protector -mrdrnd $(INCLUDES)
+
+TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx $(OUTPUT)/test_encl.elf
+
+ifeq ($(CAN_BUILD_X86_64), 1)
+all: $(TEST_CUSTOM_PROGS)
+endif
+
+$(OUTPUT)/test_sgx: $(OUTPUT)/main.o \
+   $(OUTPUT)/load.o \
+   $(OUTPUT)/sigstruct.o \
+   $(OUTPUT)/call.o
+   $(CC) $(HOST_CFLAGS) -o $@ $^ -lcrypto
+
+$(OUTPUT)/main.o: main.c
+   $(CC) $(HOST_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/load.o: load.c
+   $(CC) $(HOST_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/sigstruct.o: sigstruct.c
+   $(CC) $(HOST_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/call.o: call.S
+   $(CC) $(HOST_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/test_encl.elf: test_encl.lds test_encl.c test_encl_bootstrap.S
+   $(CC) $(ENCL_CFLAGS) -T $^ -o $@
+
+EXTRA_CLEAN := \
+   $(OUTPUT)/test_encl.elf \
+   $(OUTPUT)/load.o \
+   $(OUTPUT)/call.o \
+   $(OUTPUT)/main.o \
+   $(OUTPUT)/sigstruct.o \
+   $(OUTPUT)/test_sgx \
+   $(OUTPUT)/test_sgx.o \
diff --git a/tools/testing/selftests/sgx/call.S b/tools/testing/selftests/sgx/call.S
new file mode 100644
index ..f640532cda93
--- /dev/null
+++ b/tools/testing/selftests/sgx/call.S
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+* Copyright(c) 2016-18 Intel Corporation.
+*/
+
+   .text
+
+   .global sgx_call_vdso
+sgx_call_vdso:
+   .cfi_startproc
+   push    %r15
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %r15, 0
+   push    %r14
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %r14, 0
+   push    %r13
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %r13, 0
+   push    %r12
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %r12, 0
+   push    %rbx
+   .cfi_adjust_cfa_offset  8
+   .cfi_rel_offset %rbx, 0
+   push    $0
+   .cfi_adjust_cfa_offset  8
+   push    0x38(%rsp)
+   .cfi_adjust_cfa_offset  8
+   call

[PATCH v39 24/24] x86/sgx: Update MAINTAINERS

2020-10-02 Thread Jarkko Sakkinen
Add the maintainer information for the SGX subsystem.

Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Signed-off-by: Jarkko Sakkinen 
---
 MAINTAINERS | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b81aad6f7f97..ca1995b1ef45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9070,6 +9070,17 @@ F:   Documentation/x86/intel_txt.rst
 F: arch/x86/kernel/tboot.c
 F: include/linux/tboot.h
 
+INTEL SGX
+M: Jarkko Sakkinen 
+M: Sean Christopherson 
+L: linux-...@vger.kernel.org
+S: Maintained
+Q: https://patchwork.kernel.org/project/intel-sgx/list/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
+F: arch/x86/include/uapi/asm/sgx.h
+F: arch/x86/kernel/cpu/sgx/*
+K: \bSGX_
+
 INTERCONNECT API
 M: Georgi Djakov 
 L: linux...@vger.kernel.org
-- 
2.25.1



[PATCH v39 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Background
==

1. SGX enclave pages are populated with data by copying data to them from
   normal memory via an ioctl() (SGX_IOC_ENCLAVE_ADD_PAGES).
2. It is desirable to be able to restrict those normal memory data sources.
   For instance, to ensure that the source data is executable before
   copying data to an executable enclave page.
3. Enclave page permissions are dynamic (just like normal permissions) and
   can be adjusted at runtime with mprotect().
4. The original data source may have long since vanished at the time when
   enclave page permissions are established (mmap() or mprotect()).

The solution (elsewhere in this series) is to force enclave creators to
declare their paging permission *intent* up front to the ioctl().  This
intent can be immediately compared to the source data’s mapping (and
rejected if necessary).

The intent is also stashed off for later comparison with enclave PTEs.
This ensures that any future mmap()/mprotect() operations performed by the
enclave creator or done on behalf of the enclave can be compared with the
earlier declared permissions.

Problem
===

There is an existing mmap() hook which allows SGX to perform this
permission comparison at mmap() time.  However, there is no corresponding
->mprotect() hook.

Solution


Add a vm_ops->mprotect() hook so that mprotect() operations which are
inconsistent with any page's stashed intent can be rejected by the driver.

Cc: linux...@kvack.org
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Sean Christopherson 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 include/linux/mm.h | 3 +++
 mm/mprotect.c  | 5 -
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b2f370f0b420..dca57fe80555 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -551,6 +551,9 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
+   int (*mprotect)(struct vm_area_struct *vma,
+   struct vm_area_struct **pprev, unsigned long start,
+   unsigned long end, unsigned long newflags);
vm_fault_t (*fault)(struct vm_fault *vmf);
vm_fault_t (*huge_fault)(struct vm_fault *vmf,
enum page_entry_size pe_size);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..f170f3da8a4f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -610,7 +610,10 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
-   error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+   if (vma->vm_ops && vma->vm_ops->mprotect)
+   error = vma->vm_ops->mprotect(vma, &prev, nstart, tmp, newflags);
+   else
+   error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
if (error)
goto out;
nstart = tmp;
-- 
2.25.1



[PATCH v39 14/24] x86/sgx: Add SGX_IOC_ENCLAVE_INIT

2020-10-02 Thread Jarkko Sakkinen
Add an ioctl that performs ENCLS[EINIT], which locks down the measurement
and initializes the enclave for entrance. After this, new pages can no
longer be added.

Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Tested-by: Haitao Huang 
Tested-by: Chunyang Hui 
Tested-by: Jordan Hand 
Tested-by: Nathaniel McCallum 
Tested-by: Seth Moore 
Tested-by: Darren Kenny 
Reviewed-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Co-developed-by: Suresh Siddha 
Signed-off-by: Suresh Siddha 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/uapi/asm/sgx.h |  11 ++
 arch/x86/kernel/cpu/sgx/encl.h  |   2 +
 arch/x86/kernel/cpu/sgx/ioctl.c | 193 
 3 files changed, 206 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 10cd48d06318..e401fa72eaab 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -23,6 +23,8 @@ enum sgx_page_flags {
_IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
 #define SGX_IOC_ENCLAVE_ADD_PAGES \
_IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
+#define SGX_IOC_ENCLAVE_INIT \
+   _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -52,4 +54,13 @@ struct sgx_enclave_add_pages {
__u64 count;
 };
 
+/**
+ * struct sgx_enclave_init - parameter structure for the
+ *   %SGX_IOC_ENCLAVE_INIT ioctl
+ * @sigstruct: address for the SIGSTRUCT data
+ */
+struct sgx_enclave_init {
+   __u64 sigstruct;
+};
+
 #endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 8ff445476657..0448d22d3010 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -70,6 +70,8 @@ struct sgx_encl {
struct xarray page_array;
struct sgx_encl_page secs;
cpumask_t cpumask;
+   unsigned long attributes;
+   unsigned long attributes_mask;
 };
 
 extern const struct vm_operations_struct sgx_vm_ops;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index e13e04737683..cf5a43d6daa2 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -128,6 +128,9 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
encl->base = secs->base;
encl->size = secs->size;
encl->ssaframesize = secs->ssa_frame_size;
+   encl->attributes = secs->attributes;
+   encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
+   SGX_ATTR_KSS;
 
/*
 * Set SGX_ENCL_CREATED only after the enclave is fully prepped.  This
@@ -490,6 +493,193 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
return ret;
 }
 
+static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
+ void *hash)
+{
+   SHASH_DESC_ON_STACK(shash, tfm);
+
+   shash->tfm = tfm;
+
+   return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
+}
+
+static int sgx_get_key_hash(const void *modulus, void *hash)
+{
+   struct crypto_shash *tfm;
+   int ret;
+
+   tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
+   if (IS_ERR(tfm))
+   return PTR_ERR(tfm);
+
+   ret = __sgx_get_key_hash(tfm, modulus, hash);
+
+   crypto_free_shash(tfm);
+   return ret;
+}
+
+static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
+void *token)
+{
+   u64 mrsigner[4];
+   void *addr;
+   int ret;
+   int i;
+   int j;
+
+   /*
+* Deny initializing enclaves with attributes (namely provisioning)
+* that have not been explicitly allowed.
+*/
+   if (encl->attributes & ~encl->attributes_mask)
+   return -EACCES;
+
+   /*
+* Attributes should not be enforced *only* against what's available on
+* platform (done in sgx_encl_create) but checked and enforced against
+* the mask for enforcement in sigstruct. For example an enclave could
+* opt to sign with AVX bit in xfrm, but still be loadable on a platform
+* without it if the sigstruct->body.attributes_mask does not turn that
+* bit on.
+*/
+   if (sigstruct->body.attributes & sigstruct->body.attributes_mask &
+   sgx_attributes_reserved_mask)
+   return -EINVAL;
+
+   if (sigstruct->body.miscselect & sigstruct->body.misc_mask &
+   sgx_misc_reserved_mask)
+   return -EINVAL;
+
+   if (sigstruct->body.xfrm & sigstruct->body.xfrm_mask &
+   sgx_xfrm_reserved_mask)
+   return -EINVAL;
+
+   ret = sgx_get_key_hash(sigstruct->modulus, mrsigner);
+   if (ret)
+   return ret;
+
+   mutex_lock(&encl->lock);
+
+   /*
+  

[PATCH v39 07/24] x86/cpu/intel: Add nosgx kernel parameter

2020-10-02 Thread Jarkko Sakkinen
Add a kernel parameter to disable Intel SGX kernel support.

Tested-by: Sean Christopherson 
Reviewed-by: Sean Christopherson 
Reviewed-by: Darren Kenny 
Signed-off-by: Jarkko Sakkinen 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 arch/x86/kernel/cpu/feat_ctl.c  | 9 +
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 115c50f4e927..7ff2b35a1b8e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3356,6 +3356,8 @@
 
nosep   [BUGS=X86-32] Disables x86 SYSENTER/SYSEXIT support.
 
+   nosgx   [X86-64,SGX] Disables Intel SGX kernel support.
+
nosmp   [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC.  legacy for "maxcpus=0".
 
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index c3afcd2e4342..1837df39527f 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -101,6 +101,15 @@ static void clear_sgx_caps(void)
setup_clear_cpu_cap(X86_FEATURE_SGX2);
 }
 
+static int __init nosgx(char *str)
+{
+   clear_sgx_caps();
+
+   return 0;
+}
+
+early_param("nosgx", nosgx);
+
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
bool tboot = tboot_enabled();
-- 
2.25.1



[PATCH v39 12/24] x86/sgx: Add SGX_IOC_ENCLAVE_CREATE

2020-10-02 Thread Jarkko Sakkinen
Add an ioctl that performs ENCLS[ECREATE], which creates SGX Enclave
Control Structure for the enclave. SECS contains attributes about the
enclave that are used by the hardware and cannot be directly accessed by
software, as SECS resides in the EPC.

One essential field in SECS is a field that stores the SHA256 of the
measured enclave pages. This field, MRENCLAVE, is initialized by the
ECREATE instruction and updated by every EADD and EEXTEND operation.
Finally, EINIT locks down the value.

Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Tested-by: Haitao Huang 
Tested-by: Chunyang Hui 
Tested-by: Jordan Hand 
Tested-by: Nathaniel McCallum 
Tested-by: Seth Moore 
Tested-by: Darren Kenny 
Reviewed-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Co-developed-by: Suresh Siddha 
Signed-off-by: Suresh Siddha 
Signed-off-by: Jarkko Sakkinen 
---
 .../userspace-api/ioctl/ioctl-number.rst  |   1 +
 arch/x86/include/uapi/asm/sgx.h   |  25 ++
 arch/x86/kernel/cpu/sgx/Makefile  |   1 +
 arch/x86/kernel/cpu/sgx/driver.c  |  12 +
 arch/x86/kernel/cpu/sgx/driver.h  |   1 +
 arch/x86/kernel/cpu/sgx/ioctl.c   | 223 ++
 6 files changed, 263 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/sgx.h
 create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2a198838fca9..a89e1c46a25a 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -323,6 +323,7 @@ Code  Seq#  Include File                    Comments
 0xA3  90-9F  linux/dtlk.h
 0xA4  00-1F  uapi/linux/tee.h               Generic TEE subsystem
+0xA4  00-1F  uapi/asm/sgx.h
 0xAA  00-3F  linux/uapi/linux/userfaultfd.h
 0xAB  00-1F  linux/nbd.h
 0xAC  00-1F  linux/raw.h
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
new file mode 100644
index ..c75b375f3770
--- /dev/null
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: ((GPL-2.0+ WITH Linux-syscall-note) OR 
BSD-3-Clause) */
+/*
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _UAPI_ASM_X86_SGX_H
+#define _UAPI_ASM_X86_SGX_H
+
+#include 
+#include 
+
+#define SGX_MAGIC 0xA4
+
+#define SGX_IOC_ENCLAVE_CREATE \
+   _IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
+
+/**
+ * struct sgx_enclave_create - parameter structure for the
+ * %SGX_IOC_ENCLAVE_CREATE ioctl
+ * @src:   address for the SECS page data
+ */
+struct sgx_enclave_create  {
+   __u64   src;
+};
+
+#endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 3fc451120735..91d3dc784a29 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,4 +1,5 @@
 obj-y += \
driver.o \
encl.o \
+   ioctl.o \
main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index f54da5f19c2b..7bdb49dfcca6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -114,10 +114,22 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
 }
 
+#ifdef CONFIG_COMPAT
+static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
+ unsigned long arg)
+{
+   return sgx_ioctl(filep, cmd, arg);
+}
+#endif
+
 static const struct file_operations sgx_encl_fops = {
.owner  = THIS_MODULE,
.open   = sgx_open,
.release= sgx_release,
+   .unlocked_ioctl = sgx_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = sgx_compat_ioctl,
+#endif
.mmap   = sgx_mmap,
.get_unmapped_area  = sgx_get_unmapped_area,
 };
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index f7ce40dedc91..e4063923115b 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "sgx.h"
 
 #define SGX_EINIT_SPIN_COUNT   20
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
new file mode 100644
index ..9bb4694e57c1
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-19 Intel Corporation.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 

[PATCH v39 01/24] x86/cpufeatures: x86/msr: Add Intel SGX hardware bits

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Add X86_FEATURE_SGX from CPUID.(EAX=7, ECX=1), which informs whether the
CPU has SGX.

Add X86_FEATURE_SGX1 and X86_FEATURE_SGX2 from CPUID.(EAX=12H, ECX=0),
which describe the level of SGX support available [1].

Add IA32_FEATURE_CONTROL.SGX_ENABLE. BIOS can use this bit to opt-in SGX
before locking the feature control MSR [2].

[1] Intel SDM: 36.7.2 Intel® SGX Resource Enumeration Leaves
[2] Intel SDM: 36.7.1 Intel® SGX Opt-In Configuration

Reviewed-by: Borislav Petkov 
Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Sean Christopherson 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/asm/cpufeature.h|  5 +++--
 arch/x86/include/asm/cpufeatures.h   |  7 ++-
 arch/x86/include/asm/disabled-features.h | 18 +++---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/include/asm/required-features.h |  2 +-
 arch/x86/kernel/cpu/common.c |  4 
 6 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 59bf91c57aa8..efbdba5170a3 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -30,6 +30,7 @@ enum cpuid_leafs
CPUID_7_ECX,
CPUID_8000_0007_EBX,
CPUID_7_EDX,
+   CPUID_12_EAX,
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
@@ -89,7 +90,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) ||\
   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) ||\
   REQUIRED_MASK_CHECK||\
-  BUILD_BUG_ON_ZERO(NCAPINTS != 19))
+  BUILD_BUG_ON_ZERO(NCAPINTS != 20))
 
 #define DISABLED_MASK_BIT_SET(feature_bit) \
 ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK,  0, feature_bit) ||\
@@ -112,7 +113,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) ||\
   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) ||\
   DISABLED_MASK_CHECK||\
-  BUILD_BUG_ON_ZERO(NCAPINTS != 19))
+  BUILD_BUG_ON_ZERO(NCAPINTS != 20))
 
 #define cpu_has(c, bit)
\
(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :  \
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..7150001d5232 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -13,7 +13,7 @@
 /*
  * Defines x86 CPU feature bits
  */
-#define NCAPINTS   19 /* N 32-bit words worth of info */
+#define NCAPINTS   20 /* N 32-bit words worth of info */
 #define NBUGINTS   1  /* N 32-bit bug flags */
 
 /*
@@ -241,6 +241,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
 #define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3B */
+#define X86_FEATURE_SGX        ( 9*32+ 2) /* Software Guard Extensions */
 #define X86_FEATURE_BMI1       ( 9*32+ 3) /* 1st group bit manipulation extensions */
 #define X86_FEATURE_HLE        ( 9*32+ 4) /* Hardware Lock Elision */
 #define X86_FEATURE_AVX2       ( 9*32+ 5) /* AVX2 instructions */
@@ -381,6 +382,10 @@
 #define X86_FEATURE_CORE_CAPABILITIES  (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
 
+/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
+#define X86_FEATURE_SGX1   (19*32+ 0) /* SGX1 leaf functions */
+#define X86_FEATURE_SGX2   (19*32+ 1) /* SGX2 leaf functions */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 5861d34f9771..689a100948eb 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -28,13 +28,18 @@
 # define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
 # define DISABLE_CENTAUR_MCR   (1<<(X86_FEATURE_CENTAUR_MCR & 31))
 # define DISABLE_PCID  0
+# define DISABLE_SGX1  0
+# define DISABLE_SGX2  0
 #else
 # define DISABLE_VME   0
 # define DISABLE_K6_MTRR   0
 # define DISABLE_CYRIX_ARR 0
 # define DISABLE_CENTAUR_MCR   0
 # define DISABLE_PCID  (1<<(X86_FEATURE_PCID & 31))
-#endif /* CONFIG_X86_64 */
+# define DISABLE_SGX1  (1<<(X86_FEATURE_SGX1 & 31))
+# define DISABLE_SGX2  (1<<(X86_FEATURE_SGX2 & 31))
+ #endif /* CONFIG_X86_64 */
+
 
 #ifdef 

[PATCH v39 05/24] x86/sgx: Add wrappers for ENCLS leaf functions

2020-10-02 Thread Jarkko Sakkinen
ENCLS is a ring 0 instruction, which contains a set of leaf functions for
managing an enclave. Enclaves are measured and signed software entities,
which are protected against outside memory accesses by access control and
memory encryption.

Add a two-layer macro system along with an encoding scheme to allow
wrappers to return trap numbers alongside ENCLS-specific error codes. The
bottom layer of the macro system splits between the leafs that return an
error code and those that do not. The second layer generates the correct
input/output annotations based on the number of operands for each leaf
function.

ENCLS leaf functions are documented in

  Intel SDM: 36.6 ENCLAVE INSTRUCTIONS AND INTEL®

Acked-by: Jethro Beekman 
Tested-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/encls.h | 238 
 1 file changed, 238 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/encls.h

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
new file mode 100644
index ..a87f15ea5cca
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -0,0 +1,238 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef _X86_ENCLS_H
+#define _X86_ENCLS_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sgx.h"
+
+enum sgx_encls_leaf {
+   ECREATE = 0x00,
+   EADD= 0x01,
+   EINIT   = 0x02,
+   EREMOVE = 0x03,
+   EDGBRD  = 0x04,
+   EDGBWR  = 0x05,
+   EEXTEND = 0x06,
+   ELDU= 0x08,
+   EBLOCK  = 0x09,
+   EPA = 0x0A,
+   EWB = 0x0B,
+   ETRACK  = 0x0C,
+};
+
+/**
+ * ENCLS_FAULT_FLAG - flag signifying an ENCLS return code is a trapnr
+ *
+ * ENCLS has its own (positive value) error codes and also generates
+ * ENCLS specific #GP and #PF faults.  And the ENCLS values get munged
+ * with system error codes as everything percolates back up the stack.
+ * Unfortunately (for us), we need to precisely identify each unique
+ * error code, e.g. the action taken if EWB fails varies based on the
+ * type of fault and on the exact SGX error code, i.e. we can't simply
+ * convert all faults to -EFAULT.
+ *
+ * To make all three error types coexist, we set bit 30 to identify an
+ * ENCLS fault.  Bit 31 (technically bits N:31) is used to differentiate
+ * between positive (faults and SGX error codes) and negative (system
+ * error codes) values.
+ */
+#define ENCLS_FAULT_FLAG 0x4000
+
+/* Retrieve the encoded trapnr from the specified return code. */
+#define ENCLS_TRAPNR(r) ((r) & ~ENCLS_FAULT_FLAG)
+
+/* Issue a WARN() about an ENCLS leaf. */
+#define ENCLS_WARN(r, name) {\
+   do {  \
+   int _r = (r); \
+   WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
+   } while (0);  \
+}
+
+/**
+ * encls_failed() - Check if an ENCLS leaf function failed
+ * @ret:   the return value of an ENCLS leaf function call
+ *
+ * Check if an ENCLS leaf function failed. This happens when the leaf function
+ * causes a fault that is not caused by an EPCM conflict or when the leaf
+ * function returns a non-zero value.
+ */
+static inline bool encls_failed(int ret)
+{
+   int epcm_trapnr;
+
+   if (boot_cpu_has(X86_FEATURE_SGX2))
+   epcm_trapnr = X86_TRAP_PF;
+   else
+   epcm_trapnr = X86_TRAP_GP;
+
+   if (ret & ENCLS_FAULT_FLAG)
+   return ENCLS_TRAPNR(ret) != epcm_trapnr;
+
+   return !!ret;
+}
+
+/**
+ * __encls_ret_N - encode an ENCLS leaf that returns an error code in EAX
+ * @rax:   leaf number
+ * @inputs:asm inputs for the leaf
+ *
+ * Emit assembly for an ENCLS leaf that returns an error code, e.g. EREMOVE.
+ * And because SGX isn't complex enough as it is, leafs that return an error
+ * code also modify flags.
+ *
+ * Return:
+ * 0 on success,
+ * SGX error code on failure
+ */
+#define __encls_ret_N(rax, inputs...)  \
+   ({  \
+   int ret;\
+   asm volatile(   \
+   "1: .byte 0x0f, 0x01, 0xcf;\n\t"\
+   "2:\n"  \
+   ".section .fixup,\"ax\"\n"  \
+   "3: orl $"__stringify(ENCLS_FAULT_FLAG)",%%eax\n"   \
+   "   jmp 2b\n"   \
+   ".previous\n"   \
+   _ASM_EXTABLE_FAULT(1b, 3b)  \
+   : "=a"(ret) \
+   : 

[PATCH v39 09/24] x86/sgx: Add __sgx_alloc_epc_page() and sgx_free_epc_page()

2020-10-02 Thread Jarkko Sakkinen
Add __sgx_alloc_epc_page(), which iterates through EPC sections and borrows
a page structure that is not used by anyone else. When a page is no longer
needed it must be released with sgx_free_epc_page(). This function
implicitly calls ENCLS[EREMOVE], which returns the page to the
uninitialized state (i.e. the caller does not need to do this itself).

Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/main.c | 62 ++
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++
 2 files changed, 65 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c5831e3db14a..97c6895fb6c9 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -83,6 +83,68 @@ static bool __init sgx_page_reclaimer_init(void)
return true;
 }
 
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_section *section)
+{
+   struct sgx_epc_page *page;
+
+   if (list_empty(&section->page_list))
+   return NULL;
+
+   page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
+   list_del_init(&page->list);
+
+   return page;
+}
+
+/**
+ * __sgx_alloc_epc_page() - Allocate an EPC page
+ *
+ * Iterate through EPC sections and borrow a free EPC page to the caller. When a
+ * page is no longer needed it must be released with sgx_free_epc_page().
+ *
+ * Return:
+ *   an EPC page,
+ *   -errno on error
+ */
+struct sgx_epc_page *__sgx_alloc_epc_page(void)
+{
+   struct sgx_epc_section *section;
+   struct sgx_epc_page *page;
+   int i;
+
+   for (i = 0; i < sgx_nr_epc_sections; i++) {
+   section = &sgx_epc_sections[i];
+   spin_lock(&section->lock);
+   page = __sgx_alloc_epc_page_from_section(section);
+   spin_unlock(&section->lock);
+
+   if (page)
+   return page;
+   }
+
+   return ERR_PTR(-ENOMEM);
+}
+
+/**
+ * sgx_free_epc_page() - Free an EPC page
+ * @page:  an EPC page
+ *
+ * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ */
+void sgx_free_epc_page(struct sgx_epc_page *page)
+{
+   struct sgx_epc_section *section = sgx_get_epc_section(page);
+   int ret;
+
+   ret = __eremove(sgx_get_epc_addr(page));
+   if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
+   return;
+
+   spin_lock(&section->lock);
+   list_add_tail(&page->list, &section->page_list);
+   spin_unlock(&section->lock);
+}
+
 static void __init sgx_free_epc_section(struct sgx_epc_section *section)
 {
struct sgx_epc_page *page;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index dff4f5f16d09..fce756c3434b 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -49,4 +49,7 @@ static inline void *sgx_get_epc_addr(struct sgx_epc_page 
*page)
return section->va + (page->desc & PAGE_MASK) - section->pa;
 }
 
+struct sgx_epc_page *__sgx_alloc_epc_page(void);
+void sgx_free_epc_page(struct sgx_epc_page *page);
+
 #endif /* _X86_SGX_H */
-- 
2.25.1



[PATCH v39 00/24] Intel SGX foundations

2020-10-02 Thread Jarkko Sakkinen
Intel(R) SGX is a set of CPU instructions that can be used by applications
to set aside private regions of code and data. Code outside the enclave is
prevented from accessing the memory inside the enclave by CPU access
control.

There is a new hardware unit in the processor called Memory Encryption
Engine (MEE) starting from the Skylake microarchitecture. BIOS can define
one or many MEE regions that can hold enclave data by configuring them with
PRMRR registers.

The MEE automatically encrypts the data leaving the processor package to
the MEE regions. The data is encrypted using a random key whose life-time
is exactly one power cycle.

The current implementation requires that the firmware sets
IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
decide what enclaves it wants to run. The implementation does not create
any bottlenecks to support read-only MSRs later on.

You can tell if your CPU supports SGX by looking into /proc/cpuinfo:

cat /proc/cpuinfo  | grep sgx

v39 (2020-10-03):
* A new GIT tree location.
  git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
* Return -ERESTARTSYS instead of -EINTR in SGX_IOC_ENCLAVE_ADD_PAGES.
  https://lore.kernel.org/linux-sgx/20200917160322.gg8...@linux.intel.com/T/#u
* Do not initialize 'encl_size' in sgx_encl_create before
  sgx_validate_secs() is called.
  https://lore.kernel.org/linux-sgx/20200921100356.gb5...@zn.tnic/
* Revert 'count' back to struct sgx_enclave_add_pages, move the check of
  -EIO to sgx_ioc_enclave_pages() instead of being buried in subfunctions.
  https://lore.kernel.org/linux-sgx/20200921195822.ga58...@linux.intel.com/
* Fix documentation for the 'encl' parameter in sgx_ioc_enclave_create(),
  sgx_ioc_enclave_init() and sgx_ioc_enclave_provision().
  https://lore.kernel.org/linux-sgx/20200921100356.gb5...@zn.tnic/
* Refine sgx_ioc_enclave_create() kdoc to better describe the meaning and
  purpose of SECS validation done by sgx_validate_secs().
  https://lore.kernel.org/linux-sgx/20200921135107.gg5...@zn.tnic/
* Improve documentation sgx_ioc_enclave_add_pages() on IO failures.
  https://lore.kernel.org/linux-sgx/20200921194419.ga56...@linux.intel.com/
* Fix a bug in __sgx_encl_add_page(). When get_user_pages() fails, we must
  return -EFAULT instead of mistakenly returning the page count.
  Reported by Haitao Huang.
* Rewrite the commit message for vm_ops->mprotect() (courtesy of Dave Hansen)
  
https://lore.kernel.org/linux-sgx/32fc9df4-d4aa-6768-aa06-0035427b7...@intel.com/
* Fix ptrace support coding style issues.
  https://lore.kernel.org/linux-sgx/20200922154424.gl22...@zn.tnic/
* Fix the documentation.
  
https://lore.kernel.org/linux-sgx/20200915112842.897265-24-jarkko.sakki...@linux.intel.com/
* Always write MSRs for the public key before EINIT.
  https://lore.kernel.org/linux-sgx/20200921173514.gi5...@zn.tnic/
* Categorically disabled enclaves from noexec partitions.
  https://lore.kernel.org/linux-sgx/20200923135056.gd5...@linux.intel.com/
* Properly document the EWB flow, i.e. why there are three trials for EWB.
  https://lore.kernel.org/linux-sgx/20200922104538.ge22...@zn.tnic/
* Add kdoc about batch processing to sgx_reclaim_pages().
  https://lore.kernel.org/linux-sgx/20200922104538.ge22...@zn.tnic/
  https://lore.kernel.org/linux-mm/20200929011438.ga31...@linux.intel.com/
* Documentation fixes.
  
https://lore.kernel.org/linux-sgx/20200915112842.897265-1-jarkko.sakki...@linux.intel.com/T/#me637011aba9f45698eba88ff195452c0491c07fe
* SGX vDSO clean ups.
  
https://lore.kernel.org/linux-sgx/20200915112842.897265-1-jarkko.sakki...@linux.intel.com/T/#ma2204bba8d8e8a09bf9164fc1bb5c55813997b4a
* Add the commit message from "x86/vdso: Add support for exception fixup in 
vDSO functions" to Documentation/x86/sgx.rst
  https://lore.kernel.org/linux-sgx/20200923220712.gu28...@zn.tnic/
* Update correct attributes variable when allowing provisioning.
  https://lore.kernel.org/linux-sgx/20201001220824.ga24...@linux.intel.com/T/#t
* Remove sgx_exception and put its fields into struct sgx_enclave_run.
  
https://lore.kernel.org/linux-sgx/20201002211212.620059-1-jarkko.sakki...@linux.intel.com/T/#u
* Remove 'exit_reason' and put EEXIT to 'self' field of sgx_enclave_run.
  
https://lore.kernel.org/linux-sgx/20201002211212.620059-1-jarkko.sakki...@linux.intel.com/T/#u
* Refine clarity of the field names in struct sgx_enclave_run and vsgx.S, and 
rewrite kdoc.
  
https://lore.kernel.org/linux-sgx/20201002211212.620059-1-jarkko.sakki...@linux.intel.com/T/#u
* Fix memory validation in vsgx.S. The reserved areas were not zero-validated,
  which creates unnecessary risk of memory corruption bugs. In effect, the
  'flags' field can be removed from struct sgx_enclave_run.
  
https://lore.kernel.org/linux-sgx/20201002211212.620059-1-jarkko.sakki...@linux.intel.com/T/#u
* Reduce the size of sgx_enclave_run from 256 bytes to 64 bytes, i.e. size of
  a cache line. This leaves 24 bytes of free space for future use.
  

[PATCH v39 04/24] x86/sgx: Add SGX microarchitectural data structures

2020-10-02 Thread Jarkko Sakkinen
Define the SGX microarchitectural data structures used by various SGX
opcodes. This is not an exhaustive representation of all SGX data
structures but only those needed by the kernel.

The data structures are described in:

  Intel SDM: 37.6 INTEL® SGX DATA STRUCTURES OVERVIEW

Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/arch.h | 341 +
 1 file changed, 341 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/arch.h

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
new file mode 100644
index ..ccecc39728dc
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/arch.h
@@ -0,0 +1,341 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-18 Intel Corporation.
+ *
+ * Contains data structures defined by the SGX architecture.  Data structures
+ * defined by the Linux software stack should not be placed here.
+ */
+#ifndef _ASM_X86_SGX_ARCH_H
+#define _ASM_X86_SGX_ARCH_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+
+#define SGX_CPUID  0x12
+#define SGX_CPUID_FIRST_VARIABLE_SUB_LEAF  2
+
+/**
+ * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
+ * %SGX_NOT_TRACKED:   Previous ETRACK's shootdown sequence has not
+ * been completed yet.
+ * %SGX_INVALID_EINITTOKEN:EINITTOKEN is invalid and enclave signer's
+ * public key does not match IA32_SGXLEPUBKEYHASH.
+ * %SGX_UNMASKED_EVENT:An unmasked event, e.g. INTR, was 
received
+ */
+enum sgx_return_code {
+   SGX_NOT_TRACKED = 11,
+   SGX_INVALID_EINITTOKEN  = 16,
+   SGX_UNMASKED_EVENT  = 128,
+};
+
+/**
+ * enum sgx_sub_leaf_types - SGX CPUID variable sub-leaf types
+ * %SGX_CPUID_SUB_LEAF_INVALID:Indicates this sub-leaf is 
invalid.
+ * %SGX_CPUID_SUB_LEAF_EPC_SECTION:Sub-leaf enumerates an EPC section.
+ */
+enum sgx_sub_leaf_types {
+   SGX_CPUID_SUB_LEAF_INVALID  = 0x0,
+   SGX_CPUID_SUB_LEAF_EPC_SECTION  = 0x1,
+};
+
+#define SGX_CPUID_SUB_LEAF_TYPE_MASK   GENMASK(3, 0)
+
+#define SGX_MODULUS_SIZE 384
+
+/**
+ * enum sgx_miscselect - additional information to an SSA frame
+ * %SGX_MISC_EXINFO:   Report #PF or #GP to the SSA frame.
+ *
+ * Save State Area (SSA) is a stack inside the enclave used to store processor
+ * state when an exception or interrupt occurs. This enum defines additional
+ * information stored to an SSA frame.
+ */
+enum sgx_miscselect {
+   SGX_MISC_EXINFO = BIT(0),
+};
+
+#define SGX_MISC_RESERVED_MASK GENMASK_ULL(63, 1)
+
+#define SGX_SSA_GPRS_SIZE  184
+#define SGX_SSA_MISC_EXINFO_SIZE   16
+
+/**
+ * enum sgx_attributes - the attributes field in &struct sgx_secs
+ * %SGX_ATTR_INIT: Enclave can be entered (is initialized).
+ * %SGX_ATTR_DEBUG:Allow ENCLS(EDBGRD) and ENCLS(EDBGWR).
+ * %SGX_ATTR_MODE64BIT:Tell that this a 64-bit enclave.
+ * %SGX_ATTR_PROVISIONKEY:  Allow to use provisioning keys for remote
+ * attestation.
+ * %SGX_ATTR_KSS:  Allow to use key separation and sharing (KSS).
+ * %SGX_ATTR_EINITTOKENKEY:Allow to use token signing key that is used to
+ * sign cryptographic tokens that can be passed to
+ * EINIT as an authorization to run an enclave.
+ */
+enum sgx_attribute {
+   SGX_ATTR_INIT   = BIT(0),
+   SGX_ATTR_DEBUG  = BIT(1),
+   SGX_ATTR_MODE64BIT  = BIT(2),
+   SGX_ATTR_PROVISIONKEY   = BIT(4),
+   SGX_ATTR_EINITTOKENKEY  = BIT(5),
+   SGX_ATTR_KSS= BIT(7),
+};
+
+#define SGX_ATTR_RESERVED_MASK (BIT_ULL(3) | BIT_ULL(6) | GENMASK_ULL(63, 8))
+
+/**
+ * struct sgx_secs - SGX Enclave Control Structure (SECS)
+ * @size:  size of the address space
+ * @base:  base address of the  address space
+ * @ssa_frame_size:size of an SSA frame
+ * @miscselect:additional information stored to an SSA frame
+ * @attributes:attributes for enclave
+ * @xfrm:  XSave-Feature Request Mask (subset of XCR0)
+ * @mrenclave: SHA256-hash of the enclave contents
+ * @mrsigner:  SHA256-hash of the public key used to sign the SIGSTRUCT
+ * @config_id: a user-defined value that is used in key derivation
+ * @isv_prod_id:   a user-defined value that is used in key derivation
+ * @isv_svn:   a user-defined value that is used in key derivation
+ * @config_svn:a user-defined value that is used in key 
derivation
+ *
+ * SGX Enclave Control Structure (SECS) is a special enclave page that is not
+ * visible in the address space. In fact, this structure defines the address
+ * range and other global attributes for the enclave and it is the first EPC
+ * 

[PATCH v39 02/24] x86/cpufeatures: x86/msr: Add Intel SGX Launch Control hardware bits

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Add X86_FEATURE_SGX_LC, which informs whether or not the CPU supports SGX
Launch Control.

Add MSR_IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}, which when combined contain a
SHA256 hash of a 3072-bit RSA public key. SGX backed software packages, so
called enclaves, are always signed. All enclaves signed with the public key
are unconditionally allowed to initialize. [1]

Add FEAT_CTL_SGX_LC_ENABLED, which informs whether the aforementioned MSRs
are writable or not. If the bit is off, the public key MSRs are read-only
for the OS.

If the MSRs are read-only, the platform must provide a launch enclave (LE).
LE can create cryptographic tokens for other enclaves that they can pass
together with their signature to the ENCLS(EINIT) opcode, which is used
to initialize enclaves.

Linux is unlikely to support the locked configuration because it takes away
the control of the launch decisions from the kernel.

[1] Intel SDM: 38.1.4 Intel SGX Launch Control Configuration

Reviewed-by: Borislav Petkov 
Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Sean Christopherson 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 7150001d5232..62b58cda034a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -357,6 +357,7 @@
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
 #define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS 
instructions */
+#define X86_FEATURE_SGX_LC (16*32+30) /* Software Guard Extensions 
Launch Control */
 
 /* AMD-defined CPU features, CPUID level 0x8007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 258d555d22f2..d0c6cfff5b55 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -609,6 +609,7 @@
 #define FEAT_CTL_LOCKEDBIT(0)
 #define FEAT_CTL_VMX_ENABLED_INSIDE_SMXBIT(1)
 #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX   BIT(2)
+#define FEAT_CTL_SGX_LC_ENABLEDBIT(17)
 #define FEAT_CTL_SGX_ENABLED   BIT(18)
 #define FEAT_CTL_LMCE_ENABLED  BIT(20)
 
@@ -629,6 +630,12 @@
 #define MSR_IA32_UCODE_WRITE   0x0079
 #define MSR_IA32_UCODE_REV 0x008b
 
+/* Intel SGX Launch Enclave Public Key Hash MSRs */
+#define MSR_IA32_SGXLEPUBKEYHASH0  0x008C
+#define MSR_IA32_SGXLEPUBKEYHASH1  0x008D
+#define MSR_IA32_SGXLEPUBKEYHASH2  0x008E
+#define MSR_IA32_SGXLEPUBKEYHASH3  0x008F
+
 #define MSR_IA32_SMM_MONITOR_CTL   0x009b
 #define MSR_IA32_SMBASE0x009e
 
-- 
2.25.1



[PATCH v39 06/24] x86/cpu/intel: Detect SGX support

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Configure SGX as part of feature control MSR initialization and update
the associated X86_FEATURE flags accordingly.  Because the kernel will
require the LE hash MSRs to be writable when running native enclaves,
disable X86_FEATURE_SGX (and all derivatives) if SGX Launch Control is not
(or cannot be) fully enabled via the feature control MSR.

The check is done for every CPU, not just BSP, in order to verify that
MSR_IA32_FEATURE_CONTROL is correctly configured on all CPUs. The other
parts of the kernel, like the enclave driver, expect the same
configuration from all CPUs.

Note, unlike VMX, clear the X86_FEATURE_SGX* flags for all CPUs if any
CPU lacks SGX support as the kernel expects SGX to be available on all
CPUs.  X86_FEATURE_VMX is intentionally cleared only for the current CPU
so that KVM can provide additional information if KVM fails to load,
e.g. print which CPU doesn't support VMX.  KVM/VMX requires additional
per-CPU enabling, e.g. to set CR4.VMXE and do VMXON, and so already has
the necessary infrastructure to do per-CPU checks.  SGX on the other
hand doesn't require additional enabling, so clearing the feature flags
on all CPUs means the SGX subsystem doesn't need to manually do support
checks on a per-CPU basis.

Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Sean Christopherson 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/feat_ctl.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 29a3bedabd06..c3afcd2e4342 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -93,16 +93,35 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 }
 #endif /* CONFIG_X86_VMX_FEATURE_NAMES */
 
+static void clear_sgx_caps(void)
+{
+   setup_clear_cpu_cap(X86_FEATURE_SGX);
+   setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
+   setup_clear_cpu_cap(X86_FEATURE_SGX1);
+   setup_clear_cpu_cap(X86_FEATURE_SGX2);
+}
+
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
bool tboot = tboot_enabled();
+   bool enable_sgx;
u64 msr;
 
	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
clear_cpu_cap(c, X86_FEATURE_VMX);
+   clear_sgx_caps();
return;
}
 
+   /*
+* Enable SGX if and only if the kernel supports SGX and Launch Control
+* is supported, i.e. disable SGX if the LE hash MSRs can't be written.
+*/
+   enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
+cpu_has(c, X86_FEATURE_SGX1) &&
+cpu_has(c, X86_FEATURE_SGX_LC) &&
+IS_ENABLED(CONFIG_INTEL_SGX);
+
if (msr & FEAT_CTL_LOCKED)
goto update_caps;
 
@@ -124,13 +143,16 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
}
 
+   if (enable_sgx)
+   msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
+
wrmsrl(MSR_IA32_FEAT_CTL, msr);
 
 update_caps:
set_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
 
if (!cpu_has(c, X86_FEATURE_VMX))
-   return;
+   goto update_sgx;
 
if ( (tboot && !(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)) ||
(!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))) {
@@ -143,4 +165,12 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
init_vmx_capabilities(c);
 #endif
}
+
+update_sgx:
+   if (!(msr & FEAT_CTL_SGX_ENABLED) ||
+   !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
+   if (enable_sgx)
+   pr_err_once("SGX disabled by BIOS\n");
+   clear_sgx_caps();
+   }
 }
-- 
2.25.1



[PATCH v39 11/24] x86/sgx: Add SGX enclave driver

2020-10-02 Thread Jarkko Sakkinen
Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can
be used by applications to set aside private regions of code and data,
called enclaves. The CPU prevents code outside an enclave from accessing
the memory inside it.

Add a driver that provides an ioctl API to construct and run enclaves.
Enclaves are constructed from pages residing in reserved physical memory
areas. The contents of these pages can only be accessed when they are
mapped as part of an enclave, by a hardware thread running inside the
enclave.

The starting state of an enclave consists of a fixed measured set of
pages that are copied to the EPC during the construction process by
using the opcode ENCLS leaf functions and Software Enclave Control
Structure (SECS) that defines the enclave properties.

Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
the EPC and EINIT checks a given signed measurement and moves the enclave
into a state ready for execution.

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER.  This leaf
function converts a thread into enclave mode and continues the execution in
the offset defined by the TCS provided to EENTER. An enclave is exited
through a syscall, an exception, an interrupt, or by explicitly calling another
ENCLU leaf EEXIT.

The mmap() permissions are capped by the contained enclave page
permissions. The mapped areas must also be populated, i.e. each page
address must contain a page. This logic is implemented in
sgx_encl_may_map().
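
The capping rule amounts to a per-page subset check: a mapping request is
rejected if it asks for any permission bit the enclave page does not grant.
A minimal sketch of that rule follows; the names are illustrative, not
sgx_encl_may_map() itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative permission bits; the kernel uses VM_READ/VM_WRITE/VM_EXEC. */
#define PERM_READ	0x1
#define PERM_WRITE	0x2
#define PERM_EXEC	0x4

/* A mapping is allowed iff it requests no bit the page does not grant. */
static bool may_map(unsigned int requested, unsigned int page_perms)
{
	return (requested & ~page_perms) == 0;
}
```

In the kernel the same subset test is applied to every enclave page covered
by the requested range, so one under-privileged page rejects the whole mmap().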

Cc: linux-security-mod...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Acked-by: Jethro Beekman 
Tested-by: Jethro Beekman 
Tested-by: Haitao Huang 
Tested-by: Chunyang Hui 
Tested-by: Jordan Hand 
Tested-by: Nathaniel McCallum 
Tested-by: Seth Moore 
Tested-by: Darren Kenny 
Reviewed-by: Darren Kenny 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Co-developed-by: Suresh Siddha 
Signed-off-by: Suresh Siddha 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/driver.c | 173 
 arch/x86/kernel/cpu/sgx/driver.h |  29 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 331 +++
 arch/x86/kernel/cpu/sgx/encl.h   |  85 
 arch/x86/kernel/cpu/sgx/main.c   |  11 +
 6 files changed, 631 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 79510ce01b3b..3fc451120735 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,4 @@
 obj-y += \
+   driver.o \
+   encl.o \
main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index ..f54da5f19c2b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "driver.h"
+#include "encl.h"
+
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+   struct sgx_encl *encl;
+   int ret;
+
+   encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+   if (!encl)
+   return -ENOMEM;
+
+   atomic_set(&encl->flags, 0);
+   kref_init(&encl->refcount);
+   xa_init(&encl->page_array);
+   mutex_init(&encl->lock);
+   INIT_LIST_HEAD(&encl->mm_list);
+   spin_lock_init(&encl->mm_lock);
+
+   ret = init_srcu_struct(&encl->srcu);
+   if (ret) {
+   kfree(encl);
+   return ret;
+   }
+
+   file->private_data = encl;
+
+   return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+   struct sgx_encl *encl = file->private_data;
+   struct sgx_encl_mm *encl_mm;
+
+   for ( ; ; )  {
+   spin_lock(&encl->mm_lock);
+
+   if (list_empty(&encl->mm_list)) {
+   encl_mm = NULL;
+   } else {
+   encl_mm = list_first_entry(&encl->mm_list,
+  struct sgx_encl_mm, list);
+   list_del_rcu(&encl_mm->list);
+   }
+
+   spin_unlock(&encl->mm_lock);
+
+   /* The list is empty, ready to go. */
+   if (!encl_mm)
+   break;
+
+   synchronize_srcu(&encl->srcu);
+   

[PATCH v39 03/24] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Include SGX bit to the PF error codes and throw SIGSEGV with PF_SGX when
a #PF with SGX set happens.

CPU throws a #PF with the SGX set in the event of Enclave Page Cache Map
(EPCM) conflict. The EPCM is a CPU-internal table, which describes the
properties of an enclave page. Enclaves are measured and signed software
entities, which SGX hosts. [1]

Although the primary purpose of the EPCM conflict checks is to prevent
malicious accesses to an enclave, an illegitimate access can also happen
for legitimate reasons.

All SGX reserved memory, including the EPCM, is encrypted with a transient
key that does not survive a power transition. Throwing a SIGSEGV allows
user space software to react when this happens (e.g. recreate the enclave,
which was invalidated).

[1] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM)

Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Reviewed-by: Borislav Petkov 
Signed-off-by: Sean Christopherson 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/include/asm/trap_pf.h |  1 +
 arch/x86/mm/fault.c| 13 +
 2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 305bc1214aef..1794777b2a85 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -19,6 +19,7 @@ enum x86_pf_error_code {
X86_PF_RSVD =   1 << 3,
X86_PF_INSTR=   1 << 4,
X86_PF_PK   =   1 << 5,
+   X86_PF_SGX  =   1 << 15,
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6e3e8a124903..90ee91c244c6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1101,6 +1101,19 @@ access_error(unsigned long error_code, struct 
vm_area_struct *vma)
if (error_code & X86_PF_PK)
return 1;
 
+   /*
+* Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
+* access is allowed by the PTE but not the EPCM. This usually happens
+* when the EPCM is yanked out from under us, e.g. by hardware after a
+* suspend/resume cycle. In any case, software, i.e. the kernel, can't
+* fix the source of the fault as the EPCM can't be directly modified by
+* software. Handle the fault as an access error in order to signal
+* userspace so that userspace can rebuild their enclave(s), even though
+* userspace may not have actually violated access permissions.
+*/
+   if (unlikely(error_code & X86_PF_SGX))
+   return 1;
+
/*
 * Make sure to check the VMA so that we do not perform
 * faults just to hit a X86_PF_PK as soon as we fill in a
-- 
2.25.1



[PATCH v39 08/24] x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections

2020-10-02 Thread Jarkko Sakkinen
From: Sean Christopherson 

Enumerate Enclave Page Cache (EPC) sections via CPUID and add the data
structures necessary to track EPC pages so that they can be easily borrowed
for different uses.

Embed section index to the first eight bits of the EPC page descriptor.
Existing client hardware supports only a single section, while upcoming
server hardware will support at most eight sections. Thus, eight bits
should be enough for long term needs.
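
Embedding the section index in the low eight bits of a page descriptor can
be sketched as plain mask-and-or arithmetic: the page-aligned physical
address leaves the low bits free. The helper names and mask below are
illustrative, not the kernel's actual accessors:

```c
#include <assert.h>

/* Low 8 bits of the descriptor: room for up to 256 sections. */
#define EPC_SECTION_MASK	0xffUL

/* Pack a section index into the low bits of a page-aligned address. */
static unsigned long epc_make_desc(unsigned long pa, unsigned long section)
{
	return (pa & ~EPC_SECTION_MASK) | (section & EPC_SECTION_MASK);
}

/* Recover the section index from a descriptor. */
static unsigned long epc_desc_section(unsigned long desc)
{
	return desc & EPC_SECTION_MASK;
}
```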

Acked-by: Jethro Beekman 
Reviewed-by: Darren Kenny 
Signed-off-by: Sean Christopherson 
Co-developed-by: Serge Ayoun 
Signed-off-by: Serge Ayoun 
Co-developed-by: Jarkko Sakkinen 
Signed-off-by: Jarkko Sakkinen 
---
 arch/x86/Kconfig |  17 +++
 arch/x86/kernel/cpu/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/main.c   | 216 +++
 arch/x86/kernel/cpu/sgx/sgx.h|  52 
 5 files changed, 288 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
 create mode 100644 arch/x86/kernel/cpu/sgx/main.c
 create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5a604353ec42..9787253ea61e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1930,6 +1930,23 @@ config X86_INTEL_TSX_MODE_AUTO
  side channel attacks- equals the tsx=auto command line parameter.
 endchoice
 
+config INTEL_SGX
+   bool "Intel SGX"
+   depends on X86_64 && CPU_SUP_INTEL
+   depends on CRYPTO=y
+   depends on CRYPTO_SHA256=y
+   select SRCU
+   select MMU_NOTIFIER
+   help
+ Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
+ that can be used by applications to set aside private regions of code
+ and data, referred to as enclaves. An enclave's private memory can
+ only be accessed by code running within the enclave. Accesses from
+ outside the enclave, including other enclaves, are disallowed by
+ hardware.
+
+ If unsure, say N.
+
 config EFI
bool "EFI runtime service support"
depends on ACPI
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 93792b457b81..c80d804fd02b 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_X86_MCE) += mce/
 obj-$(CONFIG_MTRR) += mtrr/
 obj-$(CONFIG_MICROCODE)+= microcode/
 obj-$(CONFIG_X86_CPU_RESCTRL)  += resctrl/
+obj-$(CONFIG_INTEL_SGX)+= sgx/
 
 obj-$(CONFIG_X86_LOCAL_APIC)   += perfctr-watchdog.o
 
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
new file mode 100644
index ..79510ce01b3b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -0,0 +1,2 @@
+obj-y += \
+   main.o
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
new file mode 100644
index ..c5831e3db14a
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-17 Intel Corporation.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "encls.h"
+
+struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
+static int sgx_nr_epc_sections;
+static struct task_struct *ksgxswapd_tsk;
+
+static void sgx_sanitize_section(struct sgx_epc_section *section)
+{
+   struct sgx_epc_page *page;
+   LIST_HEAD(secs_list);
+   int ret;
+
+   while (!list_empty(&section->unsanitized_page_list)) {
+   if (kthread_should_stop())
+   return;
+
+   spin_lock(&section->lock);
+
+   page = list_first_entry(&section->unsanitized_page_list,
+   struct sgx_epc_page, list);
+
+   ret = __eremove(sgx_get_epc_addr(page));
+   if (!ret)
+   list_move(&page->list, &section->page_list);
+   else
+   list_move_tail(&page->list, &secs_list);
+
+   spin_unlock(&section->lock);
+
+   cond_resched();
+   }
+}
+
+static int ksgxswapd(void *p)
+{
+   int i;
+
+   set_freezable();
+
+   /*
+* Reset all pages to an uninitialized state. Pages could be left in
+* an initialized state after kexec().
+*/
+   for (i = 0; i < sgx_nr_epc_sections; i++)
+   sgx_sanitize_section(&sgx_epc_sections[i]);
+
+   /*
+* 2nd round for the SECS pages as they cannot be removed when they
+* still hold child pages.
+*/
+   for (i = 0; i < sgx_nr_epc_sections; i++) {
+   sgx_sanitize_section(&sgx_epc_sections[i]);
+
+   /* Should never happen. */
+   if (!list_empty(&sgx_epc_sections[i].unsanitized_page_list))
+   WARN(1, "EPC section %d has unsanitized pages.\n", i);
+   }
+
+   return 0;
+}
+
+static bool __init 

[git pull] Input updates for v5.9-rc7

2020-10-02 Thread Dmitry Torokhov
Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for the input subsystem. A couple more driver quirks,
now enabling newer trackpoints from Synaptics for real.

Changelog:
-

Jiri Kosina (1):
  Input: i8042 - add nopnp quirk for Acer Aspire 5 A515

Vincent Huang (1):
  Input: trackpoint - enable Synaptics trackpoints

Diffstat:


 drivers/input/mouse/trackpoint.c  | 2 ++
 drivers/input/serio/i8042-x86ia64io.h | 7 +++
 2 files changed, 9 insertions(+)

Thanks.


-- 
Dmitry


Re: [PATCHv3 1/1] ext4: Optimize file overwrites

2020-10-02 Thread Theodore Y. Ts'o
On Fri, Sep 18, 2020 at 10:36:35AM +0530, Ritesh Harjani wrote:
> In case if the file already has underlying blocks/extents allocated
> then we don't need to start a journal txn and can directly return
> the underlying mapping. Currently ext4_iomap_begin() is used by
> both DAX & DIO path. We can check if the write request is an
> overwrite & then directly return the mapping information.
> 
> This could give a significant perf boost for multi-threaded writes
> specially random overwrites.
> On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement
> could be seen in random writes (overwrite). Also bcoz this optimizes
> away the spinlock contention during jbd2 slab cache allocation
> (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed.
> 
> Reported-by: Dan Williams 
> Suggested-by: Jan Kara 
> Signed-off-by: Ritesh Harjani 

Thanks, applied.

- Ted


drivers/net/ethernet/huawei/hinic/hinic_main.c:796:25: sparse: sparse: cast to restricted __be16

2020-10-02 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   d3d45f8220d60a0b2cf8fb2be4e6ffd9008e
commit: 1f62cfa19a619f82c098468660b7950477101d45 hinic: add net_device_ops 
associated with vf
date:   5 months ago
config: x86_64-randconfig-s022-20201003 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-201-g24bdaac6-dirty
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f62cfa19a619f82c098468660b7950477101d45
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 1f62cfa19a619f82c098468660b7950477101d45
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

echo
echo "sparse warnings: (new ones prefixed by >>)"
echo
>> drivers/net/ethernet/huawei/hinic/hinic_main.c:796:25: sparse: sparse: cast 
>> to restricted __be16
>> drivers/net/ethernet/huawei/hinic/hinic_main.c:796:25: sparse: sparse: cast 
>> to restricted __be16
>> drivers/net/ethernet/huawei/hinic/hinic_main.c:796:25: sparse: sparse: cast 
>> to restricted __be16
>> drivers/net/ethernet/huawei/hinic/hinic_main.c:796:25: sparse: sparse: cast 
>> to restricted __be16

vim +796 drivers/net/ethernet/huawei/hinic/hinic_main.c

   778  
   779  static void hinic_tx_timeout(struct net_device *netdev, unsigned int 
txqueue)
   780  {
   781  struct hinic_dev *nic_dev = netdev_priv(netdev);
   782  u16 sw_pi, hw_ci, sw_ci;
   783  struct hinic_sq *sq;
   784  u16 num_sqs, q_id;
   785  
   786  num_sqs = hinic_hwdev_num_qps(nic_dev->hwdev);
   787  
   788  netif_err(nic_dev, drv, netdev, "Tx timeout\n");
   789  
   790  for (q_id = 0; q_id < num_sqs; q_id++) {
   791  if (!netif_xmit_stopped(netdev_get_tx_queue(netdev, 
q_id)))
   792  continue;
   793  
   794  sq = hinic_hwdev_get_sq(nic_dev->hwdev, q_id);
   795  sw_pi = atomic_read(&sq->wq->prod_idx) & sq->wq->mask;
 > 796  hw_ci = be16_to_cpu(*(u16 *)(sq->hw_ci_addr)) & 
 > sq->wq->mask;
   797  sw_ci = atomic_read(&sq->wq->cons_idx) & sq->wq->mask;
   798  netif_err(nic_dev, drv, netdev, "Txq%d: sw_pi: %d, 
hw_ci: %d, sw_ci: %d, napi->state: 0x%lx\n",
   799q_id, sw_pi, hw_ci, sw_ci,
   800nic_dev->txqs[q_id].napi.state);
   801  }
   802  }
   803  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH] [v2] ext4: Fix error handling code in add_new_gdb

2020-10-02 Thread Theodore Y. Ts'o
On Sat, Aug 29, 2020 at 10:54:02AM +0800, Dinghao Liu wrote:
> When ext4_journal_get_write_access() fails, we should
> terminate the execution flow and release n_group_desc,
> iloc.bh, dind and gdb_bh.
> 
> Signed-off-by: Dinghao Liu 

Thanks, applied.

- Ted


Re: [PATCH] compat: move struct compat_iovec out of #ifdef CONFIG_COMPAT

2020-10-02 Thread Al Viro
On Fri, Oct 02, 2020 at 08:15:12AM +0200, Christoph Hellwig wrote:
> ping?  This is needed to unbreak the work.iov_iter branch that is in
> for-next.

Folded into "iov_iter: refactor rw_copy_check_uvector and import_iovec"
and force-pushed...


Re: [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86

2020-10-02 Thread kernel test robot
Hi Ricardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master driver-core/driver-core-testing 
linus/master v5.9-rc7 next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
238c91115cd05c71447ea071624a4c9fe661f970
config: x86_64-randconfig-a012-20201002 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/ffe255e2342693ca1a8d96d052c903824595fde8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
git checkout ffe255e2342693ca1a8d96d052c903824595fde8
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
   if (cpu_has(c, X86_FEATURE_HYBRID_CPU))
  ^
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 
>> 'X86_FEATURE_HYBRID_CPU'
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
   20 errors generated.

vim +/X86_FEATURE_HYBRID_CPU +934 arch/x86/kernel/cpu/common.c

   896  
   897  void get_cpu_cap(struct cpuinfo_x86 *c)
   898  {
   899  u32 eax, ebx, ecx, edx;
   900  
   901  /* Intel-defined flags: level 0x0001 */
   902  if (c->cpuid_level >= 0x0001) {
   903  cpuid(0x0001, &eax, &ebx, &ecx, &edx);
   904  
   905  c->x86_capability[CPUID_1_ECX] = ecx;
   906  c->x86_capability[CPUID_1_EDX] = edx;
   907  }
   908  
   909  /* Thermal and Power Management Leaf: level 0x0006 (eax) */
   910  if (c->cpuid_level >= 0x0006)
   911  c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x0006);
   912  
   913  /* Additional Intel-defined flags: level 0x0007 */
   914  if (c->cpuid_level >= 0x0007) {
  

[PATCH v3 4/7] dma-buf: heaps: Skip sync if not mapped

2020-10-02 Thread John Stultz
This patch is basically a port of Ørjan Eide's similar patch for ION
 https://lore.kernel.org/lkml/20200414134629.54567-1-orjan.e...@arm.com/

Only sync the sg-list of dma-buf heap attachment when the attachment
is actually mapped on the device.

dma-bufs may be synced at any time. It can be reached from user space
via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when
syncs may be attempted, and dma_buf_end_cpu_access() and
dma_buf_begin_cpu_access() may not be paired.

Since the sg_list's dma_address isn't set up until the buffer is used
on the device, and dma_map_sg() is called on it, the dma_address will be
NULL if sync is attempted on the dma-buf before it's mapped on a device.

Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops
into the dma_direct code")) this was a problem as the dma-api (at least
the swiotlb_dma_ops on arm64) would use the potentially invalid
dma_address. How that failed depended on how the device handled physical
address 0. If 0 was a valid address to physical ram, that page would get
flushed a lot, while the actual pages in the buffer would not get synced
correctly. While if 0 is an invalid physical address it may cause a
fault and trigger a crash.

In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct:
merge swiotlb_dma_ops into the dma_direct code"), as this moved the
dma-api to use the page pointer in the sg_list, and (for Ion buffers at
least) this will always be valid if the sg_list exists at all.

But this issue was re-introduced in v5.3 by
commit 449fa54d6815 ("dma-direct: correct the physical addr in
dma_direct_sync_sg_for_cpu/device"), which moves the dma-api back to the
old behaviour and picks the dma_address that may be invalid.

The dma-buf core doesn't ensure that the buffer is mapped on the device,
and thus has a valid sg_list, before calling the exporter's
begin_cpu_access.

Logic and commit message originally by: Ørjan Eide 

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
 drivers/dma-buf/heaps/cma_heap.c| 10 ++
 drivers/dma-buf/heaps/system_heap.c | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 366963b94c72..fece9d7739ae 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -44,6 +44,7 @@ struct dma_heap_attachment {
struct device *dev;
struct sg_table table;
struct list_head list;
+   bool mapped;
 };
 
 static int cma_heap_attach(struct dma_buf *dmabuf,
@@ -68,6 +69,7 @@ static int cma_heap_attach(struct dma_buf *dmabuf,
 
a->dev = attachment->dev;
	INIT_LIST_HEAD(&a->list);
+   a->mapped = false;
 
attachment->priv = a;
 
@@ -102,6 +104,7 @@ static struct sg_table *cma_heap_map_dma_buf(struct 
dma_buf_attachment *attachme
ret = dma_map_sgtable(attachment->dev, table, direction, 0);
if (ret)
return ERR_PTR(-ENOMEM);
+   a->mapped = true;
return table;
 }
 
@@ -109,6 +112,9 @@ static void cma_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
   struct sg_table *table,
   enum dma_data_direction direction)
 {
+   struct dma_heap_attachment *a = attachment->priv;
+
+   a->mapped = false;
dma_unmap_sgtable(attachment->dev, table, direction, 0);
 }
 
@@ -123,6 +129,8 @@ static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf 
*dmabuf,
 
	mutex_lock(&buffer->lock);
	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
		dma_sync_sgtable_for_cpu(a->dev, &a->table, direction);
	}
	mutex_unlock(&buffer->lock);
@@ -141,6 +149,8 @@ static int cma_heap_dma_buf_end_cpu_access(struct dma_buf 
*dmabuf,
 
	mutex_lock(&buffer->lock);
	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
		dma_sync_sgtable_for_device(a->dev, &a->table, direction);
	}
	mutex_unlock(&buffer->lock);
diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index 00ed107b3b76..ef8d47e5a7ff 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -37,6 +37,7 @@ struct dma_heap_attachment {
struct device *dev;
struct sg_table *table;
struct list_head list;
+   bool mapped;
 };
 
 static struct sg_table *dup_sg_table(struct sg_table *table)
@@ -84,6 +85,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->table = table;
a->dev = attachment->dev;
  

[PATCH v3 6/7] dma-buf: dma-heap: Keep track of the heap device struct

2020-10-02 Thread John Stultz
Keep track of the heap device struct.

This will be useful for special DMA allocations
and actions.

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
 drivers/dma-buf/dma-heap.c | 33 +
 include/linux/dma-heap.h   |  9 +
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index afd22c9dbdcf..72c746755d89 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -30,6 +30,7 @@
  * @heap_devt  heap device node
  * @list   list head connecting to list of heaps
  * @heap_cdev  heap char device
+ * @heap_dev   heap device struct
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +41,7 @@ struct dma_heap {
dev_t heap_devt;
struct list_head list;
struct cdev heap_cdev;
+   struct device *heap_dev;
 };
 
 static LIST_HEAD(heap_list);
@@ -190,10 +192,21 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
return heap->priv;
 }
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap)
+{
+   return heap->heap_dev;
+}
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
struct dma_heap *heap, *h, *err_ret;
-   struct device *dev_ret;
unsigned int minor;
int ret;
 
@@ -247,16 +260,20 @@ struct dma_heap *dma_heap_add(const struct 
dma_heap_export_info *exp_info)
goto err1;
}
 
-   dev_ret = device_create(dma_heap_class,
-   NULL,
-   heap->heap_devt,
-   NULL,
-   heap->name);
-   if (IS_ERR(dev_ret)) {
+   heap->heap_dev = device_create(dma_heap_class,
+  NULL,
+  heap->heap_devt,
+  NULL,
+  heap->name);
+   if (IS_ERR(heap->heap_dev)) {
pr_err("dma_heap: Unable to create device\n");
-   err_ret = ERR_CAST(dev_ret);
+   err_ret = ERR_CAST(heap->heap_dev);
goto err2;
}
+
+   /* Make sure it doesn't disappear on us */
+   heap->heap_dev = get_device(heap->heap_dev);
+
/* Add heap to the list */
	mutex_lock(&heap_list_lock);
	list_add(&heap->list, &heap_list);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 454e354d1ffb..82857e096910 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -50,6 +50,15 @@ struct dma_heap_export_info {
  */
 void *dma_heap_get_drvdata(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:  information needed to register this heap
-- 
2.17.1



[PATCH v3 2/7] dma-buf: heaps: Move heap-helper logic into the cma_heap implementation

2020-10-02 Thread John Stultz
Since the heap-helpers logic ended up not being as generic as
hoped, move the heap-helpers dma_buf_ops implementations into
the cma_heap directly.

This will allow us to remove the heap_helpers code in a following
patch.

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v2:
* Fix unused return value and locking issue Reported-by:
kernel test robot 
Julia Lawall 
* Make cma_heap_buf_ops static suggested by
kernel test robot 
* Fix uninitialized return in cma Reported-by:
kernel test robot 
* Minor cleanups
v3:
* Use the new sgtable mapping functions, as Suggested-by:
 Daniel Mentz 
---
 drivers/dma-buf/heaps/cma_heap.c | 317 ++-
 1 file changed, 267 insertions(+), 50 deletions(-)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 626cf7fd033a..366963b94c72 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -2,76 +2,291 @@
 /*
  * DMABUF CMA heap exporter
  *
- * Copyright (C) 2012, 2019 Linaro Ltd.
+ * Copyright (C) 2012, 2019, 2020 Linaro Ltd.
  * Author:  for ST-Ericsson.
+ *
+ * Also utilizing parts of Andrew Davis' SRAM heap:
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+ * Andrew F. Davis 
  */
-
 #include 
-#include 
 #include 
-#include 
 #include 
+#include 
+#include 
 #include 
-#include 
 #include 
+#include 
+#include 
 #include 
-#include 
 #include 
-#include 
+#include 
 
-#include "heap-helpers.h"
 
 struct cma_heap {
struct dma_heap *heap;
struct cma *cma;
 };
 
-static void cma_heap_free(struct heap_helper_buffer *buffer)
+struct cma_heap_buffer {
+   struct cma_heap *heap;
+   struct list_head attachments;
+   struct mutex lock;
+   unsigned long len;
+   struct page *cma_pages;
+   struct page **pages;
+   pgoff_t pagecount;
+   int vmap_cnt;
+   void *vaddr;
+};
+
+struct dma_heap_attachment {
+   struct device *dev;
+   struct sg_table table;
+   struct list_head list;
+};
+
+static int cma_heap_attach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attachment)
 {
-   struct cma_heap *cma_heap = dma_heap_get_drvdata(buffer->heap);
-   unsigned long nr_pages = buffer->pagecount;
-   struct page *cma_pages = buffer->priv_virt;
+   struct cma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a;
+   int ret;
 
-   /* free page list */
-   kfree(buffer->pages);
-   /* release memory */
-   cma_release(cma_heap->cma, cma_pages, nr_pages);
+   a = kzalloc(sizeof(*a), GFP_KERNEL);
+   if (!a)
+   return -ENOMEM;
+
+   ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
+   buffer->pagecount, 0,
+   buffer->pagecount << PAGE_SHIFT,
+   GFP_KERNEL);
+   if (ret) {
+   kfree(a);
+   return ret;
+   }
+
+   a->dev = attachment->dev;
+   INIT_LIST_HEAD(&a->list);
+
+   attachment->priv = a;
+
+   mutex_lock(&buffer->lock);
+   list_add(&a->list, &buffer->attachments);
+   mutex_unlock(&buffer->lock);
+
+   return 0;
+}
+
+static void cma_heap_detatch(struct dma_buf *dmabuf,
+struct dma_buf_attachment *attachment)
+{
+   struct cma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a = attachment->priv;
+
+   mutex_lock(&buffer->lock);
+   list_del(&a->list);
+   mutex_unlock(&buffer->lock);
+
+   sg_free_table(&a->table);
+   kfree(a);
+}
+
+static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment 
*attachment,
+enum dma_data_direction direction)
+{
+   struct dma_heap_attachment *a = attachment->priv;
+   struct sg_table *table = &a->table;
+   int ret;
+
+   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (ret)
+   return ERR_PTR(-ENOMEM);
+   return table;
+}
+
+static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
+  struct sg_table *table,
+  enum dma_data_direction direction)
+{
+   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+}
+
+static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+enum dma_data_direction direction)
+{
+   struct cma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a;
+
+   if (buffer->vmap_cnt)
+   

[PATCH v3 3/7] dma-buf: heaps: Remove heap-helpers code

2020-10-02 Thread John Stultz
The heap-helpers code was not as generic as initially hoped
and it is now not being used, so remove it from the tree.

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
 drivers/dma-buf/heaps/Makefile   |   1 -
 drivers/dma-buf/heaps/heap-helpers.c | 271 ---
 drivers/dma-buf/heaps/heap-helpers.h |  53 --
 3 files changed, 325 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cdec3da0..974467791032 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,4 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y  += heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)  += system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
diff --git a/drivers/dma-buf/heaps/heap-helpers.c 
b/drivers/dma-buf/heaps/heap-helpers.c
deleted file mode 100644
index 9f964ca3f59c..
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ /dev/null
@@ -1,271 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "heap-helpers.h"
-
-void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
-void (*free)(struct heap_helper_buffer *))
-{
-   buffer->priv_virt = NULL;
-   mutex_init(&buffer->lock);
-   buffer->vmap_cnt = 0;
-   buffer->vaddr = NULL;
-   buffer->pagecount = 0;
-   buffer->pages = NULL;
-   INIT_LIST_HEAD(&buffer->attachments);
-   buffer->free = free;
-}
-
-struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
- int fd_flags)
-{
-   DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
-
-   exp_info.ops = &heap_helper_ops;
-   exp_info.size = buffer->size;
-   exp_info.flags = fd_flags;
-   exp_info.priv = buffer;
-
-   return dma_buf_export(&exp_info);
-}
-
-static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
-{
-   void *vaddr;
-
-   vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
-   if (!vaddr)
-   return ERR_PTR(-ENOMEM);
-
-   return vaddr;
-}
-
-static void dma_heap_buffer_destroy(struct heap_helper_buffer *buffer)
-{
-   if (buffer->vmap_cnt > 0) {
-   WARN(1, "%s: buffer still mapped in the kernel\n", __func__);
-   vunmap(buffer->vaddr);
-   }
-
-   buffer->free(buffer);
-}
-
-static void *dma_heap_buffer_vmap_get(struct heap_helper_buffer *buffer)
-{
-   void *vaddr;
-
-   if (buffer->vmap_cnt) {
-   buffer->vmap_cnt++;
-   return buffer->vaddr;
-   }
-   vaddr = dma_heap_map_kernel(buffer);
-   if (IS_ERR(vaddr))
-   return vaddr;
-   buffer->vaddr = vaddr;
-   buffer->vmap_cnt++;
-   return vaddr;
-}
-
-static void dma_heap_buffer_vmap_put(struct heap_helper_buffer *buffer)
-{
-   if (!--buffer->vmap_cnt) {
-   vunmap(buffer->vaddr);
-   buffer->vaddr = NULL;
-   }
-}
-
-struct dma_heaps_attachment {
-   struct device *dev;
-   struct sg_table table;
-   struct list_head list;
-};
-
-static int dma_heap_attach(struct dma_buf *dmabuf,
-  struct dma_buf_attachment *attachment)
-{
-   struct dma_heaps_attachment *a;
-   struct heap_helper_buffer *buffer = dmabuf->priv;
-   int ret;
-
-   a = kzalloc(sizeof(*a), GFP_KERNEL);
-   if (!a)
-   return -ENOMEM;
-
-   ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
-   buffer->pagecount, 0,
-   buffer->pagecount << PAGE_SHIFT,
-   GFP_KERNEL);
-   if (ret) {
-   kfree(a);
-   return ret;
-   }
-
-   a->dev = attachment->dev;
-   INIT_LIST_HEAD(&a->list);
-
-   attachment->priv = a;
-
-   mutex_lock(&buffer->lock);
-   list_add(&a->list, &buffer->attachments);
-   mutex_unlock(&buffer->lock);
-
-   return 0;
-}
-
-static void dma_heap_detach(struct dma_buf *dmabuf,
-   struct dma_buf_attachment *attachment)
-{
-   struct dma_heaps_attachment *a = attachment->priv;
-   struct heap_helper_buffer *buffer = dmabuf->priv;
-
-   mutex_lock(&buffer->lock);
-   list_del(&a->list);
-   mutex_unlock(&buffer->lock);
-
-   sg_free_table(&a->table);
-   kfree(a);
-}
-
-static
-struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment 

[PATCH v3 5/7] dma-buf: system_heap: Allocate higher order pages if available

2020-10-02 Thread John Stultz
While the system heap can return non-contiguous pages,
try to allocate larger order pages if possible.

This will allow slight performance gains and make implementing
page pooling easier.

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v3:
* Use page_size() rather then opencoding it
---
 drivers/dma-buf/heaps/system_heap.c | 83 ++---
 1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index ef8d47e5a7ff..2b8d4b6abacb 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -40,6 +40,14 @@ struct dma_heap_attachment {
bool mapped;
 };
 
+#define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
+   | __GFP_NORETRY) & ~__GFP_RECLAIM) \
+   | __GFP_COMP)
+#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
+static gfp_t order_flags[] = {HIGH_ORDER_GFP, LOW_ORDER_GFP, LOW_ORDER_GFP};
+static const unsigned int orders[] = {8, 4, 0};
+#define NUM_ORDERS ARRAY_SIZE(orders)
+
 static struct sg_table *dup_sg_table(struct sg_table *table)
 {
struct sg_table *new_table;
@@ -270,8 +278,11 @@ static void system_heap_dma_buf_release(struct dma_buf 
*dmabuf)
int i;
 
	table = &buffer->sg_table;
-   for_each_sgtable_sg(table, sg, i)
-   __free_page(sg_page(sg));
+   for_each_sg(table->sgl, sg, table->nents, i) {
+   struct page *page = sg_page(sg);
+
+   __free_pages(page, compound_order(page));
+   }
sg_free_table(table);
kfree(buffer);
 }
@@ -289,6 +300,26 @@ static const struct dma_buf_ops system_heap_buf_ops = {
.release = system_heap_dma_buf_release,
 };
 
+static struct page *alloc_largest_available(unsigned long size,
+   unsigned int max_order)
+{
+   struct page *page;
+   int i;
+
+   for (i = 0; i < NUM_ORDERS; i++) {
+   if (size <  (PAGE_SIZE << orders[i]))
+   continue;
+   if (max_order < orders[i])
+   continue;
+
+   page = alloc_pages(order_flags[i], orders[i]);
+   if (!page)
+   continue;
+   return page;
+   }
+   return NULL;
+}
+
 static int system_heap_allocate(struct dma_heap *heap,
unsigned long len,
unsigned long fd_flags,
@@ -296,11 +327,13 @@ static int system_heap_allocate(struct dma_heap *heap,
 {
struct system_heap_buffer *buffer;
DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+   unsigned long size_remaining = len;
+   unsigned int max_order = orders[0];
struct dma_buf *dmabuf;
struct sg_table *table;
struct scatterlist *sg;
-   pgoff_t pagecount;
-   pgoff_t pg;
+   struct list_head pages;
+   struct page *page, *tmp_page;
int i, ret = -ENOMEM;
 
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
@@ -312,25 +345,35 @@ static int system_heap_allocate(struct dma_heap *heap,
buffer->heap = heap;
buffer->len = len;
 
-   table = &buffer->sg_table;
-   pagecount = len / PAGE_SIZE;
-   if (sg_alloc_table(table, pagecount, GFP_KERNEL))
-   goto free_buffer;
-
-   sg = table->sgl;
-   for (pg = 0; pg < pagecount; pg++) {
-   struct page *page;
+   INIT_LIST_HEAD(&pages);
+   i = 0;
+   while (size_remaining > 0) {
/*
 * Avoid trying to allocate memory if the process
 * has been killed by SIGKILL
 */
if (fatal_signal_pending(current))
-   goto free_pages;
-   page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   goto free_buffer;
+
+   page = alloc_largest_available(size_remaining, max_order);
if (!page)
-   goto free_pages;
+   goto free_buffer;
+
+   list_add_tail(&page->lru, &pages);
+   size_remaining -= page_size(page);
+   max_order = compound_order(page);
+   i++;
+   }
+
+   table = &buffer->sg_table;
+   if (sg_alloc_table(table, i, GFP_KERNEL))
+   goto free_buffer;
+
+   sg = table->sgl;
+   list_for_each_entry_safe(page, tmp_page, &pages, lru) {
	sg_set_page(sg, page, page_size(page), 0);
	sg = sg_next(sg);
+   list_del(&page->lru);
}
 
/* create the dmabuf */
@@ -350,14 +393,18 @@ static int 

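The order-fallback loop in `alloc_largest_available()` above can be modeled in userspace. This is an illustrative sketch of the selection logic only — `pick_order` is an invented name, and the real function also performs the page allocation and falls through to the next order when `alloc_pages()` fails:

```c
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Same order table as the patch: try 1MB chunks, then 64KB, then 4KB. */
static const unsigned int orders[] = { 8, 4, 0 };
#define NUM_ORDERS (sizeof(orders) / sizeof(orders[0]))

/*
 * Pick the largest configured order that still fits in the remaining
 * size and does not exceed max_order (the order of the previously
 * allocated chunk, so allocations only get smaller over the loop).
 */
static int pick_order(unsigned long size, unsigned int max_order)
{
	size_t i;

	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < (PAGE_SIZE << orders[i]))
			continue;	/* chunk larger than what's left */
		if (max_order < orders[i])
			continue;	/* never grow past the previous order */
		return (int)orders[i];
	}
	return -1;	/* unreachable for size > 0: order 0 always fits */
}
```

Capping each attempt at the previous chunk's order is why the allocator degrades gracefully under fragmentation: once a high-order allocation fails, it never retries that order for the rest of the buffer.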
[PATCH v3 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists

2020-10-02 Thread John Stultz
In preparation for some patches to optimize the system
heap code, rework the dmabuf exporter to utilize sgtables rather
than page lists for tracking the associated pages.

This will allow for large order page allocations, as well as
more efficient page pooling.

In doing so, the system heap stops using the heap-helpers logic
which sadly is not quite as generic as I was hoping it to be, so
this patch adds heap specific implementations of the dma_buf_ops
function handlers.

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v2:
* Fix locking issue and an unused return value Reported-by:
 kernel test robot 
 Julia Lawall 
* Make system_heap_buf_ops static Reported-by:
 kernel test robot 
v3:
* Use the new sgtable mapping functions, as Suggested-by:
 Daniel Mentz 
---
 drivers/dma-buf/heaps/system_heap.c | 344 
 1 file changed, 298 insertions(+), 46 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index 0bf688e3c023..00ed107b3b76 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -3,7 +3,11 @@
  * DMABUF System heap exporter
  *
  * Copyright (C) 2011 Google, Inc.
- * Copyright (C) 2019 Linaro Ltd.
+ * Copyright (C) 2019, 2020 Linaro Ltd.
+ *
+ * Portions based off of Andrew Davis' SRAM heap:
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+ * Andrew F. Davis 
  */
 
 #include 
@@ -15,72 +19,321 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-
-#include "heap-helpers.h"
+#include 
 
 struct dma_heap *sys_heap;
 
-static void system_heap_free(struct heap_helper_buffer *buffer)
+struct system_heap_buffer {
+   struct dma_heap *heap;
+   struct list_head attachments;
+   struct mutex lock;
+   unsigned long len;
+   struct sg_table sg_table;
+   int vmap_cnt;
+   void *vaddr;
+};
+
+struct dma_heap_attachment {
+   struct device *dev;
+   struct sg_table *table;
+   struct list_head list;
+};
+
+static struct sg_table *dup_sg_table(struct sg_table *table)
 {
-   pgoff_t pg;
+   struct sg_table *new_table;
+   int ret, i;
+   struct scatterlist *sg, *new_sg;
+
+   new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
+   if (!new_table)
+   return ERR_PTR(-ENOMEM);
+
+   ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
+   if (ret) {
+   kfree(new_table);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   new_sg = new_table->sgl;
+   for_each_sgtable_sg(table, sg, i) {
+   sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
+   new_sg = sg_next(new_sg);
+   }
+
+   return new_table;
+}
+
+static int system_heap_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+   struct system_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a;
+   struct sg_table *table;
+
+   a = kzalloc(sizeof(*a), GFP_KERNEL);
+   if (!a)
+   return -ENOMEM;
+
+   table = dup_sg_table(&buffer->sg_table);
+   if (IS_ERR(table)) {
+   kfree(a);
+   return -ENOMEM;
+   }
+
+   a->table = table;
+   a->dev = attachment->dev;
+   INIT_LIST_HEAD(&a->list);
+
+   attachment->priv = a;
+
+   mutex_lock(&buffer->lock);
+   list_add(&a->list, &buffer->attachments);
+   mutex_unlock(&buffer->lock);
+
+   return 0;
+}
+
+static void system_heap_detatch(struct dma_buf *dmabuf,
+   struct dma_buf_attachment *attachment)
+{
+   struct system_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a = attachment->priv;
+
+   mutex_lock(&buffer->lock);
+   list_del(&a->list);
+   mutex_unlock(&buffer->lock);
+
+   sg_free_table(a->table);
+   kfree(a->table);
+   kfree(a);
+}
+
+static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment 
*attachment,
+   enum dma_data_direction 
direction)
+{
+   struct dma_heap_attachment *a = attachment->priv;
+   struct sg_table *table = a->table;
+   int ret;
+
+   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (ret)
+   return ERR_PTR(ret);
+
+   return table;
+}
+
+static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
+ struct sg_table *table,
+ enum dma_data_direction direction)
+{
+   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+}
+
+static int 

[PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-02 Thread John Stultz
This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we do not yet have such a flag, and by default
the current heaps (system and cma) produce CPU cacheable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

There has been a suggestion to make this functionality a flag
(DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
ION used the ION_FLAG_CACHED. But I want to make sure an
_UNCACHED flag would truly be a generic attribute across all
heaps. So far that has been unclear, so having it as a separate
heap seems better for now. (But I'm open to discussion on this
point!)

This is a rework of earlier efforts to add a uncached system heap,
done utilizing the existing system heap, adding just a bit of
logic to handle the uncached case.

Feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - 
https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
 drivers/dma-buf/heaps/system_heap.c | 87 ++---
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index 2b8d4b6abacb..952f1fd9dacf 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
 #include 
 
 struct dma_heap *sys_heap;
+struct dma_heap *sys_uncached_heap;
 
 struct system_heap_buffer {
struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
struct sg_table sg_table;
int vmap_cnt;
void *vaddr;
+
+   bool uncached;
 };
 
 struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
struct sg_table *table;
struct list_head list;
bool mapped;
+
+   bool uncached;
 };
 
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
@@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->dev = attachment->dev;
	INIT_LIST_HEAD(&a->list);
a->mapped = false;
-
+   a->uncached = buffer->uncached;
attachment->priv = a;
 
mutex_lock(>lock);
@@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct 
dma_buf_attachment *attac
 {
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = a->table;
+   int attr = 0;
int ret;
 
-   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+   ret = dma_map_sgtable(attachment->dev, table, direction, attr);
if (ret)
return ERR_PTR(ret);
 
@@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heap_attachment *a = attachment->priv;
+   int attr = 0;
 
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
a->mapped = false;
-   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+   dma_unmap_sgtable(attachment->dev, table, direction, attr);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -150,6 +162,9 @@ static int system_heap_dma_buf_begin_cpu_access(struct 
dma_buf *dmabuf,
struct system_heap_buffer *buffer = dmabuf->priv;
struct dma_heap_attachment *a;
 
+   if (buffer->uncached)
+   return 0;
+
	mutex_lock(&buffer->lock);
 
if (buffer->vmap_cnt)
@@ -171,6 +186,9 @@ static int system_heap_dma_buf_end_cpu_access(struct 
dma_buf *dmabuf,
struct system_heap_buffer *buffer = dmabuf->priv;
struct dma_heap_attachment *a;
 
+   if (buffer->uncached)
+   return 0;
+
	mutex_lock(&buffer->lock);
 
if 

[PATCH v3 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

2020-10-02 Thread John Stultz
Hey All,
  So this is another revision of my patch series of performance
optimizations to the dma-buf system heap.

Unfortunately, in working these up, I realized the heap-helpers
infrastructure we tried to add to minimize code duplication is
not as generic as we intended. For some heaps it makes sense to
deal with page lists, for other heaps it makes more sense to
track things with sgtables.

So this series reworks the system heap to use sgtables, and then
consolidates the pagelist method from the heap-helpers into the
CMA heap. After which the heap-helpers logic is removed (as it
is unused). I'd still like to find a better way to avoid some of
the logic duplication in implementing the entire dma_buf_ops
handlers per heap. But unfortunately that code is tied somewhat
to how the buffer's memory is tracked.

After this, the series introduces an optimization that
Ørjan Eide implemented for ION that avoids calling sync on
attachments that don't have a mapping.
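
In outline, that optimization amounts to tracking a per-attachment "mapped" flag and skipping unmapped attachments in the CPU-access sync loops. A minimal userspace model (illustrative only, not the driver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct attachment {
	bool mapped;  /* set in map_dma_buf, cleared in unmap_dma_buf */
	int syncs;    /* number of cache syncs performed (for the demo) */
};

/* Model of begin/end_cpu_access: only attachments that currently hold a
 * device mapping need their cache state synchronized; the rest are
 * skipped, which is the whole point of the optimization. */
static void sync_attachments(struct attachment *a, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!a[i].mapped)
			continue;
		a[i].syncs++;
	}
}
```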

Next, an optimization to use larger order pages for the system
heap. This change brings us closer to the current performance
of the ION code.
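
The larger-order strategy can be sketched as: try a fixed list of orders, largest first, and take the biggest block that still fits the remaining size (the patch uses the order list {8, 4, 0}; this standalone model only illustrates the selection logic):

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Candidate orders, largest first; order 0 (a single page) is the
 * final fallback. */
static const unsigned int orders[] = { 8, 4, 0 };

/* Pick the largest order whose block size still fits in `remaining`,
 * mirroring how the heap walks the order list on each allocation. */
static unsigned int pick_order(size_t remaining)
{
	for (size_t i = 0; i < sizeof(orders) / sizeof(orders[0]); i++)
		if (remaining >= (MODEL_PAGE_SIZE << orders[i]))
			return orders[i];
	return 0;
}
```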

Unfortunately, after submitting the last round, I realized that
part of the reason the page-pooling patch I had included was
providing such great performance numbers, was because the
network page-pool implementation doesn't zero pages that it
pulls from the cache. This is very inappropriate for buffers we
pass to userland and was what gave it an unfair advantage
(almost constant time performance) relative to ION's allocation
performance numbers. I added some patches to zero the buffers
manually similar to how ION does it, but I found this resulted
in basically no performance improvement from the standard page
allocator. Thus I've dropped that patch in this series for now.

Unfortunately this means we still have a performance delta from
the ION system heap as measured by my microbenchmark, and this
delta comes from ION system_heap's use of deferred freeing of
pages. So less work is done in the measured interval of the
microbenchmark. I'll be looking at adding similar code
eventually but I don't want to hold the rest of the patches up
on this, as it is still a good improvement over the current
code.

I've updated the chart I shared earlier with current numbers
(including with the unsubmitted net pagepool implementation, and
with a different unsubmitted pagepool implementation borrowed
from ION) here:
https://docs.google.com/spreadsheets/d/1-1C8ZQpmkl_0DISkI6z4xelE08MlNAN7oEu34AnO4Ao/edit?usp=sharing

I did add to this series a reworked version of my uncached
system heap implementation I was submitting a few weeks back.
Since it duplicated a lot of the now reworked system heap code,
I realized it would be much simpler to add the functionality to
the system_heap implementation itself.

While not improving the core allocation performance, the
uncached heap allocations do result in *much* improved
performance on HiKey960 as it avoids a lot of flushing and
invalidating buffers that the cpu doesn't touch often.

Feedback on these would be great!

thanks
-john


New in v3:
* Dropped page-pool patches as after correcting the code to
  zero buffers, they provided no net performance gain.
* Added system-uncached implementation ontop of reworked
  system-heap.
* Use the new sgtable mapping functions, in the system and cma
  code  as Suggested-by: Daniel Mentz 
* Cleanup: Use page_size() rather then open-coding it



Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org

John Stultz (7):
  dma-buf: system_heap: Rework system heap to use sgtables instead of
pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap
implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available
  dma-buf: dma-heap: Keep track of the heap device struct
  dma-buf: system_heap: Add a system-uncached heap re-using the system
heap

 drivers/dma-buf/dma-heap.c   |  33 +-
 drivers/dma-buf/heaps/Makefile   |   1 -
 drivers/dma-buf/heaps/cma_heap.c | 327 +++---
 drivers/dma-buf/heaps/heap-helpers.c | 271 ---
 drivers/dma-buf/heaps/heap-helpers.h |  53 ---
 drivers/dma-buf/heaps/system_heap.c  | 480 ---
 include/linux/dma-heap.h |   9 +
 7 files changed, 741 insertions(+), 433 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

-- 
2.17.1



Re: [PATCH v4 1/3] iommu/tegra-smmu: Use fwspec in tegra_smmu_(de)attach_dev

2020-10-02 Thread Dmitry Osipenko
On 03.10.2020 02:53, Nicolin Chen wrote:
> On Fri, Oct 02, 2020 at 11:12:18PM +0300, Dmitry Osipenko wrote:
>> On 02.10.2020 22:45, Nicolin Chen wrote:
>>> On Fri, Oct 02, 2020 at 05:41:50PM +0300, Dmitry Osipenko wrote:
 02.10.2020 09:08, Nicolin Chen wrote:
>  static int tegra_smmu_attach_dev(struct iommu_domain *domain,
>struct device *dev)
>  {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
>   struct tegra_smmu_as *as = to_smmu_as(domain);
> - struct device_node *np = dev->of_node;
> - struct of_phandle_args args;
>   unsigned int index = 0;
>   int err = 0;
>  
> - while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
> -  &args)) {
> - unsigned int swgroup = args.args[0];
> -
> - if (args.np != smmu->dev->of_node) {
> - of_node_put(args.np);
> - continue;
> - }
> -
> - of_node_put(args.np);
> + if (!fwspec)
> + return -ENOENT;

 Could the !fwspec ever be true here as well?
>>>
>>> There are multiple callers of this function. It's really not that
>>> straightforward to track every one of them. So I'd rather have it
>>> here as other iommu drivers do. We are human beings, so we could
>>> have missed something somewhere, especially callers are not from
>>> tegra-* drivers.
>>>
>>
>> I'm looking at the IOMMU core and it requires device to be in IOMMU
>> group before attach_dev() could be called.
>>
>> The group can't be assigned to device without the fwspec, see
>> tegra_smmu_device_group().
>>
>> Seems majority of IOMMU drivers are checking dev_iommu_priv_get() for
>> NULL in attach_dev(), some not checking anything, some check both and
>> only arm-smmu checks the fwspec.
> 
> As I said a couple of days ago, I don't like to assume that the
> callers won't change. And this time, it's from open code. So I
> don't want to assume that there won't be a change.
> 
> If you are confident that there is no need to add such a check,
> please send patches to remove those checks in those drivers to
> see if others would agree. I would be willing to remove it after
> that. Otherwise, I'd like to keep this.
> 
> Thanks for the review.
> 

I haven't tried to check every code path very thoroughly, expecting you
to do it since you're making this patch. Maybe there is a real reason
why majority of drivers do the checks and it would be good to know why.
Although, it's not critical in this particular case and indeed the
checks could be improved later on.

It looks to me that it would be at least a bit better/cleaner to check the
dev_iommu_priv_get() for NULL instead of fwspec because the private
variable depends on the fwspec presence and there is a similar check in
probe_device, hence checks will be more consistent.



Re: [PATCHv2 1/3] ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument

2020-10-02 Thread Theodore Y. Ts'o
On Sat, Aug 22, 2020 at 05:04:35PM +0530, Ritesh Harjani wrote:
> Refactor ext4_overwrite_io() to take struct ext4_map_blocks
> as its function argument with m_lblk and m_len filled
> from caller
> 
> There should be no functionality change in this patch.
> 
> Signed-off-by: Ritesh Harjani 
> ---
>  fs/ext4/file.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 2a01e31a032c..84f73ed91af2 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -188,26 +188,22 @@ ext4_extending_io(struct inode *inode, loff_t offset, 
> size_t len)
>  }
>  
>  /* Is IO overwriting allocated and initialized blocks? */
> -static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len)
> +static bool ext4_overwrite_io(struct inode *inode, struct ext4_map_blocks 
> *map)
>  {
> - struct ext4_map_blocks map;
>   unsigned int blkbits = inode->i_blkbits;
> - int err, blklen;
> + loff_t end = (map->m_lblk + map->m_len) << blkbits;

As Dan Carpenter has pointed out, we need to cast map->m_lblk to
loff_t, since m_lblk is 32 bits, and when this get shifted left by
blkbits, we could end up losing bits.
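
To make the hazard concrete, here is a small userspace sketch (hypothetical helper names; m_lblk is 32 bits, as in struct ext4_map_blocks):

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the shift is evaluated in 32 bits, so the byte offset silently
 * wraps for logical blocks at or above 2^(32 - blkbits). */
static uint32_t end_off_buggy(uint32_t m_lblk, uint32_t m_len, unsigned int blkbits)
{
	return (m_lblk + m_len) << blkbits;
}

/* Fixed: widen to 64 bits (loff_t in the kernel) before shifting, so no
 * high bits are lost. */
static int64_t end_off(uint32_t m_lblk, uint32_t m_len, unsigned int blkbits)
{
	return (int64_t)(m_lblk + m_len) << blkbits;
}
```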

> - if (pos + len > i_size_read(inode))
> + if (end > i_size_read(inode))
>   return false;

This transformation is not functionally identical.

The problem is that pos is not necessarily a multiple of the file
system blocksize. From below,

> + map.m_lblk = offset >> inode->i_blkbits;
> + map.m_len = EXT4_MAX_BLOCKS(count, offset, inode->i_blkbits);

So what previously was the starting offset of the overwrite, is now
offset shifted right by blkbits, and then shifted left back by blkbits.

So unless I'm missing something, this looks not quite right?

- Ted


[tip:timers/core] BUILD SUCCESS 1b80043ed21894eca888157145b955df02887995

2020-10-02 Thread kernel test robot
   defconfig
nios2    allyesconfig
alpha   defconfig
alpha    allyesconfig
xtensa   allyesconfig
h8300    allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390    defconfig
i386 allyesconfig
sparc    allyesconfig
sparc   defconfig
i386    defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201002
x86_64   randconfig-a001-20201002
x86_64   randconfig-a002-20201002
x86_64   randconfig-a005-20201002
x86_64   randconfig-a003-20201002
x86_64   randconfig-a006-20201002
i386 randconfig-a006-20201002
i386 randconfig-a005-20201002
i386 randconfig-a001-20201002
i386 randconfig-a004-20201002
i386 randconfig-a003-20201002
i386 randconfig-a002-20201002
i386 randconfig-a006-20201003
i386 randconfig-a005-20201003
i386 randconfig-a001-20201003
i386 randconfig-a004-20201003
i386 randconfig-a003-20201003
i386 randconfig-a002-20201003
i386 randconfig-a014-20201002
i386 randconfig-a013-20201002
i386 randconfig-a015-20201002
i386 randconfig-a016-20201002
i386 randconfig-a011-20201002
i386 randconfig-a012-20201002
riscv    nommu_k210_defconfig
riscv    allyesconfig
riscv    nommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscv    allmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64    rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a012-20201002
x86_64   randconfig-a015-20201002
x86_64   randconfig-a014-20201002
x86_64   randconfig-a013-20201002
x86_64   randconfig-a011-20201002
x86_64   randconfig-a016-20201002

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface

2020-10-02 Thread Randy Dunlap
On 10/2/20 6:17 PM, Ricardo Neri wrote:
> +u32 arch_get_cpu_type(int cpu)
> +{
> + struct cpuinfo_x86 *c = &cpu_data(cpu);
> +
> + if (cpu < 0 || cpu >= nr_cpu_ids)
> + return 0;
> +
> + return c->x86_cpu_type;
> +}

Hi,

Consider using
#include 
and array_index_nospec() to avoid speculation problems on cpu_data.

cheers.
-- 
~Randy
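
For illustration, the clamping trick behind array_index_nospec() can be modeled in plain C (a userspace sketch, not the kernel's header; assumes arithmetic right shift of signed longs and wrap-around unsigned-to-signed conversion, both true for gcc/clang):

```c
#include <assert.h>

/* Userspace model of the kernel's generic array_index_mask_nospec():
 * returns an all-ones mask when index < size and 0 otherwise, computed
 * without a conditional branch the CPU could mispredict. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * 8 - 1);
}

/* Clamp an attacker-influenced index: out-of-range values become 0, so
 * even a speculative load such as cpu_data[index] stays in bounds. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```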



Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface

2020-10-02 Thread Randy Dunlap
On 10/2/20 6:17 PM, Ricardo Neri wrote:
> +/**
> + * arch_get_cpu_type() - Get the CPU type number
> + * @cpu: Index of the CPU of which the index is needed
> + *
> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that

Are you sure that @cpu is non-zero?


> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> + * type have the same type number. Type numbers are defined by each CPU
> + * architecture.
> + */
> +u32 __weak arch_get_cpu_type(int cpu)
> +{
> + return 0;
> +}

arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.


-- 
~Randy



Re: [PATCH v2] scsi: ufs-exynos: use devm_platform_ioremap_resource_byname()

2020-10-02 Thread Martin K. Petersen
On Wed, 16 Sep 2020 10:40:17 +0200, Bean Huo wrote:

> Use devm_platform_ioremap_resource_byname() to simplify the code.

Applied to 5.10/scsi-queue, thanks!

[1/1] scsi: ufs: ufs-exynos: Use devm_platform_ioremap_resource_byname()
  https://git.kernel.org/mkp/scsi/c/2dd39fad92a1

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3] mailbox: mediatek: Fix handling of platform_get_irq() error

2020-10-02 Thread Chun-Kuang Hu
Krzysztof Kozlowski wrote on Fri, Aug 28, 2020 at 2:25 AM:
>
> platform_get_irq() returns -ERRNO on error.  In such case casting to u32
> and comparing to 0 would pass the check.
>
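
The pitfall quoted above is easy to demonstrate in userspace (a sketch; -22 stands in for -EINVAL):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* With an unsigned field, a negative errno such as -EINVAL (-22) becomes
 * a huge positive number, so a `!irq` test never fires on error. */
static int old_check_catches_error(u32 irq)
{
	return !irq;
}

/* With a signed field, `irq < 0` catches every -ERRNO return. */
static int new_check_catches_error(int irq)
{
	return irq < 0;
}
```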

Reviewed-by: Chun-Kuang Hu 

> Fixes: 623a6143a845 ("mailbox: mediatek: Add Mediatek CMDQ driver")
> Signed-off-by: Krzysztof Kozlowski 
>
> ---
>
> Changes since v2:
> 1. Fix subject.
>
> Changes since v1:
> 1. Correct u32->int,
> 2. Fix left-over '!'.
> ---
>  drivers/mailbox/mtk-cmdq-mailbox.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c 
> b/drivers/mailbox/mtk-cmdq-mailbox.c
> index 484d4438cd83..5665b6ea8119 100644
> --- a/drivers/mailbox/mtk-cmdq-mailbox.c
> +++ b/drivers/mailbox/mtk-cmdq-mailbox.c
> @@ -69,7 +69,7 @@ struct cmdq_task {
>  struct cmdq {
> struct mbox_controller  mbox;
> void __iomem*base;
> -   u32 irq;
> +   int irq;
> u32 thread_nr;
> u32 irq_mask;
> struct cmdq_thread  *thread;
> @@ -525,10 +525,8 @@ static int cmdq_probe(struct platform_device *pdev)
> }
>
> cmdq->irq = platform_get_irq(pdev, 0);
> -   if (!cmdq->irq) {
> -   dev_err(dev, "failed to get irq\n");
> -   return -EINVAL;
> -   }
> +   if (cmdq->irq < 0)
> +   return cmdq->irq;
>
> plat_data = (struct gce_plat *)of_device_get_match_data(dev);
> if (!plat_data) {
> --
> 2.17.1
>
>
> ___
> Linux-mediatek mailing list
> linux-media...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek


Re: [PATCH v38 12/24] x86/sgx: Add SGX_IOC_ENCLAVE_CREATE

2020-10-02 Thread Jarkko Sakkinen
On Fri, Oct 02, 2020 at 07:23:55PM -0500, Haitao Huang wrote:
> On Tue, 15 Sep 2020 06:28:30 -0500, Jarkko Sakkinen
>  wrote:
> 
> > Add an ioctl that performs ENCLS[ECREATE], which creates SGX Enclave
> > Control Structure for the enclave. SECS contains attributes about the
> > enclave that are used by the hardware and cannot be directly accessed by
> > software, as SECS resides in the EPC.
> > 
> > One essential field in SECS is a field that stores the SHA256 of the
> > measured enclave pages. This field, MRENCLAVE, is initialized by the
> > ECREATE instruction and updated by every EADD and EEXTEND operation.
> > Finally, EINIT locks down the value.
> > 
> > Acked-by: Jethro Beekman 
> > Tested-by: Jethro Beekman 
> > Tested-by: Haitao Huang 
> > Tested-by: Chunyang Hui 
> > Tested-by: Jordan Hand 
> > Tested-by: Nathaniel McCallum 
> > Tested-by: Seth Moore 
> > Tested-by: Darren Kenny 
> > Reviewed-by: Darren Kenny 
> > Co-developed-by: Sean Christopherson 
> > Signed-off-by: Sean Christopherson 
> > Co-developed-by: Suresh Siddha 
> > Signed-off-by: Suresh Siddha 
> > Signed-off-by: Jarkko Sakkinen 
> > ---
> >  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
> >  arch/x86/include/uapi/asm/sgx.h   |  25 ++
> >  arch/x86/kernel/cpu/sgx/Makefile  |   1 +
> >  arch/x86/kernel/cpu/sgx/driver.c  |  12 +
> >  arch/x86/kernel/cpu/sgx/driver.h  |   1 +
> >  arch/x86/kernel/cpu/sgx/ioctl.c   | 220 ++
> >  6 files changed, 260 insertions(+)
> >  create mode 100644 arch/x86/include/uapi/asm/sgx.h
> >  create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c
> > 
> > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > index 2a198838fca9..a89e1c46a25a 100644
> > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > @@ -323,6 +323,7 @@ Code  Seq#Include File
> > Comments
> >   
> > 
> >  0xA3  90-9F  linux/dtlk.h
> >  0xA4  00-1F  uapi/linux/tee.h
> > Generic TEE subsystem
> > +0xA4  00-1F  uapi/asm/sgx.h
> > 
> >  0xAA  00-3F  linux/uapi/linux/userfaultfd.h
> >  0xAB  00-1F  linux/nbd.h
> >  0xAC  00-1F  linux/raw.h
> > diff --git a/arch/x86/include/uapi/asm/sgx.h
> > b/arch/x86/include/uapi/asm/sgx.h
> > new file mode 100644
> > index ..c75b375f3770
> > --- /dev/null
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -0,0 +1,25 @@
> > +/* SPDX-License-Identifier: ((GPL-2.0+ WITH Linux-syscall-note) OR
> > BSD-3-Clause) */
> > +/*
> > + * Copyright(c) 2016-19 Intel Corporation.
> > + */
> > +#ifndef _UAPI_ASM_X86_SGX_H
> > +#define _UAPI_ASM_X86_SGX_H
> > +
> > +#include 
> > +#include 
> > +
> > +#define SGX_MAGIC 0xA4
> > +
> > +#define SGX_IOC_ENCLAVE_CREATE \
> > +   _IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
> > +
> > +/**
> > + * struct sgx_enclave_create - parameter structure for the
> > + * %SGX_IOC_ENCLAVE_CREATE ioctl
> > + * @src:   address for the SECS page data
> > + */
> > +struct sgx_enclave_create  {
> > +   __u64   src;
> > +};
> > +
> > +#endif /* _UAPI_ASM_X86_SGX_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/Makefile
> > b/arch/x86/kernel/cpu/sgx/Makefile
> > index 3fc451120735..91d3dc784a29 100644
> > --- a/arch/x86/kernel/cpu/sgx/Makefile
> > +++ b/arch/x86/kernel/cpu/sgx/Makefile
> > @@ -1,4 +1,5 @@
> >  obj-y += \
> > driver.o \
> > encl.o \
> > +   ioctl.o \
> > main.o
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> > b/arch/x86/kernel/cpu/sgx/driver.c
> > index f54da5f19c2b..7bdb49dfcca6 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver.c
> > @@ -114,10 +114,22 @@ static unsigned long sgx_get_unmapped_area(struct
> > file *file,
> > return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> >  }
> > +#ifdef CONFIG_COMPAT
> > +static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
> > + unsigned long arg)
> > +{
> > +   return sgx_ioctl(filep, cmd, arg);
> > +}
> > +#endif
> > +
> >  static const struct file_operations sgx_encl_fops = {
> > .owner  = THIS_MODULE,
> > .open   = sgx_open,
> > .release= sgx_release,
> > +   .unlocked_ioctl = sgx_ioctl,
> > +#ifdef CONFIG_COMPAT
> > +   .compat_ioctl   = sgx_compat_ioctl,
> > +#endif
> > .mmap   = sgx_mmap,
> > .get_unmapped_area  = sgx_get_unmapped_area,
> >  };
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.h
> > b/arch/x86/kernel/cpu/sgx/driver.h
> > index f7ce40dedc91..e4063923115b 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver.h
> > +++ b/arch/x86/kernel/cpu/sgx/driver.h
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include 
> 

Re: [PATCH 0/4] Mediatek DRM driver detect CMDQ execution timeout by vblank IRQ

2020-10-02 Thread Chun-Kuang Hu
Hi, Jassi:

Jassi Brar wrote on Sat, Oct 3, 2020 at 4:30 AM:
>
> On Sun, Sep 27, 2020 at 6:04 PM Chun-Kuang Hu  wrote:
> >
> > The CMDQ helper provides a timer to detect execution timeout, but the
> > DRM driver could have a better way to detect execution timeout: by vblank
> > IRQ. For DRM, a CMDQ command should execute in vblank, so if it fails to
> > execute in the next 2 vblanks, a timeout has happened. Even though we
> > could calculate the time between 2 vblanks and use a timer to detect
> > this, it would make things more complicated.
> >
> > This introduces a series of refinements for the CMDQ mailbox controller
> > and CMDQ helper. Remove the timer handler in the helper function because
> > different clients have different ways to detect timeout. Use the standard
> > mailbox callback instead of a proprietary one to get the necessary data
> > in the callback function. Remove struct cmdq_client and access client
> > instance data via struct mbox_client.
> >
> > Chun-Kuang Hu (4):
> >   soc / drm: mediatek: cmdq: Remove timeout handler in helper function
> >   mailbox / soc / drm: mediatek: Use mailbox rx_callback instead of
> > cmdq_task_cb
> >   mailbox / soc / drm: mediatek: Remove struct cmdq_client
> >   drm/mediatek: Detect CMDQ execution timeout
> >
> >  drivers/gpu/drm/mediatek/mtk_drm_crtc.c  |  54 ++---
> >  drivers/mailbox/mtk-cmdq-mailbox.c   |  24 ++--
> >  drivers/soc/mediatek/mtk-cmdq-helper.c   | 146 ++-
> >  include/linux/mailbox/mtk-cmdq-mailbox.h |  25 +---
> >  include/linux/soc/mediatek/mtk-cmdq.h|  54 +
> >  5 files changed, 66 insertions(+), 237 deletions(-)
> >
> Please break this into two patchsets - one for mailbox and one for its users.
> Also, CC original author and recent major contributors to mtk-cmdq-mailbox.c
>

Agree with you. But for patch [2/4] ("Use mailbox rx_callback instead
of cmdq_task_cb"), I think it would be a long term process.
I would break it into:

1. mtk-cmdq-mailbox.c: add rx_callback and keep  cmdq_task_cb because
client is using cmdq_task_cb.
2. client: change from cmdq_task_cb to rx_callback.
3. mtk-cmdq-mailbox.c: remove cmdq_task_cb.

The three steps have dependencies, but the 2nd should move to another
series, so I would do the 1st step first.

Regards,
Chun-Kuang.

> Thanks.
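
For reference, the two-vblank rule from the cover letter can be modeled as a small state machine (an illustrative sketch, not the driver code):

```c
#include <assert.h>
#include <stdbool.h>

struct cmdq_watch {
	bool cmd_pending;    /* a command was submitted but not yet executed */
	int vblanks_missed;  /* vblank IRQs seen while it stayed pending */
};

/* Called from the vblank IRQ. Commands are expected to execute within a
 * vblank, so a command still pending after 2 vblanks is a timeout. */
static bool vblank_check_timeout(struct cmdq_watch *w)
{
	if (!w->cmd_pending) {
		w->vblanks_missed = 0;
		return false;
	}
	return ++w->vblanks_missed >= 2;
}
```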


[rcu:rcu/next] BUILD SUCCESS d6dc6c311e779d1d01e9395f27bf6f4315db5502

2020-10-02 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git  rcu/next
branch HEAD: d6dc6c311e779d1d01e9395f27bf6f4315db5502  rcu/tree: Add a warning 
if CPU being onlined did not report QS already

elapsed time: 724m

configs tested: 116
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
arm   imx_v4_v5_defconfig
powerpc    klondike_defconfig
mips  malta_defconfig
arc    nsim_700_defconfig
arm  badge4_defconfig
arm lpc18xx_defconfig
powerpc  ep88xc_defconfig
powerpc  ppc64e_defconfig
arm   milbeaut_m10v_defconfig
arm  ep93xx_defconfig
arm palmz72_defconfig
m68k   m5475evb_defconfig
sh   secureedge5410_defconfig
arm   tegra_defconfig
sh   se7619_defconfig
mips   lemote2f_defconfig
mips decstation_r4k_defconfig
arm    zeus_defconfig
powerpc tqm8555_defconfig
sh   se7343_defconfig
powerpc    cell_defconfig
m68k    stmark2_defconfig
arm axm55xx_defconfig
powerpc tqm5200_defconfig
mips loongson1b_defconfig
sh  sdk7786_defconfig
arm am200epdkit_defconfig
arm    lart_defconfig
mips   jazz_defconfig
c6x dsk6455_defconfig
arm    mini2440_defconfig
c6x    evmc6474_defconfig
arm  imote2_defconfig
arm   omap2plus_defconfig
sh  rsk7264_defconfig
mips    bcm47xx_defconfig
powerpc canyonlands_defconfig
arm    multi_v7_defconfig
mips    ar7_defconfig
arm    dove_defconfig
powerpc stx_gp3_defconfig
powerpc mpc512x_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2    allyesconfig
csky    defconfig
alpha   defconfig
alpha    allyesconfig
xtensa   allyesconfig
h8300    allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390    defconfig
i386 allyesconfig
sparc    allyesconfig
sparc   defconfig
i386    defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201002
x86_64   randconfig-a001-20201002
x86_64   randconfig-a002-20201002
x86_64   randconfig-a005-20201002
x86_64   randconfig-a003-20201002
x86_64   randconfig-a006-20201002
i386 randconfig-a006-20201002
i386 randconfig-a005-20201002
i386 randconfig-a001-20201002
i386 randconfig-a004-20201002
i386 randconfig-a003-20201002
i386 randconfig-a002-20201002
i386 randconfig-a014-20201002
i386 randconfig-a013-20201002
i386 randconfig-a015-20201002
i386 randconfig-a016-20201002
i386 randconfig-a011-20201002
i386 randconfig-a012-20201002
riscv    nommu_k210_defconfig
riscv    allyesconfig
riscv

Re: [PATCH v2 2/6] fpga: m10bmc-sec: create max10 bmc security engine

2020-10-02 Thread Randy Dunlap
On 10/2/20 6:24 PM, Russ Weight wrote:
> diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> index c534cc80f398..2380d36b08c7 100644
> --- a/drivers/fpga/Kconfig
> +++ b/drivers/fpga/Kconfig
> @@ -235,4 +235,15 @@ config IFPGA_SEC_MGR
> region and for the BMC. Select this option to enable
> updates for secure FPGA devices.
>  
> +config IFPGA_M10_BMC_SECURE
> +tristate "Intel MAX10 BMC security engine"
> + depends on MFD_INTEL_M10_BMC && IFPGA_SEC_MGR
> +help
> +  Secure update support for the Intel MAX10 board management
> +   controller.
> +
> +   This is a subdriver of the Intel MAX10 board management controller
> +   (BMC) and provides support for secure updates for the BMC image,
> +   the FPGA image, the Root Entry Hashes, etc.
> +
>  endif # FPGA

Dagnabit, I need a bot to do this.

Clean up the indentation in the Kconfig file.

From Documentation/process/coding-style.rst, section 10:

Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.
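
Applied to the entry quoted above, the conventional layout would be (one tab before keywords, a tab plus two spaces for the help text):

```
config IFPGA_M10_BMC_SECURE
	tristate "Intel MAX10 BMC security engine"
	depends on MFD_INTEL_M10_BMC && IFPGA_SEC_MGR
	help
	  Secure update support for the Intel MAX10 board management
	  controller.

	  This is a subdriver of the Intel MAX10 board management controller
	  (BMC) and provides support for secure updates for the BMC image,
	  the FPGA image, the Root Entry Hashes, etc.
```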

checkpatch should have found that issue. Did it not?


thanks.
-- 
~Randy



Re: [PATCH 0/7] hisi_sas: Add runtime PM support for v3 hw

2020-10-02 Thread Martin K. Petersen


John,

> This series adds runtime PM support for v3 hw. Consists of:
> - Switch to new PM suspend and resume framework
> - Add links to devices to ensure host cannot be suspended while devices
>   are not
> - Filter out phy events during suspend to avoid deadlock
> - Add controller RPM support
> - And some more minor misc related changes

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


include/linux/sched/topology.h:237:9: error: implicit declaration of function 'cpu_logical_map'

2020-10-02 Thread kernel test robot
Hi Rafael,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   d3d45f8220d60a0b2cf8fb2be4e6ffd9008e
commit: 5b5642075c317e67ea342a2fb8023a8e754a5002 Merge branches 'pm-em' and 
'pm-core'
date:   9 weeks ago
config: mips-randconfig-r024-20201003 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b5642075c317e67ea342a2fb8023a8e754a5002
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 5b5642075c317e67ea342a2fb8023a8e754a5002
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   clang-12: warning: argument unused during compilation: '-mno-branch-likely' 
[-Wunused-command-line-argument]
   clang-12: warning: argument unused during compilation: '-mno-branch-likely' 
[-Wunused-command-line-argument]
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
   include/linux/topology.h:119:9: error: implicit declaration of function 
'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpu_to_node(raw_smp_processor_id());
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
    #define cpu_to_node(cpu) (cpu_logical_map(cpu) >> 2)
^
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
   include/linux/topology.h:176:9: error: implicit declaration of function 
'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpu_to_node(cpu);
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
    #define cpu_to_node(cpu) (cpu_logical_map(cpu) >> 2)
^
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
   include/linux/topology.h:210:25: error: implicit declaration of function 
'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpumask_of_node(cpu_to_node(cpu));
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
    #define cpu_to_node(cpu) (cpu_logical_map(cpu) >> 2)
^
   In file included from arch/mips/kernel/asm-offsets.c:17:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:11:
   In file included from include/linux/genhd.h:36:
   In file included from include/linux/device.h:16:
   In file included from include/linux/energy_model.h:10:
>> include/linux/sched/topology.h:237:9: error: implicit declaration of 
>> function 'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpu_to_node(task_cpu(p));
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
    #define cpu_to_node(cpu) (cpu_logical_map(cpu) >> 2)
^
   

Re: virtiofs: WARN_ON(out_sgs + in_sgs != total_sgs)

2020-10-02 Thread Qian Cai
On Fri, 2020-10-02 at 12:28 -0400, Qian Cai wrote:
> Running some fuzzing on virtiofs from a non-privileged user could trigger a
> warning in virtio_fs_enqueue_req():
> 
> WARN_ON(out_sgs + in_sgs != total_sgs);

Okay, I can reproduce this after running for a few hours:

out_sgs = 3, in_sgs = 2, total_sgs = 6

and this time from flush_bg_queue() instead of fuse_simple_request().

From the log, the last piece of code is:

ftruncate(fd=186, length=4)

which is a test file on virtiofs:

[main]  testfile fd:186 filename:trinity-testfile3 flags:2 fopened:1 
fcntl_flags:2000 global:1
[main]   start: 0x7f47c1199000 size:4KB  name: trinity-testfile3 global:1


[ 9863.468502] WARNING: CPU: 16 PID: 286083 at fs/fuse/virtio_fs.c:1152 
virtio_fs_enqueue_req+0xd36/0xde0 [virtiofs]
[ 9863.474442] Modules linked in: dlci 8021q garp mrp bridge stp llc 
ieee802154_socket ieee802154 vsock_loopback vmw_vsock_virtio_transport_common 
vmw_vsock_vmci_transport vsock mpls_router vmw_vmci ip_tunnel as
[ 9863.474555]  ata_piix fuse serio_raw libata e1000 sunrpc dm_mirror 
dm_region_hash dm_log dm_mod
[ 9863.535805] CPU: 16 PID: 286083 Comm: trinity-c5 Kdump: loaded Not tainted 
5.9.0-rc7-next-20201002+ #2
[ 9863.544368] Hardware name: Red Hat KVM, BIOS 
1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
[ 9863.550129] RIP: 0010:virtio_fs_enqueue_req+0xd36/0xde0 [virtiofs]
[ 9863.552998] Code: 60 09 23 d9 e9 44 fa ff ff e8 56 09 23 d9 e9 70 fa ff ff 
48 89 cf 48 89 4c 24 08 e8 44 09 23 d9 48 8b 4c 24 08 e9 7c fa ff ff <0f> 0b 48 
c7 c7 c0 85 60 c0 44 89 e1 44 89 fa 44 89 ee e8 e3 b7
[ 9863.561720] RSP: 0018:888a696ef6f8 EFLAGS: 00010202
[ 9863.565420] RAX:  RBX: 88892e030008 RCX: 
[ 9863.568735] RDX: 0005 RSI:  RDI: 888a696ef8ac
[ 9863.572037] RBP: 888a49d03d30 R08: ed114d2ddf18 R09: 888a696ef8a0
[ 9863.575383] R10: 888a696ef8bf R11: ed114d2ddf17 R12: 0006
[ 9863.578668] R13: 0003 R14: 0002 R15: 0002
[ 9863.581971] FS:  7f47c12f5740() GS:888a7f80() 
knlGS:
[ 9863.585752] CS:  0010 DS:  ES:  CR0: 80050033
[ 9863.590232] CR2:  CR3: 000a63570005 CR4: 00770ee0
[ 9863.594698] DR0: 7f6642e43000 DR1:  DR2: 
[ 9863.598521] DR3:  DR6: 0ff0 DR7: 0600
[ 9863.601861] PKRU: 5540
[ 9863.603173] Call Trace:
[ 9863.604382]  ? virtio_fs_probe+0x13e0/0x13e0 [virtiofs]
[ 9863.606838]  ? is_bpf_text_address+0x21/0x30
[ 9863.608869]  ? kernel_text_address+0x125/0x140
[ 9863.610962]  ? __kernel_text_address+0xe/0x30
[ 9863.613117]  ? unwind_get_return_address+0x5f/0xa0
[ 9863.615427]  ? create_prof_cpu_mask+0x20/0x20
[ 9863.617435]  ? _raw_write_lock_irqsave+0xe0/0xe0
[ 9863.619627]  virtio_fs_wake_pending_and_unlock+0x1ea/0x610 [virtiofs]
[ 9863.622638]  ? queue_request_and_unlock+0x115/0x280 [fuse]
[ 9863.625224]  flush_bg_queue+0x24c/0x3e0 [fuse]
[ 9863.627325]  fuse_simple_background+0x3d7/0x6c0 [fuse]
[ 9863.629735]  fuse_send_writepage+0x173/0x420 [fuse]
[ 9863.632031]  fuse_flush_writepages+0x1fe/0x330 [fuse]
[ 9863.634463]  ? make_kgid+0x13/0x20
[ 9863.636064]  ? fuse_change_attributes_common+0x2de/0x940 [fuse]
[ 9863.638850]  fuse_do_setattr+0xe84/0x13c0 [fuse]
[ 9863.641024]  ? migrate_swap_stop+0x8d1/0x920
[ 9863.643041]  ? fuse_flush_times+0x390/0x390 [fuse]
[ 9863.645347]  ? avc_has_perm_noaudit+0x390/0x390
[ 9863.647465]  fuse_setattr+0x197/0x400 [fuse]
[ 9863.649466]  notify_change+0x744/0xda0
[ 9863.651247]  ? __down_timeout+0x2a0/0x2a0
[ 9863.653125]  ? do_truncate+0xe2/0x180
[ 9863.654854]  do_truncate+0xe2/0x180
[ 9863.656509]  ? __x64_sys_openat2+0x1c0/0x1c0
[ 9863.658512]  ? alarm_setitimer+0xa0/0x110
[ 9863.660418]  do_sys_ftruncate+0x1ee/0x2c0
[ 9863.662311]  do_syscall_64+0x33/0x40
[ 9863.663980]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9863.666384] RIP: 0033:0x7f47c0c0878d
[ 9863.668061] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 08
[ 9863.676717] RSP: 002b:7fff515c2598 EFLAGS: 0246 ORIG_RAX: 
004d
[ 9863.680226] RAX: ffda RBX: 004d RCX: 7f47c0c0878d
[ 9863.688055] RDX: 0080 RSI: 0004 RDI: 00ba
[ 9863.693672] RBP: 004d R08: 003a R09: 0001
[ 9863.699423] R10: 0005 R11: 0246 R12: 0002
[ 9863.708897] R13: 7f47c12cb058 R14: 7f47c12f56c0 R15: 7f47c12cb000
[ 9863.713106] CPU: 16 PID: 286083 Comm: trinity-c5 Kdump: loaded Not tainted 
5.9.0-rc7-next-20201002+ #2
[ 9863.717465] Hardware name: Red Hat KVM, BIOS 
1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
[ 9863.721389] Call Trace:
[ 9863.72

Re: [PATCH][next] bpf: verifier: Use fallthrough pseudo-keyword

2020-10-02 Thread Yonghong Song




On 10/2/20 4:42 PM, Gustavo A. R. Silva wrote:

Replace /* fallthrough */ comments with the new pseudo-keyword macro
fallthrough[1].

[1] 
https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva 


Acked-by: Yonghong Song 


Re: Litmus test for question from Al Viro

2020-10-02 Thread Jon Masters

On 10/1/20 12:15 PM, Alan Stern wrote:

On Wed, Sep 30, 2020 at 09:51:16PM -0700, Paul E. McKenney wrote:

Hello!

Al Viro posted the following query:



 fun question regarding barriers, if you have time for that
 V->A = V->B = 1;

 CPU1:
 to_free = NULL
 spin_lock()
 if (!smp_load_acquire(&V->B))
 to_free = V
 V->A = 0
 spin_unlock()
 kfree(to_free)

 CPU2:
 to_free = V;
 if (READ_ONCE(V->A)) {
 spin_lock()
 if (V->A)
 to_free = NULL
 smp_store_release(&V->B, 0);
 spin_unlock()
 }
 kfree(to_free);
 1) is it guaranteed that V will be freed exactly once and that
  no accesses to *V will happen after freeing it?
 2) do we need smp_store_release() there?  I.e. will anything
  break if it's replaced with plain V->B = 0?


Here are my answers to Al's questions:

1) It is guaranteed that V will be freed exactly once.  It is not
guaranteed that no accesses to *V will occur after it is freed, because
the test contains a data race.  CPU1's plain "V->A = 0" write races with
CPU2's READ_ONCE; if the plain write were replaced with
"WRITE_ONCE(V->A, 0)" then the guarantee would hold.  Equally well,
CPU1's smp_load_acquire could be replaced with a plain read while the
plain write is replaced with smp_store_release.

2) The smp_store_release in CPU2 is not needed.  Replacing it with a
plain V->B = 0 will not break anything.
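Alan's answers can also be checked mechanically. Below is a compilable C11 model of the scenario (a sketch, not kernel code: C11 atomics stand in for READ_ONCE/WRITE_ONCE and smp_load_acquire/smp_store_release, and a pthread mutex stands in for the spinlock):

```c
/* Model of Al Viro's scenario: assert that V is freed exactly once. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj { atomic_int a, b; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *V;
static atomic_int frees;

static void free_once(struct obj *p)
{
	if (p) {
		atomic_fetch_add(&frees, 1);
		free(p);
	}
}

static void *cpu1(void *unused)
{
	struct obj *to_free = NULL;

	pthread_mutex_lock(&lock);
	if (!atomic_load_explicit(&V->b, memory_order_acquire))
		to_free = V;
	atomic_store_explicit(&V->a, 0, memory_order_relaxed); /* WRITE_ONCE */
	pthread_mutex_unlock(&lock);
	free_once(to_free);
	return NULL;
}

static void *cpu2(void *unused)
{
	struct obj *to_free = V;

	if (atomic_load_explicit(&V->a, memory_order_relaxed)) { /* READ_ONCE */
		pthread_mutex_lock(&lock);
		if (atomic_load_explicit(&V->a, memory_order_relaxed))
			to_free = NULL;
		atomic_store_explicit(&V->b, 0, memory_order_release);
		pthread_mutex_unlock(&lock);
	}
	free_once(to_free);
	return NULL;
}

/* Returns 0 when every trial freed V exactly once. */
static int run_trials(int n)
{
	for (int i = 0; i < n; i++) {
		pthread_t t1, t2;

		V = malloc(sizeof(*V));
		atomic_init(&V->a, 1);
		atomic_init(&V->b, 1);
		atomic_store(&frees, 0);
		pthread_create(&t1, NULL, cpu1, NULL);
		pthread_create(&t2, NULL, cpu2, NULL);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		if (atomic_load(&frees) != 1)
			return 1;
	}
	return 0;
}
```

Repeated trials can only fail to find a race, of course; herd7 with the LKMM remains the authoritative check.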


This was my interpretation also. I made the mistake of reading this 
right before trying to go to bed the other night and ended up tweeting 
at Paul that I'd regret it if he gave me scary dreams. Thought about it 
and read your write up and it is still exactly how I see it.


Jon.

--
Computer Architect


Re: WARNING in ieee80211_bss_info_change_notify

2020-10-02 Thread syzbot
syzbot has bisected this issue to:

commit 135f971181d779c96ff3725c1a350a721785cc66
Author: Alex Deucher 
Date:   Mon Nov 20 22:49:53 2017 +

drm/amdgpu: don't skip attributes when powerplay is enabled

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=120f55bd90
start commit:   fcadab74 Merge tag 'drm-fixes-2020-10-01-1' of git://anong..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=110f55bd90
console output: https://syzkaller.appspot.com/x/log.txt?x=160f55bd90
kernel config:  https://syzkaller.appspot.com/x/.config?x=4e672827d2ffab1f
dashboard link: https://syzkaller.appspot.com/bug?extid=09d1cd2f71e6dd3bfd2c
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=161112eb90
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=124fc53390

Reported-by: syzbot+09d1cd2f71e6dd3bf...@syzkaller.appspotmail.com
Fixes: 135f971181d7 ("drm/amdgpu: don't skip attributes when powerplay is 
enabled")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Re: [PATCH 0/4] scsi: arcmsr: Fix timer stop and support new adapter ARC-1886 series

2020-10-02 Thread Martin K. Petersen


ching,

> This patch series are against to mkp's 5.10/scsi-queue.
>
> 1. Remove unnecessary syntax.
> 2. Fix device hot-plug monitoring timer stop.
> 3. Add supporting ARC-1886 series Raid controllers.
> 4. Update driver version to v1.50.00.02-20200819.

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


include/linux/topology.h:119:9: error: implicit declaration of function 'cpu_logical_map'

2020-10-02 Thread kernel test robot
Hi Fangrui,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   d4fce2e20ffed59eb5db7780fcbbb0a21decef74
commit: ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51 Makefile: Fix 
GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
date:   2 months ago
config: mips-randconfig-r024-20201003 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   clang-12: warning: argument unused during compilation: '-mno-branch-likely' 
[-Wunused-command-line-argument]
   clang-12: warning: argument unused during compilation: '-mno-branch-likely' 
[-Wunused-command-line-argument]
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
>> include/linux/topology.h:119:9: error: implicit declaration of function 
>> 'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpu_to_node(raw_smp_processor_id());
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
   #define cpu_to_node(cpu)(cpu_logical_map(cpu) >> 2)
^
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
   include/linux/topology.h:176:9: error: implicit declaration of function 
'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpu_to_node(cpu);
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
   #define cpu_to_node(cpu)(cpu_logical_map(cpu) >> 2)
^
   In file included from arch/mips/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:10:
   In file included from include/crypto/hash.h:11:
   In file included from include/linux/crypto.h:19:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:9:
   include/linux/topology.h:210:25: error: implicit declaration of function 
'cpu_logical_map' [-Werror,-Wimplicit-function-declaration]
   return cpumask_of_node(cpu_to_node(cpu));
  ^
   arch/mips/include/asm/mach-loongson64/topology.h:7:27: note: expanded from 
macro 'cpu_to_node'
   #define cpu_to_node(cpu)(cpu_logical_map(cpu) >> 2)
^
   arch/mips/kernel/asm-offsets.c:26:6: warning: no previous prototype for 
function 'output_ptreg_defines' [-Wmissing-prototypes]
   void output_ptreg_defines(void)
^
   arch/mips/kernel/asm-offsets.c:26:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
   void output_ptreg_defines(void)
   ^
   static 
   arch/mips/kernel/asm-offsets.c:78:6: warning: no previous prototype for 
function 'output_task_defines' [-Wmissing-prototypes]
   void output_task_defines(void)
^
   arch/mips/kernel/asm-offsets.c:78:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
   void output_task_defines(void)
   ^
   static 
   arch/mips/kernel/asm-offsets.c:93:6: warning: no previous prototype for 
function 'output_thread_info_defines' [-Wmissing-prototypes]
   void output_thread_info_defines(void)
^
   

Re: [PATCH 0/3] x86: Add initial support to discover Intel hybrid CPUs

2020-10-02 Thread Luck, Tony
On Sat, Oct 03, 2020 at 03:39:29AM +0200, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 13:19, Ricardo Neri wrote:
> > Add support to discover and enumerate CPUs in Intel hybrid parts. A hybrid
> > part has CPUs with more than one type of micro-architecture. Thus, certain
> > features may only be present in a specific CPU type.
> >
> > It is useful to know the type of CPUs present in a system. For instance,
> > perf may need to handle CPUs differently depending on the type of micro-
> > architecture. Decoding machine check error logs may need the additional
> > micro-architecture type information, so include that in the log.
> 
> 'It is useful' as justification just makes me barf.

This isn't "hetero" ... all of the cores are architecturally the same.
If CPUID says that some feature is supported, then it will be supported
on all of the cores.

There might be some model specific performance counter events that only
apply to some cores. Or a machine check error code that is logged in the
model specific MSCOD field of IA32_MCi_STATUS. But any and all code can run
on any core.

Sure there will be some different power/performance tradeoffs on some
cores. But we already have that with some cores able to achieve higher
turbo frequencies than others.

-Tony


Re: [PATCH] drm/msm/dp: fixes wrong connection state caused by failure of link train

2020-10-02 Thread Stephen Boyd
Quoting Kuogee Hsieh (2020-10-02 15:09:19)
> Connection state is set incorrectly happen at either failure of link train
> or cable plugged in while suspended. This patch fixes these problems.
> This patch also replace ST_SUSPEND_PENDING with ST_DISPLAY_OFF.
> 
> Signed-off-by: Kuogee Hsieh 

Any Fixes: tag?

> ---
>  drivers/gpu/drm/msm/dp/dp_display.c | 52 ++---
>  drivers/gpu/drm/msm/dp/dp_panel.c   |  5 +++
>  2 files changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
> b/drivers/gpu/drm/msm/dp/dp_display.c
> index 431dff9de797..898c6cc1643a 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -340,8 +340,6 @@ static int dp_display_process_hpd_high(struct 
> dp_display_private *dp)
> }
>  
> dp_add_event(dp, EV_USER_NOTIFICATION, true, 0);
> -
> -
>  end:
> return rc;
>  }

Not sure we need this hunk

> @@ -1186,19 +1180,19 @@ static int dp_pm_resume(struct device *dev)
>  
> dp = container_of(dp_display, struct dp_display_private, dp_display);
>  
> +   /* start from dis connection state */

disconnection? Or disconnected state?

> +   atomic_set(&dp->hpd_state, ST_DISCONNECTED);
> +
> dp_display_host_init(dp);
>  
> dp_catalog_ctrl_hpd_config(dp->catalog);
>  
> status = dp_catalog_hpd_get_state_status(dp->catalog);
>  
> -   if (status) {
> +   if (status)
> dp->dp_display.is_connected = true;
> -   } else {
> +   else
> dp->dp_display.is_connected = false;
> -   /* make sure next resume host_init be called */
> -   dp->core_initialized = false;
> -   }
>  
> return 0;
>  }
> @@ -1214,6 +1208,9 @@ static int dp_pm_suspend(struct device *dev)
> if (dp_display->power_on == true)
> dp_display_disable(dp, 0);
>  
> +   /* host_init will be called at pm_resume */
> +   dp->core_initialized = false;
> +
> atomic_set(&dp->hpd_state, ST_SUSPENDED);
>  
> return 0;
> @@ -1343,6 +1340,9 @@ int msm_dp_display_enable(struct msm_dp *dp, struct 
> drm_encoder *encoder)
>  
> mutex_lock(&dp_display->event_mutex);
>  
> +   /* delete sentinel checking */

Stop sentinel checking?

> +   dp_del_event(dp_display, EV_CONNECT_PENDING_TIMEOUT);
> +
> rc = dp_display_set_mode(dp, &dp_display->dp_mode);
> if (rc) {
> DRM_ERROR("Failed to perform a mode set, rc=%d\n", rc);
> @@ -1368,9 +1368,8 @@ int msm_dp_display_enable(struct msm_dp *dp, struct 
> drm_encoder *encoder)
> dp_display_unprepare(dp);
> }
>  
> -   dp_del_event(dp_display, EV_CONNECT_PENDING_TIMEOUT);
> -
> -   if (state == ST_SUSPEND_PENDING)
> +   /* manual kick off plug event to train link */
> +   if (state == ST_DISPLAY_OFF)
> dp_add_event(dp_display, EV_IRQ_HPD_INT, 0, 0);
>  
> /* completed connection */
> @@ -1402,20 +1401,21 @@ int msm_dp_display_disable(struct msm_dp *dp, struct 
> drm_encoder *encoder)
>  
> mutex_lock(&dp_display->event_mutex);
>  
> +   /* delete sentinel checking */

Stop sentinel checking?

> +   dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> +
> dp_display_disable(dp_display, 0);
>  
> rc = dp_display_unprepare(dp);
> if (rc)
> DRM_ERROR("DP display unprepare failed, rc=%d\n", rc);
>  
> -   dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> -
> state = atomic_read(&dp_display->hpd_state);
> if (state == ST_DISCONNECT_PENDING) {

I don't understand the atomic nature of this hpd_state variable. Why is
it an atomic variable? Is taking a spinlock bad? What is to prevent the
atomic read here to not be interrupted and then this if condition check
be invalid because the variable has been updated somewhere else?
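The check-then-act concern can be sketched as follows (a toy model; the names are hypothetical and a pthread mutex stands in for a spinlock):

```c
/* An atomic_read() followed by a separate atomic_set() is not one atomic
 * step; holding a lock across the check and the update is. */
#include <pthread.h>
#include <stdatomic.h>

enum hpd { ST_DISCONNECTED, ST_DISCONNECT_PENDING, ST_DISPLAY_OFF };

static atomic_int hpd_state = ST_DISCONNECT_PENDING;
static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy shape: another thread may change the state in the window
 * between the load and the store. */
static void finish_disconnect_racy(void)
{
	if (atomic_load(&hpd_state) == ST_DISCONNECT_PENDING)
		atomic_store(&hpd_state, ST_DISCONNECTED);
	else
		atomic_store(&hpd_state, ST_DISPLAY_OFF);
}

/* Locked shape: the check and the update form one critical section. */
static int finish_disconnect_locked(void)
{
	int new_state;

	pthread_mutex_lock(&event_lock);
	if (atomic_load(&hpd_state) == ST_DISCONNECT_PENDING)
		new_state = ST_DISCONNECTED;
	else
		new_state = ST_DISPLAY_OFF;
	atomic_store(&hpd_state, new_state);
	pthread_mutex_unlock(&event_lock);
	return new_state;
}
```

The two shapes compute the same transition single-threaded; only the locked one is safe against a concurrent updater that also takes event_lock.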

> /* completed disconnection */
> atomic_set(&dp_display->hpd_state, ST_DISCONNECTED);
> } else {
> -   atomic_set(_display->hpd_state, ST_SUSPEND_PENDING);
> +   atomic_set(_display->hpd_state, ST_DISPLAY_OFF);


Re: [PATCH net-next 7/8] net: ethernet: ti: am65-cpsw: prepare xmit/rx path for multi-port devices in mac-only mode

2020-10-02 Thread David Miller
From: Grygorii Strashko 
Date: Thu, 1 Oct 2020 13:52:57 +0300

> This patch adds multi-port support to TI AM65x CPSW driver xmit/rx path in
> preparation for adding support for multi-port devices, like Main CPSW0 on
> K3 J721E SoC or future CPSW3g on K3 AM64x SoC.
> Hence DMA channels are common/shared for all ext Ports and the RX/TX NAPI
> and DMA processing going to be assigned to first netdev this patch:
>  - ensures all RX descriptors fields are initialized;
>  - adds synchronization for TX DMA push/pop operation (locking) as
> Networking core is not enough any more;
>  - updates TX bql processing for every packet in
> am65_cpsw_nuss_tx_compl_packets() as every completed TX skb can have
> different ndev assigned (come from different netdevs).
> 
> Signed-off-by: Grygorii Strashko 

This locking is unnecessary in single-port non-shared DMA situations
and therefore will impose unnecessary performance loss for basically
all existing supported setups.

Please do this another way.

In fact, I would encourage you to find a way to avoid the new atomic
operations even in multi-port configurations.

Thank you.


Re: [PATCH] scsi: qedf: remove redundant assignment to variable 'rc'

2020-10-02 Thread Martin K. Petersen


Jing,

> This assignment is  meaningless, so remove it.

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v6 1/4] remoteproc: core: Add ops to enable custom coredump functionality

2020-10-02 Thread Siddharth Gupta
Each remoteproc might have different requirements for coredumps and might
want to choose the type of dumps it wants to collect. This change allows
remoteproc drivers to specify their own custom dump function to be executed
in place of rproc_coredump. If the coredump op is not specified by the
remoteproc driver it will be set to rproc_coredump by default.
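The fallback described above is the usual fill-in-default-ops idiom; a minimal userspace sketch (every name below is illustrative, not the kernel's actual API):

```c
/* Copy a driver's ops table, then fall back to a generic implementation
 * for any slot the driver left NULL. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct rp;

struct rp_ops {
	void (*coredump)(struct rp *r);
};

struct rp {
	const char *name;
	struct rp_ops ops;
};

static void default_coredump(struct rp *r)
{
	printf("full coredump of %s\n", r->name);
}

static void nop_coredump(struct rp *r)
{
	(void)r;	/* a driver-specific (e.g. minidump) handler would go here */
}

/* Mirrors the rproc_alloc_ops() change: keep a driver-provided op,
 * default the slot otherwise. */
static void alloc_ops(struct rp *r, const struct rp_ops *drv_ops)
{
	memcpy(&r->ops, drv_ops, sizeof(r->ops));
	if (!r->ops.coredump)
		r->ops.coredump = default_coredump;
}

static int uses_default(const struct rp_ops *drv_ops)
{
	struct rp r = { .name = "demo" };

	alloc_ops(&r, drv_ops);
	return r.ops.coredump == default_coredump;
}
```

Callers then invoke r->ops.coredump(r) unconditionally, with no NULL check at each call site.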

Signed-off-by: Siddharth Gupta 
---
 drivers/remoteproc/remoteproc_core.c | 6 +-
 include/linux/remoteproc.h   | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/remoteproc_core.c 
b/drivers/remoteproc/remoteproc_core.c
index 7f90eee..dcc1341 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1681,7 +1681,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
goto unlock_mutex;
 
/* generate coredump */
-   rproc_coredump(rproc);
+   rproc->ops->coredump(rproc);
 
/* load firmware */
	ret = request_firmware(&firmware_p, rproc->firmware, dev);
@@ -2103,6 +2103,10 @@ static int rproc_alloc_ops(struct rproc *rproc, const 
struct rproc_ops *ops)
if (!rproc->ops)
return -ENOMEM;
 
+   /* Default to rproc_coredump if no coredump function is specified */
+   if (!rproc->ops->coredump)
+   rproc->ops->coredump = rproc_coredump;
+
if (rproc->ops->load)
return 0;
 
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index 2fa68bf..a66e2cb 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -375,6 +375,7 @@ enum rsc_handling_status {
  * @get_boot_addr: get boot address to entry point specified in firmware
  * @panic: optional callback to react to system panic, core will delay
  * panic at least the returned number of milliseconds
+ * @coredump:	collect firmware dump after the subsystem is shutdown
  */
 struct rproc_ops {
int (*prepare)(struct rproc *rproc);
@@ -393,6 +394,7 @@ struct rproc_ops {
int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
u64 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
unsigned long (*panic)(struct rproc *rproc);
+   void (*coredump)(struct rproc *rproc);
 };
 
 /**
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v6 0/4] Introduce mini-dump support for remoteproc

2020-10-02 Thread Siddharth Gupta
Sometimes firmware sizes can be in the tens of MBs, and reading all the memory
during coredump can consume a lot of time and memory.

Introducing support for mini-dumps. A mini-dump contains the smallest amount
of useful information that could help debug subsystem crashes.

During bootup memory is allocated in SMEM (Shared memory) in the form of a
table that contains the physical addresses and sizes of the regions that
are supposed to be collected during coredump. This memory is shared amongst
all processors in a Qualcomm platform, so all remoteprocs fill in their
entry in the global table once they are out of reset.

This patch series adds support for parsing the global minidump table and
uses the current coredump framework to expose this memory to userspace
during remoteproc's recovery.

This patch series also integrates the patch:
https://patchwork.kernel.org/patch/11695541/ sent by Siddharth.

Changelog:
v5 -> v6:
- Removed priv_cleanup operation from rproc_ops. The dump_segments list is
  updated and cleaned up each time minidump is invoked.
- Split patch #2 into 2 parts - one that adds the rproc_minidump function, and
  the other that uses the new function in the qcom_q6v5_pas driver.
- Updated structs in qcom_minidump to explicitly indicate the endianness of the
  data stored in SMEM, also updated member names.
- Read the global table of contents in SMEM each time adsp_minidump is invoked.

v4 -> v5:
- Fixed adsp_add_minidump_segments to read IO memory using appropriate 
functions.

v3 -> v4:
- Made adsp_priv_cleanup a static function.

v2 -> v3:
- Refactored code to remove dependency on Qualcomm configs.
- Renamed do_rproc_minidump to rproc_minidump and marked as exported
  symbol.

v1 -> v2:
- 3 kernel test robot warnings have been resolved.
- Introduced priv_cleanup op in order to make the cleaning of
  private elements used by the remoteproc more readable.
- Removed rproc_cleanup_priv as it is no longer needed.
- Switched to if/else format for rproc_alloc in order to keep 
  the static const declaration of adsp_minidump_ops.

Siddharth Gupta (4):
  remoteproc: core: Add ops to enable custom coredump functionality
  remoteproc: coredump: Add minidump functionality
  remoteproc: qcom: Add capability to collect minidumps
  remoteproc: qcom: Add minidump id for sm8150 modem

 drivers/remoteproc/qcom_minidump.h  |  64 ++
 drivers/remoteproc/qcom_q6v5_pas.c  | 105 +-
 drivers/remoteproc/remoteproc_core.c|   6 +-
 drivers/remoteproc/remoteproc_coredump.c| 132 
 drivers/remoteproc/remoteproc_elf_helpers.h |  27 ++
 include/linux/remoteproc.h  |   3 +
 6 files changed, 334 insertions(+), 3 deletions(-)
 create mode 100644 drivers/remoteproc/qcom_minidump.h

-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Linux-kernel-mentees][PATCH 0/2] reorder members of structures in virtio_net for optimization

2020-10-02 Thread David Miller
From: Anant Thazhemadam 
Date: Wed, 30 Sep 2020 10:47:20 +0530

> The structures virtnet_info and receive_queue have byte holes in 
> middle, and their members could do with some rearranging 
> (order-of-declaration wise) in order to overcome this.
> 
> Rearranging the members helps in:
>   * elimination the byte holes in the middle of the structures
>   * reduce the size of the structure (virtnet_info)
>   * have more members stored in one cache line (as opposed to 
> unnecessarily crossing the cacheline boundary and spanning
> different cachelines)
> 
> The analysis was performed using pahole.
> 
> These patches may be applied in any order.
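The hole elimination being described can be illustrated with toy structs (the fields below are invented, not virtnet_info's):

```c
/* Shows the kind of padding hole pahole reports and how declaration
 * order removes it, without changing any member's type. */
#include <stddef.h>

struct before {
	char  enabled;	/* 1 byte, then a hole up to pointer alignment */
	void *queue;
	char  flags;	/* 1 byte plus tail padding */
};

struct after {
	void *queue;	/* widest member first */
	char  enabled;	/* small members packed together at the tail */
	char  flags;
};

_Static_assert(sizeof(struct after) < sizeof(struct before),
	       "reordering eliminated the hole");

/* Bytes saved per instance by the reordering. */
static int saved_bytes(void)
{
	return (int)(sizeof(struct before) - sizeof(struct after));
}
```

On LP64 targets this drops sizeof from 24 to 16 bytes; whether the smaller layout also helps the hot paths is exactly the performance question raised below, which pahole alone cannot answer.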

What effects do these changes have on performance?

The cache locality for various TX and RX paths could be affected.

I'm not applying these patches without some data on the performance
impact.

Thank you.



[PATCH v6 3/4] remoteproc: qcom: Add capability to collect minidumps

2020-10-02 Thread Siddharth Gupta
This patch adds support for collecting minidump in the event of remoteproc
crash. Parse the minidump table based on remoteproc's unique minidump-id,
read all memory regions from the remoteproc's minidump table entry and
expose the memory to userspace. The remoteproc platform driver can choose
to collect a full/mini dump by specifying the coredump op.

Co-developed-by: Rishabh Bhatnagar 
Signed-off-by: Rishabh Bhatnagar 
Co-developed-by: Gurbir Arora 
Signed-off-by: Gurbir Arora 
Signed-off-by: Siddharth Gupta 
---
 drivers/remoteproc/qcom_minidump.h |  64 +++
 drivers/remoteproc/qcom_q6v5_pas.c | 104 -
 2 files changed, 166 insertions(+), 2 deletions(-)
 create mode 100644 drivers/remoteproc/qcom_minidump.h

diff --git a/drivers/remoteproc/qcom_minidump.h 
b/drivers/remoteproc/qcom_minidump.h
new file mode 100644
index 000..5857d06
--- /dev/null
+++ b/drivers/remoteproc/qcom_minidump.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.
+ */
+
+#ifndef __QCOM_MINIDUMP_H
+#define __QCOM_MINIDUMP_H
+
+#define MAX_NUM_OF_SS   10
+#define MAX_REGION_NAME_LENGTH  16
+#define SBL_MINIDUMP_SMEM_ID   602
+#define MD_REGION_VALID		('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
+#define MD_SS_ENCR_DONE		('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
+#define MD_SS_ENABLED		('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
+
+/**
+ * struct minidump_region - Minidump region
+ * @name   : Name of the region to be dumped
+ * @seq_num:   : Use to differentiate regions with same name.
+ * @valid  : This entry to be dumped (if set to 1)
+ * @address: Physical address of region to be dumped
+ * @size   : Size of the region
+ */
+struct minidump_region {
+   charname[MAX_REGION_NAME_LENGTH];
+   __le32  seq_num;
+   __le32  valid;
+   __le64  address;
+   __le64  size;
+};
+
+/**
+ * struct minidump_subsystem_toc: Subsystem's SMEM Table of content
+ * @status : Subsystem toc init status
+ * @enabled : if set to 1, this region would be copied during coredump
+ * @encryption_status: Encryption status for this subsystem
+ * @encryption_required : Decides whether to encrypt the subsystem regions or not
+ * @ss_region_count : Number of regions added in this subsystem toc
+ * @md_ss_smem_regions_baseptr : regions base pointer of the subsystem
+ */
+struct minidump_subsystem_toc {
+   __le32  status;
+   __le32  enabled;
+   __le32  encryption_status;
+   __le32  encryption_required;
+   __le32  ss_region_count;
+   __le64  md_ss_smem_regions_baseptr;
+};
+
+/**
+ * struct minidump_global_toc: Global Table of Content
+ * @status : Global minidump init status
+ * @md_revision : Minidump revision
+ * @enabled : Minidump enable status
+ * @md_ss_toc : Array of subsystem tocs
+ */
+struct minidump_global_toc {
+   __le32  status;
+   __le32  md_revision;
+   __le32  enabled;
+   struct minidump_subsystem_toc   md_ss_toc[MAX_NUM_OF_SS];
+};
+
+#endif
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 3837f23..349f725 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -28,11 +28,13 @@
 #include "qcom_pil_info.h"
 #include "qcom_q6v5.h"
 #include "remoteproc_internal.h"
+#include "qcom_minidump.h"
 
 struct adsp_data {
int crash_reason_smem;
const char *firmware_name;
int pas_id;
+   unsigned int minidump_id;
bool has_aggre2_clk;
bool auto_boot;
 
@@ -63,6 +65,7 @@ struct qcom_adsp {
int proxy_pd_count;
 
int pas_id;
+   unsigned int minidump_id;
int crash_reason_smem;
bool has_aggre2_clk;
const char *info_name;
@@ -116,6 +119,88 @@ static void adsp_pds_disable(struct qcom_adsp *adsp, 
struct device **pds,
}
 }
 
+static void adsp_minidump_cleanup(struct rproc *rproc)
+{
+   struct rproc_dump_segment *entry, *tmp;
+
+   list_for_each_entry_safe(entry, tmp, >dump_segments, node) {
+   list_del(>node);
+   kfree(entry->priv);
+   kfree(entry);
+   }
+}
+
+static void adsp_add_minidump_segments(struct rproc *rproc,
+  struct minidump_subsystem_toc *minidump_ss)
+{
+   struct minidump_region __iomem *ptr;
+   struct minidump_region region;
+   int seg_cnt, i;
+   dma_addr_t da;
+   size_t size;
+   char *name;
+
+   if (!list_empty(>dump_segments)) {
+   dev_err(>dev, "dump segment list already populated\n");
+   return;
+   }
+
+   seg_cnt = le32_to_cpu(minidump_ss->ss_region_count);
+   ptr = ioremap((unsigned 

[PATCH v6 4/4] remoteproc: qcom: Add minidump id for sm8150 modem

2020-10-02 Thread Siddharth Gupta
Add a minidump id for the modem in the sm8150 chipset so that the regions
included in the coredump generated upon a crash are based on the minidump
tables in SMEM instead of those in the ELF header.

Signed-off-by: Siddharth Gupta 
---
 drivers/remoteproc/qcom_q6v5_pas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 349f725..23f4532 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -707,6 +707,7 @@ static const struct adsp_data mpss_resource_init = {
.crash_reason_smem = 421,
.firmware_name = "modem.mdt",
.pas_id = 4,
+   .minidump_id = 3,
.has_aggre2_clk = false,
.auto_boot = false,
.active_pd_names = (char*[]){
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v6 2/4] remoteproc: coredump: Add minidump functionality

2020-10-02 Thread Siddharth Gupta
This change adds a new kind of coredump mechanism which, instead of dumping
entire program segments of the firmware, dumps sections of the remoteproc
memory which are sufficient to allow debugging the firmware. This function
thus uses section headers instead of program headers during creation of the
coredump ELF.
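The buffer sizing this approach implies (one ELF header, a null section header, a string-table section header plus the table itself, then one section header, and in the default dump mode the payload, per segment) can be sketched for the ELF64 case. This is an illustration using the sizes from <elf.h>, not the driver code:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#define MAX_STRTBL_SIZE 512

/*
 * ELF64-only sketch of the buffer sizing done by rproc_minidump():
 * ELF header + null shdr + string-table shdr + string table, then one
 * section header per segment, plus the segment payload when the dump
 * data is carried inline (the default coredump configuration).
 */
static size_t minidump_buf_size(const size_t *seg_sizes, int nsegs,
				int inline_data)
{
	size_t total = sizeof(Elf64_Ehdr) + 2 * sizeof(Elf64_Shdr) +
		       MAX_STRTBL_SIZE;

	for (int i = 0; i < nsegs; i++) {
		total += sizeof(Elf64_Shdr);
		if (inline_data)
			total += seg_sizes[i];
	}
	return total;
}
```

With inline_data cleared (the "inline" dump configuration in the patch), only the headers and string table are allocated and the segment data is streamed from device memory on read.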

Signed-off-by: Rishabh Bhatnagar 
Signed-off-by: Siddharth Gupta 
---
 drivers/remoteproc/remoteproc_coredump.c| 132 
 drivers/remoteproc/remoteproc_elf_helpers.h |  27 ++
 include/linux/remoteproc.h  |   1 +
 3 files changed, 160 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_coredump.c 
b/drivers/remoteproc/remoteproc_coredump.c
index bb15a29..e7d1394 100644
--- a/drivers/remoteproc/remoteproc_coredump.c
+++ b/drivers/remoteproc/remoteproc_coredump.c
@@ -13,6 +13,8 @@
 #include "remoteproc_internal.h"
 #include "remoteproc_elf_helpers.h"
 
+#define MAX_STRTBL_SIZE 512
+
 struct rproc_coredump_state {
struct rproc *rproc;
void *header;
@@ -323,3 +325,133 @@ void rproc_coredump(struct rproc *rproc)
 */
wait_for_completion(_state.dump_done);
 }
+
+/**
+ * rproc_minidump() - perform minidump
+ * @rproc: rproc handle
+ *
+ * This function will generate an ELF header for the registered sections of
+ * segments and create a devcoredump device associated with rproc. Based on
+ * the coredump configuration this function will directly copy the segments
+ * from device memory to userspace or copy segments from device memory to
+ * a separate buffer, which can then be read by userspace.
+ * The first approach avoids using extra vmalloc memory. But it will stall
+ * recovery flow until dump is read by userspace.
+ */
+void rproc_minidump(struct rproc *rproc)
+{
+   struct rproc_dump_segment *segment;
+   void *shdr;
+   void *ehdr;
+   size_t data_size;
+   size_t offset;
+   void *data;
+   u8 class = rproc->elf_class;
+   int shnum;
+   struct rproc_coredump_state dump_state;
+   unsigned int dump_conf = rproc->dump_conf;
+   char *str_tbl = "STR_TBL";
+
+   if (list_empty(>dump_segments) ||
+   dump_conf == RPROC_COREDUMP_DISABLED)
+   return;
+
+   if (class == ELFCLASSNONE) {
+   dev_err(>dev, "Elf class is not set\n");
+   return;
+   }
+
+   /*
+* We allocate two extra section headers. The first one is null.
+* Second section header is for the string table. Also space is
+* allocated for string table.
+*/
+   data_size = elf_size_of_hdr(class) + 2 * elf_size_of_shdr(class) +
+   MAX_STRTBL_SIZE;
+   shnum = 2;
+
+   list_for_each_entry(segment, >dump_segments, node) {
+   data_size += elf_size_of_shdr(class);
+   if (dump_conf == RPROC_COREDUMP_DEFAULT)
+   data_size += segment->size;
+   shnum++;
+   }
+
+   data = vmalloc(data_size);
+   if (!data)
+   return;
+
+   ehdr = data;
+   memset(ehdr, 0, elf_size_of_hdr(class));
+   /* e_ident field is common for both elf32 and elf64 */
+   elf_hdr_init_ident(ehdr, class);
+
+   elf_hdr_set_e_type(class, ehdr, ET_CORE);
+   elf_hdr_set_e_machine(class, ehdr, rproc->elf_machine);
+   elf_hdr_set_e_version(class, ehdr, EV_CURRENT);
+   elf_hdr_set_e_entry(class, ehdr, rproc->bootaddr);
+   elf_hdr_set_e_shoff(class, ehdr, elf_size_of_hdr(class));
+   elf_hdr_set_e_ehsize(class, ehdr, elf_size_of_hdr(class));
+   elf_hdr_set_e_shentsize(class, ehdr, elf_size_of_shdr(class));
+   elf_hdr_set_e_shnum(class, ehdr, shnum);
+   elf_hdr_set_e_shstrndx(class, ehdr, 1);
+
+   /* Set the first section header as null and move to the next one. */
+   shdr = data + elf_hdr_get_e_shoff(class, ehdr);
+   memset(shdr, 0, elf_size_of_shdr(class));
+   shdr += elf_size_of_shdr(class);
+
+   /* Initialize the string table. */
+   offset = elf_hdr_get_e_shoff(class, ehdr) +
+elf_size_of_shdr(class) * elf_hdr_get_e_shnum(class, ehdr);
+   memset(data + offset, 0, MAX_STRTBL_SIZE);
+
+   /* Fill in the string table section header. */
+   memset(shdr, 0, elf_size_of_shdr(class));
+   elf_shdr_set_sh_type(class, shdr, SHT_STRTAB);
+   elf_shdr_set_sh_offset(class, shdr, offset);
+   elf_shdr_set_sh_size(class, shdr, MAX_STRTBL_SIZE);
+   elf_shdr_set_sh_entsize(class, shdr, 0);
+   elf_shdr_set_sh_flags(class, shdr, 0);
+   elf_shdr_set_sh_name(class, shdr, set_section_name(str_tbl, ehdr, 
class));
+   offset += elf_shdr_get_sh_size(class, shdr);
+   shdr += elf_size_of_shdr(class);
+
+   list_for_each_entry(segment, >dump_segments, node) {
+   memset(shdr, 0, elf_size_of_shdr(class));
+   elf_shdr_set_sh_type(class, shdr, SHT_PROGBITS);
+   

Re: [PATCH -next] scsi: snic: convert to use DEFINE_SEQ_ATTRIBUTE macro

2020-10-02 Thread Martin K. Petersen


Liu,

> Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] vhost-vdpa: fix page pinning leakage in error path

2020-10-02 Thread Jason Wang



On 2020/10/2 4:23 AM, Si-Wei Liu wrote:

Pinned pages are not properly accounted, particularly when a
mapping error occurs on IOTLB update. Clean up dangling
pinned pages for the error path. The inflight pinned pages,
specifically for a memory region that strides across
multiple chunks, would need more than one free page for
bookkeeping and accounting. For simplicity, pin pages
for all memory in the IOVA range in one go rather than
make multiple pin_user_pages calls to cover the entire
region. This way it's easier to track and account for the
pages already mapped, particularly for clean-up in the
error path.
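The page-count and RLIMIT_MEMLOCK bookkeeping behind the single pin_user_pages() call described here can be sketched in user space. The PAGE_* constants below are the usual 4 KiB definitions and the helpers only illustrate the arithmetic, not the vhost code:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & PAGE_MASK)

/*
 * Pages touched by [iova, iova + size): the range may start and end
 * mid-page, so round the offset-adjusted length up to a page multiple.
 * This mirrors the npages computation in the patch.
 */
static uint64_t range_to_npages(uint64_t iova, uint64_t size)
{
	return PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
}

/* Refuse up front if pinning would exceed the RLIMIT_MEMLOCK budget,
 * instead of adding to the account first and unwinding on failure. */
static int may_pin(uint64_t pinned_vm, uint64_t npages, uint64_t lock_limit)
{
	return pinned_vm + npages <= lock_limit;
}
```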

Fixes: 20453a45fb06 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu 
---
  drivers/vhost/vdpa.c | 121 +++
  1 file changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 796fe97..abc4aa2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -565,6 +565,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
  perm_to_iommu_flags(perm));
}
  
+	if (r)

+   vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
return r;
  }



Please use a separate patch for this fix.


  
@@ -592,21 +594,19 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,

struct vhost_dev *dev = >vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
-   unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
+   struct vm_area_struct **vmas;
unsigned int gup_flags = FOLL_LONGTERM;
-   unsigned long npages, cur_base, map_pfn, last_pfn = 0;
-   unsigned long locked, lock_limit, pinned, i;
+   unsigned long map_pfn, last_pfn = 0;
+   unsigned long npages, lock_limit;
+   unsigned long i, nmap = 0;
u64 iova = msg->iova;
+   long pinned;
int ret = 0;
  
  	if (vhost_iotlb_itree_first(iotlb, msg->iova,

msg->iova + msg->size - 1))
return -EEXIST;
  
-	page_list = (struct page **) __get_free_page(GFP_KERNEL);

-   if (!page_list)
-   return -ENOMEM;
-
if (msg->perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
  
@@ -614,61 +614,86 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,

if (!npages)
return -EINVAL;
  
+	page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);

+   vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+   if (!page_list || !vmas) {
+   ret = -ENOMEM;
+   goto free;
+   }
+
mmap_read_lock(dev->mm);
  
-	locked = atomic64_add_return(npages, >mm->pinned_vm);

lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-   if (locked > lock_limit) {
+   if (npages + atomic64_read(>mm->pinned_vm) > lock_limit) {
ret = -ENOMEM;
-   goto out;
+   goto unlock;
}
  
-	cur_base = msg->uaddr & PAGE_MASK;

-   iova &= PAGE_MASK;
+   pinned = pin_user_pages(msg->uaddr & PAGE_MASK, npages, gup_flags,
+   page_list, vmas);
+   if (npages != pinned) {
+   if (pinned < 0) {
+   ret = pinned;
+   } else {
+   unpin_user_pages(page_list, pinned);
+   ret = -ENOMEM;
+   }
+   goto unlock;
+   }
  
-	while (npages) {

-   pinned = min_t(unsigned long, npages, list_size);
-   ret = pin_user_pages(cur_base, pinned,
-gup_flags, page_list, NULL);
-   if (ret != pinned)
-   goto out;
-
-   if (!last_pfn)
-   map_pfn = page_to_pfn(page_list[0]);
-
-   for (i = 0; i < ret; i++) {
-   unsigned long this_pfn = page_to_pfn(page_list[i]);
-   u64 csize;
-
-   if (last_pfn && (this_pfn != last_pfn + 1)) {
-   /* Pin a contiguous chunk of memory */
-   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
-   if (vhost_vdpa_map(v, iova, csize,
-  map_pfn << PAGE_SHIFT,
-  msg->perm))
-   goto out;
-   map_pfn = this_pfn;
-   iova += csize;
+   iova &= PAGE_MASK;
+   map_pfn = page_to_pfn(page_list[0]);
+
+   /* One more iteration to avoid extra vdpa_map() call out of loop. */
+   for (i = 0; i <= npages; i++) {
+   unsigned long this_pfn;
+   u64 csize;
+
+   /* The last chunk may have no valid PFN 

Re: Litmus test for question from Al Viro

2020-10-02 Thread Alan Stern
On Thu, Oct 01, 2020 at 02:30:48PM -0700, Paul E. McKenney wrote:
> > Not the way I would have done it, but okay.  I would have modeled the 
> > kfree by setting a and b both to some sentinel value.
> 
> Might be well worth pursuing!  But how would you model the address
> dependencies in that approach?

Al's original test never writes to V.  So the address dependencies don't 
matter.

> > Why didn't this flag the data race?
> 
> Because I turned Al's simple assignments into *_ONCE() or better.
> In doing this, I was following the default KCSAN settings which
> (for better or worse) forgive the stores from data races.

Ah, yes.  I had realized that when reading the litmus test for the first 
time, and then forgot it.

> With your suggested change and using simple assignments where Al
> indicated them:
> 
> 
> 
> $ herd7 -conf linux-kernel.cfg 
> ~/paper/scalability/LWNLinuxMM/litmus/manual/kernel/C-viro-2020.09.29a.litmus
> Test C-viro-2020.09.29a Allowed
> States 5
> 0:r0=0; 0:r1=1; 0:r2=2; 0:r8=b; 0:r9a=0; 0:r9b=0; 1:r0=1; 1:r1=0; 1:r2=1; 
> 1:r8=a; 1:r9a=1; 1:r9b=2; 1:r9c=2; a=0; b=1; v=0;
> 0:r0=0; 0:r1=1; 0:r2=2; 0:r8=b; 0:r9a=1; 0:r9b=0; 1:r0=1; 1:r1=0; 1:r2=1; 
> 1:r8=a; 1:r9a=1; 1:r9b=2; 1:r9c=2; a=0; b=1; v=0;
> 0:r0=0; 0:r1=1; 0:r2=2; 0:r8=b; 0:r9a=1; 0:r9b=1; 1:r0=0; 1:r1=1; 1:r2=2; 
> 1:r8=a; 1:r9a=1; 1:r9b=1; 1:r9c=1; a=0; b=1; v=1;
> 0:r0=0; 0:r1=1; 0:r2=2; 0:r8=b; 0:r9a=1; 0:r9b=1; 1:r0=1; 1:r1=0; 1:r2=1; 
> 1:r8=a; 1:r9a=1; 1:r9b=2; 1:r9c=2; a=0; b=1; v=0;
> 0:r0=0; 0:r1=1; 0:r2=2; 0:r8=b; 0:r9a=1; 0:r9b=1; 1:r0=1; 1:r1=1; 1:r2=1; 
> 1:r8=a; 1:r9a=1; 1:r9b=1; 1:r9c=1; a=0; b=1; v=0;
> Ok
> Witnesses
> Positive: 3 Negative: 2
> Flag data-race
> Condition exists (0:r0=1:r0 \/ v=1 \/ 0:r2=0 \/ 1:r2=0 \/ 0:r9a=0 \/ 0:r9b=0 
> \/ 1:r9a=0 \/ 1:r9b=0 \/ 1:r9c=0)
> Observation C-viro-2020.09.29a Sometimes 3 2
> Time C-viro-2020.09.29a 17.95
> Hash=14ded51102b668bc38b790e8c3692227
> 
> 
> 
> So still "Sometimes", but the "Flag data-race" you expected is there.
> 
> I posted the updated litmus test below.  Additional or other thoughts?

Two problems remaining.  One in the litmus test and one in the memory 
model itself...

> 
> 
> C C-viro-2020.09.29a
> 
> {
>   int a = 1;
>   int b = 1;
>   int v = 1;
> }
> 
> 
> P0(int *a, int *b, int *v, spinlock_t *l)
> {
>   int r0;
>   int r1;
>   int r2 = 2;
>   int r8;
>   int r9a = 2;
>   int r9b = 2;
> 
>   r0 = 0;
>   spin_lock(l);
>   r9a = READ_ONCE(*v); // Use after free?
>   r8 = r9a - r9a; // Restore address dependency
>   r8 = b + r8;
>   r1 = smp_load_acquire(r8);
>   if (r1 == 0)
>   r0 = 1;
>   r9b = READ_ONCE(*v); // Use after free?
>   // WRITE_ONCE(*a, r9b - r9b); // Use data dependency
>   *a = r9b - r9b; // Use data dependency
>   spin_unlock(l);
>   if (r0) {
>   r2 = READ_ONCE(*v);
>   WRITE_ONCE(*v, 0); /* kfree(). */
>   }
> }
> 
> P1(int *a, int *b, int *v, spinlock_t *l)
> {
>   int r0;
>   int r1;
>   int r1a;
>   int r2 = 2;
>   int r8;
>   int r9a = 2;
>   int r9b = 2;
>   int r9c = 2;
> 
>   r0 = 1;
>   r9a = READ_ONCE(*v); // Use after free?
>   r8 = r9a - r9a; // Restore address dependency
>   r8 = a + r8;
>   r1 = READ_ONCE(*r8);
>   if (r1) {
>   spin_lock(l);
>   r9b = READ_ONCE(*v); // Use after free?
>   r8 = r9b - r9b; // Restore address dependency
>   r8 = a + r8;
>   // r1a = READ_ONCE(*r8);
>   r1a = *r8;
>   if (r1a)
>   r0 = 0;
>   r9c = READ_ONCE(*v); // Use after free?
>   smp_store_release(b, r9c - rc9); // Use data dependency
---^^^
Typo: this should be r9c.  Too bad herd7 doesn't warn about undeclared 
local variables.

>   spin_unlock(l);
>   }
>   if (r0) {
>   r2 = READ_ONCE(*v);
>   WRITE_ONCE(*v, 0); /* kfree(). */
>   }
> }
> 
> locations [a;b;v;0:r1;0:r8;1:r1;1:r8]
> exists (0:r0=1:r0 \/ (* Both or neither did kfree(). *)
>   v=1 \/ (* Neither did kfree, redundant check. *)
>   0:r2=0 \/ 1:r2=0 \/  (* Both did kfree, redundant check. *)
>   0:r9a=0 \/ 0:r9b=0 \/ 1:r9a=0 \/ (* CPU1 use after free. *)
>   1:r9b=0 \/ 1:r9c=0) (* CPU2 use after free. *)

When you fix the typo, the test still fails.  But now it all makes 
sense.  The reason for the failure is because of the way we don't model 
control dependencies.

In short, suppose P1 reads 0 for V->A.  Then it does:

if (READ_ONCE(V->A)) {
... skipped ...
}
WRITE_ONCE(V, 0); /* actually kfree(to_free); */

Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

2020-10-02 Thread Thomas Gleixner
On Fri, Oct 02 2020 at 17:38, Oleg Nesterov wrote:
> On 10/02, Thomas Gleixner wrote:
>>
>> I think it's fundamentaly wrong that we have several places and several
>> flags which handle task_work_run() instead of having exactly one place
>> and one flag.
>
> Damn yes, agreed.

Actually there are TWO places, but they don't interfere:

   1) exit to user

   2) enter guest

From the kernel POV they are pretty much the same as both are leaving
the kernel domain. But they have a few subtly different requirements
what has to be done or not.

So any change to that logic needs to fix up both places.

Thanks,

tglx


Re: [PATCH] scsi: mvumi: Fix error return in mvumi_io_attach()

2020-10-02 Thread Martin K. Petersen


Jing,

> Fix to return error code PTR_ERR() from the error handling case instead
> of 0.

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V5 3/3] i2c: i2c-qcom-geni: Add shutdown callback for i2c

2020-10-02 Thread Stephen Boyd
Quoting Roja Rani Yarubandi (2020-10-01 01:44:25)
> diff --git a/drivers/i2c/busses/i2c-qcom-geni.c 
> b/drivers/i2c/busses/i2c-qcom-geni.c
> index aee2a1dd2c62..56d3fbfe7eb6 100644
> --- a/drivers/i2c/busses/i2c-qcom-geni.c
> +++ b/drivers/i2c/busses/i2c-qcom-geni.c
> @@ -380,6 +380,36 @@ static void geni_i2c_tx_msg_cleanup(struct geni_i2c_dev 
> *gi2c,
> }
>  }
>  
> +static void geni_i2c_stop_xfer(struct geni_i2c_dev *gi2c)
> +{
> +   int ret;
> +   u32 geni_status;
> +   unsigned long flags;
> +   struct i2c_msg *cur;
> +
> +   /* Resume device, runtime suspend can happen anytime during transfer */
> +   ret = pm_runtime_get_sync(gi2c->se.dev);
> +   if (ret < 0) {
> +   dev_err(gi2c->se.dev, "Failed to resume device: %d\n", ret);
> +   return;
> +   }
> +
> +   spin_lock_irqsave(>lock, flags);

We grab the lock here.

> +   geni_status = readl_relaxed(gi2c->se.base + SE_GENI_STATUS);
> +   if (!(geni_status & M_GENI_CMD_ACTIVE))
> +   goto out;
> +
> +   cur = gi2c->cur;
> +   geni_i2c_abort_xfer(gi2c);

But it looks like this function takes the lock again? Did you test this
with lockdep enabled? It should hang even without lockdep, so it seems
like this path of code has not been tested.
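The hang flagged here is the classic non-recursive-lock pattern: the outer function holds the spinlock while calling a helper that takes the same lock again. A toy, single-threaded model (function names are borrowed from the driver purely for illustration) makes the self-deadlock visible by detecting the re-acquire instead of spinning forever:

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel spinlocks are not recursive: taking gi2c->lock in
 * geni_i2c_stop_xfer() and again inside geni_i2c_abort_xfer() hangs.
 * This toy lock records whether it is held so the demo can report the
 * self-deadlock instead of spinning. */
struct toy_lock {
	bool held;
};

/* Returns false when the caller already holds the lock (would deadlock). */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin here forever */
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Models geni_i2c_abort_xfer(), which takes the lock internally. */
static bool abort_xfer(struct toy_lock *l)
{
	if (!toy_lock_acquire(l))
		return false;
	toy_lock_release(l);
	return true;
}

/* Models the buggy geni_i2c_stop_xfer(): lock held across abort_xfer(). */
static bool stop_xfer_buggy(struct toy_lock *l)
{
	bool ok;

	toy_lock_acquire(l);
	ok = abort_xfer(l);	/* second acquire: self-deadlock */
	toy_lock_release(l);
	return ok;
}
```

The usual fix is to drop the lock before calling the helper, or to split out a variant of the helper that asserts the lock is already held rather than taking it.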

> +   if (cur->flags & I2C_M_RD)
> +   geni_i2c_rx_msg_cleanup(gi2c, cur);
> +   else
> +   geni_i2c_tx_msg_cleanup(gi2c, cur);
> +   spin_unlock_irqrestore(>lock, flags);
> +out:
> +   pm_runtime_put_sync_suspend(gi2c->se.dev);
> +}
> +
>  static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg 
> *msg,
> u32 m_param)
>  {


Re: [PATCH 0/3] x86: Add initial support to discover Intel hybrid CPUs

2020-10-02 Thread Thomas Gleixner
On Fri, Oct 02 2020 at 13:19, Ricardo Neri wrote:
> Add support to discover and enumerate CPUs in Intel hybrid parts. A hybrid
> part has CPUs with more than one type of micro-architecture. Thus, certain
> features may only be present in a specific CPU type.
>
> It is useful to know the type of CPUs present in a system. For instance,
> perf may need to handle CPUs differently depending on the type of micro-
> architecture. Decoding machine check error logs may need the additional
> micro-architecture type information, so include that in the log.

'It is useful' as justification just makes me barf.

> A hybrid part can be identified by reading a new CPUID feature bit.
> Likewise, CPUID contains information about the CPU type as well as a new
> native model ID. Details can be found in the Intel manual (SDM, [1]).
>
> This series adds support for Intel hybrid parts in two areas: a) adding
> the hybrid feature bit as well as struct cpuinfo_x86; and b) decode machine
> check errors on hybrid parts.

Bla, bla, bla.

> A later submission will use the proposed functionality to expose the CPU
> topology to user space.

The only patch which is accepted for now is:

if (boot_cpu_has(X86_FEATURE_HYBRID_CPU))
panic("Unsupported insanity\n");

I'm not at all willing to take anything else unless you or someone else
provides a reasonable explanation for the overall approach of supporting
this mess including stable kernels.

This has been clearly communicated years ago when the topic was
discussed at one of the Intel Techday events. It's not my problem if
Intel internal communication is dysfunctional.

Just to bring you up to speed:

 1) The whole CPU enumeration of x86 sucks and is in no way prepared
to deal with heterogeneous CPU features

Boris and I have discussed this with Intel and on LKML and there
are ideas how to clean up that mess.

This needs to be solved first before we even start to talk about
this CPU has FOO but the other does not.

 2) Intel has been told clearly that a prerequisite of adding any of
this is a well defined programming model and a proper design of
dealing with it at the kernel level.

Since that discussion at one of the Intel events I haven't heard
and seen anything related to that.

If Intel thinks that some magic PDF and some Intel internal
'works for me' patches are solving it, then I just have to give
up because explaining the requirements again is just a waste of
time.

So I'm taking Patch 1/3 which defines the misfeature flag and then put
something like the above on top which will prevent booting on any of
these machines.

These two patches are going to be marked for stable simply because any
attempt to use any of these asymmetric features is a recipe for
disaster. And that disaster is going to happen simply because user space
can use CPUID to figure out what a CPU supports. I'm not at all
interested in the resulting wreckage reports.

It's a sad state of affairs that the only outcome of a discussion which
touched all of the above is a patch set which paves the path to hell.

Not going to happen.

Thanks,

tglx




Re: [PATCH v6 11/14] misc: bcm-vk: add BCM_VK_QSTATS

2020-10-02 Thread Florian Fainelli




On 10/2/2020 2:23 PM, Scott Branden wrote:

Add BCM_VK_QSTATS Kconfig option to allow for enabling debug VK
queue statistics.

These statistics keep track of max, abs_max, and average for the
messages queues.

Co-developed-by: Desmond Yan 
Signed-off-by: Desmond Yan 
Signed-off-by: Scott Branden 


Wouldn't it make more sense to have those debug prints be trace printks
instead? Given what you explained in the previous patch version and the
desire to correlate with other system-wide activity, that might make
more sense. Looking at the kernel log to debug performance or
utilization, or just to get a glimpse of what is going on, is not well
suited past probe.

--
Florian


[PATCH V8 5/5] platform/x86: Intel PMT Crashlog capability driver

2020-10-02 Thread David E. Box
From: Alexander Duyck 

Add support for the Intel Platform Monitoring Technology crashlog
interface. This interface provides a few sysfs values to allow for
controlling the crashlog telemetry interface as well as a character
driver to allow for mapping the crashlog memory region so that it can be
accessed after a crashlog has been recorded.

This driver is meant to only support the server version of the crashlog
which is identified as crash_type 1 with a version of zero. Currently no
other types are supported.
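A toy model of the trigger semantics this driver documents in its sysfs ABI (writing false clears the trigger; writing true arms a new event only if one is not already set). The error value below is illustrative and not taken from the driver:

```c
#include <assert.h>

/* Illustrative model of the crashlog "trigger" attribute semantics:
 * clearing always succeeds, re-arming is refused while a crashlog is
 * still pending. */
static int trigger_store(int *triggered, int want)
{
	if (!want) {
		*triggered = 0;		/* writing false clears the trigger */
		return 0;
	}
	if (*triggered)
		return -1;		/* already triggered: refuse to re-arm */
	*triggered = 1;
	return 0;
}
```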

Signed-off-by: Alexander Duyck 
---
 .../ABI/testing/sysfs-class-intel_pmt |  65 
 drivers/platform/x86/Kconfig  |  11 +
 drivers/platform/x86/Makefile |   1 +
 drivers/platform/x86/intel_pmt_crashlog.c | 328 ++
 4 files changed, 405 insertions(+)
 create mode 100644 drivers/platform/x86/intel_pmt_crashlog.c

diff --git a/Documentation/ABI/testing/sysfs-class-intel_pmt 
b/Documentation/ABI/testing/sysfs-class-intel_pmt
index 926b5cf95fd1..ed4c886a21b1 100644
--- a/Documentation/ABI/testing/sysfs-class-intel_pmt
+++ b/Documentation/ABI/testing/sysfs-class-intel_pmt
@@ -52,3 +52,68 @@ Contact: David Box 
 Description:
(RO) The offset of telemetry region in bytes that corresponds to
the mapping for the telem file.
+
+What:  /sys/class/intel_pmt/crashlog
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   The crashlog directory contains files for configuring an
+   instance of a PMT crashlog device that can perform crash data
+   recording. Each crashlog device has an associated crashlog
+   file. This file can be opened and mapped or read to access the
+   resulting crashlog buffer. The register layout for the buffer
+   can be determined from an XML file of specified GUID for the
+   parent device.
+
+What:  /sys/class/intel_pmt/crashlog/crashlog
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   (RO) The crashlog buffer for this crashlog device. This file
+   may be mapped or read to obtain the data.
+
+What:  /sys/class/intel_pmt/crashlog/guid
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   (RO) The GUID for this crashlog device. The GUID identifies the
+   version of the XML file for the parent device that should be
+   used to determine the register layout.
+
+What:  /sys/class/intel_pmt/crashlog/size
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   (RO) The length of the result buffer in bytes that corresponds
+   to the size for the crashlog buffer.
+
+What:  /sys/class/intel_pmt/crashlog/offset
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   (RO) The offset of the buffer in bytes that corresponds
+   to the mapping for the crashlog device.
+
+What:  /sys/class/intel_pmt/crashlog/enable
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   (RW) Boolean value controlling if the crashlog functionality
+   is enabled for the crashlog device.
+
+What:  /sys/class/intel_pmt/crashlog/trigger
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   Alexander Duyck 
+Description:
+   (RW) Boolean value controlling the triggering of the crashlog
+   device node. When read it provides data on if the crashlog has
+   been triggered. When written to it can be used to either clear
+   the current trigger by writing false, or to trigger a new
+   event if the trigger is not currently set.
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 90b4c1bd9532..4ac555a9916b 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -1383,6 +1383,17 @@ config INTEL_PMT_TELEMETRY
  To compile this driver as a module, choose M here: the module
  will be called intel_pmt_telemetry.
 
+config INTEL_PMT_CRASHLOG
+   tristate "Intel Platform Monitoring Technology (PMT) Crashlog driver"
+   select INTEL_PMT_CLASS
+   help
+ The Intel Platform Monitoring Technology (PMT) crashlog driver provides
+ access to hardware crashlog capabilities on devices that support the
+ feature.
+
+ To compile this driver as a module, choose M here: the module
+ will be called intel_pmt_crashlog.
+
 config INTEL_PUNIT_IPC
tristate "Intel P-Unit IPC Driver"
help
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index 6a7b61f59ea8..ca82c1344977 100644
--- 

[PATCH V8 4/5] platform/x86: Intel PMT Telemetry capability driver

2020-10-02 Thread David E. Box
From: Alexander Duyck 

PMT Telemetry is a capability of the Intel Platform Monitoring Technology.
The Telemetry capability provides access to device telemetry metrics that
expose hardware performance data to users from read-only register spaces.

With this driver present the intel_pmt directory can be populated with
telem devices. These devices will contain the standard intel_pmt sysfs
data and a "telem" binary sysfs attribute which can be used to access the
telemetry data.

Also create a PCI device id list for early telemetry hardware that requires
workarounds for known issues.
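The discovery-header decode this driver performs can be restated in plain C. The macros below mirror TELEM_ACCESS and TELEM_SIZE from the patch; the size field occupies bits 27:12 and counts 4-byte units, hence the shift by 10 to obtain bytes. GENMASK is re-derived for user space:

```c
#include <assert.h>
#include <stdint.h>

/* GENMASK(h, l) re-derived for user space (32-bit variant): a mask
 * covering bits l..h inclusive, as in the kernel's bits.h. */
#define GENMASK(h, l)	((~0U << (l)) & (~0U >> (31 - (h))))

/* Mirror of the discovery-header decode macros in this patch. */
#define TELEM_ACCESS(v)	((v) & GENMASK(3, 0))
/* Size field is bits 27:12 in 4-byte units; >> 10 converts to bytes. */
#define TELEM_SIZE(v)	(((v) & GENMASK(27, 12)) >> 10)
```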

Co-developed-by: David E. Box 
Signed-off-by: David E. Box 
Signed-off-by: Alexander Duyck 
---
 drivers/platform/x86/Kconfig   |  11 ++
 drivers/platform/x86/Makefile  |   1 +
 drivers/platform/x86/intel_pmt_telemetry.c | 160 +
 3 files changed, 172 insertions(+)
 create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c

diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 12431e0e974d..90b4c1bd9532 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -1372,6 +1372,17 @@ config INTEL_PMT_CLASS
  To compile this driver as a module, choose M here: the module
  will be called intel_pmt_class.
 
+config INTEL_PMT_TELEMETRY
+   tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver"
+   select INTEL_PMT_CLASS
+   help
+ The Intel Platform Monitoring Technology (PMT) Telemetry driver provides
+ access to hardware telemetry metrics on devices that support the
+ feature.
+
+ To compile this driver as a module, choose M here: the module
+ will be called intel_pmt_telemetry.
+
 config INTEL_PUNIT_IPC
tristate "Intel P-Unit IPC Driver"
help
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index f4b1f87f2401..6a7b61f59ea8 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -141,6 +141,7 @@ obj-$(CONFIG_INTEL_MID_POWER_BUTTON)+= intel_mid_powerbtn.o
 obj-$(CONFIG_INTEL_MRFLD_PWRBTN)   += intel_mrfld_pwrbtn.o
 obj-$(CONFIG_INTEL_PMC_CORE)   += intel_pmc_core.o intel_pmc_core_pltdrv.o
 obj-$(CONFIG_INTEL_PMT_CLASS)  += intel_pmt_class.o
+obj-$(CONFIG_INTEL_PMT_TELEMETRY)  += intel_pmt_telemetry.o
 obj-$(CONFIG_INTEL_PUNIT_IPC)  += intel_punit_ipc.o
 obj-$(CONFIG_INTEL_SCU_IPC)+= intel_scu_ipc.o
 obj-$(CONFIG_INTEL_SCU_PCI)+= intel_scu_pcidrv.o
diff --git a/drivers/platform/x86/intel_pmt_telemetry.c 
b/drivers/platform/x86/intel_pmt_telemetry.c
new file mode 100644
index ..f8a87614efa4
--- /dev/null
+++ b/drivers/platform/x86/intel_pmt_telemetry.c
@@ -0,0 +1,160 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel Platform Monitoring Technology Telemetry driver
+ *
+ * Copyright (c) 2020, Intel Corporation.
+ * All Rights Reserved.
+ *
+ * Author: "David E. Box" 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "intel_pmt_class.h"
+
+#define TELEM_DEV_NAME "pmt_telemetry"
+
+#define TELEM_SIZE_OFFSET  0x0
+#define TELEM_GUID_OFFSET  0x4
+#define TELEM_BASE_OFFSET  0x8
+#define TELEM_ACCESS(v)((v) & GENMASK(3, 0))
+/* size is in bytes */
+#define TELEM_SIZE(v)  (((v) & GENMASK(27, 12)) >> 10)
+
+/* Used by client hardware to identify a fixed telemetry entry*/
+#define TELEM_CLIENT_FIXED_BLOCK_GUID  0x1000
+
+struct pmt_telem_priv {
+   int num_entries;
+   struct intel_pmt_entry  entry[];
+};
+
+/*
+ * Early implementations of PMT on client platforms have some
+ * differences from the server platforms (which use the Out Of Band
+ * Management Services Module OOBMSM). This list tracks those
+ * platforms as needed to handle those differences. Newer client
+ * platforms are expected to be fully compatible with server.
+ */
+static const struct pci_device_id pmt_telem_early_client_pci_ids[] = {
+   { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */
+   { PCI_VDEVICE(INTEL, 0x467d) }, /* ADL */
+   { }
+};
+
+static bool intel_pmt_is_early_client_hw(struct device *dev)
+{
+   struct pci_dev *parent = to_pci_dev(dev->parent);
+
+   return !!pci_match_id(pmt_telem_early_client_pci_ids, parent);
+}
+
+static bool pmt_telem_region_overlaps(struct intel_pmt_entry *entry,
+ struct device *dev)
+{
+   u32 guid = readl(entry->disc_table + TELEM_GUID_OFFSET);
+
+   if (guid != TELEM_CLIENT_FIXED_BLOCK_GUID)
+   return false;
+
+   return intel_pmt_is_early_client_hw(dev);
+}
+
+static int pmt_telem_header_decode(struct intel_pmt_entry *entry,
+  struct intel_pmt_header *header,
+  struct device *dev)
+{
+   void __iomem *disc_table = entry->disc_table;
+
+   if 

[PATCH V8 1/5] PCI: Add defines for Designated Vendor-Specific Extended Capability

2020-10-02 Thread David E. Box
Add PCIe Designated Vendor-Specific Extended Capability (DVSEC) and defines
for the header offsets. Defined in PCIe r5.0, sec 7.9.6.

Signed-off-by: David E. Box 
Acked-by: Bjorn Helgaas 
Reviewed-by: Andy Shevchenko 
---
 include/uapi/linux/pci_regs.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f9701410d3b5..beafeee39e44 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -720,6 +720,7 @@
 #define PCI_EXT_CAP_ID_DPC 0x1D/* Downstream Port Containment */
 #define PCI_EXT_CAP_ID_L1SS0x1E/* L1 PM Substates */
 #define PCI_EXT_CAP_ID_PTM 0x1F/* Precision Time Measurement */
+#define PCI_EXT_CAP_ID_DVSEC   0x23/* Designated Vendor-Specific */
 #define PCI_EXT_CAP_ID_DLF 0x25/* Data Link Feature */
 #define PCI_EXT_CAP_ID_PL_16GT 0x26/* Physical Layer 16.0 GT/s */
 #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT
@@ -1062,6 +1063,10 @@
 #define  PCI_L1SS_CTL1_LTR_L12_TH_SCALE0xe000  /* 
LTR_L1.2_THRESHOLD_Scale */
 #define PCI_L1SS_CTL2  0x0c/* Control 2 Register */
 
+/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
+#define PCI_DVSEC_HEADER1  0x4 /* Designated Vendor-Specific 
Header1 */
+#define PCI_DVSEC_HEADER2  0x8 /* Designated Vendor-Specific 
Header2 */
+
 /* Data Link Feature */
 #define PCI_DLF_CAP0x04/* Capabilities Register */
 #define  PCI_DLF_EXCHANGE_ENABLE   0x8000  /* Data Link Feature 
Exchange Enable */
-- 
2.20.1



[PATCH V8 3/5] platform/x86: Intel PMT class driver

2020-10-02 Thread David E. Box
From: Alexander Duyck 

Intel Platform Monitoring Technology is meant to provide a common way to
access telemetry and system metrics.

Register mappings are not provided by the driver. Instead, a GUID is read
from a header for each endpoint. The GUID identifies the device and is to
be used with an XML, provided by the vendor, to discover the available set
of metrics and their register mapping.  This allows firmware updates to
modify the register space without needing to update the driver every time
with new mappings. Firmware writes a new GUID in this case to specify the
new mapping.  Software tools with access to the associated XML file can
then interpret the changes.

The module manages access to all Intel PMT endpoints on a system,
independent of the device exporting them. It creates an intel_pmt class
to manage the devices. For each telemetry endpoint, sysfs files provide
GUID and size information as well as a pointer to the parent device the
telemetry came from. Software may discover the association between
endpoints and devices by iterating through the list in sysfs, or by looking
for the existence of the class folder under the device of interest.  A
binary sysfs attribute of the same name allows software to then read or map
the telemetry space for direct access.

Signed-off-by: Alexander Duyck 
---
 .../ABI/testing/sysfs-class-intel_pmt |  54 
 MAINTAINERS   |   1 +
 drivers/platform/x86/Kconfig  |  12 +
 drivers/platform/x86/Makefile |   1 +
 drivers/platform/x86/intel_pmt_class.c| 297 ++
 drivers/platform/x86/intel_pmt_class.h|  52 +++
 6 files changed, 417 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-intel_pmt
 create mode 100644 drivers/platform/x86/intel_pmt_class.c
 create mode 100644 drivers/platform/x86/intel_pmt_class.h

diff --git a/Documentation/ABI/testing/sysfs-class-intel_pmt 
b/Documentation/ABI/testing/sysfs-class-intel_pmt
new file mode 100644
index ..926b5cf95fd1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-intel_pmt
@@ -0,0 +1,54 @@
+What:  /sys/class/intel_pmt/
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   The intel_pmt/ class directory contains information for
+   devices that expose hardware telemetry using Intel Platform
+   Monitoring Technology (PMT).
+
+What:  /sys/class/intel_pmt/telem<x>
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   The telem<x> directory contains files describing an instance of
+   a PMT telemetry device that exposes hardware telemetry. Each
+   telem<x> directory has an associated telem file. This file
+   may be opened and mapped or read to access the telemetry space
+   of the device. The register layout of the telemetry space is
+   determined from an XML file that matches the PCI device id and
+   GUID for the device.
+
+What:  /sys/class/intel_pmt/telem<x>/telem
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   (RO) The telemetry data for this telemetry device. This file
+   may be mapped or read to obtain the data.
+
+What:  /sys/class/intel_pmt/telem<x>/guid
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   (RO) The GUID for this telemetry device. The GUID identifies
+   the version of the XML file for the parent device that is to
+   be used to get the register layout.
+
+What:  /sys/class/intel_pmt/telem<x>/size
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   (RO) The size of telemetry region in bytes that corresponds to
+   the mapping size for the telem file.
+
+What:  /sys/class/intel_pmt/telem<x>/offset
+Date:  October 2020
+KernelVersion: 5.10
+Contact:   David Box 
+Description:
+   (RO) The offset of telemetry region in bytes that corresponds to
+   the mapping for the telem file.
diff --git a/MAINTAINERS b/MAINTAINERS
index 0f2663b1d376..47fdb8a6e151 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8950,6 +8950,7 @@ INTEL PMT DRIVER
 M: "David E. Box" 
 S: Maintained
 F: drivers/mfd/intel_pmt.c
+F: drivers/platform/x86/intel_pmt_*
 
 INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT
 M: Stanislav Yakovlev 
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 40219bba6801..12431e0e974d 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -1360,6 +1360,18 @@ config INTEL_PMC_CORE
- LTR Ignore
- MPHY/PLL gating status (Sunrisepoint PCH only)
 
+config INTEL_PMT_CLASS
+

[PATCH V8 0/5] Intel Platform Monitoring Technology

2020-10-02 Thread David E. Box
Intel Platform Monitoring Technology (PMT) is an architecture for
enumerating and accessing hardware monitoring capabilities on a device.
With customers increasingly asking for hardware telemetry, engineers not
only have to figure out how to measure and collect data, but also how to
deliver it and make it discoverable. The latter may be through some device
specific method requiring device specific tools to collect the data. This
in turn requires customers to manage a suite of different tools in order to
collect the differing assortment of monitoring data on their systems.  Even
when such information can be provided in kernel drivers, they may require
constant maintenance to update register mappings as they change with
firmware updates and new versions of hardware. PMT provides a solution for
discovering and reading telemetry from a device through a hardware agnostic
framework that allows for updates to systems without requiring patches to
the kernel or software tools.

PMT defines several capabilities to support collecting monitoring data from
hardware. All are discoverable as separate instances of the PCIe Designated
Vendor-Specific Extended Capability (DVSEC) with the Intel vendor code. The DVSEC ID
field uniquely identifies the capability. Each DVSEC also provides a BAR
offset to a header that defines capability-specific attributes, including
GUID, feature type, offset and length, as well as configuration settings
where applicable. The GUID uniquely identifies the register space of any
monitor data exposed by the capability. The GUID is associated with an XML
file from the vendor that describes the mapping of the register space along
with properties of the monitor data. This allows vendors to perform
firmware updates that can change the mapping (e.g. add new metrics) without
requiring any changes to drivers or software tools. The new mapping is
confirmed by an updated GUID, read from the hardware, which software uses
with a new XML.

The current capabilities defined by PMT are Telemetry, Watcher, and
Crashlog.  The Telemetry capability provides access to a continuous block
of read only data. The Watcher capability provides access to hardware
sampling and tracing features. Crashlog provides access to device crash
dumps.  While there is some relationship between capabilities (Watcher can
be configured to sample from the Telemetry data set), each exists as a
stand-alone feature with no dependency on the others. The design therefore splits
them into individual, capability specific drivers. MFD is used to create
platform devices for each capability so that they may be managed by their
own driver. The PMT architecture is (for the most part) agnostic to the
type of device it can collect from. Software can determine which devices
support a PMT feature by searching through each device node entry in the
sysfs class folder. It can additionally determine if a particular device
supports a PMT feature by checking for a PMT class folder in the device
folder.

This patch set provides support for the PMT framework, along with support
for Telemetry on Tiger Lake.

Changes from V7:
Link: 
https://lore.kernel.org/lkml/20201001014250.26987-1-david.e@linux.intel.com/

- Refactor to minimize code duplication by putting more setup code 
  in the common intel_pmt_dev_create(). 
- Add and use a function pointer to handle capability specific
  header decoding.
- Add comment on usage of early_client_pci_ids list and add
  Alder Lake PCI ID to the list.
- Remove unneeded check on the count variable in intel_pmt_read().
- Add missing header functions.
- Specify module names in Kconfig.
- Fix spelling errors across patch set.

Changes from V6:
- Use NULL for OOBMSM driver data instead of an empty struct.
  Rewrite the code to check for NULL driver_data.
- Fix spelling and formatting in Kconfig.
- Use MKDEV(0,0) to prevent unneeded device node from being
  created.

Changes from V5:
- Add Alder Lake and the "Out of Band Management Services
  Module (OOBMSM)" ids to the MFD driver. Transferred to this
  patch set.
- Use a single class for all PMT capabilities as suggested by
  Hans.
- Add binary attribute for telemetry driver to allow read
  syscall as suggested by Hans.
- Use the class file to hold attributes and other common code
  used by all PMT drivers.
- Add the crashlog driver to the patch set and add a mutex to
  protect access to the enable control and trigger files as
  suggested by Hans.

Changes from V4:
- Replace MFD with PMT in driver title
- Fix commit tags in chronological order
- Fix includes in alphabetical order
- Use 'raw' string instead of defines for device names
- Add an error message when returning an error code for
  unrecognized capability id
- Use 

[PATCH V8 2/5] mfd: Intel Platform Monitoring Technology support

2020-10-02 Thread David E. Box
Intel Platform Monitoring Technology (PMT) is an architecture for
enumerating and accessing hardware monitoring facilities. PMT supports
multiple types of monitoring capabilities. This driver creates platform
devices for each type so that they may be managed by capability specific
drivers (to be introduced). Capabilities are discovered using PCIe DVSEC
ids. Support is included for the 3 current capability types, Telemetry,
Watcher, and Crashlog. The features are available on new Intel platforms
starting from Tiger Lake for which support is added. This patch adds
support for Tiger Lake (TGL), Alder Lake (ADL), and Out-of-Band Management
Services Module (OOBMSM).

Also add a quirk mechanism for several early hardware differences and bugs.
For Tiger Lake and Alder Lake, do not support Watcher and Crashlog
capabilities since they will not be compatible with future product. Also,
use a quirk to fix the discovery table offset.

Co-developed-by: Alexander Duyck 
Signed-off-by: Alexander Duyck 
Signed-off-by: David E. Box 
Reviewed-by: Andy Shevchenko 
---
 MAINTAINERS |   5 +
 drivers/mfd/Kconfig |  10 ++
 drivers/mfd/Makefile|   1 +
 drivers/mfd/intel_pmt.c | 226 
 4 files changed, 242 insertions(+)
 create mode 100644 drivers/mfd/intel_pmt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 190c7fa2ea01..0f2663b1d376 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8946,6 +8946,11 @@ F:   drivers/mfd/intel_soc_pmic*
 F: include/linux/mfd/intel_msic.h
 F: include/linux/mfd/intel_soc_pmic*
 
+INTEL PMT DRIVER
+M: "David E. Box" 
+S: Maintained
+F: drivers/mfd/intel_pmt.c
+
 INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT
 M: Stanislav Yakovlev 
 L: linux-wirel...@vger.kernel.org
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 33df0837ab41..f092db50e518 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT
  Register and P-unit access. In addition this creates devices
  for iTCO watchdog and telemetry that are part of the PMC.
 
+config MFD_INTEL_PMT
+   tristate "Intel Platform Monitoring Technology (PMT) support"
+   depends on PCI
+   select MFD_CORE
+   help
+ The Intel Platform Monitoring Technology (PMT) is an interface that
+ provides access to hardware monitor registers. This driver supports
+ Telemetry, Watcher, and Crashlog PMT capabilities/devices for
+ platforms starting from Tiger Lake.
+
 config MFD_IPAQ_MICRO
bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support"
depends on SA1100_H3100 || SA1100_H3600
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index a60e5f835283..b9565d98ab09 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -215,6 +215,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI)+= intel-lpss-pci.o
 obj-$(CONFIG_MFD_INTEL_LPSS_ACPI)  += intel-lpss-acpi.o
 obj-$(CONFIG_MFD_INTEL_MSIC)   += intel_msic.o
 obj-$(CONFIG_MFD_INTEL_PMC_BXT)+= intel_pmc_bxt.o
+obj-$(CONFIG_MFD_INTEL_PMT)+= intel_pmt.o
 obj-$(CONFIG_MFD_PALMAS)   += palmas.o
 obj-$(CONFIG_MFD_VIPERBOARD)+= viperboard.o
 obj-$(CONFIG_MFD_RC5T583)  += rc5t583.o rc5t583-irq.o
diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c
new file mode 100644
index ..1b57a970a9d7
--- /dev/null
+++ b/drivers/mfd/intel_pmt.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel Platform Monitoring Technology PMT driver
+ *
+ * Copyright (c) 2020, Intel Corporation.
+ * All Rights Reserved.
+ *
+ * Author: David E. Box 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Intel DVSEC capability vendor space offsets */
+#define INTEL_DVSEC_ENTRIES0xA
+#define INTEL_DVSEC_SIZE   0xB
+#define INTEL_DVSEC_TABLE  0xC
+#define INTEL_DVSEC_TABLE_BAR(x)   ((x) & GENMASK(2, 0))
+#define INTEL_DVSEC_TABLE_OFFSET(x)((x) & GENMASK(31, 3))
+#define INTEL_DVSEC_ENTRY_SIZE 4
+
+/* PMT capabilities */
+#define DVSEC_INTEL_ID_TELEMETRY   2
+#define DVSEC_INTEL_ID_WATCHER 3
+#define DVSEC_INTEL_ID_CRASHLOG4
+
+struct intel_dvsec_header {
+   u16 length;
+   u16 id;
+   u8  num_entries;
+   u8  entry_size;
+   u8  tbir;
+   u32 offset;
+};
+
+enum pmt_quirks {
+   /* Watcher capability not supported */
+   PMT_QUIRK_NO_WATCHER= BIT(0),
+
+   /* Crashlog capability not supported */
+   PMT_QUIRK_NO_CRASHLOG   = BIT(1),
+
+   /* Use shift instead of mask to read discovery table offset */
+   PMT_QUIRK_TABLE_SHIFT   = BIT(2),
+};
+
+struct pmt_platform_info {
+   unsigned long quirks;
+};
+
+static const struct pmt_platform_info pmt_info;
+
+static const struct pmt_platform_info tgl_info = {
+   .quirks = 


Re: [PATCH ] scsi: page warning: 'page' may be used uninitialized

2020-10-02 Thread Martin K. Petersen


John,

> corrects: drivers/target/target_core_user.c:688:6: warning: 'page' may be used
> uninitialized

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

