Re: [PATCH 1/3] sched/topology: Allow archs to populate distance map

2021-05-27 Thread Srikar Dronamraju
* Valentin Schneider  [2021-05-25 11:21:02]:

> On 24/05/21 21:48, Srikar Dronamraju wrote:
> > * Valentin Schneider  [2021-05-24 15:16:09]:
> >> Ok so from your arch you can figure out the *size* of the set of unique
> >> distances, but not the individual node_distance(a, b)... That's quite
> >> unfortunate.
> >
> > Yes, that's true.
> >
> >>
> >> I suppose one way to avoid the hook would be to write some "fake" distance
> >> values into your distance_lookup_table[] for offline nodes using your
> >> distance_ref_point_depth thing, i.e. ensure an iteration of
> >> node_distance(a, b) covers all distance values [1]. You can then keep patch
> >> 3 around, and that should roughly be it.
> >>
> >
> > Yes, this would suffice, but to me it's not very clean.
> > static int found[distance_ref_point_depth];
> >
> > for_each_node(node) {
> > 	int i, nd, distance = LOCAL_DISTANCE;
> >
> > 	if (node == first_online_node)
> > 		continue;
> >
> > 	nd = node_distance(node, first_online_node);
> > 	for (i = 0; i < distance_ref_point_depth; i++, distance *= 2) {
> > 		/* Online node: mark the distance levels that are in use. */
> > 		if (node_online(node)) {
> > 			if (distance != nd)
> > 				continue;
> > 			found[i]++;
> > 			break;
> > 		}
> > 		/* Offline node: fake a distance at the first unused level. */
> > 		if (found[i])
> > 			continue;
> > 		distance_lookup_table[node][i] =
> > 			distance_lookup_table[first_online_node][i];
> > 		found[i]++;
> > 		break;
> > 	}
> > }
> >
> > But do note: We are setting a precedent for node distance between two nodes
> > to change.
> >
> 
> Indeed. AFAICT it's that or the unique-distance-values hook :/

Peter, Valentin, Michael,

Can you please let me know which approach you would want me to follow?

Or do let me know any other alternative solutions that you would want me to
try.


-- 
Thanks and Regards
Srikar Dronamraju


[PATCH] Revert "powerpc: Switch to relative jump labels"

2021-05-27 Thread Roman Bolshakov
This reverts commit b0b3b2c78ec075cec4721986a95abbbac8c3da4f.

Otherwise, direct kernel boot with initramfs no longer works in QEMU.
It's broken in some bizarre way because a valid initramfs is not
recognized anymore:

  Found initrd at 0xc1f70000:0xc3d61d64
  rootfs image is not initramfs (XZ-compressed data is corrupt); looks like an initrd

The issue is observed on v5.13-rc3 if the kernel is built with
defconfig, GCC 7.5.0 and GNU ld 2.32.0.
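
For reference, a direct-boot invocation of the kind that trips this looks
roughly like the following (machine type and options are illustrative,
not the exact reproducer):

  qemu-system-ppc64 -M pseries -nographic \
      -kernel vmlinux -initrd rootfs.cpio.xz \
      -append "root=/dev/ram0 console=hvc0"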

Cc: Christophe Leroy 
Reported-by: Anastasia Kovaleva 
Signed-off-by: Roman Bolshakov 
---
 arch/powerpc/Kconfig  |  1 -
 arch/powerpc/include/asm/jump_label.h | 21 +++--
 arch/powerpc/kernel/jump_label.c  |  4 ++--
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 088dd2afcfe4..59e0d55ee01d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -189,7 +189,6 @@ config PPC
select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
-   select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KFENCE if PPC32
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 2d5c6bec2b4f..09297ec9fa52 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -20,8 +20,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\n\t"
 "nop # arch_static_branch\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".long 1b - ., %l[l_yes] - .\n\t"
-JUMP_ENTRY_TYPE "%c0 - .\n\t"
+JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
 
@@ -35,8 +34,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\n\t"
 "b %l[l_yes] # arch_static_branch_jump\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".long 1b - ., %l[l_yes] - .\n\t"
-JUMP_ENTRY_TYPE "%c0 - .\n\t"
+JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
 
@@ -45,12 +43,23 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
return true;
 }
 
+#ifdef CONFIG_PPC64
+typedef u64 jump_label_t;
+#else
+typedef u32 jump_label_t;
+#endif
+
+struct jump_entry {
+   jump_label_t code;
+   jump_label_t target;
+   jump_label_t key;
+};
+
 #else
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
.pushsection __jump_table, "aw";\
-   .long 1098b - ., LABEL - .; \
-   FTR_ENTRY_LONG KEY; \
+   FTR_ENTRY_LONG 1098b, LABEL, KEY;   \
.popsection
 #endif
 
diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
index ce87dc5ea23c..144858027fa3 100644
--- a/arch/powerpc/kernel/jump_label.c
+++ b/arch/powerpc/kernel/jump_label.c
@@ -11,10 +11,10 @@
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   struct ppc_inst *addr = (struct ppc_inst *)jump_entry_code(entry);
+   struct ppc_inst *addr = (struct ppc_inst *)(unsigned long)entry->code;
 
if (type == JUMP_LABEL_JMP)
-   patch_branch(addr, jump_entry_target(entry), 0);
+   patch_branch(addr, entry->target, 0);
else
patch_instruction(addr, ppc_inst(PPC_INST_NOP));
 }
-- 
2.31.1



Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-27 Thread Paul Moore
On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek  wrote:
>
> Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> lockdown") added an implementation of the locked_down LSM hook to
> SELinux, with the aim to restrict which domains are allowed to perform
> operations that would breach lockdown.
>
> However, in several places the security_locked_down() hook is called in
> situations where the current task isn't doing any action that would
> directly breach lockdown, leading to SELinux checks that are basically
> bogus.
>
> Since in most of these situations converting the callers such that
> security_locked_down() is called in a context where the current task
> would be meaningful for SELinux is impossible or very non-trivial (and
> could lead to TOCTOU issues for the classic Lockdown LSM
> implementation), fix this by modifying the hook to accept a struct cred
> pointer as argument, where NULL will be interpreted as a request for a
> "global", task-independent lockdown decision only. Then modify SELinux
> to ignore calls with cred == NULL.

I'm not overly excited about skipping the access check when cred is
NULL.  Based on the description and the little bit that I've dug into
thus far it looks like using SECINITSID_KERNEL as the subject would be
much more appropriate.  *Something* (the kernel in most of the
relevant cases it looks like) is requesting that a potentially
sensitive disclosure be made, and ignoring it seems like the wrong
thing to do.  Leaving the access control intact also provides a nice
avenue to audit these requests should users want to do that.

Those users that generally don't care can grant kernel_t all the
necessary permissions without much policy.
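
For instance, a single rule along these lines would cover the kernel
domain (a sketch; the lockdown class permissions are "integrity" and
"confidentiality"):

  allow kernel_t self:lockdown { integrity confidentiality };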

> Since most callers will just want to pass current_cred() as the cred
> parameter, rename the hook to security_cred_locked_down() and provide
> the original security_locked_down() function as a simple wrapper around
> the new hook.
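
A minimal sketch of that shape, going by the description above (the
posted patch may differ in detail):

  int security_cred_locked_down(const struct cred *cred,
				enum lockdown_reason what);

  /* Old interface kept for callers acting on behalf of current. */
  static inline int security_locked_down(enum lockdown_reason what)
  {
	return security_cred_locked_down(current_cred(), what);
  }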

I know you and Casey went back and forth on this in v1, but I agree
with Casey that having two LSM hooks here is a mistake.  I know it
makes backports hard, but spoiler alert: maintaining complex software
over any non-trivial period of time is hard, really hard sometimes ;)

> The callers migrated to the new hook, passing NULL as cred:
> 1. arch/powerpc/xmon/xmon.c
>  Here the hook seems to be called from non-task context and is only
>  used for redacting some sensitive values from output sent to
>  userspace.

This definitely sounds like kernel_t based on the description above.

> 2. fs/tracefs/inode.c:tracefs_create_file()
>  Here the call is used to prevent creating new tracefs entries when
>  the kernel is locked down. Assumes that locking down is one-way -
>  i.e. if the hook returns non-zero once, it will never return zero
>  again, thus no point in creating these files.

More kernel_t.

> 3. kernel/trace/bpf_trace.c:bpf_probe_read_kernel{,_str}_common()
>  Called when a BPF program calls a helper that could leak kernel
>  memory. The task context is not relevant here, since the program
>  may very well be run in the context of a different task than the
>  consumer of the data.
>  See: https://bugzilla.redhat.com/show_bug.cgi?id=1955585

The access control check isn't so much who is consuming the data, but
who is requesting a potential violation of a "lockdown", yes?  For
example, the SELinux policy rule for the current lockdown check looks
something like this:

  allow <subject> <subject> : lockdown { <permission> };

It seems to me that the task context is relevant here and performing
the access control check based on the task's domain is correct.  If we
are also concerned about who has access to this sensitive information
once it has been determined that the task can cause it to be sent, we
should have another check point for that, assuming the access isn't
already covered by another check/hook.

> 4. net/xfrm/xfrm_user.c:copy_to_user_*()
>  Here a cryptographic secret is redacted based on the value returned
>  from the hook. There are two possible actions that may lead here:
>  a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> task context is relevant, since the dumped data is sent back to
> the current task.

If the task context is relevant we should use it.

>  b) When deleting an SA via XFRM_MSG_DELSA, the dumped SAs are
> broadcasted to tasks subscribed to XFRM events - here the
> SELinux check is not meaningful as the current task's creds do
> not represent the tasks that could potentially see the secret.

This looks very similar to the BPF hook discussed above, I believe my
comments above apply here as well.

>  It really doesn't seem worth it to try to preserve the check in the
>  a) case ...

After you've read all of the above I hope you can understand why I
disagree with this.

>  ... since the eventual leak can be circumvented anyway via b)

I don't follow the statement 

[PATCH] ASoC: fsl-asoc-card: Set .owner attribute when registering card.

2021-05-27 Thread Nicolas Cavallari
Otherwise, when compiled as a module, a WARN_ON is triggered:

WARNING: CPU: 0 PID: 5 at sound/core/init.c:208 snd_card_new+0x310/0x39c [snd]
[...]
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.39 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events deferred_probe_work_func
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xdc/0x104)
[] (dump_stack) from [] (__warn+0xd8/0x114)
[] (__warn) from [] (warn_slowpath_fmt+0x5c/0xc4)
[] (warn_slowpath_fmt) from [] (snd_card_new+0x310/0x39c [snd])
[] (snd_card_new [snd]) from [] (snd_soc_bind_card+0x334/0x9c4 [snd_soc_core])
[] (snd_soc_bind_card [snd_soc_core]) from [] (devm_snd_soc_register_card+0x30/0x6c [snd_soc_core])
[] (devm_snd_soc_register_card [snd_soc_core]) from [] (fsl_asoc_card_probe+0x550/0xcc8 [snd_soc_fsl_asoc_card])
[] (fsl_asoc_card_probe [snd_soc_fsl_asoc_card]) from [] (platform_drv_probe+0x48/0x98)
[...]

Signed-off-by: Nicolas Cavallari 
---
 sound/soc/fsl/fsl-asoc-card.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index c62bfd1c3ac7..4f55b316cf0f 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -744,6 +744,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
/* Initialize sound card */
priv->pdev = pdev;
	priv->card.dev = &pdev->dev;
+	priv->card.owner = THIS_MODULE;
	ret = snd_soc_of_parse_card_name(&priv->card, "model");
if (ret) {
snprintf(priv->name, sizeof(priv->name), "%s-audio",
-- 
2.32.0.rc0



Re: [PATCH v3 0/4] P2040/P2041 i2c recovery erratum

2021-05-27 Thread Wolfram Sang
On Wed, May 26, 2021 at 11:02:45AM +1000, Michael Ellerman wrote:
> Wolfram Sang  writes:
> > On Wed, May 12, 2021 at 09:20:48AM +1200, Chris Packham wrote:
> >> The P2040/P2041 has an erratum where the i2c recovery scheme
> >> documented in the reference manual (and currently implemented
> >> in the i2c-mpc.c driver) does not work. The errata document
> >> provides an alternative that does work. This series implements
> >> that alternative and uses a property in the devicetree to
> >> decide when the alternative mechanism is needed.
> >
> > The series looks good to me. Usually, I don't take DTS patches. This
> > time I'd make an exception and apply all patches to for-current because
> > this is clearly a bugfix. For that, I'd need an ack from PPC
> > maintainers. Could I have those for patches 2+3?
> 
> Yep, done.

Thanks! Series applied to for-current.





Re: [PATCH] mm: generalize ZONE_[DMA|DMA32]

2021-05-27 Thread Catalin Marinas
On Thu, May 27, 2021 at 10:30:47PM +0800, Kefeng Wang wrote:
> ZONE_[DMA|DMA32] configs have duplicate definitions on platforms
> that subscribe to them. Instead, just make them generic options
> which can be selected on applicable platforms.
> 
> Also, only the x86/arm64 architectures can enable both ZONE_DMA and
> ZONE_DMA32 if EXPERT, so add ARCH_HAS_ZONE_DMA_SET to make the DMA
> zones configurable and visible on those two architectures.
> 
> Cc: Andrew Morton  
> Cc: Catalin Marinas  
> Cc: Will Deacon  
> Cc: Geert Uytterhoeven  
> Cc: Thomas Bogendoerfer  
> Cc: "David S. Miller"  
> Cc: Ingo Molnar  
> Cc: Borislav Petkov  
> Cc: Palmer Dabbelt 
> Cc: Richard Henderson  
> Cc: Russell King 
> Signed-off-by: Kefeng Wang 

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH] mm: generalize ZONE_[DMA|DMA32]

2021-05-27 Thread kernel test robot
Hi Kefeng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on arm64/for-next/core m68k/for-next s390/features 
linus/master v5.13-rc3]
[cannot apply to hnaz-linux-mm/master tip/x86/core sparc-next/master 
sparc/master next-20210527]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Kefeng-Wang/mm-generalize-ZONE_-DMA-DMA32/20210527-222334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/de626453e50ab17adeffd9602e997b4b67b80eb2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kefeng-Wang/mm-generalize-ZONE_-DMA-DMA32/20210527-222334
git checkout de626453e50ab17adeffd9602e997b4b67b80eb2
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/slab.h:15,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/gfp.h:433:2: error: #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
  433 | #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
      |  ^
   include/linux/gfp.h: In function 'gfp_zone':
>> include/linux/gfp.h:444:20: warning: left shift count >= width of type [-Wshift-count-overflow]
  444 |  | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
      |                    ^~
   include/linux/gfp.h:469:7: note: in expansion of macro 'GFP_ZONE_TABLE'
  469 |  z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
      |       ^~
--
   In file included from include/linux/slab.h:15,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/gfp.h:433:2: error: #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
  433 | #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
      |  ^
   include/linux/gfp.h: In function 'gfp_zone':
>> include/linux/gfp.h:444:20: warning: left shift count >= width of type [-Wshift-count-overflow]
  444 |  | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
      |                    ^~
   include/linux/gfp.h:469:7: note: in expansion of macro 'GFP_ZONE_TABLE'
  469 |  z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
      |       ^~
   make[2]: *** [scripts/Makefile.build:117: arch/x86/kernel/asm-offsets.s] 
Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1227: prepare0] Error 2
   make[1]: Target 'modules_prepare' not remade because of errors.
   make: *** [Makefile:215: __sub-make] Error 2
   make: Target 'modules_prepare' not remade because of errors.
--
   In file included from include/linux/slab.h:15,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/gfp.h:433:2: error: #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
  433 | #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
      |  ^
   include/linux/gfp.h: In function 'gfp_zone':
>> include/linux/gfp.h:444:20: warning: left shift count >= width of type [-Wshift-count-overflow]
  444 |  | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
      |                    ^~
   include/linux/gfp.h:469:7: note: in expansion of macro 'GFP_ZONE_TABLE'
  469 |  z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
      |       ^~
   make[2]: *** [scripts/Makefile.build:117: arch/x86/kernel/asm-offsets.s] 
Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1227: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:215: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +433 include/linux/gfp.h

b11a7b94100cba Dan Williams  2016-03-17  431  
b11a7b94100cba Dan Williams  2016-03-17  432  #if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
b11a7b94100cba Dan Williams  2016-03-17 @433  #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
b70d94ee438b3f Christoph Lameter 2009-06-16  434  #endif
b70d94ee438b3f Ch
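
For context (my reading, not part of the robot report): GFP_ZONE_TABLE
packs one zone number per GFP flag combination into a single integer,
which takes 16 * GFP_ZONES_SHIFT bits. With both ZONE_DMA and ZONE_DMA32
enabled on i386, ZONES_SHIFT becomes 3, and 16 * 3 = 48 > 32 ==
BITS_PER_LONG, which trips the #error at gfp.h:433 above.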

Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-27 Thread Tom Lendacky
On 5/27/21 9:41 AM, Tom Lendacky wrote:
> On 5/27/21 8:02 AM, Christoph Hellwig wrote:
>> On Wed, May 19, 2021 at 11:50:07AM -0700, Florian Fainelli wrote:
>>> You convert this call site with swiotlb_init_io_tlb_mem() which did not
>>> do the set_memory_decrypted()+memset(). Is this okay or should
>>> swiotlb_init_io_tlb_mem() add an additional argument to do this
>>> conditionally?
>>
>> The zeroing is useful and was missing before.  I think having a clean
>> state here is the right thing.
>>
>> Not sure about the set_memory_decrypted, swiotlb_update_mem_attributes
>> kinda suggests it is too early to set the memory decrypted.
>>
>> Adding Tom who should know about all this.
> 
> The reason for adding swiotlb_update_mem_attributes() was because having
> the call to set_memory_decrypted() in swiotlb_init_with_tbl() triggered a
> BUG_ON() related to interrupts not being enabled yet during boot. So that
> call had to be delayed until interrupts were enabled.

I pulled down and tested the patch set and booted with SME enabled. The
following was seen during the boot:

[0.134184] BUG: Bad page state in process swapper  pfn:108002
[0.134196] page:(ptrval) refcount:0 mapcount:-128 mapping: index:0x0 pfn:0x108002
[0.134201] flags: 0x17c000(node=0|zone=2|lastcpupid=0x1f)
[0.134208] raw: 0017c000 88847f355e28 88847f355e28
[0.134210] raw:  0001 ff7f
[0.134212] page dumped because: nonzero mapcount
[0.134213] Modules linked in:
[0.134218] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc2-sos-custom #3
[0.134221] Hardware name: ...
[0.134224] Call Trace:
[0.134233]  dump_stack+0x76/0x94
[0.134244]  bad_page+0xa6/0xf0
[0.134252]  __free_pages_ok+0x331/0x360
[0.134256]  memblock_free_all+0x158/0x1c1
[0.134267]  mem_init+0x1f/0x14c
[0.134273]  start_kernel+0x290/0x574
[0.134279]  secondary_startup_64_no_verify+0xb0/0xbb

I see this about 40 times during the boot, each with a different PFN. The
system boots (which seemed odd), but I don't know if there will be side
effects to this (I didn't stress the system).

I modified the code to add a flag to not do the set_memory_decrypted(), as
suggested by Florian, when invoked from swiotlb_init_with_tbl(), and that
eliminated the bad page state BUG.
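
A sketch of such a flag (names and placement are illustrative, not the
exact diff I tested):

  static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
				      unsigned long nslabs, bool late_alloc,
				      bool decrypt)
  {
	void *vaddr = phys_to_virt(start);
	unsigned long bytes = nslabs << IO_TLB_SHIFT;

	/* ... rest of the init as in the patch ... */

	/*
	 * Boot-time callers pass decrypt=false and rely on
	 * swiotlb_update_mem_attributes() to decrypt the pool later,
	 * once interrupts are enabled.
	 */
	if (decrypt)
		set_memory_decrypted((unsigned long)vaddr,
				     bytes >> PAGE_SHIFT);
	memset(vaddr, 0, bytes);
  }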

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>


Re: [PATCH] docs: kernel-parameters: mark numa=off is supported by a bundle of architectures

2021-05-27 Thread Jonathan Corbet
Barry Song  writes:

> risc-v and arm64 support numa=off via the common arch_numa_init()
> in drivers/base/arch_numa.c; x86, ppc, mips and sparc support it
> via arch-level early_param handlers.
> numa=off is widely used in Linux distributions, so it is better
> to document it.
>
> Signed-off-by: Barry Song 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 3 +++
>  1 file changed, 3 insertions(+)

Applied, thanks.

jon


Re: [PATCH -next 2/3] xen: balloon: Replaced simple_strtoull() with kstrtoull()

2021-05-27 Thread Dan Carpenter
On Thu, May 27, 2021 at 02:10:21PM +, David Laight wrote:
> From: Chen Huang
> > Sent: 26 May 2021 10:20
> > 
> > The simple_strtoull() function is deprecated in some situations, since
> > it does not check for range overflow; use kstrtoull() instead.
> > 
> ...
> > -	target_bytes = simple_strtoull(buf, &endchar, 0) * 1024;
> > +	ret = kstrtoull(buf, 0, &target_bytes);
> > +   if (ret)
> > +   return ret;
> > +   target_bytes *= 1024;
> 
> I'd have thought it was more important to check *endchar
> than overflow.

That's one of the differences between simple_strtoull() and kstrtoull().
The simple_strtoull() will accept a string like "123ABC", but kstrtoull()
will only accept NUL terminated numbers or a newline followed by a NUL
terminator.  Which is fine in this context because users will be doing
"echo 1234 > /sys/foo".

> If you are worried about overflow you need a range check
> before the multiply.

This is probably a case where if the users cause an integer overflow
then they get what they deserve.

regards,
dan carpenter


Re: [PATCH] mm: generalize ZONE_[DMA|DMA32]

2021-05-27 Thread Mike Rapoport
On Thu, May 27, 2021 at 10:30:47PM +0800, Kefeng Wang wrote:
> ZONE_[DMA|DMA32] configs have duplicate definitions on platforms
> that subscribe to them. Instead, just make them generic options
> which can be selected on applicable platforms.
> 
> Also, only the x86/arm64 architectures can enable both ZONE_DMA and
> ZONE_DMA32 if EXPERT, so add ARCH_HAS_ZONE_DMA_SET to make the DMA
> zones configurable and visible on those two architectures.
> 
> Cc: Andrew Morton  
> Cc: Catalin Marinas  
> Cc: Will Deacon  
> Cc: Geert Uytterhoeven  
> Cc: Thomas Bogendoerfer  
> Cc: "David S. Miller"  
> Cc: Ingo Molnar  
> Cc: Borislav Petkov  
> Cc: Palmer Dabbelt 
> Cc: Richard Henderson  
> Cc: Russell King 
> Signed-off-by: Kefeng Wang 

Acked-by: Mike Rapoport 

> ---
>  arch/alpha/Kconfig |  5 +
>  arch/arm/Kconfig   |  3 ---
>  arch/arm64/Kconfig |  9 +
>  arch/ia64/Kconfig  |  4 +---
>  arch/m68k/Kconfig  |  5 +
>  arch/microblaze/Kconfig|  4 +---
>  arch/mips/Kconfig  |  7 ---
>  arch/powerpc/Kconfig   |  4 
>  arch/powerpc/platforms/Kconfig.cputype |  1 +
>  arch/riscv/Kconfig |  5 +
>  arch/s390/Kconfig  |  4 +---
>  arch/sparc/Kconfig |  5 +
>  arch/x86/Kconfig   | 15 ++-
>  mm/Kconfig | 11 +++
>  14 files changed, 22 insertions(+), 60 deletions(-)
> 
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index 5998106faa60..6a69a14c4825 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -40,6 +40,7 @@ config ALPHA
>   select MMU_GATHER_NO_RANGE
>   select SET_FS
>   select SPARSEMEM_EXTREME if SPARSEMEM
> + select ZONE_DMA
>   help
> The Alpha is a 64-bit general-purpose processor designed and
> marketed by the Digital Equipment Corporation of blessed memory,
> @@ -65,10 +66,6 @@ config GENERIC_CALIBRATE_DELAY
>   bool
>   default y
>  
> -config ZONE_DMA
> - bool
> - default y
> -
>  config GENERIC_ISA_DMA
>   bool
>   default y
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 24804f11302d..000c3f80b58e 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -218,9 +218,6 @@ config GENERIC_CALIBRATE_DELAY
>  config ARCH_MAY_HAVE_PC_FDC
>   bool
>  
> -config ZONE_DMA
> - bool
> -
>  config ARCH_SUPPORTS_UPROBES
>   def_bool y
>  
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9f1d8566bbf9..42794474f37f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -42,6 +42,7 @@ config ARM64
>   select ARCH_HAS_SYSCALL_WRAPPER
>   select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> + select ARCH_HAS_ZONE_DMA_SET if EXPERT
>   select ARCH_HAVE_ELF_PROT
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_INLINE_READ_LOCK if !PREEMPTION
> @@ -307,14 +308,6 @@ config GENERIC_CSUM
>  config GENERIC_CALIBRATE_DELAY
>   def_bool y
>  
> -config ZONE_DMA
> - bool "Support DMA zone" if EXPERT
> - default y
> -
> -config ZONE_DMA32
> - bool "Support DMA32 zone" if EXPERT
> - default y
> -
>  config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   def_bool y
>  
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 279252e3e0f7..fd8503a0088a 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -60,6 +60,7 @@ config IA64
>   select NUMA if !FLATMEM
>   select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
>   select SET_FS
> + select ZONE_DMA32
>   default y
>   help
> The Itanium Processor Family is Intel's 64-bit successor to
> @@ -72,9 +73,6 @@ config 64BIT
>   select ATA_NONSTANDARD if ATA
>   default y
>  
> -config ZONE_DMA32
> - def_bool y
> -
>  config MMU
>   bool
>   default y
> diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
> index 372e4e69c43a..05a729c6ad7f 100644
> --- a/arch/m68k/Kconfig
> +++ b/arch/m68k/Kconfig
> @@ -34,6 +34,7 @@ config M68K
>   select SET_FS
>   select UACCESS_MEMCPY if !MMU
>   select VIRT_TO_BUS
> + select ZONE_DMA
>  
>  config CPU_BIG_ENDIAN
>   def_bool y
> @@ -62,10 +63,6 @@ config TIME_LOW_RES
>  config NO_IOPORT_MAP
>   def_bool y
>  
> -config ZONE_DMA
> - bool
> - default y
> -
>  config HZ
>   int
>   default 1000 if CLEOPATRA
> diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
> index 0660f47012bc..14a67a42fcae 100644
> --- a/arch/microblaze/Kconfig
> +++ b/arch/microblaze/Kconfig
> @@ -43,6 +43,7 @@ config MICROBLAZE
>   select MMU_GATHER_NO_RANGE
>   select SPARSE_IRQ
>   select SET_FS
> + select ZONE_DMA
>  
>  # Endianness selection
>  choice
> @@ -60,9 +61,6 @@ config CPU_LITTLE_ENDIAN
>  
>  endchoice
>  
> -config 

Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-27 Thread Tom Lendacky
On 5/27/21 8:02 AM, Christoph Hellwig wrote:
> On Wed, May 19, 2021 at 11:50:07AM -0700, Florian Fainelli wrote:
>> You convert this call site with swiotlb_init_io_tlb_mem() which did not
>> do the set_memory_decrypted()+memset(). Is this okay or should
>> swiotlb_init_io_tlb_mem() add an additional argument to do this
>> conditionally?
> 
> The zeroing is useful and was missing before.  I think having a clean
> state here is the right thing.
> 
> Not sure about the set_memory_decrypted, swiotlb_update_mem_attributes
> kinda suggests it is too early to set the memory decrypted.
> 
> Adding Tom who should know about all this.

The reason for adding swiotlb_update_mem_attributes() was because having
the call to set_memory_decrypted() in swiotlb_init_with_tbl() triggered a
BUG_ON() related to interrupts not being enabled yet during boot. So that
call had to be delayed until interrupts were enabled.

Thanks,
Tom

> 


[PATCH] mm: generalize ZONE_[DMA|DMA32]

2021-05-27 Thread Kefeng Wang
ZONE_[DMA|DMA32] configs have duplicate definitions on platforms
that subscribe to them. Instead, just make them generic options
which can be selected on applicable platforms.

Also, only the x86/arm64 architectures can enable both ZONE_DMA and
ZONE_DMA32 if EXPERT, so add ARCH_HAS_ZONE_DMA_SET to make the DMA
zones configurable and visible on those two architectures.

Cc: Andrew Morton  
Cc: Catalin Marinas  
Cc: Will Deacon  
Cc: Geert Uytterhoeven  
Cc: Thomas Bogendoerfer  
Cc: "David S. Miller"  
Cc: Ingo Molnar  
Cc: Borislav Petkov  
Cc: Palmer Dabbelt 
Cc: Richard Henderson  
Cc: Russell King 
Signed-off-by: Kefeng Wang 
---
 arch/alpha/Kconfig |  5 +
 arch/arm/Kconfig   |  3 ---
 arch/arm64/Kconfig |  9 +
 arch/ia64/Kconfig  |  4 +---
 arch/m68k/Kconfig  |  5 +
 arch/microblaze/Kconfig|  4 +---
 arch/mips/Kconfig  |  7 ---
 arch/powerpc/Kconfig   |  4 
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 arch/riscv/Kconfig |  5 +
 arch/s390/Kconfig  |  4 +---
 arch/sparc/Kconfig |  5 +
 arch/x86/Kconfig   | 15 ++-
 mm/Kconfig | 11 +++
 14 files changed, 22 insertions(+), 60 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 5998106faa60..6a69a14c4825 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -40,6 +40,7 @@ config ALPHA
select MMU_GATHER_NO_RANGE
select SET_FS
select SPARSEMEM_EXTREME if SPARSEMEM
+   select ZONE_DMA
help
  The Alpha is a 64-bit general-purpose processor designed and
  marketed by the Digital Equipment Corporation of blessed memory,
@@ -65,10 +66,6 @@ config GENERIC_CALIBRATE_DELAY
bool
default y
 
-config ZONE_DMA
-   bool
-   default y
-
 config GENERIC_ISA_DMA
bool
default y
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 24804f11302d..000c3f80b58e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -218,9 +218,6 @@ config GENERIC_CALIBRATE_DELAY
 config ARCH_MAY_HAVE_PC_FDC
bool
 
-config ZONE_DMA
-   bool
-
 config ARCH_SUPPORTS_UPROBES
def_bool y
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9f1d8566bbf9..42794474f37f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -42,6 +42,7 @@ config ARM64
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
+   select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_ELF_PROT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK if !PREEMPTION
@@ -307,14 +308,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
def_bool y
 
-config ZONE_DMA
-   bool "Support DMA zone" if EXPERT
-   default y
-
-config ZONE_DMA32
-   bool "Support DMA32 zone" if EXPERT
-   default y
-
 config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 279252e3e0f7..fd8503a0088a 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -60,6 +60,7 @@ config IA64
select NUMA if !FLATMEM
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
select SET_FS
+   select ZONE_DMA32
default y
help
  The Itanium Processor Family is Intel's 64-bit successor to
@@ -72,9 +73,6 @@ config 64BIT
select ATA_NONSTANDARD if ATA
default y
 
-config ZONE_DMA32
-   def_bool y
-
 config MMU
bool
default y
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 372e4e69c43a..05a729c6ad7f 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -34,6 +34,7 @@ config M68K
select SET_FS
select UACCESS_MEMCPY if !MMU
select VIRT_TO_BUS
+   select ZONE_DMA
 
 config CPU_BIG_ENDIAN
def_bool y
@@ -62,10 +63,6 @@ config TIME_LOW_RES
 config NO_IOPORT_MAP
def_bool y
 
-config ZONE_DMA
-   bool
-   default y
-
 config HZ
int
default 1000 if CLEOPATRA
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 0660f47012bc..14a67a42fcae 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -43,6 +43,7 @@ config MICROBLAZE
select MMU_GATHER_NO_RANGE
select SPARSE_IRQ
select SET_FS
+   select ZONE_DMA
 
 # Endianness selection
 choice
@@ -60,9 +61,6 @@ config CPU_LITTLE_ENDIAN
 
 endchoice
 
-config ZONE_DMA
-   def_bool y
-
 config ARCH_HAS_ILOG2_U32
def_bool n
 
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index ed51970c08e7..430d5324f1af 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3277,13 +3277,6 @@ config I8253
select CLKSRC_I8253
select 

Re: [PATCH] mm: generalize ZONE_[DMA|DMA32]

2021-05-27 Thread Geert Uytterhoeven
On Thu, May 27, 2021 at 4:21 PM Kefeng Wang  wrote:
> ZONE_[DMA|DMA32] configs have duplicate definitions on platforms
> that subscribe to them. Instead, just make them generic options
> which can be selected on applicable platforms.
>
> Also, only the x86/arm64 architectures can enable both ZONE_DMA and
> ZONE_DMA32 if EXPERT, so add ARCH_HAS_ZONE_DMA_SET to make the DMA
> zones configurable and visible on those two architectures.
>
> Cc: Andrew Morton 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Geert Uytterhoeven 
> Cc: Thomas Bogendoerfer 
> Cc: "David S. Miller" 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Palmer Dabbelt 
> Cc: Richard Henderson 
> Cc: Russell King 
> Signed-off-by: Kefeng Wang 

>  arch/m68k/Kconfig  |  5 +

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-27 Thread Paul Moore
On Thu, May 27, 2021 at 12:33 AM James Morris  wrote:
> On Wed, 26 May 2021, Ondrej Mosnacek wrote:
>
> > Thanks, Michael!
> >
> > James/Paul, is there anything blocking this patch from being merged?
> > Especially the BPF case is causing real trouble for people and the
> > only workaround is to broadly allow lockdown::confidentiality in the
> > policy.
>
> It would be good to see more signoffs/reviews, especially from Paul, but
> he is busy with the io_uring stuff.

Yes, it's been a busy week with various things going on around here.
I looked at the v1 posting but haven't had a chance yet to look at v2;
I promise to get to it today, but it might not happen until later
tonight.

> Let's see if anyone else can look at this in the next couple of days.

-- 
paul moore
www.paul-moore.com


RE: [PATCH -next 2/3] xen: balloon: Replaced simple_strtoull() with kstrtoull()

2021-05-27 Thread David Laight
From: Chen Huang
> Sent: 26 May 2021 10:20
> 
> The simple_strtoull() function is deprecated in some situations, since
> it does not check for range overflow; use kstrtoull() instead.
> 
...
> -	target_bytes = simple_strtoull(buf, &endchar, 0) * 1024;
> +	ret = kstrtoull(buf, 0, &target_bytes);
> + if (ret)
> + return ret;
> + target_bytes *= 1024;

I'd have thought it was more important to check *endchar
than overflow.
If you are worried about overflow you need a range check
before the multiply.
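
E.g. something like (a sketch):

	if (target_bytes > ULLONG_MAX / 1024)
		return -ERANGE;
	target_bytes *= 1024;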

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



Re: [PATCH v8 00/15] Restricted DMA

2021-05-27 Thread Christoph Hellwig
I just finished reviewing v7, sorry.  Let me find some time to see what
difference this version makes.


Re: [PATCH v7 13/15] dma-direct: Allocate memory from restricted DMA pool if available

2021-05-27 Thread Christoph Hellwig
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> + if (swiotlb_free(dev, page, size))
> + return;
> +#endif

Please avoid the ifdefs by either stubbing out the function to be a no-op
or by using IS_ENABLED.

> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> + page = swiotlb_alloc(dev, size);
> + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> + __dma_direct_free_pages(dev, page, size);
> + page = NULL;
> + }
> +#endif

Same here, for the stub it would just return NULL.
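
I.e. a stub along these lines in swiotlb.h (a sketch):

#ifdef CONFIG_DMA_RESTRICTED_POOL
struct page *swiotlb_alloc(struct device *dev, size_t size);
bool swiotlb_free(struct device *dev, struct page *page, size_t size);
#else
static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
{
	return NULL;
}
static inline bool swiotlb_free(struct device *dev, struct page *page,
				size_t size)
{
	return false;
}
#endif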


Re: [PATCH v7 07/15] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-05-27 Thread Christoph Hellwig
> + if (is_swiotlb_active(NULL)) {

Passing a NULL argument to this doesn't make sense.  They all should have
a struct device at hand, you'll just need to dig for it.

And this function should be about to go away anyway, but until then we
need to do this properly.


Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-27 Thread Christoph Hellwig
I'd still much prefer to always have the pointer in struct device.
Especially as we're also looking into things like a global 64-bit bounce
buffer.  Something like this untested patch ontop of your series:


diff --git a/drivers/base/core.c b/drivers/base/core.c
index 628e33939aca..3cb95fa29f70 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include <linux/dma-map-ops.h> /* for dma_default_coherent */
+#include <linux/swiotlb.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -2814,6 +2815,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+	dev->dma_io_tlb_mem = &io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 4987608ea4ff..6aca6fa0facc 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,7 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -523,7 +523,7 @@ struct device {
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
 #endif
-#ifdef CONFIG_DMA_RESTRICTED_POOL
+#ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index e8cf49bd90c5..c153cd0c469c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -95,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_swiotlb;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -103,30 +104,16 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
-{
-#ifdef CONFIG_DMA_RESTRICTED_POOL
-   if (dev && dev->dma_io_tlb_mem)
-   return dev->dma_io_tlb_mem;
-#endif /* CONFIG_DMA_RESTRICTED_POOL */
-
-   return io_tlb_default_mem;
-}
-
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
 static inline bool is_dev_swiotlb_force(struct device *dev)
 {
-#ifdef CONFIG_DMA_RESTRICTED_POOL
-   if (dev->dma_io_tlb_mem)
-   return true;
-#endif /* CONFIG_DMA_RESTRICTED_POOL */
-   return false;
+   return dev->dma_io_tlb_mem->force_swiotlb;
 }
 
 void __init swiotlb_exit(void);
@@ -134,10 +121,8 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
-#ifdef CONFIG_DMA_RESTRICTED_POOL
 struct page *swiotlb_alloc(struct device *dev, size_t size);
 bool swiotlb_free(struct device *dev, struct page *page, size_t size);
-#endif /* CONFIG_DMA_RESTRICTED_POOL */
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3fa4669229b..69d62e18f493 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -347,7 +347,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -429,7 +429,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -510,7 +510,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, 

Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-27 Thread Christoph Hellwig
On Mon, May 24, 2021 at 11:49:34AM -0400, Konrad Rzeszutek Wilk wrote:
> rmem_swiotlb_setup
> 
> ?
> 
> Which is ARM specific and inside the generic code?

I don't think it is arm specific at all.  It is OF specific, but just
about every platform but x86 uses OF.  And I can think of an ACPI version
of this as well.


Re: [PATCH v7 03/15] swiotlb: Add DMA_RESTRICTED_POOL

2021-05-27 Thread Christoph Hellwig
On Tue, May 18, 2021 at 02:42:03PM +0800, Claire Chang wrote:
> Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pool.

Please merge this with the actual code that is getting added.


Re: [PATCH v7 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-27 Thread Christoph Hellwig
On Tue, May 18, 2021 at 02:42:02PM +0800, Claire Chang wrote:
>  struct io_tlb_mem *io_tlb_default_mem;
> +static struct dentry *debugfs_dir;
>  
>  /*
>   * Max segment that we can provide which (if pages are contingous) will
> @@ -662,18 +663,30 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
>  
>  #ifdef CONFIG_DEBUG_FS
>  
> +static void swiotlb_create_debugfs(struct io_tlb_mem *mem, const char *name)
>  {
>   if (!mem)
> + return;

I don't think this check makes much sense here.

> +}
> +
> +static int __init swiotlb_create_default_debugfs(void)
> +{
> + struct io_tlb_mem *mem = io_tlb_default_mem;
> +
> + if (mem) {
> + swiotlb_create_debugfs(mem, "swiotlb");
> + debugfs_dir = mem->debugfs;
> + } else {
> + debugfs_dir = debugfs_create_dir("swiotlb", NULL);
> + }

This also looks rather strange.  I'd much rather move the directory
creation out of swiotlb_create_debugfs.  E.g. something like:

static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
{
	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
}

static int __init swiotlb_init_debugfs(void)
{
debugfs_dir = debugfs_create_dir("swiotlb", NULL);
if (io_tlb_default_mem) {
io_tlb_default_mem->debugfs = debugfs_dir;
swiotlb_create_debugfs_files(io_tlb_default_mem);
}
return 0;
}
late_initcall(swiotlb_init_debugfs);

...

static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
{
...
mem->debugfs = debugfs_create_dir(rmem->name, debugfs_dir);
	swiotlb_create_debugfs_files(mem);


}


[PATCH v8 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-27 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..46804f24df05 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@72000000 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and
-one for multimedia processing (named multimedia-memory@77000000, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB),
+one for multimedia processing (named multimedia-memory@77000000, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		compatible = "acme,multimedia-memory";
 		reg = <0x77000000 0x4000000>;
 	};
+
+	restricted_dma_reserved: restricted_dma_reserved {
+		compatible = "restricted-dma-pool";
+		reg = <0x50000000 0x4000000>;
+	};
 };
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		memory-region = <&multimedia_reserved>;
 		/* ... */
 	};
+
+	pcie_device: pcie_device@0,0 {
+		reg = <0x83010000 0x0 0x00000000 0x0 0x00100000
+		       0x83010000 0x0 0x00100000 0x0 0x00100000>;
+		memory-region = <&restricted_dma_mem_reserved>;
+		/* ... */
+	};
 };
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 15/15] of: Add plumbing for restricted DMA pool

2021-05-27 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up restricted DMA when a restricted-dma-pool region is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index aca94c348bd4..6cc7eaaf7e11 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1112,6 +1113,38 @@ bool of_dma_is_coherent(struct device_node *np)
 }
 EXPORT_SYMBOL_GPL(of_dma_is_coherent);
 
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
+
 /**
  * of_mmio_is_nonposted - Check if device uses non-posted MMIO
  * @np:device node
diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..2defdca418ec 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d717efbd637d..8fde97565d11 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 13/15] dma-direct: Allocate memory from restricted DMA pool if available

2021-05-27 Thread Claire Chang
The restricted DMA pool is preferred if available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
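
For illustration, such a pairing could look like this in the device
tree (node names and addresses are made up):

	reserved-memory {
		restricted_dma: restricted-dma@50000000 {
			compatible = "restricted-dma-pool";
			reg = <0x50000000 0x400000>;
		};

		coherent_dma: coherent-dma@54000000 {
			compatible = "shared-dma-pool";
			reg = <0x54000000 0x100000>;
			no-map;
		};
	};

	dev@10000000 {
		memory-region = <&restricted_dma>, <&coherent_dma>;
	};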

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index eb4098323bbc..0d521f78c7b9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (swiotlb_free(dev, page, size))
+   return;
+#endif
dma_free_contiguous(dev, page, size);
 }
 
@@ -92,7 +96,17 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
					   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   page = NULL;
+   }
+#endif
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -148,7 +162,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -161,18 +175,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-	     (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -253,15 +272,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev)) {
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -289,7 +308,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
void *ret;
 
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, 

[PATCH v8 12/15] swiotlb: Add restricted DMA alloc/free support.

2021-05-27 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 35 +--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0c5a18d9cf89..e8cf49bd90c5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -134,6 +134,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c4fc2e444e7a..648bfdde4b0c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -457,8 +457,9 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
 
index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
do {
-   if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
-   (orig_addr & iotlb_align_mask)) {
+   if (orig_addr &&
+   (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+   (orig_addr & iotlb_align_mask)) {
index = wrap_index(mem, index + 1);
continue;
}
@@ -704,6 +705,36 @@ late_initcall(swiotlb_create_default_debugfs);
 #endif
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   phys_addr_t tlb_addr;
+   int index;
+
+   if (!mem)
+   return NULL;
+
+   index = find_slots(dev, 0, size);
+   if (index == -1)
+   return NULL;
+
+   tlb_addr = slot_addr(mem->start, index);
+
+   return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+   phys_addr_t tlb_addr = page_to_phys(page);
+
+   if (!is_swiotlb_buffer(dev, tlb_addr))
+   return false;
+
+   release_slots(dev, tlb_addr);
+
+   return true;
+}
+
 static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
 {
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 11/15] dma-direct: Add a new wrapper __dma_direct_free_pages()

2021-05-27 Thread Claire Chang
Add a new wrapper __dma_direct_free_pages() that will be useful later
for swiotlb_free().

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 078f7087e466..eb4098323bbc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,12 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -237,7 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -273,7 +279,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
 
-   dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+   __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
 struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -310,7 +316,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -329,7 +335,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
 
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
 }
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-- 
2.31.1.818.g46aad6cb9e-goog



Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-27 Thread Christoph Hellwig
On Wed, May 19, 2021 at 11:50:07AM -0700, Florian Fainelli wrote:
> You convert this call site with swiotlb_init_io_tlb_mem() which did not
> do the set_memory_decrypted()+memset(). Is this okay or should
> swiotlb_init_io_tlb_mem() add an additional argument to do this
> conditionally?

The zeroing is useful and was missing before.  I think having a clean
state here is the right thing.

Not sure about the set_memory_decrypted, swiotlb_update_mem_attributes
kinda suggests it is too early to set the memory decrypted.

Adding Tom who should know about all this.


Re: [PATCH v7 00/15] Restricted DMA

2021-05-27 Thread Claire Chang
v8 here: https://lore.kernel.org/patchwork/cover/1437112/


[PATCH v8 10/15] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-05-27 Thread Claire Chang
Add a new function, release_slots, to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.
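
With this split, the two unmap-side paths pair up as follows (sketch):

	/* Streaming unmap: sync back to the original buffer, then release. */
	swiotlb_tbl_unmap_single(dev, tlb_addr, size, dir, attrs);

	/*
	 * Coherent free (swiotlb_free, added later in this series):
	 * release only, no sync needed.
	 */
	release_slots(dev, tlb_addr);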

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 88b3471ac6a8..c4fc2e444e7a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -550,27 +550,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -605,6 +593,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 09/15] swiotlb: Move alloc_size to find_slots

2021-05-27 Thread Claire Chang
Move the maintenance of alloc_size to find_slots for better code
reusability later.
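
This lets the release path recover an allocation's size purely from the
slot metadata, e.g. (as the refactored unmap path in the next patch
does):

	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
	int nslots = nr_slots(mem->slots[index].alloc_size + offset);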

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fa7f23fffc81..88b3471ac6a8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -482,8 +482,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -538,11 +541,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 08/15] swiotlb: Bounce data from/to restricted DMA pool if available

2021-05-27 Thread Claire Chang
Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that is_dev_swiotlb_force doesn't check if
swiotlb_force == SWIOTLB_FORCE. Otherwise the memory allocation behavior
with default swiotlb will be changed by the following patch
("dma-direct: Allocate memory from restricted DMA pool if available").

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 13 +
 kernel/dma/direct.c |  3 ++-
 kernel/dma/direct.h |  3 ++-
 kernel/dma/swiotlb.c|  8 
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index c530c976d18b..0c5a18d9cf89 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -120,6 +120,15 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev->dma_io_tlb_mem)
+   return true;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+   return false;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -131,6 +140,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..078f7087e466 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,8 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
+is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..f94813674e23 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
+   is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b2b6503ecd88..fa7f23fffc81 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -347,7 +347,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -429,7 +429,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -506,7 +506,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -557,7 +557,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
unsigned long flags;
	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);

[PATCH v8 07/15] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-05-27 Thread Claire Chang
Update is_swiotlb_active to take a struct device argument. This will be
useful later to allow for the restricted DMA pool. Call sites that have
no suitable device at hand (i915, nouveau, xen-pcifront below) pass
NULL, which resolves to the default swiotlb pool and preserves the
existing behaviour.

Signed-off-by: Claire Chang 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..7d48c433446b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(NULL)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 65430912ff72..d0e998b9e2e8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -270,7 +270,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(NULL);
 #endif
 
	ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..6d548ce53ce7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
	spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(NULL)) {
err = pci_xen_swiotlb_init_late();
if (err)
	dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2a6cca07540b..c530c976d18b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -123,7 +123,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -143,7 +143,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d99b403144a8..b2b6503ecd88 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -662,9 +662,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return get_io_tlb_mem(dev) != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 06/15] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-05-27 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  6 +++---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..a5df35bfd150 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -832,7 +832,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..0c4fb34f11ab 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b469f04cca26..2a6cca07540b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -113,9 +113,9 @@ static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
return io_tlb_default_mem;
 }
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -127,7 +127,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,
   dir);
 
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,

[PATCH v8 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-27 Thread Claire Chang
Add a new getter, get_io_tlb_mem, to help select the io_tlb_mem struct.
The restricted DMA pool is preferred if available.

This was done with a getter instead of assigning the active pool
directly to dev->dma_io_tlb_mem because doing so might cause memory
allocation issues for existing devices: the restricted pool can't
support atomic coherent allocation, so swiotlb_alloc needs to be able
to distinguish it from the default swiotlb pool.
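
To illustrate the distinction (sketch; dev->dma_io_tlb_mem exists only
under CONFIG_DMA_RESTRICTED_POOL):

	/* Streaming-DMA paths: fall back to the default pool transparently. */
	struct io_tlb_mem *stream_mem = get_io_tlb_mem(dev);

	/*
	 * Coherent allocation (swiotlb_alloc, added later in this series)
	 * must never fall back to the default pool, so it checks the device
	 * pointer directly instead of using this getter.
	 */
	struct io_tlb_mem *alloc_mem = dev->dma_io_tlb_mem;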

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 03ad6e3b4056..b469f04cca26 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include <linux/device.h>
 #include <linux/dma-direction.h>
 #include <linux/init.h>
 #include <linux/types.h>
@@ -102,6 +103,16 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
+static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev && dev->dma_io_tlb_mem)
+   return dev->dma_io_tlb_mem;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
+   return io_tlb_default_mem;
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
struct io_tlb_mem *mem = io_tlb_default_mem;
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-27 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.
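
The sharing semantics are roughly as follows (sketch; these hooks are
invoked by the reserved-memory core as devices are bound, not called
directly):

	/*
	 * First device bound to the node: rmem->priv is NULL, so the
	 * io_tlb_mem struct is allocated and initialized here.
	 */
	rmem_swiotlb_device_init(rmem, dev_a);

	/*
	 * Second device referencing the same node: rmem->priv is reused,
	 * so dev_a and dev_b share one restricted pool.
	 */
	rmem_swiotlb_device_init(rmem, dev_b);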

Signed-off-by: Claire Chang 
---
 include/linux/device.h  |  4 +++
 include/linux/swiotlb.h |  3 +-
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 959cb9d2c9ab..e78e1ce0b1b1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -521,6 +522,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..03ad6e3b4056 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b849b01a446f..d99b403144a8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
+#endif
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -690,3 +697,72 @@ static int __init swiotlb_create_default_debugfs(void)
 late_initcall(swiotlb_create_default_debugfs);
 
 #endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   if (dev->dma_io_tlb_mem)
+   return 0;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS))
+   swiotlb_create_debugfs(mem, rmem->name);
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   if (dev)
+   dev->dma_io_tlb_mem = NULL;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return -EINVAL;
+   }
+
+   rmem->ops = &rmem_swiotlb_ops;
+   pr_info("Reserved memory: created restricted DMA pool at %pa, size %ld 
MiB\n",
+   >base, (unsigned long)rmem->size / SZ_1M);
+   return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 03/15] swiotlb: Add DMA_RESTRICTED_POOL

2021-05-27 Thread Claire Chang
Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/Kconfig | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-27 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3232fc19385..b849b01a446f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -64,6 +64,7 @@
 enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem *io_tlb_default_mem;
+static struct dentry *debugfs_dir;
 
 /*
 * Max segment that we can provide which (if pages are contiguous) will
@@ -662,18 +663,30 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs(struct io_tlb_mem *mem, const char *name)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
+   return;
+
+   mem->debugfs = debugfs_create_dir(name, debugfs_dir);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, >nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, >used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   if (mem) {
+   swiotlb_create_debugfs(mem, "swiotlb");
+   debugfs_dir = mem->debugfs;
+   } else {
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   }
+
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 01/15] swiotlb: Refactor swiotlb init functions

2021-05-27 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Note that we now also call set_memory_decrypted() and memset() in
swiotlb_init_with_tbl(); the early init path previously skipped both.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 51 ++--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..d3232fc19385 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,30 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+
+   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +207,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,7 +295,6 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -297,20 +309,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
-   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.31.1.818.g46aad6cb9e-goog



[PATCH v8 00/15] Restricted DMA

2021-05-27 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (15):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Add DMA_RESTRICTED_POOL
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add a new get_io_tlb_mem getter
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Bounce data from/to restricted DMA pool if available
  swiotlb: Move alloc_size to find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  dma-direct: Add a new wrapper __dma_direct_free_pages()
  swiotlb: Add restricted DMA alloc/free support.
  dma-direct: Allocate memory from restricted DMA pool if available
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  36 ++-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  25 ++
 drivers/of/device.c   |   3 +
 drivers/of/of_private.h   |   5 +
 drivers/pci/xen-pcifront.c|   2 +-
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/device.h|   4 +
 include/linux/swiotlb.h   |  41 ++-
 kernel/dma/Kconfig|  14 +
 kernel/dma/direct.c   |  63 +++--
 kernel/dma/direct.h   |   9 +-
 kernel/dma/swiotlb.c  | 242 +-
 15 files changed, 362 insertions(+), 100 deletions(-)

-- 
2.31.1.818.g46aad6cb9e-goog



Re: [PATCH v7 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-27 Thread Will Deacon
On Thu, May 27, 2021 at 08:48:59PM +0800, Claire Chang wrote:
> On Thu, May 27, 2021 at 7:35 PM Will Deacon  wrote:
> >
> > On Thu, May 27, 2021 at 07:29:20PM +0800, Claire Chang wrote:
> > > On Wed, May 26, 2021 at 11:53 PM Will Deacon  wrote:
> > > >
> > > > On Wed, May 26, 2021 at 01:13:22PM +0100, Will Deacon wrote:
> > > > > On Tue, May 18, 2021 at 02:42:14PM +0800, Claire Chang wrote:
> > > > > > @@ -138,4 +160,9 @@ one for multimedia processing (named 
> > > > > > multimedia-memory@77000000, 64MiB).
> > > > > > memory-region = <&multimedia_reserved>;
> > > > > > /* ... */
> > > > > > };
> > > > > > +
> > > > > > +   pcie_device: pcie_device@0,0 {
> > > > > > +   memory-region = <&restricted_dma_mem_reserved>;
> > > > > > +   /* ... */
> > > > > > +   };
> > > > >
> > > > > I still don't understand how this works for individual PCIe devices 
> > > > > -- how
> > > > > is dev->of_node set to point at the node you have above?
> > > > >
> > > > > I tried adding the memory-region to the host controller instead, and 
> > > > > then
> > > > > I see it crop up in dmesg:
> > > > >
> > > > >   | pci-host-generic 40000000.pci: assigned reserved memory node 
> > > > > restricted_dma_mem_reserved
> > > > >
> > > > > but none of the actual PCI devices end up with 'dma_io_tlb_mem' set, 
> > > > > and
> > > > > so the restricted DMA area is not used. In fact, swiotlb isn't used 
> > > > > at all.
> > > > >
> > > > > What am I missing to make this work with PCIe devices?
> > > >
> > > > Aha, looks like we're just missing the logic to inherit the DMA
> > > > configuration. The diff below gets things working for me.
> > >
> > > I guess what was missing is the reg property in the pcie_device node.
> > > Will update the example dts.
> >
> > Thanks. I still think something like my diff makes sense, if you wouldn't 
> > mind including
> > it, as it allows restricted DMA to be used for situations where the PCIe
> > topology is not static.
> >
> > Perhaps we should prefer dev->of_node if it exists, but then use the node
> > of the host bridge's parent node otherwise?
> 
> Sure. Let me add in the next version.

Brill, thanks! I'll take it for a spin once it lands on the list.

Will


Re: [PATCH v7 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-27 Thread Claire Chang
On Thu, May 27, 2021 at 7:35 PM Will Deacon  wrote:
>
> On Thu, May 27, 2021 at 07:29:20PM +0800, Claire Chang wrote:
> > On Wed, May 26, 2021 at 11:53 PM Will Deacon  wrote:
> > >
> > > On Wed, May 26, 2021 at 01:13:22PM +0100, Will Deacon wrote:
> > > > On Tue, May 18, 2021 at 02:42:14PM +0800, Claire Chang wrote:
> > > > > @@ -138,4 +160,9 @@ one for multimedia processing (named 
> > > > > > multimedia-memory@77000000, 64MiB).
> > > > > > memory-region = <&multimedia_reserved>;
> > > > > > /* ... */
> > > > > > };
> > > > > > +
> > > > > > +   pcie_device: pcie_device@0,0 {
> > > > > > +   memory-region = <&restricted_dma_mem_reserved>;
> > > > > +   /* ... */
> > > > > +   };
> > > >
> > > > I still don't understand how this works for individual PCIe devices -- 
> > > > how
> > > > is dev->of_node set to point at the node you have above?
> > > >
> > > > I tried adding the memory-region to the host controller instead, and 
> > > > then
> > > > I see it crop up in dmesg:
> > > >
> > > >   | pci-host-generic 40000000.pci: assigned reserved memory node 
> > > > restricted_dma_mem_reserved
> > > >
> > > > but none of the actual PCI devices end up with 'dma_io_tlb_mem' set, and
> > > > so the restricted DMA area is not used. In fact, swiotlb isn't used at 
> > > > all.
> > > >
> > > > What am I missing to make this work with PCIe devices?
> > >
> > > Aha, looks like we're just missing the logic to inherit the DMA
> > > configuration. The diff below gets things working for me.
> >
> > I guess what was missing is the reg property in the pcie_device node.
> > Will update the example dts.
>
> Thanks. I still think something like my diff makes sense, if you wouldn't 
> mind including
> it, as it allows restricted DMA to be used for situations where the PCIe
> topology is not static.
>
> Perhaps we should prefer dev->of_node if it exists, but then use the node
> of the host bridge's parent node otherwise?

Sure. Let me add in the next version.

>
> Will


Re: [PATCH v7 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-27 Thread Claire Chang
On Wed, May 26, 2021 at 11:53 PM Will Deacon  wrote:
>
> On Wed, May 26, 2021 at 01:13:22PM +0100, Will Deacon wrote:
> > On Tue, May 18, 2021 at 02:42:14PM +0800, Claire Chang wrote:
> > > @@ -138,4 +160,9 @@ one for multimedia processing (named 
> > > multimedia-memory@77000000, 64MiB).
> > > memory-region = <&multimedia_reserved>;
> > > /* ... */
> > > };
> > > +
> > > +   pcie_device: pcie_device@0,0 {
> > > +   memory-region = <&restricted_dma_mem_reserved>;
> > > +   /* ... */
> > > +   };
> >
> > I still don't understand how this works for individual PCIe devices -- how
> > is dev->of_node set to point at the node you have above?
> >
> > I tried adding the memory-region to the host controller instead, and then
> > I see it crop up in dmesg:
> >
> >   | pci-host-generic 40000000.pci: assigned reserved memory node 
> > restricted_dma_mem_reserved
> >
> > but none of the actual PCI devices end up with 'dma_io_tlb_mem' set, and
> > so the restricted DMA area is not used. In fact, swiotlb isn't used at all.
> >
> > What am I missing to make this work with PCIe devices?
>
> Aha, looks like we're just missing the logic to inherit the DMA
> configuration. The diff below gets things working for me.

I guess what was missing is the reg property in the pcie_device node.
Will update the example dts.


Re: [PATCH v7 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-27 Thread Will Deacon
On Thu, May 27, 2021 at 07:29:20PM +0800, Claire Chang wrote:
> On Wed, May 26, 2021 at 11:53 PM Will Deacon  wrote:
> >
> > On Wed, May 26, 2021 at 01:13:22PM +0100, Will Deacon wrote:
> > > On Tue, May 18, 2021 at 02:42:14PM +0800, Claire Chang wrote:
> > > > @@ -138,4 +160,9 @@ one for multimedia processing (named 
> > > > multimedia-memory@77000000, 64MiB).
> > > > memory-region = <&multimedia_reserved>;
> > > > /* ... */
> > > > };
> > > > +
> > > > +   pcie_device: pcie_device@0,0 {
> > > > +   memory-region = <&restricted_dma_mem_reserved>;
> > > > +   /* ... */
> > > > +   };
> > >
> > > I still don't understand how this works for individual PCIe devices -- how
> > > is dev->of_node set to point at the node you have above?
> > >
> > > I tried adding the memory-region to the host controller instead, and then
> > > I see it crop up in dmesg:
> > >
> > >   | pci-host-generic 40000000.pci: assigned reserved memory node 
> > > restricted_dma_mem_reserved
> > >
> > > but none of the actual PCI devices end up with 'dma_io_tlb_mem' set, and
> > > so the restricted DMA area is not used. In fact, swiotlb isn't used at 
> > > all.
> > >
> > > What am I missing to make this work with PCIe devices?
> >
> > Aha, looks like we're just missing the logic to inherit the DMA
> > configuration. The diff below gets things working for me.
> 
> I guess what was missing is the reg property in the pcie_device node.
> Will update the example dts.

Thanks. I still think something like my diff makes sense, if you wouldn't mind 
including
it, as it allows restricted DMA to be used for situations where the PCIe
topology is not static.

Perhaps we should prefer dev->of_node if it exists, but then use the node
of the host bridge's parent node otherwise?
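
For PCI, that fallback could look something like this (untested sketch
of the idea, not the diff referenced above):

	struct device_node *np = dev->of_node;

	if (!np && dev_is_pci(dev)) {
		struct pci_host_bridge *bridge =
			pci_find_host_bridge(to_pci_dev(dev)->bus);

		/*
		 * Inherit the DMA configuration, including any restricted
		 * pool, from the host bridge's parent node.
		 */
		if (bridge && bridge->dev.parent)
			np = bridge->dev.parent->of_node;
	}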

Will


Re: [PATCH] powerpc/papr_scm: Add support for reporting dirty-shutdown-count

2021-05-27 Thread Vaibhav Jain
Thanks for catching this Santosh. Have fixed this in v2 version of this
patch

-- 
Cheers
~ Vaibhav


Santosh Sivaraj  writes:

> Hi Vaibhav,
>
> Vaibhav Jain  writes:
>
>> Persistent memory devices like NVDIMMs can lose cached writes if
>> something prevents a flush on power failure. Such situations are termed
>> dirty shutdowns and are exposed to applications as a
>> last-shutdown-state (LSS) flag and a dirty-shutdown-counter (DSC), as
>> described at [1]. The latter is useful when multiple applications want
>> to detect a dirty shutdown event without racing with one another.
>>
>> PAPR-NVDIMMs have so far only exposed LSS-style flags to indicate a
>> dirty-shutdown state. This patch further adds support for DSC via the
>> "ibm,persistence-failed-count" device tree property of an NVDIMM. This
>> property is a monotonically increasing 64-bit counter that indicates
>> the number of times an NVDIMM has encountered a dirty-shutdown event
>> causing persistence loss.
>>
>> Since this value is not expected to change after system boot, papr_scm
>> reads & caches it during NVDIMM probe and exposes it as a sysfs
>> attribute named 'dirty_shutdown' to match the similarly named NFIT
>> sysfs attribute. This value is also available to libnvdimm via the
>> PAPR_PDSM_HEALTH payload: 'struct nd_papr_pdsm_health' has been
>> extended with a new member, 'dimm_dsc', whose presence is indicated by
>> the newly introduced PDSM_DIMM_DSC_VALID flag.
>>
>> References:
>> [1] https://pmem.io/documents/Dirty_Shutdown_Handling-V1.0.pdf
>>
>> Signed-off-by: Vaibhav Jain 
>> ---
>>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  6 +
>>  arch/powerpc/platforms/pseries/papr_scm.c | 30 +++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
>> b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> index 50ef95e2f5b1..82488b1e7276 100644
>> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> @@ -77,6 +77,9 @@
>>  /* Indicate that the 'dimm_fuel_gauge' field is valid */
>>  #define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1
>>  
>> +/* Indicate that the 'dimm_dsc' field is valid */
>> +#define PDSM_DIMM_DSC_VALID 2
>> +
>>  /*
>>   * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>>   * Various flags indicate the health status of the dimm.
>> @@ -105,6 +108,9 @@ struct nd_papr_pdsm_health {
>>  
>>  /* Extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID */
>>  __u16 dimm_fuel_gauge;
>> +
>> +/* Extension flag PDSM_DIMM_DSC_VALID */
>> +__u64 dimm_dsc;
>>  };
>>  __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
>>  };
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 11e7b90a3360..68f0d3d5e899 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -114,6 +114,9 @@ struct papr_scm_priv {
>>  /* Health information for the dimm */
>>  u64 health_bitmap;
>>  
>> +/* Holds the last known dirty shutdown counter value */
>> +u64 dirty_shutdown_counter;
>> +
>>  /* length of the stat buffer as expected by phyp */
>>  size_t stat_buffer_len;
>>  };
>> @@ -603,6 +606,16 @@ static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
>>  return rc;
>>  }
>>  
>> +/* Add the dirty-shutdown-counter value to the pdsm */
>> +static int papr_psdm_dsc(struct papr_scm_priv *p,
> should be pdsm
>> + union nd_pdsm_payload *payload)
>> +{
>> +payload->health.extension_flags |= PDSM_DIMM_DSC_VALID;
>> +payload->health.dimm_dsc = p->dirty_shutdown_counter;
>> +
>> +return sizeof(struct nd_papr_pdsm_health);
>> +}
>> +
>>  /* Fetch the DIMM health info and populate it in provided package. */
>>  static int papr_pdsm_health(struct papr_scm_priv *p,
>>  union nd_pdsm_payload *payload)
>> @@ -646,6 +659,8 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
>>  
>>  /* Populate the fuel gauge meter in the payload */
>>  papr_pdsm_fuel_gauge(p, payload);
>> +/* Populate the dirty-shutdown-counter field */
>> +papr_psdm_dsc(p, payload);
>   same typo
>
> Thanks,
> Santosh
>
>>  
>>  rc = sizeof(struct nd_papr_pdsm_health);
>>  
>> @@ -907,6 +922,16 @@ static ssize_t flags_show(struct device *dev,
>>  }
>>  DEVICE_ATTR_RO(flags);
>>  
>> +static ssize_t dirty_shutdown_show(struct device *dev,
>> +  struct device_attribute *attr, char *buf)
>> +{
>> +struct nvdimm *dimm = to_nvdimm(dev);
>> +struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> +
>> +return sysfs_emit(buf, "%llu\n", p->dirty_shutdown_counter);
>> +}
>> +DEVICE_ATTR_RO(dirty_shutdown);
>> +
>>  static umode_t papr_nd_attribute_visible(struct kobject *kobj,

[PATCH v2] powerpc/papr_scm: Add support for reporting dirty-shutdown-count

2021-05-27 Thread Vaibhav Jain
Persistent memory devices like NVDIMMs can lose cached writes if
something prevents a flush on power failure. Such situations are termed
dirty shutdowns and are exposed to applications as a
last-shutdown-state (LSS) flag and a dirty-shutdown-counter (DSC), as
described at [1]. The latter is useful when multiple applications want
to detect a dirty shutdown event without racing with one another.

PAPR-NVDIMMs have so far only exposed LSS-style flags to indicate a
dirty-shutdown state. This patch further adds support for DSC via the
"ibm,persistence-failed-count" device tree property of an NVDIMM. This
property is a monotonically increasing 64-bit counter that indicates
the number of times an NVDIMM has encountered a dirty-shutdown event
causing persistence loss.

Since this value is not expected to change after system boot, papr_scm
reads & caches it during NVDIMM probe and exposes it as a sysfs
attribute named 'dirty_shutdown' to match the similarly named NFIT
sysfs attribute. This value is also available to libnvdimm via the
PAPR_PDSM_HEALTH payload: 'struct nd_papr_pdsm_health' has been
extended with a new member, 'dimm_dsc', whose presence is indicated by
the newly introduced PDSM_DIMM_DSC_VALID flag.

References:
[1] https://pmem.io/documents/Dirty_Shutdown_Handling-V1.0.pdf

Signed-off-by: Vaibhav Jain 
---
Changelog:

v2:
* Rebased the patch on latest ppc-next tree
* s/psdm/pdsm/g   [ Santosh ]
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  6 +
 arch/powerpc/platforms/pseries/papr_scm.c | 30 +++
 2 files changed, 36 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 50ef95e2f5b1..82488b1e7276 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -77,6 +77,9 @@
 /* Indicate that the 'dimm_fuel_gauge' field is valid */
 #define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1
 
+/* Indicate that the 'dimm_dsc' field is valid */
+#define PDSM_DIMM_DSC_VALID 2
+
 /*
  * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
  * Various flags indicate the health status of the dimm.
@@ -105,6 +108,9 @@ struct nd_papr_pdsm_health {
 
/* Extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID */
__u16 dimm_fuel_gauge;
+
+   /* Extension flag PDSM_DIMM_DSC_VALID */
+   __u64 dimm_dsc;
};
__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
};
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 11e7b90a3360..eb8be0eb6119 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -114,6 +114,9 @@ struct papr_scm_priv {
/* Health information for the dimm */
u64 health_bitmap;
 
+   /* Holds the last known dirty shutdown counter value */
+   u64 dirty_shutdown_counter;
+
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
 };
@@ -603,6 +606,16 @@ static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
return rc;
 }
 
+/* Add the dirty-shutdown-counter value to the pdsm */
+static int papr_pdsm_dsc(struct papr_scm_priv *p,
+union nd_pdsm_payload *payload)
+{
+   payload->health.extension_flags |= PDSM_DIMM_DSC_VALID;
+   payload->health.dimm_dsc = p->dirty_shutdown_counter;
+
+   return sizeof(struct nd_papr_pdsm_health);
+}
+
 /* Fetch the DIMM health info and populate it in provided package. */
 static int papr_pdsm_health(struct papr_scm_priv *p,
union nd_pdsm_payload *payload)
@@ -646,6 +659,8 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
 
/* Populate the fuel gauge meter in the payload */
papr_pdsm_fuel_gauge(p, payload);
+   /* Populate the dirty-shutdown-counter field */
+   papr_pdsm_dsc(p, payload);
 
rc = sizeof(struct nd_papr_pdsm_health);
 
@@ -907,6 +922,16 @@ static ssize_t flags_show(struct device *dev,
 }
 DEVICE_ATTR_RO(flags);
 
+static ssize_t dirty_shutdown_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct nvdimm *dimm = to_nvdimm(dev);
+   struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+
+   return sysfs_emit(buf, "%llu\n", p->dirty_shutdown_counter);
+}
+DEVICE_ATTR_RO(dirty_shutdown);
+
 static umode_t papr_nd_attribute_visible(struct kobject *kobj,
 struct attribute *attr, int n)
 {
@@ -925,6 +950,7 @@ static umode_t papr_nd_attribute_visible(struct kobject *kobj,
 static struct attribute *papr_nd_attributes[] = {
	&dev_attr_flags.attr,
	&dev_attr_perf_stats.attr,
+	&dev_attr_dirty_shutdown.attr,
NULL,
 };
 
@@ -1149,6 +1175,10 @@ static int papr_scm_probe(struct platform_device *pdev)
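
For reference, the probe-time read described above amounts to roughly
the following (sketch; 'dn' is the NVDIMM's device-tree node):

	if (of_property_read_u64(dn, "ibm,persistence-failed-count",
				 &p->dirty_shutdown_counter))
		p->dirty_shutdown_counter = 0;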

Re: [PATCH] powerpc/udbg_hvc: retry putc on -EAGAIN

2021-05-27 Thread Greg KH
On Sun, May 23, 2021 at 08:51:09PM +1000, Michael Ellerman wrote:
> Greg KH  writes:
> > On Fri, May 14, 2021 at 04:44:22PM -0500, Nathan Lynch wrote:
> >> hvterm_raw_put_chars() calls hvc_put_chars(), which may return -EAGAIN
> >> when the underlying hcall returns a "busy" status, but udbg_hvc_putc()
> >> doesn't handle this. When using xmon on a PowerVM guest, this can
> >> result in incomplete or garbled output when printing relatively large
> >> amounts of data quickly, such as when dumping the kernel log buffer.
> >> 
> >> Call again on -EAGAIN.
> >> 
> >> Signed-off-by: Nathan Lynch 
> >> ---
> >>  drivers/tty/hvc/hvc_vio.c | 2 +-
> >
> > Subject line does not match up with this file name.
> >
> > Don't you want "tty" and "hvc" in there somewhere?
> 
> It's a powerpc only driver, but I guess the subject should still be
> "tty: hvc: ..." to match convention.
> 
> I was planning to take this via the powerpc tree, but I can drop it if
> you'd rather take it.

No problem, feel free to take it yourself!

greg k-h


[PATCH -next] ASoC: imx-rpmsg: fix platform_no_drv_owner.cocci warnings

2021-05-27 Thread Zou Wei
./sound/soc/fsl/imx-rpmsg.c:140:3-8: No need to set .owner here. The core will 
do it.

 Remove the .owner field when registration helpers set it automatically:
 platform_driver_register() passes THIS_MODULE to
 __platform_driver_register(), so the core fills it in.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Reported-by: Hulk Robot 
Signed-off-by: Zou Wei 
---
 sound/soc/fsl/imx-rpmsg.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/fsl/imx-rpmsg.c b/sound/soc/fsl/imx-rpmsg.c
index 5a9a470..f0cae8c 100644
--- a/sound/soc/fsl/imx-rpmsg.c
+++ b/sound/soc/fsl/imx-rpmsg.c
@@ -137,7 +137,6 @@ static int imx_rpmsg_probe(struct platform_device *pdev)
 static struct platform_driver imx_rpmsg_driver = {
.driver = {
.name = "imx-audio-rpmsg",
-   .owner = THIS_MODULE,
	.pm = &snd_soc_pm_ops,
},
.probe = imx_rpmsg_probe,
-- 
2.6.2



Re: [PATCH] Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"

2021-05-27 Thread Frederic Barrat




On 27/05/2021 04:13, Alexey Kardashevskiy wrote:
> On 27/05/2021 00:45, Frederic Barrat wrote:
>> This reverts commit 3c0468d4451eb6b4f6604370639f163f9637a479.
>>
>> That commit was breaking alignment guarantees for the DMA address when
>> allocating coherent mappings, as described in
>> Documentation/core-api/dma-api-howto.rst
>>
>> It was also noticed by Mellanox' driver:
>> [ 1515.763621] mlx5_core c002:01:00.0: mlx5_frag_buf_alloc_node:146:(pid 13402): unexpected map alignment: 0x08c61000, page_shift=16
>> [ 1515.763635] mlx5_core c002:01:00.0: mlx5_cqwq_create:181:(pid 13402): mlx5_frag_buf_alloc_node() failed, -12
>>
>> Signed-off-by: Frederic Barrat 
>
> Should it be
>
> Fixes: 3c0468d4451e ("powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs")
>
> ?

I had been wondering the same... I can see many revert commits with and
without a "Fixes:" line. Since here the offending commit was merged in
the latest merge window, I was thinking Fixes doesn't really bring
anything, it will all stay internal to v5.13 development. I'd be happy
to hear of the expected way of handling it. I'm guessing a big part of
the answer is whether the tooling looking for a "Fixes" line is also
looking for reverts.

  Fred

> Anyway,
>
> Reviewed-by: Alexey Kardashevskiy