Re: [PATCH v4 09/15] pci: Consolidate pci_iomap* and pci_iomap*wc

2021-08-12 Thread Andi Kleen




> Why is it safe to make pci_iomap_wc_range() support IO ports when it
> didn't before?  That might be desirable, but I think it *is* a
> functional change here.

None of the callers use it to map IO ports, so it will be a no-op for
them. But you're right, it's a (minor) functional change.


-Andi






Re: [PATCH v4 10/15] asm/io.h: Add ioremap_shared fallback

2021-08-12 Thread Bjorn Helgaas
On Wed, Aug 04, 2021 at 05:52:13PM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: Andi Kleen 
> 
> This function is for declaring memory that should be shared with
> a hypervisor in a confidential guest. If the architecture doesn't
> implement it, it's just ioremap.

I would assume ioremap_shared() would "map" something, not "declare"
it.

> Signed-off-by: Andi Kleen 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>  arch/alpha/include/asm/io.h| 1 +
>  arch/mips/include/asm/io.h | 1 +
>  arch/parisc/include/asm/io.h   | 1 +
>  arch/sparc/include/asm/io_64.h | 1 +
>  include/asm-generic/io.h   | 4 ++++
>  5 files changed, 8 insertions(+)
> 
> diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
> index 0fab5ac90775..701b44909b94 100644
> --- a/arch/alpha/include/asm/io.h
> +++ b/arch/alpha/include/asm/io.h
> @@ -283,6 +283,7 @@ static inline void __iomem *ioremap(unsigned long port, unsigned long size)
>  }
>  
>  #define ioremap_wc ioremap
> +#define ioremap_shared ioremap
>  #define ioremap_uc ioremap
>  
>  static inline void iounmap(volatile void __iomem *addr)
> diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
> index 6f5c86d2bab4..3713ff624632 100644
> --- a/arch/mips/include/asm/io.h
> +++ b/arch/mips/include/asm/io.h
> @@ -179,6 +179,7 @@ void iounmap(const volatile void __iomem *addr);
>  #define ioremap(offset, size)					\
>  	ioremap_prot((offset), (size), _CACHE_UNCACHED)
>  #define ioremap_uc   ioremap
> +#define ioremap_shared   ioremap
>  
>  /*
>   * ioremap_cache -   map bus memory into CPU space
> diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
> index 0b5259102319..73064e152df7 100644
> --- a/arch/parisc/include/asm/io.h
> +++ b/arch/parisc/include/asm/io.h
> @@ -129,6 +129,7 @@ static inline void gsc_writeq(unsigned long long val, unsigned long addr)
>   */
>  void __iomem *ioremap(unsigned long offset, unsigned long size);
>  #define ioremap_wc   ioremap
> +#define ioremap_shared   ioremap
>  #define ioremap_uc   ioremap
>  
>  extern void iounmap(const volatile void __iomem *addr);
> diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
> index 5ffa820dcd4d..18cc656eb712 100644
> --- a/arch/sparc/include/asm/io_64.h
> +++ b/arch/sparc/include/asm/io_64.h
> @@ -409,6 +409,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
>  #define ioremap_uc(X,Y)  ioremap((X),(Y))
>  #define ioremap_wc(X,Y)  ioremap((X),(Y))
>  #define ioremap_wt(X,Y)  ioremap((X),(Y))
> +#define ioremap_shared(X, Y) ioremap((X), (Y))
>  static inline void __iomem *ioremap_np(unsigned long offset, unsigned long size)
>  {
>   return NULL;
> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index e93375c710b9..bfcaee1691c8 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -982,6 +982,10 @@ static inline void __iomem *ioremap(phys_addr_t addr, size_t size)
>  #define ioremap_wt ioremap
>  #endif
>  
> +#ifndef ioremap_shared
> +#define ioremap_shared ioremap
> +#endif

"ioremap_shared" is a very generic term for a pretty specific thing:
"memory shared with a hypervisor in a confidential guest".

Maybe deserves a comment with at least a hint here.  "Hypervisors in a
confidential guest" isn't the first thing that comes to mind when I
read "shared".

>  /*
>   * ioremap_uc is special in that we do require an explicit architecture
>   * implementation.  In general you do not want to use this function in a
> -- 
> 2.25.1
> 


Re: [PATCH v4 09/15] pci: Consolidate pci_iomap* and pci_iomap*wc

2021-08-12 Thread Bjorn Helgaas
Is there a branch with all of this applied?  I was going to apply this
to help take a look at it, but it doesn't apply to v5.14-rc1.  I know
you listed some prereqs in the cover letter, but it's a fair amount of
work to sort all that out.

On Wed, Aug 04, 2021 at 05:52:12PM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: Andi Kleen 

If I were applying these, I would silently update the subject lines to
match previous commits.  Since these will probably be merged via a
different tree, you can update if there's a v5:

  PCI: Consolidate pci_iomap_range(), pci_iomap_wc_range()

Also applies to 11/15 and 12/15.

> pci_iomap* and pci_iomap*wc are currently duplicated code, except
> that the _wc variant does not support IO ports. Replace them
> with a common helper and a callback for the mapping. I used
> wrappers for the maps because some architectures implement ioremap
> and friends with macros.

Maybe spell some of this out:

  pci_iomap_range() and pci_iomap_wc_range() are currently duplicated
  code, ...  Implement them using a common helper,
  pci_iomap_range_map(), ...

Using "pci_iomap*" obscures the name and doesn't save any space.

Why is it safe to make pci_iomap_wc_range() support IO ports when it
didn't before?  That might be desirable, but I think it *is* a
functional change here.

IIUC, pci_iomap_wc_range() on an IO port range previously returned
NULL, and after this patch it will work the same as pci_iomap_range(),
i.e., it will return the result of __pci_ioport_map().

> This will allow adding more variants without excessive code
> duplication. This patch should have no behavior change.
> 
> Signed-off-by: Andi Kleen 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>  lib/pci_iomap.c | 81 +++--
>  1 file changed, 44 insertions(+), 37 deletions(-)
> 
> diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> index 2d3eb1cb73b8..6251c3f651c6 100644
> --- a/lib/pci_iomap.c
> +++ b/lib/pci_iomap.c
> @@ -10,6 +10,46 @@
>  #include <linux/export.h>
>  
>  #ifdef CONFIG_PCI
> +
> +/*
> + * Callback wrappers because some architectures define ioremap et.al.
> + * as macros.
> + */
> +static void __iomem *map_ioremap(phys_addr_t addr, size_t size)
> +{
> + return ioremap(addr, size);
> +}
> +
> +static void __iomem *map_ioremap_wc(phys_addr_t addr, size_t size)
> +{
> + return ioremap_wc(addr, size);
> +}
> +
> +static void __iomem *pci_iomap_range_map(struct pci_dev *dev,
> +  int bar,
> +  unsigned long offset,
> +  unsigned long maxlen,
> +  void __iomem *(*mapm)(phys_addr_t,
> +size_t))
> +{
> + resource_size_t start = pci_resource_start(dev, bar);
> + resource_size_t len = pci_resource_len(dev, bar);
> + unsigned long flags = pci_resource_flags(dev, bar);
> +
> + if (len <= offset || !start)
> + return NULL;
> + len -= offset;
> + start += offset;
> + if (maxlen && len > maxlen)
> + len = maxlen;
> + if (flags & IORESOURCE_IO)
> + return __pci_ioport_map(dev, start, len);
> + if (flags & IORESOURCE_MEM)
> + return mapm(start, len);
> + /* What? */
> + return NULL;
> +}
> +
>  /**
>   * pci_iomap_range - create a virtual mapping cookie for a PCI BAR
>   * @dev: PCI device that owns the BAR
> @@ -30,22 +70,8 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
> unsigned long offset,
> unsigned long maxlen)
>  {
> - resource_size_t start = pci_resource_start(dev, bar);
> - resource_size_t len = pci_resource_len(dev, bar);
> - unsigned long flags = pci_resource_flags(dev, bar);
> -
> - if (len <= offset || !start)
> - return NULL;
> - len -= offset;
> - start += offset;
> - if (maxlen && len > maxlen)
> - len = maxlen;
> - if (flags & IORESOURCE_IO)
> - return __pci_ioport_map(dev, start, len);
> - if (flags & IORESOURCE_MEM)
> - return ioremap(start, len);
> - /* What? */
> - return NULL;
> + return pci_iomap_range_map(dev, bar, offset, maxlen,
> +map_ioremap);
>  }
>  EXPORT_SYMBOL(pci_iomap_range);
>  
> @@ -70,27 +96,8 @@ void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
>unsigned long offset,
>unsigned long maxlen)
>  {
> - resource_size_t start = pci_resource_start(dev, bar);
> - resource_size_t len = pci_resource_len(dev, bar);
> - unsigned long flags = pci_resource_flags(dev, bar);
> -
> -
> - if (flags & IORESOURCE_IO)
> - return NULL;
> -
> - if (len <= offset || !start)
> - return NULL;
> -
> - len -= offset;
> - start += offset;
> - if (maxlen && len > maxlen)
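
The point of the refactor is that further variants become thin wrappers
around the common helper. A hypothetical sketch (not part of this patch;
ioremap_np() is used purely as an example of a future variant):

	static void __iomem *map_ioremap_np(phys_addr_t addr, size_t size)
	{
		return ioremap_np(addr, size);
	}

	void __iomem *pci_iomap_np_range(struct pci_dev *dev, int bar,
					 unsigned long offset,
					 unsigned long maxlen)
	{
		return pci_iomap_range_map(dev, bar, offset, maxlen,
					   map_ioremap_np);
	}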

Re: [PATCH v6 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-12 Thread kernel test robot
Hi Xianting,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tty/tty-testing]
[also build test WARNING on char-misc/char-misc-testing soc/for-next v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210812-174847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing
config: hexagon-randconfig-r041-20210812 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/9f2925b5429149ceb0ea6eeaa8c81d422c3124fc
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210812-174847
git checkout 9f2925b5429149ceb0ea6eeaa8c81d422c3124fc
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/tty/hvc/hvc_console.c:190:26: warning: variable 'hp' is
>> uninitialized when used here [-Wuninitialized]
           spin_unlock_irqrestore(&hp->c_lock, flags);
                                   ^~
   drivers/tty/hvc/hvc_console.c:149:23: note: initialize the variable 'hp' to 
silence this warning
   struct hvc_struct *hp;
^
 = NULL
   1 warning generated.


vim +/hp +190 drivers/tty/hvc/hvc_console.c

   136  
   137  /*
   138   * Console APIs, NOT TTY.  These APIs are available immediately when
   139   * hvc_console_setup() finds adapters.
   140   */
   141  
   142  static void hvc_console_print(struct console *co, const char *b,
   143unsigned count)
   144  {
   145  char *c;
   146  unsigned i = 0, n = 0;
   147  int r, donecr = 0, index = co->index;
   148  unsigned long flags;
   149  struct hvc_struct *hp;
   150  
   151          /* Console access attempt outside of acceptable console range. */
   152  if (index >= MAX_NR_HVC_CONSOLES)
   153  return;
   154  
   155  /* This console adapter was removed so it is not usable. */
   156  if (vtermnos[index] == -1 || !cons_outbuf[index])
   157  return;
   158  
   159  c = cons_outbuf[index];
   160  
   161          spin_lock_irqsave(&hp->c_lock, flags);
   162  while (count > 0 || i > 0) {
   163  if (count > 0 && i < sizeof(c)) {
   164  if (b[n] == '\n' && !donecr) {
   165  c[i++] = '\r';
   166  donecr = 1;
   167  } else {
   168  c[i++] = b[n++];
   169  donecr = 0;
   170  --count;
   171  }
   172  } else {
   173                          r = cons_ops[index]->put_chars(vtermnos[index], c, i);
   174  if (r <= 0) {
   175  /* throw away characters on error
   176   * but spin in case of -EAGAIN */
   177  if (r != -EAGAIN) {
   178  i = 0;
   179  } else {
   180                                          hvc_console_flush(cons_ops[index],
   181                                                            vtermnos[index]);
   182  }
   183  } else if (r > 0) {
   184  i -= r;
   185  if (i > 0)
   186  memmove(c, c+r, i);
   187  }
   188  }
   189  }
 > 190          spin_unlock_irqrestore(&hp->c_lock, flags);
   191  hvc_console_flush(cons_ops[index], vtermnos[index]);
   192  }
   193  
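
One way to address the warning (a sketch, not taken from the patch under
test; the per-index lookup array name is hypothetical) is to assign hp
from the console index before the lock is taken and bail out early:

	struct hvc_struct *hp = cons_hvcs[index];	/* hypothetical array */

	if (!hp)
		return;
	spin_lock_irqsave(&hp->c_lock, flags);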

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH 12/38] crypto: virtio: Replace deprecated CPU-hotplug functions.

2021-08-12 Thread Herbert Xu
On Tue, Aug 03, 2021 at 04:15:55PM +0200, Sebastian Andrzej Siewior wrote:
> The functions get_online_cpus() and put_online_cpus() have been
> deprecated during the CPU hotplug rework. They map directly to
> cpus_read_lock() and cpus_read_unlock().
> 
> Replace deprecated CPU-hotplug functions with the official version.
> The behavior remains unchanged.
> 
> Cc: Gonglei 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-cry...@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  drivers/crypto/virtio/virtio_crypto_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
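
The conversion is mechanical; a sketch of the before/after pattern
(illustrative only, not the applied hunk):

	/* before */
	get_online_cpus();
	/* ... bind data queues to the online CPUs ... */
	put_online_cpus();

	/* after */
	cpus_read_lock();
	/* ... bind data queues to the online CPUs ... */
	cpus_read_unlock();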

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v1 3/3] kernel/resource: cleanup and optimize iomem_is_exclusive()

2021-08-12 Thread Andy Shevchenko
On Thu, Aug 12, 2021 at 09:34:00AM +0200, David Hildenbrand wrote:
> On 12.08.21 09:14, Andy Shevchenko wrote:
> > On Thursday, August 12, 2021, David Hildenbrand wrote:
> > On 11.08.21 22:47, Andy Shevchenko wrote:
> > On Wednesday, August 11, 2021, David Hildenbrand wrote:

> > Yes, it's a micro-optimization. If you want your way, I suggest
> > adding a macro
> > 
> > #define for_each_iomem_resource_child() \
> >   for (iomem_resource...)
> 
> I think the only thing that really makes sense would be something like this 
> on top (not compiled yet):

Makes sense to me, thanks, go for it!

-- 
With Best Regards,
Andy Shevchenko



Re: [PATCH v4 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-12 Thread Arnd Bergmann
On Thu, Aug 12, 2021 at 10:08 AM Xianting Tian wrote:
> On 2021/8/6 at 10:51 PM, Arnd Bergmann wrote:
> > On Fri, Aug 6, 2021 at 5:01 AM Xianting Tian wrote:
> >> +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
> > I think you need a higher alignment for DMA buffers, instead of 
> > sizeof(long),
> > I would suggest ARCH_DMA_MINALIGN.
>
> Since some architectures (e.g. x86, riscv) don't define ARCH_DMA_MINALIGN,
> I think it's better to leave the code unchanged.
>
> I will send a v5 patch soon.

I think you could just use "L1_CACHE_BYTES" as the alignment in this case.
This will make the structure slightly larger for architectures that do not have
alignment constraints on DMA buffers, but using a smaller alignment is
clearly wrong. Another option would be to use ARCH_KMALLOC_MINALIGN.

Note that there is a patch to add ARCH_DMA_MINALIGN to riscv already,
as some implementations do not have coherent DMA. I had failed to
realize, though, that on x86 you do not get an ARCH_DMA_MINALIGN
definition.
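
A sketch of the fallback discussed above (the struct and buffer size are
hypothetical; the alignment choice is the point):

	#include <linux/cache.h>

	#ifndef ARCH_DMA_MINALIGN
	#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
	#endif

	struct hvc_out_buf {
		/* safe for DMA on coherent and non-coherent machines */
		char buf[512] __aligned(ARCH_DMA_MINALIGN);
	};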

   Arnd

Re: [PATCH resend] vsock/virtio: avoid potential deadlock when vsock device remove

2021-08-12 Thread Stefano Garzarella

On Thu, Aug 12, 2021 at 01:30:56PM +0800, Longpeng(Mike) wrote:

There's a potential deadlock when removing the vsock device or
processing the RESET event:

 vsock_for_each_connected_socket:
 spin_lock_bh(&vsock_table_lock) --- (1)
 ...
 virtio_vsock_reset_sock:
 lock_sock(sk) - (2)
 ...
 spin_unlock_bh(&vsock_table_lock)

lock_sock() may schedule when the 'sk' is owned by another thread at
the same time, so we would receive a "scheduling while atomic" warning.

Even worse, if the next task (selected by the scheduler) tries to
release an 'sk', it needs to acquire vsock_table_lock and the deadlock
occurs, sending the system into a softlockup state.
 Call trace:
  queued_spin_lock_slowpath
  vsock_remove_bound
  vsock_remove_sock
  virtio_transport_release
  __vsock_release
  vsock_release
  __sock_release
  sock_close
  __fput
  fput

So we should not take sk_lock in this case, just like the behavior
in vhost_vsock or vmci.


The difference with vhost_vsock is that here we call it also when we 
receive an event in the event queue (for example because we are 
migrating the VM).


I think the idea of this lock was to prevent concurrency with the RX
loop, but actually if a socket is connected, it can only change state
to TCP_CLOSING/TCP_CLOSE.


I don't think there is any problem with not taking the lock; at most we
could take the rx_lock in virtio_vsock_event_handle(), but I'm not sure
it's necessary.
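
A sketch of that alternative (purely illustrative; whether the guard is
needed is exactly what is in question here):

	static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
					      struct virtio_vsock_event *event)
	{
		switch (le32_to_cpu(event->id)) {
		case VIRTIO_VSOCK_EVENT_TRANSPORT_RESET:
			mutex_lock(&vsock->rx_lock);	/* the suggested guard */
			virtio_vsock_update_guest_cid(vsock);
			vsock_for_each_connected_socket(virtio_vsock_reset_sock);
			mutex_unlock(&vsock->rx_lock);
			break;
		}
	}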




Cc: Stefan Hajnoczi 
Cc: Stefano Garzarella 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 


We should add:
Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")

Signed-off-by: Longpeng(Mike) 
---
net/vmw_vsock/virtio_transport.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e0c2c99..4f7c99d 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -357,11 +357,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock 
*vsock)

static void virtio_vsock_reset_sock(struct sock *sk)
{
-   lock_sock(sk);
+   /* vmci_transport.c doesn't take sk_lock here either.  At least we're
+    * under vsock_table_lock so the sock cannot disappear while we're
+    * executing.
+    */
+
sk->sk_state = TCP_CLOSE;
sk->sk_err = ECONNRESET;
sk_error_report(sk);
-   release_sock(sk);
}

static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
--
1.8.3.1



With the Fixes tag added:

Reviewed-by: Stefano Garzarella 



Re: [PATCH v1 3/3] kernel/resource: cleanup and optimize iomem_is_exclusive()

2021-08-12 Thread David Hildenbrand

On 12.08.21 09:14, Andy Shevchenko wrote:



On Thursday, August 12, 2021, David Hildenbrand wrote:


On 11.08.21 22:47, Andy Shevchenko wrote:



On Wednesday, August 11, 2021, David Hildenbrand wrote:

     Let's clean it up a bit, removing the unnecessary usage of
r_next() by
     next_resource(), and use next_range_resource() in case we
are not
     interested in a certain subtree.

     Signed-off-by: David Hildenbrand <da...@redhat.com>
     ---
       kernel/resource.c | 19 +++
       1 file changed, 11 insertions(+), 8 deletions(-)

     diff --git a/kernel/resource.c b/kernel/resource.c
     index 2938cf520ca3..ea853a075a83 100644
     --- a/kernel/resource.c
     +++ b/kernel/resource.c
     @@ -1754,9 +1754,8 @@ static int strict_iomem_checks;
        */
       bool iomem_is_exclusive(u64 addr)
       {
     -       struct resource *p = &iomem_resource;
     +       struct resource *p;
              bool err = false;
     -       loff_t l;
              int size = PAGE_SIZE;

              if (!strict_iomem_checks)
     @@ -1765,27 +1764,31 @@ bool iomem_is_exclusive(u64 addr)
              addr = addr & PAGE_MASK;

        read_lock(&resource_lock);
     -       for (p = p->child; p ; p = r_next(NULL, p, &l)) {
     +       for (p = iomem_resource.child; p ;) {


Hi Andy,


I consider the ordinal part of p initialization is slightly
better and done outside of read lock.

Something like
p = &iomem_res...;
read lock
for (p = p->child; ...) {


Why should we care about doing that outside of the lock? That smells
like a micro-optimization the compiler will most probably overwrite
either way as the address of iomem_resource is just constant?

Also, for me it's much more readable and compact if we perform a
single initialization instead of two separate ones in this case.

We're using the pattern I use in find_next_iomem_res() and
__region_intersects(), while we use the old pattern in
iomem_map_sanity_check(), where we also use the same unnecessary
r_next() call.

I might just cleanup iomem_map_sanity_check() in a similar way.



Yes, it's a micro-optimization. If you want your way, I suggest
adding a macro


#define for_each_iomem_resource_child() \
  for (iomem_resource...)


I think the only thing that really makes sense would be something like this on 
top (not compiled yet):


diff --git a/kernel/resource.c b/kernel/resource.c
index ea853a075a83..35aaa72df0ce 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -80,6 +80,11 @@ static struct resource *next_resource_skip_children(struct 
resource *p)
return p->sibling;
 }
 
+#define for_each_resource(_root, _p, _skip_children) \
+   for ((_p) = (_root)->child; (_p); \
+(_p) = (_skip_children) ? next_resource_skip_children(_p) : \
+  next_resource(_p))
+
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
 {
struct resource *p = v;
@@ -1714,16 +1719,16 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 bool iomem_range_contains_excluded(u64 addr, u64 size)
 {
const unsigned int flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE;
-   bool excluded = false;
+   bool skip_children, excluded = false;
struct resource *p;
 
	read_lock(&resource_lock);

-   for (p = iomem_resource.child; p ;) {
+   for_each_resource(&iomem_resource, p, skip_children) {
if (p->start >= addr + size)
break;
if (p->end < addr) {
/* No need to consider children */
-   p = next_resource_skip_children(p);
+   skip_children = true;
continue;
}
/*
@@ -1735,7 +1740,7 @@ bool iomem_range_contains_excluded(u64 addr, u64 size)
excluded = true;
break;
}
-   p = next_resource(p);
+   skip_children = false;
}
	read_unlock(&resource_lock);
 
@@ -1755,7 +1760,7 @@ static int strict_iomem_checks;

 bool iomem_is_exclusive(u64 addr)
 {
struct resource *p;
-   bool err = false;
+   bool skip_children, err = false;
int size = PAGE_SIZE;
 
if (!strict_iomem_checks)

@@ -1764,7 +1769,7 @@ bool iomem_is_exclusive(u64 addr)
addr = addr & PAGE_MASK;
 
	read_lock(&resource_lock);

-   for (p = iomem_resource.child; p ;) {
+   for_each_resource(&iomem_resource, p, skip_children) {

Re: [PATCH v1 3/3] kernel/resource: cleanup and optimize iomem_is_exclusive()

2021-08-12 Thread Andy Shevchenko
On Thursday, August 12, 2021, David Hildenbrand  wrote:

> On 11.08.21 22:47, Andy Shevchenko wrote:
>
>>
>>
>> On Wednesday, August 11, 2021, David Hildenbrand wrote:
>>
>> Let's clean it up a bit, removing the unnecessary usage of r_next() by
>> next_resource(), and use next_range_resource() in case we are not
>> interested in a certain subtree.
>>
>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>> ---
>>   kernel/resource.c | 19 +++
>>   1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 2938cf520ca3..ea853a075a83 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -1754,9 +1754,8 @@ static int strict_iomem_checks;
>>*/
>>   bool iomem_is_exclusive(u64 addr)
>>   {
>> -   struct resource *p = &iomem_resource;
>> +   struct resource *p;
>>  bool err = false;
>> -   loff_t l;
>>  int size = PAGE_SIZE;
>>
>>  if (!strict_iomem_checks)
>> @@ -1765,27 +1764,31 @@ bool iomem_is_exclusive(u64 addr)
>>  addr = addr & PAGE_MASK;
>>
>>  read_lock(&resource_lock);
>> -   for (p = p->child; p ; p = r_next(NULL, p, &l)) {
>> +   for (p = iomem_resource.child; p ;) {
>>
>>
> Hi Andy,
>
>
>> I consider the ordinal part of p initialization is slightly better and
>> done outside of read lock.
>>
>> Something like
>> p = &iomem_res...;
>> read lock
>> for (p = p->child; ...) {
>>
>
> Why should we care about doing that outside of the lock? That smells like
> a micro-optimization the compiler will most probably overwrite either way
> as the address of iomem_resource is just constant?
>
> Also, for me it's much more readable and compact if we perform a single
> initialization instead of two separate ones in this case.
>
> We're using the pattern I use in find_next_iomem_res() and
> __region_intersects(), while we use the old pattern in
> iomem_map_sanity_check(), where we also use the same unnecessary r_next()
> call.
>
> I might just cleanup iomem_map_sanity_check() in a similar way.
>
>

Yes, it's a micro-optimization. If you want your way, I suggest adding
a macro

#define for_each_iomem_resource_child() \
 for (iomem_resource...)



>
> Thanks,
>
> David / dhildenb
>
>

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v1 3/3] kernel/resource: cleanup and optimize iomem_is_exclusive()

2021-08-12 Thread David Hildenbrand

On 11.08.21 22:47, Andy Shevchenko wrote:



On Wednesday, August 11, 2021, David Hildenbrand wrote:


Let's clean it up a bit, removing the unnecessary usage of r_next() by
next_resource(), and use next_range_resource() in case we are not
interested in a certain subtree.

Signed-off-by: David Hildenbrand <da...@redhat.com>
---
  kernel/resource.c | 19 +++
  1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 2938cf520ca3..ea853a075a83 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1754,9 +1754,8 @@ static int strict_iomem_checks;
   */
  bool iomem_is_exclusive(u64 addr)
  {
-       struct resource *p = &iomem_resource;
+       struct resource *p;
         bool err = false;
-       loff_t l;
         int size = PAGE_SIZE;

         if (!strict_iomem_checks)
@@ -1765,27 +1764,31 @@ bool iomem_is_exclusive(u64 addr)
         addr = addr & PAGE_MASK;

        read_lock(&resource_lock);
-       for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+       for (p = iomem_resource.child; p ;) {



Hi Andy,



I consider the ordinal part of p initialization is slightly better and 
done outside of read lock.


Something like
p = &iomem_res...;
read lock
for (p = p->child; ...) {


Why should we care about doing that outside of the lock? That smells 
like a micro-optimization the compiler will most probably overwrite 
either way as the address of iomem_resource is just constant?


Also, for me it's much more readable and compact if we perform a single 
initialization instead of two separate ones in this case.


We're using the pattern I use in find_next_iomem_res() and
__region_intersects(), while we use the old pattern in 
iomem_map_sanity_check(), where we also use the same unnecessary 
r_next() call.


I might just cleanup iomem_map_sanity_check() in a similar way.

--
Thanks,

David / dhildenb


Re: [Patch v1 3/3] vdpa/mlx5: Add multiqueue support

2021-08-12 Thread Jason Wang


On 2021/8/12 at 3:01 PM, Eli Cohen wrote:

On Thu, Aug 12, 2021 at 02:47:06PM +0800, Jason Wang wrote:

On Thu, Aug 12, 2021 at 12:55 PM Eli Cohen  wrote:

On Thu, Aug 12, 2021 at 11:19:19AM +0800, Jason Wang wrote:

On 2021/8/11 at 7:04 PM, Eli Cohen wrote:

On Wed, Aug 11, 2021 at 04:37:44PM +0800, Jason Wang wrote:

On 2021/8/11 at 3:53 PM, Eli Cohen wrote:

One thing need to solve for mq is that the:



+static u16 ctrl_vq_idx(struct  mlx5_vdpa_dev *mvdev)
+{
+ return 2 *  mlx5_vdpa_max_qps(mvdev->max_vqs);
+}

We should handle the case when MQ is supported by the device but not the
driver.

E.g in the case when we have 2 queue pairs:

When MQ is enabled, cvq is queue 4

When MQ is not enabled, cvq is queue 2


There's some issue with this. I get callbacks targeting specific
virtqueues before features negotiation has been completed.

Specifically, I get set_vq_cb() calls. At this point I must know the
control vq index.

So I think we need do both:

1) At one hand, it's a bug for the userspace to use vq_index before feature
is negotiated

(looks like a bug in my cvq series that will call SET_VRING_CALL before
features are negotiated, which I will look into).

2) At the other hand, the driver should be able to deal with that


All I can do is drop callbacks for VQs before features negotiation has
been completed.


Or just leave queue index 0, 1 work.

Since it is not expected to be changed.


Right, will do.


I think the CVQ index must not depend on the negotiated features but
rather depend of the value the device driver provides in the call to
_vdpa_register_device().

At the virtio level, it's too late to change that and it breaks the backward
compatibility.

But at the vDPA level, the underlying device can map the virtio cvq to
any of its virtqueues.

E.g map cvq (index 2) to mlx5 cvq (the last).


I am not following you here. I still don't know which index the cvq is.


Right, we still need to wait for the feature being negotiated in order to
proceed.


So to summarise, before feature negotiation complete, I accept calls
referring to VQs only for indices 0 and 1.
After feature negotiation complete I know CVQ index and will accept
indices 0 to cvq index.

I don't get this "accept indices 0 to cvq index".

What I meant to say is that there are several callbacks that refer to
specific virtqueues, e.g. set_vq_address(), set_vq_num() etc. They all
accept virtqueue index as an argument.

What we want to do is verify whether the index provided is valid or
not. If it is not valid, either return an error (if the callback can
return a value) or just avoid processing it. If the index is valid then
we process it normally.

Now we need to decide which indices are valid. We need something
like this to identify the valid index range:

CVQ clear: 0 and 1
CVQ set, MQ clear: 0, 1 and 2 (for CVQ).
CVQ set, MQ set: 0..nvq where nvq is whatever provided to
_vdpa_register_device()



Yes.




And while writing this, I think this logic does not belong in mlx5_vdpa
but probably in vdpa.c



The problem is that the vdpa core should be unaware of any specific
device type. E.g. the above indices may work only for virtio-net but
not for others.
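
A sketch of the per-driver check Eli describes (field and helper names
are assumed here, not taken from the patch):

	static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
	{
		if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
			return idx <= 1;		/* data vq pair only */

		if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
			return idx <= 2;		/* 0, 1 and cvq == 2 */

		return idx <= ctrl_vq_idx(mvdev);	/* 0..nvq, cvq last */
	}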


Thanks





Thanks


Thanks



Thanks





Re: [Patch v1 3/3] vdpa/mlx5: Add multiqueue support

2021-08-12 Thread Jason Wang
On Thu, Aug 12, 2021 at 12:55 PM Eli Cohen  wrote:
>
> On Thu, Aug 12, 2021 at 11:19:19AM +0800, Jason Wang wrote:
> >
> > On 2021/8/11 at 7:04 PM, Eli Cohen wrote:
> > > On Wed, Aug 11, 2021 at 04:37:44PM +0800, Jason Wang wrote:
> > > > On 2021/8/11 at 3:53 PM, Eli Cohen wrote:
> > > > > > One thing need to solve for mq is that the:
> > > > > >
> > > > > >
> > > > > > > +static u16 ctrl_vq_idx(struct  mlx5_vdpa_dev *mvdev)
> > > > > > > +{
> > > > > > > + return 2 *  mlx5_vdpa_max_qps(mvdev->max_vqs);
> > > > > > > +}
> > > > > > We should handle the case when MQ is supported by the device but 
> > > > > > not the
> > > > > > driver.
> > > > > >
> > > > > > E.g in the case when we have 2 queue pairs:
> > > > > >
> > > > > > When MQ is enabled, cvq is queue 4
> > > > > >
> > > > > > When MQ is not enabled, cvq is queue 2
> > > > > >
> > > > > There's some issue with this. I get callbacks targeting specific
> > > > > virtqueues before features negotiation has been completed.
> > > > >
> > > > > Specifically, I get set_vq_cb() calls. At this point I must know the
> > > > > control vq index.
> > > >
> > > > So I think we need do both:
> > > >
> > > > 1) At one hand, it's a bug for the userspace to use vq_index before 
> > > > feature
> > > > is negotiated
> > > >
> > > > (looks like a bug in my cvq series that will call SET_VRING_CALL before
> > > > features are negotiated, which I will look into).
> > > >
> > > > 2) At the other hand, the driver should be able to deal with that
> > > >
> > > All I can do is drop callbacks for VQs before features negotiation has
> > > been completed.
> >
> >
> > Or just leave queue index 0, 1 work.
> >
> > Since it is not expected to be changed.
> >
>
> Right, will do.
>
> >
> > >
> > > > > I think the CVQ index must not depend on the negotiated features but
> > > > > rather depend of the value the device driver provides in the call to
> > > > > _vdpa_register_device().
> > > >
> > > > At the virtio level, it's too late to change that and it breaks the 
> > > > backward
> > > > compatibility.
> > > >
> > > > But at the vDPA level, the underlying device can map the virtio cvq
> > > > to any of its virtqueues.
> > > >
> > > > E.g map cvq (index 2) to mlx5 cvq (the last).
> > > >
> > > I am not following you here. I still don't know which index the cvq is.
> >
> >
> > Right, we still need to wait for the feature being negotiated in order to
> > proceed.
> >
>
> So to summarise, before feature negotiation complete, I accept calls
> referring to VQs only for indices 0 and 1.
> After feature negotiation complete I know CVQ index and will accept
> indices 0 to cvq index.

I don't get this "accept indices 0 to cvq index".

Thanks

>
> > Thanks
> >
> >
> > >
> > > > Thanks
> > > >
> > > >
> >
>


Re: [RFC PATCH] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

2021-08-12 Thread Jason Wang


On 2021/8/12 at 12:50 PM, Michael S. Tsirkin wrote:

On Thu, Aug 12, 2021 at 11:23:04AM +0800, Jason Wang wrote:

On 2021/8/12 at 6:17 AM, Jakub Kicinski wrote:

On Wed, 11 Aug 2021 16:16:23 +0800 Jason Wang wrote:

Try to fix this by using NETIF_F_GRO_HW instead so we're not
guaranteed to be re-segmented as original.

This sentence may need rephrasing.


Right, actually, I meant:


Try to fix this by using NETIF_F_GRO_HW instead. But we're not sure the
packets could be re-segmented into the exact original packet stream,
since that really depends on the backend implementation.



Or we may want a new netdev
feature like RX_GSO since the guest offloads for virtio-net is
actually to receive GSO packets.

Or we can try not to advertise LRO if control guest offloads is not
enabled. This solves the warning but will still slow down the traffic.

IMO gro-hw fits pretty well, patch looks good.


If the re-segmentation is not an issue, I will post a formal patch.

Thanks


It is, but the point is that even though the spec did not require this,
we have always allowed these configurations in the past, so hopefully
most backends are not broken and combine packets in the same way as
GRO. Let's not break them all in an attempt to catch bad configs, and
meanwhile amend the spec to recommend doing HW GRO.
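
A sketch of the swap under discussion (condensed for illustration; the
real feature wiring in virtnet_probe() is more involved):

	/* before: guest TSO offloads were surfaced as LRO */
	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
		dev->features |= NETIF_F_LRO;

	/* after: surface them as hardware GRO instead */
	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
		dev->features |= NETIF_F_GRO_HW;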



Ok, let me add this in the commit log and send a formal patch.

Thanks





