Re: [GIT PULL] virtio: fixes, features

2022-10-12 Thread Michael S. Tsirkin
On Wed, Oct 12, 2022 at 11:06:54PM +0200, Arnd Bergmann wrote:
> On Wed, Oct 12, 2022, at 7:22 PM, Linus Torvalds wrote:
> >
> > The NO_IRQ thing is mainly actually defined by a few drivers that just
> > never got converted to the proper world order, and even then you can
> > see the confusion (ie some drivers use "-1", others use "0", and yet
> > others use "((unsigned int)(-1)".
> 
> The last time I looked at removing it for arch/arm/, one problem was
> that there were a number of platforms using IRQ 0 as a valid number.
> We have converted most of them in the meantime, leaving now only
> mach-rpc and mach-footbridge. For the other platforms, we just
> renumbered all interrupts to add one, but footbridge apparently
> relies on hardcoded ISA interrupts in device drivers. For rpc,
> it looks like IRQ 0 (printer) already wouldn't work, and it
> looks like there was never a driver referencing it either.


Do these two boxes even have pci?

> I see that openrisc and parisc also still define NO_IRQ to -1, but at
> least openrisc already relies on 0 being the invalid IRQ (from
> CONFIG_IRQ_DOMAIN), probably parisc as well.
> 
>  Arnd



Re: [GIT PULL] virtio: fixes, features

2022-10-12 Thread Michael S. Tsirkin
On Thu, Oct 13, 2022 at 01:33:59AM +1100, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > [ Cc += Bjorn & linux-pci ]
> >
> > "Michael S. Tsirkin"  writes:
> >> On Wed, Oct 12, 2022 at 05:21:24PM +1100, Michael Ellerman wrote:
> >>> "Michael S. Tsirkin"  writes:
> > ...
> >>> > 
> >>> > virtio: fixes, features
> >>> >
> >>> > 9k mtu perf improvements
> >>> > vdpa feature provisioning
> >>> > virtio blk SECURE ERASE support
> >>> >
> >>> > Fixes, cleanups all over the place.
> >>> >
> >>> > Signed-off-by: Michael S. Tsirkin 
> >>> >
> >>> > 
> >>> > Alvaro Karsz (1):
> >>> >   virtio_blk: add SECURE ERASE command support
> >>> >
> >>> > Angus Chen (1):
> >>> >   virtio_pci: don't try to use intxif pin is zero
> >>> 
> >>> This commit breaks virtio_pci for me on powerpc, when running as a qemu
> >>> guest.
> >>> 
> >>> vp_find_vqs() bails out because pci_dev->pin == 0.
> >>> 
> >>> But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would
> >>> succeed if we called it - which is what the code used to do.
> >>> 
> >>> I think this happens because pci_dev->pin is not populated in
> >>> pci_assign_irq().
> >>> 
> >>> I would absolutely believe this is bug in our PCI code, but I think it
> >>> may also affect other platforms that use of_irq_parse_and_map_pci().
> >>
> >> How about fixing this in of_irq_parse_and_map_pci then?
> >> Something like the below maybe?
> >> 
> >> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> >> index 196834ed44fe..504c4d75c83f 100644
> >> --- a/drivers/pci/of.c
> >> +++ b/drivers/pci/of.c
> >> @@ -446,6 +446,8 @@ static int of_irq_parse_pci(const struct pci_dev 
> >> *pdev, struct of_phandle_args *
> >>if (pin == 0)
> >>return -ENODEV;
> >>  
> >> +  pdev->pin = pin;
> >> +
> >>/* Local interrupt-map in the device node? Use it! */
> >>if (of_get_property(dn, "interrupt-map", NULL)) {
> >>pin = pci_swizzle_interrupt_pin(pdev, pin);
> 
> Backing up a bit. Should the virtio code be looking at pci_dev->pin in
> the first place?
> 
> Shouldn't it be checking pci_dev->irq instead?
> 
> The original commit talks about irq being 0 and colliding with the timer
> interrupt.
> 
> But all (most?) platforms have converged on 0 meaning NO_IRQ since quite
> a fews ago AFAIK.

Are you sure?

arch/arm/include/asm/irq.h:#ifndef NO_IRQ
arch/arm/include/asm/irq.h:#define NO_IRQ   ((unsigned int)(-1))



> And the timer irq == 0 is a special case AIUI:
>   
> https://lore.kernel.org/all/ca+55afwilp1z+2mzkrfsid1wzq0tqkcn8f2e6nl_avr+m1f...@mail.gmail.com/
> 
> cheers



Re: [GIT PULL] virtio: fixes, features

2022-10-12 Thread Michael S. Tsirkin
On Wed, Oct 12, 2022 at 05:21:24PM +1100, Michael Ellerman wrote:
> "Michael S. Tsirkin"  writes:
> > The following changes since commit 4fe89d07dcc2804c8b562f6c7896a45643d34b2f:
> >
> >   Linux 6.0 (2022-10-02 14:09:07 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
> > tags/for_linus
> >
> > for you to fetch changes up to 71491c54eafa318fdd24a1f26a1c82b28e1ac21d:
> >
> >   virtio_pci: don't try to use intxif pin is zero (2022-10-07 20:00:44 
> > -0400)
> >
> > 
> > virtio: fixes, features
> >
> > 9k mtu perf improvements
> > vdpa feature provisioning
> > virtio blk SECURE ERASE support
> >
> > Fixes, cleanups all over the place.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> > 
> > Alvaro Karsz (1):
> >   virtio_blk: add SECURE ERASE command support
> >
> > Angus Chen (1):
> >   virtio_pci: don't try to use intxif pin is zero
> 
> This commit breaks virtio_pci for me on powerpc, when running as a qemu
> guest.
> 
> vp_find_vqs() bails out because pci_dev->pin == 0.
> 
> But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would
> succeed if we called it - which is what the code used to do.
> 
> I think this happens because pci_dev->pin is not populated in
> pci_assign_irq().
> 
> I would absolutely believe this is bug in our PCI code, but I think it
> may also affect other platforms that use of_irq_parse_and_map_pci().
> 
> cheers

How about fixing this in of_irq_parse_and_map_pci then?
Something like the below maybe?

diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 196834ed44fe..504c4d75c83f 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -446,6 +446,8 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, 
struct of_phandle_args *
if (pin == 0)
return -ENODEV;
 
+   pdev->pin = pin;
+
/* Local interrupt-map in the device node? Use it! */
if (of_get_property(dn, "interrupt-map", NULL)) {
pin = pci_swizzle_interrupt_pin(pdev, pin);



Re: [PATCH v2 1/4] Make place for common balloon code

2022-08-16 Thread Michael S. Tsirkin
On Tue, Aug 16, 2022 at 01:56:32PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Aug 16, 2022 at 02:47:22PM +0300, Alexander Atanasov wrote:
> > Hello,
> > 
> > On 16.08.22 12:49, Greg Kroah-Hartman wrote:
> > > On Tue, Aug 16, 2022 at 12:41:14PM +0300, Alexander Atanasov wrote:
> > 
> > > >   rename include/linux/{balloon_compaction.h => balloon_common.h} (99%)
> > > 
> > > Why rename the .h file?  It still handles the "balloon compaction"
> > > logic.
> > 
> > File contains code that is common to balloon drivers,
> > compaction is only part of it. Series add more code to it.
> > Since it was suggested to use it for such common code.
> > I find that common becomes a better name for it so the rename.
> > I can drop the rename easy on next iteration if you suggest to.
> 
> "balloon_common.h" is very vague, you should only need one balloon.h
> file in the include/linux/ directory, right, so of course it is "common"
> :)
> 
> thanks,
> 
> greg "naming is hard" k-h

Yea, just call it balloon.h and balloon.c then.

-- 
MST



Re: [PATCH 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries

2020-11-24 Thread Michael S. Tsirkin
On Tue, Nov 24, 2020 at 09:03:06PM +0100, Laurent Vivier wrote:
> With virtio, in multiqueue case, each queue IRQ is normally
> bound to a different CPU using the affinity mask.
> 
> This works fine on x86_64 but totally ignored on pseries.
> 
> This is not obvious at first look because irqbalance is doing
> some balancing to improve that.
> 
> It appears that the "managed" flag set in the MSI entry
> is never copied to the system IRQ entry.
> 
> This series passes the affinity mask from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs() by adding an affinity parameter to
> irq_create_mapping().
> 
> The first patch adds the parameter (no functional change), the
> second patch passes the actual affinity mask to irq_create_mapping()
> in rtas_setup_msi_irqs().
> 
> For instance, with 32 CPUs VM and 32 queues virtio-scsi interface:
> 
> ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32
> 
> for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
> for file in /proc/irq/$IRQ/ ; do
> echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
> done
> done
> 
> Without the patch (and without irqbalanced)
> 
> IRQ: 268 CPU: 0-31
> IRQ: 269 CPU: 0-31
> IRQ: 270 CPU: 0-31
> IRQ: 271 CPU: 0-31
> IRQ: 272 CPU: 0-31
> IRQ: 273 CPU: 0-31
> IRQ: 274 CPU: 0-31
> IRQ: 275 CPU: 0-31
> IRQ: 276 CPU: 0-31
> IRQ: 277 CPU: 0-31
> IRQ: 278 CPU: 0-31
> IRQ: 279 CPU: 0-31
> IRQ: 280 CPU: 0-31
> IRQ: 281 CPU: 0-31
> IRQ: 282 CPU: 0-31
> IRQ: 283 CPU: 0-31
> IRQ: 284 CPU: 0-31
> IRQ: 285 CPU: 0-31
> IRQ: 286 CPU: 0-31
> IRQ: 287 CPU: 0-31
> IRQ: 288 CPU: 0-31
> IRQ: 289 CPU: 0-31
> IRQ: 290 CPU: 0-31
> IRQ: 291 CPU: 0-31
> IRQ: 292 CPU: 0-31
> IRQ: 293 CPU: 0-31
> IRQ: 294 CPU: 0-31
> IRQ: 295 CPU: 0-31
> IRQ: 296 CPU: 0-31
> IRQ: 297 CPU: 0-31
> IRQ: 298 CPU: 0-31
> IRQ: 299 CPU: 0-31
> 
> With the patch:
> 
> IRQ: 265 CPU: 0
> IRQ: 266 CPU: 1
> IRQ: 267 CPU: 2
> IRQ: 268 CPU: 3
> IRQ: 269 CPU: 4
> IRQ: 270 CPU: 5
> IRQ: 271 CPU: 6
> IRQ: 272 CPU: 7
> IRQ: 273 CPU: 8
> IRQ: 274 CPU: 9
> IRQ: 275 CPU: 10
> IRQ: 276 CPU: 11
> IRQ: 277 CPU: 12
> IRQ: 278 CPU: 13
> IRQ: 279 CPU: 14
> IRQ: 280 CPU: 15
> IRQ: 281 CPU: 16
> IRQ: 282 CPU: 17
> IRQ: 283 CPU: 18
> IRQ: 284 CPU: 19
> IRQ: 285 CPU: 20
> IRQ: 286 CPU: 21
> IRQ: 287 CPU: 22
> IRQ: 288 CPU: 23
> IRQ: 289 CPU: 24
> IRQ: 290 CPU: 25
> IRQ: 291 CPU: 26
> IRQ: 292 CPU: 27
> IRQ: 293 CPU: 28
> IRQ: 294 CPU: 29
> IRQ: 295 CPU: 30
> IRQ: 299 CPU: 31
> 
> This matches what we have on an x86_64 system.


Makes sense to me. FWIW

Acked-by: Michael S. Tsirkin 

> Laurent Vivier (2):
>   genirq: add an affinity parameter to irq_create_mapping()
>   powerpc/pseries: pass MSI affinity to irq_create_mapping()
> 
>  arch/arc/kernel/intc-arcv2.c  | 4 ++--
>  arch/arc/kernel/mcip.c| 2 +-
>  arch/arm/common/sa.c  | 2 +-
>  arch/arm/mach-s3c/irq-s3c24xx.c   | 3 ++-
>  arch/arm/plat-orion/gpio.c| 2 +-
>  arch/mips/ath25/ar2315.c  | 4 ++--
>  arch/mips/ath25/ar5312.c  | 4 ++--
>  arch/mips/lantiq/irq.c| 2 +-
>  arch/mips/pci/pci-ar2315.c| 3 ++-
>  arch/mips/pic32/pic32mzda/time.c  | 2 +-
>  arch/mips/ralink/irq.c| 2 +-
>  arch/powerpc/kernel/pci-common.c  | 2 +-
>  arch/powerpc/kvm/book3s_xive.c| 2 +-
>  arch/powerpc/platforms/44x/ppc476.c   | 4 ++--
>  arch/powerpc/platforms/cell/interrupt.c   | 4 ++--
>  arch/powerpc/platforms/cell/iommu.c   | 3 ++-
>  arch/powerpc/platforms/cell/pmu.c | 2 +-
>  arch/powerpc/platforms/cell/spider-pic.c  | 2 +-
>  arch/powerpc/platforms/cell/spu_manage.c  | 6 +++---
>  arch/powerpc/platforms/maple/pci.c| 2 +-
>  arch/powerpc/platforms/pasemi/dma_lib.c   | 5 +++--
>  arch/powerpc/platforms/pasemi/msi.c   | 2 +-
>  arch/powerpc/platforms/pasemi/setup.c | 4 ++--
>  arch/powerpc/platforms/powermac/pci.c | 2 +-
>  arch/powerpc/platforms/powermac/pic.c | 2 +-
>  arch/powerpc/platforms/powermac/smp.c | 2 +-
>  arch/powerpc/platforms/powernv/opal-irqchip.c | 5 +++--
>  arch/powerpc/platforms/powernv/pci.c  | 2 +-
>  arch/powerpc/platforms/powernv/vas.c  | 2 +-
>  arch/powerpc/platforms/ps3/interrupt.c| 2 +-
>  arch/powerpc/platforms/pseries/ibmebus.c  | 2 +-
>  arch/powerpc/platforms/pseries/msi.c  | 2 +-
>  arch/powerp

Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 05:33:56PM +0800, Jason Wang wrote:
> 
> On 2020/4/17 下午5:01, Michael S. Tsirkin wrote:
> > > There could be some misunderstanding here. I thought it's somehow 
> > > similar: a
> > > CONFIG_VHOST_MENU=y will be left in the defconfigs even if CONFIG_VHOST is
> > > not set.
> > > 
> > > Thanks
> > > 
> > BTW do entries with no prompt actually appear in defconfig?
> > 
> 
> Yes. I can see CONFIG_VHOST_DPN=y after make ARCH=m68k defconfig

You see it in .config right? So that's harmless right?

-- 
MST



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 04:51:19PM +0800, Jason Wang wrote:
> 
> On 2020/4/17 下午4:46, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2020 at 04:39:49PM +0800, Jason Wang wrote:
> > > On 2020/4/17 下午4:29, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 17, 2020 at 03:36:52PM +0800, Jason Wang wrote:
> > > > > On 2020/4/17 下午2:33, Michael S. Tsirkin wrote:
> > > > > > On Fri, Apr 17, 2020 at 11:12:14AM +0800, Jason Wang wrote:
> > > > > > > On 2020/4/17 上午6:55, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> > > > > > > > > We try to keep the defconfig untouched after decoupling 
> > > > > > > > > CONFIG_VHOST
> > > > > > > > > out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> > > > > > > > > ("vhost: refine vhost and vringh kconfig") by enabling 
> > > > > > > > > VHOST_MENU by
> > > > > > > > > default. Then the defconfigs can keep enabling 
> > > > > > > > > CONFIG_VHOST_NET
> > > > > > > > > without the caring of CONFIG_VHOST.
> > > > > > > > > 
> > > > > > > > > But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs 
> > > > > > > > > and even
> > > > > > > > > for the ones that doesn't want vhost. So it actually shifts 
> > > > > > > > > the
> > > > > > > > > burdens to the maintainers of all other to add 
> > > > > > > > > "CONFIG_VHOST_MENU is
> > > > > > > > > not set". So this patch tries to enable CONFIG_VHOST 
> > > > > > > > > explicitly in
> > > > > > > > > defconfigs that enables CONFIG_VHOST_NET and 
> > > > > > > > > CONFIG_VHOST_VSOCK.
> > > > > > > > > 
> > > > > > > > > Acked-by: Christian Borntraeger   
> > > > > > > > > (s390)
> > > > > > > > > Acked-by: Michael Ellerman   (powerpc)
> > > > > > > > > Cc: Thomas Bogendoerfer
> > > > > > > > > Cc: Benjamin Herrenschmidt
> > > > > > > > > Cc: Paul Mackerras
> > > > > > > > > Cc: Michael Ellerman
> > > > > > > > > Cc: Heiko Carstens
> > > > > > > > > Cc: Vasily Gorbik
> > > > > > > > > Cc: Christian Borntraeger
> > > > > > > > > Reported-by: Geert Uytterhoeven
> > > > > > > > > Signed-off-by: Jason Wang
> > > > > > > > I rebased this on top of OABI fix since that
> > > > > > > > seems more orgent to fix.
> > > > > > > > Pushed to my vhost branch pls take a look and
> > > > > > > > if possible test.
> > > > > > > > Thanks!
> > > > > > > I test this patch by generating the defconfigs that wants 
> > > > > > > vhost_net or
> > > > > > > vhost_vsock. All looks fine.
> > > > > > > 
> > > > > > > But having CONFIG_VHOST_DPN=y may end up with the similar 
> > > > > > > situation that
> > > > > > > this patch want to address.
> > > > > > > Maybe we can let CONFIG_VHOST depends on !ARM || AEABI then add 
> > > > > > > another
> > > > > > > menuconfig for VHOST_RING and do something similar?
> > > > > > > 
> > > > > > > Thanks
> > > > > > Sorry I don't understand. After this patch CONFIG_VHOST_DPN is just
> > > > > > an internal variable for the OABI fix. I kept it separate
> > > > > > so it's easy to revert for 5.8. Yes we could squash it into
> > > > > > VHOST directly but I don't see how that changes logic at all.
> > > > > Sorry for being unclear.
> > > > > 
> > > > > I meant since it was enabled by default, "CONFIG_VHOST_DPN=y" will be 
> > > > > left
> > > > > in the defconfigs.
> > > > But who cares?
> > > FYI, please seehttps://www.spinics.net/lists/kvm/msg212685.html
> > The complaint was not about the symbol IIUC.  It was that we caused
> > everyone to build vhost unless they manually disabled it.
> 
> 
> There could be some misunderstanding here. I thought it's somehow similar: a
> CONFIG_VHOST_MENU=y will be left in the defconfigs even if CONFIG_VHOST is
> not set.
> 
> Thanks
> 

BTW do entries with no prompt actually appear in defconfig?

> > 



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 04:51:19PM +0800, Jason Wang wrote:
> 
> On 2020/4/17 下午4:46, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2020 at 04:39:49PM +0800, Jason Wang wrote:
> > > On 2020/4/17 下午4:29, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 17, 2020 at 03:36:52PM +0800, Jason Wang wrote:
> > > > > On 2020/4/17 下午2:33, Michael S. Tsirkin wrote:
> > > > > > On Fri, Apr 17, 2020 at 11:12:14AM +0800, Jason Wang wrote:
> > > > > > > On 2020/4/17 上午6:55, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> > > > > > > > > We try to keep the defconfig untouched after decoupling 
> > > > > > > > > CONFIG_VHOST
> > > > > > > > > out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> > > > > > > > > ("vhost: refine vhost and vringh kconfig") by enabling 
> > > > > > > > > VHOST_MENU by
> > > > > > > > > default. Then the defconfigs can keep enabling 
> > > > > > > > > CONFIG_VHOST_NET
> > > > > > > > > without the caring of CONFIG_VHOST.
> > > > > > > > > 
> > > > > > > > > But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs 
> > > > > > > > > and even
> > > > > > > > > for the ones that doesn't want vhost. So it actually shifts 
> > > > > > > > > the
> > > > > > > > > burdens to the maintainers of all other to add 
> > > > > > > > > "CONFIG_VHOST_MENU is
> > > > > > > > > not set". So this patch tries to enable CONFIG_VHOST 
> > > > > > > > > explicitly in
> > > > > > > > > defconfigs that enables CONFIG_VHOST_NET and 
> > > > > > > > > CONFIG_VHOST_VSOCK.
> > > > > > > > > 
> > > > > > > > > Acked-by: Christian Borntraeger   
> > > > > > > > > (s390)
> > > > > > > > > Acked-by: Michael Ellerman   (powerpc)
> > > > > > > > > Cc: Thomas Bogendoerfer
> > > > > > > > > Cc: Benjamin Herrenschmidt
> > > > > > > > > Cc: Paul Mackerras
> > > > > > > > > Cc: Michael Ellerman
> > > > > > > > > Cc: Heiko Carstens
> > > > > > > > > Cc: Vasily Gorbik
> > > > > > > > > Cc: Christian Borntraeger
> > > > > > > > > Reported-by: Geert Uytterhoeven
> > > > > > > > > Signed-off-by: Jason Wang
> > > > > > > > I rebased this on top of OABI fix since that
> > > > > > > > seems more orgent to fix.
> > > > > > > > Pushed to my vhost branch pls take a look and
> > > > > > > > if possible test.
> > > > > > > > Thanks!
> > > > > > > I test this patch by generating the defconfigs that wants 
> > > > > > > vhost_net or
> > > > > > > vhost_vsock. All looks fine.
> > > > > > > 
> > > > > > > But having CONFIG_VHOST_DPN=y may end up with the similar 
> > > > > > > situation that
> > > > > > > this patch want to address.
> > > > > > > Maybe we can let CONFIG_VHOST depends on !ARM || AEABI then add 
> > > > > > > another
> > > > > > > menuconfig for VHOST_RING and do something similar?
> > > > > > > 
> > > > > > > Thanks
> > > > > > Sorry I don't understand. After this patch CONFIG_VHOST_DPN is just
> > > > > > an internal variable for the OABI fix. I kept it separate
> > > > > > so it's easy to revert for 5.8. Yes we could squash it into
> > > > > > VHOST directly but I don't see how that changes logic at all.
> > > > > Sorry for being unclear.
> > > > > 
> > > > > I meant since it was enabled by default, "CONFIG_VHOST_DPN=y" will be 
> > > > > left
> > > > > in the defconfigs.
> > > > But who cares?
> > > FYI, please seehttps://www.spinics.net/lists/kvm/msg212685.html
> > The complaint was not about the symbol IIUC.  It was that we caused
> > everyone to build vhost unless they manually disabled it.
> 
> 
> There could be some misunderstanding here. I thought it's somehow similar: a
> CONFIG_VHOST_MENU=y will be left in the defconfigs even if CONFIG_VHOST is
> not set.
> 
> Thanks

Hmm. So looking at Documentation/kbuild/kconfig-language.rst :

Things that merit "default y/m" include:

a) A new Kconfig option for something that used to always be built
   should be "default y".


b) A new gatekeeping Kconfig option that hides/shows other Kconfig
   options (but does not generate any code of its own), should be
   "default y" so people will see those other options.

c) Sub-driver behavior or similar options for a driver that is
   "default n". This allows you to provide sane defaults.


So it looks like VHOST_MENU is actually matching rule b).
So what's the problem we are trying to solve with this patch, exactly?

Geert could you clarify pls?


> 
> > 



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 04:39:49PM +0800, Jason Wang wrote:
> 
> On 2020/4/17 下午4:29, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2020 at 03:36:52PM +0800, Jason Wang wrote:
> > > On 2020/4/17 下午2:33, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 17, 2020 at 11:12:14AM +0800, Jason Wang wrote:
> > > > > On 2020/4/17 上午6:55, Michael S. Tsirkin wrote:
> > > > > > On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> > > > > > > We try to keep the defconfig untouched after decoupling 
> > > > > > > CONFIG_VHOST
> > > > > > > out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> > > > > > > ("vhost: refine vhost and vringh kconfig") by enabling VHOST_MENU 
> > > > > > > by
> > > > > > > default. Then the defconfigs can keep enabling CONFIG_VHOST_NET
> > > > > > > without the caring of CONFIG_VHOST.
> > > > > > > 
> > > > > > > But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs and 
> > > > > > > even
> > > > > > > for the ones that doesn't want vhost. So it actually shifts the
> > > > > > > burdens to the maintainers of all other to add "CONFIG_VHOST_MENU 
> > > > > > > is
> > > > > > > not set". So this patch tries to enable CONFIG_VHOST explicitly in
> > > > > > > defconfigs that enables CONFIG_VHOST_NET and CONFIG_VHOST_VSOCK.
> > > > > > > 
> > > > > > > Acked-by: Christian Borntraeger  (s390)
> > > > > > > Acked-by: Michael Ellerman  (powerpc)
> > > > > > > Cc: Thomas Bogendoerfer
> > > > > > > Cc: Benjamin Herrenschmidt
> > > > > > > Cc: Paul Mackerras
> > > > > > > Cc: Michael Ellerman
> > > > > > > Cc: Heiko Carstens
> > > > > > > Cc: Vasily Gorbik
> > > > > > > Cc: Christian Borntraeger
> > > > > > > Reported-by: Geert Uytterhoeven
> > > > > > > Signed-off-by: Jason Wang
> > > > > > I rebased this on top of OABI fix since that
> > > > > > seems more orgent to fix.
> > > > > > Pushed to my vhost branch pls take a look and
> > > > > > if possible test.
> > > > > > Thanks!
> > > > > I test this patch by generating the defconfigs that wants vhost_net or
> > > > > vhost_vsock. All looks fine.
> > > > > 
> > > > > But having CONFIG_VHOST_DPN=y may end up with the similar situation 
> > > > > that
> > > > > this patch want to address.
> > > > > Maybe we can let CONFIG_VHOST depends on !ARM || AEABI then add 
> > > > > another
> > > > > menuconfig for VHOST_RING and do something similar?
> > > > > 
> > > > > Thanks
> > > > Sorry I don't understand. After this patch CONFIG_VHOST_DPN is just
> > > > an internal variable for the OABI fix. I kept it separate
> > > > so it's easy to revert for 5.8. Yes we could squash it into
> > > > VHOST directly but I don't see how that changes logic at all.
> > > 
> > > Sorry for being unclear.
> > > 
> > > I meant since it was enabled by default, "CONFIG_VHOST_DPN=y" will be left
> > > in the defconfigs.
> > But who cares?
> 
> 
> FYI, please see https://www.spinics.net/lists/kvm/msg212685.html

The complaint was not about the symbol IIUC.  It was that we caused
everyone to build vhost unless they manually disabled it.

> 
> > That does not add any code, does it?
> 
> 
> It doesn't.
> 
> Thanks
> 
> 
> > 
> > > This requires the arch maintainers to add
> > > "CONFIG_VHOST_VDPN is not set". (Geert complains about this)
> > > 
> > > Thanks
> > > 
> > > 



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 03:36:52PM +0800, Jason Wang wrote:
> 
> On 2020/4/17 下午2:33, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2020 at 11:12:14AM +0800, Jason Wang wrote:
> > > On 2020/4/17 上午6:55, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> > > > > We try to keep the defconfig untouched after decoupling CONFIG_VHOST
> > > > > out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> > > > > ("vhost: refine vhost and vringh kconfig") by enabling VHOST_MENU by
> > > > > default. Then the defconfigs can keep enabling CONFIG_VHOST_NET
> > > > > without the caring of CONFIG_VHOST.
> > > > > 
> > > > > But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs and even
> > > > > for the ones that doesn't want vhost. So it actually shifts the
> > > > > burdens to the maintainers of all other to add "CONFIG_VHOST_MENU is
> > > > > not set". So this patch tries to enable CONFIG_VHOST explicitly in
> > > > > defconfigs that enables CONFIG_VHOST_NET and CONFIG_VHOST_VSOCK.
> > > > > 
> > > > > Acked-by: Christian Borntraeger  (s390)
> > > > > Acked-by: Michael Ellerman  (powerpc)
> > > > > Cc: Thomas Bogendoerfer
> > > > > Cc: Benjamin Herrenschmidt
> > > > > Cc: Paul Mackerras
> > > > > Cc: Michael Ellerman
> > > > > Cc: Heiko Carstens
> > > > > Cc: Vasily Gorbik
> > > > > Cc: Christian Borntraeger
> > > > > Reported-by: Geert Uytterhoeven
> > > > > Signed-off-by: Jason Wang
> > > > I rebased this on top of OABI fix since that
> > > > seems more orgent to fix.
> > > > Pushed to my vhost branch pls take a look and
> > > > if possible test.
> > > > Thanks!
> > > 
> > > I test this patch by generating the defconfigs that wants vhost_net or
> > > vhost_vsock. All looks fine.
> > > 
> > > But having CONFIG_VHOST_DPN=y may end up with the similar situation that
> > > this patch want to address.
> > > Maybe we can let CONFIG_VHOST depends on !ARM || AEABI then add another
> > > menuconfig for VHOST_RING and do something similar?
> > > 
> > > Thanks
> > Sorry I don't understand. After this patch CONFIG_VHOST_DPN is just
> > an internal variable for the OABI fix. I kept it separate
> > so it's easy to revert for 5.8. Yes we could squash it into
> > VHOST directly but I don't see how that changes logic at all.
> 
> 
> Sorry for being unclear.
> 
> I meant since it was enabled by default, "CONFIG_VHOST_DPN=y" will be left
> in the defconfigs.

But who cares? That does not add any code, does it?

> This requires the arch maintainers to add
> "CONFIG_VHOST_VDPN is not set". (Geert complains about this)
> 
> Thanks
> 
> 
> > 



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-17 Thread Michael S. Tsirkin
On Fri, Apr 17, 2020 at 11:12:14AM +0800, Jason Wang wrote:
> 
> On 2020/4/17 上午6:55, Michael S. Tsirkin wrote:
> > On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> > > We try to keep the defconfig untouched after decoupling CONFIG_VHOST
> > > out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> > > ("vhost: refine vhost and vringh kconfig") by enabling VHOST_MENU by
> > > default. Then the defconfigs can keep enabling CONFIG_VHOST_NET
> > > without the caring of CONFIG_VHOST.
> > > 
> > > But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs and even
> > > for the ones that doesn't want vhost. So it actually shifts the
> > > burdens to the maintainers of all other to add "CONFIG_VHOST_MENU is
> > > not set". So this patch tries to enable CONFIG_VHOST explicitly in
> > > defconfigs that enables CONFIG_VHOST_NET and CONFIG_VHOST_VSOCK.
> > > 
> > > Acked-by: Christian Borntraeger  (s390)
> > > Acked-by: Michael Ellerman  (powerpc)
> > > Cc: Thomas Bogendoerfer
> > > Cc: Benjamin Herrenschmidt
> > > Cc: Paul Mackerras
> > > Cc: Michael Ellerman
> > > Cc: Heiko Carstens
> > > Cc: Vasily Gorbik
> > > Cc: Christian Borntraeger
> > > Reported-by: Geert Uytterhoeven
> > > Signed-off-by: Jason Wang
> > I rebased this on top of OABI fix since that
> > seems more orgent to fix.
> > Pushed to my vhost branch pls take a look and
> > if possible test.
> > Thanks!
> 
> 
> I test this patch by generating the defconfigs that wants vhost_net or
> vhost_vsock. All looks fine.
> 
> But having CONFIG_VHOST_DPN=y may end up with the similar situation that
> this patch want to address.
> Maybe we can let CONFIG_VHOST depends on !ARM || AEABI then add another
> menuconfig for VHOST_RING and do something similar?
> 
> Thanks

Sorry I don't understand. After this patch CONFIG_VHOST_DPN is just
an internal variable for the OABI fix. I kept it separate
so it's easy to revert for 5.8. Yes we could squash it into
VHOST directly but I don't see how that changes logic at all.

-- 
MST



Re: [PATCH V2] vhost: do not enable VHOST_MENU by default

2020-04-16 Thread Michael S. Tsirkin
On Wed, Apr 15, 2020 at 10:43:56AM +0800, Jason Wang wrote:
> We try to keep the defconfig untouched after decoupling CONFIG_VHOST
> out of CONFIG_VIRTUALIZATION in commit 20c384f1ea1a
> ("vhost: refine vhost and vringh kconfig") by enabling VHOST_MENU by
> default. Then the defconfigs can keep enabling CONFIG_VHOST_NET
> without the caring of CONFIG_VHOST.
> 
> But this will leave a "CONFIG_VHOST_MENU=y" in all defconfigs and even
> for the ones that doesn't want vhost. So it actually shifts the
> burdens to the maintainers of all other to add "CONFIG_VHOST_MENU is
> not set". So this patch tries to enable CONFIG_VHOST explicitly in
> defconfigs that enables CONFIG_VHOST_NET and CONFIG_VHOST_VSOCK.
> 
> Acked-by: Christian Borntraeger  (s390)
> Acked-by: Michael Ellerman  (powerpc)
> Cc: Thomas Bogendoerfer 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Reported-by: Geert Uytterhoeven 
> Signed-off-by: Jason Wang 

I rebased this on top of OABI fix since that
seems more orgent to fix.
Pushed to my vhost branch pls take a look and
if possible test.
Thanks!

> ---
> Change since V1:
> - depends on EVENTFD for VHOST
> ---
>  arch/mips/configs/malta_kvm_defconfig  |  1 +
>  arch/powerpc/configs/powernv_defconfig |  1 +
>  arch/powerpc/configs/ppc64_defconfig   |  1 +
>  arch/powerpc/configs/pseries_defconfig |  1 +
>  arch/s390/configs/debug_defconfig  |  1 +
>  arch/s390/configs/defconfig|  1 +
>  drivers/vhost/Kconfig  | 26 +-
>  7 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/mips/configs/malta_kvm_defconfig 
> b/arch/mips/configs/malta_kvm_defconfig
> index 8ef612552a19..06f0c7a0ca87 100644
> --- a/arch/mips/configs/malta_kvm_defconfig
> +++ b/arch/mips/configs/malta_kvm_defconfig
> @@ -18,6 +18,7 @@ CONFIG_PCI=y
>  CONFIG_VIRTUALIZATION=y
>  CONFIG_KVM=m
>  CONFIG_KVM_MIPS_DEBUG_COP0_COUNTERS=y
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_MODULES=y
>  CONFIG_MODULE_UNLOAD=y
> diff --git a/arch/powerpc/configs/powernv_defconfig 
> b/arch/powerpc/configs/powernv_defconfig
> index 71749377d164..404245b4594d 100644
> --- a/arch/powerpc/configs/powernv_defconfig
> +++ b/arch/powerpc/configs/powernv_defconfig
> @@ -346,5 +346,6 @@ CONFIG_CRYPTO_DEV_VMX=y
>  CONFIG_VIRTUALIZATION=y
>  CONFIG_KVM_BOOK3S_64=m
>  CONFIG_KVM_BOOK3S_64_HV=m
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_PRINTK_TIME=y
> diff --git a/arch/powerpc/configs/ppc64_defconfig 
> b/arch/powerpc/configs/ppc64_defconfig
> index 7e68cb222c7b..4599fc7be285 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -61,6 +61,7 @@ CONFIG_ELECTRA_CF=y
>  CONFIG_VIRTUALIZATION=y
>  CONFIG_KVM_BOOK3S_64=m
>  CONFIG_KVM_BOOK3S_64_HV=m
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_OPROFILE=m
>  CONFIG_KPROBES=y
> diff --git a/arch/powerpc/configs/pseries_defconfig 
> b/arch/powerpc/configs/pseries_defconfig
> index 6b68109e248f..4cad3901b5de 100644
> --- a/arch/powerpc/configs/pseries_defconfig
> +++ b/arch/powerpc/configs/pseries_defconfig
> @@ -321,5 +321,6 @@ CONFIG_CRYPTO_DEV_VMX=y
>  CONFIG_VIRTUALIZATION=y
>  CONFIG_KVM_BOOK3S_64=m
>  CONFIG_KVM_BOOK3S_64_HV=m
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_PRINTK_TIME=y
> diff --git a/arch/s390/configs/debug_defconfig 
> b/arch/s390/configs/debug_defconfig
> index 0c86ba19fa2b..6ec6e69630d1 100644
> --- a/arch/s390/configs/debug_defconfig
> +++ b/arch/s390/configs/debug_defconfig
> @@ -57,6 +57,7 @@ CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
>  CONFIG_CMM=m
>  CONFIG_APPLDATA_BASE=y
>  CONFIG_KVM=m
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_VHOST_VSOCK=m
>  CONFIG_OPROFILE=m
> diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig
> index 6b27d861a9a3..d1b3bf83d687 100644
> --- a/arch/s390/configs/defconfig
> +++ b/arch/s390/configs/defconfig
> @@ -57,6 +57,7 @@ CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
>  CONFIG_CMM=m
>  CONFIG_APPLDATA_BASE=y
>  CONFIG_KVM=m
> +CONFIG_VHOST=m
>  CONFIG_VHOST_NET=m
>  CONFIG_VHOST_VSOCK=m
>  CONFIG_OPROFILE=m
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index e79cbbdfea45..29f171a53d8a 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -12,23 +12,19 @@ config VHOST_RING
> This option is selected by any driver which needs to access
> the host side of a virtio ring.
>  
> -config VHOST
> - tristate
> +menuconfig VHOST
> + tristate "Vhost Devices"
> + depends on EVENTFD
>   select VHOST_IOTLB
>   help
> -   This option is selected by any driver which needs to access
> -   the core of vhost.
> +   Enable option to support host kernel or hardware accelerator
> +   for virtio device.
>  
> -menuconfig VHOST_MENU
> - bool "VHOST drivers"
> - default y
> -
> -if VHOST_MENU
> +if VHOST
>  
>  config VHOST_NET
>  

Re: [RFC v1 0/2] Enable IOMMU support for pseries Secure VMs

2019-11-06 Thread Michael S. Tsirkin
On Wed, Nov 06, 2019 at 12:59:50PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 05/11/2019 08:28, Ram Pai wrote:
> > This patch series enables IOMMU support for pseries Secure VMs.
> > 
> > 
> > Tested using QEMU command line option:
> > 
> >  "-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4,
> > iommu_platform=on,disable-modern=off,disable-legacy=on"
> >  and 
> > 
> >  "-device virtio-blk-pci,scsi=off,bus=pci.0,
> > addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,
> > iommu_platform=on,disable-modern=off,disable-legacy=on"
> 
> 
> Worth mentioning that SLOF won't boot with such devices as SLOF does not know 
> about iommu_platform=on.

Shouldn't be hard to support: set up the iommu to allow everything
and ack the feature. Right?

> > 
> > Ram Pai (2):
> >   powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
> >   powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.
> > 
> >  arch/powerpc/platforms/pseries/iommu.c | 30 ++
> >  1 file changed, 18 insertions(+), 12 deletions(-)
> > 
> 
> -- 
> Alexey



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > So this is what I would call this option:
> >> >> >
> >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >> >
> >> >> > and the explanation should state that all device
> >> >> > addresses are translated by the platform to identical
> >> >> > addresses.
> >> >> >
> >> >> > In fact this option then becomes more, not less restrictive
> >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> >> > by guest to only create identity mappings,
> >> >> > and only before driver_ok is set.
> >> >> > This option then would always be negotiated together with
> >> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> > Host then must verify that
> >> >> > 1. full 1:1 mappings are created before driver_ok
> >> >> > or can we make sure this happens before features_ok?
> >> >> > that would be ideal as we could require that features_ok fails
> >> >> > 2. mappings are not modified between driver_ok and reset
> >> >> > i guess attempts to change them will fail -
> >> >> > possibly by causing a guest crash
> >> >> > or some other kind of platform-specific error
> >> >>
> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> >> SLOF as I mentioned above, another is that we would be requiring all
> >> >> guests running on the machine (secure guests or not, since we would use
> >> >> the same configuration for all guests) to support it. But
> >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> >> it and wouldn't be able to use the device.
> >> >
> >> > OK and your target is to enable use with kernel drivers within
> >> > guests, right?
> >>
> >> Right.
> >>
> >> > My question is, we are defining a new flag here, I guess old guests
> >> > then do not set it. How does it help old guests? Or maybe it's
> >> > not designed to ...
> >>
> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> >> it (or even new guests can reject it, if they decide not to convert into
> >> secure VMs) and the feature negotiation will succeed with the flag
> >> unset.
> >
> > OK. And then what does QEMU do? Assume guest is not encrypted I guess?
> 
> There's nothing different that QEMU needs to do, with or without the
> flag. the perspective of the host, a secure guest and a regular guest
> work the same way with respect to virtio.

OK. So now let's get back to implementation. What will
Linux guest driver do? It can't activate DMA API blindly since that
will assume translation also works, right?
Or do we somehow limit it to just a specific platform?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >>
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > So this is what I would call this option:
> >> >
> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >
> >> > and the explanation should state that all device
> >> > addresses are translated by the platform to identical
> >> > addresses.
> >> >
> >> > In fact this option then becomes more, not less restrictive
> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> > by guest to only create identity mappings,
> >> > and only before driver_ok is set.
> >> > This option then would always be negotiated together with
> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> > Host then must verify that
> >> > 1. full 1:1 mappings are created before driver_ok
> >> > or can we make sure this happens before features_ok?
> >> > that would be ideal as we could require that features_ok fails
> >> > 2. mappings are not modified between driver_ok and reset
> >> > i guess attempts to change them will fail -
> >> > possibly by causing a guest crash
> >> > or some other kind of platform-specific error
> >>
> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> SLOF as I mentioned above, another is that we would be requiring all
> >> guests running on the machine (secure guests or not, since we would use
> >> the same configuration for all guests) to support it. But
> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> it and wouldn't be able to use the device.
> >
> > OK and your target is to enable use with kernel drivers within
> > guests, right?
> 
> Right.
> 
> > My question is, we are defining a new flag here, I guess old guests
> > then do not set it. How does it help old guests? Or maybe it's
> > not designed to ...
> 
> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> it (or even new guests can reject it, if they decide not to convert into
> secure VMs) and the feature negotiation will succeed with the flag
> unset.

OK. And then what does QEMU do? Assume guest is not encrypted I guess?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> I rephrased it in terms of address translation. What do you think of
> >> >> >> this version? The flag name is slightly different too:
> >> >> >>
> >> >> >>
> >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not 
> >> >> >> set,
> >> >> >> with the exception that address translation is guaranteed to be
> >> >> >> unnecessary when accessing memory addresses supplied to the 
> >> >> >> device
> >> >> >> by the driver. Which is to say, the device will always use 
> >> >> >> physical
> >> >> >> addresses matching addresses used by the driver (typically 
> >> >> >> meaning
> >> >> >> physical addresses used by the CPU) and not translated further. 
> >> >> >> This
> >> >> >> flag should be set by the guest if offered, but to allow for
> >> >> >> backward-compatibility device implementations allow for it to be
> >> >> >> left unset by the guest. It is an error to set both this flag and
> >> >> >> VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> >> >> > drivers. This is why devices fail when it's not negotiated.
> >> >>
> >> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> >> >> implemented in guest userspace such as with VFIO? Or unprivileged in
> >> >> some other sense such as needing to use bounce buffers for some reason?
> >> >
> >> > I had drivers in guest userspace in mind.
> >>
> >> Great. Thanks for clarifying.
> >>
> >> I don't think this flag would work for guest userspace drivers. Should I
> >> add a note about that in the flag definition?
> >
> > I think you need to clarify access protection rules. Is it only
> > translation that is bypassed or is any platform-specific
> > protection mechanism bypassed too?
> 
> It is only translation. In a secure guest, if the device tries to access
> a memory address that wasn't provided by the driver then the
> architecture will deny that access. If the device accesses addresses
> provided to it by the driver, then there's no protection mechanism or
> translation to get in the way.
> 
> >> >> > This confuses me.
> >> >> > If driver is unpriveledged then what happens with this flag?
> >> >> > It can supply any address it wants. Will that corrupt kernel
> >> >> > memory?
> >> >>
> >> >> Not needing address translation doesn't necessarily mean that there's no
> >> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> >> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> >> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> >> >> to program the IOMMU.
> >> >>
> >> >> For our use case, we don't need address translation because we set up an
> >> >> identity mapping in the IOMMU so that the device can use guest physical
> >> >> addresses.
> >
> > OK so I think I am beginning to see it in a different light.  Right now the 
> > specific
> > platform creates an identity mapping. That in turn means DMA API can be
> > fast - it does not need to do anything.  What you are looking for is a
> > way to tell host it's an identity mapping - just as an optimization.
> >
> > Is that right?
> 
> Almost. Theoretically it is just an optimization. But in practice the
> pseries boot firmware (SLOF) doesn't support IOMM

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-01 Thread Michael S. Tsirkin
On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> >>
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >> I rephrased it in terms of address translation. What do you think of
> >> >> this version? The flag name is slightly different too:
> >> >>
> >> >>
> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> >> with the exception that address translation is guaranteed to be
> >> >> unnecessary when accessing memory addresses supplied to the device
> >> >> by the driver. Which is to say, the device will always use physical
> >> >> addresses matching addresses used by the driver (typically meaning
> >> >> physical addresses used by the CPU) and not translated further. This
> >> >> flag should be set by the guest if offered, but to allow for
> >> >> backward-compatibility device implementations allow for it to be
> >> >> left unset by the guest. It is an error to set both this flag and
> >> >> VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> >
> >> >
> >> >
> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> >> > drivers. This is why devices fail when it's not negotiated.
> >>
> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> >> implemented in guest userspace such as with VFIO? Or unprivileged in
> >> some other sense such as needing to use bounce buffers for some reason?
> >
> > I had drivers in guest userspace in mind.
> 
> Great. Thanks for clarifying.
> 
> I don't think this flag would work for guest userspace drivers. Should I
> add a note about that in the flag definition?

I think you need to clarify access protection rules. Is it only
translation that is bypassed or is any platform-specific
protection mechanism bypassed too?

> >> > This confuses me.
> >> > If driver is unpriveledged then what happens with this flag?
> >> > It can supply any address it wants. Will that corrupt kernel
> >> > memory?
> >>
> >> Not needing address translation doesn't necessarily mean that there's no
> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> >> to program the IOMMU.
> >>
> >> For our use case, we don't need address translation because we set up an
> >> identity mapping in the IOMMU so that the device can use guest physical
> >> addresses.

OK so I think I am beginning to see it in a different light.  Right now the 
specific
platform creates an identity mapping. That in turn means DMA API can be
fast - it does not need to do anything.  What you are looking for is a
way to tell host it's an identity mapping - just as an optimization.

Is that right?  So this is what I would call this option:

VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS

and the explanation should state that all device
addresses are translated by the platform to identical
addresses.

In fact this option then becomes more, not less restrictive
than VIRTIO_F_ACCESS_PLATFORM - it's a promise
by guest to only create identity mappings,
and only before driver_ok is set.
This option then would always be negotiated together with
VIRTIO_F_ACCESS_PLATFORM.

Host then must verify that
1. full 1:1 mappings are created before driver_ok
or can we make sure this happens before features_ok?
that would be ideal as we could require that features_ok fails
2. mappings are not modified between driver_ok and reset
i guess attempts to change them will fail -
possibly by causing a guest crash
or some other kind of platform-specific error

So far so good, but now a question:

how are we handling guest address width limitations?
Is VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS subject to
guest address width limitations?
I am guessing we can make them so ...
This needs to be documented.




> >
> > And can it access any guest physical address?
> 
> Sorry, I was mistaken. We do support VFIO in guests but not for virtio
> devices, only for regular PCI devices. In which case they will use
> address translation.

Not su

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> I rephrased it in terms of address translation. What do you think of
> >> this version? The flag name is slightly different too:
> >>
> >>
> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> with the exception that address translation is guaranteed to be
> >> unnecessary when accessing memory addresses supplied to the device
> >> by the driver. Which is to say, the device will always use physical
> >> addresses matching addresses used by the driver (typically meaning
> >> physical addresses used by the CPU) and not translated further. This
> >> flag should be set by the guest if offered, but to allow for
> >> backward-compatibility device implementations allow for it to be
> >> left unset by the guest. It is an error to set both this flag and
> >> VIRTIO_F_ACCESS_PLATFORM.
> >
> >
> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> > drivers. This is why devices fail when it's not negotiated.
> 
> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> implemented in guest userspace such as with VFIO? Or unprivileged in
> some other sense such as needing to use bounce buffers for some reason?

I had drivers in guest userspace in mind.

> > This confuses me.
> > If driver is unpriveledged then what happens with this flag?
> > It can supply any address it wants. Will that corrupt kernel
> > memory?
> 
> Not needing address translation doesn't necessarily mean that there's no
> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> to program the IOMMU.
> 
> For our use case, we don't need address translation because we set up an
> identity mapping in the IOMMU so that the device can use guest physical
> addresses.

And can it access any guest physical address?

> If the guest kernel is concerned that an unprivileged driver could
> jeopardize its integrity it should not negotiate this feature flag.

Unfortunately flag negotiation is done through config space
and so can be overwritten by the driver.

> Perhaps there should be a note about this in the flag definition? This
> concern is platform-dependant though. I don't believe it's an issue in
> pseries.

Again ACCESS_PLATFORM has a pretty open definition. It does actually
say it's all up to the platform.

Specifically how will VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION be
implemented portably? virtio has no portable way to know
whether DMA API bypasses translation.



> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [PATCH 22/22] docs: fix broken documentation links

2019-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2019 at 10:17:32PM +0200, Federico Vaga wrote:
> On Thursday, May 30, 2019 1:23:53 AM CEST Mauro Carvalho Chehab wrote:
> > Mostly due to x86 and acpi conversion, several documentation
> > links are still pointing to the old file. Fix them.
> 
> For the Italian documentation I just send I patch to fix them in a dedicated 
> patch


Acked-by: Michael S. Tsirkin 

for the vhost change.

> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > ---
> >  Documentation/acpi/dsd/leds.txt  |  2 +-
> >  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
> >  Documentation/admin-guide/kernel-parameters.txt  | 16 
> >  Documentation/admin-guide/ras.rst|  2 +-
> >  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
> >  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
> >  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
> >  Documentation/devicetree/booting-without-of.txt  |  2 +-
> >  Documentation/driver-api/gpio/board.rst  |  2 +-
> >  Documentation/driver-api/gpio/consumer.rst   |  2 +-
> >  .../firmware-guide/acpi/enumeration.rst  |  2 +-
> >  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
> >  Documentation/i2c/instantiating-devices  |  2 +-
> >  Documentation/sysctl/kernel.txt  |  4 ++--
> >  .../translations/it_IT/process/howto.rst |  2 +-
> >  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
> >  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
> >  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
> >  Documentation/x86/x86_64/boot-options.rst|  4 ++--
> >  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
> >  MAINTAINERS  |  6 +++---
> >  arch/arm/Kconfig |  2 +-
> >  arch/arm64/kernel/kexec_image.c  |  2 +-
> >  arch/powerpc/Kconfig |  2 +-
> >  arch/x86/Kconfig | 16 
> >  arch/x86/Kconfig.debug   |  2 +-
> >  arch/x86/boot/header.S   |  2 +-
> >  arch/x86/entry/entry_64.S|  2 +-
> >  arch/x86/include/asm/bootparam_utils.h   |  2 +-
> >  arch/x86/include/asm/page_64_types.h |  2 +-
> >  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
> >  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
> >  arch/x86/kernel/kexec-bzimage64.c|  2 +-
> >  arch/x86/kernel/pci-dma.c|  2 +-
> >  arch/x86/mm/tlb.c|  2 +-
> >  arch/x86/platform/pvh/enlighten.c|  2 +-
> >  drivers/acpi/Kconfig | 10 +-
> >  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
> >  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
> >  drivers/vhost/vhost.c|  2 +-
> >  include/acpi/acpi_drivers.h  |  2 +-
> >  include/linux/fs_context.h   |  2 +-
> >  include/linux/lsm_hooks.h|  2 +-
> >  mm/Kconfig   |  2 +-
> >  security/Kconfig |  2 +-
> >  tools/include/linux/err.h|  2 +-
> >  tools/objtool/Documentation/stack-validation.txt |  4 ++--
> >  tools/testing/selftests/x86/protection_keys.c|  2 +-
> >  48 files changed, 77 insertions(+), 78 deletions(-)
> > 
> > diff --git a/Documentation/acpi/dsd/leds.txt
> > b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644
> > --- a/Documentation/acpi/dsd/leds.txt
> > +++ b/Documentation/acpi/dsd/leds.txt
> > @@ -96,4 +96,4 @@ where
> > 
> > http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-da
> > ta-extension-UUID-v1.1.pdf>, referenced 2019-02-21.
> > 
> > -[7] Documentation/acpi/dsd/data-node-reference.txt
> > +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> > diff --git a/Documentation/admin-guide/kernel-parameters.rst
> > b/Documentation/admin-guide/kernel-parameters.rst index
> > 0124980dca2d..8d3273e32eb1 100644
> > --- a/Documentation/admin-guide/kernel-parameters.rst
> > +++ b/Documentation/admin-guide/kernel-parameters.rst
> > @@ -167,7 +167,7 @@ parameter is applicable::
> > X86-32  X86-32, aka i386 architecture is enabled.
> > X86-64  X86-64 architecture is enabled.
> >

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-05-20 Thread Michael S. Tsirkin
On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> I rephrased it in terms of address translation. What do you think of
> this version? The flag name is slightly different too:
> 
> 
> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> with the exception that address translation is guaranteed to be
> unnecessary when accessing memory addresses supplied to the device
> by the driver. Which is to say, the device will always use physical
> addresses matching addresses used by the driver (typically meaning
> physical addresses used by the CPU) and not translated further. This
> flag should be set by the guest if offered, but to allow for
> backward-compatibility device implementations allow for it to be
> left unset by the guest. It is an error to set both this flag and
> VIRTIO_F_ACCESS_PLATFORM.


OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
drivers. This is why devices fail when it's not negotiated.

This confuses me.
If driver is unpriveledged then what happens with this flag?
It can supply any address it wants. Will that corrupt kernel
memory?

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-05-20 Thread Michael S. Tsirkin
On Fri, Apr 26, 2019 at 08:56:43PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >> >>
> >> >> >> Michael S. Tsirkin  writes:
> >> >> >>
> >> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann 
> >> >> >> > wrote:
> >> >> >> >> >From what I understand of the ACCESS_PLATFORM definition, the 
> >> >> >> >> >host will
> >> >> >> >> only ever try to access memory addresses that are supplied to it 
> >> >> >> >> by the
> >> >> >> >> guest, so all of the secure guest memory that the host cares 
> >> >> >> >> about is
> >> >> >> >> accessible:
> >> >> >> >>
> >> >> >> >> If this feature bit is set to 0, then the device has same 
> >> >> >> >> access to
> >> >> >> >> memory addresses supplied to it as the driver has. In 
> >> >> >> >> particular,
> >> >> >> >> the device will always use physical addresses matching 
> >> >> >> >> addresses
> >> >> >> >> used by the driver (typically meaning physical addresses used 
> >> >> >> >> by the
> >> >> >> >> CPU) and not translated further, and can access any address 
> >> >> >> >> supplied
> >> >> >> >> to it by the driver. When clear, this overrides any
> >> >> >> >> platform-specific description of whether device access is 
> >> >> >> >> limited or
> >> >> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >> >> >>
> >> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> >> guests or not.
> >> >> >> >>
> >> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >> >
> >> >> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> >> >> > specific encrypted memory regions that driver has access to but 
> >> >> >> > device
> >> >> >> > does not. that seems to violate the constraint.
> >> >> >>
> >> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >> >>
> >> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> >> driver that the host may not have access to memory addresses that it
> >> >> >> will never try to access.
> >> >> >
> >> >> > For example, one of the benefits is to signal to host that driver does
> >> >> > not expect ability to access all memory. If it does, host can
> >> >> > fail initialization gracefully.
> >> >>
> >> >> But why would the ability to access all memory be necessary or even
> >> >> useful? When would the host access memory that the driver didn't tell it
> >> >> to access?
> >> >
> >> > When I say all memory I mean even memory not allowed by the IOMMU.
> >>
> >> Yes, but why? How is that memory relevant?
> >
> > It's relevant when driver is not trusted to only supply correct
> > addresses. The feature was originally designed to support userspace
> > drivers within guests.
> 
> Ah, thanks for clarifying. I don't think that's a problem in our case.
> If the guest provides an incorrect address, the hardware simply 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-24 Thread Michael S. Tsirkin
On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> >From what I understand of the ACCESS_PLATFORM definition, the host 
> >> >> >> >will
> >> >> >> only ever try to access memory addresses that are supplied to it by 
> >> >> >> the
> >> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> >> accessible:
> >> >> >>
> >> >> >> If this feature bit is set to 0, then the device has same access 
> >> >> >> to
> >> >> >> memory addresses supplied to it as the driver has. In particular,
> >> >> >> the device will always use physical addresses matching addresses
> >> >> >> used by the driver (typically meaning physical addresses used by 
> >> >> >> the
> >> >> >> CPU) and not translated further, and can access any address 
> >> >> >> supplied
> >> >> >> to it by the driver. When clear, this overrides any
> >> >> >> platform-specific description of whether device access is 
> >> >> >> limited or
> >> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >> >>
> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> guests or not.
> >> >> >>
> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >
> >> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> >> > specific encrypted memory regions that driver has access to but device
> >> >> > does not. that seems to violate the constraint.
> >> >>
> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >>
> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> driver that the host may not have access to memory addresses that it
> >> >> will never try to access.
> >> >
> >> > For example, one of the benefits is to signal to host that driver does
> >> > not expect ability to access all memory. If it does, host can
> >> > fail initialization gracefully.
> >>
> >> But why would the ability to access all memory be necessary or even
> >> useful? When would the host access memory that the driver didn't tell it
> >> to access?
> >
> > When I say all memory I mean even memory not allowed by the IOMMU.
> 
> Yes, but why? How is that memory relevant?

It's relevant when driver is not trusted to only supply correct
addresses. The feature was originally designed to support userspace
drivers within guests.

> >> >> >> >> > But the name "sev_active" makes me scared because at least AMD 
> >> >> >> >> > guys who
> >> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >> >>
> >> >> >> >> My understanding is, AMD guest-platform knows in advance that 
> >> >> >> >> their
> >> >> >> >> guest will run in secure mode and hence sets the flag at the time 
> >> >> >> >> of VM
> >> >> >> >> instantiation. Unfortunately we dont have that luxury on our 
> >> >> >> >> platforms.
> >> >> >> >
> >> >> >> > Well you do have that luxury. It looks like that there are existing
> >> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not 
> >> >>

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-19 Thread Michael S. Tsirkin
On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> >From what I understand of the ACCESS_PLATFORM definition, the host will
> >> >> only ever try to access memory addresses that are supplied to it by the
> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> accessible:
> >> >>
> >> >> If this feature bit is set to 0, then the device has same access to
> >> >> memory addresses supplied to it as the driver has. In particular,
> >> >> the device will always use physical addresses matching addresses
> >> >> used by the driver (typically meaning physical addresses used by the
> >> >> CPU) and not translated further, and can access any address supplied
> >> >> to it by the driver. When clear, this overrides any
> >> >> platform-specific description of whether device access is limited or
> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >>
> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> guests or not.
> >> >>
> >> >> Or are you saying that a virtio device may want to access memory
> >> >> addresses that weren't supplied to it by the driver?
> >> >
> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> > specific encrypted memory regions that driver has access to but device
> >> > does not. that seems to violate the constraint.
> >>
> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> the device can ignore the IOMMU for all practical purposes I would
> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >>
> >> I guess I'm still struggling with the purpose of signalling to the
> >> driver that the host may not have access to memory addresses that it
> >> will never try to access.
> >
> > For example, one of the benefits is to signal to host that driver does
> > not expect ability to access all memory. If it does, host can
> > fail initialization gracefully.
> 
> But why would the ability to access all memory be necessary or even
> useful? When would the host access memory that the driver didn't tell it
> to access?

When I say all memory I mean even memory not allowed by the IOMMU.


> >> >> >> > But the name "sev_active" makes me scared because at least AMD 
> >> >> >> > guys who
> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >>
> >> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> >> guest will run in secure mode and hence sets the flag at the time of 
> >> >> >> VM
> >> >> >> instantiation. Unfortunately we dont have that luxury on our 
> >> >> >> platforms.
> >> >> >
> >> >> > Well you do have that luxury. It looks like that there are existing
> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> >> >> > with how that path is slow. So you are trying to optimize for
> >> >> > them by clearing ACCESS_PLATFORM and then you have lost ability
> >> >> > to invoke DMA API.
> >> >> >
> >> >> > For example if there was another flag just like ACCESS_PLATFORM
> >> >> > just not yet used by anyone, you would be all fine using that right?
> >> >>
> >> >> Yes, a new flag sounds like a great idea. What about the definition
> >> >> below?
> >> >>
> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> >> >> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> >> >> exception that the IOMMU is explicitly defined to be off or bypassed
> >> >> when accessing memory addresses supplied to the device by the
> >> >> driver. This flag should be set by the guest if offered, but to
> >> >> allow for backward-compatibility device implem

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-26 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 08:44:27AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
> > This has been discussed ad nauseum. virtio is all about compatibility.
> > Losing a couple of lines of code isn't worth breaking working setups.
> > People that want "just use DMA API no tricks" now have the option.
> > Setting a flag in a feature bit map is literally a single line
> > of code in the hypervisor. So stop pushing for breaking working
> > legacy setups and just fix it in the right place.
> 
> I agree with the legacy aspect.  What I am missing is an extremely
> strong wording that says you SHOULD always set this flag for new
> hosts, including an explanation why.


So as far as power is concerned, IIUC the issue they are struggling with is
that some platforms do not support pass-through mode in the emulated IOMMU.
Disabling PLATFORM_ACCESS is so far a way around that, unfortunately just for
virtio devices.  I would like virtio-iommu to be able to address that need as
well.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-23 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> Another way of looking at this issue which also explains our reluctance
> >> >> is that the only difference between a secure guest and a regular guest
> >> >> (at least regarding virtio) is that the former uses swiotlb while the
> >> >> latter doens't.
> >> >
> >> > But swiotlb is just one implementation. It's a guest internal thing. The
> >> > issue is that memory isn't host accessible.
> >>
> >> >From what I understand of the ACCESS_PLATFORM definition, the host will
> >> only ever try to access memory addresses that are supplied to it by the
> >> guest, so all of the secure guest memory that the host cares about is
> >> accessible:
> >>
> >> If this feature bit is set to 0, then the device has same access to
> >> memory addresses supplied to it as the driver has. In particular,
> >> the device will always use physical addresses matching addresses
> >> used by the driver (typically meaning physical addresses used by the
> >> CPU) and not translated further, and can access any address supplied
> >> to it by the driver. When clear, this overrides any
> >> platform-specific description of whether device access is limited or
> >> translated in any way, e.g. whether an IOMMU may be present.
> >>
> >> All of the above is true for POWER guests, whether they are secure
> >> guests or not.
> >>
> >> Or are you saying that a virtio device may want to access memory
> >> addresses that weren't supplied to it by the driver?
> >
> > Your logic would apply to IOMMUs as well.  For your mode, there are
> > specific encrypted memory regions that driver has access to but device
> > does not. that seems to violate the constraint.
> 
> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> the device can ignore the IOMMU for all practical purposes I would
> indeed say that the logic would apply to IOMMUs as well. :-)
> 
> I guess I'm still struggling with the purpose of signalling to the
> driver that the host may not have access to memory addresses that it
> will never try to access.

For example, one of the benefits is to signal to host that driver does
not expect ability to access all memory. If it does, host can
fail initialization gracefully.

> >> >> And from the device's point of view they're
> >> >> indistinguishable. It can't tell one guest that is using swiotlb from
> >> >> one that isn't. And that implies that secure guest vs regular guest
> >> >> isn't a virtio interface issue, it's "guest internal affairs". So
> >> >> there's no reason to reflect that in the feature flags.
> >> >
> >> > So don't. The way not to reflect that in the feature flags is
> >> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
> >> >
> >> >
> >> > Without ACCESS_PLATFORM
> >> > virtio has a very specific opinion about the security of the
> >> > device, and that opinion is that device is part of the guest
> >> > supervisor security domain.
> >>
> >> Sorry for being a bit dense, but not sure what "the device is part of
> >> the guest supervisor security domain" means. In powerpc-speak,
> >> "supervisor" is the operating system so perhaps that explains my
> >> confusion. Are you saying that without ACCESS_PLATFORM, the guest
> >> considers the host to be part of the guest operating system's security
> >> domain?
> >
> > I think so. The spec says "device has same access as driver".
> 
> Ok, makes sense.
> 
> >> If so, does that have any other implication besides "the host
> >> can access any address supplied to it by the driver"? If that is the
> >> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
> >> include that information because it's not part of the current
> >> definition.
> >>
> >> >> > But the name "sev_active" makes me scared because at least AMD guys 
> >> >> > who
> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >>
> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> guest will run in 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> Another way of looking at this issue which also explains our reluctance
> >> is that the only difference between a secure guest and a regular guest
> >> (at least regarding virtio) is that the former uses swiotlb while the
> >> latter doens't.
> >
> > But swiotlb is just one implementation. It's a guest internal thing. The
> > issue is that memory isn't host accessible.
> 
> >From what I understand of the ACCESS_PLATFORM definition, the host will
> only ever try to access memory addresses that are supplied to it by the
> guest, so all of the secure guest memory that the host cares about is
> accessible:
> 
> If this feature bit is set to 0, then the device has same access to
> memory addresses supplied to it as the driver has. In particular,
> the device will always use physical addresses matching addresses
> used by the driver (typically meaning physical addresses used by the
> CPU) and not translated further, and can access any address supplied
> to it by the driver. When clear, this overrides any
> platform-specific description of whether device access is limited or
> translated in any way, e.g. whether an IOMMU may be present.
> 
> All of the above is true for POWER guests, whether they are secure
> guests or not.
> 
> Or are you saying that a virtio device may want to access memory
> addresses that weren't supplied to it by the driver?

Your logic would apply to IOMMUs as well.  For your mode, there are
specific encrypted memory regions that driver has access to but device
does not. that seems to violate the constraint.


> >> And from the device's point of view they're
> >> indistinguishable. It can't tell one guest that is using swiotlb from
> >> one that isn't. And that implies that secure guest vs regular guest
> >> isn't a virtio interface issue, it's "guest internal affairs". So
> >> there's no reason to reflect that in the feature flags.
> >
> > So don't. The way not to reflect that in the feature flags is
> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
> >
> >
> > Without ACCESS_PLATFORM
> > virtio has a very specific opinion about the security of the
> > device, and that opinion is that device is part of the guest
> > supervisor security domain.
> 
> Sorry for being a bit dense, but not sure what "the device is part of
> the guest supervisor security domain" means. In powerpc-speak,
> "supervisor" is the operating system so perhaps that explains my
> confusion. Are you saying that without ACCESS_PLATFORM, the guest
> considers the host to be part of the guest operating system's security
> domain?

I think so. The spec says "device has same access as driver".

> If so, does that have any other implication besides "the host
> can access any address supplied to it by the driver"? If that is the
> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
> include that information because it's not part of the current
> definition.
> 
> >> That said, we still would like to arrive at a proper design for this
> >> rather than add yet another hack if we can avoid it. So here's another
> >> proposal: considering that the dma-direct code (in kernel/dma/direct.c)
> >> automatically uses swiotlb when necessary (thanks to Christoph's recent
> >> DMA work), would it be ok to replace virtio's own direct-memory code
> >> that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
> >> way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
> >> code cleanup (replace open-coded stuff with calls to existing
> >> infrastructure).
> >
> > Let's say I have some doubts that there's an API that
> > matches what virtio with its bag of legacy compatibility exactly.
> 
> Ok.
> 
> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >>
> >> My understanding is, AMD guest-platform knows in advance that their
> >> guest will run in secure mode and hence sets the flag at the time of VM
> >> instantiation. Unfortunately we dont have that luxury on our platforms.
> >
> > Well you do have that luxury. It looks like that there are existing
> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> > with how that path is slow. So you are trying to optimize for
> > them by clearing ACCESS_PLATFORM and then you have lost ability
> > to invoke DMA API.
> >
> > For example if there was another flag just like ACCESS_PLATFORM
> > just not yet used by anyone, you would be all fine using that right?
> 
> Yes, a new flag sounds like a great idea. What about the definition
> below?
> 
> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> exception that the IOMMU is explicitly defined to be off or bypassed
> when accessing memory addresses supplied to 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-05 Thread Michael S. Tsirkin
On Tue, Feb 05, 2019 at 08:24:07AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 04, 2019 at 04:38:21PM -0500, Michael S. Tsirkin wrote:
> > It was designed to make, when set, as many guests as we can work
> > correctly, and it seems to be successful in doing exactly that.
> > 
> > Unfortunately there could be legacy guests that do work correctly but
> > become slow. Whether trying to somehow work around that
> > can paint us into a corner where things again don't
> > work for some people is a question worth discussing.
> 
> The other problem is that some qemu machines just throw passthrough
> devices and virtio devices on the same virtual PCI(e) bus, and have a
> common IOMMU setup for the whole bus / root port / domain.  I think
> this is completely bogus, but unfortunately it is out in the field.
> 
> Given that power is one of these examples I suspect that is what
> Thiago referes to.  But in this case the answer can't be that we
> pile on hack ontop of another, but instead introduce a new qemu
> machine that separates these clearly, and make that mandatory for
> the secure guest support.

That could we one approach, assuming one exists that guests
already support.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:15:41PM -0200, Thiago Jung Bauermann wrote:
> 
> Christoph Hellwig  writes:
> 
> > On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
> >> This has been discussed ad nauseum. virtio is all about compatibility.
> >> Losing a couple of lines of code isn't worth breaking working setups.
> >> People that want "just use DMA API no tricks" now have the option.
> >> Setting a flag in a feature bit map is literally a single line
> >> of code in the hypervisor. So stop pushing for breaking working
> >> legacy setups and just fix it in the right place.
> >
> > I agree with the legacy aspect.  What I am missing is an extremely
> > strong wording that says you SHOULD always set this flag for new
> > hosts, including an explanation why.
> 
> My understanding of ACCESS_PLATFORM is that it means "this device will
> behave in all aspects like a regular device attached to this bus".


Not really. Look it up in the spec:

VIRTIO_F_ACCESS_PLATFORM(33) This feature indicates that the device can be used 
on a platform
where device access to data in memory is limited and/or translated. 
E.g. this is the case if the device
can be located behind an IOMMU that translates bus addresses from the 
device into physical addresses
in memory, if the device can be limited to only access certain memory 
addresses or if special commands
such as a cache flush can be needed to synchronise data in memory with 
the device. Whether accesses
are actually limited or translated is described by platform-specific 
means. If this feature bit is set to 0,
then the device has same access to memory addresses supplied to it as 
the driver has. In particular, the
device will always use physical addresses matching addresses used by 
the driver (typically meaning
physical addresses used by the CPU) and not translated further, and can 
access any address supplied
to it by the driver. When clear, this overrides any platform-specific 
description of whether device access
is limited or translated in any way, e.g. whether an IOMMU may be 
present.



> Is
> that it? Therefore it should be set because it's the sane thing to do?

It's the sane thing to do unless you want the very specific thing that
having it clear means, which is just have it be another CPU.

It was designed to make, when set, as many guests as we can work
correctly, and it seems to be successful in doing exactly that.

Unfortunately there could be legacy guests that do work correctly but
become slow. Whether trying to somehow work around that
can paint us into a corner where things again don't
work for some people is a question worth discussing.


> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:14:20PM -0200, Thiago Jung Bauermann wrote:
> 
> Hello Michael,
> 
> Michael S. Tsirkin  writes:
> 
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> >>
> >> Fixing address of powerpc mailing list.
> >>
> >> Thiago Jung Bauermann  writes:
> >>
> >> > Hello,
> >> >
> >> > With Christoph's rework of the DMA API that recently landed, the patch
> >> > below is the only change needed in virtio to make it work in a POWER
> >> > secure guest under the ultravisor.
> >> >
> >> > The other change we need (making sure the device's dma_map_ops is NULL
> >> > so that the dma-direct/swiotlb code is used) can be made in
> >> > powerpc-specific code.
> >> >
> >> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >> >  to the powerpc secure guest support code.
> >> >
> >> > What do you think?
> >> >
> >> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> >> > From: Thiago Jung Bauermann 
> >> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> >> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> >> > encrypted
> >> >
> >> > The host can't access the guest memory when it's encrypted, so using
> >> > regular memory pages for the ring isn't an option. Go through the DMA 
> >> > API.
> >> >
> >> > Signed-off-by: Thiago Jung Bauermann 
> >
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> >
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> 
> I understand. The problem with that approach for us is that because we
> don't know which guests will become secure guests and which will remain
> regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.
> 
> And the problem with that is that for QEMU on POWER, having
> ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
> device (which makes sense considering that the name of the flag was
> IOMMU_PLATFORM). And we need that for regular guests to avoid
> performance degradation.

You don't really, ACCESS_PLATFORM means just that, platform decides.

> So while ACCESS_PLATFORM solves our problems for secure guests, we can't
> turn it on by default because we can't affect legacy systems. Doing so
> would penalize existing systems that can access all memory. They would
> all have to unnecessarily go through address translations, and take a
> performance hit.

So as step one, you just give hypervisor admin an option to run legacy
systems faster by blocking secure mode. I don't see why that is
so terrible.

But as step two, assuming you use above step one to make legacy
guests go fast - maybe there is a point in detecting
such a hypervisor and doing something smarter with it.
By all means let's have a discussion around this but that is no longer
"to make it work" as the commit log says it's more a performance
optimization.


> The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
> in advance - right when the VM is instantiated - that it will not have
> access to all guest memory.

Not quite. It just means that hypervisor can live with not having
access to all memory. If platform wants to give it access
to all memory that is quite all right.


> Unfortunately that assumption is subtly
> broken on our secure-platform. The hypervisor/QEMU realizes that the
> platform is going secure only *after the VM is instantiated*. It's the
> kernel running in the VM that determines that it wants to switch the
> platform to secure-mode.

ACCESS_PLATFORM is there so guests can detect legacy hypervisors
which always assumed it's another CPU.

> Another way of looking at this issue which also explains our reluctance
> is that the only difference between a secure guest and a regular guest
> (at least regarding virtio) is that the former uses swiotlb while the
> latter doens't.

But swiotlb is just one implementation. It's a guest internal thing. The
issue is that memory isn't host accessible.  Yes linux does not use that
info too much right now but it already begins to seep out of the
abstraction.  For example as you are doing data copies you should maybe
calculate the packet checksum just as well.  Not something DMA API will
let you know right now, but that's because any bounce buffer users so
far weren't terribly fast a

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 11:05:42AM +0800, Jason Wang wrote:
> 
> On 2019/1/30 上午10:36, Michael S. Tsirkin wrote:
> > On Wed, Jan 30, 2019 at 10:24:01AM +0800, Jason Wang wrote:
> > > On 2019/1/30 上午3:02, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> > > > > Fixing address of powerpc mailing list.
> > > > > 
> > > > > Thiago Jung Bauermann  writes:
> > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > With Christoph's rework of the DMA API that recently landed, the 
> > > > > > patch
> > > > > > below is the only change needed in virtio to make it work in a POWER
> > > > > > secure guest under the ultravisor.
> > > > > > 
> > > > > > The other change we need (making sure the device's dma_map_ops is 
> > > > > > NULL
> > > > > > so that the dma-direct/swiotlb code is used) can be made in
> > > > > > powerpc-specific code.
> > > > > > 
> > > > > > Of course, I also have patches (soon to be posted as RFC) which 
> > > > > > hook up
> > > > > >  to the powerpc secure guest support code.
> > > > > > 
> > > > > > What do you think?
> > > > > > 
> > > > > >   From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 
> > > > > > 2001
> > > > > > From: Thiago Jung Bauermann 
> > > > > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > > > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> > > > > > encrypted
> > > > > > 
> > > > > > The host can't access the guest memory when it's encrypted, so using
> > > > > > regular memory pages for the ring isn't an option. Go through the 
> > > > > > DMA API.
> > > > > > 
> > > > > > Signed-off-by: Thiago Jung Bauermann 
> > > > Well I think this will come back to bite us (witness xen which is now
> > > > reworking precisely this path - but at least they aren't to blame, xen
> > > > came before ACCESS_PLATFORM).
> > > > 
> > > > I also still think the right thing would have been to set
> > > > ACCESS_PLATFORM for all systems where device can't access all memory.
> > > > 
> > > > But I also think I don't have the energy to argue about power secure
> > > > guest anymore.  So be it for power secure guest since the involved
> > > > engineers disagree with me.  Hey I've been wrong in the past ;).
> > > > 
> > > > But the name "sev_active" makes me scared because at least AMD guys who
> > > > were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
> > > > wrong? I reemember distinctly that's so) will likely be affected too.
> > > > We don't want that.
> > > > 
> > > > So let's find a way to make sure it's just power secure guest for now
> > > > pls.
> > > > 
> > > > I also think we should add a dma_api near features under virtio_device
> > > > such that these hacks can move off data path.
> > > 
> > > Anyway the current Xen code is conflict with spec which said:
> > > 
> > > "If this feature bit is set to 0, then the device has same access to 
> > > memory
> > > addresses supplied to it as the driver has. In particular, the device will
> > > always use physical addresses matching addresses used by the driver
> > > (typically meaning physical addresses used by the CPU) and not translated
> > > further, and can access any address supplied to it by the driver. When
> > > clear, this overrides any platform-specific description of whether device
> > > access is limited or translated in any way, e.g. whether an IOMMU may be
> > > present. "
> > > 
> > > I wonder how much value that the above description can give us. It's kind 
> > > of
> > > odd that the behavior of "when the feature is not negotiated" is described
> > > in the spec.
> > Hmm what's odd about it? We need to describe the behaviour is all cases.
> 
> 
> Well, try to limit the behavior of 'legacy' driver is tricky or even
> impossible. Xen is an exact example for this.

So don't. Xen got grand-fathered in because when that came
along we thought it's a one off thi

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 10:24:01AM +0800, Jason Wang wrote:
> 
> On 2019/1/30 上午3:02, Michael S. Tsirkin wrote:
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> > > Fixing address of powerpc mailing list.
> > > 
> > > Thiago Jung Bauermann  writes:
> > > 
> > > > Hello,
> > > > 
> > > > With Christoph's rework of the DMA API that recently landed, the patch
> > > > below is the only change needed in virtio to make it work in a POWER
> > > > secure guest under the ultravisor.
> > > > 
> > > > The other change we need (making sure the device's dma_map_ops is NULL
> > > > so that the dma-direct/swiotlb code is used) can be made in
> > > > powerpc-specific code.
> > > > 
> > > > Of course, I also have patches (soon to be posted as RFC) which hook up
> > > >  to the powerpc secure guest support code.
> > > > 
> > > > What do you think?
> > > > 
> > > >  From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > > > From: Thiago Jung Bauermann 
> > > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> > > > encrypted
> > > > 
> > > > The host can't access the guest memory when it's encrypted, so using
> > > > regular memory pages for the ring isn't an option. Go through the DMA 
> > > > API.
> > > > 
> > > > Signed-off-by: Thiago Jung Bauermann 
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> > 
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> > 
> > But I also think I don't have the energy to argue about power secure
> > guest anymore.  So be it for power secure guest since the involved
> > engineers disagree with me.  Hey I've been wrong in the past ;).
> > 
> > But the name "sev_active" makes me scared because at least AMD guys who
> > were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
> > wrong? I reemember distinctly that's so) will likely be affected too.
> > We don't want that.
> > 
> > So let's find a way to make sure it's just power secure guest for now
> > pls.
> > 
> > I also think we should add a dma_api near features under virtio_device
> > such that these hacks can move off data path.
> 
> 
> Anyway the current Xen code is conflict with spec which said:
> 
> "If this feature bit is set to 0, then the device has same access to memory
> addresses supplied to it as the driver has. In particular, the device will
> always use physical addresses matching addresses used by the driver
> (typically meaning physical addresses used by the CPU) and not translated
> further, and can access any address supplied to it by the driver. When
> clear, this overrides any platform-specific description of whether device
> access is limited or translated in any way, e.g. whether an IOMMU may be
> present. "
> 
> I wonder how much value that the above description can give us. It's kind of
> odd that the behavior of "when the feature is not negotiated" is described
> in the spec.

Hmm what's odd about it? We need to describe the behaviour is all cases.

> Personally I think we can remove the above and then we can
> switch to use DMA API unconditionally in guest driver. It may have single
> digit regression probably, we can try to overcome it.
> 
> Thanks

This has been discussed ad nauseum. virtio is all about compatibility.
Losing a couple of lines of code isn't worth breaking working setups.
People that want "just use DMA API no tricks" now have the option.
Setting a flag in a feature bit map is literally a single line
of code in the hypervisor. So stop pushing for breaking working
legacy setups and just fix it in the right place.

> 
> > 
> > By the way could you please respond about virtio-iommu and
> > why there's no support for ACCESS_PLATFORM on power?
> > 
> > I have Cc'd you on these discussions.
> > 
> > 
> > Thanks!
> > 
> > 
> > > > ---
> > > >   drivers/virtio/virtio_ring.c | 5 -
> > > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > >

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> 
> Fixing address of powerpc mailing list.
> 
> Thiago Jung Bauermann  writes:
> 
> > Hello,
> >
> > With Christoph's rework of the DMA API that recently landed, the patch
> > below is the only change needed in virtio to make it work in a POWER
> > secure guest under the ultravisor.
> >
> > The other change we need (making sure the device's dma_map_ops is NULL
> > so that the dma-direct/swiotlb code is used) can be made in
> > powerpc-specific code.
> >
> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >  to the powerpc secure guest support code.
> >
> > What do you think?
> >
> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > From: Thiago Jung Bauermann 
> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> >
> > The host can't access the guest memory when it's encrypted, so using
> > regular memory pages for the ring isn't an option. Go through the DMA API.
> >
> > Signed-off-by: Thiago Jung Bauermann 

Well I think this will come back to bite us (witness xen which is now
reworking precisely this path - but at least they aren't to blame, xen
came before ACCESS_PLATFORM).

I also still think the right thing would have been to set
ACCESS_PLATFORM for all systems where device can't access all memory.

But I also think I don't have the energy to argue about power secure
guest anymore.  So be it for power secure guest since the involved
engineers disagree with me.  Hey I've been wrong in the past ;).

But the name "sev_active" makes me scared because at least AMD guys who
were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
wrong? I reemember distinctly that's so) will likely be affected too.
We don't want that.

So let's find a way to make sure it's just power secure guest for now
pls.

I also think we should add a dma_api near features under virtio_device
such that these hacks can move off data path.

By the way could you please respond about virtio-iommu and
why there's no support for ACCESS_PLATFORM on power?

I have Cc'd you on these discussions.


Thanks!


> > ---
> >  drivers/virtio/virtio_ring.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cd7e755484e3..321a27075380 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device 
> > *vdev)
> >  * not work without an even larger kludge.  Instead, enable
> >  * the DMA API if we're a Xen guest, which at least allows
> >  * all of the sensible Xen configurations to work correctly.
> > +*
> > +* Also, if guest memory is encrypted the host can't access
> > +* it directly. In this case, we'll need to use the DMA API.
> >  */
> > -   if (xen_domain())
> > +   if (xen_domain() || sev_active())
> > return true;
> >
> > return false;
> 
> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-01-29 Thread Michael S. Tsirkin
On Mon, Jan 21, 2019 at 11:29:05AM +, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 18/01/2019 15:51, Michael S. Tsirkin wrote:
> > 
> > On Tue, Jan 15, 2019 at 12:19:52PM +, Jean-Philippe Brucker wrote:
> >> Implement the virtio-iommu driver, following specification v0.9 [1].
> >>
> >> This is a simple rebase onto Linux v5.0-rc2. We now use the
> >> dev_iommu_fwspec_get() helper introduced in v5.0 instead of accessing
> >> dev->iommu_fwspec, but there aren't any functional change from v6 [2].
> >>
> >> Our current goal for virtio-iommu is to get a paravirtual IOMMU working
> >> on Arm, and enable device assignment to guest userspace. In this
> >> use-case the mappings are static, and don't require optimal performance,
> >> so this series tries to keep things simple. However there is plenty more
> >> to do for features and optimizations, and having this base in v5.1 would
> >> be good. Given that most of the changes are to drivers/iommu, I believe
> >> the driver and future changes should go via the IOMMU tree.
> >>
> >> You can find Linux driver and kvmtool device on v0.9.2 branches [3],
> >> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> >> QEMU device [4]. Please note that the series depends on Robin's
> >> probe-deferral fix [5], which will hopefully land in v5.0.
> >>
> >> [1] Virtio-iommu specification v0.9, sources and pdf
> >> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> >> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> >>
> >> [2] [PATCH v6 0/7] Add virtio-iommu driver
> >> 
> >> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/032127.html
> >>
> >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.2
> >> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9.2
> >>
> >> [4] [RFC v9 00/17] VIRTIO-IOMMU device
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> >>
> >> [5] [PATCH] iommu/of: Fix probe-deferral
> >> https://www.spinics.net/lists/arm-kernel/msg698371.html
> > 
> > Thanks for the work!
> > So really my only issue with this is that there's no
> > way for the IOMMU to describe the devices that it
> > covers.
> > 
> > As a result that is then done in a platform-specific way.
> > 
> > And this means that for example it does not solve the problem that e.g.
> > some power people have in that their platform simply does not have a way
> > to specify which devices are covered by the IOMMU.
> 
> Isn't power using device tree? I haven't looked much at power because I
> was told a while ago that they already paravirtualize their IOMMU and
> don't need virtio-iommu, except perhaps for some legacy platforms. Or
> something along those lines. But I would certainly be interested in
> enabling the IOMMU for more architectures.

I have CC'd the relevant ppc developers, let's see what do they think.


> As for the enumeration problem, I still don't think we can get much
> better than DT and ACPI as solutions (and IMO they are necessary to make
> this device portable). But I believe that getting DT and ACPI support is
> just a one-off inconvenience. That is, once the required bindings are
> accepted, any future extension can then be done at the virtio level with
> feature bits and probe requests, without having to update ACPI or DT.
> 
> Thanks,
> Jean
> 
> > Solving that problem would make me much more excited about
> > this device.
> > 
> > On the other hand I can see that while there have been some
> > developments most of the code has been stable for quite a while now.
> > 
> > So what I am trying to do right about now, is making a small module that
> > loads early and pokes at the IOMMU sufficiently to get the data about
> > which devices use the IOMMU out of it using standard virtio config
> > space.  IIUC it's claimed to be impossible without messy changes to the
> > boot sequence.
> > 
> > If I succeed at least on some platforms I'll ask that this design is
> > worked into this device, minimizing info that goes through DT/ACPI.  If
> > I see I can't make it in time to meet the next merge window, I plan
> > merging the existing patches using DT (barring surprises).
> > 
> > As I only have a very small amount of time to spend on this attempt, If
> > someone else wants to try doing that in parallel, that would be great!
> > 
> > 
> >> Jean-Philippe Brucker (7):
> >>   dt-bindings: virti

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-08 Thread Michael S. Tsirkin
On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> Sure, but all of this is just the configuration of the iommu. But I
> think we agree here, and your point remains valid, indeed my proposed
> hack:
> 
> >   if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> 
> Will only work if the IOMMU and non-IOMMU path are completely equivalent.
> 
> We can provide that guarantee for our secure VM case, but not generally so if
> we were to go down the route of a quirk in virtio, it might be better to
> make it painfully obvious that it's specific to that one case with a different
> kind of turd:
> 
> - if (xen_domain())
> + if (xen_domain() || pseries_secure_vm())
>   return true;

I don't think it's pseries specific actually. E.g. I suspect AMD SEV
might benefit from the same kind of hack.


> So to summarize, and make sure I'm not missing something, the two approaches
> at hand are either:
> 
>  1- The above, which is a one liner and contained in the guest, so that's 
> nice, but
> also means another turd in virtio which isn't ...
> 
>  2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the 
> current
> architecture on our side that will force virtio to always go through an 
> emulated
> iommu, as pseries doesn't have the concept of a real bypass window, and thus 
> will
> impact performance for both secure and non-secure VMs.
> 
>  3- Invent a property that can be put in selected PCI device tree nodes that
> indicates that for that device specifically, the iommu can be bypassed, along 
> with
> a hypercall to turn that bypass on/off. Virtio would then use 
> VIRTIO_F_IOMMU_PLATFORM
> but its DT nodes would also have that property and Linux would notice it and 
> turn
> bypass on.

For completeness, virtio could also have its own bounce buffer
outside of DMA API one. I don't see lots of benefits to this
though.


> The resulting properties of those options are:
> 
> 1- Is what I want because it's the simplest, provides the best performance 
> now,
>and works without code changes to qemu or non-secure Linux. However it does
>add a tiny turd to virtio which is annoying.
> 
> 2- This works but it puts the iommu in the way always, thus reducing virtio 
> performance
>accross the board for pseries unless we only do that for secure VMs but 
> that is
>difficult (as discussed earlier).
> 
> 3- This would recover the performance lost in -2-, however it requires qemu 
> *and*
>guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
>performance hit of -2- unless modified to call that 'enable bypass' call, 
> which
>isn't great.
> 
> So imho we have to chose one of 3 not-great solutions here... Unless I missed
> something in your ideas of course.
> 
> Cheers,
> Ben.
> 
> 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > > something here that is really what's the VM is doing to itself, which
> > > > > is to stash its memory away in an inaccessible place.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > 
> > > > I think Christoph merely objects to the specific implementation.  If
> > > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > > device I think he won't object.
> > > 
> > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> > > 
> > > So, something like that would be a possibility, but the problem is that
> > > the current virtio (guest side) implementation doesn't honor this when
> > > not using dma ops and will not use dma ops if not using iommu, so back
> > > to square one.
> > 
> > Well we have the RFC for that - the switch to using DMA ops unconditionally 
> > isn't
> > problematic itself IMHO, for now that RFC is blocked
> > by its perfromance overhead for now but Christoph says
> > he's trying to remove that for direct mappings,
> > so we should hopefully be able to get there in X weeks.
> 
> That would be good yes.
> 
>  ../..
> 
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> > > *vdev)
> > >  * the DMA API if we're a Xen guest, which at least allows
> > >  * all of the sensible Xen configurations to work correctly.
> > >  */
> > > -   if (xen_domain())
> > > +   if (xen_domain() || arch_virtio_direct_dma_ops(>dev))
> > > return true;
> > >  
> > > return false;
> > 
> > Right but can't we fix the retpoline overhead such that
> > vring_use_dma_api will not be called on data path any longer, making
> > this a setup time check?
> 
> Yes it needs to be a setup time check regardless actually !
> 
> The above is broken, sorry I was a bit quick here (too early in the
> morning... ugh). We don't want the arch to go override the dma ops
> every time that is callled.
> 
> But yes, if we can fix the overhead, it becomes just a matter of
> setting up the "right" ops automatically.
> 
> > > (Passing the dev allows the arch to know this is a virtio device in
> > > "direct" mode or whatever we want to call the !iommu case, and
> > > construct appropriate DMA ops for it, which aren't the same as the DMA
> > > ops of any other PCI device who *do* use the iommu).
> > 
> > I think that's where Christoph might have specific ideas about it.
> 
> OK well, assuming Christoph can solve the direct case in a way that
> also work for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
> 
> You mentioned an RFC for that ? Do you happen to have a link ?

No but Christoph did I think.

> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.
> 
> Cheers,
> Ben.
> 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > As I said replying to Christoph, we are "leaking" into the interface
> > > something here that is really what's the VM is doing to itself, which
> > > is to stash its memory away in an inaccessible place.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I think Christoph merely objects to the specific implementation.  If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
> 
> Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> 
> So, something like that would be a possibility, but the problem is that
> the current virtio (guest side) implementation doesn't honor this when
> not using dma ops and will not use dma ops if not using iommu, so back
> to square one.

Well we have the RFC for that - the switch to using DMA ops unconditionally 
isn't
problematic itself IMHO, for now that RFC is blocked
by its perfromance overhead for now but Christoph says
he's trying to remove that for direct mappings,
so we should hopefully be able to get there in X weeks.

> Christoph seems to be wanting to use a flag in the interface to make
> the guest use dma_ops which is what I don't understand.
> 
> What would be needed then would be something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here for various reasons I
> can elaborate if you want, but that specific test more/less has to be
> arch specific), and in that case, force itself to use DMA ops routed to
> swiotlb.
> 
> I'd rather have arch code do the bulk of that work, don't you think ?
> 
> Which brings me back to this option, which may be the simplest and
> avoids the overhead of the proposed series (I found the series to be a
> nice cleanup but retpoline does kick us in the nuts here).
> 
> So what about this ?
> 
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> *vdev)
>  * the DMA API if we're a Xen guest, which at least allows
>  * all of the sensible Xen configurations to work correctly.
>  */
> -   if (xen_domain())
> +   if (xen_domain() || arch_virtio_direct_dma_ops(>dev))
> return true;
>  
> return false;

Right but can't we fix the retpoline overhead such that
vring_use_dma_api will not be called on data path any longer, making
this a setup time check?


> (Passing the dev allows the arch to know this is a virtio device in
> "direct" mode or whatever we want to call the !iommu case, and
> construct appropriate DMA ops for it, which aren't the same as the DMA
> ops of any other PCI device who *do* use the iommu).

I think that's where Christoph might have specific ideas about it.

> Otherwise, the harder option would be for us to hack so that
> xen_domain() returns true in our setup (gross), and have the arch code,
> when it sets up PCI device DMA ops, have a gross hack to identify
> virtio PCI devices, checks their F_IOMMU flag itself, and sets up the
> different ops at that point.
> 
> As for those "special" ops, they are of course just normal swiotlb ops,
> there's nothing "special" other that they aren't the ops that other PCI
> device on that bus use.
> 
> Cheers,
> Ben.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> > 
> > > Right, we'll need some quirk to disable balloons  in the guest I
> > > suppose.
> > > 
> > > Passing something from libvirt is cumbersome because the end user may
> > > not even need to know about secure VMs. There are use cases where the
> > > security is a contract down to some special application running inside
> > > the secure VM, the sysadmin knows nothing about.
> > > 
> > > Also there's repercussions all the way to admin tools, web UIs etc...
> > > so it's fairly wide ranging.
> > > 
> > > So as long as we only need to quirk a couple of devices, it's much
> > > better contained that way.
> > 
> > So just the balloon thing already means that yes management and all the
> > way to the user tools must know this is going on. Otherwise
> > user will try to inflate the balloon and wonder why this does not work.
> 
> There is *dozens* of management systems out there, not even all open
> source, we won't ever be able to see the end of the tunnel if we need
> to teach every single of them, including end users, about platform
> specific new VM flags like that.
> 
> .../...

In the end I suspect you will find you have to.

> > Here's another example: you can't migrate a secure vm to hypervisor
> > which doesn't support this feature. Again management tools above libvirt
> > need to know otherwise they will try.
> 
> There will have to be a new machine type for that I suppose, yes,
> though it's not just the hypervisor that needs to know about the
> modified migration stream, it's also the need to have a compatible
> ultravisor with the right keys on the other side.
> 
> So migration is going to be special and require extra admin work in all
> cases yes. But not all secure VMs are meant to be migratable.
> 
> In any case, back to the problem at hand. What a qemu flag gives us is
> just a way to force iommu at VM creation time.

I don't think a qemu flag is strictly required for a problem at hand.

> This is rather sub-optimal, we don't really want the iommu in the way,
> so it's at best a "workaround", and it's not really solving the real
> problem.

This specific problem, I think I agree.

> As I said replying to Christoph, we are "leaking" into the interface
> something here that is really what's the VM is doing to itself, which
> is to stash its memory away in an inaccessible place.
> 
> Cheers,
> Ben.

I think Christoph merely objects to the specific implementation.  If
instead you do something like tweak dev->bus_dma_mask for the virtio
device I think he won't object.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 09:10:40AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote:
> > > I've done something very similar in the thread I posted a few years
> > > ago.
> > 
> > Right so that was before spectre where a virtual call was cheaper :(
> 
> Sorry, I meant days, not years.  The whole point of the thread was the
> slowdowns due to retpolines, which are the software spectre mitigation.

Oh that makes sense then. Could you post a pointer pls so
this patchset is rebased on top (there are things to
change about 4/4 but 1-3 could go in if they don't add
overhead)?

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 08:24:06AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> > > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> > > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > >>>>>> Please go through these patches and review whether this approach 
> > > >>>>>> broadly
> > > >>>>>> makes sense. I will appreciate suggestions, inputs, comments 
> > > >>>>>> regarding
> > > >>>>>> the patches or the approach in general. Thank you.
> > > >>>>>
> > > >>>>> Jason did some work on profiling this. Unfortunately he reports
> > > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > >>>>
> > > >>>> The test is rather simple, just run pktgen 
> > > >>>> (pktgen_sample01_simple.sh) in
> > > >>>> guest and measure PPS on tap on host.
> > > >>>>
> > > >>>> Thanks
> > > >>>
> > > >>> Could you supply host configuration involved please?
> > > >>
> > > >> I wonder how much of that could be caused by Spectre mitigations
> > > >> blowing up indirect function calls...
> > > >>
> > > >> Cheers,
> > > >> Ben.
> > > > 
> > > > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> > > 
> > > Did we get better results (lower regression due to indirect calls) with
> > > the suggested mitigation ? Just curious.
> > 
> > I'm referring to this:
> > I wonder whether we can support map_sg and friends being NULL, then use
> > that when mapping is an identity. A conditional branch there is likely
> > very cheap.
> > 
> > I don't think anyone tried implementing this yes.
> 
> I've done something very similar in the thread I posted a few years
> ago.

Right so that was before spectre where a virtual call was cheaper :(

> I plan to get a version of that upstream for 4.20, but it won't
> cover the virtio case, just the real direct mapping.

I guess this RFC will have to be reworked on top and performance retested.

Thanks,

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Sun, Aug 05, 2018 at 02:52:54PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote:
> > I see the allure of this, but I think down the road you will
> > discover passing a flag in libvirt XML saying
> > "please use a secure mode" or whatever is a good idea.
> > 
> > Even thought it is probably not required to address this
> > specific issue.
> > 
> > For example, I don't think ballooning works in secure mode,
> > you will be able to teach libvirt not to try to add a
> > balloon to the guest.
> 
> Right, we'll need some quirk to disable balloons  in the guest I
> suppose.
> 
> Passing something from libvirt is cumbersome because the end user may
> not even need to know about secure VMs. There are use cases where the
> security is a contract down to some special application running inside
> the secure VM, the sysadmin knows nothing about.
> 
> Also there's repercussions all the way to admin tools, web UIs etc...
> so it's fairly wide ranging.
> 
> So as long as we only need to quirk a couple of devices, it's much
> better contained that way.

So just the balloon thing already means that yes management and all the
way to the user tools must know this is going on. Otherwise
user will try to inflate the balloon and wonder why this does not work.

> > > Later on, (we may have even already run Linux at that point,
> > > unsecurely, as we can use Linux as a bootloader under some
> > > circumstances), we start a "secure image".
> > > 
> > > This is a kernel zImage that includes a "ticket" that has the
> > > appropriate signature etc... so that when that kernel starts, it can
> > > authenticate with the ultravisor, be verified (along with its ramdisk)
> > > etc... and copied (by the UV) into secure memory & run from there.
> > > 
> > > At that point, the hypervisor is informed that the VM has become
> > > secure.
> > > 
> > > So at that point, we could exit to qemu to inform it of the change,
> > 
> > That's probably a good idea too.
> 
> We probably will have to tell qemu eventually for migration, as we'll
> need some kind of key exchange phase etc... to deal with the crypto
> aspects (the actual page copy is sorted via encrypting the secure pages
> back to normal pages in qemu, but we'll need extra metadata).
> 
> > > and
> > > have it walk the qtree and "Switch" all the virtio devices to use the
> > > IOMMU I suppose, but it feels a lot grosser to me.
> > 
> > That part feels gross, yes.
> > 
> > > That's the only other option I can think of.
> > > 
> > > > However in this specific case, the flag does not need to come from the
> > > > hypervisor, it can be set by arch boot code I think.
> > > > Christoph do you see a problem with that?
> > > 
> > > The above could do that yes. Another approach would be to do it from a
> > > small virtio "quirk" that pokes a bit in the device to force it to
> > > iommu mode when it detects that we are running in a secure VM. That's a
> > > bit warty on the virito side but probably not as much as having a qemu
> > > one that walks of the virtio devices to change how they behave.
> > > 
> > > What do you reckon ?
> > 
> > I think you are right that for the dma limit the hypervisor doesn't seem
> > to need to know.
> 
> It's not just a limit mind you. It's a range, at least if we allocate
> just a single pool of insecure pages. swiotlb feels like a better
> option for us.
> 
> > > What we want to avoid is to expose any of this to the *end user* or
> > > libvirt or any other higher level of the management stack. We really
> > > want that stuff to remain contained between the VM itself, KVM and
> > > maybe qemu.
> > > 
> > > We will need some other qemu changes for migration so that's ok. But
> > > the minute you start touching libvirt and the higher levels it becomes
> > > a nightmare.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I don't believe you'll be able to avoid that entirely. The split between
> > libvirt and qemu is more about community than about code, random bits of
> > functionality tend to land on random sides of that fence.  Better add a
> > tag in domain XML early is my advice. Having said that, it's your
> > hypervisor. I'm just suggesting that when hypervisor does somehow need
> > to care then I suspect most people won't be receptive to the argument
> > that changing libvir

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> >>>>>> Please go through these patches and review whether this approach 
> >>>>>> broadly
> >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
> >>>>>> the patches or the approach in general. Thank you.
> >>>>>
> >>>>> Jason did some work on profiling this. Unfortunately he reports
> >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
> >>>>
> >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> >>>> guest and measure PPS on tap on host.
> >>>>
> >>>> Thanks
> >>>
> >>> Could you supply host configuration involved please?
> >>
> >> I wonder how much of that could be caused by Spectre mitigations
> >> blowing up indirect function calls...
> >>
> >> Cheers,
> >> Ben.
> > 
> > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> 
> Did we get better results (lower regression due to indirect calls) with
> the suggested mitigation ? Just curious.

I'm referring to this:
I wonder whether we can support map_sg and friends being NULL, then use
that when mapping is an identity. A conditional branch there is likely
very cheap.

I don't think anyone tried implementing this yes.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-04 Thread Michael S. Tsirkin
On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > However the question people raise is that DMA API is already full of
> > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > above. How is this one much worse?
> > > 
> > > None of these warts is visible to the driver, they are all handled in
> > > the architecture (possibly on a per-bus basis).
> > > 
> > > So for virtio we really need to decide if it has one set of behavior
> > > as specified in the virtio spec, or if it behaves exactly as if it
> > > was on a PCI bus, or in fact probably both as you lined up.  But no
> > > magic arch specific behavior inbetween.
> > 
> > The only arch specific behaviour is needed in the case where it doesn't
> > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > our secure VMs, we still need to make it use swiotlb in order to bounce
> > through non-secure pages.
> 
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed DMA
> ops then everything has happened to work. If we blindly enable the arch DMA
> ops, we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).
> 
> Will

Right that's not very different from placing the device within the IOMMU
domain but in fact bypassing the IOMMU. I wonder whether anyone ever
needs a non coherent virtio-mmio. If yes we can extend
PLATFORM_IOMMU to cover that or add another bit.

What exactly do the non-coherent ops do that causes the corruption?

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-04 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > > > Please go through these patches and review whether this approach 
> > > > > broadly
> > > > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > > > the patches or the approach in general. Thank you.
> > > > 
> > > > Jason did some work on profiling this. Unfortunately he reports
> > > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > 
> > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > > guest and measure PPS on tap on host.
> > > 
> > > Thanks
> > 
> > Could you supply host configuration involved please?
> 
> I wonder how much of that could be caused by Spectre mitigations
> blowing up indirect function calls...
> 
> Cheers,
> Ben.

I won't be surprised. If yes I suggested a way to mitigate the overhead.

-- 
MSR 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-04 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 08:22:11PM -0500, Benjamin Herrenschmidt wrote:
> (Appologies if you got this twice, my mailer had a brain fart and I don't
> know if the first one got through & am about to disappear in a plane for 17h)

I got like 3 of these. I hope that's true for everyone as I replied to
1st one.


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-04 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > > running on a secure VM in our case.
> > > > 
> > > > And total NAK the customer platform-provided part of this.  We need
> > > > a flag passed in from the hypervisor that the device needs all bus
> > > > specific dma api treatment, and then just use the normal plaform
> > > > dma mapping setup. 
> > > 
> > > Christoph, as I have explained already, we do NOT have a way to provide
> > > such a flag as neither the hypervisor nor qemu knows anything about
> > > this when the VM is created.
> > 
> > I think the fact you can't add flags from the hypervisor is
> > a sign of a problematic architecture, you should look at
> > adding that down the road - you will likely need it at some point.
> 
> Well, we can later in the boot process. At VM creation time, it's just
> a normal VM. The VM firmware, bootloader etc... are just operating
> normally etc...

I see the allure of this, but I think down the road you will
discover passing a flag in libvirt XML saying
"please use a secure mode" or whatever is a good idea.

Even thought it is probably not required to address this
specific issue.

For example, I don't think ballooning works in secure mode,
you will be able to teach libvirt not to try to add a
balloon to the guest.

> Later on, (we may have even already run Linux at that point,
> unsecurely, as we can use Linux as a bootloader under some
> circumstances), we start a "secure image".
> 
> This is a kernel zImage that includes a "ticket" that has the
> appropriate signature etc... so that when that kernel starts, it can
> authenticate with the ultravisor, be verified (along with its ramdisk)
> etc... and copied (by the UV) into secure memory & run from there.
> 
> At that point, the hypervisor is informed that the VM has become
> secure.
> 
> So at that point, we could exit to qemu to inform it of the change,

That's probably a good idea too.

> and
> have it walk the qtree and "Switch" all the virtio devices to use the
> IOMMU I suppose, but it feels a lot grosser to me.

That part feels gross, yes.

> That's the only other option I can think of.
> 
> > However in this specific case, the flag does not need to come from the
> > hypervisor, it can be set by arch boot code I think.
> > Christoph do you see a problem with that?
> 
> The above could do that yes. Another approach would be to do it from a
> small virtio "quirk" that pokes a bit in the device to force it to
> iommu mode when it detects that we are running in a secure VM. That's a
> bit warty on the virito side but probably not as much as having a qemu
> one that walks of the virtio devices to change how they behave.
> 
> What do you reckon ?

I think you are right that for the dma limit the hypervisor doesn't seem
to need to know.

> What we want to avoid is to expose any of this to the *end user* or
> libvirt or any other higher level of the management stack. We really
> want that stuff to remain contained between the VM itself, KVM and
> maybe qemu.
>
> We will need some other qemu changes for migration so that's ok. But
> the minute you start touching libvirt and the higher levels it becomes
> a nightmare.
> 
> Cheers,
> Ben.

I don't believe you'll be able to avoid that entirely. The split between
libvirt and qemu is more about community than about code, random bits of
functionality tend to land on random sides of that fence.  Better add a
tag in domain XML early is my advice. Having said that, it's your
hypervisor. I'm just suggesting that when hypervisor does somehow need
to care then I suspect most people won't be receptive to the argument
that changing libvirt is a nightmare.

> > > >  To get swiotlb you'll need to then use the DT/ACPI
> > > > dma-range property to limit the addressable range, and a swiotlb
> > > > capable plaform will use swiotlb automatically.
> > > 
> > > This cannot be done as you describe it.
> > > 
> > > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > > at a point where it has *no idea* that the VM will later become secure
> > > and thus will have to restrict which pages can be used for "DMA".
>

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-04 Thread Michael S. Tsirkin
On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> > It seems reasonable to teach a platform to override dma-range
> > for a specific device e.g. in case it knows about bugs in ACPI.
> 
> A platform will be able override dma-range using the dev->bus_dma_mask
> field starting in 4.19.  But we'll still need a way how to
> 
>   a) document in the virtio spec that all bus dma quirks are to be
>  applied

I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU
should be extended to cover that. But see below.

>   b) a way to document in a virtio-related spec how the bus handles
>  dma for Ben's totally fucked up hypervisor.  Without that there
>  is not way we'll get interoperable implementations.


So in this case however I'm not sure what exactly do we want to add. It
seems that from point of view of the device, there is nothing special -
it just gets a PA and writes there.  It also seems that guest does not
need to get any info from the device either. Instead guest itself needs
device to DMA into specific addresses, for its own reasons.

It seems that the fact that within guest it's implemented using a bounce
buffer and that it's easiest to do by switching virtio to use the DMA API
isn't something virtio spec concerns itself with.

I'm open to suggestions.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-03 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> > So let's differenciate the two problems of having an IOMMU (real or
> > emulated) which indeeds adds overhead etc... and using the DMA API.
> > 
> > At the moment, virtio does this all over the place:
> > 
> > if (use_dma_api)
> > dma_map/alloc_something(...)
> > else
> > use_pa
> > 
> > The idea of the patch set is to do two, somewhat orthogonal, changes
> > that together achieve what we want. Let me know where you think there
> > is "a bunch of issues" because I'm missing it:
> > 
> >  1- Replace the above if/else constructs with just calling the DMA API,
> > and have virtio, at initialization, hookup its own dma_ops that just
> > "return pa" (roughly) when the IOMMU stuff isn't used.
> > 
> > This adds an indirect function call to the path that previously didn't
> > have one (the else case above). Is that a significant/measurable
> > overhead ?
> 
> If you call it often enough it does:
> 
> https://www.spinics.net/lists/netdev/msg495413.html
> 
> >  2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
> 
> And total NAK the customer platform-provided part of this.  We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal plaform
> dma mapping setup.  To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable plaform will use swiotlb automatically.

It seems reasonable to teach a platform to override dma-range
for a specific device e.g. in case it knows about bugs in ACPI.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-03 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 10:41:41AM +0800, Jason Wang wrote:
> 
> 
> On 2018年08月03日 04:55, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> > > This patch series is the follow up on the discussions we had before about
> > > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> > > for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> > > were suggestions about doing away with two different paths of transactions
> > > with the host/QEMU, first being the direct GPA and the other being the DMA
> > > API based translations.
> > > 
> > > First patch attempts to create a direct GPA mapping based DMA operations
> > > structure called 'virtio_direct_dma_ops' with exact same implementation
> > > of the direct GPA path which virtio core currently has but just wrapped in
> > > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> > > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve 
> > > the
> > > existing semantics. The second patch does exactly that inside the function
> > > virtio_finalize_features(). The third patch removes the default direct GPA
> > > path from virtio core forcing it to use DMA API callbacks for all devices.
> > > Now with that change, every device must have a DMA operations structure
> > > associated with it. The fourth patch adds an additional hook which gives
> > > the platform an opportunity to do yet another override if required. This
> > > platform hook can be used on POWER Ultravisor based protected guests to
> > > load up SWIOTLB DMA callbacks to do the required (as discussed previously
> > > in the above mentioned thread how host is allowed to access only parts of
> > > the guest GPA range) bounce buffering into the shared memory for all I/O
> > > scatter gather buffers to be consumed on the host side.
> > > 
> > > Please go through these patches and review whether this approach broadly
> > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > the patches or the approach in general. Thank you.
> > Jason did some work on profiling this. Unfortunately he reports
> > about 4% extra overhead from this switch on x86 with no vIOMMU.
> 
> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> guest and measure PPS on tap on host.
> 
> Thanks

Could you supply host configuration involved please?

> > 
> > I expect he's writing up the data in more detail, but
> > just wanted to let you know this would be one more
> > thing to debug before we can just switch to DMA APIs.
> > 
> > 
> > > Anshuman Khandual (4):
> > >virtio: Define virtio_direct_dma_ops structure
> > >virtio: Override device's DMA OPS with virtio_direct_dma_ops 
> > > selectively
> > >virtio: Force virtio core to use DMA API callbacks for all virtio 
> > > devices
> > >virtio: Add platform specific DMA API translation for virito devices
> > > 
> > >   arch/powerpc/include/asm/dma-mapping.h |  6 +++
> > >   arch/powerpc/platforms/pseries/iommu.c |  6 +++
> > >   drivers/virtio/virtio.c| 72 
> > > ++
> > >   drivers/virtio/virtio_pci_common.h |  3 ++
> > >   drivers/virtio/virtio_ring.c   | 65 
> > > +-
> > >   5 files changed, 89 insertions(+), 63 deletions(-)
> > > 
> > > -- 
> > > 2.9.3


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-03 Thread Michael S. Tsirkin
On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > >   2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> > 
> > And total NAK the customer platform-provided part of this.  We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup. 
> 
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

I think the fact you can't add flags from the hypervisor is
a sign of a problematic architecture, you should look at
adding that down the road - you will likely need it at some point.

However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph do you see a problem with that?

> >  To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable plaform will use swiotlb automatically.
> 
> This cannot be done as you describe it.
> 
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
> 
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instanciated etc...
> 
> Only the guest kernel knows because it initates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc... 
> 
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
> 
> Ben.
> 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:
> > > Yes, this is the purpose of Anshuman original patch (I haven't looked
> > > at the details of the patch in a while but that's what I told him to
> > > implement ;-) :
> > > 
> > >   - Make virtio always use DMA ops to simplify the code path (with a set
> > > of "transparent" ops for legacy)
> > > 
> > >   and
> > > 
> > >   -  Provide an arch hook allowing us to "override" those "transparent"
> > > DMA ops with some custom ones that do the appropriate swiotlb gunk.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > 
> > Right but as I tried to say doing that brings us to a bunch of issues
> > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
> > guest to hypervisor communication.
> 
> I'm not sure I see the problem, see below
> 
> > When we do (as is the case with PLATFORM_IOMMU right now) this adds a
> > bunch of overhead which we need to get rid of if we are to switch to
> > PLATFORM_IOMMU by default.  We need to fix that.
> 
> So let's differenciate the two problems of having an IOMMU (real or
> emulated) which indeeds adds overhead etc... and using the DMA API.

Well actually it's the other way around. An iommu in theory doesn't need
to bring overhead if you set it in bypass mode.  Which does imply the
iommu supports bypass mode. Is that universally the case?  DMA API does
see Christoph's list of things it does some of which add overhead.

> At the moment, virtio does this all over the place:
> 
>   if (use_dma_api)
>   dma_map/alloc_something(...)
>   else
>   use_pa
> 
> The idea of the patch set is to do two, somewhat orthogonal, changes
> that together achieve what we want. Let me know where you think there
> is "a bunch of issues" because I'm missing it:
> 
>  1- Replace the above if/else constructs with just calling the DMA API,
> and have virtio, at initialization, hookup its own dma_ops that just
> "return pa" (roughly) when the IOMMU stuff isn't used.
> 
> This adds an indirect function call to the path that previously didn't
> have one (the else case above). Is that a significant/measurable
> overhead ?

Seems to be :( Jason reports about 4%. I wonder whether we can support
map_sg and friends being NULL, then use that when mapping is an
identity. A conditional branch there is likely very cheap.

Would this cover all platforms with kvm (which is where we care
most about performance)?

> This change stands alone, and imho "cleans" up virtio by avoiding all
> that if/else "2 path" and unless it adds a measurable overhead, should
> probably be done.
> 
>  2- Make virtio use the DMA API with our custom platform-provided
> swiotlb callbacks when needed, that is when not using IOMMU *and*
> running on a secure VM in our case.
> 
> This benefits from -1- by making us just plumb in a different set of
> DMA ops we would have cooked up specifically for virtio in our arch
> code (or in virtio itself but build arch-conditionally in a separate
> file). But it doesn't strictly need it -1-:
> 
> Now, -2- doesn't strictly needs -1-. We could have just done another
> xen-like hack that forces the DMA API "ON" for virtio when running in a
> secure VM.
> 
> The problem if we do that however is that we also then need the arch
> PCI code to make sure it hooks up the virtio PCI devices with the
> special "magic" DMA ops that avoid the iommu but still do swiotlb, ie,
> not the same as other PCI devices. So it will have to play games such
> as checking vendor/device IDs for virtio, checking the IOMMU flag,
> etc... from the arch code which really bloody sucks when assigning PCI
> DMA ops.
> 
> However, if we do it the way we plan here, on top of -1-, with a hook
> called from virtio into the arch to "override" the virtio DMA ops, then
> we avoid the problem completely: The arch hook would only be called by
> virtio if the IOMMU flag is *not* set. IE only when using that special
> "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal
> PCI dma ops as usual.
> 
> That way, we have a very clear semantic: This hook is purely about
> replacing those "null" DMA ops that just return PA introduced in -1-
> with some arch provided specially cooked up DMA ops for non-IOMMU
> virtio that know about the arch special requirements. For us bounce
> buffering.
> 
> Is there something I'm missing ?
> 
> Cheers,
> Ben.

Right so I was try

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

Jason did some work on profiling this. Unfortunately he reports
about 4% extra overhead from this switch on x86 with no vIOMMU.

I expect he's writing up the data in more detail, but
just wanted to let you know this would be one more
thing to debug before we can just switch to DMA APIs.


> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virito devices
> 
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c| 72 
> ++
>  drivers/virtio/virtio_pci_common.h |  3 ++
>  drivers/virtio/virtio_ring.c   | 65 +-
>  5 files changed, 89 insertions(+), 63 deletions(-)
> 
> -- 
> 2.9.3


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Thu, Aug 02, 2018 at 10:33:05AM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 00:56 +0300, Michael S. Tsirkin wrote:
> > > but it's not, VMs are
> > > created in "legacy" mode all the times and we don't know at VM creation
> > > time whether it will become a secure VM or not. The way our secure VMs
> > > work is that they start as a normal VM, load a secure "payload" and
> > > call the Ultravisor to "become" secure.
> > > 
> > > So we're in a bit of a bind here. We need that one-liner optional arch
> > > hook to make virtio use swiotlb in that "IOMMU bypass" case.
> > > 
> > > Ben.
> > 
> > And just to make sure I understand, on your platform DMA APIs do include
> > some of the cache flushing tricks and this is why you don't want to
> > declare iommu support in the hypervisor?
> 
> I'm not sure I parse what you mean.
> 
> We don't need cache flushing tricks.

You don't but do real devices on same platform need them?

> The problem we have with our
> "secure" VMs is that:
> 
>  - At VM creation time we have no idea it's going to become a secure
> VM, qemu doesn't know anything about it, and thus qemu (or other
> management tools, libvirt etc...) are going to create "legacy" (ie
> iommu bypass) virtio devices.
> 
>  - Once the VM goes secure (early during boot but too late for qemu),
> it will need to make virtio do bounce-buffering via swiotlb because
> qemu cannot physically access most VM pages (blocked by HW security
> features), we need to bounce buffer using some unsecure pages that are
> accessible to qemu.
> 
> That said, I wouldn't object for us to more generally switch long run
> to changing qemu so that virtio on powerpc starts using the IOMMU as a
> default provided we fix our guest firmware to understand it (it
> currently doesn't), and provided we verify that the performance impact
> on things like vhost is negligible.
> 
> Cheers,
> Ben.
> 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Thu, Aug 02, 2018 at 12:53:41PM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote:
> > 
> > I see. So yes, given that device does not know or care, using
> > virtio features is an awkward fit.
> > 
> > So let's say as a quick fix for you maybe we could generalize the
> > xen_domain hack, instead of just checking xen_domain check some static
> > branch.  Then teach xen and others to enable that.>
> 
> > OK but problem then becomes this: if you do this and virtio device appears
> > behind a vIOMMU and it does not advertize the IOMMU flag, the
> > code will try to use the vIOMMU mappings and fail.
> >
> > It does look like even with trick above, you need a special version of
> > DMA ops that does just swiotlb but not any of the other things DMA API
> > might do.
> > 
> > Thoughts?
> 
> Yes, this is the purpose of Anshuman original patch (I haven't looked
> at the details of the patch in a while but that's what I told him to
> implement ;-) :
> 
>  - Make virtio always use DMA ops to simplify the code path (with a set
> of "transparent" ops for legacy)
> 
>  and
> 
>  -  Provide an arch hook allowing us to "override" those "transparent"
> DMA ops with some custom ones that do the appropriate swiotlb gunk.
> 
> Cheers,
> Ben.
> 

Right but as I tried to say doing that brings us to a bunch of issues
with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
guest to hypervisor communication.

When we do (as is the case with PLATFORM_IOMMU right now) this adds a
bunch of overhead which we need to get rid of if we are to switch to
PLATFORM_IOMMU by default.  We need to fix that.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Thu, Aug 02, 2018 at 11:01:26AM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 18:41 +0300, Michael S. Tsirkin wrote:
> > 
> > > I don't completely agree:
> > > 
> > > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
> > > for example. It indicates that the peer bypasses the normal platform
> > > iommu. The platform code in the guest has no real way to know that this
> > > is the case, this is a specific "feature" of the qemu implementation.
> > > 
> > > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
> > > property of the guest platform itself (not qemu), there's no way the
> > > "peer" can advertize it via the virtio negociated flags. At least for
> > > us. I don't know for sure whether that would be workable for the ARM
> > > case. In our case, qemu has no idea at VM creation time that the VM
> > > will turn itself into a secure VM and thus will require bounce
> > > buffering for IOs (including virtio).
> > > 
> > > So unless we have another hook for the arch code to set
> > > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
> > > guest itself, I don't see that as a way to deal with it.
> > > 
> > > >  The other issue is VIRTIO_F_IO_BARRIER
> > > > which is very vaguely defined, and which needs a better definition.
> > > > And last but not least we'll need some text explaining the challenges
> > > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + 
> > > > VIRTIO_F_IO_BARRIER
> > > > is what would basically cover them, but a good description including
> > > > an explanation of why these matter.
> > > 
> > > Ben.
> > > 
> > 
> > So is it true that from qemu point of view there is nothing special
> > going on?  You pass in a PA, host writes there.
> 
> Yes, qemu doesn't see a different. It's the guest that will bounce the
> pages via a pool of "insecure" pages that qemu can access. Normal pages
> in a secure VM come from PAs that qemu cannot physically access.
> 
> Cheers,
> Ben.
> 

I see. So yes, given that device does not know or care, using
virtio features is an awkward fit.

So let's say as a quick fix for you maybe we could generalize the
xen_domain hack, instead of just checking xen_domain check some static
branch.  Then teach xen and others to enable that.
OK but problem then becomes this: if you do this and virtio device appears
behind a vIOMMU and it does not advertize the IOMMU flag, the
code will try to use the vIOMMU mappings and fail.

It does look like even with trick above, you need a special version of
DMA ops that does just swiotlb but not any of the other things DMA API
might do.

Thoughts?

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-02 Thread Michael S. Tsirkin
On Thu, Aug 02, 2018 at 10:24:57AM -0500, Benjamin Herrenschmidt wrote:
> On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote:
> > We just need to figure out how to deal with devices that deviate
> > from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> > dma tweaks (offsets, cache flushing), which seems well in spirit of
> > the original design. 
> 
> I don't completely agree:
> 
> 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
> for example. It indicates that the peer bypasses the normal platform
> iommu. The platform code in the guest has no real way to know that this
> is the case, this is a specific "feature" of the qemu implementation.
> 
> 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
> property of the guest platform itself (not qemu), there's no way the
> "peer" can advertize it via the virtio negociated flags. At least for
> us. I don't know for sure whether that would be workable for the ARM
> case. In our case, qemu has no idea at VM creation time that the VM
> will turn itself into a secure VM and thus will require bounce
> buffering for IOs (including virtio).
> 
> So unless we have another hook for the arch code to set
> VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
> guest itself, I don't see that as a way to deal with it.
> 
> >  The other issue is VIRTIO_F_IO_BARRIER
> > which is very vaguely defined, and which needs a better definition.
> > And last but not least we'll need some text explaining the challenges
> > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > is what would basically cover them, but a good description including
> > an explanation of why these matter.
> 
> Ben.
> 

So is it true that from qemu point of view there is nothing special
going on?  You pass in a PA, host writes there.


-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-01 Thread Michael S. Tsirkin
On Wed, Aug 01, 2018 at 10:05:35AM +0100, Will Deacon wrote:
> Hi Christoph,
> 
> On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > > On arm/arm64, the problem we have is that legacy virtio devices on the 
> > > MMIO
> > > transport (so definitely not PCI) have historically been advertised by 
> > > qemu
> > > as not being cache coherent, but because the virtio core has bypassed DMA
> > > ops then everything has happened to work. If we blindly enable the arch 
> > > DMA
> > > ops,
> > 
> > No one is suggesting that as far as I can tell.
> 
> Apologies: it's me that wants the DMA ops enabled to handle legacy devices
> behind an IOMMU, but see below.
> 
> > > we'll plumb in the non-coherent ops and start getting data corruption,
> > > so we do need a way to quirk virtio as being "always coherent" if we want 
> > > to
> > > use the DMA ops (which we do, because our emulation platforms have an 
> > > IOMMU
> > > for all virtio devices).
> > 
> > From all that I've gather so far: no you do not want that.  We really
> > need to figure out virtio "dma" interacts with the host / device.
> > 
> > If you look at the current iommu spec it does talk of physical address
> > with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
> 
> That's true, although that doesn't exist in the legacy virtio spec, and we
> have an existing emulation platform which puts legacy virtio devices behind
> an IOMMU. Currently, Linux is unable to boot on this platform unless the
> IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
> then it works perfectly.
> 
> > So between that and our discussion in this thread and its previous
> > iterations I think we need to stick to the current always physical,
> > bypass system dma ops mode of virtio operation as the default.
> 
> As above -- that means we hang during boot because we get stuck trying to
> bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
> answer is "just upgrade to latest virtio and advertise the presence of the
> IOMMU". I'm pushing for that in future platforms, but it seems a shame not
> to support the current platform, especially given that other systems do have
> hacks in mainline to get virtio working.
> 
> > We just need to figure out how to deal with devices that deviate
> > from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> > dma tweaks (offsets, cache flushing), which seems well in spirit of
> > the original design.  The other issue is VIRTIO_F_IO_BARRIER
> > which is very vaguely defined, and which needs a better definition.
> > And last but not least we'll need some text explaining the challenges
> > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > is what would basically cover them, but a good description including
> > an explanation of why these matter.
> 
> I agree that this makes sense for future revisions of virtio (or perhaps
> it can just be a clarification to virtio 1.0), but we're still left in the
> dark with legacy devices and it would be nice to have them work on the
> systems which currently exist, even if it's a legacy-only hack in the arch
> code.
> 
> Will


Myself I'm sympathetic to this use-case and I see more uses to this
than just legacy support. But more work is required IMHO.
Will post tomorrow though - it's late here ...

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-01 Thread Michael S. Tsirkin
On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops,
> 
> No one is suggesting that as far as I can tell.
> 
> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> 
> >From all that I've gather so far: no you do not want that.  We really
> need to figure out virtio "dma" interacts with the host / device.
> 
> If you look at the current iommu spec it does talk of physical address
> with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
> 
> So between that and our discussion in this thread and its previous
> iterations I think we need to stick to the current always physical,
> bypass system dma ops mode of virtio operation as the default.
> 
> We just need to figure out how to deal with devices that deviate
> from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design.

Well I wouldn't say that. VIRTIO_F_IOMMU_PLATFORM is for guest
programmable protection which is designed for things like userspace
drivers but still very much which a CPU doing the accesses. I think
VIRTIO_F_IO_BARRIER needs to be extended to VIRTIO_F_PLATFORM_DMA.

>  The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

I think VIRTIO_F_IOMMU_PLATFORM + VIRTIO_F_PLATFORM_DMA but yea.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-01 Thread Michael S. Tsirkin
On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > However the question people raise is that DMA API is already full of
> > > arch-specific tricks the likes of which are outlined in your post linked
> > > above. How is this one much worse?
> > 
> > None of these warts is visible to the driver, they are all handled in
> > the architecture (possibly on a per-bus basis).
> > 
> > So for virtio we really need to decide if it has one set of behavior
> > as specified in the virtio spec, or if it behaves exactly as if it
> > was on a PCI bus, or in fact probably both as you lined up.  But no
> > magic arch specific behavior inbetween.
> 
> The only arch specific behaviour is needed in the case where it doesn't
> behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> our secure VMs, we still need to make it use swiotlb in order to bounce
> through non-secure pages.
> 
> It would be nice if "real PCI" was the default

I think you are mixing "real PCI" which isn't coded up yet and IOMMU
bypass which is. IOMMU bypass will maybe with time become unnecessary
since it seems that one can just program an IOMMU in a bypass mode
instead.

It's hard to blame you since right now if you disable IOMMU bypass
you get a real PCI mode. But they are distinct and to allow people
to enable IOMMU by default we will need to teach someone
(virtio or DMA API) about this mode that does follow
translation and protection rules in the IOMMU but runs
on a CPU and so does not need cache flushes and whatnot.

OTOH real PCI mode as opposed to default hypervisor mode does not perform as
well when what you actually have is a hypervisor.

So we'll likely have a mix of these two modes for a while.

> but it's not, VMs are
> created in "legacy" mode all the times and we don't know at VM creation
> time whether it will become a secure VM or not. The way our secure VMs
> work is that they start as a normal VM, load a secure "payload" and
> call the Ultravisor to "become" secure.
> 
> So we're in a bit of a bind here. We need that one-liner optional arch
> hook to make virtio use swiotlb in that "IOMMU bypass" case.
> 
> Ben.

And just to make sure I understand, on your platform DMA APIs do include
some of the cache flushing tricks and this is why you don't want to
declare iommu support in the hypervisor?

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Michael S. Tsirkin
On Mon, Jul 30, 2018 at 04:18:02AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> > Let me reply to the "crappy" part first:
> > So virtio devices can run on another CPU or on a PCI bus. Configuration
> > can happen over mupltiple transports.  There is a discovery protocol to
> > figure out where it is. It has some warts but any real system has warts.
> > 
> > So IMHO virtio running on another CPU isn't "legacy virtual crappy
> > virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> > simply because the DMA is more convoluted on some architectures.
> 
> All of what you said would be true if virtio didn't claim to be
> a PCI device.  

There's nothing virtio claims to be.  It's a PV device that uses PCI for
its configuration.  Configuration is enumerated on the virtual PCI bus.
That part of the interface is emulated PCI. Data path is through a
PV device enumerated on the virtio bus.

> Once it claims to be a PCI device and we also see
> real hardware written to the interface I stand to all what I said
> above.

Real hardware would reuse parts of the interface but by necessity it
needs to behave slightly differently on some platforms.  However for
some platforms (such as x86) a PV virtio driver will by luck work with a
PCI device backend without changes. As these platforms and drivers are
widely deployed, some people will deploy hardware like that.  Should be
a non issue as by definition it's transparent to guests.

> > With this out of my system:
> > I agree these approaches are hacky. I think it is generally better to
> > have virtio feature negotiation tell you whether device runs on a CPU or
> > not rather than rely on platform specific ways for this. To this end
> > there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> > VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> > e.g.  what if it's a VF - is that real or not? But I can see something
> > like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> > 
> > We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> > and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.
> 
> I don't really care about the exact naming, and indeed a device that
> sets the flag doesn't have to be a 'real' device - it just has to act
> like one.  I explained all the issues that this means (at least relating
> to DMA) in one of the previous threads.

I believe you refer to this:
https://lkml.org/lkml/2018/6/7/15
that was a very helpful list outlining the problems we need to solve,
thanks a lot for that!

> The important bit is that we can specify exact behavior for both
> devices that sets the "I'm real!" flag and that ones that don't exactly
> in the spec.

I would very much like that, yes.

> And that very much excludes arch-specific (or
> Xen-specific) overrides.

We already committed to a xen specific hack but generally I prefer
devices that describe how they work instead of platforms magically
guessing, yes.

However the question people raise is that DMA API is already full of
arch-specific tricks the likes of which are outlined in your post linked
above. How is this one much worse?

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Michael S. Tsirkin
On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sit on a real (or
> properly emulated including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first:
So virtio devices can run on another CPU or on a PCI bus. Configuration
can happen over mupltiple transports.  There is a discovery protocol to
figure out where it is. It has some warts but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy
virtio". virtio devices that actually sit on a PCI bus aren't "sane"
simply because the DMA is more convoluted on some architectures.

Performance impact of the optimizations possible when you know
your "device" is in fact just another CPU has been measured,
it is real, so we aren't interested in adding all that overhead back
just so we can use DMA API. The "correct then fast" mantra doesn't
apply to something that is as widely deployed as virtio.

And I can accept an argument that maybe the DMA API isn't designed to
support such virtual DMA. Whether it should I don't know.

With this out of my system:
I agree these approaches are hacky. I think it is generally better to
have virtio feature negotiation tell you whether device runs on a CPU or
not rather than rely on platform specific ways for this. To this end
there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
e.g.  what if it's a VF - is that real or not? But I can see something
like e.g. VIRTIO_F_PLATFORM_DMA gaining support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST


Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively

2018-07-28 Thread Michael S. Tsirkin
On Sat, Jul 28, 2018 at 02:26:24PM +0530, Anshuman Khandual wrote:
> On 07/20/2018 09:29 AM, Anshuman Khandual wrote:
> > Now that virtio core always needs all virtio devices to have DMA OPS, we
> > need to make sure that the structure it points is the right one. In the
> > absence of VIRTIO_F_IOMMU_PLATFORM flag QEMU expects GPA from guest kernel.
> > In such case, virtio device must use default virtio_direct_dma_ops DMA OPS
> > structure which transforms scatter gather buffer addresses as GPA. This
> > DMA OPS override must happen as early as possible during virtio device
> > initializatin sequence before virtio core starts using given device's DMA
> > OPS callbacks for I/O transactions. This change detects device's IOMMU flag
> > and does the override in case the flag is cleared.
> > 
> > Signed-off-by: Anshuman Khandual 
> > ---
> >  drivers/virtio/virtio.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 7907ad3..6b13987 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, 
> > unsigned int status)
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_add_status);
> > 
> > +const struct dma_map_ops virtio_direct_dma_ops;
> > +
> >  int virtio_finalize_features(struct virtio_device *dev)
> >  {
> > int ret = dev->config->finalize_features(dev);
> > @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
> > if (ret)
> > return ret;
> 
> 
> The previous patch removed the code block for XEN guests which forced
> the use of DMA API all the time irrespective of VIRTIO_F_IOMMU_PLATFORM
> flag on the device. Here is what I have removed with patch 2/4 which
> breaks the existing semantics on XEN guests.
> 
> -static bool vring_use_dma_api(struct virtio_device *vdev)
> -{
> - if (!virtio_has_iommu_quirk(vdev))
> - return true;
> -
> - /* Otherwise, we are left to guess. */
> - /*
> -  * In theory, it's possible to have a buggy QEMU-supposed
> -  * emulated Q35 IOMMU and Xen enabled at the same time.  On
> -  * such a configuration, virtio has never worked and will
> -  * not work without an even larger kludge.  Instead, enable
> -  * the DMA API if we're a Xen guest, which at least allows
> -  * all of the sensible Xen configurations to work correctly.
> -  */
> - if (xen_domain())
> - return true;
> -
> - return false;
> -}
> 
> XEN guests would not like override with virtio_direct_dma_ops in any
> case irrespective of the flag VIRTIO_F_IOMMU_PLATFORM. So the existing
> semantics can be preserved with something like this. It just assumes
> that dev->dma_ops is non-NULL and a valid one set by the architecture.
> If required we can add those tests here before skipping the override.
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 7907ad3..6b13987 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, 
> unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
> 
> +const struct dma_map_ops virtio_direct_dma_ops;
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (ret)
>   return ret;
> +
> + if (xen_domain())
> + goto skip_override;
> +
> + if (virtio_has_iommu_quirk(dev))
> + set_dma_ops(dev->dev.parent, _direct_dma_ops);
> +
> + skip_override:
> +

I prefer normal if scoping as opposed to goto spaghetti pls.
Better yet move vring_use_dma_api here and use it.
Less of a chance something will break.

>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0
> 
> Will incorporate these changes in the next version.


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-27 Thread Michael S. Tsirkin
On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote:
> Results with and without the patches are similar.

Thanks! And another thing to try is virtio-net with
a fast NIC backend (40G and up). Unfortunately
at this point loopback tests stress the host
scheduler too much.

-- 
MST


Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices

2018-07-25 Thread Michael S. Tsirkin
On Mon, Jul 23, 2018 at 07:46:09AM +0530, Anshuman Khandual wrote:
> There is a redundant definition of virtio_has_iommu_quirk in the tools
> directory (tools/virtio/linux/virtio_config.h) which does not seem to
> be used any where. I guess that can be removed without problem.

It's there just to make tools/virtio build.
Try
make -C tools/virtio/
In fact I see there's a missing definition for
dma_rmb/dma_wmb there, I'll post a patch.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-23 Thread Michael S. Tsirkin
On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> >> This patch series is the follow up on the discussions we had before about
> >> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> >> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> >> were suggestions about doing away with two different paths of transactions
> >> with the host/QEMU, first being the direct GPA and the other being the DMA
> >> API based translations.
> >>
> >> First patch attempts to create a direct GPA mapping based DMA operations
> >> structure called 'virtio_direct_dma_ops' with exact same implementation
> >> of the direct GPA path which virtio core currently has but just wrapped in
> >> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> >> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> >> existing semantics. The second patch does exactly that inside the function
> >> virtio_finalize_features(). The third patch removes the default direct GPA
> >> path from virtio core forcing it to use DMA API callbacks for all devices.
> >> Now with that change, every device must have a DMA operations structure
> >> associated with it. The fourth patch adds an additional hook which gives
> >> the platform an opportunity to do yet another override if required. This
> >> platform hook can be used on POWER Ultravisor based protected guests to
> >> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> >> in the above mentioned thread how host is allowed to access only parts of
> >> the guest GPA range) bounce buffering into the shared memory for all I/O
> >> scatter gather buffers to be consumed on the host side.
> >>
> >> Please go through these patches and review whether this approach broadly
> >> makes sense. I will appreciate suggestions, inputs, comments regarding
> >> the patches or the approach in general. Thank you.
> > I like how patches 1-3 look. Could you test performance
> > with/without to see whether the extra indirection through
> > use of DMA ops causes a measurable slow-down?
> 
> I ran this simple DD command 10 times where /dev/vda is a virtio block
> device of 10GB size.
> 
> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
> 
> With and without patches bandwidth which has a bit wide range does not
> look that different from each other.
> 
> Without patches
> ===
> 
> -- 1 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
> -- 2 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
> -- 3 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
> -- 4 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
> -- 5 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
> -- 6 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
> -- 7 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
> -- 8 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
> -- 9 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
> -- 10 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
> 
> 
> With patches
> 
> 
> -- 1 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
> -- 2 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
> -- 3 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
> -- 4 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
> -- 5 -
> 1024+0 records in
> 1024+0 recor

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-20 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

I like how patches 1-3 look. Could you test performance
with/without to see whether the extra indirection through
use of DMA ops causes a measurable slow-down?

> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virito devices
> 
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c| 72 
> ++
>  drivers/virtio/virtio_pci_common.h |  3 ++
>  drivers/virtio/virtio_ring.c   | 65 +-
>  5 files changed, 89 insertions(+), 63 deletions(-)
> 
> -- 
> 2.9.3


Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices

2018-07-20 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
> virito devices

s/virito/virtio/

> This adds a hook which a platform can define in order to allow it to
> override virtio device's DMA OPS irrespective of whether it has the
> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
> bounce-buffering of data on the new secure pSeries platform, currently
> under development, where a KVM host cannot access all of the memory
> space of a secure KVM guest.  The host can only access the pages which
> the guest has explicitly requested to be shared with the host, thus
> the virtio implementation in the guest has to copy data to and from
> shared pages.
> 
> With this hook, the platform code in the secure guest can force the
> use of swiotlb for virtio buffers, with a back-end for swiotlb which
> will use a pool of pre-allocated shared pages.  Thus all data being
> sent or received by virtio devices will be copied through pages which
> the host has access to.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/include/asm/dma-mapping.h | 6 ++
>  arch/powerpc/platforms/pseries/iommu.c | 6 ++
>  drivers/virtio/virtio.c| 7 +++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa3945..bc5a9d3 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>  
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_DMA_MAPPING_H */
> +
> +#define platform_override_dma_ops platform_override_dma_ops
> +
> +struct virtio_device;
> +
> +extern void platform_override_dma_ops(struct virtio_device *vdev);
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..5773bc7 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>  __setup("multitce=", disable_multitce);
>  
>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> +
> +void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> + /* Override vdev->parent.dma_ops if required */
> +}
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 6b13987..432c332 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>  
>  const struct dma_map_ops virtio_direct_dma_ops;
>  
> +#ifndef platform_override_dma_ops
> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> +}
> +#endif
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (virtio_has_iommu_quirk(dev))
>   set_dma_ops(dev->dev.parent, _direct_dma_ops);
>  
> + platform_override_dma_ops(dev);

Is there a single place where virtio_has_iommu_quirk is called now?
If so, we could put this into virtio_has_iommu_quirk then.

>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> -- 
> 2.9.3


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-13 Thread Michael S. Tsirkin
On Mon, Jun 11, 2018 at 01:34:50PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-11 at 06:28 +0300, Michael S. Tsirkin wrote:
> > 
> > > However if the administrator
> > > ignores/forgets/deliberatey-decides/is-constrained to NOT enable the
> > > flag, virtio will not be able to pass control to the DMA ops associated
> > > with the virtio devices. Which means, we have no opportunity to share
> > > the I/O buffers with the hypervisor/qemu.
> > > 
> > > How do you suggest, we handle this case?
> > 
> > As step 1, ignore it as a user error.
> 
> Ugh ... not again. Ram, don't bring that subject back we ALREADY
> addressed it, and requiring the *user* to do special things is just
> utterly and completely wrong.
> 
> The *user* has no bloody idea what that stuff is, will never know to
> set whatver magic qemu flag etc... The user will just start a a VM
> normally and expect things to work. Requiring the *user* to know things
> like that iommu virtio flag is complete nonsense.

You should consider teaching QEMU that the platform has support for the
ultravisor thing so it can set flags for you.

That's true even if something else is done for virtio - if you don't
have the capability right now you will come to regret it, host userspace
is just fundamentally in charge of control-path enumeration in the KVM
stack, I understand you want to take some of it away for security but
trying to hide things from QEMU completely is a mistake IMHO.

Just my $.02.

> If by "user" you mean libvirt, then you are now requesting about 4 or 5
> different projects to be patched to add speical cases for something
> they know nothing about and is completely irrelevant, while it can be
> entirely addressed with a 1-liner in virtio kernel side to allow the
> arch to plumb alternate DMA ops.
> 
> So for some reason you seem to be dead set on a path that leads to
> mountain of user pain, changes to many different projects and overall
> havok while there is a much much simpler and elegant solution at hand
> which I described (again) in the response to Ram I sent about 5mn ago.

What you sent to Ram sounds OK to me - I only hope we do all mean
the same thing because the 3 paragraphs above are all
about the libvirt/QEMU split mess which linux or virtio
should not really care about.

And I'd like to stress that direct path isn't merely legacy.

It's a common sense approach that when you have a hypervisor doing
things like taking care of cache coherency, you do not want to do them
in the guest as well. And the same guest can use a pass-through device
where it does need to do all the coherency mess, so while it's quite
possible to build a platform-specific way to tell guests which devices
need which kind of treatment, people understandably chose to do it in a
portable way through the virtio device.

I understand you guys are working on a new project that is going to use
bounce buffers anyway so what do you care about optimizations but we
can't just slow everyone else down for your benefit.

> > Further you can for example add per-device quirks in virtio so it can be
> > switched to dma api. make extra decisions in platform code then.

Isn't above exactly what you sent to Ram?

-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-13 Thread Michael S. Tsirkin
On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2018-06-10 at 19:39 -0700, Ram Pai wrote:
> > 
> > However if the administrator
> > ignores/forgets/deliberatey-decides/is-constrained to NOT enable the
> > flag, virtio will not be able to pass control to the DMA ops associated
> > with the virtio devices. Which means, we have no opportunity to share
> > the I/O buffers with the hypervisor/qemu.
> > 
> > How do you suggest, we handle this case?
> 
> At the risk of repeating myself, let's just do the first pass which is
> to switch virtio over to always using the DMA API in the actual data
> flow code, with a hook at initialization time that replaces the DMA ops
> with some home cooked "direct" ops in the case where the IOMMU flag
> isn't set.

I'm not sure I understand all of the details, will have to
see the patch, but superficially it sounds good to me.

> This will be equivalent to what we have today but avoids having 2
> separate code path all over the driver.
> 
> Then a second stage, I think, is to replace this "hook" so that the
> architecture gets a say in the matter.
> 
> Basically, something like:
> 
>   arch_virtio_update_dma_ops(pci_dev, qemu_direct_mode).
> 
> IE, virtio would tell the arch whether the "other side" is in fact QEMU
> in a mode that bypasses the IOMMU and is cache coherent with the guest.
> This is our legacy "qemu special" mode. If the performance is
> sufficient we may want to deprecate it over time and have qemu enable
> the iommu by default but we still need it.
> 
> A weak implementation of the above will be provied that just puts in
> the direct ops when qemu_direct_mode is set, and thus provides today's
> behaviour on any arch that doesn't override it. If the flag is not set,
> the ops are left to whatever the arch PCI layer already set.
> 
> This will provide the opportunity for architectures that want to do
> something special, such as in our case, when we want to force even the
> "qemu_direct_mode" to go via bounce buffers, to put our own ops in
> there, while retaining the legacy behaviour otherwise.
> 
> It also means that the "gunk" is entirely localized in that one
> function, the rest of virtio just uses the DMA API normally.
> 
> Christoph, are you actually hacking "stage 1" above already or should
> we produce patches ?
> 
> Cheers,
> Ben.



Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-13 Thread Michael S. Tsirkin
On Wed, Jun 13, 2018 at 12:41:41AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> > At the risk of repeating myself, let's just do the first pass which is
> > to switch virtio over to always using the DMA API in the actual data
> > flow code, with a hook at initialization time that replaces the DMA ops
> > with some home cooked "direct" ops in the case where the IOMMU flag
> > isn't set.
> > 
> > This will be equivalent to what we have today but avoids having 2
> > separate code path all over the driver.
> > 
> > Then a second stage, I think, is to replace this "hook" so that the
> > architecture gets a say in the matter.
> 
> I don't think we can actually use dma_direct_ops.  It still allows
> architectures to override parts of the dma setup, which virtio seems
> to blindly assume phys == dma and not cache flushing.
> 
> I think the right way forward is to either add a new
> VIRTIO_F_IS_PCI_DEVICE (or redefine the existing iommu flag if deemed
> possible).

Given this is exactly what happens now, this seems possible, but maybe
we want a non-PCI specific name.

>  And then make sure recent qemu always sets it.

I don't think that part is going to happen, sorry.

Hypervisors can set it when they *actually have* a real PCI device.
People emulate systems which have a bunch of overhead in the DMA API
which is required for real DMA. Your proposal would double that overhead
by first doing it in guest then re-doing it in host.

I don't think it's justified when 99% of the world doesn't need it.
-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-13 Thread Michael S. Tsirkin
On Thu, Jun 07, 2018 at 11:36:55PM -0700, Christoph Hellwig wrote:
> > This seems to be what was being asked for in this thread,
> > with comments claiming IOMMU flag adds too much overhead.
> 
> Right now it means implementing a virtual iommu, which I agree is
> way too much overhead.

Well not really. The flag in question will have a desired effect without
a virtual iommu.

> > SEV guys report that they just set the iommu flag and then it all works.
> > I guess if there's translation we can think of this as a kind of iommu.
> > Maybe we should rename PLATFORM_IOMMU to PLARTFORM_TRANSLATION?
> 
> VIRTIO_F_BEHAVES_LIKE_A_REAL_PCI_DEVICE_DONT_TRY_TO_OUTSMART_ME
> 
> as said it's not just translations, it is cache coherence as well.

Well it's only for DMA. So maybe PLATFORM_DMA.

I suspect people will then come and complain that they
do *not* want cache coherence tricks because virtio is
running on a CPU, but we'll see.

> > And apparently some people complain that just setting that flag makes
> > qemu check translation on each access with an unacceptable performance
> > overhead.  Forcing same behaviour for everyone on general principles
> > even without the flag is unlikely to make them happy.
> 
> That sounds like a qemu implementation bug.  If qemu knowns that
> guest physiscall == guest dma space there is no need to check.

Possibly. Or it's possible it's all just theoretical, no one
posted any numbers.

-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-10 Thread Michael S. Tsirkin
On Sun, Jun 10, 2018 at 07:39:09PM -0700, Ram Pai wrote:
> On Thu, Jun 07, 2018 at 07:28:35PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2018 at 10:23:06PM -0700, Christoph Hellwig wrote:
> > > On Thu, May 31, 2018 at 08:43:58PM +0300, Michael S. Tsirkin wrote:
> > > > Pls work on a long term solution. Short term needs can be served by
> > > > enabling the iommu platform in qemu.
> > > 
> > > So, I spent some time looking at converting virtio to dma ops overrides,
> > > and the current virtio spec, and the sad through I have to tell is that
> > > both the spec and the Linux implementation are complete and utterly fucked
> > > up.
> > 
> > Let me restate it: DMA API has support for a wide range of hardware, and
> > hardware based virtio implementations likely won't benefit from all of
> > it.
> > 
> > And given virtio right now is optimized for specific workloads, improving
> > portability without regressing performance isn't easy.
> > 
> > I think it's unsurprising since it started a strictly a guest/host
> > mechanism.  People did implement offloads on specific platforms though,
> > and they are known to work. To improve portability even further,
> > we might need to make spec and code changes.
> > 
> > I'm not really sympathetic to people complaining that they can't even
> > set a flag in qemu though. If that's the case the stack in question is
> > way too inflexible.
> 
> We did consider your suggestion. But can't see how it will work.
> Maybe you can guide us here. 
> 
> In our case qemu has absolutely no idea if the VM will switch itself to
> secure mode or not.  Its a dynamic decision made entirely by the VM
> through direct interaction with the hardware/firmware; no
> qemu/hypervisor involved.
> 
> If the administrator, who invokes qemu, enables the flag, the DMA ops
> associated with the virito devices will be called, and hence will be
> able to do the right things. Yes we might incur performance hit due to
> the IOMMU translations, but lets ignore that for now; the functionality
> will work. Good till now.
> 
> However if the administrator
> ignores/forgets/deliberatey-decides/is-constrained to NOT enable the
> flag, virtio will not be able to pass control to the DMA ops associated
> with the virtio devices. Which means, we have no opportunity to share
> the I/O buffers with the hypervisor/qemu.
> 
> How do you suggest, we handle this case?

As step 1, ignore it as a user error.
Further you can for example add per-device quirks in virtio so it can be
switched to dma api. make extra decisions in platform code then.

> > 
> > 
> > 
> > > Both in the flag naming and the implementation there is an implication
> > > of DMA API == IOMMU, which is fundamentally wrong.
> > 
> > Maybe we need to extend the meaning of PLATFORM_IOMMU or rename it.
> > 
> > It's possible that some setups will benefit from a more
> > fine-grained approach where some aspects of the DMA
> > API are bypassed, others aren't.
> > 
> > This seems to be what was being asked for in this thread,
> > with comments claiming IOMMU flag adds too much overhead.
> > 
> > 
> > > The DMA API does a few different things:
> > > 
> > >  a) address translation
> > > 
> > >   This does include IOMMUs.  But it also includes random offsets
> > >   between PCI bars and system memory that we see on various
> > >   platforms.
> > 
> > I don't think you mean bars. That's unrelated to DMA.
> > 
> > >  Worse so some of these offsets might be based on
> > >   banks, e.g. on the broadcom bmips platform.  It also deals
> > >   with bitmask in physical addresses related to memory encryption
> > >   like AMD SEV.  I'd be really curious how for example the
> > >   Intel virtio based NIC is going to work on any of those
> > >   plaforms.
> > 
> > SEV guys report that they just set the iommu flag and then it all works.
> 
> This is one of the fundamental difference between SEV architecture and
> the ultravisor architecture. In SEV, qemu is aware of SEV.  In
> ultravisor architecture, only the VM that runs within qemu is aware of
> ultravisor;  hypervisor/qemu/administrator are untrusted entities.

Spo one option is to teach qemu that it's on a platform with an
ultravisor, this might have more advantages.

> I hope, we can make virtio subsystem flexibe enough to support various
> security paradigms.

So if you are worried about qemu attacking guests, I see
more problems than just passing an incorrect iommu
flag.


> Apart from the above reas

Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-07 Thread Michael S. Tsirkin
On Wed, Jun 06, 2018 at 10:23:06PM -0700, Christoph Hellwig wrote:
> On Thu, May 31, 2018 at 08:43:58PM +0300, Michael S. Tsirkin wrote:
> > Pls work on a long term solution. Short term needs can be served by
> > enabling the iommu platform in qemu.
> 
> So, I spent some time looking at converting virtio to dma ops overrides,
> and the current virtio spec, and the sad through I have to tell is that
> both the spec and the Linux implementation are complete and utterly fucked
> up.

Let me restate it: DMA API has support for a wide range of hardware, and
hardware based virtio implementations likely won't benefit from all of
it.

And given virtio right now is optimized for specific workloads, improving
portability without regressing performance isn't easy.

I think it's unsurprising since it started a strictly a guest/host
mechanism.  People did implement offloads on specific platforms though,
and they are known to work. To improve portability even further,
we might need to make spec and code changes.

I'm not really sympathetic to people complaining that they can't even
set a flag in qemu though. If that's the case the stack in question is
way too inflexible.



> Both in the flag naming and the implementation there is an implication
> of DMA API == IOMMU, which is fundamentally wrong.

Maybe we need to extend the meaning of PLATFORM_IOMMU or rename it.

It's possible that some setups will benefit from a more
fine-grained approach where some aspects of the DMA
API are bypassed, others aren't.

This seems to be what was being asked for in this thread,
with comments claiming IOMMU flag adds too much overhead.


> The DMA API does a few different things:
> 
>  a) address translation
> 
>   This does include IOMMUs.  But it also includes random offsets
>   between PCI bars and system memory that we see on various
>   platforms.

I don't think you mean bars. That's unrelated to DMA.

>  Worse so some of these offsets might be based on
>   banks, e.g. on the broadcom bmips platform.  It also deals
>   with bitmask in physical addresses related to memory encryption
>   like AMD SEV.  I'd be really curious how for example the
>   Intel virtio based NIC is going to work on any of those
>   plaforms.

SEV guys report that they just set the iommu flag and then it all works.
I guess if there's translation we can think of this as a kind of iommu.
Maybe we should rename PLATFORM_IOMMU to PLARTFORM_TRANSLATION?

And apparently some people complain that just setting that flag makes
qemu check translation on each access with an unacceptable performance
overhead.  Forcing same behaviour for everyone on general principles
even without the flag is unlikely to make them happy.

>   b) coherency
> 
>   On many architectures DMA is not cache coherent, and we need
>   to invalidate and/or write back cache lines before doing
>   DMA.  Again, I wonder how this is every going to work with
>   hardware based virtio implementations.


You mean dma_Xmb and friends?
There's a new feature VIRTIO_F_IO_BARRIER that's being proposed
for that.


>  Even worse I think this
>   is actually broken at least for VIVT event for virtualized
>   implementations.  E.g. a KVM guest is going to access memory
>   using different virtual addresses than qemu, vhost might throw
>   in another different address space.

I don't really know what VIVT is. Could you help me please?

>   c) bounce buffering
> 
>   Many DMA implementations can not address all physical memory
>   due to addressing limitations.  In such cases we copy the
>   DMA memory into a known addressable bounc buffer and DMA
>   from there.

Don't do it then?


>   d) flushing write combining buffers or similar
> 
>   On some hardware platforms we need workarounds to e.g. read
>   from a certain mmio address to make sure DMA can actually
>   see memory written by the host.

I guess it isn't an issue as long as WC isn't actually used.
It will become an issue when virtio spec adds some WC capability -
I suspect we can ignore this for now.

> 
> All of this is bypassed by virtio by default despite generally being
> platform issues, not particular to a given device.

It's both a device and a platform issue. A PV device is often more like
another CPU than like a PCI device.



-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-04 Thread Michael S. Tsirkin
On Tue, Jun 05, 2018 at 09:26:56AM +1000, Benjamin Herrenschmidt wrote:
> I would like to keep however the ability to bypass the iommu for
> performance reasons

So that's easy, clear the IOMMU flag and this means "bypass the IOMMU".

-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-04 Thread Michael S. Tsirkin
On Mon, Jun 04, 2018 at 11:14:36PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 05:55 -0700, Christoph Hellwig wrote:
> > On Mon, Jun 04, 2018 at 03:43:09PM +0300, Michael S. Tsirkin wrote:
> > > Another is that given the basic functionality is in there, optimizations
> > > can possibly wait until per-device quirks in DMA API are supported.
> > 
> > We have had per-device dma_ops for quite a while.
> 
> I've asked Ansuman to start with a patch that converts virtio to use
> DMA ops always, along with an init quirk to hookup "direct" ops when
> the IOMMU flag isn't set.
> 
> This will at least remove that horrid duplication of code path we have
> in there.
> 
> Then we can just involve the arch in that init quirk so we can chose an
> alternate set of ops when running a secure VM.
> 
> This is completely orthogonal to whether an iommu exist qemu side or
> not, and should be entirely solved on the Linux side.
> 
> Cheers,
> Ben.

Sounds good to me.

-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-04 Thread Michael S. Tsirkin
On Mon, Jun 04, 2018 at 11:11:52PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 15:43 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > I re-read that discussion and I'm still unclear on the
> > > > original question, since I got several apparently
> > > > conflicting answers.
> > > > 
> > > > I asked:
> > > > 
> > > > Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > > > hypervisor side sufficient?
> > > 
> > > I thought I had replied to this...
> > > 
> > > There are a couple of reasons:
> > > 
> > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > in advance. There is no difference between a normal and a secure
> > > partition until the partition does the magic UV call to "enter secure
> > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> > 
> > The user should set it. You just tell user "to be able to use with
> > feature X, enable IOMMU".
> 
> That's completely backwards. The user has no idea what that stuff is.
> And it would have to percolate all the way up the management stack,
> libvirt, kimchi, whatever else ... that's just nonsense.
> 
> Especially since, as I explained in my other email, this is *not* a
> qemu problem and thus the solution shouldn't be messing around with
> qemu.

virtio is implemented in qemu though. If you prefer to stick
all your code in either guest or the UV that's your decision
but it looks like qemu could be helpful here.

For example what if you have a guest that passes physical addresses
to qemu bypassing swiotlb? Don't you want to detect
that and fail gracefully rather than crash the guest?
That's what VIRTIO_F_IOMMU_PLATFORM will do for you.

Still that's hypervisor's decision. What isn't up to the hypervisor is
the way we structure code. We made an early decision to merge a hack
with xen, among discussion about how with time DMA API will learn to
support per-device quirks and we'll be able to switch to that.
So let's do that now?

> > 
> > > - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> > > vhost) go through the emulated MMIO for every access to the guest,
> > > which adds additional overhead.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > There are several answers to this.  One is that we are working hard to
> > make overhead small when the mappings are static (which they would be if
> > there's no actual IOMMU). So maybe especially given you are using
> > a bounce buffer on top it's not so bad - did you try to
> > benchmark?
> > 
> > Another is that given the basic functionality is in there, optimizations
> > can possibly wait until per-device quirks in DMA API are supported.
> 
> The point is that requiring specific qemu command line arguments isn't
> going to fly. We have additional problems due to the fact that our
> firmware (SLOF) inside qemu doesn't currently deal with iommu's etc...
> though those can be fixed.
> 
> Overall, however, this seems to be the most convoluted way of achieving
> things, require user interventions where none should be needed etc...
> 
> Again, what's wrong with a 2 lines hook instead that solves it all and
> completely avoids involving qemu ?
> 
> Ben.

That each platform wants to add hacks in this data path function.

> > 
> > > > 
> > > > 
> > > > >  arch/powerpc/include/asm/dma-mapping.h |  6 ++
> > > > >  arch/powerpc/platforms/pseries/iommu.c | 11 +++
> > > > >  drivers/virtio/virtio_ring.c   | 10 ++
> > > > >  3 files changed, 27 insertions(+)
> > > > > 
> > > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> > > > > b/arch/powerpc/include/asm/dma-mapping.h
> > > > > index 8fa3945..056e578 100644
> > > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device 
> > > > > *dev);
> > > > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > > > >  
> > > > >  #endif /* __KERNEL__ */
> > > > > +
> > > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > > +
> > 

Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-04 Thread Michael S. Tsirkin
On Mon, Jun 04, 2018 at 07:48:54PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 18:57 +1000, David Gibson wrote:
> > 
> > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > in advance. There is no difference between a normal and a secure
> > > partition until the partition does the magic UV call to "enter secure
> > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> > 
> > This seems weird to me.  As a rule HV calls should go through qemu -
> > or be allowed to go directly to KVM *by* qemu.
> 
> It's not an HV call, it's a UV call, qemu won't see it, qemu isn't
> trusted. Now the UV *will* reflect that to the HV via some synthetized
> HV calls, and we *could* have those do a pass by qemu, however, so far,
> our entire design doesn't rely on *any* qemu knowledge whatsoever and
> it would be sad to add it just for that purpose.

It's a temporary work-around. I think that the long-term fix is to
support per-device quirks and have the DMA API DTRT for virtio.

> Additionally, this is rather orthogonal, see my other email, the
> problem we are trying to solve is *not* a qemu problem and it doesn't
> make sense to leak that into qemu.
> 
> >   We generally reserve
> > the latter for hot path things.  Since this isn't a hot path, having
> > the call handled directly by the kernel seems wrong.
> >
> > Unless a "UV call" is something different I don't know about.
> 
> Yes, a UV call goes to the Ultravisor, not the Hypervisor. The
> Hypervisor isn't trusted.
> 
> > > - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> > > vhost) go through the emulated MMIO for every access to the guest,
> > > which adds additional overhead.
> > 
> Ben.


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-06-04 Thread Michael S. Tsirkin
On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> 
> > I re-read that discussion and I'm still unclear on the
> > original question, since I got several apparently
> > conflicting answers.
> > 
> > I asked:
> > 
> > Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > hypervisor side sufficient?
> 
> I thought I had replied to this...
> 
> There are a couple of reasons:
> 
> - First qemu doesn't know that the guest will switch to "secure mode"
> in advance. There is no difference between a normal and a secure
> partition until the partition does the magic UV call to "enter secure
> mode" and qemu doesn't see any of it. So who can set the flag here ?

The user should set it. You just tell user "to be able to use with
feature X, enable IOMMU".

> - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> vhost) go through the emulated MMIO for every access to the guest,
> which adds additional overhead.
> 
> Cheers,
> Ben.

There are several answers to this.  One is that we are working hard to
make overhead small when the mappings are static (which they would be if
there's no actual IOMMU). So maybe especially given you are using
a bounce buffer on top it's not so bad - did you try to
benchmark?

Another is that given the basic functionality is in there, optimizations
can possibly wait until per-device quirks in DMA API are supported.


> > 
> > 
> > >  arch/powerpc/include/asm/dma-mapping.h |  6 ++
> > >  arch/powerpc/platforms/pseries/iommu.c | 11 +++
> > >  drivers/virtio/virtio_ring.c   | 10 ++
> > >  3 files changed, 27 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> > > b/arch/powerpc/include/asm/dma-mapping.h
> > > index 8fa3945..056e578 100644
> > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device 
> > > *dev);
> > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > >  
> > >  #endif /* __KERNEL__ */
> > > +
> > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > +
> > > +struct virtio_device;
> > > +
> > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > >  #endif   /* _ASM_DMA_MAPPING_H */
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > > b/arch/powerpc/platforms/pseries/iommu.c
> > > index 06f0296..a2ec15a 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -38,6 +38,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > >  __setup("multitce=", disable_multitce);
> > >  
> > >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > +
> > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > + /*
> > > +  * On protected guest platforms, force virtio core to use DMA
> > > +  * MAP API for all virtio devices. But there can also be some
> > > +  * exceptions for individual devices like virtio balloon.
> > > +  */
> > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > +}
> > 
> > Isn't this kind of slow?  vring_use_dma_api is on
> > data path and supposed to be very fast.
> > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 21d464a..47ea6c3 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > >   * unconditionally on data path.
> > >   */
> > >  
> > > +#ifndef platform_forces_virtio_dma
> > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > + return false;
> > > +}
> > > +#endif
> > > +
> > >  static bool vring_use_dma_api(struct virtio_device *vdev)
> > >  {
> > > + if (platform_forces_virtio_dma(vdev))
> > > + return true;
> > > +
> > >   if (!virtio_has_iommu_quirk(vdev))
> > >   return true;
> > >  
> > > -- 
> > > 2.9.3


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-05-31 Thread Michael S. Tsirkin
On Thu, May 31, 2018 at 09:09:24AM +0530, Anshuman Khandual wrote:
> On 05/24/2018 12:51 PM, Ram Pai wrote:
> > On Wed, May 23, 2018 at 09:50:02PM +0300, Michael S. Tsirkin wrote:
> >> subj: s/virito/virtio/
> >>
> > ..snip..
> >>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> >>> +
> >>> +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> >>> +{
> >>> + /*
> >>> +  * On protected guest platforms, force virtio core to use DMA
> >>> +  * MAP API for all virtio devices. But there can also be some
> >>> +  * exceptions for individual devices like virtio balloon.
> >>> +  */
> >>> + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> >>> +}
> >>
> >> Isn't this kind of slow?  vring_use_dma_api is on
> >> data path and supposed to be very fast.
> > 
> > Yes it is slow and not ideal. This won't be the final code. The final
> > code will cache the information in some global variable and used
> > in this function.
> 
> Right this will be a simple check based on a global variable.
> 

Pls work on a long term solution. Short term needs can be served by
enabling the iommu platform in qemu.

-- 
MST


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-05-25 Thread Michael S. Tsirkin
On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> 
> > I re-read that discussion and I'm still unclear on the
> > original question, since I got several apparently
> > conflicting answers.
> > 
> > I asked:
> > 
> > Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > hypervisor side sufficient?
> 
> I thought I had replied to this...
> 
> There are a couple of reasons:
> 
> - First qemu doesn't know that the guest will switch to "secure mode"
> in advance. There is no difference between a normal and a secure
> partition until the partition does the magic UV call to "enter secure
> mode" and qemu doesn't see any of it. So who can set the flag here ?

Not sure I understand. Just set the flag e.g. on qemu command line.
I might be wrong, but these secure mode things usually
a. require hypervisor side tricks anyway

> - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> vhost) go through the emulated MMIO for every access to the guest,
> which adds additional overhead.
> 
> Cheers,
> Ben.

Well it's not supposed to be much slower for the static case.

vhost has a cache so should be fine.

A while ago Paolo implemented a translation cache which should be
perfect for this case - most of the code got merged but
never enabled because of stability issues.

If all else fails, we could teach QEMU to handle the no-iommu case
as if VIRTIO_F_IOMMU_PLATFORM was off.



> > 
> > 
> > >  arch/powerpc/include/asm/dma-mapping.h |  6 ++
> > >  arch/powerpc/platforms/pseries/iommu.c | 11 +++
> > >  drivers/virtio/virtio_ring.c   | 10 ++
> > >  3 files changed, 27 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> > > b/arch/powerpc/include/asm/dma-mapping.h
> > > index 8fa3945..056e578 100644
> > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device 
> > > *dev);
> > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > >  
> > >  #endif /* __KERNEL__ */
> > > +
> > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > +
> > > +struct virtio_device;
> > > +
> > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > >  #endif   /* _ASM_DMA_MAPPING_H */
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > > b/arch/powerpc/platforms/pseries/iommu.c
> > > index 06f0296..a2ec15a 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -38,6 +38,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > >  __setup("multitce=", disable_multitce);
> > >  
> > >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > +
> > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > + /*
> > > +  * On protected guest platforms, force virtio core to use DMA
> > > +  * MAP API for all virtio devices. But there can also be some
> > > +  * exceptions for individual devices like virtio balloon.
> > > +  */
> > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > +}
> > 
> > Isn't this kind of slow?  vring_use_dma_api is on
> > data path and supposed to be very fast.
> > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 21d464a..47ea6c3 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > >   * unconditionally on data path.
> > >   */
> > >  
> > > +#ifndef platform_forces_virtio_dma
> > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > + return false;
> > > +}
> > > +#endif
> > > +
> > >  static bool vring_use_dma_api(struct virtio_device *vdev)
> > >  {
> > > + if (platform_forces_virtio_dma(vdev))
> > > + return true;
> > > +
> > >   if (!virtio_has_iommu_quirk(vdev))
> > >   return true;
> > >  
> > > -- 
> > > 2.9.3


Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-05-23 Thread Michael S. Tsirkin
subj: s/virito/virtio/

On Tue, May 22, 2018 at 12:03:17PM +0530, Anshuman Khandual wrote:
> This adds a hook which a platform can define in order to allow it to
> force the use of the DMA API for all virtio devices even if they don't
> have the VIRTIO_F_IOMMU_PLATFORM flag set.  We want to use this to do
> bounce-buffering of data on the new secure pSeries platform, currently
> under development, where a KVM host cannot access all of the memory
> space of a secure KVM guest.  The host can only access the pages which
> the guest has explicitly requested to be shared with the host, thus
> the virtio implementation in the guest has to copy data to and from
> shared pages.
> 
> With this hook, the platform code in the secure guest can force the
> use of swiotlb for virtio buffers, with a back-end for swiotlb which
> will use a pool of pre-allocated shared pages.  Thus all data being
> sent or received by virtio devices will be copied through pages which
> the host has access to.
> 
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
> 
> The arch callback has been enabled through an weak symbol defintion
> so that it is enabled only for those architectures subscribing to
> this new framework. Clarified the patch description. The primary
> objective for this RFC has been to get an in principle agreement
> on this approach.
> 
> Original V1:
> 
> Original RFC and discussions https://patchwork.kernel.org/patch/10324405/

I re-read that discussion and I'm still unclear on the
original question, since I got several apparently
conflicting answers.

I asked:

Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
hypervisor side sufficient?



>  arch/powerpc/include/asm/dma-mapping.h |  6 ++
>  arch/powerpc/platforms/pseries/iommu.c | 11 +++
>  drivers/virtio/virtio_ring.c   | 10 ++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa3945..056e578 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
>  #define ARCH_HAS_DMA_MMAP_COHERENT
>  
>  #endif /* __KERNEL__ */
> +
> +#define platform_forces_virtio_dma platform_forces_virtio_dma
> +
> +struct virtio_device;
> +
> +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
>  #endif   /* _ASM_DMA_MAPPING_H */
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..a2ec15a 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
>  __setup("multitce=", disable_multitce);
>  
>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> +
> +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> +{
> + /*
> +  * On protected guest platforms, force virtio core to use DMA
> +  * MAP API for all virtio devices. But there can also be some
> +  * exceptions for individual devices like virtio balloon.
> +  */
> + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> +}

Isn't this kind of slow?  vring_use_dma_api is on
data path and supposed to be very fast.

> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 21d464a..47ea6c3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -141,8 +141,18 @@ struct vring_virtqueue {
>   * unconditionally on data path.
>   */
>  
> +#ifndef platform_forces_virtio_dma
> +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> +{
> + return false;
> +}
> +#endif
> +
>  static bool vring_use_dma_api(struct virtio_device *vdev)
>  {
> + if (platform_forces_virtio_dma(vdev))
> + return true;
> +
>   if (!virtio_has_iommu_quirk(vdev))
>   return true;
>  
> -- 
> 2.9.3


Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

2018-04-18 Thread Michael S. Tsirkin
On Wed, Apr 18, 2018 at 08:47:10AM +0530, Anshuman Khandual wrote:
> On 04/15/2018 05:41 PM, Christoph Hellwig wrote:
> > On Fri, Apr 06, 2018 at 06:37:18PM +1000, Benjamin Herrenschmidt wrote:
>  implemented as DMA API which the virtio core understands. There is no
>  need for an IOMMU to be involved for the device representation in this
>  case IMHO.
> >>>
> >>> This whole virtio translation issue is a mess.  I think we need to
> >>> switch it to the dma API, and then quirk the legacy case to always
> >>> use the direct mapping inside the dma API.
> >>
> >> Fine with using a dma API always on the Linux side, but we do want to
> >> special case virtio still at the arch and qemu side to have a "direct
> >> mapping" mode. Not sure how (special flags on PCI devices) to avoid
> >> actually going through an emulated IOMMU on the qemu side, because that
> >> slows things down, esp. with vhost.
> >>
> >> IE, we can't I think just treat it the same as a physical device.
> > 
> > We should have treated it like a physical device from the start, but
> > that device has unfortunately sailed.
> > 
> > But yes, we'll need a per-device quirk that says 'don't attach an
> > iommu'.
> 
> How about doing it per platform basis as suggested in this RFC through
> an arch specific callback. Because all the virtio devices in the given
> platform would require and exercise this option (to avail bounce buffer
> mechanism for secure guests as an example). So the flag basically is a
> platform specific one not a device specific one.

That's not the case. A single platform can have a mix of virtio and
non-virtio devices. Same applies even within virtio, e.g. the balloon
device always bypasses an iommu.  Further, QEMU supports out of process
devices some of which might bypass the IOMMU.

-- 
MST


Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

2018-04-05 Thread Michael S. Tsirkin
On Fri, Apr 06, 2018 at 01:09:43AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2018-04-05 at 17:54 +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 05, 2018 at 08:09:30PM +0530, Anshuman Khandual wrote:
> > > On 04/05/2018 04:26 PM, Anshuman Khandual wrote:
> > > > There are certian platforms which would like to use SWIOTLB based DMA 
> > > > API
> > > > for bouncing purpose without actually requiring an IOMMU back end. But 
> > > > the
> > > > virtio core does not allow such mechanism. Right now DMA MAP API is only
> > > > selected for devices which have an IOMMU and then the QEMU/host back end
> > > > will process all incoming SG buffer addresses as IOVA instead of simple
> > > > GPA which is the case for simple bounce buffers after being processed 
> > > > with
> > > > SWIOTLB API. To enable this usage, it introduces an architecture 
> > > > specific
> > > > function which will just make virtio core front end select DMA 
> > > > operations
> > > > structure.
> > > > 
> > > > Signed-off-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
> > > 
> > > + "Michael S. Tsirkin" <m...@redhat.com>
> > 
> > I'm confused by this.
> > 
> > static bool vring_use_dma_api(struct virtio_device *vdev)
> > {
> > if (!virtio_has_iommu_quirk(vdev))
> > return true;
> > 
> > 
> > Why doesn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > hypervisor side sufficient?
> 
> In this specific case, because that would make qemu expect an iommu,
> and there isn't one.


I think that you can set iommu_platform in qemu without an iommu.


> Anshuman, you need to provide more background here. I don't have time
> right now it's late, but explain about the fact that this is for a
> specific type of secure VM which has only a limited pool of (insecure)
> memory that can be shared with qemu, so all IOs need to bounce via that
> pool, which can be achieved by using swiotlb.
> 
> Note: this isn't urgent, we can discuss alternative approaches, this is
> just to start the conversation.
> 
> Cheers,
> Ben.


Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

2018-04-05 Thread Michael S. Tsirkin
On Thu, Apr 05, 2018 at 08:09:30PM +0530, Anshuman Khandual wrote:
> On 04/05/2018 04:26 PM, Anshuman Khandual wrote:
> > There are certian platforms which would like to use SWIOTLB based DMA API
> > for bouncing purpose without actually requiring an IOMMU back end. But the
> > virtio core does not allow such mechanism. Right now DMA MAP API is only
> > selected for devices which have an IOMMU and then the QEMU/host back end
> > will process all incoming SG buffer addresses as IOVA instead of simple
> > GPA which is the case for simple bounce buffers after being processed with
> > SWIOTLB API. To enable this usage, it introduces an architecture specific
> > function which will just make virtio core front end select DMA operations
> > structure.
> > 
> > Signed-off-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
> 
> + "Michael S. Tsirkin" <m...@redhat.com>

I'm confused by this.

static bool vring_use_dma_api(struct virtio_device *vdev)
{
if (!virtio_has_iommu_quirk(vdev))
return true;


Why doesn't setting VIRTIO_F_IOMMU_PLATFORM on the
hypervisor side sufficient?




Re: Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM)

2017-12-01 Thread Michael S. Tsirkin
On Fri, Dec 01, 2017 at 11:31:08PM +1100, Michael Ellerman wrote:
> "Michael S. Tsirkin" <m...@redhat.com> writes:
> 
> > fill_balloon doing memory allocations under balloon_lock
> > can cause a deadlock when leak_balloon is called from
> > virtballoon_oom_notify and tries to take same lock.
> >
> > To fix, split page allocation and enqueue and do allocations outside the 
> > lock.
> >
> > Here's a detailed analysis of the deadlock by Tetsuo Handa:
> >
> > In leak_balloon(), mutex_lock(>balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > is specified, this allocation attempt might indirectly depend on somebody
> > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
> > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
> > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
> > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
> > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
> > will cause OOM lockup.
> >
> >   Thread1   Thread2
> > fill_balloon()
> >   takes a balloon_lock
> >   balloon_page_enqueue()
> > alloc_page(GFP_HIGHUSER_MOVABLE)
> >   direct reclaim (__GFP_FS context)   takes a fs lock
> > waits for that fs lock  alloc_page(GFP_NOFS)
> >   
> > __alloc_pages_may_oom()
> > takes the oom_lock
> > out_of_memory()
> >   
> > blocking_notifier_call_chain()
> > leak_balloon()
> >               tries to take 
> > that balloon_lock and deadlocks
> >
> > Reported-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>
> > Cc: Michal Hocko <mho...@suse.com>
> > Cc: Wei Wang <wei.w.w...@intel.com>
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >
> >  drivers/virtio/virtio_balloon.c| 24 +++-
> >  include/linux/balloon_compaction.h | 35 ++-
> >  mm/balloon_compaction.c| 28 +---
> >  3 files changed, 74 insertions(+), 13 deletions(-)
> 
> 
> Somehow this commit seems to be killing powerpc guests.
> 
> The symptom is that the first page (64K) of the guests memory gets over
> written with zeroes, which is where our interrupt handlers are, so the
> system rapidly locks up due to illegal instructions in the illegal
> instruction handler.
> 
> There seems to be some element of a race, because it doesn't always
> crash. Sometimes I can boot to a shell, but not often. When it does
> happen it's fairly late in boot, but before I get to a shell.
> 
> I had a few bisects go off into the weeds due to the intermittent
> nature. But once I realised that I changed my script to boot 5 times
> before declaring a kernel good, and that bisected straight here.
> 
> I can also revert this commit on v4.15-rc1 and everything's fine again -
> I got through ~250 boots with that kernel.
> 
> So I'm pretty sure this commit is triggering/exposing/causing the bug.
> 
> The other data point is that the page that's overwritten is mapped read
> only in the guest kernel. So either the guest kernel is writing to it in
> real mode (MMU off), or the hypervisor/DMA is doing it.
> 
> I haven't isolated if it's host kernel/qemu version dependent, at the
> moment I'm just using distro packaged versions of both.
> 
> Anyway I'll try and dig further into it on Monday, but I thought I'd let
> you know in case this is a known bug with a fix in the pipeline, or
> rings any bells or whatever.
> 
> cheers

Thanks for the report!
A fix was just posted:
virtio_balloon: fix increment of vb->num_pfns in fill_balloon()

Would appreciate testing.

> 
> > diff --git a/drivers/virtio/virtio_balloon.c 
> > b/drivers/virtio/virtio_balloon.c
> > index f0b3a0b..7960746 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -143,16 +143,17 @@ static void s

Re: [PATCH 09/15] virtio-blk: Pass attribute group to device_add_disk

2016-09-03 Thread Michael S. Tsirkin
On Wed, Aug 17, 2016 at 03:15:09PM +0800, Fam Zheng wrote:
> Previously after device_add_disk returns, the KOBJ_ADD uevent is already
> emitted. Adding attributes after that is a poor usage of kobject, and
> in practice may result in race conditions with userspace, for
> example udev checks availability of certain attributes and initializes
> /dev entries conditionally.
> 
> device_add_disk can handle adding attribute group better, so use it.
> 
> Meanwhile, handle error of device_add_disk.
> 
> Signed-off-by: Fam Zheng <f...@redhat.com>

Feel free to merge this with the rest of patchset
Acked-by: Michael S. Tsirkin <m...@redhat.com>



> ---
>  drivers/block/virtio_blk.c | 38 +++---
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4564df5..ff60d82 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -522,10 +522,10 @@ virtblk_cache_type_show(struct device *dev, struct 
> device_attribute *attr,
>   return snprintf(buf, 40, "%s\n", virtblk_cache_types[writeback]);
>  }
>  
> -static const struct device_attribute dev_attr_cache_type_ro =
> +static struct device_attribute dev_attr_cache_type_ro =
>   __ATTR(cache_type, S_IRUGO,
>  virtblk_cache_type_show, NULL);
> -static const struct device_attribute dev_attr_cache_type_rw =
> +static struct device_attribute dev_attr_cache_type_rw =
>   __ATTR(cache_type, S_IRUGO|S_IWUSR,
>  virtblk_cache_type_show, virtblk_cache_type_store);
>  
> @@ -550,6 +550,26 @@ static struct blk_mq_ops virtio_mq_ops = {
>  static unsigned int virtblk_queue_depth;
>  module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
>  
> +static struct attribute *virtblk_attrs_ro[] = {
> + _attr_serial.attr,
> + _attr_cache_type_ro.attr,
> + NULL
> +};
> +
> +static struct attribute *virtblk_attrs_rw[] = {
> + _attr_serial.attr,
> + _attr_cache_type_rw.attr,
> + NULL
> +};
> +
> +static struct attribute_group virtblk_attr_group_ro = {
> + .attrs  = virtblk_attrs_ro,
> +};
> +
> +static struct attribute_group virtblk_attr_group_rw = {
> + .attrs  = virtblk_attrs_rw,
> +};
> +
>  static int virtblk_probe(struct virtio_device *vdev)
>  {
>   struct virtio_blk *vblk;
> @@ -560,6 +580,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>   u32 v, blk_size, sg_elems, opt_io_size;
>   u16 min_io_size;
>   u8 physical_block_exp, alignment_offset;
> + struct attribute_group *attr_group;
>  
>   if (!vdev->config->get) {
>   dev_err(>dev, "%s failure: config access disabled\n",
> @@ -719,19 +740,14 @@ static int virtblk_probe(struct virtio_device *vdev)
>  
>   virtio_device_ready(vdev);
>  
> - device_add_disk(>dev, vblk->disk, NULL);
> - err = device_create_file(disk_to_dev(vblk->disk), _attr_serial);
> - if (err)
> - goto out_del_disk;
> -
>   if (virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE))
> - err = device_create_file(disk_to_dev(vblk->disk),
> -  _attr_cache_type_rw);
> + attr_group = _attr_group_rw;
>   else
> - err = device_create_file(disk_to_dev(vblk->disk),
> -  _attr_cache_type_ro);
> + attr_group = _attr_group_ro;
> + err = device_add_disk(>dev, vblk->disk, attr_group);
>   if (err)
>   goto out_del_disk;
> +
>   return 0;
>  
>  out_del_disk:
> -- 
> 2.7.4


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Michael S. Tsirkin
On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.
> 
> - Leonid.

Of course it does. All this patchset does is rename smp_mb/rmb/wmb
to __smp_mb()/__smp_rmb()/__smp_wmb()
and then asm-generic does #define smp_mb __smp_mb
or #define smp_mb barrier depending on CONFIG_SMP.

Why is that needed? So we can implement
[PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> >smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> >the asm-generic variants exactly. Drop the local definitions and pull in
> >asm-generic/barrier.h instead.
> >
> This statement doesn't fit MIPS barriers variations. Moreover, there is a
> reason to extend that even more specific, at least for smp_store_release and
> smp_load_acquire, look into
> 
> http://patchwork.linux-mips.org/patch/10506/
> 
> - Leonid.

Fine, but it matches what current code is doing.  Since that
MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
you see a problem reworking it on top of this patchset?

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> > From: Davidlohr Bueso <d...@stgolabs.net>
> > 
> > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> > it was made clear that the context of this call (and thus set_mb)
> > is strictly for CPU ordering, as opposed to IO. As such all archs
> > should use the smp variant of mb(), respecting the semantics and
> > saving a mandatory barrier on UP.
> > 
> > Signed-off-by: Davidlohr Bueso <dbu...@suse.de>
> > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> > Cc: <linux-a...@vger.kernel.org>
> > Cc: Andrew Morton <a...@linux-foundation.org>
> > Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
> > Cc: Linus Torvalds <torva...@linux-foundation.org>
> > Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: Tony Luck <tony.l...@intel.com>
> > Cc: d...@stgolabs.net
> > Link: 
> > http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-d...@stgolabs.net
> > Signed-off-by: Ingo Molnar <mi...@kernel.org>
> 
> Aside from a need for s/lcoking/locking/ in the subject line:
> 
> Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>

Thanks!
Though Ingo already put this in tip tree like this,
and I need a copy in my tree to avoid breaking bisect,
so I will probably keep it exactly the same to avoid confusion.

> > ---
> >  arch/ia64/include/asm/barrier.h| 2 +-
> >  arch/powerpc/include/asm/barrier.h | 2 +-
> >  arch/s390/include/asm/barrier.h| 2 +-
> >  include/asm-generic/barrier.h  | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/ia64/include/asm/barrier.h 
> > b/arch/ia64/include/asm/barrier.h
> > index df896a1..209c4b8 100644
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -77,7 +77,7 @@ do {  
> > \
> > ___p1;  \
> >  })
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } 
> > while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } 
> > while (0)
> > 
> >  /*
> >   * The group barrier in front of the rsm & ssm are necessary to ensure
> > diff --git a/arch/powerpc/include/asm/barrier.h 
> > b/arch/powerpc/include/asm/barrier.h
> > index 0eca6ef..a7af5fb 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -34,7 +34,7 @@
> >  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } 
> > while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } 
> > while (0)
> > 
> >  #ifdef __SUBARCH_HAS_LWSYNC
> >  #define SMPWMB  LWSYNC
> > diff --git a/arch/s390/include/asm/barrier.h 
> > b/arch/s390/include/asm/barrier.h
> > index d68e11e..7ffd0b1 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -36,7 +36,7 @@
> >  #define smp_mb__before_atomic()smp_mb()
> >  #define smp_mb__after_atomic() smp_mb()
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); 
> > mb(); } while (0)
> > +#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); smp_mb(); 
> > } while (0)
> > 
> >  #define smp_store_release(p, v)
> > \
> >  do {   
> > \
> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> > index b42afad..0f45f93 100644
> > --- a/include/asm-generic/barrier.h
> > +++ b/include/asm-generic/barrier.h
> > @@ -93,7 +93,7 @@
> >  #endif /* CONFIG_SMP */
> > 
> >  #ifndef smp_store_mb
> > -#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } 
> > while (0)
> > +#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } 
> > while (0)
> >  #endif
> > 
> >  #ifndef smp_mb__before_atomic
> > -- 
> > MST
> > 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 0/3] checkpatch: handling of memory barriers

2016-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 12:59:25PM +0200, Michael S. Tsirkin wrote:
> As part of memory barrier cleanup, this patchset
> extends checkpatch to make it easier to stop
> incorrect memory barrier usage.
> 
> This replaces the checkpatch patches in my series
>   arch: barrier cleanup + barriers for virt
> and will be included in the pull request including
> the series.
> 
> changes from v3:
>   rename smp_barrier_stems to barrier_stems
>   as suggested by Julian Calaby.

In fact it was Joe Perches that suggested it.
Sorry about the confusion.

>   add (?: ... ) around a variable in regexp,
>   in case we change the value later so that it matters.
> changes from v2:
>   address comments by Joe Perches:
>   use (?: ... ) to avoid unnecessary capture groups
>   rename smp_barriers to smp_barrier_stems for clarity
>   add barriers before/after atomic
> Changes from v1:
>   catch optional\s* before () in barriers
>   rewrite using qr{} instead of map
> 
> Michael S. Tsirkin (3):
>   checkpatch.pl: add missing memory barriers
>   checkpatch: check for __smp outside barrier.h
>   checkpatch: add virt barriers
> 
> Michael S. Tsirkin (3):
>   checkpatch.pl: add missing memory barriers
>   checkpatch: check for __smp outside barrier.h
>   checkpatch: add virt barriers
> 
>  scripts/checkpatch.pl | 33 -
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> -- 
> MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/3] checkpatch: add virt barriers

2016-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 09:40:18PM +1100, Julian Calaby wrote:
> Hi Michael,
> 
> On Mon, Jan 11, 2016 at 9:35 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> > On Sun, Jan 10, 2016 at 02:52:16PM -0800, Joe Perches wrote:
> >> On Mon, 2016-01-11 at 09:13 +1100, Julian Calaby wrote:
> >> > On Mon, Jan 11, 2016 at 6:31 AM, Michael S. Tsirkin <m...@redhat.com> 
> >> > wrote:
> >> > > Add virt_ barriers to list of barriers to check for
> >> > > presence of a comment.
> >> []
> >> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> >> []
> >> > > @@ -5133,7 +5133,8 @@ sub process {
> >> > > }x;
> >> > > my $all_barriers = qr{
> >> > > $barriers|
> >> > > -   smp_(?:$smp_barrier_stems)
> >> > > +   smp_(?:$smp_barrier_stems)|
> >> > > +   virt_(?:$smp_barrier_stems)
> >> >
> >> > Sorry I'm late to the party here, but would it make sense to write this 
> >> > as:
> >> >
> >> > (?:smp|virt)_(?:$smp_barrier_stems)
> >>
> >> Yes.  Perhaps the name might be better as barrier_stems.
> >>
> >> Also, ideally this would be longest match first or use \b
> >> after the matches so that $all_barriers could work
> >> successfully without a following \s*\(
> >>
> >> my $all_barriers = qr{
> >>   (?:smp|virt)_(?:barrier_stems)|
> >>   $barriers)
> >> }x;
> >>
> >> or maybe add separate $smp_barriers and $virt_barriers
> >>
> >>   it doesn't matter much in any case
> >
> > OK just to clarify - are you OK with merging the patch as is?
> > Refactorings can come as patches on top if required.
> 
> I don't really care either way, I was just asking if it was possible.
> If you don't see any value in that change, then don't make it.
> 
> Thanks,
> 
> -- 
> Julian Calaby
> 
> Email: julian.cal...@gmail.com
> Profile: http://www.google.com/profiles/julian.calaby/

OK, got it, thanks.

I will rename smp_barrier_stems to barrier_stems since
this doesn't need too much testing.

I'd rather keep the regex code as is since changing it requires
testing.  I might play with it some more in the future
but I'd like to merge it in the current form to help make
sure __smp barriers are not misused.

I'll post v4 now - an ack will be appreciated.
-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 2/3] checkpatch: check for __smp outside barrier.h

2016-01-11 Thread Michael S. Tsirkin
Introduction of __smp barriers cleans up a bunch of duplicate code, but
it gives people an additional handle onto a "new" set of barriers - just
because they're prefixed with __* unfortunately doesn't stop anyone from
using it (as happened with other arch stuff before.)

Add a checkpatch test so it will trigger a warning.

Reported-by: Russell King <li...@arm.linux.org.uk>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 scripts/checkpatch.pl | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 94b4e33..25476c2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5143,6 +5143,16 @@ sub process {
}
}
 
+   my $underscore_smp_barriers = qr{__smp_(?:$barrier_stems)}x;
+
+   if ($realfile !~ m@^include/asm-generic/@ &&
+   $realfile !~ m@/barrier\.h$@ &&
+   $line =~ m/\b(?:$underscore_smp_barriers)\s*\(/ &&
+   $line !~ 
m/^.\s*\#\s*define\s+(?:$underscore_smp_barriers)\s*\(/) {
+   WARN("MEMORY_BARRIER",
+"__smp memory barriers shouldn't be used outside 
barrier.h and asm-generic\n" . $herecurr);
+   }
+
 # check for waitqueue_active without a comment.
if ($line =~ /\bwaitqueue_active\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 1/3] checkpatch.pl: add missing memory barriers

2016-01-11 Thread Michael S. Tsirkin
SMP-only barriers were missing in checkpatch.pl

Refactor code slightly to make adding more variants easier.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 scripts/checkpatch.pl | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 2b3c228..94b4e33 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5116,7 +5116,27 @@ sub process {
}
}
 # check for memory barriers without a comment.
-   if ($line =~ 
/\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/)
 {
+
+   my $barriers = qr{
+   mb|
+   rmb|
+   wmb|
+   read_barrier_depends
+   }x;
+   my $barrier_stems = qr{
+   mb__before_atomic|
+   mb__after_atomic|
+   store_release|
+   load_acquire|
+   store_mb|
+   (?:$barriers)
+   }x;
+   my $all_barriers = qr{
+   (?:$barriers)|
+   smp_(?:$barrier_stems)
+   }x;
+
+   if ($line =~ /\b(?:$all_barriers)\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
WARN("MEMORY_BARRIER",
 "memory barrier without comment\n" . 
$herecurr);
-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/3] checkpatch: add virt barriers

2016-01-11 Thread Michael S. Tsirkin
On Sun, Jan 10, 2016 at 02:52:16PM -0800, Joe Perches wrote:
> On Mon, 2016-01-11 at 09:13 +1100, Julian Calaby wrote:
> > On Mon, Jan 11, 2016 at 6:31 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> > > Add virt_ barriers to list of barriers to check for
> > > presence of a comment.
> []
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > > @@ -5133,7 +5133,8 @@ sub process {
> > > }x;
> > > my $all_barriers = qr{
> > > $barriers|
> > > -   smp_(?:$smp_barrier_stems)
> > > +   smp_(?:$smp_barrier_stems)|
> > > +   virt_(?:$smp_barrier_stems)
> > 
> > Sorry I'm late to the party here, but would it make sense to write this as:
> > 
> > (?:smp|virt)_(?:$smp_barrier_stems)
> 
> Yes.  Perhaps the name might be better as barrier_stems.
> 
> Also, ideally this would be longest match first or use \b
> after the matches so that $all_barriers could work
> successfully without a following \s*\(
> 
> my $all_barriers = qr{
>   (?:smp|virt)_(?:barrier_stems)|
>   $barriers)
> }x;
> 
> or maybe add separate $smp_barriers and $virt_barriers
> 
>   it doesn't matter much in any case

OK just to clarify - are you OK with merging the patch as is?
Refactorings can come as patches on top if required.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 0/3] checkpatch: handling of memory barriers

2016-01-11 Thread Michael S. Tsirkin
As part of memory barrier cleanup, this patchset
extends checkpatch to make it easier to stop
incorrect memory barrier usage.

This replaces the checkpatch patches in my series
arch: barrier cleanup + barriers for virt
and will be included in the pull request including
the series.

changes from v3:
rename smp_barrier_stems to barrier_stems
as suggested by Julian Calaby.
add (?: ... ) around a variable in regexp,
in case we change the value later so that it matters.
changes from v2:
address comments by Joe Perches:
use (?: ... ) to avoid unnecessary capture groups
rename smp_barriers to smp_barrier_stems for clarity
add barriers before/after atomic
Changes from v1:
catch optional\s* before () in barriers
rewrite using qr{} instead of map

Michael S. Tsirkin (3):
  checkpatch.pl: add missing memory barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch: add virt barriers

Michael S. Tsirkin (3):
  checkpatch.pl: add missing memory barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch: add virt barriers

 scripts/checkpatch.pl | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 3/3] checkpatch: add virt barriers

2016-01-11 Thread Michael S. Tsirkin
Add virt_ barriers to list of barriers to check for
presence of a comment.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 scripts/checkpatch.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25476c2..c7bf1aa 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5133,7 +5133,8 @@ sub process {
}x;
my $all_barriers = qr{
(?:$barriers)|
-   smp_(?:$barrier_stems)
+   smp_(?:$barrier_stems)|
+   virt_(?:$barrier_stems)
}x;
 
if ($line =~ /\b(?:$all_barriers)\s*\(/) {
-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 20/32] metag: define __smp_xxx

2016-01-11 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 12:09:30AM +, James Hogan wrote:
> Hi Michael,
> 
> On Thu, Dec 31, 2015 at 09:08:22PM +0200, Michael S. Tsirkin wrote:
> > This defines __smp_xxx barriers for metag,
> > for use by virtualization.
> > 
> > smp_xxx barriers are removed as they are
> > defined correctly by asm-generic/barriers.h
> > 
> > Note: as __smp_XX macros should not depend on CONFIG_SMP, they can not
> > use the existing fence() macro since that is defined differently between
> > SMP and !SMP.  For this reason, this patch introduces a wrapper
> > metag_fence() that doesn't depend on CONFIG_SMP.
> > fence() is then defined using that, depending on CONFIG_SMP.
> 
> I'm not a fan of the inconsistent commit message wrapping. I wrap to 72
> columns (although I now notice SubmittingPatches says to use 75...).
> 
> > 
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > Acked-by: Arnd Bergmann <a...@arndb.de>
> > ---
> >  arch/metag/include/asm/barrier.h | 32 +++-
> >  1 file changed, 15 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/metag/include/asm/barrier.h 
> > b/arch/metag/include/asm/barrier.h
> > index b5b778b..84880c9 100644
> > --- a/arch/metag/include/asm/barrier.h
> > +++ b/arch/metag/include/asm/barrier.h
> > @@ -44,13 +44,6 @@ static inline void wr_fence(void)
> >  #define rmb()  barrier()
> >  #define wmb()  mb()
> >  
> > -#ifndef CONFIG_SMP
> > -#define fence()do { } while (0)
> > -#define smp_mb()barrier()
> > -#define smp_rmb()   barrier()
> > -#define smp_wmb()   barrier()
> > -#else
> 
> !SMP kernel text differs, but only because of new presence of unused
> metag_fence() inline function. If I #if 0 that out, then it matches, so
> thats fine.
> 
> > -
> >  #ifdef CONFIG_METAG_SMP_WRITE_REORDERING
> >  /*
> >   * Write to the atomic memory unlock system event register (command 0). 
> > This is
> > @@ -60,26 +53,31 @@ static inline void wr_fence(void)
> >   * incoherence). It is therefore ineffective if used after and on the same
> >   * thread as a write.
> >   */
> > -static inline void fence(void)
> > +static inline void metag_fence(void)
> >  {
> > volatile int *flushptr = (volatile int *) LINSYSEVENT_WR_ATOMIC_UNLOCK;
> > barrier();
> > *flushptr = 0;
> > barrier();
> >  }
> > -#define smp_mb()fence()
> > -#define smp_rmb()   fence()
> > -#define smp_wmb()   barrier()
> > +#define __smp_mb()metag_fence()
> > +#define __smp_rmb()   metag_fence()
> > +#define __smp_wmb()   barrier()
> >  #else
> > -#define fence()do { } while (0)
> > -#define smp_mb()barrier()
> > -#define smp_rmb()   barrier()
> > -#define smp_wmb()   barrier()
> > +#define metag_fence()  do { } while (0)
> > +#define __smp_mb()barrier()
> > +#define __smp_rmb()   barrier()
> > +#define __smp_wmb()   barrier()
> 
> Whitespace is now messed up. Admitedly its already inconsistent
> tabs/spaces, but it'd be nice if the definitions at least still all
> lined up. You're touching all the definitions which use spaces anyway,
> so feel free to convert them to tabs while you're at it.
> 
> Other than those niggles, it looks sensible to me:
> Acked-by: James Hogan <james.ho...@imgtec.com>
> 
> Cheers
> James

Thanks!

I did this in my tree (replaced spaces with tabs in the new
definitions); not reposting just because of this change.

> >  #endif
> > +
> > +#ifdef CONFIG_SMP
> > +#define fence() metag_fence()
> > +#else
> > +#define fence()do { } while (0)
> >  #endif
> >  
> > -#define smp_mb__before_atomic()barrier()
> > -#define smp_mb__after_atomic() barrier()
> > +#define __smp_mb__before_atomic()  barrier()
> > +#define __smp_mb__after_atomic()   barrier()
> >  
> >  #include 
> >  
> > -- 
> > MST
> > 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/3] checkpatch: check for __smp outside barrier.h

2016-01-10 Thread Michael S. Tsirkin
Introduction of __smp barriers cleans up a bunch of duplicate code, but
it gives people an additional handle onto a "new" set of barriers - just
because they're prefixed with __* unfortunately doesn't stop anyone from
using it (as happened with other arch stuff before.)

Add a checkpatch test so it will trigger a warning.

Reported-by: Russell King <li...@arm.linux.org.uk>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 scripts/checkpatch.pl | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 97b8b62..a96adcb 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5141,6 +5141,16 @@ sub process {
}
}
 
+   my $underscore_smp_barriers = qr{__smp_($smp_barriers)}x;
+
+   if ($realfile !~ m@^include/asm-generic/@ &&
+   $realfile !~ m@/barrier\.h$@ &&
+   $line =~ m/\b($underscore_smp_barriers)\s*\(/ &&
+   $line !~ 
m/^.\s*\#\s*define\s+($underscore_smp_barriers)\s*\(/) {
+   WARN("MEMORY_BARRIER",
+"__smp memory barriers shouldn't be used outside 
barrier.h and asm-generic\n" . $herecurr);
+   }
+
 # check for waitqueue_active without a comment.
if ($line =~ /\bwaitqueue_active\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/3] checkpatch: handling of memory barriers

2016-01-10 Thread Michael S. Tsirkin
As part of memory barrier cleanup, this patchset
extends checkpatch to make it easier to stop
incorrect memory barrier usage.

This applies on top of my series
arch: barrier cleanup + barriers for virt
and will be included in the next version of the series.

Changes from v2:
catch optional\s* before () in barriers
rewrite using qr{} instead of map

Michael S. Tsirkin (3):
  checkpatch.pl: add missing memory barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch: add virt barriers

 scripts/checkpatch.pl | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/3] checkpatch.pl: add missing memory barriers

2016-01-10 Thread Michael S. Tsirkin
SMP-only barriers were missing in checkpatch.pl

Refactor code slightly to make adding more variants easier.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 scripts/checkpatch.pl | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 2b3c228..97b8b62 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5116,7 +5116,25 @@ sub process {
}
}
 # check for memory barriers without a comment.
-   if ($line =~ 
/\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/)
 {
+
+   my $barriers = qr{
+   mb|
+   rmb|
+   wmb|
+   read_barrier_depends
+   }x;
+   my $smp_barriers = qr{
+   store_release|
+   load_acquire|
+   store_mb|
+   ($barriers)
+   }x;
+   my $all_barriers = qr{
+   $barriers|
+   smp_($smp_barriers)
+   }x;
+
+   if ($line =~ /\b($all_barriers)\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
WARN("MEMORY_BARRIER",
 "memory barrier without comment\n" . 
$herecurr);
-- 
MST

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] checkpatch.pl: add missing memory barriers

2016-01-10 Thread Michael S. Tsirkin
On Mon, Jan 04, 2016 at 02:15:50PM -0800, Joe Perches wrote:
> On Mon, 2016-01-04 at 22:45 +0200, Michael S. Tsirkin wrote:
> > On Mon, Jan 04, 2016 at 08:07:40AM -0800, Joe Perches wrote:
> > > On Mon, 2016-01-04 at 13:36 +0200, Michael S. Tsirkin wrote:
> > > > SMP-only barriers were missing in checkpatch.pl
> > > > 
> > > > Refactor code slightly to make adding more variants easier.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > ---
> > > >  scripts/checkpatch.pl | 9 -
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > index 2b3c228..0245bbe 100755
> > > > --- a/scripts/checkpatch.pl
> > > > +++ b/scripts/checkpatch.pl
> > > > @@ -5116,7 +5116,14 @@ sub process {
> > > >     }
> > > >     }
> > > >  # check for memory barriers without a comment.
> > > > -   if ($line =~ 
> > > > /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/)
> > > >  {
> > > > +
> > > > +   my @barriers = ('mb', 'rmb', 'wmb', 
> > > > 'read_barrier_depends');
> > > > +   my @smp_barriers = ('smp_store_release', 
> > > > 'smp_load_acquire', 'smp_store_mb');
> > > > +
> > > > +   @smp_barriers = (@smp_barriers, map {"smp_" . $_} 
> > > > @barriers);
> > > 
> > > I think using map, which so far checkpatch doesn't use,
> > > makes smp_barriers harder to understand and it'd be
> > > better to enumerate them.
> > 
> > Okay - I'll rewrite using foreach.
> > 
> 
> I think using the qr form (like 'our $Attribute' and
> a bunch of others) is the general style that should
> be used for checkpatch.

Thanks - that's what I did in the new version (included in
v3 of the arch cleanup patchset now).

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

  1   2   3   >