Re: [PATCH] VSOCK: Only check error on skb_recv_datagram when skb is NULL

2016-04-19 Thread David Miller
From: Jorgen Hansen 
Date: Mon, 18 Apr 2016 23:58:52 -0700

> If skb_recv_datagram returns an skb, we should ignore the err
> value returned. Otherwise, datagram receives will return EAGAIN
> when they have to wait for a datagram.
> 
> Acked-by: Adit Ranadive 
> Signed-off-by: Jorgen Hansen 

Applied.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Tue, Apr 19, 2016 at 1:54 PM, Michael S. Tsirkin  wrote:
> On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin  wrote:
>> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
>> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin  
>> >> wrote:
>> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
>> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
>> >> >> >
>> >> >> > > I thought that PLATFORM served that purpose.  Woudn't the host
>> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the 
>> >> >> > > host
>> >> >> > > device would skip translation?  Or is that problematic for vfio?
>> >> >> >
>> >> >> > Exactly that's problematic for security.
>> >> >> > You can't allow guest driver to decide whether device skips security.
>> >> >>
>> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
>> >> >> device, and doesn't live in virtio itself.
>> >> >>
>> >> >> It's a property of the platform IOMMU, and lives there.
>> >> >
>> >> > It's a property of the hypervisor virtio implementation, and lives 
>> >> > there.
>> >>
>> >> It is now, but QEMU could, in principle, change the way it thinks
>> >> about it so that virtio devices would use the QEMU DMA API but ask
>> >> QEMU to pass everything through 1:1.  This would be entirely invisible
>> >> to guests but would make it be a property of the IOMMU implementation.
>> >> At that point, maybe QEMU could find a (platform dependent) way to
>> >> tell the guest what's going on.
>> >>
>> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
>> >> set up 1:1 mappings in the guest so that the virtio devices would work
>> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
>> >> only obstacle is that the PPC and SPARC 1:1 mappings are currectly set
>> >> up with an offset.  I don't know too much about those platforms, but
>> >> presumably the layout could be changed so that 1:1 really was 1:1.
>> >>
>> >> --Andy
>> >
>> > Sure. Do you see any reason why the decision to do this can't be
>> > keyed off the virtio feature bit?
>>
>> I can think of three types of virtio host:
>>
>> a) virtio always bypasses the IOMMU.
>>
>> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
>> it does) -- i.e. virtio works like any other device.
>>
>> c) virtio may bypass the IOMMU depending on what the guest asks it to do.
>
> d) some virtio devices bypass the IOMMU and some don't,
> e.g. it's harder to support IOMMU with vhost.
>
>
>> If this is keyed off a virtio feature bit and anyone tries to
>> implement (c), the vfio is going to have a problem.  And, if it's
>> keyed off a virtio feature bit, then (a) won't work on Xen or similar
>> setups unless the Xen hypervisor adds a giant and probably unreliable
>> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
>> default x86 QEMU configuration, and I'd really like to keep it that
>> way.
>>
>> What could plausibly work using a virtio feature bit is for a device
>> to say "hey, I'm a new device and I support the platform-defined IOMMU
>> mechanism".  This bit would be *set* on default IOMMU-less QEMU
>> configurations and on physical virtio PCI cards.
>
> And clear on xen.

How?  QEMU has no idea that the guest is running Xen.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin  wrote:
> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin  
> >> wrote:
> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> >> >> >
> >> >> > > I thought that PLATFORM served that purpose.  Woudn't the host
> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the 
> >> >> > > host
> >> >> > > device would skip translation?  Or is that problematic for vfio?
> >> >> >
> >> >> > Exactly that's problematic for security.
> >> >> > You can't allow guest driver to decide whether device skips security.
> >> >>
> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
> >> >> device, and doesn't live in virtio itself.
> >> >>
> >> >> It's a property of the platform IOMMU, and lives there.
> >> >
> >> > It's a property of the hypervisor virtio implementation, and lives there.
> >>
> >> It is now, but QEMU could, in principle, change the way it thinks
> >> about it so that virtio devices would use the QEMU DMA API but ask
> >> QEMU to pass everything through 1:1.  This would be entirely invisible
> >> to guests but would make it be a property of the IOMMU implementation.
> >> At that point, maybe QEMU could find a (platform dependent) way to
> >> tell the guest what's going on.
> >>
> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
> >> set up 1:1 mappings in the guest so that the virtio devices would work
> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
> >> only obstacle is that the PPC and SPARC 1:1 mappings are currectly set
> >> up with an offset.  I don't know too much about those platforms, but
> >> presumably the layout could be changed so that 1:1 really was 1:1.
> >>
> >> --Andy
> >
> > Sure. Do you see any reason why the decision to do this can't be
> > keyed off the virtio feature bit?
> 
> I can think of three types of virtio host:
> 
> a) virtio always bypasses the IOMMU.
> 
> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
> it does) -- i.e. virtio works like any other device.
> 
> c) virtio may bypass the IOMMU depending on what the guest asks it to do.

d) some virtio devices bypass the IOMMU and some don't,
e.g. it's harder to support IOMMU with vhost.


> If this is keyed off a virtio feature bit and anyone tries to
> implement (c), the vfio is going to have a problem.  And, if it's
> keyed off a virtio feature bit, then (a) won't work on Xen or similar
> setups unless the Xen hypervisor adds a giant and probably unreliable
> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
> default x86 QEMU configuration, and I'd really like to keep it that
> way.
> 
> What could plausibly work using a virtio feature bit is for a device
> to say "hey, I'm a new device and I support the platform-defined IOMMU
> mechanism".  This bit would be *set* on default IOMMU-less QEMU
> configurations and on physical virtio PCI cards.

And clear on xen.

>  The guest could
> operate accordingly.  I'm not sure I see a good way for feature
> negotiation to work the other direction, though.

I agree.

> PPC and SPARC could only set this bit on emulated devices if they know
> that new guest kernels are in use.
> 
> --Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin  wrote:
> On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin  wrote:
>> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
>> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
>> >> >
>> >> > > I thought that PLATFORM served that purpose.  Woudn't the host
>> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
>> >> > > device would skip translation?  Or is that problematic for vfio?
>> >> >
>> >> > Exactly that's problematic for security.
>> >> > You can't allow guest driver to decide whether device skips security.
>> >>
>> >> Right. Because fundamentally, this *isn't* a property of the endpoint
>> >> device, and doesn't live in virtio itself.
>> >>
>> >> It's a property of the platform IOMMU, and lives there.
>> >
>> > It's a property of the hypervisor virtio implementation, and lives there.
>>
>> It is now, but QEMU could, in principle, change the way it thinks
>> about it so that virtio devices would use the QEMU DMA API but ask
>> QEMU to pass everything through 1:1.  This would be entirely invisible
>> to guests but would make it be a property of the IOMMU implementation.
>> At that point, maybe QEMU could find a (platform dependent) way to
>> tell the guest what's going on.
>>
>> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
>> set up 1:1 mappings in the guest so that the virtio devices would work
>> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
>> only obstacle is that the PPC and SPARC 1:1 mappings are currectly set
>> up with an offset.  I don't know too much about those platforms, but
>> presumably the layout could be changed so that 1:1 really was 1:1.
>>
>> --Andy
>
> Sure. Do you see any reason why the decision to do this can't be
> keyed off the virtio feature bit?

I can think of three types of virtio host:

a) virtio always bypasses the IOMMU.

b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
it does) -- i.e. virtio works like any other device.

c) virtio may bypass the IOMMU depending on what the guest asks it to do.

If this is keyed off a virtio feature bit and anyone tries to
implement (c), the vfio is going to have a problem.  And, if it's
keyed off a virtio feature bit, then (a) won't work on Xen or similar
setups unless the Xen hypervisor adds a giant and probably unreliable
kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
default x86 QEMU configuration, and I'd really like to keep it that
way.

What could plausibly work using a virtio feature bit is for a device
to say "hey, I'm a new device and I support the platform-defined IOMMU
mechanism".  This bit would be *set* on default IOMMU-less QEMU
configurations and on physical virtio PCI cards.  The guest could
operate accordingly.  I'm not sure I see a good way for feature
negotiation to work the other direction, though.

PPC and SPARC could only set this bit on emulated devices if they know
that new guest kernels are in use.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin  wrote:
> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> >> >
> >> > > I thought that PLATFORM served that purpose.  Woudn't the host
> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
> >> > > device would skip translation?  Or is that problematic for vfio?
> >> >
> >> > Exactly that's problematic for security.
> >> > You can't allow guest driver to decide whether device skips security.
> >>
> >> Right. Because fundamentally, this *isn't* a property of the endpoint
> >> device, and doesn't live in virtio itself.
> >>
> >> It's a property of the platform IOMMU, and lives there.
> >
> > It's a property of the hypervisor virtio implementation, and lives there.
> 
> It is now, but QEMU could, in principle, change the way it thinks
> about it so that virtio devices would use the QEMU DMA API but ask
> QEMU to pass everything through 1:1.  This would be entirely invisible
> to guests but would make it be a property of the IOMMU implementation.
> At that point, maybe QEMU could find a (platform dependent) way to
> tell the guest what's going on.
> 
> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
> set up 1:1 mappings in the guest so that the virtio devices would work
> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
> only obstacle is that the PPC and SPARC 1:1 mappings are currectly set
> up with an offset.  I don't know too much about those platforms, but
> presumably the layout could be changed so that 1:1 really was 1:1.
> 
> --Andy

Sure. Do you see any reason why the decision to do this can't be
keyed off the virtio feature bit?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin  wrote:
> On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
>> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
>> >
>> > > I thought that PLATFORM served that purpose.  Woudn't the host
>> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
>> > > device would skip translation?  Or is that problematic for vfio?
>> >
>> > Exactly that's problematic for security.
>> > You can't allow guest driver to decide whether device skips security.
>>
>> Right. Because fundamentally, this *isn't* a property of the endpoint
>> device, and doesn't live in virtio itself.
>>
>> It's a property of the platform IOMMU, and lives there.
>
> It's a property of the hypervisor virtio implementation, and lives there.

It is now, but QEMU could, in principle, change the way it thinks
about it so that virtio devices would use the QEMU DMA API but ask
QEMU to pass everything through 1:1.  This would be entirely invisible
to guests but would make it be a property of the IOMMU implementation.
At that point, maybe QEMU could find a (platform dependent) way to
tell the guest what's going on.

FWIW, as far as I can tell, PPC and SPARC really could, in principle,
set up 1:1 mappings in the guest so that the virtio devices would work
regardless of whether QEMU is ignoring the IOMMU or not -- I think the
only obstacle is that the PPC and SPARC 1:1 mappings are currectly set
up with an offset.  I don't know too much about those platforms, but
presumably the layout could be changed so that 1:1 really was 1:1.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> > 
> > > I thought that PLATFORM served that purpose.  Woudn't the host
> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
> > > device would skip translation?  Or is that problematic for vfio?
> > 
> > Exactly that's problematic for security.
> > You can't allow guest driver to decide whether device skips security.
> 
> Right. Because fundamentally, this *isn't* a property of the endpoint
> device, and doesn't live in virtio itself.
> 
> It's a property of the platform IOMMU, and lives there.

It's a property of the hypervisor virtio implementation, and lives there.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread David Woodhouse
On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> 
> > I thought that PLATFORM served that purpose.  Woudn't the host
> > advertise PLATFORM support and, if the guest doesn't ack it, the host
> > device would skip translation?  Or is that problematic for vfio?
> 
> Exactly that's problematic for security.
> You can't allow guest driver to decide whether device skips security.

Right. Because fundamentally, this *isn't* a property of the endpoint
device, and doesn't live in virtio itself.

It's a property of the platform IOMMU, and lives there.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 09:12:03AM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 9:09 AM, Michael S. Tsirkin  wrote:
> > On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin  
> >> wrote:
> >> > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
> >> >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse  
> >> >> wrote:
> >> >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell
> >> >> > the truth, and even legacy kernels ought to cope with that.
> >> >> > FSVO 'ought to' where I suspect some of them will actually crash with 
> >> >> > a
> >> >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the
> >> >> > tables, which puts it back into the same camp as ARM and Power.
> >> >>
> >> >> I think x86 may get a bit of a free pass here.  AFAIK the QEMU IOMMU
> >> >> implementation on x86 has always been "experimental", so it just might
> >> >> be okay to change it in a way that causes some older kernels to OOPS.
> >> >>
> >> >> --Andy
> >> >
> >> > Since it's experimental, it might be OK to change *guest kernels*
> >> > such that they oops on old QEMU.
> >> > But guest kernels were not experimental - so we need a QEMU mode that
> >> > makes them work fine. The more functionality is available in this QEMU
> >> > mode, the betterm because it's going to be the default for a while. For
> >> > the same reason, it is preferable to also have new kernels not crash in
> >> > this mode.
> >> >
> >>
> >> People add QEMU features that need new guest kernels all time time.
> >> If you enable virtio-scsi and try to boot a guest that's too old, it
> >> won't work.  So I don't see anything fundamentally wrong with saying
> >> that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest
> >> kernel is too old.  It might be annoying, since old kernels do work on
> >> actual Q35 hardware, but it at least seems to be that it might be
> >> okay.
> >>
> >> --Andy
> >
> > Yes but we need a mode that makes both old and new kernels work, and
> > that should be the default for a while.  this is what the
> > IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass DMA
> > API, new kernels go "oh compatibility mode" and bypass the IOMMU
> > within DMA API.
> 
> I thought that PLATFORM served that purpose.  Woudn't the host
> advertise PLATFORM support and, if the guest doesn't ack it, the host
> device would skip translation?  Or is that problematic for vfio?

Exactly that's problematic for security.
You can't allow guest driver to decide whether device skips security.

> >
> > --
> > MST
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Tue, Apr 19, 2016 at 9:09 AM, Michael S. Tsirkin  wrote:
> On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin  wrote:
>> > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse  
>> >> wrote:
>> >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell
>> >> > the truth, and even legacy kernels ought to cope with that.
>> >> > FSVO 'ought to' where I suspect some of them will actually crash with a
>> >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the
>> >> > tables, which puts it back into the same camp as ARM and Power.
>> >>
>> >> I think x86 may get a bit of a free pass here.  AFAIK the QEMU IOMMU
>> >> implementation on x86 has always been "experimental", so it just might
>> >> be okay to change it in a way that causes some older kernels to OOPS.
>> >>
>> >> --Andy
>> >
>> > Since it's experimental, it might be OK to change *guest kernels*
>> > such that they oops on old QEMU.
>> > But guest kernels were not experimental - so we need a QEMU mode that
>> > makes them work fine. The more functionality is available in this QEMU
>> > mode, the betterm because it's going to be the default for a while. For
>> > the same reason, it is preferable to also have new kernels not crash in
>> > this mode.
>> >
>>
>> People add QEMU features that need new guest kernels all time time.
>> If you enable virtio-scsi and try to boot a guest that's too old, it
>> won't work.  So I don't see anything fundamentally wrong with saying
>> that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest
>> kernel is too old.  It might be annoying, since old kernels do work on
>> actual Q35 hardware, but it at least seems to be that it might be
>> okay.
>>
>> --Andy
>
> Yes but we need a mode that makes both old and new kernels work, and
> that should be the default for a while.  this is what the
> IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass DMA
> API, new kernels go "oh compatibility mode" and bypass the IOMMU
> within DMA API.

I thought that PLATFORM served that purpose.  Woudn't the host
advertise PLATFORM support and, if the guest doesn't ack it, the host
device would skip translation?  Or is that problematic for vfio?

>
> --
> MST



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin  wrote:
> > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse  
> >> wrote:
> >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell
> >> > the truth, and even legacy kernels ought to cope with that.
> >> > FSVO 'ought to' where I suspect some of them will actually crash with a
> >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the
> >> > tables, which puts it back into the same camp as ARM and Power.
> >>
> >> I think x86 may get a bit of a free pass here.  AFAIK the QEMU IOMMU
> >> implementation on x86 has always been "experimental", so it just might
> >> be okay to change it in a way that causes some older kernels to OOPS.
> >>
> >> --Andy
> >
> > Since it's experimental, it might be OK to change *guest kernels*
> > such that they oops on old QEMU.
> > But guest kernels were not experimental - so we need a QEMU mode that
> > makes them work fine. The more functionality is available in this QEMU
> > mode, the betterm because it's going to be the default for a while. For
> > the same reason, it is preferable to also have new kernels not crash in
> > this mode.
> >
> 
> People add QEMU features that need new guest kernels all time time.
> If you enable virtio-scsi and try to boot a guest that's too old, it
> won't work.  So I don't see anything fundamentally wrong with saying
> that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest
> kernel is too old.  It might be annoying, since old kernels do work on
> actual Q35 hardware, but it at least seems to be that it might be
> okay.
> 
> --Andy

Yes but we need a mode that makes both old and new kernels work, and
that should be the default for a while.  this is what the
IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass DMA
API, new kernels go "oh compatibility mode" and bypass the IOMMU
within DMA API.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 09:00:27AM -0700, Andy Lutomirski wrote:
> On Apr 19, 2016 2:13 AM, "Michael S. Tsirkin"  wrote:
> >
> >
> > I guess you are right in that we should split this part out.
> > What I wanted is really the combination
> > PASSTHROUGH && !PLATFORM so that we can say "ok we don't
> > need to guess, this device actually bypasses the IOMMU".
> 
> What happens when you use a device like this on Xen or with a similar
> software translation layer?

I think you don't use it on Xen since virtio doesn't bypass an IOMMU there.
If you do you have misconfigured your device.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin  wrote:
> On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse  
>> wrote:
>> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell
>> > the truth, and even legacy kernels ought to cope with that.
>> > FSVO 'ought to' where I suspect some of them will actually crash with a
>> > NULL pointer dereference if there's no "catch-all" DMAR unit in the
>> > tables, which puts it back into the same camp as ARM and Power.
>>
>> I think x86 may get a bit of a free pass here.  AFAIK the QEMU IOMMU
>> implementation on x86 has always been "experimental", so it just might
>> be okay to change it in a way that causes some older kernels to OOPS.
>>
>> --Andy
>
> Since it's experimental, it might be OK to change *guest kernels*
> such that they oops on old QEMU.
> But guest kernels were not experimental - so we need a QEMU mode that
> makes them work fine. The more functionality is available in this QEMU
> mode, the betterm because it's going to be the default for a while. For
> the same reason, it is preferable to also have new kernels not crash in
> this mode.
>

People add QEMU features that need new guest kernels all time time.
If you enable virtio-scsi and try to boot a guest that's too old, it
won't work.  So I don't see anything fundamentally wrong with saying
that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest
kernel is too old.  It might be annoying, since old kernels do work on
actual Q35 hardware, but it at least seems to be that it might be
okay.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Andy Lutomirski
On Apr 19, 2016 2:13 AM, "Michael S. Tsirkin"  wrote:
>
>
> I guess you are right in that we should split this part out.
> What I wanted is really the combination
> PASSTHROUGH && !PLATFORM so that we can say "ok we don't
> need to guess, this device actually bypasses the IOMMU".

What happens when you use a device like this on Xen or with a similar
software translation layer?

I think that a "please bypass IOMMU" feature would be better in the
PCI, IOMMU, or platform code.  For Xen, virtio would still want to use
the DMA API, just without translating at the DMAR or hardware level.
Doing it in virtio is awkward, because virtio is involved at the
device level and the driver level, but the translation might be
entirely in between.

I think a nicer long-term approach would be to have a way to ask the
guest to set up a full 1:1 mapping for performance, but to still
handle the case where the guest refuses to do so or where there's more
than one translation layer involved.

But I agree that this part shouldn't delay the other part of your series.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Alex Williamson
On Tue, 19 Apr 2016 12:13:29 +0300
"Michael S. Tsirkin"  wrote:

> On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote:
> > On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:  
> > > I balk at adding more hacks to a broken system. My goals are
> > > merely to
> > > - make things work correctly with an IOMMU and new guests,
> > >   so people can use userspace drivers with virtio devices
> > > - prevent security risks when guest kernel mistakenly thinks
> > >   it's protected by an IOMMU, but in fact isn't
> > > - avoid breaking any working configurations  
> > 
> > AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this.
> > That's just an optimisation, for telling an OS "you don't really need
> > to bother with the IOMMU, even though you it works".
> > 
> > There are two main reasons why an operating system might want to use
> > the IOMMU via the DMA API for native drivers: 
> >  - To protect against driver bugs triggering rogue DMA.
> >  - To protect against hardware (or firmware) bugs.
> > 
> > With virtio, the first reason still exists. But the second is moot
> > because the device is part of the hypervisor and if the hypervisor is
> > untrustworthy then you're screwed anyway... but then again, in SoC
> > devices you could replace 'hypervisor' with 'chip' and the same is
> > true, isn't it? Is there *really* anything virtio-specific here?
> >
> > Sure, I want my *external* network device on a PCIe card with software-
> > loadable firmware to be behind an IOMMU because I don't trust it as far
> > as I can throw it. But for on-SoC devices surely the situation is
> > *just* the same as devices provided by a hypervisor?  
> 
> Depends on how SoC is designed I guess.  At the moment specifically QEMU
> runs everything in a single memory space so an IOMMU table lookup does
> not offer any extra protection. That's not a must, one could come
> up with modular hypervisor designs - it's just what we have ATM.
> 
> 
> > And some people want that external network device to use passthrough
> > anyway, for performance reasons.  
> 
> That's a policy decision though.
> 
> > On the whole, there are *plenty* of reasons why we might want to have a
> > passthrough mapping on a per-device basis,  
> 
> That's true. And driver security also might differ, for example maybe I
> trust a distro-supplied driver more than an out of tree one.  Or maybe I
> trust a distro-supplied userspace driver more than a closed-source one.
> And maybe I trust devices from same vendor as my chip more than a 3rd
> party one.  So one can generalize this even further, think about device
> and driver security/trust level as an integer and platform protection as an
> integer.
> 
> If platform IOMMU offers you extra protection over trusting the device
> (trust < protection) it improves you security to use platform to limit
> the device. If trust >= protection it just adds overhead without
> increasing the security.
> 
> > and I really struggle to
> > find justification for having this 'hint' in a virtio-specific way.  
> 
> It's a way. No system seems to expose this information in a more generic
> way at the moment, and it's portable. Would you like to push for some
> kind of standartization of such a hint? I would be interested
> to hear about that.
> 
> 
> > And it's complicating the discussion of the *actual* fix we're looking
> > at.  
> 
> I guess you are right in that we should split this part out.
> What I wanted is really the combination
> PASSTHROUGH && !PLATFORM so that we can say "ok we don't
> need to guess, this device actually bypasses the IOMMU".
> 
> And I thought it's a nice idea to use PASSTHROUGH && PLATFORM
> as a hint since it seemed to be unused.
> But maybe the best thing to do for now is to say
> - hosts should not set PASSTHROUGH && PLATFORM
> - guests should ignore PASSTHROUGH if PLATFORM is set
> 
> and then we can come back to this optimization idea later
> if it's appropriate.
> 
> So yes I think we need the two bits but no we don't need to
> mix the hint discussion in here.
> 
> > > Looking at guest code, it looks like virtio was always
> > > bypassing the IOMMU even if configured, but no other
> > > guest driver did.
> > > 
> > > This makes me think the problem where guest drivers
> > > ignore the IOMMU is virtio specific
> > > and so a virtio specific solution seems cleaner.
> > > 
> > > The problem for assigned devices is IMHO different: they bypass
> > > the guest IOMMU too but no guest driver knows about this,
> > > so guests do not work. Seems cleaner to fix QEMU to make
> > > existing guests work.  
> > 
> > I certainly agree that it's better to fix QEMU. Whether devices are
> > behind an IOMMU or not, the DMAR tables we expose to a guest should
> > tell the truth.
> > 
> > Part of the issue here is virtio-specific; part isn't.
> > 
> > Basically, we have a conjunction of two separate bugs which happened to
> > work (for virtio) — the IOMMU support in QEMU wasn't working for virti

Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse  wrote:
> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell
> > the truth, and even legacy kernels ought to cope with that.
> > FSVO 'ought to' where I suspect some of them will actually crash with a
> > NULL pointer dereference if there's no "catch-all" DMAR unit in the
> > tables, which puts it back into the same camp as ARM and Power.
> 
> I think x86 may get a bit of a free pass here.  AFAIK the QEMU IOMMU
> implementation on x86 has always been "experimental", so it just might
> be okay to change it in a way that causes some older kernels to OOPS.
> 
> --Andy

Since it's experimental, it might be OK to change *guest kernels*
such that they oops on old QEMU.
But guest kernels were not experimental - so we need a QEMU mode that
makes them work fine. The more functionality is available in this QEMU
mode, the betterm because it's going to be the default for a while. For
the same reason, it is preferable to also have new kernels not crash in
this mode.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 3/3] vfio: add virtio pci quirk

2016-04-19 Thread Michael S. Tsirkin
On Mon, Apr 18, 2016 at 02:00:06PM -0600, Alex Williamson wrote:
> On Mon, 18 Apr 2016 12:58:28 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > to signal they are safe to use with an IOMMU.
> > 
> > Without this bit, exposing the device to userspace is unsafe, so probe
> > and fail VFIO initialization unless noiommu is enabled.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> >  drivers/vfio/pci/vfio_pci.c |  11 +++
> >  drivers/vfio/pci/vfio_pci_virtio.c  | 135 
> > 
> >  drivers/vfio/pci/Makefile   |   1 +
> >  4 files changed, 148 insertions(+)
> >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > b/drivers/vfio/pci/vfio_pci_private.h
> > index 8a7d546..604d445 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -130,4 +130,5 @@ static inline int vfio_pci_igd_init(struct 
> > vfio_pci_device *vdev)
> > return -ENODEV;
> >  }
> >  #endif
> > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, int 
> > noiommu);
> >  #endif /* VFIO_PCI_PRIVATE_H */
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index d622a41..2bb8c76 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1125,6 +1125,17 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > const struct pci_device_id *id)
> > return ret;
> > }
> >  
> > +   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET &&
> 
> Virtio really owns this entire vendor ID block?  Apparently nobody told
> ivshmem: http://pci-ids.ucw.cz/read/PC/1af4/1110  Even the comment by
> virtio_pci_id_table[] suggests virtio is only a subset even if the code
> doesn't appear to honor that comment.  I don't know the history there,
> but that seems like really inefficient use of an entire, coveted vendor
> block.

True - virtio spec also says it's up to 0x107f.


> > +   ((ret = vfio_pci_virtio_quirk(vdev, ret {
> 
> Please don't set variables like this unless necessary.
> 
> if (vendor...) {
>ret = vfio_pci_virtio_quir...
>if (ret) {
>...
> 
> > +   dev_warn(&vdev->pdev->dev,
> > +"Failed to setup Virtio for VFIO\n");
> > +   vfio_del_group_dev(&pdev->dev);
> > +   vfio_iommu_group_put(group, &pdev->dev);
> > +   kfree(vdev);
> > +   return ret;
> > +   }
> > +
> > +
> > if (vfio_pci_is_vga(pdev)) {
> > vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > vga_set_legacy_decoding(pdev,
> > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > b/drivers/vfio/pci/vfio_pci_virtio.c
> > new file mode 100644
> > index 000..1a32064
> > --- /dev/null
> > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > @@ -0,0 +1,135 @@
> > +/*
> > + * VFIO PCI Intel Graphics support
> > + *
> > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > + * Author: Alex Williamson 
> > + *
> 
> Update
> 
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Register a device specific region through which to provide read-only
> > + * access to the Intel IGD opregion.  The register defining the opregion
> > + * address is also virtualized to prevent user modification.
> 
> Update
> 
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> I don't see where io or uaccess are needed here.
> 
> > +
> > +#include "vfio_pci_private.h"
> > +
> > +/**
> > + * virtio_pci_find_capability - walk capabilities to find device info.
> > + * @dev: the pci device
> > + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> > + *
> > + * Returns offset of the capability, or 0.
> > + */
> > +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> > cfg_type)
> 
> This is called from probe code, why inline?  There's already a function
> with this exact same name in virtio code, can we come up with something
> unique to avoid confusion?
> 
> > +{
> > +   int pos;
> > +
> > +   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> > +pos > 0;
> > +pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > +   u8 type;
> > +   pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> > +cfg_type),
> > +&type);
> > +
> > +   if (type != cfg_type)
> > +   continue;
> > +
> > +   /* Ignore structures with reserved BAR values */
> > +   if (type != VIRTIO_PCI_CAP_PCI_CFG) {
> > +   u8 bar;
> > +
> > +   pci_

Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Michael S. Tsirkin
On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:
> > I balk at adding more hacks to a broken system. My goals are
> > merely to
> > - make things work correctly with an IOMMU and new guests,
> >   so people can use userspace drivers with virtio devices
> > - prevent security risks when guest kernel mistakenly thinks
> >   it's protected by an IOMMU, but in fact isn't
> > - avoid breaking any working configurations
> 
> AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this.
> That's just an optimisation, for telling an OS "you don't really need
> to bother with the IOMMU, even though you it works".
> 
> There are two main reasons why an operating system might want to use
> the IOMMU via the DMA API for native drivers: 
>  - To protect against driver bugs triggering rogue DMA.
>  - To protect against hardware (or firmware) bugs.
> 
> With virtio, the first reason still exists. But the second is moot
> because the device is part of the hypervisor and if the hypervisor is
> untrustworthy then you're screwed anyway... but then again, in SoC
> devices you could replace 'hypervisor' with 'chip' and the same is
> true, isn't it? Is there *really* anything virtio-specific here?
>
> Sure, I want my *external* network device on a PCIe card with software-
> loadable firmware to be behind an IOMMU because I don't trust it as far
> as I can throw it. But for on-SoC devices surely the situation is
> *just* the same as devices provided by a hypervisor?

Depends on how SoC is designed I guess.  At the moment specifically QEMU
runs everything in a single memory space so an IOMMU table lookup does
not offer any extra protection. That's not a must, one could come
up with modular hypervisor designs - it's just what we have ATM.


> And some people want that external network device to use passthrough
> anyway, for performance reasons.

That's a policy decision though.

> On the whole, there are *plenty* of reasons why we might want to have a
> passthrough mapping on a per-device basis,

That's true. And driver security also might differ, for example maybe I
trust a distro-supplied driver more than an out of tree one.  Or maybe I
trust a distro-supplied userspace driver more than a closed-source one.
And maybe I trust devices from same vendor as my chip more than a 3rd
party one.  So one can generalize this even further, think about device
and driver security/trust level as an integer and platform protection as an
integer.

If platform IOMMU offers you extra protection over trusting the device
(trust < protection) it improves you security to use platform to limit
the device. If trust >= protection it just adds overhead without
increasing the security.

> and I really struggle to
> find justification for having this 'hint' in a virtio-specific way.

It's a way. No system seems to expose this information in a more generic
way at the moment, and it's portable. Would you like to push for some
kind of standartization of such a hint? I would be interested
to hear about that.


> And it's complicating the discussion of the *actual* fix we're looking
> at.

I guess you are right in that we should split this part out.
What I wanted is really the combination
PASSTHROUGH && !PLATFORM so that we can say "ok we don't
need to guess, this device actually bypasses the IOMMU".

And I thought it's a nice idea to use PASSTHROUGH && PLATFORM
as a hint since it seemed to be unused.
But maybe the best thing to do for now is to say
- hosts should not set PASSTHROUGH && PLATFORM
- guests should ignore PASSTHROUGH if PLATFORM is set

and then we can come back to this optimization idea later
if it's appropriate.

So yes I think we need the two bits but no we don't need to
mix the hint discussion in here.

> > Looking at guest code, it looks like virtio was always
> > bypassing the IOMMU even if configured, but no other
> > guest driver did.
> > 
> > This makes me think the problem where guest drivers
> > ignore the IOMMU is virtio specific
> > and so a virtio specific solution seems cleaner.
> > 
> > The problem for assigned devices is IMHO different: they bypass
> > the guest IOMMU too but no guest driver knows about this,
> > so guests do not work. Seems cleaner to fix QEMU to make
> > existing guests work.
> 
> I certainly agree that it's better to fix QEMU. Whether devices are
> behind an IOMMU or not, the DMAR tables we expose to a guest should
> tell the truth.
> 
> Part of the issue here is virtio-specific; part isn't.
> 
> Basically, we have a conjunction of two separate bugs which happened to
> work (for virtio) — the IOMMU support in QEMU wasn't working for virtio
> (and assigned) devices even though it theoretically *should* have been,
> and the virtio drivers weren't using the DMA API as they theoretically
> should have been.
> 
> So there were corner cases like assigned PCI devices, and real hardware
> implementations of virtio stuff (and perhaps vi

Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage

2016-04-19 Thread Sergey Senozhatsky
Hello Minchan,

On (04/19/16 16:51), Minchan Kim wrote:
[..]
> 
> I guess it is remained thing after I rebased to catch any mistake.
> But I'm heavily chainging this part.
> Please review next version instead of this after a few days. :)

ah, got it. thanks!

-ss
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage

2016-04-19 Thread Minchan Kim
Hi Sergey,

On Mon, Apr 18, 2016 at 10:04:08AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (03/30/16 16:12), Minchan Kim wrote:
> [..]
> > @@ -1835,23 +1827,31 @@ static void __zs_compact(struct zs_pool *pool, 
> > struct size_class *class)
> > if (!migrate_zspage(pool, class, &cc))
> > break;
> >  
> > -   putback_zspage(pool, class, dst_page);
> > +   VM_BUG_ON_PAGE(putback_zspage(pool, class,
> > +   dst_page) == ZS_EMPTY, dst_page);
> 
> can this VM_BUG_ON_PAGE() condition ever be true?

I guess it is remained thing after I rebased to catch any mistake.
But I'm heavily chainging this part.
Please review next version instead of this after a few days. :)

> 
> > }
> > /* Stop if we couldn't find slot */
> > if (dst_page == NULL)
> > break;
> > -   putback_zspage(pool, class, dst_page);
> > -   if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
> > +   VM_BUG_ON_PAGE(putback_zspage(pool, class,
> > +   dst_page) == ZS_EMPTY, dst_page);
> 
> hm... this VM_BUG_ON_PAGE(dst_page) is sort of confusing. under what
> circumstances it can be true?
> 
> a minor nit, it took me some time (need some coffee I guess) to
> correctly parse this macro wrapper
> 
>   VM_BUG_ON_PAGE(putback_zspage(pool, class,
>   dst_page) == ZS_EMPTY, dst_page);
> 
> may be do it like:
>   fullness = putback_zspage(pool, class, dst_page);
>   VM_BUG_ON_PAGE(fullness == ZS_EMPTY, dst_page);
> 
> 
> well, if we want to VM_BUG_ON_PAGE() at all. there haven't been any
> problems with compaction, is there any specific reason these macros
> were added?
> 
> 
> 
> > +   if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
> > pool->stats.pages_compacted += class->pages_per_zspage;
> > -   spin_unlock(&class->lock);
> > +   spin_unlock(&class->lock);
> > +   free_zspage(pool, class, src_page);
> 
> do we really need to free_zspage() out of class->lock?
> wouldn't something like this
> 
>   if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
>   pool->stats.pages_compacted += class->pages_per_zspage;
>   free_zspage(pool, class, src_page);
>   }
>   spin_unlock(&class->lock);
> 
> be simpler?

The reason I did out of class->lock is deadlock between page_lock
and class->lock with upcoming page migration.
However, as I said, I'm now heavily changing the part. :)

> 
> besides, free_zspage() now updates class stats out of class lock,
> not critical but still.
> 
>   -ss
> 
> > +   } else {
> > +   spin_unlock(&class->lock);
> > +   }
> > +
> > cond_resched();
> > spin_lock(&class->lock);
> > }
> >  
> > if (src_page)
> > -   putback_zspage(pool, class, src_page);
> > +   VM_BUG_ON_PAGE(putback_zspage(pool, class,
> > +   src_page) == ZS_EMPTY, src_page);
> >  
> > spin_unlock(&class->lock);
> >  }
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 10/16] zsmalloc: factor page chain functionality out

2016-04-19 Thread Minchan Kim
On Mon, Apr 18, 2016 at 09:33:05AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/30/16 16:12), Minchan Kim wrote:
> > @@ -1421,7 +1434,6 @@ static unsigned long obj_malloc(struct size_class 
> > *class,
> > unsigned long m_offset;
> > void *vaddr;
> >  
> > -   handle |= OBJ_ALLOCATED_TAG;
> 
> a nitpick, why did you replace this ALLOCATED_TAG assignment
> with 2 'handle | OBJ_ALLOCATED_TAG'?

I thought this handle variable in here is pure handle but OBJ_ALLOCATED_TAG
should live in head(i.e., link->handle), not pure handle, itself.

> 
>   -ss
> 
> > obj = get_freeobj(first_page);
> > objidx_to_page_and_offset(class, first_page, obj,
> > &m_page, &m_offset);
> > @@ -1431,10 +1443,10 @@ static unsigned long obj_malloc(struct size_class 
> > *class,
> > set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
> > if (!class->huge)
> > /* record handle in the header of allocated chunk */
> > -   link->handle = handle;
> > +   link->handle = handle | OBJ_ALLOCATED_TAG;
> > else
> > /* record handle in first_page->private */
> > -   set_page_private(first_page, handle);
> > +   set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
> > kunmap_atomic(vaddr);
> > mod_zspage_inuse(first_page, 1);
> > zs_stat_inc(class, OBJ_USED, 1);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 08/16] zsmalloc: squeeze freelist into page->mapping

2016-04-19 Thread Minchan Kim
On Mon, Apr 18, 2016 at 12:56:21AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/30/16 16:12), Minchan Kim wrote:
> [..]
> > +static void objidx_to_page_and_offset(struct size_class *class,
> > +   struct page *first_page,
> > +   unsigned long obj_idx,
> > +   struct page **obj_page,
> > +   unsigned long *offset_in_page)
> >  {
> > -   unsigned long obj;
> > +   int i;
> > +   unsigned long offset;
> > +   struct page *cursor;
> > +   int nr_page;
> >  
> > -   if (!page) {
> > -   VM_BUG_ON(obj_idx);
> > -   return NULL;
> > -   }
> > +   offset = obj_idx * class->size;
> 
> so we already know the `offset' before we call objidx_to_page_and_offset(),
> thus we can drop `struct size_class *class' and `obj_idx', and pass
> `long obj_offset'  (which is `obj_idx * class->size') instead, right?
> 
> we also _may be_ can return `cursor' from the function.
> 
> static struct page *objidx_to_page_and_offset(struct page *first_page,
>   unsigned long obj_offset,
>   unsigned long *offset_in_page);
> 
> this can save ~20 instructions, which is not so terrible for a hot path
> like obj_malloc(). what do you think?
> 
> well, seems that `unsigned long *offset_in_page' can be calculated
> outside of this function too, it's basically
> 
>   *offset_in_page = (obj_idx * class->size) & ~PAGE_MASK;
> 
> so we don't need to supply it to this function, nor modify it there.
> which can save ~40 instructions on my system. does this sound silly?

Sound smart. :)
At least, we will use it in hot path.

Thanks!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 06/16] zsmalloc: squeeze inuse into page->mapping

2016-04-19 Thread Minchan Kim
On Mon, Apr 18, 2016 at 12:08:04AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/30/16 16:12), Minchan Kim wrote:
> [..]
> > +static int get_zspage_inuse(struct page *first_page)
> > +{
> > +   struct zs_meta *m;
> > +
> > +   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> > +
> > +   m = (struct zs_meta *)&first_page->mapping;
> ..
> > +static void set_zspage_inuse(struct page *first_page, int val)
> > +{
> > +   struct zs_meta *m;
> > +
> > +   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> > +
> > +   m = (struct zs_meta *)&first_page->mapping;
> ..
> > +static void mod_zspage_inuse(struct page *first_page, int val)
> > +{
> > +   struct zs_meta *m;
> > +
> > +   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> > +
> > +   m = (struct zs_meta *)&first_page->mapping;
> ..
> >  static void get_zspage_mapping(struct page *first_page,
> > unsigned int *class_idx,
> > enum fullness_group *fullness)
> >  {
> > -   unsigned long m;
> > +   struct zs_meta *m;
> > +
> > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> > +   m = (struct zs_meta *)&first_page->mapping;
> ..
> >  static void set_zspage_mapping(struct page *first_page,
> > unsigned int class_idx,
> > enum fullness_group fullness)
> >  {
> > +   struct zs_meta *m;
> > +
> > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> >  
> > +   m = (struct zs_meta *)&first_page->mapping;
> > +   m->fullness = fullness;
> > +   m->class = class_idx;
> >  }
> 
> 
> a nitpick: this
> 
>   struct zs_meta *m;
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>   m = (struct zs_meta *)&first_page->mapping;
> 
> 
> seems to be common in several places, may be it makes sense to
> factor it out and turn into a macro or a static inline helper?
> 
> other than that, looks good to me

Yeb.

> 
> Reviewed-by: Sergey Senozhatsky 

Thanks for the review!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization