Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Jason Wang



On 2018年06月13日 12:24, Samudrala, Sridhar wrote:

On 6/12/2018 7:38 PM, Jason Wang wrote:



On 2018年06月12日 19:54, Michael S. Tsirkin wrote:

On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:


On 2018年06月05日 20:33, Michael S. Tsirkin wrote:

I don't think this is sufficient.

If both primary and standby devices are present, a legacy guest 
without

support for the feature might see two devices with same mac and get
confused.

I think that we should only make primary visible after guest acked 
the

backup feature bit.
I think we want exactly the reverse? E.g fail the negotiation when 
guest

does not ack backup feature.

Otherwise legacy guest won't even have the chance to see primary 
device in

the guest.

That's by design.


So management needs to know the capability of guest to set the backup 
feature. This looks a chicken or egg problem to me.


I don't think so. If the tenant requests 'accelerated datapath 
feature', the management
will set 'standby' feature bit on virtio-net interface and if the 
guest virtio-net driver
supports this feature, then the tenant VM will get that capability via 
a hot-plugged

primary device.


Ok, I thought exactly the reverse because of the commit title is "enable 
virtio_net to act as a standby for a passthru device". But re-read the 
commit log content, I understand the case a little bit. Btw, VF is not 
necessarily faster than virtio-net, especially consider virtio-net may 
have a lot of queues.










And on reset or when backup is cleared in some other way, unplug the
primary.

What if guest does not support hotplug?

It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?


Something like the below will do the job:

Primary device is added with a special "primary-failover" flag.
A virtual machine is then initialized with just a standby virtio
device. Primary is not yet added.

A question is how to do the matching? Qemu knows nothing about e.g mac
address of a pass-through device I believe?

Supply a flag to the VFIO when it's added, this way QEMU will know.


Later QEMU detects that guest driver device set DRIVER_OK.
It then exposes the primary device to the guest, and triggers
a device addition event (hot-plug event) for it.

Do you mean we won't have primary device in the initial qemu cli?

No, that's not what I mean.

I mean management will supply a flag to VFIO and then


- VFIO defers exposing
primary to guest until guest acks the feature bit.
- When we see guest ack, initiate hotplug.
- On reboot, hide it again.
- On reset without reboot, request hot-unplug and on eject hide it 
again.


This sounds much like a kind of bonding in qemu.



If QEMU detects guest driver removal, it initiates a hot-unplug 
sequence

to remove the primary driver.  In particular, if QEMU detects guest
re-initialization (e.g. by detecting guest reset) it immediately 
removes

the primary device.

I believe guest failover module should handle this gracefully?

It can't control enumeration order, if primary is enumerated before
standby then guest will load its driver and it's too late
when failover driver is loaded.


Well, even if we can delay the visibility of primary until DRIVER_OK, 
there still be a race I think? And it looks to me it's still a bug of 
guest:


E.g primary could be probed before failover_register() in guest. Then 
we will miss the enslaving of primary forever.


That is not an issue. Even if the primary is probed before failover 
driver, it will
enslave the primary via the call to failover_existing_slave_register() 
as part of

failover_register() routine.


Aha I get it. So the enumeration order is not an issue.

Consider primary may still be seen by guest kernel even if we delay its 
visibility, I wonder whether we can control the lifecycle of primary 
through driver but not qemu. This can simplify a lot of things.


Thanks





Thanks




Thanks

We can move some of this code to management as well, 
architecturally it
does not make too much sense but it might be easier 
implementation-wise.


HTH

On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
Ping on this patch now that the kernel patches are accepted into 
davem's net-next tree.

https://patchwork.ozlabs.org/cover/920005/


On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
This feature bit can be used by hypervisor to indicate 
virtio_net device to

act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the 
STANDBY feature 'true'

by default as i am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to 
qemu via libvirt

XML file?

Signed-off-by: Sridhar Samudrala 
---
    hw/net/virtio-net.c | 2 ++
    include/standard-headers/linux/virtio_net.h | 3 +++
    2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c 

Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Samudrala, Sridhar

On 6/12/2018 7:38 PM, Jason Wang wrote:



On 2018年06月12日 19:54, Michael S. Tsirkin wrote:

On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:


On 2018年06月05日 20:33, Michael S. Tsirkin wrote:

I don't think this is sufficient.

If both primary and standby devices are present, a legacy guest 
without

support for the feature might see two devices with same mac and get
confused.

I think that we should only make primary visible after guest acked the
backup feature bit.
I think we want exactly the reverse? E.g fail the negotiation when 
guest

does not ack backup feature.

Otherwise legacy guest won't even have the chance to see primary 
device in

the guest.

That's by design.


So management needs to know the capability of guest to set the backup 
feature. This looks a chicken or egg problem to me.


I don't think so. If the tenant requests 'accelerated datapath feature', the 
management
will set 'standby' feature bit on virtio-net interface and if the guest 
virtio-net driver
supports this feature, then the tenant VM will get that capability via a 
hot-plugged
primary device.







And on reset or when backup is cleared in some other way, unplug the
primary.

What if guest does not support hotplug?

It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?


Something like the below will do the job:

Primary device is added with a special "primary-failover" flag.
A virtual machine is then initialized with just a standby virtio
device. Primary is not yet added.

A question is how to do the matching? Qemu knows nothing about e.g mac
address of a pass-through device I believe?

Supply a flag to the VFIO when it's added, this way QEMU will know.


Later QEMU detects that guest driver device set DRIVER_OK.
It then exposes the primary device to the guest, and triggers
a device addition event (hot-plug event) for it.

Do you mean we won't have primary device in the initial qemu cli?

No, that's not what I mean.

I mean management will supply a flag to VFIO and then


- VFIO defers exposing
primary to guest until guest acks the feature bit.
- When we see guest ack, initiate hotplug.
- On reboot, hide it again.
- On reset without reboot, request hot-unplug and on eject hide it 
again.


This sounds much like a kind of bonding in qemu.



If QEMU detects guest driver removal, it initiates a hot-unplug 
sequence

to remove the primary driver.  In particular, if QEMU detects guest
re-initialization (e.g. by detecting guest reset) it immediately 
removes

the primary device.

I believe guest failover module should handle this gracefully?

It can't control enumeration order, if primary is enumerated before
standby then guest will load its driver and it's too late
when failover driver is loaded.


Well, even if we can delay the visibility of primary until DRIVER_OK, 
there still be a race I think? And it looks to me it's still a bug of 
guest:


E.g primary could be probed before failover_register() in guest. Then 
we will miss the enslaving of primary forever.


That is not an issue. Even if the primary is probed before failover driver, it 
will
enslave the primary via the call to failover_existing_slave_register() as part 
of
failover_register() routine.



Thanks




Thanks

We can move some of this code to management as well, 
architecturally it
does not make too much sense but it might be easier 
implementation-wise.


HTH

On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
Ping on this patch now that the kernel patches are accepted into 
davem's net-next tree.

https://patchwork.ozlabs.org/cover/920005/


On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
This feature bit can be used by hypervisor to indicate virtio_net 
device to

act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the 
STANDBY feature 'true'

by default as i am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to 
qemu via libvirt

XML file?

Signed-off-by: Sridhar Samudrala 
---
    hw/net/virtio-net.c | 2 ++
    include/standard-headers/linux/virtio_net.h | 3 +++
    2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
 true),
    DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
SPEED_UNKNOWN),
    DEFINE_PROP_STRING("duplex", VirtIONet, 
net_conf.duplex_str),
+    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,

+  false),
    DEFINE_PROP_END_OF_LIST(),
    };
diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h

index e9f255ea3f..01ec09684c 100644

Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Jason Wang



On 2018年06月13日 08:20, Samudrala, Sridhar wrote:

On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote:

On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:


On 2018年06月05日 20:33, Michael S. Tsirkin wrote:

I don't think this is sufficient.

If both primary and standby devices are present, a legacy guest 
without

support for the feature might see two devices with same mac and get
confused.

I think that we should only make primary visible after guest acked the
backup feature bit.
I think we want exactly the reverse? E.g fail the negotiation when 
guest

does not ack backup feature.

Otherwise legacy guest won't even have the chance to see primary 
device in

the guest.

That's by design.


And on reset or when backup is cleared in some other way, unplug the
primary.

What if guest does not support hotplug?

It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?


Right now we select NET_FAILOVER when virtio-net is enabled, Should we 
select
PCI_HOTPLUG when NET_FAILOVER is enabled? 


Maybe I was wrong but it looks to me PCI_HOTPLUG is no sufficient since 
it depends on other to work e.g HOTPLUG_PCI_ACPI.


Thanks
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Samudrala, Sridhar

On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote:

On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:


On 2018年06月05日 20:33, Michael S. Tsirkin wrote:

I don't think this is sufficient.

If both primary and standby devices are present, a legacy guest without
support for the feature might see two devices with same mac and get
confused.

I think that we should only make primary visible after guest acked the
backup feature bit.

I think we want exactly the reverse? E.g fail the negotiation when guest
does not ack backup feature.

Otherwise legacy guest won't even have the chance to see primary device in
the guest.

That's by design.


And on reset or when backup is cleared in some other way, unplug the
primary.

What if guest does not support hotplug?

It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?


Right now we select NET_FAILOVER when virtio-net is enabled,  Should we select
PCI_HOTPLUG when NET_FAILOVER is enabled?




Something like the below will do the job:

Primary device is added with a special "primary-failover" flag.
A virtual machine is then initialized with just a standby virtio
device. Primary is not yet added.

A question is how to do the matching? Qemu knows nothing about e.g mac
address of a pass-through device I believe?

Supply a flag to the VFIO when it's added, this way QEMU will know.


Later QEMU detects that guest driver device set DRIVER_OK.
It then exposes the primary device to the guest, and triggers
a device addition event (hot-plug event) for it.

Do you mean we won't have primary device in the initial qemu cli?

No, that's not what I mean.

I mean management will supply a flag to VFIO and then


- VFIO defers exposing
primary to guest until guest acks the feature bit.
- When we see guest ack, initiate hotplug.
- On reboot, hide it again.
- On reset without reboot, request hot-unplug and on eject hide it again.



If QEMU detects guest driver removal, it initiates a hot-unplug sequence
to remove the primary driver.  In particular, if QEMU detects guest
re-initialization (e.g. by detecting guest reset) it immediately removes
the primary device.

I believe guest failover module should handle this gracefully?

It can't control enumeration order, if primary is enumerated before
standby then guest will load its driver and it's too late
when failover driver is loaded.


Does it mean that if guest unloads virtio-net driver, Qemu should automatically
unplug the associated primary device?

What about guest unloading/re-loading primary device driver?





Thanks


We can move some of this code to management as well, architecturally it
does not make too much sense but it might be easier implementation-wise.

HTH

On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:

Ping on this patch now that the kernel patches are accepted into davem's 
net-next tree.
https://patchwork.ozlabs.org/cover/920005/


On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature 
'true'
by default as i am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
XML file?

Signed-off-by: Sridhar Samudrala 
---
hw/net/virtio-net.c | 2 ++
include/standard-headers/linux/virtio_net.h | 3 +++
2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
 true),
DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
DEFINE_PROP_END_OF_LIST(),
};
diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h
index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 * Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
+#define VIRTIO_NET_F_STANDBY  62/* Act as standby for another device
+ * with the same MAC.
+ */
#define VIRTIO_NET_F_SPEED_DUPLEX 63/* Device set linkspeed and 
duplex */
#ifndef VIRTIO_NET_NO_LEGACY


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Samudrala, Sridhar

On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote:

On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:

On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:

On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:

On 2018年06月12日 01:26, Michael S. Tsirkin wrote:

On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature 
'true'
by default as i am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
XML file?

Signed-off-by: Sridhar Samudrala 

So I do not think we can commit to this interface: we
really need to control visibility of the primary device.

The problem is legacy guest won't use primary device at all if we do this.

And that's by design - I think it's the only way to ensure the
legacy guest isn't confused.

Yes. I think so. But i am not sure if Qemu is the right place to control the 
visibility
of the primary device. The primary device may not be specified as an argument 
to Qemu. It
may be plugged in later.
The cloud service provider is providing a feature that enables low latency 
datapath and live
migration capability.
A tenant can use this feature only if he is running a VM that has virtio-net 
with failover support.

Well live migration is there already. The new feature is low latency
data path.


we get live migration with just virtio.  But I meant live migration with VF as
primary device.



And it's the guest that needs failover support not the VM.


Isn't guest and VM synonymous?






I think Qemu should check if guest virtio-net supports this feature and provide 
a mechanism for
an upper layer indicating if the STANDBY feature is successfully negotiated or 
not.
The upper layer can then decide if it should hot plug a VF with the same MAC 
and manage the 2 links.
If VF is successfully hot plugged, virtio-net link should be disabled.

Did you even talk to upper layer management about it?
Just list the steps they need to do and you will see
that's a lot of machinery to manage by the upper layer.

What do we gain in flexibility? As far as I can see the
only gain is some resources saved for legacy VMs.

That's not a lot as tenant of the upper layer probably already has
at least a hunch that it's a new guest otherwise
why bother specifying the feature at all - you
save even more resources without it.



I am not all that familiar with how Qemu manages network devices. If we can do 
all the
required management of the primary/standby devices within Qemu, that is 
definitely a better
approach without upper layer involvement.






How about control the visibility of standby device?

Thanks

standy the always there to guarantee no downtime.


However just for testing purposes, we could add a non-stable
interface "x-standby" with the understanding that as any
x- prefix it's unstable and will be changed down the road,
likely in the next release.



---
hw/net/virtio-net.c | 2 ++
include/standard-headers/linux/virtio_net.h | 3 +++
2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
 true),
DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
DEFINE_PROP_END_OF_LIST(),
};
diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h
index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 * Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
+#define VIRTIO_NET_F_STANDBY  62/* Act as standby for another device
+ * with the same MAC.
+ */
#define VIRTIO_NET_F_SPEED_DUPLEX 63/* Device set linkspeed and 
duplex */
#ifndef VIRTIO_NET_NO_LEGACY
--
2.14.3

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 1/3] compiler-gcc.h: add gnu_inline to all inline declarations

2018-06-12 Thread H. Peter Anvin
On 06/12/18 13:19, Nick Desaulniers wrote:
> 
> Oh, by [0] __GCC_STDC_INLINE__ is indeed actually the correct symbol
> to check for.  Clang does support that, so nothing to fix there.
> 
>> By the way, you should check clang against gcc's predefined macros by doing:
>>
>> gcc [options] -x c -Wp,-dM -E /dev/null | sort
>>
>> Options can change the predefined macros substantially, especially the, 
>> -std=, arch and -O options. -x c can be replaced with e.g. -x c++, 
>> objective-c, assembler-with-cpp etc.
> 
> Neat, I'll have to bookmark that incantation.  I can s/gcc/clang/ to
> get a similar list (which is how I know it supports
> __GCC_STDC_INLINE__).>

I bet that if you add -fgnu89-inlines then it *does* define
__GNUC_GNU_INLINE__.

> 
> Patch now becomes something like:
> 
> #ifdef __GNUC_GNU_INLINE__
> #define __gnu_inline __attribute__((gnu_inline))
> #else
> #define __gnu_inline
> #endif
> 
> #define inline inline __attribute__((always_inline, unused)) notrace
> __gnu_inline
> ...
> 
> Issues with that approach?
> 

s/__GNUC_GNU_INLINE__/__GNUC_STDC_INLINE__/

-hpa

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/3] Use sbitmap instead of percpu_ida

2018-06-12 Thread Jens Axboe
On 6/12/18 1:05 PM, Matthew Wilcox wrote:
> Removing the percpu_ida code nets over 400 lines of removal.  It's not
> as spectacular as deleting an entire architecture, but it's still a
> worthy reduction in lines of code.
> 
> Untested due to lack of hardware and not understanding how to set up a
> target platform.
> 
> Changes from v1:
>  - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
>  - Abstracted out tag freeing as requested by Bart
>  - Made iscsit_wait_for_tag static as pointed out by 0day

Reviewed-by: Jens Axboe 

-- 
Jens Axboe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/3] Use sbitmap instead of percpu_ida

2018-06-12 Thread Bart Van Assche
On Tue, 2018-06-12 at 12:05 -0700, Matthew Wilcox wrote:
> Removing the percpu_ida code nets over 400 lines of removal.  It's not
> as spectacular as deleting an entire architecture, but it's still a
> worthy reduction in lines of code.
> 
> Untested due to lack of hardware and not understanding how to set up a
> target platform.
> 
> Changes from v1:
>  - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
>  - Abstracted out tag freeing as requested by Bart
>  - Made iscsit_wait_for_tag static as pointed out by 0day

For the whole series:

Reviewed-by: Bart Van Assche 

Thanks,

Bart.




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/3] Remove percpu_ida

2018-06-12 Thread Matthew Wilcox
With its one user gone, remove the library code.

Signed-off-by: Matthew Wilcox 
---
 include/linux/percpu_ida.h |  83 -
 lib/Makefile   |   2 +-
 lib/percpu_ida.c   | 370 -
 3 files changed, 1 insertion(+), 454 deletions(-)
 delete mode 100644 include/linux/percpu_ida.h
 delete mode 100644 lib/percpu_ida.c

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
deleted file mode 100644
index 07d78e4653bc..
--- a/include/linux/percpu_ida.h
+++ /dev/null
@@ -1,83 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERCPU_IDA_H__
-#define __PERCPU_IDA_H__
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct percpu_ida_cpu;
-
-struct percpu_ida {
-   /*
-* number of tags available to be allocated, as passed to
-* percpu_ida_init()
-*/
-   unsignednr_tags;
-   unsignedpercpu_max_size;
-   unsignedpercpu_batch_size;
-
-   struct percpu_ida_cpu __percpu  *tag_cpu;
-
-   /*
-* Bitmap of cpus that (may) have tags on their percpu freelists:
-* steal_tags() uses this to decide when to steal tags, and which cpus
-* to try stealing from.
-*
-* It's ok for a freelist to be empty when its bit is set - steal_tags()
-* will just keep looking - but the bitmap _must_ be set whenever a
-* percpu freelist does have tags.
-*/
-   cpumask_t   cpus_have_tags;
-
-   struct {
-   spinlock_t  lock;
-   /*
-* When we go to steal tags from another cpu (see steal_tags()),
-* we want to pick a cpu at random. Cycling through them every
-* time we steal is a bit easier and more or less equivalent:
-*/
-   unsignedcpu_last_stolen;
-
-   /* For sleeping on allocation failure */
-   wait_queue_head_t   wait;
-
-   /*
-* Global freelist - it's a stack where nr_free points to the
-* top
-*/
-   unsignednr_free;
-   unsigned*freelist;
-   } cacheline_aligned_in_smp;
-};
-
-/*
- * Number of tags we move between the percpu freelist and the global freelist 
at
- * a time
- */
-#define IDA_DEFAULT_PCPU_BATCH_MOVE32U
-/* Max size of percpu freelist, */
-#define IDA_DEFAULT_PCPU_SIZE  ((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2)
-
-int percpu_ida_alloc(struct percpu_ida *pool, int state);
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
-
-void percpu_ida_destroy(struct percpu_ida *pool);
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-   unsigned long max_size, unsigned long batch_size);
-static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long 
nr_tags)
-{
-   return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
-   IDA_DEFAULT_PCPU_BATCH_MOVE);
-}
-
-typedef int (*percpu_ida_cb)(unsigned, void *);
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
-   void *data);
-
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
-#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 84c6dcb31fbb..f4722a7fa62c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o 
random32.o \
 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
-percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+percpu-refcount.o rhashtable.o reciprocal_div.o \
 once.o refcount.o usercopy.o errseq.o bucket_locks.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
deleted file mode 100644
index 9bbd9c5d375a..
--- a/lib/percpu_ida.c
+++ /dev/null
@@ -1,370 +0,0 @@
-/*
- * Percpu IDA library
- *
- * Copyright (C) 2013 Datera, Inc. Kent Overstreet
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct percpu_ida_cpu {
-

[PATCH 1/3] target: Abstract tag freeing

2018-06-12 Thread Matthew Wilcox
Introduce target_free_tag() and convert all drivers to use it.

Signed-off-by: Matthew Wilcox 
---
 drivers/scsi/qla2xxx/qla_target.c| 4 ++--
 drivers/target/iscsi/iscsi_target_util.c | 2 +-
 drivers/target/sbp/sbp_target.c  | 2 +-
 drivers/target/tcm_fc/tfc_cmd.c  | 4 ++--
 drivers/usb/gadget/function/f_tcm.c  | 2 +-
 drivers/vhost/scsi.c | 2 +-
 drivers/xen/xen-scsiback.c   | 4 +---
 include/target/target_core_base.h| 5 +
 8 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c 
b/drivers/scsi/qla2xxx/qla_target.c
index b85c833099ff..05290966e630 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -3783,7 +3783,7 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
return;
}
cmd->jiffies_at_free = get_jiffies_64();
-   percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   target_free_tag(sess->se_sess, >se_cmd);
 }
 EXPORT_SYMBOL(qlt_free_cmd);
 
@@ -4146,7 +4146,7 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
qlt_send_term_exchange(qpair, NULL, >atio, 1, 0);
 
qlt_decr_num_pend_cmds(vha);
-   percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   target_free_tag(sess->se_sess, >se_cmd);
spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
 
spin_lock_irqsave(>tgt.sess_lock, flags);
diff --git a/drivers/target/iscsi/iscsi_target_util.c 
b/drivers/target/iscsi/iscsi_target_util.c
index 4435bf374d2d..7e98697cfb8e 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -711,7 +711,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);
 
-   percpu_ida_free(>se_sess->sess_tag_pool, se_cmd->map_tag);
+   target_free_tag(sess->se_sess, se_cmd);
 }
 EXPORT_SYMBOL(iscsit_release_cmd);
 
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index fb1003921d85..679ae29d25ab 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -1460,7 +1460,7 @@ static void sbp_free_request(struct sbp_target_request 
*req)
kfree(req->pg_tbl);
kfree(req->cmd_buf);
 
-   percpu_ida_free(_sess->sess_tag_pool, se_cmd->map_tag);
+   target_free_tag(se_sess, se_cmd);
 }
 
 static void sbp_mgt_agent_process(struct work_struct *work)
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index ec372860106f..13e4efbe1ce7 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -92,7 +92,7 @@ static void ft_free_cmd(struct ft_cmd *cmd)
if (fr_seq(fp))
fc_seq_release(fr_seq(fp));
fc_frame_free(fp);
-   percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   target_free_tag(sess->se_sess, >se_cmd);
ft_sess_put(sess);  /* undo get from lookup at recv */
 }
 
@@ -461,7 +461,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct 
fc_frame *fp)
cmd->sess = sess;
cmd->seq = fc_seq_assign(lport, fp);
if (!cmd->seq) {
-   percpu_ida_free(_sess->sess_tag_pool, tag);
+   target_free_tag(se_sess, >se_cmd);
goto busy;
}
cmd->req_frame = fp;/* hold frame during cmd */
diff --git a/drivers/usb/gadget/function/f_tcm.c 
b/drivers/usb/gadget/function/f_tcm.c
index d78dbb73bde8..9f670d9224b9 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1288,7 +1288,7 @@ static void usbg_release_cmd(struct se_cmd *se_cmd)
struct se_session *se_sess = se_cmd->se_sess;
 
kfree(cmd->data_buf);
-   percpu_ida_free(_sess->sess_tag_pool, se_cmd->map_tag);
+   target_free_tag(se_sess, se_cmd);
 }
 
 static u32 usbg_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad57094d736..70d35e696533 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -324,7 +324,7 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
}
 
vhost_scsi_put_inflight(tv_cmd->inflight);
-   percpu_ida_free(_sess->sess_tag_pool, se_cmd->map_tag);
+   target_free_tag(se_sess, se_cmd);
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 7bc88fd43cfc..ec6635258ed8 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -1377,9 +1377,7 @@ static int scsiback_check_stop_free(struct se_cmd *se_cmd)
 
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
-   struct se_session *se_sess = se_cmd->se_sess;
-
-   percpu_ida_free(_sess->sess_tag_pool, se_cmd->map_tag);
+   target_free_tag(se_cmd->se_sess, se_cmd);
 }
 
 static u32 scsiback_sess_get_index(struct se_session 

Re: [PATCH v4 1/3] compiler-gcc.h: add gnu_inline to all inline declarations

2018-06-12 Thread H. Peter Anvin
,Greg KH 
,jarkko.sakki...@linux.intel.com,jgr...@suse.com,Josh
 Poimboeuf ,Matthias Kaehlcke 
,thomas.lenda...@amd.com,Thiebaud Weksteen 
,mj...@google.com,j...@perches.com
From: h...@zytor.com
Message-ID: <191e4ebe-4cb2-4c8b-ab61-689a91ffe...@zytor.com>

On June 12, 2018 11:33:14 AM PDT, Nick Desaulniers  
wrote:
>On Fri, Jun 8, 2018 at 3:04 AM Sedat Dilek 
>wrote:
>>
>> On Fri, Jun 8, 2018 at 9:59 AM, Arnd Bergmann  wrote:
>> > On Thu, Jun 7, 2018 at 10:49 PM, Nick Desaulniers
>> >  wrote:
>> >> Functions marked extern inline do not emit an externally visible
>> >> function when the gnu89 C standard is used. Some KBUILD Makefiles
>> >> overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as
>without
>> >> an explicit C standard specified, the default is gnu11. Since c99,
>the
>> >> semantics of extern inline have changed such that an externally
>visible
>> >> function is always emitted. This can lead to multiple definition
>errors
>> >> of extern inline functions at link time of compilation units whose
>build
>> >> files have removed an explicit C standard compiler flag for users
>of GCC
>> >> 5.1+ or Clang.
>> >>
>> >> Signed-off-by: Nick Desaulniers 
>> >> Suggested-by: H. Peter Anvin 
>> >> Suggested-by: Joe Perches 
>> >
>> > I suspect this will break Geert's gcc-4.1.2, which I think doesn't
>have that
>> > attribute yet (4.1.3 or higher have it according to the
>documentation.
>> >
>> > It wouldn't be hard to work around that if we want to keep that
>version
>> > working, or we could decide that it's time to officially stop
>supporting
>> > that version, but we should probably decide on one or the other.
>
>Heh, so earlier we decided against compiler flags (-std=gnu89 or
>-fgnu89-inline) in preference to function attributes.  The function
>attribute is preferable as some of the Makefiles [accidentally?]
>overwrite KBUILD_CFLAGS, which is problematic for gcc 5.1 users as the
>implicit c standard used was changed to gnu11 from gnu89.  What's nice
>is that to support gcc 4.1 users, we simply don't need to add any
>attribute, as their implicit c standard is gnu89 which has the
>semantics for extern inline that we want.  I have a simple change to
>this patch that can support users of various gcc versions, see below:
>
>> Good point.
>> What is the minimum requirement of GCC version currently?
>> AFAICS x86/asm-goto support requires GCC >= 4.5?
>
>Yes, but that's only for x86, IIUC.  It seems the kernel may have
>different minimum required versions of GCC based on arch then?  That
>may be ok, but I'm not sure that's easy to keep track of without
>having it explicitly stated somewhere like the docs perhaps?
>
>> Just FYI...
>> ...saw the last days in upstream commits that kbuild/kconfig for
>> 4.18-rc1 offers possibilities to check for cc-version dependencies.
>
>Those will be helpful.  If we want to pursue compiler flags, which get
>set some Makefiles, then yes.  But I think a simpler change to my
>patch would be as below.
>
>It seems gcc did not get __has_attribute [0] until 5.1, but will
>define __GNUC_GNU_INLINE__ if supported. [1]  Unfortunately, Clang
>does not define __GNUC_GNU_INLINE__ [2].  So a proper feature test
>might look like:
>
>```
>#ifndef __has_attribute
>#define __has_attribute(x) 0
>#endif
>
>#if defined(__GNUC_GNU_INLINE__) || __has_attribute(gnu_inline)
>#define __gnu_inline __attribute__(gnu_inline)
>#endif
>
>#define inline inline __attribute__((always_inline, unused)) notrace
>__gnu_inline
>```
>
>Thoughts on this approach? I can send a v5 tomorrow if there's no
>major issues.  Feedback appreciated, as always.
>
>[0] https://clang.llvm.org/docs/LanguageExtensions.html#has-attribute
>[1]
>https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
>[2] https://bugs.llvm.org/show_bug.cgi?id=37784

Please fix clang. It isn't all that hard to fix.

However, __GCC_GNU_INLINE__ means you are in GNU mode by default, on gcc's new 
enough to have multiple misses.

The right thing to look for is __GCC_STDC_INLINE__ in which case you need the 
attribute.

By the way, you should check clang against gcc's predefined macros by doing:

gcc [options] -x c -Wp,-dM -E /dev/null | sort

Options can change the predefined macros substantially, especially the, -std=, 
arch and -O options. -x c can be replaced with e.g. -x c++, objective-c, 
assembler-with-cpp etc.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] Convert target drivers to use sbitmap

2018-06-12 Thread Matthew Wilcox
On Tue, Jun 12, 2018 at 04:32:03PM +, Bart Van Assche wrote:
> On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 12, 2018 at 03:22:42PM +, Bart Van Assche wrote:
> > > Please introduce functions in the target core for allocating and freeing 
> > > a tag
> > > instead of spreading the knowledge of how to allocate and free tags over 
> > > all
> > > target drivers.
> > 
> > I can't without doing an unreasonably large amount of work on drivers that
> > I have no way to test.  Some of the drivers have the se_cmd already; some
> > of them don't.  I'd be happy to introduce a common function for freeing
> > a tag.
> 
> Which target drivers are you referring to? If you are referring to the sbp 
> driver:
> I think that driver is dead and can be removed from the kernel tree. I even 
> don't
> know whether that driver ever has had any users other than the developer of 
> that
> driver.

For example tcm_fc:

tag = sbitmap_queue_get(_sess->sess_tag_pool, );
if (tag < 0)
goto busy;

cmd = &((struct ft_cmd *)se_sess->sess_cmd_map)[tag];

or qla2xxx:

tag = sbitmap_queue_get(_sess->sess_tag_pool, );
if (tag < 0)
return NULL;

cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];

The core doesn't know at what offset from the pointer to store the tag
& cpu.  Only the individual drivers know their cmd layout.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] Convert target drivers to use sbitmap

2018-06-12 Thread Bart Van Assche
On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> On Tue, Jun 12, 2018 at 03:22:42PM +, Bart Van Assche wrote:
> > On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > > diff --git a/drivers/scsi/qla2xxx/qla_target.c 
> > > b/drivers/scsi/qla2xxx/qla_target.c
> > > index 025dc2d3f3de..cdf671c2af61 100644
> > > --- a/drivers/scsi/qla2xxx/qla_target.c
> > > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > >   return;
> > >   }
> > >   cmd->jiffies_at_free = get_jiffies_64();
> > > - percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > > + sbitmap_queue_clear(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > > + cmd->se_cmd.map_cpu);
> > >  }
> > >  EXPORT_SYMBOL(qlt_free_cmd);
> > 
> > Please introduce functions in the target core for allocating and freeing a 
> > tag
> > instead of spreading the knowledge of how to allocate and free tags over all
> > target drivers.
> 
> I can't without doing an unreasonably large amount of work on drivers that
> I have no way to test.  Some of the drivers have the se_cmd already; some
> of them don't.  I'd be happy to introduce a common function for freeing
> a tag.

Which target drivers are you referring to? If you are referring to the sbp 
driver:
I think that driver is dead and can be removed from the kernel tree. I even 
don't
know whether that driver ever has had any users other than the developer of that
driver.

> > This looks weird to me. Shouldn't target code ignore signals instead of 
> > causing
> > tag allocation to fail if a signal is received?
> 
> It's what the current code did:
> 
> -   if (signal_pending_state(state, current)) {
> -   tag = -ERESTARTSYS;
> -   break;
> -   }
> 
> and the current callers literally indicate that they want signals:
> 
> drivers/infiniband/ulp/isert/ib_isert.c:cmd = 
> iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
> drivers/target/iscsi/iscsi_target.c:cmd = iscsit_allocate_cmd(conn, 
> TASK_INTERRUPTIBLE);

Right, the iSCSI target driver uses signals to wake up threads (see also the
send_sig() calls in the iSCSI target code).

Bart.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] Convert target drivers to use sbitmap

2018-06-12 Thread Matthew Wilcox
On Tue, Jun 12, 2018 at 03:22:42PM +, Bart Van Assche wrote:
> On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > diff --git a/drivers/scsi/qla2xxx/qla_target.c 
> > b/drivers/scsi/qla2xxx/qla_target.c
> > index 025dc2d3f3de..cdf671c2af61 100644
> > --- a/drivers/scsi/qla2xxx/qla_target.c
> > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > return;
> > }
> > cmd->jiffies_at_free = get_jiffies_64();
> > -   percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > +   sbitmap_queue_clear(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > +   cmd->se_cmd.map_cpu);
> >  }
> >  EXPORT_SYMBOL(qlt_free_cmd);
> 
> Please introduce functions in the target core for allocating and freeing a tag
> instead of spreading the knowledge of how to allocate and free tags over all
> target drivers.

I can't without doing an unreasonably large amount of work on drivers that
I have no way to test.  Some of the drivers have the se_cmd already; some
of them don't.  I'd be happy to introduce a common function for freeing
a tag.

> > +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> > +{
> > +   int tag = -1;
> > +   DEFINE_WAIT(wait);
> > +   struct sbq_wait_state *ws;
> > +
> > +   if (state == TASK_RUNNING)
> > +   return tag;
> > +
> > +   ws = _sess->sess_tag_pool.ws[0];
> > +   for (;;) {
> > +   prepare_to_wait_exclusive(>wait, , state);
> > +   if (signal_pending_state(state, current))
> > +   break;
> 
> This looks weird to me. Shouldn't target code ignore signals instead of 
> causing
> tag allocation to fail if a signal is received?

It's what the current code did:

-   if (signal_pending_state(state, current)) {
-   tag = -ERESTARTSYS;
-   break;
-   }

and the current callers literally indicate that they want signals:

drivers/infiniband/ulp/isert/ib_isert.c:cmd = iscsit_allocate_cmd(conn, 
TASK_INTERRUPTIBLE);
drivers/target/iscsi/iscsi_target.c:cmd = iscsit_allocate_cmd(conn, 
TASK_INTERRUPTIBLE);

(etc)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support

2018-06-12 Thread Stefan Hajnoczi
On Mon, Jun 11, 2018 at 03:37:00AM +, Liu, Changpeng wrote:
> 
> 
> > -Original Message-
> > From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > Sent: Friday, June 8, 2018 6:20 PM
> > To: Liu, Changpeng 
> > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com;
> > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W
> > 
> > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > support
> > 
> > On Thu, Jun 07, 2018 at 11:07:06PM +, Liu, Changpeng wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > > > Sent: Thursday, June 7, 2018 9:10 PM
> > > > To: Liu, Changpeng 
> > > > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com;
> > > > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W
> > > > 
> > > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES 
> > > > commands
> > > > support
> > > >
> > > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:
> > > > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES 
> > > > > commands
> > > > > support, this will impact the performance when using SSD backend over
> > > > > file systems.
> > > > >
> > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to
> > > > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended
> > > > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
> > > > > commands support.
> > > > >
> > > > > While here, using 16 bytes descriptor to describe one segment of 
> > > > > DISCARD
> > > > > or WRITE ZEROES commands, each command may contain one or more
> > > > decriptors.
> > > > >
> > > > > The following data structure shows the definition of one descriptor:
> > > > >
> > > > > struct virtio_blk_discard_write_zeroes {
> > > > > le64 sector;
> > > > > le32 num_sectors;
> > > > > le32 unmap;
> > > > > };
> > > > >
> > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
> > > > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE
> > > > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
> > > > > maybe used for WRITE ZEROES command with DISCARD enabled.
> > > > >
> > > > > We also extended the virtio-blk configuration space to let backend
> > > > > device put DISCARD and WRITE ZEROES configuration parameters.
> > > > >
> > > > > struct virtio_blk_config {
> > > > > [...]
> > > > >
> > > > > le32 max_discard_sectors;
> > > > > le32 max_discard_seg;
> > > > > le32 discard_sector_alignment;
> > > > > le32 max_write_zeroes_sectors;
> > > > > le32 max_write_zeroes_seg;
> > > > > u8 write_zeroes_may_unmap;
> > > > > }
> > > > >
> > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support 
> > > > > discard
> > > > > command, maximum discard sectors size in field 'max_discard_sectors' 
> > > > > and
> > > > > maximum discard segment number in field 'max_discard_seg'.
> > > > >
> > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
> > > > > zeroes command, maximum write zeroes sectors size in field
> > > > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in
> > > > > field 'max_write_zeroes_seg'.
> > > > >
> > > > > The parameters in the configuration space of the device field
> > > > > 'max_discard_sectors' and field 'discard_sector_alignment' are 
> > > > > expressed in
> > > > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. 
> > > > > The
> > > > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
> > > > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
> > > > >
> > > > > Signed-off-by: Changpeng Liu 
> > > > > ---
> > > > > CHANGELOG:
> > > > > v6: don't set T_OUT bit to discard and write zeroes commands.
> > > >
> > > > I don't see this in the patch...
> > > Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT
> > again.
> > > >
> > > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >   int qid = hctx->queue_num;
> > > > >   int err;
> > > > >   bool notify = false;
> > > > > + bool unmap = false;
> > > > >   u32 type;
> > > > >
> > > > >   BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> > > > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >   case REQ_OP_FLUSH:
> > > > >   type = VIRTIO_BLK_T_FLUSH;
> > > > >   break;
> > > > > + case REQ_OP_DISCARD:
> > > > > + type = VIRTIO_BLK_T_DISCARD;
> > > > > + break;
> > > > > + case REQ_OP_WRITE_ZEROES:
> > > > > + type = VIRTIO_BLK_T_WRITE_ZEROES;
> > > > > + unmap = !(req->cmd_flags & REQ_NOUNMAP);
> > > > > + break;
> > > > >   case REQ_OP_SCSI_IN:
> > > > >   

Re: [PATCH 1/2] Convert target drivers to use sbitmap

2018-06-12 Thread Bart Van Assche
On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> diff --git a/drivers/scsi/qla2xxx/qla_target.c 
> b/drivers/scsi/qla2xxx/qla_target.c
> index 025dc2d3f3de..cdf671c2af61 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
>   return;
>   }
>   cmd->jiffies_at_free = get_jiffies_64();
> - percpu_ida_free(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> + sbitmap_queue_clear(>se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> + cmd->se_cmd.map_cpu);
>  }
>  EXPORT_SYMBOL(qlt_free_cmd);

Please introduce functions in the target core for allocating and freeing a tag
instead of spreading the knowledge of how to allocate and free tags over all
target drivers.
 
> +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> +{
> + int tag = -1;
> + DEFINE_WAIT(wait);
> + struct sbq_wait_state *ws;
> +
> + if (state == TASK_RUNNING)
> + return tag;
> +
> + ws = _sess->sess_tag_pool.ws[0];
> + for (;;) {
> + prepare_to_wait_exclusive(>wait, , state);
> + if (signal_pending_state(state, current))
> + break;

This looks weird to me. Shouldn't target code ignore signals instead of causing
tag allocation to fail if a signal is received?

> + schedule();
> + tag = sbitmap_queue_get(_sess->sess_tag_pool, cpup);
> + }
> +
> + finish_wait(>wait, );
> + return tag;
> +}

Thanks,

Bart.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Michael S. Tsirkin
On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
> 
> 
> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
> > I don't think this is sufficient.
> > 
> > If both primary and standby devices are present, a legacy guest without
> > support for the feature might see two devices with same mac and get
> > confused.
> > 
> > I think that we should only make primary visible after guest acked the
> > backup feature bit.
> 
> I think we want exactly the reverse? E.g fail the negotiation when guest
> does not ack backup feature.
> 
> Otherwise legacy guest won't even have the chance to see primary device in
> the guest.

That's by design.

> > 
> > And on reset or when backup is cleared in some other way, unplug the
> > primary.
> 
> What if guest does not support hotplug?

It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?

> > 
> > Something like the below will do the job:
> > 
> > Primary device is added with a special "primary-failover" flag.
> > A virtual machine is then initialized with just a standby virtio
> > device. Primary is not yet added.
> 
> A question is how to do the matching? Qemu knows nothing about e.g mac
> address of a pass-through device I believe?

Supply a flag to the VFIO when it's added, this way QEMU will know.

> > 
> > Later QEMU detects that guest driver device set DRIVER_OK.
> > It then exposes the primary device to the guest, and triggers
> > a device addition event (hot-plug event) for it.
> 
> Do you mean we won't have primary device in the initial qemu cli?

No, that's not what I mean.

I mean management will supply a flag to VFIO and then


- VFIO defers exposing
primary to guest until guest acks the feature bit.
- When we see guest ack, initiate hotplug.
- On reboot, hide it again.
- On reset without reboot, request hot-unplug and on eject hide it again.


> > 
> > If QEMU detects guest driver removal, it initiates a hot-unplug sequence
> > to remove the primary driver.  In particular, if QEMU detects guest
> > re-initialization (e.g. by detecting guest reset) it immediately removes
> > the primary device.
> 
> I believe guest failover module should handle this gracefully?

It can't control enumeration order, if primary is enumerated before
standby then guest will load its driver and it's too late
when failover driver is loaded.

> Thanks
> 
> > 
> > We can move some of this code to management as well, architecturally it
> > does not make too much sense but it might be easier implementation-wise.
> > 
> > HTH
> > 
> > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
> > > Ping on this patch now that the kernel patches are accepted into davem's 
> > > net-next tree.
> > > https://patchwork.ozlabs.org/cover/920005/
> > > 
> > > 
> > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> > > > This feature bit can be used by hypervisor to indicate virtio_net 
> > > > device to
> > > > act as a standby for another device with the same MAC address.
> > > > 
> > > > I tested this with a small change to the patch to mark the STANDBY 
> > > > feature 'true'
> > > > by default as i am using libvirt to start the VMs.
> > > > Is there a way to pass the newly added feature bit 'standby' to qemu 
> > > > via libvirt
> > > > XML file?
> > > > 
> > > > Signed-off-by: Sridhar Samudrala 
> > > > ---
> > > >hw/net/virtio-net.c | 2 ++
> > > >include/standard-headers/linux/virtio_net.h | 3 +++
> > > >2 files changed, 5 insertions(+)
> > > > 
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index 90502fca7c..38b3140670 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > > true),
> > > >DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
> > > > SPEED_UNKNOWN),
> > > >DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > > +DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
> > > > VIRTIO_NET_F_STANDBY,
> > > > +  false),
> > > >DEFINE_PROP_END_OF_LIST(),
> > > >};
> > > > diff --git a/include/standard-headers/linux/virtio_net.h 
> > > > b/include/standard-headers/linux/virtio_net.h
> > > > index e9f255ea3f..01ec09684c 100644
> > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > @@ -57,6 +57,9 @@
> > > >  * Steering */
> > > >#define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
> > > > +#define VIRTIO_NET_F_STANDBY  62/* Act as standby for another 
> > > > device
> > > > + * with the same MAC.
> > > > + */
> > > >#define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and 
> > > > 

Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Michael S. Tsirkin
On Tue, Jun 05, 2018 at 03:09:26PM -0700, Siwei Liu wrote:
> The thing is cloud service provider might prefer sticking to the same
> level of service agreement (SLA) of keeping VF over migration,

That requirement is trivially satisfied by just a single VF :) If your
SLA does not require live migration, you should do just that.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-12 Thread Michael S. Tsirkin
On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:
> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
> > On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
> > > 
> > > On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
> > > > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
> > > > > This feature bit can be used by hypervisor to indicate virtio_net 
> > > > > device to
> > > > > act as a standby for another device with the same MAC address.
> > > > > 
> > > > > I tested this with a small change to the patch to mark the STANDBY 
> > > > > feature 'true'
> > > > > by default as i am using libvirt to start the VMs.
> > > > > Is there a way to pass the newly added feature bit 'standby' to qemu 
> > > > > via libvirt
> > > > > XML file?
> > > > > 
> > > > > Signed-off-by: Sridhar Samudrala 
> > > > So I do not think we can commit to this interface: we
> > > > really need to control visibility of the primary device.
> > > The problem is legacy guest won't use primary device at all if we do this.
> > And that's by design - I think it's the only way to ensure the
> > legacy guest isn't confused.
> 
> Yes. I think so. But i am not sure if Qemu is the right place to control the 
> visibility
> of the primary device. The primary device may not be specified as an argument 
> to Qemu. It
> may be plugged in later.
> The cloud service provider is providing a feature that enables low latency 
> datapath and live
> migration capability.
> A tenant can use this feature only if he is running a VM that has virtio-net 
> with failover support.

Well live migration is there already. The new feature is low latency
data path.

And it's the guest that needs failover support not the VM.


> I think Qemu should check if guest virtio-net supports this feature and 
> provide a mechanism for
> an upper layer indicating if the STANDBY feature is successfully negotiated 
> or not.
> The upper layer can then decide if it should hot plug a VF with the same MAC 
> and manage the 2 links.
> If VF is successfully hot plugged, virtio-net link should be disabled.

Did you even talk to upper layer management about it?
Just list the steps they need to do and you will see
that's a lot of machinery to manage by the upper layer.

What do we gain in flexibility? As far as I can see the
only gain is some resources saved for legacy VMs.

That's not a lot as tenant of the upper layer probably already has
at least a hunch that it's a new guest otherwise
why bother specifying the feature at all - you
save even more resources without it.




> 
> > 
> > > How about control the visibility of standby device?
> > > 
> > > Thanks
> > standy the always there to guarantee no downtime.
> > 
> > > > However just for testing purposes, we could add a non-stable
> > > > interface "x-standby" with the understanding that as any
> > > > x- prefix it's unstable and will be changed down the road,
> > > > likely in the next release.
> > > > 
> > > > 
> > > > > ---
> > > > >hw/net/virtio-net.c | 2 ++
> > > > >include/standard-headers/linux/virtio_net.h | 3 +++
> > > > >2 files changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > > index 90502fca7c..38b3140670 100644
> > > > > --- a/hw/net/virtio-net.c
> > > > > +++ b/hw/net/virtio-net.c
> > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > > > true),
> > > > >DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
> > > > > SPEED_UNKNOWN),
> > > > >DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > > > +DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
> > > > > VIRTIO_NET_F_STANDBY,
> > > > > +  false),
> > > > >DEFINE_PROP_END_OF_LIST(),
> > > > >};
> > > > > diff --git a/include/standard-headers/linux/virtio_net.h 
> > > > > b/include/standard-headers/linux/virtio_net.h
> > > > > index e9f255ea3f..01ec09684c 100644
> > > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > > @@ -57,6 +57,9 @@
> > > > >* Steering */
> > > > >#define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> > > > > +#define VIRTIO_NET_F_STANDBY  62/* Act as standby for 
> > > > > another device
> > > > > + * with the same MAC.
> > > > > + */
> > > > >#define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed 
> > > > > and duplex */
> > > > >#ifndef VIRTIO_NET_NO_LEGACY
> > > > > -- 
> > > > > 2.14.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PULL] vhost: cleanups and fixes

2018-06-12 Thread Wei Wang

On 06/12/2018 09:59 AM, Linus Torvalds wrote:

On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin  wrote:

Maybe it will help to have GFP_NONE which will make any allocation
fail if attempted. Linus, would this address your comment?

It would definitely have helped me initially overlook that call chain.

But then when I started looking at the whole dma_map_page() thing, it
just raised my hackles again.

I would seriously suggest having a much simpler version for the "no
allocation, no dma mapping" case, so that it's *obvious* that that
never happens.

So instead of having virtio_balloon_send_free_pages() call a really
generic complex chain of functions that in _some_ cases can do memory
allocation, why isn't there a short-circuited "vitruque_add_datum()"
that is guaranteed to never do anything like that?

Honestly, I look at "add_one_sg()" and it really doesn't make me
happy. It looks hacky as hell. If I read the code right, you're really
trying to just queue up a simple tuple of , except you encode
it as a page pointer in order to play games with the SG logic, and
then you hmap that to the ring, except in this case it's all a fake
ring that just adds the cpu-physical address instead.

And to figuer that out, it's like five layers of indirection through
different helper functions that *can* do more generic things but in
this case don't.

And you do all of this from a core VM callback function with some
_really_ core VM locks held.

That makes no sense to me.

How about this:

  - get rid of all that code

  - make the core VM callback save the "these are the free memory
regions" in a fixed and limited array. One that DOES JUST THAT. No
crazy "SG IO dma-mapping function crap". Just a plain array of a fixed
size, pre-allocated for that virtio instance.

  - make it obvious that what you do in that sequence is ten
instructions and no allocations ("Look ma, I wrote a value to an array
and incremented the array idex, and I'M DONE")

  - then in that workqueue entry that you start *anyway*, you empty the
array and do all the crazy virtio stuff.

In fact, while at it, just simplify the VM interface too. Instead of
traversing a random number of buddy lists, just trraverse *one* - the
top-level one. Are you seriously ever going to shrink or mark
read-only anythin *but* something big enough to be in the maximum
order?

MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really*
want the balloon code to work on smaller things, particularly since
the whole interface is fundamentally racy and opportunistic to begin
with?


OK, I will implement a new version based on the suggestions. Thanks.

Best,
Wei

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization