Re: [RFC PATCH 01/17] virtio_ring: Avoid reading unneeded used length

2021-05-17 Thread Johannes Berg
On Mon, 2021-05-17 at 17:08 +0800, Xie Yongji wrote:
> If device driver doesn't need used length, it can pass a NULL
> len in virtqueue_get_buf()/virtqueue_get_buf_ctx().
> 

Well, actually, it can't right now?

You should probably rephrase this, saying something like

   Allow passing NULL to len in ... if the device driver doesn't need
   the length, and don't read it in that case.

or so?

>  Then
> we can avoid reading and validating the len value in used
> ring entries.

Not sure what that "validating" is about, I only see reading?

johannes

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: Work around frames incorrectly marked as gso

2019-12-09 Thread Johannes Berg


>   else if (sinfo->gso_type & SKB_GSO_TCPV6)
>   hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> - else
> - return -EINVAL;
> + else {
> + if (skb->data_len == 0)
> + hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;


maybe use "else if" like in the before? yes, it's a different type of
condition, but braces look a bit unnatural here to me at least

johannes


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] custom virt-io support (in user-mode-linux)

2019-05-24 Thread Johannes Berg
On Thu, 2019-05-23 at 15:41 +0100, Stefan Hajnoczi wrote:

> > Also, not sure I understand how the client is started?
> 
> The vhost-user device backend can be launched before QEMU.  QEMU is
> started with the UNIX domain socket path so it can connect.

Hmm. I guess I'm confusing the terminology then - I thought qemu was the
server and the backend was the client that connects to it. If it's the
other way around, yeah, that makes things easier and certainly makes
sense (you could have a daemon that implements something).

> QEMU itself doesn't fork+exec the vhost-user device backend.  It's
> expected that the user or the management stack has already launched
> the vhost-user device backend.

Right.

> > Do you know if there's a sample client/server somewhere?
> 
> See contrib/libvhost-user in the QEMU source tree as well as the
> vhost-user-blk and vhost-user-scsi examples in the contrib/ directory.

Awesome, thanks!

johannes

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] custom virt-io support (in user-mode-linux)

2019-05-23 Thread Johannes Berg
Hi Stefan,

> Check out vhost-user.  It's a protocol for running a subset of a VIRTIO
> device's emulation in a separate process (usually just the data plane
> with the PCI emulation and other configuration/setup still handled by
> QEMU).

Yes, I think that's basically what I'm looking for.

> vhost-user uses a UNIX domain socket to pass file descriptors to shared
> memory regions.  This way the vhost-user device backend process has
> access to guest RAM.
> 
> This would be quite different for UML since my understanding is you
> don't have guest RAM but actual host Linux processes, but vhost-user
> might still give you ideas:
> https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/interop/vhost-user.rst;hb=HEAD

I guess it could still be implemented. Do you know how qemu actually
creates the shared memory region though? It's normal inside kernel
memory, no?

Ah, no, I see ... you have to give -mem-path and then the entire guest
memory isn't allocated as anonymous memory but from a file, and then you
can pass a descriptor to that file and effectively the client/slave of
vhost-user can access the whole guest's memory. Interesting. Next you're
going to want an IOMMU there, not just fake one, to protect against
hostile virt-user client? Not that I care :-)

UML in fact already maps all of its memory as a file (see arch/um/
create_mem_file()), so this part is easy.

What confused me at first is how all this talks about the ioctl()
interface, but I think I understand now - it's basically replacing
ioctl() with talking to a client.

So ultimately, it would actually seem "pretty simple".

Not sure I understand why there's all this stuff about multiple FDs,
once you have access to the guest's memory, why do you still need a
second (or more) FDs?

Also, not sure I understand how the client is started?

Once we have a connection, I guess as a client I'd at the very least
have to handle
 * VHOST_USER_GET_FEATURES and reply with the features, obviously, which
   is in this case just VHOST_USER_F_PROTOCOL_FEATURES?

 * VHOST_USER_SET_FEATURES - not sure, what would that do? the master
   sends VHOST_USER_GET_PROTOCOL_FEATURES which is with this feature
   bit? Especially since it says: "Slave that reported
   VHOST_USER_F_PROTOCOL_FEATURES must support this message even before
   VHOST_USER_SET_FEATURES was called."

 * VHOST_USER_GET_PROTOCOL_FEATURES - looking at the list, most I don't
   really need here, but OK

 * VHOST_USER_SET_OWNER - ??

 * VHOST_USER_RESET_OWNER - ignore

 * VHOST_USER_SET_MEM_TABLE - store the data/FDs for later use, I guess

 * VHOST_USER_SET_VRING_NUM - store the data for later use
 * VHOST_USER_SET_VRING_ADDR - dito
 * VHOST_USER_SET_VRING_BASE - dito
 * VHOST_USER_SET_VRING_KICK - start epoll on the FD (assuming there is
   one, give up if not?) - well, if ring is
   enabled?
 * VHOST_USER_SET_VRING_CALL - ...

I guess there might be better documentation on the ioctl interfaces?


Do you know if there's a sample client/server somewhere?

I guess we should implement the server in UML like it is in QEMU (unless
we can figure out how to virtualize the time with HPET or something in
QEMU) and then have our client and kernel driver for it...


Thanks a lot!

johannes

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: custom virt-io support (in user-mode-linux)

2019-05-22 Thread Johannes Berg
Hi Anton,

> > I'm thinking about adding virt-io support to UML, but the tricky part is
> > that while I want to use the virt-io basics (because it's a nice
> > interface from the 'inside'), I don't actually want the stock drivers
> > that are part of the kernel now (like virtio-net etc.) but rather
> > something that integrates with wifi (probably building on hwsim).

> I have looked at using virtio semantics in UML in the past around the 
> point when I wanted to make the recvmmsg/sendmmsg vector drivers common 
> in UML and QEMU. It is certainly possible,
> 
> I went for the native approach at the end though.

Hmm. I'm not sure what you mean by either :-)

Is there any commonality between the vector drivers? I can't see how
that'd work without a bus abstraction (like virtio) in qemu? I mean, the
kernel driver just calls uml_vector_sendmmsg(), which I'd say belongs
more to the 'outside world', but that can't really be done in qemu?

Ok, I guess then I see what you mean by 'native' though.

Similarly, of course, I can implement arbitrary virt-io devices - just
the kernel side doesn't call a function like uml_vector_sendmmsg()
directly, but instead the virt-io model, and the model calls the
function, which essentially is the same just with a (convenient)
abstraction layer.

But this leaves the fundamental fact the model code ("vector_user.c" or
a similar "virtio_user.c") is still part of the build.

I guess what I'm thinking is have something like "virtio_user_rpc.c"
that uses some appropriate RPC to interact with the real model. IOW,
rather than having all the model-specific logic actually be here (like
vector_user.c actually knows how to send network packets over a real
socket fd), try to call out to some RPC that contains the real model.

Now that I thought about it further, I guess my question boils down to
"did anyone ever think about doing RPC for Virt-IO instead of putting
the entire device model into the hypervisor/emulator/...".

johannes

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


custom virt-io support (in user-mode-linux)

2019-05-22 Thread Johannes Berg
Hi,

While my main interest is mostly in UML right now [1] I've CC'ed the
qemu and virtualization lists because something similar might actually
apply to other types of virtualization.

I'm thinking about adding virt-io support to UML, but the tricky part is
that while I want to use the virt-io basics (because it's a nice
interface from the 'inside'), I don't actually want the stock drivers
that are part of the kernel now (like virtio-net etc.) but rather
something that integrates with wifi (probably building on hwsim).

The 'inside' interfaces aren't really a problem - just have a specific
device ID for this, and then write a normal virtio kernel driver for it.

The 'outside' interfaces are where my thinking breaks down right now.

Looking at lkl, the outside is just all implemented in lkl as code that
gets linked to the library, so in UML terms it'd just be extra 'outside'
code like the timer handling or other netdev stuff we have today.
Looking at qemu, it's of course also implemented there, and then
interfaces with the real network, console abstraction, etc.

However, like I said above, I really need something very custom and not
likely to make it upstream to any project (because what point is that if
you cannot connect to the rest of the environment I'm building), so I'm
thinking that perhaps it should be possible to write an abstract
'outside' that lets you interact with it really from out-of-process?
Perhaps through some kind of shared memory segment? I think that gets
tricky with virt-io doing DMA (I think it does?) though, so that part
would have to be implemented directly and not out-of-process?

But really that's why I'm asking - is there a better way than to just
link the device-side virt-io code into the same binary (be it lkl lib,
uml binary, qemu binary)?

Thanks,
johannes

[1] Actually, I've considered using qemu, but it doesn't have
virtualized time and doesn't seem to support TSC virtualization. I guess
I could remove TSC from the guest CPU and add a virtualized HPET, but
I've yet to convince myself this works - on UML I made virtual time as a
prototype already:
https://patchwork.ozlabs.org/patch/1095814/
(though my real goal isn't to just skip time forward when the host goes
idle, it's to sync with other simulated components)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] average: change to declare precision, not factor

2017-02-15 Thread Johannes Berg
From: Johannes Berg 

Declaring the factor is counter-intuitive, and people are prone
to using small(-ish) values even when that makes no sense.

Change the DECLARE_EWMA() macro to take the fractional precision,
in bits, rather than a factor, and update all users.

While at it, add some more documentation.

Signed-off-by: Johannes Berg 
---
Unless I hear any objections, I will take this through my tree.
---
 drivers/net/virtio_net.c|  2 +-
 drivers/net/wireless/ath/ath5k/ath5k.h  |  2 +-
 drivers/net/wireless/ralink/rt2x00/rt2x00.h |  2 +-
 include/linux/average.h | 61 +++--
 net/batman-adv/types.h  |  2 +-
 net/mac80211/ieee80211_i.h  |  2 +-
 net/mac80211/sta_info.h |  6 +--
 7 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e28530c83c..5e0cc9ec0f81 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -49,7 +49,7 @@ module_param(gso, bool, 0444);
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
  * term, transient changes in packet size.
  */
-DECLARE_EWMA(pkt_len, 1, 64)
+DECLARE_EWMA(pkt_len, 0, 64)
 
 /* With mergeable buffers we align buffer address and use the low bits to
  * encode its true size. Buffer size is up to 1 page so we need to align to
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h 
b/drivers/net/wireless/ath/ath5k/ath5k.h
index 67fedb61fcc0..979800c6f57f 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -1252,7 +1252,7 @@ struct ath5k_statistics {
 #define ATH5K_TXQ_LEN_MAX  (ATH_TXBUF / 4) /* bufs per queue */
 #define ATH5K_TXQ_LEN_LOW  (ATH5K_TXQ_LEN_MAX / 2) /* low mark */
 
-DECLARE_EWMA(beacon_rssi, 1024, 8)
+DECLARE_EWMA(beacon_rssi, 10, 8)
 
 /* Driver state associated with an instance of a device */
 struct ath5k_hw {
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h 
b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
index 26869b3bef45..340787894c69 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
@@ -257,7 +257,7 @@ struct link_qual {
int tx_failed;
 };
 
-DECLARE_EWMA(rssi, 1024, 8)
+DECLARE_EWMA(rssi, 10, 8)
 
 /*
  * Antenna settings about the currently active link.
diff --git a/include/linux/average.h b/include/linux/average.h
index d04aa58280de..7ddaf340d2ac 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -1,45 +1,66 @@
 #ifndef _LINUX_AVERAGE_H
 #define _LINUX_AVERAGE_H
 
-/* Exponentially weighted moving average (EWMA) */
+/*
+ * Exponentially weighted moving average (EWMA)
+ *
+ * This implements a fixed-precision EWMA algorithm, with both the
+ * precision and fall-off coefficient determined at compile-time
+ * and built into the generated helper funtions.
+ *
+ * The first argument to the macro is the name that will be used
+ * for the struct and helper functions.
+ *
+ * The second argument, the precision, expresses how many bits are
+ * used for the fractional part of the fixed-precision values.
+ *
+ * The third argument, the weight reciprocal, determines how the
+ * new values will be weighed vs. the old state, new values will
+ * get weight 1/weight_rcp and old values 1-1/weight_rcp. Note
+ * that this parameter must be a power of two for efficiency.
+ */
 
-#define DECLARE_EWMA(name, _factor, _weight)   \
+#define DECLARE_EWMA(name, _precision, _weight_rcp)\
struct ewma_##name {\
unsigned long internal; \
};  \
static inline void ewma_##name##_init(struct ewma_##name *e)\
{   \
-   BUILD_BUG_ON(!__builtin_constant_p(_factor));   \
-   BUILD_BUG_ON(!__builtin_constant_p(_weight));   \
-   BUILD_BUG_ON_NOT_POWER_OF_2(_factor);   \
-   BUILD_BUG_ON_NOT_POWER_OF_2(_weight);   \
+   BUILD_BUG_ON(!__builtin_constant_p(_precision));\
+   BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp));   \
+   /*  \
+* Even if you want to feed it just 0/1 you should have \
+* some bits for the non-fractional part... \
+*/ \
+   BUILD_BUG_ON((_precision) > 30);\
+   BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);   \
e->intern

Re: [PATCH 10/10] virtio: enable endian checks for sparse builds

2016-12-06 Thread Johannes Berg
On Tue, 2016-12-06 at 17:41 +0200, Michael S. Tsirkin wrote:

> It seems that there should be a better way to do it,
> but this works too.

In some cases there might be:

> --- a/drivers/s390/virtio/Makefile
> +++ b/drivers/s390/virtio/Makefile
> @@ -6,6 +6,8 @@
>  # it under the terms of the GNU General Public License (version 2
> only)
>  # as published by the Free Software Foundation.
>  
> +CFLAGS_virtio_ccw.o += -D__CHECK_ENDIAN__
> +CFLAGS_kvm_virtio.o += -D__CHECK_ENDIAN__
>  s390-virtio-objs := virtio_ccw.o
>  ifdef CONFIG_S390_GUEST_OLD_TRANSPORT
>  s390-virtio-objs += kvm_virtio.o

Here you could use

ccflags-y += -D__CHECK_ENDIAN__

for example, or even

subdir-ccflags-y += -D__CHECK_ENDIAN__

(in case any subdirs ever get added here)

> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -1,3 +1,4 @@
> +ccflags-y := -D__CHECK_ENDIAN__

Looks like you did that here and in some other places though - so
perhaps the s390 one was intentionally different?

> --- a/net/packet/Makefile
> +++ b/net/packet/Makefile
> @@ -2,6 +2,7 @@
>  # Makefile for the packet AF.
>  #
>  
> +ccflags-y := -D__CHECK_ENDIAN__

Technically this is slightly more than advertised, but I guess that
still makes sense if it's clean now.

johannes

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 1/4] virtio_net: use DECLARE_EWMA

2015-08-19 Thread Johannes Berg
Sorry, forgot to Cc Michael and the virt list - patch reproduced below
in full.

johannes

>From 22500fbcf722748fe3471b2e4c6156db47aade15 Mon Sep 17 00:00:00 2001
From: Johannes Berg 
Date: Wed, 19 Aug 2015 09:25:18 +0200
Subject: [PATCH] virtio_net: use DECLARE_EWMA

Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in no change in code
size for me, but reduces the struct receive_queue size by the two
unsigned long values that store the parameters.

Signed-off-by: Johannes Berg 
---
 drivers/net/Kconfig  |  1 -
 drivers/net/virtio_net.c | 22 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index e58468b02987..f50373645ab4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -282,7 +282,6 @@ config VETH
 config VIRTIO_NET
tristate "Virtio network driver"
depends on VIRTIO
-   select AVERAGE
---help---
  This is the virtual network driver for virtio.  It can be used with
  lguest or QEMU based VMMs (like KVM or Xen).  Say Y or M.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 546b669fbfdd..9b950f2db836 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -40,12 +40,12 @@ module_param(gso, bool, 0444);
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN  128
 
-/* Weight used for the RX packet size EWMA. The average packet size is used to
- * determine the packet buffer size when refilling RX rings. As the entire RX
- * ring may be refilled at once, the weight is chosen so that the EWMA will be
- * insensitive to short-term, transient changes in packet size.
+/* RX packet size EWMA. The average packet size is used to determine the packet
+ * buffer size when refilling RX rings. As the entire RX ring may be refilled
+ * at once, the weight is chosen so that the EWMA will be insensitive to short-
+ * term, transient changes in packet size.
  */
-#define RECEIVE_AVG_WEIGHT 64
+DECLARE_EWMA(pkt_len, 1, 64)
 
 /* Minimum alignment for mergeable packet buffers. */
 #define MERGEABLE_BUFFER_ALIGN max(L1_CACHE_BYTES, 256)
@@ -85,7 +85,7 @@ struct receive_queue {
struct page *pages;
 
/* Average packet length for mergeable receive buffers. */
-   struct ewma mrg_avg_pkt_len;
+   struct ewma_pkt_len mrg_avg_pkt_len;
 
/* Page frag for packet buffer allocation. */
struct page_frag alloc_frag;
@@ -407,7 +407,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
}
}
 
-   ewma_add(&rq->mrg_avg_pkt_len, head_skb->len);
+   ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
return head_skb;
 
 err_skb:
@@ -600,12 +600,12 @@ static int add_recvbuf_big(struct virtnet_info *vi, 
struct receive_queue *rq,
return err;
 }
 
-static unsigned int get_mergeable_buf_len(struct ewma *avg_pkt_len)
+static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
 {
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
 
-   len = hdr_len + clamp_t(unsigned int, ewma_read(avg_pkt_len),
+   len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
 }
@@ -1615,7 +1615,7 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
napi_hash_add(&vi->rq[i].napi);
 
sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
-   ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
+   ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
}
 
@@ -1658,7 +1658,7 @@ static ssize_t mergeable_rx_buffer_size_show(struct 
netdev_rx_queue *queue,
 {
struct virtnet_info *vi = netdev_priv(queue->dev);
unsigned int queue_index = get_netdev_rx_queue_index(queue);
-   struct ewma *avg;
+   struct ewma_pkt_len *avg;
 
BUG_ON(queue_index >= vi->max_queue_pairs);
avg = &vi->rq[queue_index].mrg_avg_pkt_len;


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] lguest: virtio-rng support

2008-05-19 Thread Johannes Berg

> +
> +/* Our random number generator device reads from /dev/urandom into the 
> Guest's
> + * input buffers.  The usual case is that the Guest doesn't want random 
> numbers
> + * and so has no buffers although /dev/urandom is still readable, whereas
> + * console is the reverse.

Is it really a good idea to use the hosts /dev/urandom to fill the
guests /dev/random?

johannes


signature.asc
Description: This is a digitally signed message part
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization