Re: [PATCH for-3.19] vhost/net: fix up num_buffers endian-ness

2015-02-04 Thread Al Viro
On Wed, Feb 04, 2015 at 01:59:42PM -0800, David Miller wrote:
 From: Michael S. Tsirkin m...@redhat.com
 Date: Tue, 3 Feb 2015 11:07:06 +0200
 
  In virtio 1.0 mode, when mergeable buffers are enabled on a big-endian
  host, num_buffers wasn't byte-swapped correctly, so large incoming
  packets got corrupted.
  
  To fix, fill it in within hdr - this also makes sure it gets
  the correct type.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Applied.

FWIW, conflict with iov_iter patches is trivial; once it shows up in
your kernel.org tree I can either rebase the series or just push
#merge-candidate - whichever you prefer.  Linus usually prefers the
second variant, but then he seriously dislikes rebasing of any sort;
I've no idea what your preferences are in that area...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 18/18] vhost: vhost_scsi_handle_vq() should just use copy_from_user()

2015-02-03 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin m...@redhat.com
Cc: Nicholas A. Bellinger n...@linux-iscsi.org
Cc: kvm@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/scsi.c |  2 +-
 include/linux/uio.h  |  2 --
 lib/Makefile |  2 +-
 lib/iovec.c  | 36 
 4 files changed, 2 insertions(+), 40 deletions(-)
 delete mode 100644 lib/iovec.c

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index d695b16..dc78d87 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1079,7 +1079,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
   req_size, vq-iov[0].iov_len);
break;
}
-   ret = memcpy_fromiovecend(req, vq-iov[0], 0, req_size);
+   ret = copy_from_user(req, vq-iov[0].iov_base, req_size);
if (unlikely(ret)) {
vq_err(vq, Faulted on virtio_scsi_cmd_req\n);
break;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 02bd8a9..3e0cb4e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -135,6 +135,4 @@ static inline void iov_iter_reexpand(struct iov_iter *i, 
size_t count)
 size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 
-int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
-   int offset, int len);
 #endif
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..1071d06 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,7 +24,7 @@ obj-y += lockref.o
 
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
-gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
+gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
diff --git a/lib/iovec.c b/lib/iovec.c
deleted file mode 100644
index d8f17a9..000
--- a/lib/iovec.c
+++ /dev/null
@@ -1,36 +0,0 @@
-#include linux/uaccess.h
-#include linux/export.h
-#include linux/uio.h
-
-/*
- * Copy iovec to kernel. Returns -EFAULT on error.
- */
-
-int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
-   int offset, int len)
-{
-   /* No data? Done! */
-   if (len == 0)
-   return 0;
-
-   /* Skip over the finished iovecs */
-   while (offset = iov-iov_len) {
-   offset -= iov-iov_len;
-   iov++;
-   }
-
-   while (len  0) {
-   u8 __user *base = iov-iov_base + offset;
-   int copy = min_t(unsigned int, len, iov-iov_len - offset);
-
-   offset = 0;
-   if (copy_from_user(kdata, base, copy))
-   return -EFAULT;
-   len -= copy;
-   kdata += copy;
-   iov++;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(memcpy_fromiovecend);
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()

2015-02-03 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/net.c | 82 +++--
 include/linux/uio.h |  3 --
 lib/iovec.c | 26 -
 3 files changed, 23 insertions(+), 88 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d86cc9b..e022cc4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -84,10 +84,6 @@ struct vhost_net_ubuf_ref {
 
 struct vhost_net_virtqueue {
struct vhost_virtqueue vq;
-   /* hdr is used to store the virtio header.
-* Since each iovec has = 1 byte length, we never need more than
-* header length entries to store the header. */
-   struct iovec hdr[sizeof(struct virtio_net_hdr_mrg_rxbuf)];
size_t vhost_hlen;
size_t sock_hlen;
/* vhost zerocopy support fields below: */
@@ -235,44 +231,6 @@ static bool vhost_sock_zcopy(struct socket *sock)
sock_flag(sock-sk, SOCK_ZEROCOPY);
 }
 
-/* Pop first len bytes from iovec. Return number of segments used. */
-static int move_iovec_hdr(struct iovec *from, struct iovec *to,
- size_t len, int iov_count)
-{
-   int seg = 0;
-   size_t size;
-
-   while (len  seg  iov_count) {
-   size = min(from-iov_len, len);
-   to-iov_base = from-iov_base;
-   to-iov_len = size;
-   from-iov_len -= size;
-   from-iov_base += size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-   return seg;
-}
-/* Copy iovec entries for len bytes from iovec. */
-static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
-  size_t len, int iovcount)
-{
-   int seg = 0;
-   size_t size;
-
-   while (len  seg  iovcount) {
-   size = min(from-iov_len, len);
-   to-iov_base = from-iov_base;
-   to-iov_len = size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-}
-
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -570,9 +528,9 @@ static void handle_rx(struct vhost_net *net)
.msg_controllen = 0,
.msg_flags = MSG_DONTWAIT,
};
-   struct virtio_net_hdr_mrg_rxbuf hdr = {
-   .hdr.flags = 0,
-   .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
+   struct virtio_net_hdr hdr = {
+   .flags = 0,
+   .gso_type = VIRTIO_NET_HDR_GSO_NONE
};
size_t total_len = 0;
int err, mergeable;
@@ -580,6 +538,7 @@ static void handle_rx(struct vhost_net *net)
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
struct socket *sock;
+   struct iov_iter fixup;
 
mutex_lock(vq-mutex);
sock = vq-private_data;
@@ -624,14 +583,19 @@ static void handle_rx(struct vhost_net *net)
break;
}
/* We don't need to be notified again. */
-   if (unlikely((vhost_hlen)))
-   /* Skip header. TODO: support TSO. */
-   move_iovec_hdr(vq-iov, nvq-hdr, vhost_hlen, in);
-   else
-   /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF:
-* needed because recvmsg can modify msg_iov. */
-   copy_iovec_hdr(vq-iov, nvq-hdr, sock_hlen, in);
-   iov_iter_init(msg.msg_iter, READ, vq-iov, in, sock_len);
+   iov_iter_init(msg.msg_iter, READ, vq-iov, in, vhost_len);
+   fixup = msg.msg_iter;
+   if (unlikely((vhost_hlen))) {
+   /* We will supply the header ourselves
+* TODO: support TSO.
+*/
+   iov_iter_advance(msg.msg_iter, vhost_hlen);
+   } else {
+   /* It'll come from socket; we'll need to patch
+* -num_buffers over if VIRTIO_NET_F_MRG_RXBUF
+*/
+   iov_iter_advance(fixup, sizeof(hdr));
+   }
err = sock-ops-recvmsg(NULL, sock, msg,
 sock_len, MSG_DONTWAIT | MSG_TRUNC);
/* Userspace might have consumed the packet meanwhile:
@@ -643,18 +607,18 @@ static void handle_rx(struct vhost_net *net)
vhost_discard_vq_desc(vq, headcount);
continue;
}
+   /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR

[PATCH v3 16/18] vhost: don't bother with copying iovec in handle_tx()

2015-02-03 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

just advance the msg.msg_iter and be done with that.

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/net.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6906f76..d86cc9b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -336,7 +336,7 @@ static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = net-vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = nvq-vq;
-   unsigned out, in, s;
+   unsigned out, in;
int head;
struct msghdr msg = {
.msg_name = NULL,
@@ -395,16 +395,17 @@ static void handle_tx(struct vhost_net *net)
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq-iov, nvq-hdr, hdr_size, out);
len = iov_length(vq-iov, out);
iov_iter_init(msg.msg_iter, WRITE, vq-iov, out, len);
+   iov_iter_advance(msg.msg_iter, hdr_size);
/* Sanity check */
-   if (!len) {
+   if (!iov_iter_count(msg.msg_iter)) {
vq_err(vq, Unexpected header len for TX: 
   %zd expected %zd\n,
-  iov_length(nvq-hdr, s), hdr_size);
+  len, hdr_size);
break;
}
+   len = iov_iter_count(msg.msg_iter);
 
zcopy_used = zcopy  len = VHOST_GOODCOPY_LEN
(nvq-upend_idx + 1) % UIO_MAXIOV !=
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 15/18] vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()

2015-02-03 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/vhost.c |  6 --
 include/linux/uio.h   |  1 -
 lib/iovec.c   | 25 -
 3 files changed, 4 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cb807d0..2ee2826 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1125,6 +1125,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
struct vring_desc desc;
unsigned int i = 0, count, found = 0;
u32 len = vhost32_to_cpu(vq, indirect-len);
+   struct iov_iter from;
int ret;
 
/* Sanity check */
@@ -1142,6 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
vq_err(vq, Translation failure %d in indirect.\n, ret);
return ret;
}
+   iov_iter_init(from, READ, vq-indirect, ret, len);
 
/* We will use the result as an address to read from, so most
 * architectures only need a compiler barrier here. */
@@ -1164,8 +1166,8 @@ static int get_indirect(struct vhost_virtqueue *vq,
   i, count);
return -EINVAL;
}
-   if (unlikely(memcpy_fromiovec((unsigned char *)desc,
- vq-indirect, sizeof desc))) {
+   if (unlikely(copy_from_iter(desc, sizeof(desc), from) !=
+sizeof(desc))) {
vq_err(vq, Failed indirect descriptor: idx %d, %zx\n,
   i, (size_t)vhost64_to_cpu(vq, indirect-addr) + 
i * sizeof desc);
return -EINVAL;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1c5e453..af3439f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -135,7 +135,6 @@ static inline void iov_iter_reexpand(struct iov_iter *i, 
size_t count)
 size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 
-int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
int offset, int len);
 int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
diff --git a/lib/iovec.c b/lib/iovec.c
index 2d99cb4..4a90875 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -3,31 +3,6 @@
 #include linux/uio.h
 
 /*
- * Copy iovec to kernel. Returns -EFAULT on error.
- *
- * Note: this modifies the original iovec.
- */
-
-int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len)
-{
-   while (len  0) {
-   if (iov-iov_len) {
-   int copy = min_t(unsigned int, len, iov-iov_len);
-   if (copy_from_user(kdata, iov-iov_base, copy))
-   return -EFAULT;
-   len -= copy;
-   kdata += copy;
-   iov-iov_base += copy;
-   iov-iov_len -= copy;
-   }
-   iov++;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(memcpy_fromiovec);
-
-/*
  * Copy kernel to iovec. Returns -EFAULT on error.
  */
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()

2015-02-03 Thread Al Viro
On Tue, Feb 03, 2015 at 04:21:54PM +0100, Michael S. Tsirkin wrote:
  Hmm having second thoughts here.
  Will this modify the iov in vq-iov?
  If yes, how will recvmsg fill it?
 
 OK that was just me misunderstanding what the
 function does. As it doesn't modify the iovec itself,
 I think there's no issue, my ack stands.

That, BTW, was the point of switching -sendmsg() and -recvmsg() to
iov_iter primitives - not only do memcpy_toiovec() et.al. change the
iovec, the way it's done was protocol-dependent, up to and including
the things like have sent 60 caller-supplied bytes, iovec modified
with only 6 bytes consumed.  What's worse, on TCP sendmsg we usually
left iovec unchanged, but sometimes it got drained.  Granted, that
happened only after setsockopt() playing caller with TCP_REPAIR (i.e.
checkpoint/restart stuff), but...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v3 5/9] vhost/scsi: Add ANY_LAYOUT vhost_virtqueue callback

2015-02-03 Thread Al Viro
On Tue, Feb 03, 2015 at 06:29:59AM +, Nicholas A. Bellinger wrote:
 +  * Copy over the virtio-scsi request header, which when
 +  * ANY_LAYOUT is enabled may span multiple iovecs, or a
 +  * single iovec may contain both the header + outgoing
 +  * WRITE payloads.
 +  *
 +  * copy_from_iter() is modifying the iovecs as copies over
 +  * req_size bytes into req, so the returned out_iter.iov[0]
 +  * will contain the correct start + offset of the outgoing
 +  * WRITE payload, if DMA_TO_DEVICE is set.

It does no such thing.  What it does, though, is changing out_iter so
that subsequent copy_from_iter() will return the data you want.  Note
that out_iter.iov[0] will contain the correct _segment_ of that vector,
with the data you want at out_iter.iov_offset bytes from the beginning
of that segment.  .iov may come to point to subsequent segments and .iov_offset
keeps changing, but segments themselves are never changed.

 +  */
 + iov_iter_init(out_iter, READ, vq-iov[0], out,
  WRITE, please - as in this is
the source of some write, we'll be copying _from_ it.  READ would be
destination of some read, we'll be copying into it.

 +  (data_direction == DMA_TO_DEVICE) ?
 +   req_size + exp_data_len : req_size);
 +
 + ret = copy_from_iter(req, req_size, out_iter);

...

 + /*
 +  * Determine start of T10_PI or data payload iovec in ANY_LAYOUT
 +  * mode based upon data_direction.
 +  *
 +  * For DMA_TO_DEVICE, this is iov_out from copy_from_iter()
 +  * with the already recalculated iov_base + iov_len.

ITYM this is out_iter, which is already pointing to the right place

AFAICS, the actual use is correct, it's just that the comments are confused.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()

2015-02-02 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/net.c | 79 ++---
 include/linux/uio.h |  3 --
 lib/iovec.c | 26 --
 3 files changed, 20 insertions(+), 88 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d86cc9b..73c0ebf 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -84,10 +84,6 @@ struct vhost_net_ubuf_ref {
 
 struct vhost_net_virtqueue {
struct vhost_virtqueue vq;
-   /* hdr is used to store the virtio header.
-* Since each iovec has = 1 byte length, we never need more than
-* header length entries to store the header. */
-   struct iovec hdr[sizeof(struct virtio_net_hdr_mrg_rxbuf)];
size_t vhost_hlen;
size_t sock_hlen;
/* vhost zerocopy support fields below: */
@@ -235,44 +231,6 @@ static bool vhost_sock_zcopy(struct socket *sock)
sock_flag(sock-sk, SOCK_ZEROCOPY);
 }
 
-/* Pop first len bytes from iovec. Return number of segments used. */
-static int move_iovec_hdr(struct iovec *from, struct iovec *to,
- size_t len, int iov_count)
-{
-   int seg = 0;
-   size_t size;
-
-   while (len  seg  iov_count) {
-   size = min(from-iov_len, len);
-   to-iov_base = from-iov_base;
-   to-iov_len = size;
-   from-iov_len -= size;
-   from-iov_base += size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-   return seg;
-}
-/* Copy iovec entries for len bytes from iovec. */
-static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
-  size_t len, int iovcount)
-{
-   int seg = 0;
-   size_t size;
-
-   while (len  seg  iovcount) {
-   size = min(from-iov_len, len);
-   to-iov_base = from-iov_base;
-   to-iov_len = size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-}
-
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -570,9 +528,9 @@ static void handle_rx(struct vhost_net *net)
.msg_controllen = 0,
.msg_flags = MSG_DONTWAIT,
};
-   struct virtio_net_hdr_mrg_rxbuf hdr = {
-   .hdr.flags = 0,
-   .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
+   struct virtio_net_hdr hdr = {
+   .flags = 0,
+   .gso_type = VIRTIO_NET_HDR_GSO_NONE
};
size_t total_len = 0;
int err, mergeable;
@@ -580,6 +538,7 @@ static void handle_rx(struct vhost_net *net)
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
struct socket *sock;
+   struct iov_iter fixup;
 
mutex_lock(vq-mutex);
sock = vq-private_data;
@@ -624,14 +583,17 @@ static void handle_rx(struct vhost_net *net)
break;
}
/* We don't need to be notified again. */
-   if (unlikely((vhost_hlen)))
-   /* Skip header. TODO: support TSO. */
-   move_iovec_hdr(vq-iov, nvq-hdr, vhost_hlen, in);
-   else
-   /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF:
-* needed because recvmsg can modify msg_iov. */
-   copy_iovec_hdr(vq-iov, nvq-hdr, sock_hlen, in);
-   iov_iter_init(msg.msg_iter, READ, vq-iov, in, sock_len);
+   iov_iter_init(msg.msg_iter, READ, vq-iov, in, vhost_len);
+   fixup = msg.msg_iter;
+   if (unlikely((vhost_hlen))) {
+   /* We will supply the header ourselves
+* TODO: support TSO. */
+   iov_iter_advance(msg.msg_iter, vhost_hlen);
+   } else {
+   /* It'll come from socket; we'll need to patch
+* -num_buffers over if VIRTIO_NET_F_MRG_RXBUF */
+   iov_iter_advance(fixup, sizeof(hdr));
+   }
err = sock-ops-recvmsg(NULL, sock, msg,
 sock_len, MSG_DONTWAIT | MSG_TRUNC);
/* Userspace might have consumed the packet meanwhile:
@@ -643,18 +605,17 @@ static void handle_rx(struct vhost_net *net)
vhost_discard_vq_desc(vq, headcount);
continue;
}
+   /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
if (unlikely(vhost_hlen) 
-   memcpy_toiovecend(nvq-hdr, (unsigned char

[PATCH v2 18/18] vhost: vhost_scsi_handle_vq() should just use copy_from_user()

2015-02-02 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin m...@redhat.com
Cc: Nicholas A. Bellinger n...@linux-iscsi.org
Cc: kvm@vger.kernel.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/scsi.c |  2 +-
 include/linux/uio.h  |  2 --
 lib/Makefile |  2 +-
 lib/iovec.c  | 36 
 4 files changed, 2 insertions(+), 40 deletions(-)
 delete mode 100644 lib/iovec.c

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index d695b16..dc78d87 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1079,7 +1079,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
   req_size, vq-iov[0].iov_len);
break;
}
-   ret = memcpy_fromiovecend(req, vq-iov[0], 0, req_size);
+   ret = copy_from_user(req, vq-iov[0].iov_base, req_size);
if (unlikely(ret)) {
vq_err(vq, Faulted on virtio_scsi_cmd_req\n);
break;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 02bd8a9..3e0cb4e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -135,6 +135,4 @@ static inline void iov_iter_reexpand(struct iov_iter *i, 
size_t count)
 size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 
-int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
-   int offset, int len);
 #endif
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..1071d06 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,7 +24,7 @@ obj-y += lockref.o
 
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
-gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
+gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
diff --git a/lib/iovec.c b/lib/iovec.c
deleted file mode 100644
index d8f17a9..000
--- a/lib/iovec.c
+++ /dev/null
@@ -1,36 +0,0 @@
-#include linux/uaccess.h
-#include linux/export.h
-#include linux/uio.h
-
-/*
- * Copy iovec to kernel. Returns -EFAULT on error.
- */
-
-int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
-   int offset, int len)
-{
-   /* No data? Done! */
-   if (len == 0)
-   return 0;
-
-   /* Skip over the finished iovecs */
-   while (offset = iov-iov_len) {
-   offset -= iov-iov_len;
-   iov++;
-   }
-
-   while (len  0) {
-   u8 __user *base = iov-iov_base + offset;
-   int copy = min_t(unsigned int, len, iov-iov_len - offset);
-
-   offset = 0;
-   if (copy_from_user(kdata, base, copy))
-   return -EFAULT;
-   len -= copy;
-   kdata += copy;
-   iov++;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(memcpy_fromiovecend);
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 15/18] vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()

2015-02-02 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/vhost.c |  6 --
 include/linux/uio.h   |  1 -
 lib/iovec.c   | 25 -
 3 files changed, 4 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cb807d0..2ee2826 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1125,6 +1125,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
struct vring_desc desc;
unsigned int i = 0, count, found = 0;
u32 len = vhost32_to_cpu(vq, indirect-len);
+   struct iov_iter from;
int ret;
 
/* Sanity check */
@@ -1142,6 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
vq_err(vq, Translation failure %d in indirect.\n, ret);
return ret;
}
+   iov_iter_init(from, READ, vq-indirect, ret, len);
 
/* We will use the result as an address to read from, so most
 * architectures only need a compiler barrier here. */
@@ -1164,8 +1166,8 @@ static int get_indirect(struct vhost_virtqueue *vq,
   i, count);
return -EINVAL;
}
-   if (unlikely(memcpy_fromiovec((unsigned char *)desc,
- vq-indirect, sizeof desc))) {
+   if (unlikely(copy_from_iter(desc, sizeof(desc), from) !=
+sizeof(desc))) {
vq_err(vq, Failed indirect descriptor: idx %d, %zx\n,
   i, (size_t)vhost64_to_cpu(vq, indirect-addr) + 
i * sizeof desc);
return -EINVAL;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1c5e453..af3439f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -135,7 +135,6 @@ static inline void iov_iter_reexpand(struct iov_iter *i, 
size_t count)
 size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct 
iov_iter *i);
 
-int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
int offset, int len);
 int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
diff --git a/lib/iovec.c b/lib/iovec.c
index 2d99cb4..4a90875 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -3,31 +3,6 @@
 #include linux/uio.h
 
 /*
- * Copy iovec to kernel. Returns -EFAULT on error.
- *
- * Note: this modifies the original iovec.
- */
-
-int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len)
-{
-   while (len  0) {
-   if (iov-iov_len) {
-   int copy = min_t(unsigned int, len, iov-iov_len);
-   if (copy_from_user(kdata, iov-iov_base, copy))
-   return -EFAULT;
-   len -= copy;
-   kdata += copy;
-   iov-iov_base += copy;
-   iov-iov_len -= copy;
-   }
-   iov++;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(memcpy_fromiovec);
-
-/*
  * Copy kernel to iovec. Returns -EFAULT on error.
  */
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 16/18] vhost: don't bother with copying iovec in handle_tx()

2015-02-02 Thread Al Viro
From: Al Viro v...@zeniv.linux.org.uk

just advance the msg.msg_iter and be done with that.

Cc: Michael S. Tsirkin m...@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
 drivers/vhost/net.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6906f76..d86cc9b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -336,7 +336,7 @@ static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = net-vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = nvq-vq;
-   unsigned out, in, s;
+   unsigned out, in;
int head;
struct msghdr msg = {
.msg_name = NULL,
@@ -395,16 +395,17 @@ static void handle_tx(struct vhost_net *net)
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq-iov, nvq-hdr, hdr_size, out);
len = iov_length(vq-iov, out);
iov_iter_init(msg.msg_iter, WRITE, vq-iov, out, len);
+   iov_iter_advance(msg.msg_iter, hdr_size);
/* Sanity check */
-   if (!len) {
+   if (!iov_iter_count(msg.msg_iter)) {
vq_err(vq, Unexpected header len for TX: 
   %zd expected %zd\n,
-  iov_length(nvq-hdr, s), hdr_size);
+  len, hdr_size);
break;
}
+   len = iov_iter_count(msg.msg_iter);
 
zcopy_used = zcopy  len = VHOST_GOODCOPY_LEN
(nvq-upend_idx + 1) % UIO_MAXIOV !=
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2 01/11] lib/iovec: Add memcpy_fromiovec_out library function

2015-02-01 Thread Al Viro
On Mon, Feb 02, 2015 at 04:06:24AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This patch adds a new memcpy_fromiovec_out() library function which modifies
 the passed *iov following memcpy_fromiovec(), but also returns the next 
 current
 iovec pointer via **iov_out.
 
 This is useful for vhost ANY_LAYOUT support when guests are allowed to 
 generate
 incoming virtio request headers combined with subsequent SGL payloads into a
 single iovec.

Please, don't.  Just use copy_from_iter(); you are open-coding an uglier
variant of such.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2 01/11] lib/iovec: Add memcpy_fromiovec_out library function

2015-02-01 Thread Al Viro
On Mon, Feb 02, 2015 at 04:44:12AM +, Al Viro wrote:
 On Mon, Feb 02, 2015 at 04:06:24AM +, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  This patch adds a new memcpy_fromiovec_out() library function which modifies
  the passed *iov following memcpy_fromiovec(), but also returns the next 
  current
  iovec pointer via **iov_out.
  
  This is useful for vhost ANY_LAYOUT support when guests are allowed to 
  generate
  incoming virtio request headers combined with subsequent SGL payloads into a
  single iovec.
 
 Please, don't.  Just use copy_from_iter(); you are open-coding an uglier
 variant of such.

PS: see vfs.git#for-davem (or postings on netdev with the same stuff).
I really hope to bury memcpy_...iovec...() crap for good; please, don't
reintroduce more of it.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio-pci: Use fdget() rather than eventfd_fget()

2013-08-20 Thread Al Viro
On Tue, Aug 20, 2013 at 01:18:07PM -0600, Alex Williamson wrote:
 eventfd_fget() tests to see whether the file is an eventfd file, which
 we then immediately pass to eventfd_ctx_fileget(), which again tests
 whether the file is an eventfd file.  Simplify slightly by using
 fdget() so that we only test that we're looking at an eventfd once.
 fget() could also be used, but fdget() makes use of fget_light() for
 another slight optimization.

Umm...

 @@ -210,8 +210,8 @@ fail:
   if (ctx  !IS_ERR(ctx))
   eventfd_ctx_put(ctx);
  
 - if (file  !IS_ERR(file))
 - fput(file);
 + if (irqfd.file)
 + fdput(irqfd);
  
   kfree(virqfd);

IMO it's a bad style; you have three failure exits leading here, and those
ifs are nothing but how far did we get before we'd failed.

fail3:
eventfd_ctx_put(ctx);
fail2:
fdput(irqfd);
fail1:
kfree(virqfd);

is much easier to analyse.  It's a very common pattern and IME it's more
robust than this kind of flexibility...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio-pci: PCI hot reset interface

2013-08-19 Thread Al Viro
On Wed, Aug 14, 2013 at 04:42:14PM -0600, Bjorn Helgaas wrote:
 [+cc Al, linux-fsdevel for fdget/fdput usage]

fdget/fdput use looks sane, the only thing is that I would rather
have an explicit include of linux/file.h instead of relying upon
linux/eventfd.h pulling it.  Incidentally, there are only 5 files
that include the latter without an explicit include of the former -
drivers/vfio/pci/vfio_pci.c, drivers/vhost/scsi.c, kernel/cgroup.c,
mm/memcontrol.c and mm/vmpressure.c.  And only kernel/cgroup.c (and,
with this patch, vfio_pci.c) really wants anything from linux/file.h,
so I'd rather kill that indirect include in eventfd.h and slapped
an explicit include of file.h in these two files...

BTW, most of the eventfd_fget() users might as well be using fget()
(or fdget(), for that matter).  They tend to be immediately followed
by eventfd_ctx_fileget(), which repeats the is that an eventfd file?
check anyway.

Completely untested patch below does that to kernel/cgroup.c; Tejun,
Davide - do you have any objections against the following?

Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c

kernel/cgroup.c is the only place in the tree that relies on eventfd.h
pulling file.h; move that include there.  Switch from eventfd_fget()/fput()
to fdget()/fdput(), while we are at it - eventfd_ctx_fileget() will fail
on non-eventfd descriptors just fine, no need to do that check twice...

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index cf5d2af..ff0b981 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -9,7 +9,6 @@
 #define _LINUX_EVENTFD_H
 
 #include linux/fcntl.h
-#include linux/file.h
 #include linux/wait.h
 
 /*
@@ -26,6 +25,8 @@
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
+struct file;
+
 #ifdef CONFIG_EVENTFD
 
 struct file *eventfd_file_create(unsigned int count, int flags);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 781845a..f88ecaf 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -60,6 +60,7 @@
 #include linux/poll.h
 #include linux/flex_array.h /* used in cgroup_attach_task */
 #include linux/kthread.h
+#include linux/file.h
 
 #include linux/atomic.h
 
@@ -3969,8 +3970,8 @@ static int cgroup_write_event_control(struct cgroup 
*cgrp, struct cftype *cft,
struct cgroup_event *event = NULL;
struct cgroup *cgrp_cfile;
unsigned int efd, cfd;
-   struct file *efile = NULL;
-   struct file *cfile = NULL;
+   struct fd efile;
+   struct fd cfile;
char *endp;
int ret;
 
@@ -3993,31 +3994,31 @@ static int cgroup_write_event_control(struct cgroup 
*cgrp, struct cftype *cft,
init_waitqueue_func_entry(event-wait, cgroup_event_wake);
INIT_WORK(event-remove, cgroup_event_remove);
 
-   efile = eventfd_fget(efd);
-   if (IS_ERR(efile)) {
-   ret = PTR_ERR(efile);
-   goto fail;
+   efile = fdget(efd);
+   if (!efile.file) {
+   ret = -EBADF;
+   goto fail1;
}
 
-   event-eventfd = eventfd_ctx_fileget(efile);
+   event-eventfd = eventfd_ctx_fileget(efile.file);
if (IS_ERR(event-eventfd)) {
ret = PTR_ERR(event-eventfd);
-   goto fail;
+   goto fail2;
}
 
-   cfile = fget(cfd);
-   if (!cfile) {
+   cfile = fdget(cfd);
+   if (!cfile.file) {
ret = -EBADF;
-   goto fail;
+   goto fail3;
}
 
/* the process need read permission on control file */
/* AV: shouldn't we check that it's been opened for read instead? */
-   ret = inode_permission(file_inode(cfile), MAY_READ);
+   ret = inode_permission(file_inode(cfile.file), MAY_READ);
if (ret  0)
goto fail;
 
-   event-cft = __file_cft(cfile);
+   event-cft = __file_cft(cfile.file);
if (IS_ERR(event-cft)) {
ret = PTR_ERR(event-cft);
goto fail;
@@ -4027,7 +4028,7 @@ static int cgroup_write_event_control(struct cgroup 
*cgrp, struct cftype *cft,
 * The file to be monitored must be in the same cgroup as
 * cgroup.event_control is.
 */
-   cgrp_cfile = __d_cgrp(cfile-f_dentry-d_parent);
+   cgrp_cfile = __d_cgrp(cfile.file-f_dentry-d_parent);
if (cgrp_cfile != cgrp) {
ret = -EINVAL;
goto fail;
@@ -4043,7 +4044,7 @@ static int cgroup_write_event_control(struct cgroup 
*cgrp, struct cftype *cft,
if (ret)
goto fail;
 
-   efile-f_op-poll(efile, event-pt);
+   efile.file-f_op-poll(efile.file, event-pt);
 
/*
 * Events should be removed after rmdir of cgroup directory, but before
@@ -4056,21 +4057,18 @@ static int cgroup_write_event_control(struct cgroup 
*cgrp, struct cftype *cft,
list_add(event-list

Re: [patch 1/2] vhost: potential integer overflows

2010-10-11 Thread Al Viro
On Mon, Oct 11, 2010 at 07:22:57PM +0200, Dan Carpenter wrote:
 I did an audit for potential integer overflows of values which get passed
 to access_ok() and here are the results.

FWIW, UINT_MAX is wrong here.  What you want is maximal size_t value.

 Signed-off-by: Dan Carpenter erro...@gmail.com
 
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index dd3d6f7..c2aa12c 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -429,6 +429,14 @@ static int vq_access_ok(unsigned int num,
   struct vring_avail __user *avail,
   struct vring_used __user *used)
  {
 +
 + if (num  UINT_MAX / sizeof *desc)
 + return 0;
 + if (num  UINT_MAX / sizeof *avail-ring - sizeof *avail)
 + return 0;
 + if (num  UINT_MAX / sizeof *used-ring - sizeof *used)
 + return 0;
 +
   return access_ok(VERIFY_READ, desc, num * sizeof *desc) 
  access_ok(VERIFY_READ, avail,
sizeof *avail + num * sizeof *avail-ring) 
 @@ -447,6 +455,9 @@ int vhost_log_access_ok(struct vhost_dev *dev)
  /* Caller should have vq mutex and device mutex */
  static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user 
 *log_base)
  {
 + if (vq-num  UINT_MAX / sizeof *vq-used-ring - sizeof *vq-used)
 + return 0;
 +
   return vq_memory_access_ok(log_base, vq-dev-memory,
   vhost_has_feature(vq-dev, VHOST_F_LOG_ALL)) 
   (!vq-log_used || log_access_ok(log_base, vq-log_addr,
 @@ -606,12 +617,17 @@ static long vhost_set_vring(struct vhost_dev *d, int 
 ioctl, void __user *argp)
   }
  
   /* Also validate log access for used ring if enabled. */
 - if ((a.flags  (0x1  VHOST_VRING_F_LOG)) 
 - !log_access_ok(vq-log_base, a.log_guest_addr,
 + if (a.flags  (0x1  VHOST_VRING_F_LOG)) {
 + if (vq-num  UINT_MAX / sizeof *vq-used-ring 
 - sizeof *vq-used) {
 + r = -EINVAL;
 + break;
 + }
 + if (!log_access_ok(vq-log_base, 
 a.log_guest_addr,
  sizeof *vq-used +
  vq-num * sizeof *vq-used-ring)) {
 - r = -EINVAL;
 - break;
 + r = -EINVAL;
 + break;
 + }
   }
   }
  
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 1/2] eventfd: export eventfd interfaces for module use

2009-05-04 Thread Al Viro
On Mon, May 04, 2009 at 01:57:45PM -0400, Gregory Haskins wrote:
 @@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
  
   return n;
  }
 +EXPORT_SYMBOL_GPL(eventfd_signal);

perhaps, but...

 @@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
  
   return file;
  }
 +EXPORT_SYMBOL_GPL(eventfd_fget);

this one looks very odd.  Could you show legitimate users?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
 + /* We re-use eventfd for irqfd */
 + fd = sys_eventfd2(0, 0);
 + if (fd  0) {
 + ret = fd;
 + goto fail;
 + }
 +
 + /* We maintain a reference to eventfd for the irqfd lifetime */
 + file = eventfd_fget(fd);
 + if (IS_ERR(file)) {
 + ret = PTR_ERR(file);
 + goto fail;
 + }
 +
 + irqfd-file = file;

This is just plain wrong.  You have no promise whatsoever that caller of
that sucker won't race with e.g. dup2().  IOW, you can't assume that
file will be of the expected kind.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote:
 On Sun, 3 May 2009, Al Viro wrote:
  On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
   + /* We re-use eventfd for irqfd */
   + fd = sys_eventfd2(0, 0);
   + if (fd  0) {
   + ret = fd;
   + goto fail;
   + }
   +
   + /* We maintain a reference to eventfd for the irqfd lifetime */
   + file = eventfd_fget(fd);
   + if (IS_ERR(file)) {
   + ret = PTR_ERR(file);
   + goto fail;
   + }
   +
   + irqfd-file = file;
  
  This is just plain wrong.  You have no promise whatsoever that caller of
  that sucker won't race with e.g. dup2().  IOW, you can't assume that
  file will be of the expected kind.
 
 The eventfd_fget() checks for the file_operations pointer, before 
 returning the file*, and fails if the fd in not an eventfd. Or you have 
 other concerns?

OK, but... it's still wrong.  Descriptor numbers are purely for interaction
with userland; using them that way violates very general race-prevention
rules, even if you do paper over the worst of consequences with check in
eventfd_fget().

General rules:
* descriptor you've generated is fit only for return to userland;
* descriptor you've got from userland is fit only for *single*
fget() or equivalent, unless you are one of the core syscalls manipulating
the descriptor table itself (dup2, etc.)
* once file is installed in descriptor table, you'd better be past
the last failure exit; sys_close() on cleanup path is not acceptable.
That's what reserving descriptors is for.

IOW, the sane solution would be to export something that returns your
struct file *.  And stop playing with fd completely.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


file descriptor abuses

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 08:01:36PM +0100, Al Viro wrote:

 General rules:
   * descriptor you've generated is fit only for return to userland;
   * descriptor you've got from userland is fit only for *single*
 fget() or equivalent, unless you are one of the core syscalls manipulating
 the descriptor table itself (dup2, etc.)
   * once file is installed in descriptor table, you'd better be past
 the last failure exit; sys_close() on cleanup path is not acceptable.
 That's what reserving descriptors is for.
 
 IOW, the sane solution would be to export something that returns your
 struct file *.  And stop playing with fd completely.

Speaking of which, quick look through fget() callers shows this turd:

static int p9_socket_open(struct p9_client *client, struct socket *csocket)
{
int fd, ret;

fd = sock_map_fd(csocket, 0);
.

ret = p9_fd_open(client, fd, fd);
if (ret  0) {
P9_EPRINTK(KERN_ERR, p9_socket_open: failed to open fd\n);
sockfd_put(csocket);
return ret;
}
.
return 0;
}

where p9_fd_open() calls fget() on its 2nd and 3rd arguments.  Which does
worse than just a leak, AFAICT - on failure exit it leaves a dangling
pointer from descriptor table.

On the almost unrelated note, we have (in drivers/sneak_in^Wstaging/usbip)
sockfd_to_socket(), with all callers leaking struct file, AFAICS.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html