Re: [RFC PATCH v2 02/13] epoll: introduce user structures for polling from userspace

2019-01-22 Thread Roman Penyaev

On 2019-01-21 22:34, Linus Torvalds wrote:

So I'm not entirely convinced, but I guess actual numbers and users
might convince me otherwise.

However, a quick comment:

On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev  wrote:


+struct epoll_uitem {
+   __poll_t ready_events;
+   struct epoll_event event;
+};


This really ends up being a horrible data structure.

struct epoll_event is declared as

struct epoll_event {
__poll_t events;
__u64 data;
} EPOLL_PACKED;

and __poll_t is "unsigned". So on pretty much all 64-bit architectures
except for x86-64 (which sets that packed attribute), you have a
packing hole there in between the events and the data, and "struct
epoll_event" has 8-byte alignment.

Now, in "struct epoll_uitem", you end up having *another* packing hold
in between "ready_events" and "struct epoll_event".

So this data structure that has 16 bytes of actual data, ends up being
24 bytes in size.

Again, x86-64 happens to be the exception to this, but that's a random
small implementation detail, not a design thing.

I think "struct epoll_event" was badly designed to begin with to have
this issue, but it shouldn't then be an excuse to make things even
worse with this array of "struct epoll_uitem" things.

Hmm?


Ha! Yes, you are right.  Eyes see "packed" and brain responds
"ok, this is 12 bytes, + 4 for ready_events = 16, perfect".
I have not paid any attention to how actually this EPOLL_PACKED is
defined.  Not nice at all.  I will unfold the structure like this:

/*
 * Item, shared with userspace.  Unfortunately we can't embed 
epoll_event

 * structure, because it is badly aligned on all 64-bit archs, except
 * x86-64 (see EPOLL_PACKED).  sizeof(epoll_uitem) == 16
 */
struct epoll_uitem {
__poll_t ready_events;
__poll_t events;
__u64 data;
};

Also BUILD_BUG_ON(sizeof(epoll_uitem) != 16) somewhere in alloc won't
hurt.

--
Roman




Re: [RFC PATCH v2 02/13] epoll: introduce user structures for polling from userspace

2019-01-21 Thread Linus Torvalds
So I'm not entirely convinced, but I guess actual numbers and users
might convince me otherwise.

However, a quick comment:

On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev  wrote:
>
> +struct epoll_uitem {
> +   __poll_t ready_events;
> +   struct epoll_event event;
> +};

This really ends up being a horrible data structure.

struct epoll_event is declared as

struct epoll_event {
__poll_t events;
__u64 data;
} EPOLL_PACKED;

and __poll_t is "unsigned". So on pretty much all 64-bit architectures
except for x86-64 (which sets that packed attribute), you have a
packing hole there in between the events and the data, and "struct
epoll_event" has 8-byte alignment.

Now, in "struct epoll_uitem", you end up having *another* packing hold
in between "ready_events" and "struct epoll_event".

So this data structure that has 16 bytes of actual data, ends up being
24 bytes in size.

Again, x86-64 happens to be the exception to this, but that's a random
small implementation detail, not a design thing.

I think "struct epoll_event" was badly designed to begin with to have
this issue, but it shouldn't then be an excuse to make things even
worse with this array of "struct epoll_uitem" things.

Hmm?

  Linus


[RFC PATCH v2 02/13] epoll: introduce user structures for polling from userspace

2019-01-21 Thread Roman Penyaev
This one introduces structures of user items array:

struct user_epheader -
describes inserted epoll items.

struct user_epitem -
single epoll item, visible to userspace.

Signed-off-by: Roman Penyaev 
Cc: Andrew Morton 
Cc: Davidlohr Bueso 
Cc: Jason Baron 
Cc: Al Viro 
Cc: "Paul E. McKenney" 
Cc: Linus Torvalds 
Cc: Andrea Parri 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c |  9 +
 include/uapi/linux/eventpoll.h | 19 +++
 2 files changed, 28 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2cc183e86a29..f598442512f3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -9,6 +9,8 @@
  *
  *  Davide Libenzi 
  *
+ *  Polling from userspace support by Roman Penyaev 
+ *  (C) Copyright 2019 SUSE, All Rights Reserved
  */
 
 #include 
@@ -109,6 +111,13 @@
 
 #define EP_ITEM_COST (sizeof(struct epitem) + sizeof(struct eppoll_entry))
 
+/*
+ * That is around 1.3mb of allocated memory for one epfd.  What is more
+ * important is ->index_length, which should be ^2, so do not increase
+ * max items number to avoid size doubling of user index.
+ */
+#define EP_USERPOLL_MAX_ITEMS_NR 65536
+
 struct epoll_filefd {
struct file *file;
int fd;
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 39dfc29f0f52..690a625ddeb2 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -79,4 +79,23 @@ struct epoll_event {
__u64 data;
 } EPOLL_PACKED;
 
+struct epoll_uitem {
+   __poll_t ready_events;
+   struct epoll_event event;
+};
+
+#define EPOLL_USERPOLL_HEADER_SIZE  128
+#define EPOLL_USERPOLL_HEADER_MAGIC 0xeb01eb01
+
+struct epoll_uheader {
+   u32 magic;  /* epoll user header magic */
+   u32 header_length;  /* length of the header + items */
+   u32 index_length;   /* length of the index ring, always pow2 */
+   u32 max_items_nr;   /* max number of items */
+   u32 head;   /* updated by userland */
+   u32 tail;   /* updated by kernel */
+
+   struct epoll_uitem items[] __aligned(EPOLL_USERPOLL_HEADER_SIZE);
+};
+
 #endif /* _UAPI_LINUX_EVENTPOLL_H */
-- 
2.19.1