Re: [RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

2013-03-22 Thread Eric Wong
Arve Hjønnevåg wrote: > On Thu, Mar 21, 2013 at 8:24 PM, Eric Wong wrote: > > > > With EPOLLET and improper usage (not hitting EAGAIN), the event now > > has a larger window to be lost (as mentioned in my changelog). > > > > What about the case where EPOLLET
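
For readers unfamiliar with the "not hitting EAGAIN" remark above: with EPOLLET, userspace is expected to keep reading a non-blocking descriptor until read() fails with EAGAIN before waiting again. The following is only a minimal userspace sketch of that convention, not code from the patch; the helper names set_nonblocking() and drain_fd() are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: consume everything currently readable from a
 * non-blocking fd.  Returns the number of bytes read once read() reports
 * EAGAIN (or EOF), or -1 on any other error.
 */
ssize_t drain_fd(int fd)
{
	char buf[4096];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* EOF */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;	/* drained: safe to wait again */
		return -1;
	}
}
```

A caller would typically invoke drain_fd() each time epoll_wait() reports the descriptor, and only then go back to waiting.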

Re: [RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

2013-03-21 Thread Eric Wong
Arve Hjønnevåg wrote: > On Thu, Mar 21, 2013 at 4:52 AM, Eric Wong wrote: > > Changes since v2: > > * epi->state is no longer atomic, we only cmpxchg in ep_poll_callback > > now and rely on implicit barriers in other places for reading. > > * intermediate EP_STA

[RFC v2 3/2] epoll: avoid using extra cache line on most 64-bit

2013-03-21 Thread Eric Wong
ed-by: Eric Wong Cc: Mathieu Desnoyers Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- fs/eventpoll.c | 27 +++ 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 1e04175..82bf483 100644 --- a/fs/eventpoll.c +++

Re: [RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

2013-03-21 Thread Eric Wong
Eric Wong wrote: > This is still not a proper commit, I've lightly tested this. Btw, full series here (already sent to LKML and some in -mm) http://yhbt.net/epoll-wfcqueue-v3.8.3-20130321.mbox (should apply cleanly to all 3.8/3.9 kernels) -- To unsubscribe from this list: send

[RFC v3 2/2] epoll: use a local wfcq functions for Level Trigger

2013-03-21 Thread Eric Wong
n wfcq and these new _local functions is large and outside of the margin of error. ref: http://www.xmailserver.org/epwbench.c Somewhat-tested-by: Eric Wong Cc: Mathieu Desnoyers Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- fs/eventpoll.c | 20 +++- 1 file changed, 15

[RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

2013-03-21 Thread Eric Wong
now). * minor code cleanups Lightly-tested-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton Cc: Mathieu Desnoyers --- fs/eventpoll.c | 615 ++--- 1 file changed, 276 insertions(+), 339 deletions(-) diff --git a/fs/eventp

[PATCH] wfcqueue: functions for local append and enqueue

2013-03-21 Thread Eric Wong
-by: Eric Wong Cc: Mathieu Desnoyers Cc: Lai Jiangshan Cc: Paul E. McKenney Cc: Stephen Hemminger Cc: Davide Libenzi --- Benchmark for this coming with updated epoll patches. include/linux/wfcqueue.h | 43 +++ 1 file changed, 43 insertions(+) diff

[PATCH mm] epoll: fix suspicious RCU usage in ep_poll_callback

2013-03-20 Thread Eric Wong
The commit "epoll: use RCU to protect wakeup_source in epitem" introduced the ep_pm_stay_awake_rcu function for ep_poll_callback use, but I left it unused on accident. ep->mtx cannot be held in ep_poll_callback, so RCU should be used here. Signed-off-by: Eric Wong Cc: "Rafa

Re: [RFC v2] epoll: avoid spinlock contention with wfcqueue

2013-03-18 Thread Eric Wong
Eric Wong wrote: > Mathieu Desnoyers wrote: > > I'm also not entirely sure why you need to add enum epoll_item_state > > along with expensive atomic ops to compute the state. Wouldn't it be > > enough to know in which queue the nodes are located ? If need be, yo

Re: [RFC v2] epoll: avoid spinlock contention with wfcqueue

2013-03-18 Thread Eric Wong
Mathieu Desnoyers wrote: > * Eric Wong (normalper...@yhbt.net) wrote: > > Eric Wong wrote: > > > I'm posting this lightly tested version since I may not be able to do > > > more testing/benchmarking until the weekend. > > > > Still lightly tested (on

[RFC v2] epoll: avoid spinlock contention with wfcqueue

2013-03-18 Thread Eric Wong
Eric Wong wrote: > I'm posting this lightly tested version since I may not be able to do > more testing/benchmarking until the weekend. Still lightly tested (on an initramfs KVM, no real applications, yet). > Davide's totalmess is still running, so that's probab

Re: [RFC PATCH] Linux kernel Wait-Free Concurrent Queue Implementation

2013-03-18 Thread Eric Wong
Mathieu Desnoyers wrote: > Thanks for providing this detailed scenario. I think there is an > important aspect in the use of splice I suggested on which we are not > fully understanding each other. I will annotate your scenario below with > clarifications: Ah yes, I somehow thought splice would o

Re: [RFC PATCH] Linux kernel Wait-Free Concurrent Queue Implementation

2013-03-16 Thread Eric Wong
Eric Wong wrote: > Mathieu Desnoyers wrote: > > * Eric Wong (normalper...@yhbt.net) wrote: > > > Mathieu Desnoyers wrote: > > > > +/* > > > > + * Load a data from shared memory. > > > > + */ > > > > +#define CMM_LOAD_SHARED(p)

Re: [RFC PATCH] Linux kernel Wait-Free Concurrent Queue Implementation

2013-03-14 Thread Eric Wong
Mathieu Desnoyers wrote: > * Eric Wong (normalper...@yhbt.net) wrote: > > Mathieu Desnoyers wrote: > > > The advantage of using splice() over dequeue() is that you will reduce > > > the amount of interactions between concurrent enqueue and dequeue > > > op

Re: [RFC PATCH] Linux kernel Wait-Free Concurrent Queue Implementation

2013-03-14 Thread Eric Wong
Mathieu Desnoyers wrote: > * Eric Wong (normalper...@yhbt.net) wrote: > > Mathieu Desnoyers wrote: > > > +/* > > > + * Load a data from shared memory. > > > + */ > > > +#define CMM_LOAD_SHARED(p) ACCESS_ONCE(p) > > > > When

[RFC] epoll: avoid spinlock contention with wfcqueue

2013-03-13 Thread Eric Wong
4.mbox (should apply cleanly to 3.9-rc* since there's no epoll changes in that) --8<--- >From 139f0d4528c3fabc6a54e47be73ba9990b42cdd8 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Thu, 14 Mar 2013 02:37:12 +0000 Subject: [PATCH] epoll:

Re: [RFC PATCH] Linux kernel Wait-Free Concurrent Queue Implementation

2013-03-13 Thread Eric Wong
Mathieu Desnoyers wrote: > Ported to the Linux kernel from Userspace RCU library, at commit > 108a92e5b97ee91b2b902dba2dd2e78aab42f420. > > Ref: http://git.lttng.org/userspace-rcu.git > > It is provided as a starting point only. Test cases should be ported > from Userspace RCU to kernel space an

[PATCH] epoll: cleanup: hoist out f_op->poll calls

2013-03-13 Thread Eric Wong
This reduces the amount of code inside the ready list iteration loops for better readability IMHO. Signed-off-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- (I think this depends on my RCU wakeup source patch sitting in -mm) fs/eventpoll.c | 22 -- 1

[PATCH mm] epoll: lock ep->mtx in ep_free to silence lockdep

2013-03-13 Thread Eric Wong
Technically we do not need to hold ep->mtx during ep_free since we are certain there are no other users of ep at that point. However, lockdep complains with a "suspicious rcu_dereference_check() usage!" message; so lock the mutex before ep_remove to silence the warning. Signed-off-

Re: [PATCH] epoll: fix sparse error on RCU assignment

2013-03-13 Thread Eric Wong
Oleg Nesterov wrote: > On 03/10, Eric Wong wrote: > > > > This fixes the following sparse error when using > > CONFIG_SPARSE_RCU_POINTER=y and "make C=2 fs/eventpoll.o" > > > > fs/eventpoll.c:514:17: error: incompatible types in compariso

Re: epoll: possible bug from wakeup_source activation

2013-03-11 Thread Eric Wong
Arve Hjønnevåg wrote: > On Mon, Mar 11, 2013 at 5:17 PM, Eric Wong wrote: > > Arve Hjønnevåg wrote: > >> On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong wrote: > >> > Arve Hjønnevåg wrote: > >> >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong > >&g

Re: epoll: possible bug from wakeup_source activation

2013-03-11 Thread Eric Wong
Arve Hjønnevåg wrote: > On Fri, Mar 8, 2013 at 11:10 PM, Eric Wong wrote: > > Arve Hjønnevåg wrote: > >> On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong wrote: > >> > What happens if ep_modify calls ep_destroy_wakeup_source > >> > while __pm_stay_awake is

wfcqueue (in Userspace RCU) for Linux kernel (for epoll)

2013-03-11 Thread Eric Wong
Hi, I'm looking to reduce contention for the ep->lock spin lock in epoll. I came across wfcqueue in Userspace RCU and am wondering if there's any reason (other than lack of developer time/users) it hasn't been adapted for the Linux kernel. I'd be happy to do the work if it's suitable (and omit pa

[PATCH] epoll: use RCU protect wakeup_source in epitem

2013-03-10 Thread Eric Wong
Eric Dumazet wrote: > On Sun, 2013-03-10 at 01:11 +0000, Eric Wong wrote: > > > > static void ep_destroy_wakeup_source(struct epitem *epi) > > { > > - wakeup_source_unregister(epi->ws); > > - epi->ws = NULL; > > + struct wakeup_source *ws =

[PATCH] epoll: fix sparse error on RCU assignment

2013-03-10 Thread Eric Wong
Cc: Oleg Nesterov Signed-off-by: Eric Wong --- Oleg: I found this error since I was working on an unrelated patch to convert wakeup_source users to RCU in epoll. This was introduced in: commit 971316f0503a5c50633d07b83b6db2f15a3a5b00 (epoll: ep_unregister_pollwait() can use the freed pwq-&g

Re: epoll: possible bug from wakeup_source activation

2013-03-09 Thread Eric Wong
Eric Wong wrote: > Arve Hjønnevåg wrote: > > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong wrote: > > > What happens if ep_modify calls ep_destroy_wakeup_source > > > while __pm_stay_awake is running on the same epi->ws? > > > > Yes,

Re: epoll: possible bug from wakeup_source activation

2013-03-08 Thread Eric Wong
Arve Hjønnevåg wrote: > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong wrote: > > What happens if ep_modify calls ep_destroy_wakeup_source > > while __pm_stay_awake is running on the same epi->ws? > > Yes, that looks like a problem. I think calling > ep_destroy_wakeu

[PATCH] epoll: comment + BUILD_BUG_ON to prevent epitem bloat

2013-03-08 Thread Eric Wong
This will prevent us from accidentally introducing a memory bloat regression here in the future. Signed-off-by: Eric Wong Cc: Andrew Morton Cc: Davide Libenzi , Cc: Al Viro --- Andrew Morton wrote: > On Thu, 7 Mar 2013 10:32:40 +0000 Eric Wong wrote: > > > Andrew M

Re: epoll: possible bug from wakeup_source activation

2013-03-08 Thread Eric Wong
Arve Hjønnevåg wrote: > On Thu, Mar 7, 2013 at 5:30 PM, Eric Wong wrote: > > Eric Wong wrote: > >> Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 > >> (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) > >> > >> I think t

Re: [PATCH 2/2] epoll: add tracepoints for epitem enqueue/dequeue

2013-03-07 Thread Eric Wong
Putting this on hold for now. I'm awaiting answers for <20130307112639.ga25...@dcvr.yhbt.net>, (Subject: epoll: possible bug from wakeup_source activation) this patch may hide the possible bug I'm referring to in that email.

Re: epoll: possible bug from wakeup_source activation

2013-03-07 Thread Eric Wong
Eric Wong wrote: > Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 > (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) > > I think the reason for using ep->ws instead of epi->ws in the unlikely > ovflist case applies to the likely rdllist case, t

epoll: possible bug from wakeup_source activation

2013-03-07 Thread Eric Wong
Hi Arve, looking at commit 4d7e30d98939a0340022ccd49325a3d70f7e0238 (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend ...) I think the reason for using ep->ws instead of epi->ws in the unlikely ovflist case applies to the likely rdllist case, too. Since epi->ws is only protected by ep->mtx, it

Re: [PATCH] epoll: trim epitem by one cache line on x86_64

2013-03-07 Thread Eric Wong
-x86_64-fix > +++ a/fs/eventpoll.c > @@ -105,7 +105,7 @@ > struct epoll_filefd { > struct file *file; > int fd; > -} EPOLL_PACKED; > +} __packed; Thanks for testing on ppc. Looks good to me. For what it's worth: Acked-by: Eric Wong

[PATCH] epoll: trim epitem by one cache line on x86_64

2013-03-04 Thread Eric Wong
LL_PACKED instead of __attribute__((packed)) Signed-off-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- fs/eventpoll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cfc4b16..06f3d0e 100644 --- a/fs/eventpoll.c

Re: sendfile and EAGAIN

2013-03-04 Thread Eric Wong
Ulrich Drepper wrote: > On Mon, Feb 25, 2013 at 2:22 PM, Eric Dumazet wrote: > > I don't understand the issue. > > > > sendfile() returns -EAGAIN only if no bytes were copied to the socket. > > There is something wrong/unexpected/... > > I have a program which can use either sendfile or send.

[PATCH 2/2] epoll: add tracepoints for epitem enqueue/dequeue

2013-03-03 Thread Eric Wong
-off-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- fs/eventpoll.c | 35 +++- include/linux/eventpoll.h| 8 +++ include/trace/events/eventpoll.h | 113 +++ 3 files changed, 144 insertions(+), 12

[PATCH 1/2] epoll: hoist out duplicated wake up logic

2013-03-03 Thread Eric Wong
This makes the kernel slightly smaller, and hopefully easier to follow. Signed-off-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- fs/eventpoll.c | 27 +++ 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/fs/eventpoll.c b/fs

Re: [PATCH] epoll: preserve ordering of events from ovflist

2013-03-03 Thread Eric Wong
Eric Wong wrote: > Events arriving in ovflist are stored in LIFO order, so > we should account for that when inserting them into rddlist. Fwiw, I noticed this oddity because I wanted to start tracing epitem readiness (to detect when my application is not calling epoll_wait() fast enoug

[PATCH] epoll: preserve ordering of events from ovflist

2013-03-01 Thread Eric Wong
Events arriving in ovflist are stored in LIFO order, so we should account for that when inserting them into rddlist. Signed-off-by: Eric Wong Cc: Davide Libenzi Cc: Al Viro Cc: Andrew Morton --- I think this can lead to starvation in some rare cases, but I have not been able to trigger it

Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Eric Wong
"Myklebust, Trond" wrote: > > -Original Message- > > From: Zach Brown [mailto:z...@redhat.com] > > Sent: Thursday, February 21, 2013 5:25 PM > > To: Myklebust, Trond > > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; > > linux-kernel@vger.kernel.org; > > Chris L. Mason; Christoph Hellwig

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2013-02-22 Thread Eric Wong
Phillip Susi wrote: > > On Sat, Dec 15, 2012 at 12:54:48AM +0000, Eric Wong wrote: > >> "strace -T" timing on an uncached, one gigabyte file: > >> > >> Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832> > >> After: fadv

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Eric Wong
Jeremy Allison wrote: > On Thu, Feb 21, 2013 at 01:51:53PM +0000, Myklebust, Trond wrote: > > On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: > > > We have debated the need to have a system call to allow for offloading > > > copy > > > operations, for example to an NFS server (part to the

Re: [PATCH 1/1] eventfd: implementation of EFD_MASK flag

2013-02-09 Thread Eric Wong
Martin Sustrik wrote: > On 09/02/13 04:54, Eric Wong wrote: > >>>Using one eventfd per userspace socket still seems a bit wasteful. > >> > >>Wasteful in what sense? Occupying a slot in file descriptor table? > >>That's the price for having the socke

Re: [PATCH 1/1] eventfd: implementation of EFD_MASK flag

2013-02-08 Thread Eric Wong
Martin Sustrik wrote: > On 08/02/13 23:21, Eric Wong wrote: > >Martin Sustrik wrote: > >>To address the question, I've written down detailed description of > >>the challenges of the network protocol development in user space and > >>how the proposed fe

Re: [PATCH 1/1] eventfd: implementation of EFD_MASK flag

2013-02-08 Thread Eric Wong
Martin Sustrik wrote: > On 07/02/13 23:44, Andrew Morton wrote: > >That's a nice changelog but it omitted a critical thing: why do you > >think the kernel needs this feature? What's the value and use case for > >being able to poll these descriptors? > > To address the question, I've written down

Re: [PATCH 1/1] eventfd: implementation of EFD_MASK flag

2013-02-08 Thread Eric Wong
Andy Lutomirski wrote: > On Thu, Feb 7, 2013 at 12:11 PM, Martin Sustrik wrote: > > On 07/02/13 20:12, Andy Lutomirski wrote: > >> On 02/06/2013 10:41 PM, Martin Sustrik wrote: > >>> The value of 'events' should be any combination of event flags as defined > >>> by > >>> poll(2) function (POLLIN,

Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+

2013-02-07 Thread Eric Wong
David Miller wrote: > From: Eric Dumazet > Date: Fri, 18 Jan 2013 22:13:16 -0800 > > > On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote: > > > >> > >> Hmm, this might be already fixed in net-next tree, could you try it ? > >> > > > > Yes, running your program on net-next seems OK. > >

Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+

2013-01-18 Thread Eric Wong
Eric Dumazet wrote: > On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote: > > Hmm, this might be already fixed in net-next tree, could you try it ? > > Yes, running your program on net-next seems OK. > > David, we need the two following commits. > commit 9ca1b22d6d228177e6f929f6818a1cd3d5e30

splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+

2013-01-18 Thread Eric Wong
With the following flow, I'm sometimes getting an unexpected EOF on the pipe reader even though I never close the pipe writer: tcp_wr -write-> tcp_rd -splice-> pipe_wr -> pipe_rd -splice-> /dev/null I encounter this in 3.7.3, 3.8-rc3, and the latest from Linus 3.8-rc4+(5da1f88b8b727dc3a66c52
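
To illustrate the last hop of that flow (pipe_rd -splice-> /dev/null), here is a rough sketch of a splice(2) loop on the pipe reader. It is not the reporter's test program; the helper name splice_all() is invented for this example, and it treats a return value of 0 as EOF, which is the condition the report describes seeing unexpectedly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from the read end of a pipe
 * into out_fd using splice(2).  Returns the number of bytes moved (short
 * if EOF is reached first), or -1 on error.
 */
ssize_t splice_all(int pipe_rd, int out_fd, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = splice(pipe_rd, NULL, out_fd, NULL,
				   len - done, SPLICE_F_MOVE);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;		/* EOF: normally means all writers closed */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		return -1;
	}
	return (ssize_t)done;
}
```

With a blocking pipe, a zero return is normally expected only after every write end has been closed, which is why an EOF with the writer still open looked like a kernel bug.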

Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket

2013-01-10 Thread Eric Wong
uld be allowed to come even without ACK bit set. We validate > the RST by checking the exact sequence, as requested by RFC 793 and > 5961 3.2, in tcp_validate_incoming() > > Reported-by: Eric Wong > Signed-off-by: Eric Dumazet All good here, thanks for the quick turnaround! Te

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > mm: compaction: Partially revert capture of suitable high-order page > Reported-by: Eric Wong > Cc: sta...@vger.kernel.org > Signed-off-by: Mel Gorman Thanks, my original use case and test works great after several hours! Tested-by: Eric Wong Unfortuna

3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket

2013-01-10 Thread Eric Wong
writes # it to the TCP:#{addr}:#{port}, the server response goes to fifos[1], # which the above dd(1) invocation reads the first 4K of. # This socat is expected to error out with EPIPE here ) | socat - TCP:#{addr}:#{port} > #{fifos[1]} || : echo "Waiting on #{fifos[0]} for client=$clie

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > Thanks Eric, it's much appreciated. However, I'm still very much in favour > of a partial revert as in retrospect the implementation of capture took the > wrong approach. Could you confirm the following patch works for you? > It's should functionally have the same effect as the

Re: [v2] fadvise: perform WILLNEED readahead asynchronously

2013-01-10 Thread Eric Wong
Riccardo Magliocchetti wrote: > Hello, > > Il 25/12/2012 03:22, Eric Wong ha scritto: > > Any other (Free Software) applications that might benefit from > > lower FADV_WILLNEED latency? > > Not with fadvise but with madvise. Libreoffice / Openoffice.org have >

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > page->pfmemalloc can be left set for captured pages so try this but as > capture is rarely used I'm strongly favouring a partial revert even if > this works for you. I haven't reproduced this using your workload yet > but I have found that high-order allocation stress tests for

Re: [PATCH v2] fadvise: perform WILLNEED readahead asynchronously

2013-01-09 Thread Eric Wong
Simon Jeons wrote: > On Tue, 2012-12-25 at 02:22 +0000, Eric Wong wrote: > > Please add changelog. Changes since v1: * separate unbound workqueue for high-priority tasks * account for inflight readahead to avoid denial-of-service * limit concurrency for non-high-priority tasks (

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Mel Gorman wrote: > When I looked at it for long enough I found a number of problems. Most > affect timing but two serious issues are in there. One affects how long > kswapd spends compacting versus reclaiming and the other increases lock > contention meaning that async compaction can abort early.

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Eric Wong wrote: > Oops, I had to restart my test :x. However, I was able to reproduce the > issue very quickly again with your patch. I've double-checked I'm > booting into the correct kernel, but I do have more load on this > laptop host now, so maybe that made it happen

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Eric Wong wrote: > Eric Dumazet wrote: > > On Tue, 2013-01-08 at 18:32 -0800, Eric Dumazet wrote: > > > Hmm, it seems sk_filter() can return -ENOMEM because skb has the > > > pfmemalloc() set. > > > > > > > > One TCP socket keeps retransmitt

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
ete 4117, find 585/628 Free swap = 376288kB Total swap = 392188kB 131054 pages RAM 3820 pages reserved 280574 pages shared 116800 pages non-shared -- Eric Wong

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
Mel Gorman wrote: > Please try the following patch. However, even if it works the benefit of > capture may be so marginal that partially reverting it and simplifying > compaction.c is the better decision. I already got my VM stuck on this one. I had two twosleepy instances, 2774 was the one that

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
Eric Wong wrote: > Mel Gorman wrote: > > Right now it's difficult to see how the capture could be the source of > > this bug but I'm not ruling it out either so try the following (untested > > but should be ok) patch. It's not a proper revert, it just disabl

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Eric Wong
Eric Dumazet wrote: > It would not surprise me if sk_stream_wait_memory() have plain bug(s) or > race(s). > > In 2010, in commit 482964e56e132 Nagendra Tomar fixed a pretty severe > long standing bug. > > This path is not taken very often on most machines. > > I would try the following patch :

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Eric Wong
Mel Gorman wrote: > Right now it's difficult to see how the capture could be the source of > this bug but I'm not ruling it out either so try the following (untested > but should be ok) patch. It's not a proper revert, it just disables the > capture page logic to see if it's at fault. Things loo

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-06 Thread Eric Wong
Mel Gorman wrote: > Using a 3.7.1 or 3.8-rc2 kernel, can you reproduce the problem and then > answer the following questions please? This is on my main machine running 3.8-rc2 > 1. What are the contents of /proc/vmstat at the time it is stuck? ===> /proc/vmstat <=== nr_free_pages 40305 nr_inact

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Wong
Mel Gorman wrote: > On Wed, Jan 02, 2013 at 08:08:48PM +0000, Eric Wong wrote: > > Instead, I disabled THP+compaction under v3.7.1 and I've been unable to > > reproduce the issue without THP+compaction. > > > > Implying that it's stuck in compaction s

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Wong
Mel Gorman wrote: > On Wed, Jan 02, 2013 at 08:08:48PM +0000, Eric Wong wrote: > > Instead, I disabled THP+compaction under v3.7.1 and I've been unable to > > reproduce the issue without THP+compaction. > > > > Implying that it's stuck in compaction s

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > Eric Wong wrote: > > I think this requires frequent dirtying/cycling of pages to reproduce. > > (from copying large files around) to interact with compaction. > > I'll see if I can reproduce the issue with read-only FS activity. > > Still su

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > I think this requires frequent dirtying/cycling of pages to reproduce. > (from copying large files around) to interact with compaction. > I'll see if I can reproduce the issue with read-only FS activity. Still successfully running the read-only test on my main

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > Eric Dumazet wrote: > > With the following patch, I cant reproduce the 'apparent stuck' > > Right, the output is just an approximation and the logic there > was bogus. > > Thanks for looking at this. I'm still able to reproduce the i

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Dumazet wrote: > On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote: > > Eric Wong wrote: > > > [1] my full setup is very strange. > > > > > > Other than the FUSE component I forgot to mention, little depends on > > > the kernel. Wi

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Wong
Eric Wong wrote: > Linus Torvalds wrote: > > Please document the barrier that this mb() pairs with, and then give > > an explanation for the fix in the commit message, and I'll happily > > take it. Even if it's just duplicating the comments above the > &

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-02 Thread Eric Wong
Eric Wong wrote: > [1] my full setup is very strange. > > Other than the FUSE component I forgot to mention, little depends on > the kernel. With all this, the standalone toosleepy can get stuck. > I'll try to reproduce it with less... I just confirmed my toos

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-02 Thread Eric Wong
(changing Cc:) Eric Wong wrote: > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > local TCP socket. The isolated code below can reproduce the issue > after many minutes (<1 hour). It might be easier to reproduce on > a busy system while disk I/O is ha

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Wong
Eric Dumazet wrote: > On Wed, 2013-01-02 at 18:40 +0000, Eric Wong wrote: > > Eric Dumazet wrote: > > > It seems the real problem is the epi->event.events = event->events; > > > which is done without taking ep->lock > > > > Yes. I am hoping it is

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Wong
Eric Dumazet wrote: > First, thanks for working on this issue. No problem! > It seems the real problem is the epi->event.events = event->events; > which is done without taking ep->lock Yes. I am hoping it is possible to do it without a lock there, but your change is more obviously correct. >

[PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-01 Thread Eric Wong
-- 8< >From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 1 Jan 2013 21:20:27 +0000 Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD EPOLL_CTL_MOD sets the interest mask before calling f_o

Re: [PATCH] poll: prevent missed events if _qproc is NULL

2013-01-01 Thread Eric Wong
Eric Wong wrote: > Eric Dumazet wrote: > > commit 626cf236608505d376e4799adb4f7eb00a8594af should not have this > > side effect, at least for poll()/select() functions. The epoll() changes > > I am not yet very confident. > > I have a better explanation of the e

Re: [PATCH] poll: prevent missed events if _qproc is NULL

2013-01-01 Thread Eric Wong
Eric Dumazet wrote: > On Mon, 2012-12-31 at 13:21 +0000, Eric Wong wrote: > > This patch seems to fix my issue with ppoll() being stuck on my > > SMP machine: http://article.gmane.org/gmane.linux.file-systems/70414 > > > > The change to soc

Re: [PATCH] poll: prevent missed events if _qproc is NULL

2012-12-31 Thread Eric Wong
Eric Wong wrote: > This patch seems to fix my issue with ppoll() being stuck on my > SMP machine: http://article.gmane.org/gmane.linux.file-systems/70414 OK, it doesn't fix my issue, but it seems to make it harder-to-hit... > The change to sock_poll_wait

[PATCH] poll: prevent missed events if _qproc is NULL

2012-12-31 Thread Eric Wong
() barrier in poll_schedule_timeout() appears to be insufficient on my SMP x86-64 machine (as it's only an xchg()). This may also be related to the epoll issue described by Andreas Voellmy in http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong Cc: Hans Verkuil Cc:

Re: ppoll() stuck on POLLIN while TCP peer is sending

2012-12-29 Thread Eric Wong
Eric Wong wrote: > Eric Wong wrote: > > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > > local TCP socket. The isolated code below can reproduce the issue > > after many minutes (<1 hour). It might be easier to reproduce on > > a busy

Re: ppoll() stuck on POLLIN while TCP peer is sending

2012-12-27 Thread Eric Wong
Eric Wong wrote: > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > local TCP socket. The isolated code below can reproduce the issue > after many minutes (<1 hour). It might be easier to reproduce on > a busy system while disk I/O is happening.

ppoll() stuck on POLLIN while TCP peer is sending

2012-12-27 Thread Eric Wong
&s, NULL, send_loop, &pair_a[1])); assert(0 == pthread_join(s, NULL)); assert(0 == pthread_join(rs, NULL)); assert(0 == pthread_join(r, NULL)); return 0; } 8< Any help/suggestions/test patches would be greatly

[PATCH v2] fadvise: perform WILLNEED readahead asynchronously

2012-12-24 Thread Eric Wong
Cc: Alan Cox Cc: Dave Chinner Cc: Zheng Liu Signed-off-by: Eric Wong --- I have not tested on NUMA (since I've no access to NUMA hardware) and do not know how the use of the workqueue affects RA performance. I'm only using WQ_UNBOUND on non-NUMA, though. I'm halfway tempte

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-21 Thread Eric Wong
"Junchang(Jason) Wang" wrote: > We still believe this is a bug in epoll system even though we can't > prove that so far. Both Andi and I are very interested in this problem > and helping you experts solve this it. Just let us know if we can > help. I'm just another epoll user, definitely not an e

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-20 Thread Eric Wong
Andreas Voellmy wrote: > I wrote a C program that behaves similar to my original program and > triggers the bug. The bug only arises when I use enough cores and > threads (about 16). The program is here: > https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c I finally took a closer l
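
As background for this thread, the usual EPOLLONESHOT pattern is: register the descriptor once, receive at most one event, then re-arm it with EPOLL_CTL_MOD before expecting another. The snippet below is a small sketch of that pattern only; it is not taken from epollbug.c, and the helper names oneshot_add() and oneshot_rearm() are assumptions made for illustration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for a single readability event. */
int oneshot_add(int epfd, int fd)
{
	struct epoll_event ev = {
		.events = EPOLLIN | EPOLLONESHOT,
		.data.fd = fd,
	};

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical helper: re-enable a one-shot fd after its event has been
 * handled.  Until this is called, epoll_wait() will not report fd again.
 */
int oneshot_rearm(int epfd, int fd)
{
	struct epoll_event ev = {
		.events = EPOLLIN | EPOLLONESHOT,
		.data.fd = fd,
	};

	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

If data is already pending when oneshot_rearm() runs, the descriptor is expected to be reported again by the next epoll_wait(); the thread is about cases where that event appeared to go missing.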

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-17 Thread Eric Wong
Andreas Voellmy wrote: > There were a couple of errors in the code when I posted my last > message. I have fixed those. The epoll bug still occurs. Sorry I haven't gotten around to this. Can you reproduce this with fewer cores? (I only have 4 at most). Have you tried the latest stable kernel ve

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner wrote: > On Sun, Dec 16, 2012 at 03:35:49AM +0000, Eric Wong wrote: > > Dave Chinner wrote: > > > On Sun, Dec 16, 2012 at 12:25:49AM +0000, Eric Wong wrote: > > > > Alan Cox wrote: > > > > > On Sat, 15 Dec 2

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner wrote: > On Sun, Dec 16, 2012 at 03:59:53AM +0000, Eric Wong wrote: > > I want the first read() to happen sooner than it would under current > > fadvise. > > You're not listening. You do not need the kernel to be modified to > avoid the latency of

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner wrote: > On Sun, Dec 16, 2012 at 03:04:42AM +0000, Eric Wong wrote: > > Dave Chinner wrote: > > > On Sat, Dec 15, 2012 at 12:54:48AM +0000, Eric Wong wrote: > > > > > > > > Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner wrote: > On Sun, Dec 16, 2012 at 12:25:49AM +0000, Eric Wong wrote: > > Alan Cox wrote: > > > On Sat, 15 Dec 2012 00:54:48 +0000 > > > Eric Wong wrote: > > > > > > > Applications streaming large files may want to reduce disk spi

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Eric Wong wrote: > Perhaps squashing something like the following will work? Last hunk should've had a return before skip_ra: --- a/mm/readahead.c +++ b/mm/readahead.c @@ -264,6 +266,10 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp, req->

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner wrote: > On Sat, Dec 15, 2012 at 12:54:48AM +0000, Eric Wong wrote: > > Applications streaming large files may want to reduce disk spinups and > > I/O latency by performing large amounts of readahead up front. > > Applications also tend to read files soon a

Re: resend--[PATCH] improve read ahead in kernel

2012-12-15 Thread Eric Wong
xtu4 wrote: > resend it, due to format error > > Subject: [PATCH] when system in low memory scenario, imagine there is a mp3 > play, or a video play, we need to read mp3 or video file > from memory to page cache, but when system lack of memory, > page cache of mp3 or video file will be reclaimed

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Alan Cox wrote: > On Sat, 15 Dec 2012 00:54:48 +0000 > Eric Wong wrote: > > > Applications streaming large files may want to reduce disk spinups and > > I/O latency by performing large amounts of readahead up front > > How does it compare benchmark wise with

[PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-14 Thread Eric Wong
hange should not hurt existing applications. "strace -T" timing on an uncached, one gigabyte file: Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832> After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61> Signed-off-by: Eric Wong --- N.B.: I'm not
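
The fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) call in the strace output above corresponds to the userspace call sketched here. This is only an illustration of how an application issues the hint, not part of the patch; the wrapper name prefetch_whole_file() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper: ask the kernel to start reading the whole file
 * behind fd into the page cache.  Returns 0 on success or an errno value
 * on failure (posix_fadvise() returns the error instead of setting errno).
 */
int prefetch_whole_file(int fd)
{
	/* offset 0 with len 0 means "from the start to the end of the file" */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
}
```

The timings quoted in the message are about how long this call takes to return on an uncached file; the later read() calls are what actually consume the data.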

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Eric Wong
Andreas Voellmy wrote: > Using strace, I checked that my program is using epoll api as I > described. Here is a fragment of the strace output that demonstrates > my use: > > recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90 > sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue

Re: [RFC/PATCH] epoll: replace EPOLL_CTL_DISABLE with EPOLL_CTL_POKE

2012-11-06 Thread Eric Wong
Christof Meerwald wrote: > On Fri, 2 Nov 2012 04:13:12 +0000, Eric Wong wrote: > [...] > > EPOLL_CTL_POKE may be used to force an item into the epoll > > ready list. Instead of disabling an item asynchronously > > via EPOLL_CTL_DISABLE, this forces the threads calling &g

[RFC/PATCH] epoll: replace EPOLL_CTL_DISABLE with EPOLL_CTL_POKE

2012-11-01 Thread Eric Wong
regular kernel hacker, either) >From 12a2d7c4584605dd763c7510140666d2a6b51d89 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Fri, 2 Nov 2012 03:47:08 +0000 Subject: [PATCH] epoll: replace EPOLL_CTL_DISABLE with EPOLL_CTL_POKE EPOLL_CTL_POKE may be used to force an item into the epoll ready
